No, this is not about the new JSON type added in Postgres 9.2. This is about the best way to get a record set from a Postgres database into a JSON string using Python.

Here's the traditional way:

>>> import json
>>> import psycopg2
>>>
>>> conn = psycopg2.connect('dbname=peterbecom')
>>> cur = conn.cursor()
>>> cur.execute("""
...   SELECT
...     id, oid, root, approved, name
...   FROM blogcomments
...   LIMIT 10
... """)
>>> columns = (
...     'id', 'oid', 'root', 'approved', 'name'
... )
>>> results = []
>>> for row in cur.fetchall():
...     results.append(dict(zip(columns, row)))
...
>>> print json.dumps(results, indent=2)
[
  {
    "oid": "comment-20030707-161847",
    "root": true,
    "id": 5662,
    "name": "Peter",
    "approved": true
  },
  {
    "oid": "comment-20040219-r4cf",
    "root": true,
    "id": 5663,
    "name": "silconscave",
    "approved": true
  },
  {
    "oid": "c091011r86x",
    "root": true,
    "id": 5664,
    "name": "Rachel Jay",
    "approved": true
  },
...

This is plain and nice, but it's kind of annoying that you have to write down the columns you're selecting twice.
It's also annoying that you have to convert the results of fetchall() into a list of dicts in an extra loop.

So, there's a trick to the rescue! You can use the cursor_factory parameter. See below:

>>> import json
>>> import psycopg2
>>> from psycopg2.extras import RealDictCursor
>>>
>>> conn = psycopg2.connect('dbname=peterbecom')
>>> cur = conn.cursor(cursor_factory=RealDictCursor)
>>> cur.execute("""
...   SELECT
...     id, oid, root, approved, name
...   FROM blogcomments
...   LIMIT 10
... """)
>>>
>>> print json.dumps(cur.fetchall(), indent=2)
[
  {
    "oid": "comment-20030707-161847",
    "root": true,
    "id": 5662,
    "name": "Peter",
    "approved": true
  },
  {
    "oid": "comment-20040219-r4cf",
    "root": true,
    "id": 5663,
    "name": "silconscave",
    "approved": true
  },
  {
    "oid": "c091011r86x",
    "root": true,
    "id": 5664,
    "name": "Rachel Jay",
    "approved": true
  },
...

Isn't that much nicer? It's shorter and only lists the columns once.

But is it much faster? Sadly, no. I ran various benchmarks comparing ways of doing this and concluded that there's no significant difference: the RealDictCursor version is around 5% faster. I suspect most of the benchmark time is spent on work (the I/O) that doesn't differ between the versions.
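For reference, a minimal sketch of timing just the Python-side dict-building loop on synthetic rows (no database involved, so the row count and values are made up for illustration); this is the part RealDictCursor moves into the cursor itself, and it's small compared to the I/O:

```python
import timeit

# Synthetic rows standing in for cur.fetchall(), so this only measures
# the cost of the dict(zip(...)) loop, not any database I/O.
columns = ('id', 'oid', 'root', 'approved', 'name')
rows = [(i, 'oid-%d' % i, True, True, 'name-%d' % i)
        for i in range(1000)]

def build_dicts():
    # The "traditional way": zip each row tuple with the column names
    return [dict(zip(columns, row)) for row in rows]

elapsed = timeit.timeit(build_dicts, number=100)
print('100 runs over %d rows: %.4fs' % (len(rows), elapsed))
```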

Anyway. It's a keeper. I think it just looks nicer.

Joe Abbate - 13 November 2013 [«« Reply to this]
It isn't really necessary to use a RealDictCursor. Any DB-API cursor ought to return the column names as the first element of each entry in cursor.description after an execute() (and psycopg complies with this). See http://www.python.org/dev/peps/pep-0249/#description.
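A minimal sketch of that cursor.description trick; it works with any DB-API driver, so stdlib sqlite3 is used here just so it runs without a Postgres server, and the table and row are made up:

```python
import json
import sqlite3

# Any DB-API driver exposes column names as the first element of each
# entry in cursor.description after an execute().
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE blogcomments (id INTEGER, oid TEXT, name TEXT)')
cur.execute("INSERT INTO blogcomments VALUES (5662, 'comment-20030707-161847', 'Peter')")
cur.execute('SELECT id, oid, name FROM blogcomments')

# Column names come from the cursor, not from a hand-written tuple
columns = [desc[0] for desc in cur.description]
results = [dict(zip(columns, row)) for row in cur.fetchall()]
print(json.dumps(results, indent=2))
```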
Peter Bengtsson - 13 November 2013 [«« Reply to this]
Tried that. It works. It solves the problem of having to manually write down all the columns. However, it doesn't feel as neat as the RealDictCursor solution?
Joe Abbate - 13 November 2013 [«« Reply to this]
Not sure why you phrased the last sentence interrogatively. "Neat" depends on what you're used to. I tend to use a DictConnection as connection_factory on the psycopg2.connect, which makes all my cursors DictCursors. I haven't tested it, but I believe that gives the same capability as RealDictCursor (as far as json.dumps is concerned), but with added flexibility.
Peter Bengtsson - 13 November 2013 [«« Reply to this]
Using a DictConnection implies a DictCursor. But I really want a RealDictCursor otherwise, when you run `json.dumps(cur.fetchall())` it doesn't work. I think when json.dumps() interrogates a DictRow object it, for some reason, serializes it to a list.

That's why RealDictCursor is more impressive to me.
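The reason for that behavior: psycopg2's DictRow subclasses list, and json.dumps picks its serializer by type, so it emits a JSON array. A toy stand-in (not psycopg2's actual implementation) shows the effect:

```python
import json

# Toy stand-in for psycopg2's DictRow: supports row['key'] access but
# subclasses list, which is what json.dumps dispatches on.
class ToyDictRow(list):
    def __init__(self, columns, values):
        super().__init__(values)
        self._index = {c: i for i, c in enumerate(columns)}

    def __getitem__(self, key):
        if isinstance(key, str):
            key = self._index[key]
        return list.__getitem__(self, key)

row = ToyDictRow(('id', 'name'), (5662, 'Peter'))
print(row['name'])      # dict-style access works
print(json.dumps(row))  # but it serializes as a JSON array, not an object
```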
Simon - 13 November 2013 [«« Reply to this]
Wouldn't expect it to be faster, really. There's a little bit of CPU saving by not doing the redundant copy, but as you say, most of the activity is I/O. I came to realise years ago that with database applications there's very little point in trying to optimise CPU use in your own code, because it's typically dwarfed by what goes on in the database driver, never mind the actual I/O element.

That said, the change is more than worth it, just for how it simplifies the code.
Ross Reedstrom - 13 November 2013 [«« Reply to this]
Right, the biggest optimizations come from limiting I/O, so do as much as possible on the other end of that pipe, in the DB engine itself: there's nothing worse than effectively doing a join on the wrong end of the I/O, not to mention sorts.
Ross Reedstrom - 13 November 2013 [«« Reply to this]
As to the post's topic, there is a generic way to take advantage of the native JSON support in both PostgreSQL and psycopg2. Try something like this (note the LIMIT has to go in a subquery, otherwise it applies after the aggregation and does nothing):

cur.execute("""
  SELECT array_agg(row_to_json(t))
  FROM (SELECT * FROM blogcomments LIMIT 10) t
""")
res = cur.fetchall()
print json.dumps(res[0][0], indent=2)

Aside from the somewhat obscure reasons for the two extra list wrappers, this keeps your column names/JSON keys in one place: the DB table definition.
Jough Dempsey - 13 November 2013 [«« Reply to this]
MySQLdb.cursors.DictCursor works pretty much the same way if you're using MySQL instead of PostgreSQL. See: http://mysql-python.sourceforge.net/MySQLdb-1.2.2/public/MySQLdb.cursors.DictCursor-class.html
jvidin - 29 November 2014 [«« Reply to this]
Excellent method, thanks for sharing this clean way to work around adding the column names.
Is there a similar way to do it in reverse, from JSON to a predefined Postgres table?
Thanks, Jorge

