From Postgres to JSON strings

12 November 2013   11 comments   Python

No, this is not about the new JSON type added in Postgres 9.2. This is about the best way to get a record set from a Postgres database into a JSON string, using Python.

Here's the traditional way:

>>> import json
>>> import psycopg2
>>>
>>> conn = psycopg2.connect('dbname=peterbecom')
>>> cur = conn.cursor()
>>> cur.execute("""
...   SELECT
...     id, oid, root, approved, name
...   FROM blogcomments
...   LIMIT 10
... """)
>>> columns = (
...     'id', 'oid', 'root', 'approved', 'name'
... )
>>> results = []
>>> for row in cur.fetchall():
...     results.append(dict(zip(columns, row)))
...
>>> print json.dumps(results, indent=2)
[
  {
    "oid": "comment-20030707-161847",
    "root": true,
    "id": 5662,
    "name": "Peter",
    "approved": true
  },
  {
    "oid": "comment-20040219-r4cf",
    "root": true,
    "id": 5663,
    "name": "silconscave",
    "approved": true
  },
  {
    "oid": "c091011r86x",
    "root": true,
    "id": 5664,
    "name": "Rachel Jay",
    "approved": true
  },
...

This is plain and nice but it's kinda annoying that you have to write down the columns you're selecting twice.
Also, it's annoying that you have to convert the results of fetchall() into a list of dicts in an extra loop.
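
One way to avoid the repetition is to read the column names back from cur.description, which any DB-API compliant cursor populates after an execute(). A minimal sketch (you'd have to re-run the execute() above first, since the rows have already been fetched):

>>> columns = [col[0] for col in cur.description]
>>> results = [dict(zip(columns, row)) for row in cur.fetchall()]

That removes the duplication, but you still build the dicts yourself in Python.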

But there's an even neater trick to the rescue: the cursor_factory parameter. See below:

>>> import json
>>> import psycopg2
>>> from psycopg2.extras import RealDictCursor
>>>
>>> conn = psycopg2.connect('dbname=peterbecom')
>>> cur = conn.cursor(cursor_factory=RealDictCursor)
>>> cur.execute("""
...   SELECT
...     id, oid, root, approved, name
...   FROM blogcomments
...   LIMIT 10
... """)
>>>
>>> print json.dumps(cur.fetchall(), indent=2)
[
  {
    "oid": "comment-20030707-161847",
    "root": true,
    "id": 5662,
    "name": "Peter",
    "approved": true
  },
  {
    "oid": "comment-20040219-r4cf",
    "root": true,
    "id": 5663,
    "name": "silconscave",
    "approved": true
  },
  {
    "oid": "c091011r86x",
    "root": true,
    "id": 5664,
    "name": "Rachel Jay",
    "approved": true
  },
...

Isn't that much nicer? It's shorter and only lists the columns once.

But is it much faster? Sadly, no. I ran various benchmarks comparing different ways of doing this and basically concluded that there's no significant difference: the RealDictCursor version is around 5% faster. I suspect almost all the time in the benchmark is spent on things (the I/O) that don't differ between the versions.
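
A comparison along these lines gives you the idea (a simplified sketch, not the exact benchmark harness I used; it assumes the connection and imports from the examples above):

>>> import timeit
>>> def with_zip():
...     cur = conn.cursor()
...     cur.execute("SELECT id, oid, root, approved, name FROM blogcomments LIMIT 10")
...     columns = [col[0] for col in cur.description]
...     return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])
...
>>> def with_realdictcursor():
...     cur = conn.cursor(cursor_factory=RealDictCursor)
...     cur.execute("SELECT id, oid, root, approved, name FROM blogcomments LIMIT 10")
...     return json.dumps(cur.fetchall())
...
>>> timeit.timeit(with_zip, number=1000)
>>> timeit.timeit(with_realdictcursor, number=1000)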

Anyway. It's a keeper. I think it just looks nicer.

Comments

Joe Abbate
It isn't really necessary to use a RealDictCursor. Any DB-API cursor ought to return the column names in cursor.description (each column's name is the first item of its tuple) after an execute(), and psycopg complies with this. See http://www.python.org/dev/peps/pep-0249/#description.
Peter Bengtsson
Tried that. It works. It solves the problem of having to manually write down all the columns. However, it doesn't feel as neat as the RealDictCursor solution?
Joe Abbate
Not sure why you phrased the last sentence interrogatively. "Neat" depends on what you're used to. I tend to use a DictConnection as connection_factory on the psycopg2.connect, which makes all my cursors DictCursors. Haven't tested it, but I believe that gives the same capability as RealDictCursor (as far as json.dumps is concerned), but with added flexibility.
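In code, something like this sketch:

>>> from psycopg2.extras import DictConnection
>>> conn = psycopg2.connect('dbname=peterbecom', connection_factory=DictConnection)
>>> cur = conn.cursor()  # a DictCursor, without passing cursor_factory every time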
Peter Bengtsson
Using a DictConnection implies a DictCursor. But I really want a RealDictCursor; otherwise, when you run `json.dumps(cur.fetchall())`, it doesn't work. When json.dumps() interrogates a DictRow object it serializes it to a list, because DictRow actually subclasses list.

That's why RealDictCursor is more impressive to me.
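For illustration (a sketch, with values taken from the output above):

>>> from psycopg2.extras import DictCursor
>>> cur = conn.cursor(cursor_factory=DictCursor)
>>> cur.execute("SELECT id, name FROM blogcomments LIMIT 1")
>>> row = cur.fetchone()
>>> row['name']
'Peter'
>>> print json.dumps(row)  # DictRow subclasses list, so you get a JSON array
[5662, "Peter"]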
Simon
Wouldn't expect it to be faster, really. There's a little bit of CPU saving by not doing the redundant copy, but as you say, most of the activity is I/O. I came to realise years ago that with database applications, there's very little point in trying to optimise CPU use in your own code, because it's typically dwarfed by what goes on in the database driver, never mind the actual I/O element.

That said, the change is more than worth it, just for how it simplifies the code.
Ross Reedstrom
Right, the biggest optimizations will come from limiting I/O, so do as much as possible on the other end of that pipe, in the DB engine itself: nothing is worse than effectively doing a join on the wrong end of the I/O, not to mention sorts.
Ross Reedstrom
As to the post's topic, there is a generic way to take advantage of the native JSON support in both PostgreSQL and psycopg2; try something like this:

cur.execute("select array_agg(row_to_json(t)) from (select * from blogcomments limit 10) t")
res = cur.fetchall()
print json.dumps(res[0][0], indent=2)

Aside from the somewhat obscure reasons for the two extra list wrappers, this keeps your column names/JSON keys in one place: the DB table definition.
Jough Dempsey
MySQLdb.cursors.DictCursor works pretty much the same way if you're using MySQL instead of PostgreSQL. See: http://mysql-python.sourceforge.net/MySQLdb-1.2.2/public/MySQLdb.cursors.DictCursor-class.html
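Roughly like this (an untested sketch, assuming an equivalent MySQL table):

>>> import json
>>> import MySQLdb
>>> import MySQLdb.cursors
>>> conn = MySQLdb.connect(db='peterbecom', cursorclass=MySQLdb.cursors.DictCursor)
>>> cur = conn.cursor()
>>> cur.execute("SELECT id, oid, root, approved, name FROM blogcomments LIMIT 10")
>>> print json.dumps(cur.fetchall(), indent=2)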
jvidin
Excellent method, thanks for sharing this clean way to work around having to write out the column names.
Is there a similar way to do it in reverse, from JSON to a predefined Postgres table?
Thanks, Jorge
Tamer
When I ran your sample on my DB table I got:
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: datetime.date(2014, 3, 24) is not JSON serializable
Peter Bengtsson
You need to take care of things like datetime.date.
E.g. http://stackoverflow.com/a/27058505
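Something along these lines (a minimal sketch; ISO-formatting the dates is just one option):

>>> import datetime
>>> def json_default(obj):
...     if isinstance(obj, (datetime.datetime, datetime.date)):
...         return obj.isoformat()
...     raise TypeError(repr(obj) + " is not JSON serializable")
...
>>> print json.dumps({'posted': datetime.date(2014, 3, 24)}, default=json_default)
{"posted": "2014-03-24"}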