To JSON, Pickle or Marshal in Python

08 May 2009   4 comments   Python

I was reading David Cramer's tip to use JSONField in Django to store arbitrary fields in a SQL database. Nice. But is it fast enough? Well, I can't answer that, but I did look into the difference in read/write performance between simplejson, cPickle and marshal.

Only reading:

JSON 0.00593531370163
PICKLE 0.0109532237053
MARSHAL 0.00413788318634

Reading and writing:

JSON 0.0434390544891
PICKLE 0.0289686655998
MARSHAL 0.00728442907333

Clearly marshal is the fastest of the three, but to quote the documentation:

"Warning: The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source."

Clearly simplejson is a very fast reader and the JSON format has the delicious advantage that it's "human readable" (compared to the others).

NOTE! I spent about 5 minutes putting together the script and about 10 minutes writing this, so feel free to doubt its scientific accuracy.
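The script behind those numbers isn't shown here, so what follows is only a rough sketch of how such a comparison could be put together. It assumes the standard library's json and pickle as stand-ins for simplejson and cPickle, and the sample data, the FORMATS table and the function names are all made up for illustration. Like the figures above, it times file reads and writes, so disk access is part of the measurement.

```python
import json
import marshal
import pickle
import tempfile
import timeit
from pathlib import Path

# Hypothetical sample data; the data set used for the post is not shown.
DATA = {"items": [{"id": i, "name": "item %d" % i, "tags": ["a", "b"]}
                  for i in range(1000)]}

# name -> (dump function, load function, file mode suffix).
# json works on text files, pickle and marshal on binary files.
FORMATS = {
    "JSON": (json.dump, json.load, ""),
    "PICKLE": (pickle.dump, pickle.load, "b"),
    "MARSHAL": (marshal.dump, marshal.load, "b"),
}


def write(name, path):
    dump, _, mode = FORMATS[name]
    with open(path, "w" + mode) as f:
        dump(DATA, f)


def read(name, path):
    _, load, mode = FORMATS[name]
    with open(path, "r" + mode) as f:
        return load(f)


def benchmark(repeat=20):
    """Return {name: (read seconds, write+read seconds)} for `repeat` runs."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name in FORMATS:
            path = Path(tmp) / name.lower()
            write(name, path)  # the file must exist before reads are timed
            read_time = timeit.timeit(lambda: read(name, path), number=repeat)
            both_time = timeit.timeit(
                lambda: (write(name, path), read(name, path)), number=repeat)
            results[name] = (read_time, both_time)
    return results


if __name__ == "__main__":
    for name, (r, rw) in benchmark().items():
        print(name, "read:", r, "read+write:", rw)
```

The absolute numbers will depend heavily on the machine, the Python version and the shape of the data, so only the relative ordering is worth reading into.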

Also, just because JSON wrote slowest here doesn't mean it's slow. Look at this code for example:

>>> import simplejson
>>> d=simplejson.load(open('classes.json'))
>>> len(open('classes.json').read())
114254
>>> from time import time
>>> def test():
...     t0=time(); simplejson.dump(d, open('/tmp/write.json','w')); t1=time()
...     return t1-t0
... 
>>> test()
0.06772303581237793
>>> test()
0.076719999313354492
>>> test()
0.081094026565551758

That's right! Less than a tenth of a second to write more than 100 KB of data.

Comments

Marius Gedminas
By the way, the same security warning applies to Pickle/cPickle: if you can supply arbitrary input, you can execute arbitrary code.

Marshal is also unsafe for long-term data storage: the format is intentionally undocumented and may change between Python versions.
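To make the pickle warning concrete, here is a small and harmless sketch of the mechanism. The Payload class is hypothetical; it stands in for whatever an attacker might put in the byte stream. Its `__reduce__` method tells pickle which callable to invoke on load, and the expression handed to `eval` could just as well be something destructive.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object, for illustration only."""

    def __reduce__(self):
        # On load, pickle will call eval("1 + 1") and use the return value
        # as the "unpickled object". Any importable callable works here.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())

# No Payload instance comes back: the callable was executed instead.
result = pickle.loads(blob)
print(result)  # 2
```

Parsing the same untrusted bytes with a JSON decoder can only ever produce dicts, lists, strings, numbers, booleans and None, which is why JSON is the safer choice for data that crosses a trust boundary.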
Karl
You're including disk access and whatnot in your speed comparison. Using dumps and loads would probably be more indicative.

This prompted me to speed test between cJSON and simplejson because I'd heard that cJSON was faster. Turns out that it's faster on reads and slower on writes:

raw length: 5182477
simplejson: {'write': 0.29880690574645996, 'read': 0.37422609329223633}
cjson: {'write': 0.37676501274108887, 'read': 0.21609997749328613}

That's an average of 10 runs for the encoding/decoding speed to a string for a 5MB array of 21k json objects on a 2.2GHz MBP.
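A sketch of what a string-only comparison along these lines might look like, leaving disk access out of the picture. It assumes the standard library's json, pickle and marshal rather than cjson or simplejson, and the sample data and names are invented for illustration.

```python
import json
import marshal
import pickle
import timeit

# Hypothetical sample data, much smaller than the 5 MB array mentioned above.
DATA = [{"id": i, "value": i * 0.5, "label": "row %d" % i} for i in range(5000)]

SERIALIZERS = {
    "json": (json.dumps, json.loads),
    "pickle": (pickle.dumps, pickle.loads),
    "marshal": (marshal.dumps, marshal.loads),
}


def time_in_memory(runs=10):
    """Average seconds per encode ("write") and decode ("read") over `runs`."""
    results = {}
    for name, (dumps, loads) in SERIALIZERS.items():
        encoded = dumps(DATA)  # encode once so decoding can be timed alone
        results[name] = {
            "write": timeit.timeit(lambda: dumps(DATA), number=runs) / runs,
            "read": timeit.timeit(lambda: loads(encoded), number=runs) / runs,
        }
    return results


if __name__ == "__main__":
    for name, timings in time_in_memory().items():
        print(name, timings)
```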
Peter Hoffmann
See http://kbyanc.blogspot.com/2007/07/python-serializer-benchmarks.html for a similar comparison. With cjson you are in the range of pickle, with the advantage of a readable format. And if you want to save space, you can just zip the content before saving.
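One way the "zip before saving" idea could look, sketched with the standard library's gzip module. The helper names, the sample data and the temporary file path are all assumptions made for the example.

```python
import gzip
import json
import os
import tempfile


def save_json_gz(obj, path):
    """Write obj as gzip-compressed JSON text."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(obj, f)


def load_json_gz(path):
    """Read back an object written by save_json_gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical, deliberately repetitive data so the compression is visible.
data = {"classes": ["row %d" % i for i in range(1000)]}
path = os.path.join(tempfile.mkdtemp(), "data.json.gz")

save_json_gz(data, path)
restored = load_json_gz(path)

raw_size = len(json.dumps(data).encode("utf-8"))
gz_size = os.path.getsize(path)
print("raw bytes:", raw_size, "gzipped bytes:", gz_size)
```

The file on disk is no longer human readable as-is, but any gzip tool gets the readable JSON back.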
John Paulett
Good post! One point is that marshal & (c)pickle can handle more complex object graphs than what the standard JSON modules (simplejson, demjson, cjson, etc.) can encode. For instance json will not handle a list of arbitrary classes. A library I work on, jsonpickle (http://jsonpickle.github.com), can help encode complex object graphs into JSON.
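A small sketch of that limitation and one manual way around it, using only the standard library's json module. This is not jsonpickle's API; the Point class, the "__type__" marker and the two helper functions are made up for illustration.

```python
import json


class Point:
    """Hypothetical application class that json knows nothing about."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def encode_object(obj):
    """Fallback for json.dumps: turn known custom objects into plain dicts."""
    if isinstance(obj, Point):
        return {"__type__": "Point", "x": obj.x, "y": obj.y}
    raise TypeError("Cannot serialize %s" % type(obj).__name__)


def decode_object(d):
    """Hook for json.loads: rebuild custom objects from tagged dicts."""
    if d.get("__type__") == "Point":
        return Point(d["x"], d["y"])
    return d


points = [Point(1, 2), Point(3, 4)]

# Plain json.dumps refuses a list of arbitrary class instances.
try:
    json.dumps(points)
    plain_json_failed = False
except TypeError:
    plain_json_failed = True

# With explicit hooks, the round trip works.
text = json.dumps(points, default=encode_object)
restored = json.loads(text, object_hook=decode_object)
print(plain_json_failed, text)
```

The price is that every class needs its own encode and decode rule, which is the bookkeeping that pickle, or a helper library, takes care of for you.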