Python optimization anecdote

11 February 2005   4 comments   Python


I've learned something today. The cPickle module in Python can be boosted with very little effort. I've also learned that there's something even faster than a hotted-up cPickle: marshal.

The code in question is in CheckoutableTemplates, which saves information about the state of templates in Zope to a file on the file system. The first thing I did was to insert a little timer which looked something like this:

def write2config(...):
    t0 = time.time()
    result = _write2configPickle(...)
    t1 = time.time() - t0
    debug("_write2configPickle() took %s seconds" % t1)
    return result

I ran it many times over to generate an average time for writing a config item to file. The first result was: 0.0035016271 seconds.
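A minimal, self-contained version of that timer might look like this (the names mirror the post, but the body of `_write2configPickle()` is just an illustrative stand-in, not the real Zope code):

```python
import time

def _write2configPickle(data):
    # Stand-in for the real method; it just returns something measurable.
    return len(data)

def write2config(data):
    # Time the inner call and report how long it took.
    t0 = time.time()
    result = _write2configPickle(data)
    t1 = time.time() - t0
    print("_write2configPickle() took %s seconds" % t1)
    return result
```

Averaging many such calls smooths out the noise of any single measurement.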

The second thing I did was rewrite the algorithm that does the writing, eliminating one avoidable read from the pickled file. Timed in the same fashion, the second result was 0.00175877291 seconds, twice as fast already!
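The post doesn't show the rewritten algorithm, but the general idea of skipping an avoidable read can be sketched like this (hypothetical names; `pickle` here stands in for the Python 2 `cPickle`):

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "templates.pickle")

# Naive version: re-read the whole pickled file on every update.
def update_slow(key, value):
    data = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            data = pickle.load(f)
    data[key] = value
    with open(path, "wb") as f:
        pickle.dump(data, f)

# Faster version: keep the structure in memory between updates,
# so each write skips the load step entirely.
_cache = {}

def update_fast(key, value):
    _cache[key] = value
    with open(path, "wb") as f:
        pickle.dump(_cache, f)
```

The saved read is exactly one `pickle.load()` per write, which is where the factor-of-two came from.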

Now, the cPickle.dump() function has an optional parameter called proto. I'll let the code explain itself:

>>> print cPickle.Pickler.__doc__
Pickler(file, proto=0) -- Create a pickler.

This takes a file-like object for writing a pickle data stream.
The optional proto argument tells the pickler to use the given
protocol; supported protocols are 0, 1, 2.  The default
protocol is 0, to be backwards compatible.  (Protocol 0 is the
only protocol that can be written to a file opened in text
mode and read back successfully.  When using a protocol higher
than 0, make sure the file is opened in binary mode, both when
pickling and unpickling.)

Protocol 1 is more efficient than protocol 0; protocol 2 is
more efficient than protocol 1.

Specifying a negative protocol version selects the highest
protocol version supported.  The higher the protocol used, the
more recent the version of Python needed to read the pickle

The file parameter must have a write() method that accepts a single
string argument.  It can thus be an open file object, a StringIO
object, or any other custom object that meets this interface.

So I tried writing the pickle file in binary mode with proto=-1, which brought the average time down to 0.000777201219 seconds, more than twice as fast as the improved algorithm.
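The same trick in a standalone sketch (modern Python's `pickle` plays the role of the old `cPickle`; the filename is made up for the example):

```python
import pickle

data = {"templates": list(range(100))}

# Protocol 0 is the old ASCII format; -1 selects the highest
# protocol the running Python supports, which is binary and compact.
p0 = pickle.dumps(data, 0)
pbest = pickle.dumps(data, -1)
assert len(pbest) < len(p0)

# For protocols above 0 the file must be opened in binary mode:
# with open("config.pickle", "wb") as f:
#     pickle.dump(data, f, -1)
```

Both the smaller output and the simpler encoding loop contribute to the speedup.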

Lastly, I had completely forgotten about the marshal module. It basically does what pickle and cPickle do, but is much more primitive. This is what Fredrik Lundh writes in the book Python Standard Library about the pickle module:

"It's a bit slower than marshal, but it can handle class instances, shared elements, and recursive data structures, among other things."

But for my particular problem, all I had to serialize was a simple but long list and a dictionary, so I could use the marshal module without any problems. Rewriting the code to use marshal instead of cPickle gave it another boost, and the fourth and last result was 0.000445931848 seconds: not quite twice as fast as the previous solution. But the difference between the beginning and the end, from 0.00350162718 to 0.000445931848, is roughly a factor of 8! Pretty neat, huh?
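A small sketch of the trade-off (the data here is made up; the real code serialized a long list and a dictionary of template state):

```python
import marshal

items = ["a", "b", "c"]
meta = {"a": 1, "b": 2}

# marshal happily round-trips simple built-in types:
# lists, dicts, strings, numbers, tuples...
blob = marshal.dumps((items, meta))
restored_items, restored_meta = marshal.loads(blob)
assert restored_items == items and restored_meta == meta

# ...but unlike pickle it cannot handle class instances.
class Template:
    pass

try:
    marshal.dumps(Template())
except ValueError:
    pass  # marshal raises ValueError for unmarshallable objects
```

As long as the data stays within those built-in types, marshal is a drop-in and faster alternative for this kind of job.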


OLD_write2Pickleconfig()        0.00350162718031
_write2Pickleconfig()           0.00175877291747
_write2Pickleconfig(proto=-1)   0.000777201219039
_write2marshalconfig()          0.000445931848854


Instead of writing your own timing stuff, why not use the timeit module in the standard library?
Isn't timeit just for strings of code?
In my setup I was able to "listen in" on a running program.
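(For what it's worth, timeit isn't limited to strings: it also accepts a plain callable, which makes it easy to use on existing functions. A quick sketch, with a made-up function to time:)

```python
import timeit

def build_list():
    # Arbitrary workload to measure.
    return [i * i for i in range(1000)]

# Pass the callable directly instead of a string of code.
elapsed = timeit.timeit(build_list, number=100)
print("100 calls took %.6f seconds" % elapsed)
```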
Joe Wreschnig
The problem with the marshal module is that the underlying format is subject to change without notice. This can make it useful for passing objects around as strings between Python instances on the same machine, but not very useful for any kind of long-term storage. Mostly it's designed to generate pyc files.
Fortunately my program doesn't need to do long term storage. I also don't need to open the serialized file from any other perspective other than debugging.

