
Do you train Kung Fu?
Or know someone who does?
Then check out KungFuPeople.com
Mobile version of this pageGmail account giveaway
Next:
Optimize Plone.org with slimmer.py
Related blogs
XHTML, HTML and CSS compressorOptimized stylesheets
The problem with CSS
Quick PostgreSQL optimization story
Date formatting in Python or in PostgreSQL (part II)
Fastest way to uniqify a list in Python
More optimization of Peterbe.com - CSS sprites
DoneCal homepage now able to do 10,000 requests/second
Optimization of getting random rows out of a PostgreSQL in Django
Optimization story involving something silly I call "dict+"
To JSON, Pickle or Marshal in Python
Corp Calendar 0.0.5
CorpCalendar review on ZopeMag.com
Related by category
Python optimization anecdote
11th of February 2005
I've learned something today. The cPickle module in Python can be boosted with very little effort. I've also learnt that there's something even faster than a hotted 'cPickle': marshal.
The code in question is the CheckoutableTemplates which saves information about the state of templates in Zope to a file on the file system. The first thing I did was to insert a little timer which looked something like this:
def write2config(...):
t0=time()
result = _write2configPickle(...)
t1=time()-t0
debug("_write2configPickle() took %s seconds"%t1)
return result
I ran it many times over to be able to generate some sort of average time for writing a config item to file. The first result was: 0.0035016271.
The second thing I did was that I rewrote the algorithm at which it does the writing. I managed to prevent one avoidable read from the pickled file. This was timed in the same fashion again and the second result was: 0.00175877291 which is twice as fast already!
Now the cPickle.dump() function has an optional parameter called proto. I let the code explain itself:
>>> print cPickle.Pickler.__doc__ Pickler(file, proto=0) -- Create a pickler. This takes a file-like object for writing a pickle data stream. The optional proto argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2. The default protocol is 0, to be backwards compatible. (Protocol 0 is the only protocol that can be written to a file opened in text mode and read back successfully. When using a protocol higher than 0, make sure the file is opened in binary mode, both when pickling and unpickling.) Protocol 1 is more efficient than protocol 0; protocol 2 is more efficient than protocol 1. Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced. The file parameter must have a write() method that accepts a single string argument. It can thus be an open file object, a StringIO object, or any other custom object that meets this interface.
So I tried writing the pickle file in a binary mode with proto=-1 which boosted the average time down to: 0.000777201219 which is more than twice as fast as the improved algorithm.
Lastly. I had completely forgotten about the marshal module. It basically does was pickle and cPickle does but is much more primitive. This is what Fredik Lundh writes in the book Python Standard Library about the pickle module:
"It's a bit slower than marshal, but it can handle class instances, shared elements, and recursive data structures, among other things."
But for my particular problem, all I had to serialize was a simple but long list and a dictionary; so I can use the marshal module without any problems. Rewriting the code to use marshal instead of cPickle get it another boost so the fourth and last result was: 0.000445931848. That's less than twice as fast as the previous solution. But, the difference between the beginning and the end is from 0.00350162718 to 0.000445931848 the difference is roughly a factor of 8! Pretty neat, huh?
Results:
0.00175877291747 _write2Picklconfig() 0.000445931848854 _write2marshalconfig() 0.00350162718031 OLD_write2Pickleconfig() 0.000777201219039 _write2Pickleconfig(proto=-1)
Comment
The problem with the marshal module is that the underlying format is subject to change without notice. This can make it useful for passing objects around as strings between Python instances on the same machine, but not very useful for any kind of long-term storage. Mostly it's designed to generate pyc files.
Interesting.
Fortunately my program doesn't need to do long term storage. I also don't need to open the serialized file from any other perspective other than debugging.


Instead of writing your own timing stuff, why not use the timeit module in the standard library?
Isn't timeit just for strings of code?
In my setup I was able to "listen in" on a running program.