I've learned something today: the cPickle module in Python can be sped up with very little effort. I've also learnt that there's something even faster than a hotted-up cPickle: the marshal module.
The code in question is CheckoutableTemplates, which saves information about the state of templates in Zope to a file on the file system. The first thing I did was insert a little timer, which looked something like this:
t0 = time.time()  # needs "import time" at the top of the module
result = _write2configPickle(...)
t1 = time.time() - t0
debug("_write2configPickle() took %s seconds" % t1)
I ran it many times over to get some sort of average time for writing a config item to file. The first result was 0.00350162718 seconds.
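The averaging step described above can be sketched roughly like this. It is only an illustration: average_seconds and write_config_item are made-up names, and the function body is a stand-in for the real _write2configPickle(...) call, which isn't shown in this post.

```python
import time

def average_seconds(func, repeats=1000):
    """Call func() `repeats` times and return the mean wall-clock time per call."""
    total = 0.0
    for _ in range(repeats):
        t0 = time.time()
        func()
        total += time.time() - t0
    return total / repeats

def write_config_item():
    # Hypothetical stand-in for the real _write2configPickle(...) call.
    sum(range(1000))

avg = average_seconds(write_config_item, repeats=200)
print("write_config_item() took %s seconds on average" % avg)
```

For very short operations like this one, the standard timeit module is probably a more reliable way to measure than a hand-rolled loop.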
The second thing I did was rewrite the algorithm that does the writing. I managed to get rid of one avoidable read from the pickled file. This was timed in the same fashion, and the second result was 0.00175877291 seconds, which is twice as fast already!
The third thing I tried was the pickle protocol. The cPickle.dump() function has an optional parameter called proto. I'll let the code explain itself:
>>> print cPickle.Pickler.__doc__
Pickler(file, proto=0) -- Create a pickler.
This takes a file-like object for writing a pickle data stream.
The optional proto argument tells the pickler to use the given
protocol; supported protocols are 0, 1, 2. The default
protocol is 0, to be backwards compatible. (Protocol 0 is the
only protocol that can be written to a file opened in text
mode and read back successfully. When using a protocol higher
than 0, make sure the file is opened in binary mode, both when
pickling and unpickling.)
Protocol 1 is more efficient than protocol 0; protocol 2 is
more efficient than protocol 1.
Specifying a negative protocol version selects the highest
protocol version supported. The higher the protocol used, the
more recent the version of Python needed to read the pickle produced.
The file parameter must have a write() method that accepts a single
string argument. It can thus be an open file object, a StringIO
object, or any other custom object that meets this interface.
So I tried writing the pickle file in binary mode with proto=-1, which brought the average time down to 0.000777201219 seconds, more than twice as fast as the improved algorithm.
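A minimal sketch of that change, assuming nothing about the real CheckoutableTemplates code: write_config, read_config and the sample data are invented for illustration. The try/except import is only there so the same snippet also runs on newer Pythons, where cPickle no longer exists as a separate module.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2, as used in this post
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

def write_config(path, data):
    # Binary mode ("wb") is required for any protocol above 0.
    with open(path, "wb") as f:
        # -1 selects the highest protocol this interpreter supports.
        pickle.dump(data, f, -1)

def read_config(path):
    # The file must be opened in binary mode when unpickling too.
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical sample data; the real config items are not shown in the post.
config = {"templates": ["index_html", "standard_header"], "revision": 3}
path = os.path.join(tempfile.mkdtemp(), "config.pickle")
write_config(path, config)
restored = read_config(path)
```

The catch, as the docstring says, is that a file written with a higher protocol can only be read by a Python version that knows that protocol.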
Lastly, I had completely forgotten about the marshal module. It basically does what cPickle does but is much more primitive. This is what Fredrik Lundh writes in the book Python Standard Library about how pickle compares with it:
"It's a bit slower than marshal, but it can handle class instances, shared elements, and recursive data structures, among other things."
But for my particular problem, all I had to serialize was a simple but long list and a dictionary, so I can use the marshal module without any problems. Rewriting the code to use marshal instead of cPickle gave it another boost, so the fourth and last result was 0.000445931848 seconds. That's less than twice as fast as the previous solution (about 1.7 times), but going from 0.00350162718 at the beginning to 0.000445931848 at the end is a speedup of roughly a factor of 8! Pretty neat, huh?
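Here is a rough sketch of what the marshal version could look like for a long list plus a dictionary. The function names, file name and sample data are all invented for illustration and are not the actual CheckoutableTemplates code. The last few lines show the limitation from the quote, using a datetime.date as a stand-in for "a class instance".

```python
import datetime
import marshal
import os
import tempfile

def write_config_marshal(path, items, settings):
    # marshal needs a binary file, just like the higher pickle protocols.
    with open(path, "wb") as f:
        marshal.dump([items, settings], f)

def read_config_marshal(path):
    with open(path, "rb") as f:
        items, settings = marshal.load(f)
    return items, settings

# Hypothetical sample data: a simple but long list and a small dictionary.
items = ["template_%d" % i for i in range(1000)]
settings = {"revision": 3, "checked_out": True}
path = os.path.join(tempfile.mkdtemp(), "config.marshal")
write_config_marshal(path, items, settings)
restored_items, restored_settings = read_config_marshal(path)

# The limitation from the quote: marshal refuses class instances.
try:
    marshal.dumps(datetime.date(2004, 1, 1))
    instance_supported = True
except ValueError:
    instance_supported = False
```

One thing to keep in mind is that the marshal format is not guaranteed to stay the same between Python versions, so this suits files that are written and read by the same installation.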