Optimization story involving something silly I call "dict+"

13 June 2011   Python, MongoDB

https://gist.github.com/1021777


Here's an interesting little story about using MongoKit to quickly draw items from a MongoDB database.

So I had a piece of code that was doing a big batch update. It was slow. It took about 0.5 seconds per user and I sometimes had a lot of users to run it for.

The code looked something like this:

    for play in db.PlayedQuestion.find({'user.$id': user._id}):
        if play.winner == user:
            bla()
        elif play.draw:
            ble()
        else:
            blu()

Because the PlayedQuestion model contains DBRefs, MongoKit automatically looks them up on every iteration of the main loop. Each lookup is very fast on its own (thank you, indexes), but because there are so many of them the total is very slow. Here's how to make it much faster:

   for play in db.PlayedQuestion.collection.find({'user.$id': user._id}):

The problem with this is that you get a plain dict instance for each document, which is more awkward to work with. I.e. instead of `play.winner` you have to use `play['winner'].id`. Here's my solution that makes this a lot easier:

    class dict_plus(dict):
        """A dict whose keys can also be read as attributes."""

        def __init__(self, *args, **kwargs):
            # drop the 'collection' keyword if present: excess we don't need
            kwargs.pop('collection', None)
            dict.__init__(self, *args, **kwargs)
            self._wrap_internal_dicts()

        def _wrap_internal_dicts(self):
            # wrap nested dicts so attribute access works at every level
            for key, value in list(self.items()):
                if isinstance(value, dict):
                    self[key] = dict_plus(value)

        def __getattr__(self, key):
            # only called when normal attribute lookup fails
            if key.startswith('__'):
                raise AttributeError(key)
            return self[key]

  ...

    for play in db.PlayedQuestion.collection.find({'user.$id': user._id}):
        play = dict_plus(play)
        if play.winner.id == user._id:
            bla()
        elif play.draw:
            ble()
        else:
            blu()
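
The wrapper can be tried without a MongoDB instance. What follows is a minimal sketch, assuming a hand-built document in place of a real PlayedQuestion record. The `Ref` class is a hypothetical stand-in for a DBRef (it only mimics the `id` attribute), and the values are made up for illustration.

```python
class dict_plus(dict):
    """A dict whose keys can also be read as attributes."""

    def __init__(self, *args, **kwargs):
        kwargs.pop('collection', None)  # excess we don't need
        dict.__init__(self, *args, **kwargs)
        self._wrap_internal_dicts()

    def _wrap_internal_dicts(self):
        # Only direct dict values are wrapped (recursively).
        for key, value in list(self.items()):
            if isinstance(value, dict):
                self[key] = dict_plus(value)

    def __getattr__(self, key):
        if key.startswith('__'):
            raise AttributeError(key)
        return self[key]


class Ref(object):
    """Hypothetical stand-in for a DBRef: it only carries an id."""

    def __init__(self, id):
        self.id = id


# A made-up raw document, shaped roughly like one yielded by collection.find().
raw = {'winner': Ref(42), 'draw': False, 'stats': {'seconds': 7}}
play = dict_plus(raw)

print(play.winner.id)      # attribute access on a top-level key
print(play.stats.seconds)  # nested dicts are wrapped too
print(play['draw'])        # normal dict access still works
```

Two things to keep in mind with this approach: a missing key raises `KeyError` rather than `AttributeError`, and dicts that sit inside lists are left unwrapped because only direct values are inspected.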

Now, with the raw collection and `dict_plus`, the whole thing takes 0.01 seconds per user instead of 0.5. 50 times faster!!
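
A rough way to see where the time went is to count database round trips. The sketch below is a toy simulation, not MongoKit code: `FakeDB`, `fetch_plays` and `deref` are hypothetical names, and the counter merely stands in for real queries. It assumes each play holds a single reference that a model layer would resolve on every iteration.

```python
class FakeDB(object):
    """Hypothetical in-memory stand-in that counts simulated queries."""

    def __init__(self, n_plays):
        self.queries = 0
        self.users = {1: {'_id': 1, 'name': 'alice'}}
        # Every play stores only the id of the winner, much like a DBRef.
        self.plays = [{'winner_id': 1, 'draw': False} for _ in range(n_plays)]

    def fetch_plays(self):
        self.queries += 1  # one query for the whole result set
        return list(self.plays)

    def deref(self, user_id):
        self.queries += 1  # one extra query per dereference
        return self.users[user_id]


def with_dereferencing(db, user):
    """Mimics a model layer that resolves the reference on every iteration."""
    wins = 0
    for play in db.fetch_plays():
        if db.deref(play['winner_id']) == user:
            wins += 1
    return wins


def with_raw_ids(db, user):
    """Compares ids directly, so no per-iteration lookup is needed."""
    wins = 0
    for play in db.fetch_plays():
        if play['winner_id'] == user['_id']:
            wins += 1
    return wins


user = {'_id': 1, 'name': 'alice'}

slow_db = FakeDB(n_plays=100)
slow_wins = with_dereferencing(slow_db, user)

fast_db = FakeDB(n_plays=100)
fast_wins = with_raw_ids(fast_db, user)

print(slow_wins, slow_db.queries)
print(fast_wins, fast_db.queries)
```

Both versions count the same number of wins, but with 100 plays the first issues 101 simulated queries and the second issues just 1. The real speedup depends on the data and the network, so treat the numbers here purely as an illustration of the shape of the problem.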
