Optimization story involving something silly I call "dict+"

13 June 2011   0 comments   Python, MongoDB

https://gist.github.com/1021777


Here's an interesting little story about using MongoKit to quickly pull documents out of a MongoDB database.

So I had a piece of code that was doing a big batch update. It was slow. It took about 0.5 seconds per user and I sometimes had a lot of users to run it for.

The code looked something like this:

for play in db.PlayedQuestion.find({'user.$id': user._id}):
    if play.winner == user:
        bla()
    elif play.draw:
        ble()
    else:
        blu()

Because the PlayedQuestion model contains DBRefs, MongoKit automatically looks them up on every iteration of the main loop. Each lookup is very fast on its own (thank you, indexes), but there are so many of them that the total is very slow. Here's how to make it much faster:

for play in db.PlayedQuestion.collection.find({'user.$id': user._id}):
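To make the difference concrete, here is a toy model of the two access patterns. It does not touch MongoDB or MongoKit at all: `FakeDB`, `iterate_wrapped` and `iterate_raw` are made-up names, and the assumption that every reference field costs one extra lookup per document is only a rough picture of what an auto-dereferencing model layer does.

```python
# Toy model only: no MongoDB or MongoKit involved, all names are hypothetical.
class FakeDB:
    """Stand-in for a database that counts how many lookups it serves."""

    def __init__(self, users):
        self.users = users
        self.lookups = 0

    def dereference(self, user_id):
        # Each call stands for one extra query by _id
        self.lookups += 1
        return self.users[user_id]


def iterate_wrapped(db, raw_plays):
    # Assumed behaviour of an auto-dereferencing layer:
    # resolve every reference field of every document.
    for play in raw_plays:
        yield {
            'user': db.dereference(play['user']),
            'winner': db.dereference(play['winner']),
            'draw': play['draw'],
        }


def iterate_raw(raw_plays):
    # The raw cursor hands back the stored documents untouched.
    for play in raw_plays:
        yield play


users = {1: {'name': 'a'}, 2: {'name': 'b'}}
raw_plays = [{'user': 1, 'winner': 2, 'draw': False} for _ in range(100)]

db = FakeDB(users)
list(iterate_wrapped(db, raw_plays))
wrapped_lookups = db.lookups  # 100 plays x 2 references = 200 lookups

db = FakeDB(users)
list(iterate_raw(raw_plays))
raw_lookups = db.lookups  # 0 lookups

print(wrapped_lookups, raw_lookups)
```

In this sketch the number of extra lookups grows with the number of documents, while the raw loop never asks for anything beyond the original query.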

The problem with this is that you get a plain dict instance for each document, which is more awkward to work with. I.e. instead of `play.winner` you have to use `play['winner'].id`. Here's my solution that makes this a lot easier:

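class dict_plus(dict):
    """A dict whose keys can also be read as attributes."""

    def __init__(self, *args, **kwargs):
        # excess we don't need
        kwargs.pop('collection', None)
        dict.__init__(self, *args, **kwargs)
        self._wrap_internal_dicts()

    def _wrap_internal_dicts(self):
        # copy the items first so reassigning values is safe while looping
        for key, value in list(self.items()):
            if isinstance(value, dict):
                self[key] = dict_plus(value)

    def __getattr__(self, key):
        if key.startswith('__'):
            raise AttributeError(key)
        try:
            return self[key]
        except KeyError:
            # so that hasattr() and getattr() with a default behave as expected
            raise AttributeError(key)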

  ...

for play in db.PlayedQuestion.collection.find({'user.$id': user._id}):
    play = dict_plus(play)
    if play.winner.id == user._id:
        bla()
    elif play.draw:
        ble()
    else:
        blu()

Now the whole thing takes about 0.01 seconds per user instead of 0.5. That's 50 times faster!
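If you want to poke at the wrapper without a database, here is a small self-contained sketch. It carries a compact copy of the class so it runs on its own, with one assumption of mine baked in: a missing key raises AttributeError rather than KeyError. `FakeRef` is a hypothetical stand-in for a DBRef (only its `id` attribute matters here), and the document is made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a DBRef: only the `id` attribute matters here.
FakeRef = namedtuple('FakeRef', ['collection', 'id'])


class dict_plus(dict):
    """Compact copy of the wrapper so this snippet runs on its own."""

    def __init__(self, *args, **kwargs):
        kwargs.pop('collection', None)
        dict.__init__(self, *args, **kwargs)
        # wrap nested dicts so attribute access works one level down too
        for key, value in list(self.items()):
            if isinstance(value, dict):
                self[key] = dict_plus(value)

    def __getattr__(self, key):
        if key.startswith('__'):
            raise AttributeError(key)
        try:
            return self[key]
        except KeyError:
            # assumption: report missing keys as missing attributes
            raise AttributeError(key)


# Made-up document shaped like a raw PlayedQuestion result.
raw = {
    'winner': FakeRef('users', 42),
    'draw': False,
    'stats': {'time': 3.5},        # nested dict: gets wrapped
    'answers': [{'text': 'yes'}],  # dicts inside a list: left as plain dicts
}
play = dict_plus(raw)

print(play.winner.id)                  # 42
print(play.stats.time)                 # 3.5
print(type(play.answers[0]).__name__)  # dict
print(hasattr(play, 'missing'))        # False
```

Two limitations show up here. Only dicts that are direct values get wrapped, so dicts sitting inside lists stay plain. And `__getattr__` is only consulted when normal attribute lookup fails, so a key named like a dict method (`items`, `keys` and so on) still has to be read with `play['items']`.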
