In Django, how much faster is it to aggregate?

27 October 2010   5 comments   Django

Powered by Fusion×

Being able to do aggregate functions with Django's QuerySet API is really useful. Not because it's difficult to write your own loop but because the summation is then done inside the SQL database. I had this piece of code:

t = Decimal('0')
for each in some_queryset:
   t += each.cost

Which can be rewritten like this instead:

t = qs.aggregate(Sum('cost'))['cost__sum']

For my 6,000+ records in the database the first one takes about 0.7 seconds. The aggregate takes 0.02 seconds. Blimey! That's over 30 fold difference in speed for practically the same thing.

Granted, when doing the loop you can do some other stuff such as counting or additional function calls but that difference is quite significant. In my current application those 0.7 seconds isn't really a problem but it quickly becomes when it has to be done over and over for multiple sets.


Dave Arter
What database backend are you using? Have you tried it with a variety of different backends?
Isn't that mainly the cost of creating model instances ? Can you try:

s = Decimal('0')
for value in some_queryset.values_list('cost', flat=True):
s += value

or even:

s = sum(qs.values_list('cost', flat=True), Decimal('0'))

Should be much faster then the first solution.

PS. Is there a way to indent code in comments ?
Peter Bengtsson
Ok. Benchmarked that too quickly. Using `values_list` is about 10 times slower than doing the aggregate in the database.

Faster than doing the full ORM but still not as fast as letting the database do it.
I'm kind of disappointed with that post. I like your blog and your entries about various things referring Django, benchmarks etc. and I enjoy reading it. But this?
lqc wrote some nice examples of how it should be done to have more reliable results. Nevertheless, in my opinion it's a quite odd comparison in general.

Peter Bengtsson
I'm sorry about that. This was not a scientific blog. Anybody knows that pushing the work into the database is going to be quicker. I just wanted to share some rough numbers of how much quicker.

The conclusion to take away is that if you want some performance you need to be ready to step out of the abstraction that something like an ORM offers.

Your email will never ever be published

Related posts

How I made my MongoDB based web app 10 times faster 21 October 2010
Javascript tip: nifty use of the console.log function in Firebug 07 November 2010
Related by keywords:
To readline() or readlines() 12 March 2004
bool is instance of int in Python 05 December 2008
Reciprocal lesson about gender perspectives 02 September 2011
Nginx vs. Squid 17 March 2009
IssueTrackerProduct now officially abandoned 30 March 2012
How and why to use django-mongokit (aka. Django to MongoDB) 08 March 2010
On the command line no one can hear you screen. Or can they? 03 May 2012
Nasty surprise of Django cache 09 December 2008
Random ID generator for Zope 02 September 2005
Google Calendar, iCalendar Validator but not bloody Apple iCal 09 April 2009
tempfile in Python standard library 07 February 2006
Google and Python code 22 February 2006