Optimization of getting random rows out of a PostgreSQL in Django

Wednesday, Feb 23, 2011


Comment

Peter Bengtsson February 25, 2011

Do it on a proper database like PostgreSQL: copy and paste the SQL that Django generates, then run it like this:

mydatabase# EXPLAIN ANALYZE <the big sql statement>;

Parent comment

hutch February 25, 2011

on c, taking a slice rather than get(pk=X): i just tried it (this is on sqlite though, which just occurred to me). could someone with a real db test this?

t0 = time()
blah = Model.objects.all()[3].pk
print '%f seconds' % (time() - t0)

t0 = time()
blah = Model.objects.get(pk=2).pk
print '%f seconds' % (time() - t0)

this yielded these three trials (slice first, then get(pk=X)):

0.000898 seconds
0.001472 seconds

0.001804 seconds
0.001447 seconds

0.001948 seconds
0.001504 seconds

when i write out the function, similar to the using_max function, i get these results (first is using count and slices, second is the using_max function):

0.007093 seconds
0.010980 seconds

0.006881 seconds
0.011418 seconds

0.006947 seconds
0.011399 seconds

count = model.objects.all().count()
i = 0
while i < TIMES:
    try:
        yield model.objects.all()[random.randint(0, count-1)].pk
        i += 1
    except IndexError:
        # indexing a queryset past its end raises IndexError, not DoesNotExist
        pass

like i said though, this is on sqlite, with probably not a representative dataset.

Replies

hutch February 25, 2011

no need, now that i've filled up the database with faked data, we get this:

0.862994 seconds
0.104061 seconds

0.894336 seconds
0.114008 seconds

0.809722 seconds
0.102276 seconds

Peter Bengtsson February 25, 2011

Are you sure you're doing that right? I just tried it myself and got this:

COUNT 84482
using_max() took 0.613966941833 seconds
using_max2() took 2.08254098892 seconds
using_count_and_slice() took 14.112842083 seconds

Code here: http://www.peterbe.com/plog/getting-random-rows-postgresql-django/get_random_ones.py

hutch February 25, 2011

no, i think the numbers jibe, and you're right.

it seems to be a 7x factor using the count/slice method i was talking about.

Peter Bengtsson February 25, 2011

Don't think so. Out of the 14 seconds, only about 0.4 seconds is spent getting the counts.
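
For anyone who wants to reproduce the comparison without a Django project, here is a minimal sketch using Python's built-in sqlite3 module. It rests on two assumptions: that indexing a queryset such as Model.objects.all()[n] is sent to the database as a LIMIT 1 OFFSET n query, and that using_max() picks random ids up to the maximum id and retries when it hits a gap. The table name item, the helper names and the row counts are made up for illustration, and SQLite timings will not match the PostgreSQL numbers above.

```python
import random
import sqlite3
import time


def make_db(rows=20000, gap_every=10):
    """Build an in-memory table whose ids have gaps, like a table with deleted rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO item (id, name) VALUES (?, ?)",
        ((i, "row %d" % i) for i in range(1, rows + 1) if i % gap_every),
    )
    conn.commit()
    return conn


def random_ids_by_max(conn, times):
    """Pick random ids up to MAX(id); each attempt is a primary-key lookup."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM item").fetchone()
    found = []
    while len(found) < times:
        candidate = random.randint(1, max_id)
        row = conn.execute(
            "SELECT id FROM item WHERE id = ?", (candidate,)
        ).fetchone()
        if row is not None:  # a gap returns no row, so just try again
            found.append(row[0])
    return found


def random_ids_by_count_and_slice(conn, times):
    """Pick random offsets below COUNT(*); the database has to skip `offset` rows."""
    (count,) = conn.execute("SELECT COUNT(*) FROM item").fetchone()
    found = []
    for _ in range(times):
        offset = random.randint(0, count - 1)
        row = conn.execute(
            "SELECT id FROM item LIMIT 1 OFFSET ?", (offset,)
        ).fetchone()
        found.append(row[0])
    return found


def timed(func, conn, times):
    """Run func and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = func(conn, times)
    return result, time.perf_counter() - t0


if __name__ == "__main__":
    db = make_db()
    for func in (random_ids_by_max, random_ids_by_count_and_slice):
        _, seconds = timed(func, db, 200)
        print("%s() took %f seconds" % (func.__name__, seconds))
```

The primary-key version can waste a few lookups on gaps, but each lookup goes through the index, whereas the offset version asks the database to walk past up to COUNT rows on every call. That is one plausible reading of why the count only accounts for a small part of the total time.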