Optimization of getting random rows out of a PostgreSQL in Django

Wednesday, Feb 23, 2011

Comment

hutch February 25, 2011

on c, taking a slice rather than get(pk=X)

i just tried it (this is on sqlite though, which just occurred to me). could someone with a real db test this?

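from time import time

# Model is whichever Django model you are testing with
t0 = time()
blah = Model.objects.all()[3].pk  # slice: LIMIT 1 OFFSET 3
print('%f seconds' % (time() - t0))

t0 = time()
blah = Model.objects.get(pk=2).pk  # primary key lookup
print('%f seconds' % (time() - t0))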

yielded these three trials

0.000898 seconds
0.001472 seconds

0.001804 seconds
0.001447 seconds

0.001948 seconds
0.001504 seconds

when i write out the function, similar to the using_max function, i get these results (first is using count and slices, second is the using_max function):

0.007093 seconds
0.010980 seconds

0.006881 seconds
0.011418 seconds

0.006947 seconds
0.011399 seconds

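import random

# generator body; model and TIMES come from the surrounding function
count = model.objects.all().count()
i = 0
while i < TIMES:
    try:
        yield model.objects.all()[random.randint(0, count - 1)].pk
        i += 1
    except (IndexError, model.DoesNotExist):
        # indexing a queryset out of range raises IndexError, not DoesNotExist
        pass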

like i said though, this is on sqlite, with probably not a representative dataset.

Parent comment

Peter Bengtsson February 24, 2011

a) trust me, it is. considerably
c) have you tried it? It's ultra-slow

Replies

Peter Bengtsson February 25, 2011

Do it on a proper database like postgres: copy and paste the SQL that Django generates, then run it like this:

mydatabase# EXPLAIN ANALYZE <the big sql statement>;

hutch February 25, 2011

no need, now that i've filled up the database with faked data, we get this:

0.862994 seconds
0.104061 seconds

0.894336 seconds
0.114008 seconds

0.809722 seconds
0.102276 seconds

Peter Bengtsson February 25, 2011

Are you sure you're doing that right? I just tried it myself and got this:

COUNT 84482
using_max() took 0.613966941833 seconds
using_max2() took 2.08254098892 seconds
using_count_and_slice() took 14.112842083 seconds

Code here: http://www.peterbe.com/plog/getting-random-rows-postgresql-django/get_random_ones.py

hutch February 25, 2011

no, i think the numbers jibe, and you're right.

it seems to be a 7x factor using the count/slice method i was talking about.

Peter Bengtsson February 25, 2011

Don't think so, out of the 14 seconds about 0.4 seconds is spent getting the counts.
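
For anyone who wants to reproduce the gap without a Django project, here is a minimal sketch using only Python's built-in sqlite3 module. It assumes a hypothetical `item` table with 50,000 rows (the models from this thread are not shown) and compares the kind of SQL a queryset slice produces (LIMIT 1 OFFSET k) against a lookup by primary key. The function names `by_slice` and `by_pk` are made up for illustration and are not the functions from get_random_ones.py. SQLite is not PostgreSQL, so treat the timings as a rough indication only.

```python
import random
import sqlite3
import time

# Hypothetical stand-in table; the models from the thread are not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [("row %d" % n,) for n in range(50000)],
)

count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
max_id = conn.execute("SELECT MAX(id) FROM item").fetchone()[0]


def by_slice(times):
    """Pick rows by position, as a queryset slice does (LIMIT 1 OFFSET k)."""
    picked = []
    for _ in range(times):
        k = random.randint(0, count - 1)
        row = conn.execute(
            "SELECT id FROM item LIMIT 1 OFFSET ?", (k,)
        ).fetchone()
        picked.append(row[0])
    return picked


def by_pk(times):
    """Pick rows by guessing a primary key, retrying when the guess misses."""
    picked = []
    while len(picked) < times:
        guess = random.randint(1, max_id)
        row = conn.execute(
            "SELECT id FROM item WHERE id = ?", (guess,)
        ).fetchone()
        if row is not None:  # the guessed id may not exist (gaps from deletes)
            picked.append(row[0])
    return picked


# Time 200 picks with each approach; absolute numbers depend on the machine.
for fn in (by_slice, by_pk):
    t0 = time.time()
    fn(200)
    print("%s() took %f seconds" % (fn.__name__, time.time() - t0))
```

The OFFSET query has to step past k rows before it can return one, so its cost grows with the table, while the primary key lookup goes straight to the row through the index. That would explain why the slice looked fine on a tiny sqlite table and fell behind once the table was filled up.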