when i write out the function, similar to the using_max function, i get these results (first is using count and slices, second is the using_max function
0.007093 seconds 0.010980 seconds
0.006881 seconds 0.011418 seconds
0.006947 seconds 0.011399 seconds
count = model.objects.all().count() i = 0 while i < TIMES: try: yield model.objects.all()[random.randint(0, count-1)].pk i += 1 except model.DoesNotExist: pass
like i said though, his is on sqlite, with probably not a representative dataset.
Comment
on c, taking a slice rather than get(pk=X)
i just tried it (this is on sqlite though, which just occurred to me) could someone with a real db test this?
t0 = time()
blah = Model.objects.all()[3].pk
print '%f seconds' % (time() - t0)
t0 = time()
blah = model.objects.model.objects.get(pk=2).pk
print '%f seconds' % (time() - t0)
yielded these three trials
0.000898 seconds
0.001472 seconds
0.001804 seconds
0.001447 seconds
0.001948 seconds
0.001504 seconds
when i write out the function, similar to the using_max function, i get these results (first is using count and slices, second is the using_max function
0.007093 seconds
0.010980 seconds
0.006881 seconds
0.011418 seconds
0.006947 seconds
0.011399 seconds
count = model.objects.all().count()
i = 0
while i < TIMES:
try:
yield model.objects.all()[random.randint(0, count-1)].pk
i += 1
except model.DoesNotExist:
pass
like i said though, his is on sqlite, with probably not a representative dataset.
Parent comment
a) trust me, it is. considerably c) have you tried it? It's ultra-slow
Replies
Do it on a proper database like postgres and copy and paste the SQL that Django generates then run it like this:
mydatabase# EXPLAIN ANALYZE <the big sql statement>;
no need, now that i've filled up the database with faked data, we get this:
0.862994 seconds
0.104061 seconds
0.894336 seconds
0.114008 seconds
0.809722 seconds
0.102276 seconds
Are you sure you're doing that right? I just tried it myself and got this:
COUNT 84482
using_max() took 0.613966941833 seconds
using_max2() took 2.08254098892 seconds
using_count_and_slice() took 14.112842083 seconds
Code here: http://www.peterbe.com/plog/getting-random-rows-postgresql-django/get_random_ones.py
no, i think the numbers jive, and you're right.
it seems to be a 7x factor using the count/slice method i was talking about.
Don't think so, out of the 14 seconds about 0.4 seconds is spent getting the counts.