09 March 2010 - Python, Django
Following on from yesterday's blog post about How and why to use django-mongokit, I extended the exampleproject that ships inside the django-mongokit project with another app called exampleapp_sql,
which does the same thing as exampleapp
but does it with SQL instead. Then I added a very simple benchmarker app to the same project and wrote three functions: one that creates a number of talks, one that edits them all, and one that deletes them all.
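The exact functions aren't reproduced here, but roughly, the SQL versions boil down to something like this (the import path and field names are made up for illustration, so treat it as a sketch rather than the real benchmarker code):

import time

def create_talks(num):
    # Hypothetical import path; the real exampleapp_sql model may look different.
    from exampleapp_sql.models import Talk
    t0 = time.time()
    for i in range(num):
        Talk.objects.create(topic='Topic %d' % i)
    print('Creating %d talks took %s seconds' % (num, time.time() - t0))

def edit_talks():
    from exampleapp_sql.models import Talk
    t0 = time.time()
    talks = Talk.objects.all()
    for talk in talks:
        talk.topic += ' (edited)'
        talk.save()
    print('Editing %d talks took %s seconds' % (len(talks), time.time() - t0))

def delete_talks():
    from exampleapp_sql.models import Talk
    t0 = time.time()
    count = Talk.objects.count()
    for talk in Talk.objects.all():
        talk.delete()
    print('Deleting %d talks took %s seconds' % (count, time.time() - t0))

The MongoKit functions do the same three loops against the Talk document class from exampleapp instead of the Django ORM model.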
The results can speak for themselves:
# 10
mongokit django_mongokit.mongodb
Creating 10 talks took 0.0108649730682 seconds
Editing 10 talks took 0.0238521099091 seconds
Deleting 10 talks took 0.0241661071777 seconds
IN TOTAL 0.058883190155 seconds
sql django.db.backends.postgresql_psycopg2
Creating 10 talks took 0.0994439125061 seconds
Editing 10 talks took 0.088721036911 seconds
Deleting 10 talks took 0.0888710021973 seconds
IN TOTAL 0.277035951614 seconds
# 100
mongokit django_mongokit.mongodb
Creating 100 talks took 0.114995002747 seconds
Editing 100 talks took 0.181537866592 seconds
Deleting 100 talks took 0.13414812088 seconds
IN TOTAL 0.430680990219 seconds
sql django.db.backends.postgresql_psycopg2
Creating 100 talks took 0.856637954712 seconds
Editing 100 talks took 1.16229200363 seconds
Deleting 100 talks took 0.879518032074 seconds
IN TOTAL 2.89844799042 seconds
# 500
mongokit django_mongokit.mongodb
Creating 500 talks took 0.505300998688 seconds
Editing 500 talks took 0.809900999069 seconds
Deleting 500 talks took 0.65673494339 seconds
IN TOTAL 1.97193694115 seconds
sql django.db.backends.postgresql_psycopg2
Creating 500 talks took 4.4399368763 seconds
Editing 500 talks took 5.72280597687 seconds
Deleting 500 talks took 4.34039878845 seconds
IN TOTAL 14.5031416416 seconds
# 1000
mongokit django_mongokit.mongodb
Creating 1000 talks took 0.957674026489 seconds
Editing 1000 talks took 1.60552191734 seconds
Deleting 1000 talks took 1.28869891167 seconds
IN TOTAL 3.8518948555 seconds
sql django.db.backends.postgresql_psycopg2
Creating 1000 talks took 8.57405209541 seconds
Editing 1000 talks took 14.8357069492 seconds
Deleting 1000 talks took 11.9729249477 seconds
IN TOTAL 35.3826839924 seconds
Across these runs, MongoDB comes out on average about 7 times faster.
All in all it doesn't really mean that much. We expect MongoDB to be faster than PostgreSQL because what it lacks in features it makes up for in speed. It's interesting to see it in action, and nice to see that MongoKit is fast enough to benefit from the database's speed.
As always with benchmarks: lies, lies and more damn lies! This doesn't really compare apples to apples, but hopefully with django-mongokit the comparison is becoming fairer. Also, you're free to fork the project on GitHub, make your own optimizations and re-run the tests yourself.
These are single table/document queries; then there are joined tables versus simulated joins. I wonder at what point the I/O benefits outweigh the query times, and vice versa.
It seems that "good old SQL" is often slower in these "create 1000 objects" benchmarks. MySQL has some benchmarks that make PostgreSQL look like a slow database. SQLite is also very fast when compared to MySQL. BerkeleyDB may be even faster ;) unfortunately, some OSS projects I know stopped using it at some point. But then again, creating as many objects as possible as quickly as possible may really be your app's model, so...
How about other benchmarks, like "update every record that matches 3-4 foreign keys AND an IN() range query"?
What about data reliability?
The default PostgreSQL installation is also far from perfect. If speed is your goal, you can tune PostgreSQL not to use the disk so often: disable fsync, set a large commit delay and so on. These settings make the database unreliable in case of power loss, but for me - I run Django tests on a local machine - they make testing way faster.
Just my $0.05. For some time in my life, I neglected SQL databases. This didn't turn out to be as good as I thought.
Truth is, when speed matters it's probably because your project matters. And if your project matters, your data matters, and then you'll need to take reliability and durability to a whole new level, and you'd have to start over with the benchmarks.
Try running the tests when you wrap the inserts into *one* transaction.
But do note that the view in the benchmark does not use transactions. So I *am* comparing 1000 inserts vs. 1000 inserts.
No, you don't.
PostgreSQL will add an implicit transaction if no explicit one has been opened.
In your particular case you are using Django without transactions in your code. Django's documentation states that by default it will wrap every call in its own transaction.
So basically you are dealing with 1000 explicit transactions and 1000 inserts on PostgreSQL's side.
Sorry to say, but your benchmark results are misleading.
Also note that this is a synthetic micro-benchmark. In the real world there would be many queries within one transaction, as well as bulk queries to get rid of the client<>DB round trip.
Also, like I said before, both "sides" of this benchmark can be optimized further.
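To make the transaction point concrete, wrapping all the inserts in one explicit transaction would look roughly like this. This is only a sketch: transaction.atomic is a newer Django API than the one the post was written against, and the import path is the same made-up one as in the earlier sketch.

import time
from django.db import transaction

def create_talks_in_one_transaction(num):
    # Hypothetical import path, as in the earlier sketch.
    from exampleapp_sql.models import Talk
    t0 = time.time()
    # One explicit transaction around all the INSERTs instead of one per save().
    with transaction.atomic():
        for i in range(num):
            Talk.objects.create(topic='Topic %d' % i)
    print('Creating %d talks in one transaction took %s seconds'
          % (num, time.time() - t0))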
Basically, you tell Postgres to wait and see whether any other transactions complete within a given time window, so it can log them all at once. The cost is some potential latency, as some transactions may not return as quickly. Look at commit_delay and commit_siblings. You may also need to tweak wal_buffers.
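Concretely, the settings this comment and the earlier one about fsync refer to live in postgresql.conf. The values below are only illustrative, and fsync = off can lose or corrupt data on power loss, so keep it to throwaway test databases:

# postgresql.conf - illustrative values, not recommendations
fsync = off            # do not force WAL writes to disk; unsafe on power loss
commit_delay = 1000    # microseconds to wait so concurrent commits can share one flush
commit_siblings = 5    # only apply commit_delay if at least 5 transactions are active
wal_buffers = 16MB     # more WAL kept in shared memory before being written out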
I hear that Tokyo Cabinet is fast to write to but dog slow to *index* as the total number of records goes up.
I wonder how fast MongoDB would be at inserting 24k entries when you already have 1m, 10m or 100m entries in there.
1. Use PREPARE .. EXECUTE in PostgreSQL to skip part of the SQL parser and all of the optimizer work:
PREPARE query AS INSERT INTO talks (id, name, value) VALUES ($1, $2, $3);
EXECUTE query('11', 'new', 'item');
2. Use COPY. This will show true PG performance by inserting CSV values into a database table:
COPY talks FROM '/var/tmp/file.csv' CSV;
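From Python, the same COPY route is available through psycopg2's copy_from; a small sketch (the connection string, table and column names are made up):

import io
import psycopg2

# Bulk-load rows with COPY instead of issuing row-by-row INSERTs.
conn = psycopg2.connect('dbname=example')
rows = io.StringIO(''.join('%d\tname %d\titem %d\n' % (i, i, i) for i in range(1000)))
with conn.cursor() as cursor:
    cursor.copy_from(rows, 'talks', columns=('id', 'name', 'value'))
conn.commit()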
150 MB not in use - read the data from memory and output it as a list?