Fastest database for Tornado

09 October 2013   9 comments   Python, Tornado


When you use a web framework like Tornado, which is single threaded with an event loop (much like Node.js, if you're familiar with that), and you need persistence (i.e. a database), there is one important question you need to ask yourself:

Is the query fast enough that I don't need to do it asynchronously?

If it's going to be a really fast query (for example, selecting a small recordset by an indexed key) it'll be quicker to just do it in a blocking fashion; it saves the CPU the work of jumping between events.

However, if the query is going to be potentially slow (like a complex and data-intensive report) it's better to execute the query asynchronously, do something else, and continue once the database comes back with a result. If you don't, all other requests to your web server might time out.
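
To make that concrete, here's a minimal sketch (not from the benchmark code) of the two approaches in a Tornado handler, assuming pymongo for the blocking case and Motor for the non-blocking case:

import motor
import pymongo
import tornado.web
from tornado import gen

blocking_db = pymongo.MongoClient()['talks']
async_db = motor.MotorClient()['talks']

class FastHandler(tornado.web.RequestHandler):
    # Cheap, indexed lookup: just block; the event loop barely notices.
    def get(self, talk_id):
        doc = blocking_db.talks.find_one({'_id': talk_id})
        self.write({'found': doc is not None})

class SlowHandler(tornado.web.RequestHandler):
    # Potentially slow query: yield so other requests keep being served.
    @gen.coroutine
    def get(self):
        count = yield async_db.talks.find({'duration': {'$gt': 0.5}}).count()
        self.write({'count': count})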

Another important question whenever you work with a database is:

Would it be a disaster if something you intended to store never actually made it to disk?

This question is related to the D in ACID and doesn't have anything specific to do with Tornado. However, the reason you're using Tornado is probably because it's much more performant than more convenient alternatives like Django. So, if performance is so important, are durable writes important too?
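
In MongoDB terms (and this matters for the result below), the difference between durable and non-durable writes is roughly this; a sketch using pymongo 2.x-era write concern flags:

import pymongo

db = pymongo.MongoClient()['talks']

# Fire-and-forget: fastest, the client never waits for an acknowledgement.
db.talks.insert({'topic': 'unsafe write', 'duration': 3.5}, w=0)

# Acknowledged: the server confirms the write was applied in memory,
# but it may not be on disk yet.
db.talks.insert({'topic': 'acknowledged write', 'duration': 3.5}, w=1)

# Journaled: wait until the write has hit the on-disk journal -- this
# is the "D" in ACID, and the slowest option.
db.talks.insert({'topic': 'journaled write', 'duration': 3.5}, w=1, j=True)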

Let's cut to the chase... I wanted to see how different databases perform when integrated with Tornado. But let's not just look at different databases; let's also evaluate different ways of using them: either blocking or non-blocking.

What the benchmark does is create X records, select them back, edit them and then delete them.

I can vary the number of records ("X") and sum the total wall clock time it takes each database engine to complete all of these tasks. That way you get an insert, a select, an update and a delete per record. Realistically, it's likely you'll get a lot more selects than any of the other operations.
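
The repo has the real per-engine handlers; roughly, the shape of the measurement is something like this (the `engine` object here is hypothetical, standing in for one of the database wrappers):

import time
import random

def run_benchmark(engine, how_many):
    t0 = time.time()
    ids = []
    for _ in range(how_many):
        ids.append(engine.create(topic='talk %s' % random.random(), duration=1.0))
    for _id in ids:
        engine.get(_id)
    for _id in ids:
        engine.edit(_id, duration=2.0)
    for _id in ids:
        engine.delete(_id)
    return time.time() - t0  # total wall clock time for this engine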

And the winner is:

pymongo!! Using the blocking version without doing safe writes.

[Chart: benchmark results per database engine]

Let me explain some of those engines

You can run the benchmark yourself

The code is here on github. The following steps should work:

$ virtualenv fastestdb
$ source fastestdb/bin/activate
$ git clone   # clone the fastestdb repo linked above
$ cd fastestdb
$ pip install -r requirements.txt
$ python   # start the Tornado app (the entry-point script is in the repo)

Then fire up http://localhost:8000/benchmark?how_many=10 and see if you can get it running.

Note: you might need to mess around with some of the hardcoded connection details in the file.


Before the lynch mob of HackerNews kills me for saying something positive about MongoDB: I'm perfectly aware of the discussions about large datasets and the complexities of managing them. Any flametroll comments about "web scale" will be deleted.

I think MongoDB does a really good job here. It's faster than Redis and Memcache, but unlike those key-value stores, with MongoDB you can, if you need to, do actual queries (e.g. select all talks where the duration is greater than 0.5). MongoDB does its serialization between Python and the database using a binary format called BSON, but mind you, the Redis and Memcache drivers also get to use a binary JSON encoder/decoder.
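
For example, a query like that is a one-liner with pymongo (a sketch, assuming a `talks` collection shaped like the benchmark's records):

import pymongo

db = pymongo.MongoClient()['talks']
# ad-hoc query: all talks longer than 0.5
for talk in db.talks.find({'duration': {'$gt': 0.5}}):
    print('%s: %.1f' % (talk['topic'], talk['duration']))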

The conclusion is: be aware of what you want to do with your data, and of where performance matters versus where durability matters.

What's next

Some of those drivers will work on PyPy, which I'm looking forward to testing. It should work with cffi-based drivers, for example psycopg2cffi for PostgreSQL.
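
For the PostgreSQL case, psycopg2cffi ships a compat hook so existing psycopg2 code should run unchanged on PyPy; something like:

from psycopg2cffi import compat
compat.register()  # registers psycopg2cffi under the name "psycopg2"

import psycopg2  # now backed by the cffi implementation on PyPy
conn = psycopg2.connect('dbname=talks')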

Also, an asynchronous version of elasticsearch should be interesting.


Update: Today I installed RethinkDB 2.0 and included it in the test.

[Chart: benchmark results with RethinkDB 2.0 included]

It was added in this commit and improved in this one.

I've been talking to the core team at RethinkDB to try to fix this.
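
For the curious, the RethinkDB round-trip boils down to calls like these (a sketch with the official rethinkdb Python driver; the durability setting is discussed in the comments below):

import rethinkdb as r

conn = r.connect(host='localhost', port=28015)
# one-time setup; durability='soft' acknowledges writes once applied in
# memory, while the default 'hard' waits for the write to reach disk.
r.db_create('talks').run(conn)
r.db('talks').table_create('talks', durability='soft').run(conn)

result = r.db('talks').table('talks').insert(
    {'topic': 'RethinkDB 2.0', 'duration': 3.5}).run(conn)
talk_id = result['generated_keys'][0]
r.db('talks').table('talks').get(talk_id).update({'duration': 4.0}).run(conn)
r.db('talks').table('talks').get(talk_id).delete().run(conn)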


Too bad it does not have as much support as Django
Peter Bengtsson
Nice post!
But I think it would be nice if you did it with Vagrant; this way the whole benchmark is kept (including the machine).
Bernardo Heynemann
Hi Peter,

My name is Bernardo Heynemann and I'm the creator and (currently) only committer on MotorEngine.

I did some benchmarks (although anecdotal) on Motor, PyMongo, MongoEngine and MotorEngine.

What I found out is that when performing database operations, pyMongo and MongoEngine are WAY faster than Motor and MotorEngine (which I kind of expected, due to the way tornado asyncs stuff enqueueing it in the ioloop).

That said, when using Tornado under some stress, they all yielded the same number of requests per second (roughly), which I concluded means not blocking actually pays off.

I haven't pursued this more because MotorEngine is still very much work in progress. Once it's more stable I'll try to get a much more comprehensive benchmark suite. I still believe that even if the requests are SLOWER (which they will be) it does pay off releasing the IoLoop to make sure you get to accept the next request.

What do you think?

Bernardo Heynemann
Peter Bengtsson
Testing the asynchronous nature is a whole different beast.
I think a decent test would be to write a REST api so that each client can do something like this:

ids = []
for i in range(HOW_MANY_TIMES):
    r ='http://localhost:8000/benchmark/create', topic=random_topic(), ...)
    assert r.status_code == 200
    ids.append(...)  # collect the id of the created record
for id in ids:
    r ='http://localhost:8000/benchmark/edit', id=id, topic=random_topic(), ...)
    assert r.status_code == 200
for id in ids:
    r ='http://localhost:8000/benchmark/delete', id=id, topic=random_topic(), ...)
    assert r.status_code == 200

Then, you run that concurrently, once for each database engine and count the total time it took to complete everything.
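
Roughly, the concurrent part could look like this (with the loop above wrapped in a hypothetical run_client() function; just a sketch):

import time
from concurrent.futures import ThreadPoolExecutor

def run_clients_concurrently(clients=10):
    # run `clients` copies of the REST client loop above at the same time
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(run_client) for _ in range(clients)]
        for future in futures:
            future.result()  # re-raise any assertion errors
    return time.time() - t0  # total time for this database engine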
Eric Radman
For me this project actually served as a nice first-time tutorial of Tornado.
Me You
have you tried using ?
it seems to be more mature than toredis
Daniel Mewes
Hi Peter,
as mentioned elsewhere, one thing to point out about the new RethinkDB results is that they are using the default setting of "hard" durability from what I can tell.

Hard durability means that every individual write will wait for the data to be written to disk before the next one is run in this benchmark.

As far as I can tell, none of the other databases in this benchmark are configured to wait for disk writes. Depending on your disk and whether it's an SSD or rotational drive, this can easily result in a performance difference of 10x-1000x.

MongoDB's `safe` mode for example would be equivalent to RethinkDB's "soft" durability: Both of them acknowledge a write as soon as it has been applied to the data set in memory. However they don't wait until the data has been persisted to disk (this happens lazily in the background).

For more comparable results, I recommend running with the line `rethinkdb.db('talks').table_create('talks', durability='soft').run(conn)` which is currently commented out in your code.

- Daniel @ RethinkDB
"However, the reason you're using Tornado is probably because it's much more performant that more convenient alternatives like Django". This is (or at least should be) false in the vast majority of the cases, if not all. Please if you want to be fast with Django it has its ways to go (a good way to start is reading the Book: High Performance Django). You use or should use Tornado for real time purposes in the 99,9% of the times, that is long polling, http2, websockets... something cannot be done (at least as good as Tornado or Twisted, although there're ways like gevent, channels...). Of course when you need asynchronous requests/processing or real-time streams you can and probably you should rely in ACID databases, no matter if it's MySQL, Postgresql or Rethinkdb. In the case of Postgresql, there a couple of libraries to wrap psycopg2 properly for tornado.

And before saying "if you have a huge number of requests per second you are not tied to the number of workers or the database I/O capacity" please consider that you are in the 0,01% or less of the cases. For the rest, keep in mind this: "don't use Tornado instead of Django (or WSGI, synchronous stack) because of performance, and even less droping ACID database for the very same reason".
