So I set out to benchmark good old threaded fcgi and gunicorn, and then, with a source-compiled nginx with the uwsgi module baked in, I also benchmarked uwsgi. The first mistake I made was testing a Django view that was using sessions and other crap. I profiled the view to make sure it wouldn't be the bottleneck, as it appeared to take only 0.02 seconds per request. However, with fcgi, gunicorn and uwsgi I kept being stuck at about 50 requests per second. Why? 1/0.02 = 50.0!!! Clearly the slowness of the Django view was the bottleneck (for the curious, what took all of 0.02 seconds was the need to create new session keys and put them into the database).
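To spell out that arithmetic: if every request takes a fixed amount of time and requests are effectively handled one at a time, the ceiling on throughput is the inverse of the per-request time. Here is a minimal Python sketch of that back-of-the-envelope check. The function name `max_requests_per_second` and its `concurrency` parameter are made up for illustration, and the model assumes perfectly even request times with no other overhead.

```python
def max_requests_per_second(seconds_per_request, concurrency=1):
    """Upper bound on throughput for a fixed per-request cost.

    Assumes every request takes exactly `seconds_per_request` and that
    at most `concurrency` requests are really served in parallel.
    """
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    return concurrency / seconds_per_request


if __name__ == "__main__":
    # The session-heavy view took about 0.02 seconds per request.
    print(max_requests_per_second(0.02))
```

If the slow part is a shared resource such as the database, adding workers does not raise this ceiling much, which would match seeing the same number on all three servers.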
So I wrote a really dumb Django view with no sessions middleware enabled. Now we're getting some interesting numbers:
fcgi (threaded) 640 r/s
fcgi (prefork 4 processes) 240 r/s (*)
gunicorn (2 workers) 1100 r/s
gunicorn (5 workers) 1300 r/s
gunicorn (10 workers) 1200 r/s (?!?)
uwsgi (2 workers) 1800 r/s
uwsgi (5 workers) 2100 r/s
uwsgi (10 workers) 2300 r/s
(* this made my computer exceptionally sluggish as CPU went through the roof)
If you're wondering why the numbers appear to be rounded it's because I ran the benchmark multiple times and guesstimated an average (also obviously excluded the first run).
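I eyeballed the averages, but the same idea can be made mechanical. The sketch below drops the first (warm-up) run, averages the rest and rounds to the nearest 100. The function name `steady_state_average` is hypothetical, and the sample readings in the demo are invented purely to show the call; they are not measurements from this benchmark.

```python
def steady_state_average(runs, round_to=100):
    """Average requests/second over all runs except the first one.

    The first run is treated as warm-up and discarded. The result is
    rounded to the nearest multiple of `round_to`.
    """
    if len(runs) < 2:
        raise ValueError("need at least one run after the warm-up run")
    measured = runs[1:]  # skip the warm-up run
    mean = sum(measured) / len(measured)
    return int(round(mean / round_to)) * round_to


if __name__ == "__main__":
    # Invented readings, for illustration only.
    sample_runs = [900, 1290, 1310, 1300]
    print(steady_state_average(sample_runs))
```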
- For gunicorn it didn't change the numbers if I used a TCP (e.g. 127.0.0.1:9000) or a UNIX socket (e.g. /tmp/wsgi.sock)
- On the upstream directive in nginx it didn't impact the benchmark to set
- fcgi on my laptop was unable to fork new processes automatically in this test so it stayed as 1 single process! Why?!!
- when you get more than 2,000 requests/second the benchmark itself and the computer you run it on become wobbly. I managed to get 3,400 requests/second out of uwsgi but then the benchmark started failing requests.
- These tests were done on an old 32-bit dual core Thinkpad with 2GB RAM :(
- uwsgi was a bitch to configure. Most importantly, who the hell compiles source code these days when packages are so much much more convenient? (Fry-IT hosts around 100 web servers that need patching and love)
- Why would anybody want to use UNIX sockets when they can cause permission problems? TCP is so much more straightforward.
- changing the number of ulimits to 2048 did not improve my results on this computer
- gunicorn is not available as a Debian package :(
- Adding too many workers can actually damage your performance. See example of 10 workers on gunicorn.
- I did not bother with mod_wsgi since I don't want to go near Apache and, to be honest, last time I tried I got such mysterious errors from mod_wsgi that I ran away screaming.
gunicorn is the winner in my eyes. It's easy to configure and get up and running, it's certainly fast enough, and I don't have to worry about stray threads being created willy nilly like with threaded fcgi. uwsgi is definitely worth coming back to the day I need to squeeze out a few more requests per second, but right now it just feels too inconvenient as I can't convince my sys admins to maintain compiled versions of nginx for the little extra benefit.
Having said that, the day uwsgi becomes available as a Debian package I'm all over it like a dog on an ass-flavored cookie.
And the "killer benefit" with gunicorn is that I can predict the memory usage. I found, on my laptop: 1 worker = 23MB, 5 workers = 82MB, 10 workers = 155MB. These numbers stayed like that very predictably, which means I can decide quite accurately how much RAM I should let Django (ab)use.
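Those three readings sit close to a straight line, so a rough estimate for other worker counts is easy to sketch. The snippet below takes the slope between the smallest and largest measurement and extrapolates from there. The names `MEASURED_MB`, `mb_per_extra_worker` and `estimate_mb` are made up for illustration, and the linear model is an assumption based only on the three numbers above from my laptop.

```python
# Memory readings quoted above, in MB, keyed by number of gunicorn workers.
MEASURED_MB = {1: 23, 5: 82, 10: 155}


def mb_per_extra_worker(measured=MEASURED_MB):
    """Slope between the smallest and the largest measured worker count."""
    lo, hi = min(measured), max(measured)
    return (measured[hi] - measured[lo]) / (hi - lo)


def estimate_mb(workers, measured=MEASURED_MB):
    """Linear estimate of total memory for a given number of workers."""
    lo = min(measured)
    return measured[lo] + mb_per_extra_worker(measured) * (workers - lo)


if __name__ == "__main__":
    print("MB per extra worker: %.1f" % mb_per_extra_worker())
    for n in sorted(MEASURED_MB):
        print(n, "workers: estimated %.0f MB, measured %d MB"
              % (estimate_mb(n), MEASURED_MB[n]))
```

The estimate for 5 workers lands within 1MB of the measured 82MB, which is what "predictable" means here. On a different machine or a heavier Django project the slope would need to be measured again.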
Since this was published we, in my company, have changed all Djangos to run over uWSGI. It's proven faster than any of the alternatives and extremely stable. We actually started using it before it was merged into core Nginx, but considering how important this is and how many sites we have, it's not been a problem to run our own Nginx package.
Voila! Now feel free to flame away about the inaccuracies and what multitude of more wheels and knobs I could/should twist to get even more juice out.