Like so many others, you probably have an Nginx server sitting in front of your application server (Django, Zope, Rails). Nginx serves static files straight off the filesystem, and everything else it proxy passes on to the backend, be it with proxy_pass, uwsgi_pass, fastcgi_pass or something very similar. Most likely you have an Nginx site configured something like this:


server {
   access_log /var/log/nginx/mysite.access.log;
   location ^~ /static/ {
       root /var/lib/webapp;
       access_log off;
   }
   location / {
       proxy_pass http://localhost:8000;
   }
}

What I do is add a log_format directive that appends the request time to every log line. This makes it possible to know how long every non-trivial request takes the backend to complete. Note that log_format is only allowed at the http level, so it goes outside the server block:


log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" $request_time';

server {
   access_log /var/log/nginx/timed.mysite.access.log timed_combined;

   location ^~ /css/ {
       root /var/lib/webapp/static;
       access_log off;
   }
   location / {
       proxy_pass http://localhost:8000;
   }
}
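
With that format, a logged request might look something like this (the values here are made up; the trailing 0.543 is $request_time, in seconds with millisecond resolution):


127.0.0.1 - - [06/Nov/2009:13:55:36 +0000] "GET /plog/blogitem-040601-1 HTTP/1.1" 200 4523 "-" "Mozilla/5.0" 0.543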

As you can see, the log looks the same as before except for the new last column with the timing. Now let your site spin for a couple of days/weeks/months and then download the access log:


$ rsync -avzP root@myserver.com:/var/log/nginx/timed.mysite.access.log .

Excellent. Now download this script and save it next to your log file. When you run it, you get a nice little menu that should be self-explanatory:


$ python analyze.py timed.mysite.access.log
What do you want to know?

     1) Slowest performer
     2) Most common
     3) Total cumulative time

The most interesting one is probably the last, because that's where your web application spends the bulk of its time, and it's probably the place to start if you want to make your site faster or want to know what matters most in terms of test coverage. Here's a sample output from my own site for "Total cumulative time":


GETS
537.582    /?replypath=/comment-20041222-x54i/comment-20041231-0gug
519.277    /?replypath=/comment-20041222-x54i
306.039    /
259.845    /rss.xml?oc=Django
251.064    /Bush-country/
233.165    /plog/blogitem-040601-1
224.459    /plog/blogitem-040601-1/?replypath=/c0610287425
186.032    /plog/interior-octopus/octopus.jpg?display=large
170.430    /plog/blogitem-040601-1/?replypath=/comment-20050714-12mf
POSTS
182.964    /plog/button-tag-in-IE
65.311     /plog/unicode-to-ascii
30.086     /plog/blogitem-040406-1/compressor
7.581      /plog/blogitem-040806-1/test-printsql
1.676      /gkc/callback
0.372      /plog/blogitem-040404-1/getCommentCookie
0.364      /plog/donecal-homepage-10k-req-per-sec/manage_editKeywords
0.363      /plog/blogitem-040627-1/previewComment
0.274      /plog/createelement-a
0.246      /plog/blogitem-20031027-2106/previewComment

I hope it helps. Perhaps other people can help me improve the script and later we can turn it into a package. One thing I would like to see, for example, is using the median instead of the sum to dampen crazy spikes (e.g. a URL that normally takes 10 milliseconds but just once takes 10 seconds).
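
To make that concrete, here's a minimal sketch of what the "Total cumulative time" report boils down to. It's a rough reconstruction assuming the timed_combined format above, not the actual analyze.py:


#!/usr/bin/env python
# A minimal sketch, not the actual analyze.py: report total cumulative
# $request_time per URL, assuming lines in the timed_combined format above.
import re
import sys
from collections import defaultdict

LINE = re.compile(
    r'"(GET|POST) (\S+) [^"]*" '  # "$request": method, URL, protocol
    r'\d+ \d+ '                   # $status $body_bytes_sent
    r'"[^"]*" "[^"]*" '           # "$http_referer" "$http_user_agent"
    r'([\d.]+)\s*$'               # $request_time
)

def cumulative_times(logfile):
    # {method: {url: total seconds spent on that URL}}
    totals = defaultdict(lambda: defaultdict(float))
    for line in open(logfile):
        match = LINE.search(line)
        if match:
            method, url, seconds = match.groups()
            totals[method][url] += float(seconds)
    return totals

if __name__ == '__main__':
    totals = cumulative_times(sys.argv[1])
    for method in ('GET', 'POST'):
        print(method + 'S')
        by_time = sorted(totals[method].items(),
                         key=lambda pair: pair[1], reverse=True)
        for url, seconds in by_time[:10]:
            print('%-10.3f %s' % (seconds, url))

To try the median idea, you would collect a list of times per URL instead of keeping a running sum, and report statistics.median() of each list (Python 3.4+).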

Comments

Aldrin Leal

You missed the "$" in request_time, obviously. I just discovered it after running it for a week, hehe.

Peter Bengtsson

Thanks for noticing. Fixed now.

Nopz

It seems that the link to the script does not work anymore.
