Peterbe.com

A blog and website by Peter Bengtsson


I like the new Google Page Speed Online for its simplicity. However, I threw it the URL of my Crosstips site http://crosstips.org and it only gave me an 80 out of 100 even though there were no high priority suggestions.

Google's new Page Speed Online hard to beat

Seems hard to beat. Surely, to win over the remaining 20 points I don't have to tick all the medium and low priority suggestions.

Like so many others, you probably have an Nginx server sitting in front of your application server (Django, Zope, Rails). Nginx serves static files right off the filesystem, and for everything else it passes the request on to the backend. You might be using proxy_pass, uwsgi_pass or fastcgi_pass, or something very similar. Most likely your Nginx site is configured something like this:

server {
   access_log /var/log/nginx/mysite.access.log;
   location ^~ /static/ {
       root /var/lib/webapp;
       access_log off;
   }
   location / {
       proxy_pass http://localhost:8000;
   }
}

What I do is that I add an access log directive that times every request. This makes it possible to know how long every non-trivial request takes for the backend to complete:

# log_format is only allowed in the http context, so it goes outside server {}
log_format timed_combined '$remote_addr - $remote_user [$time_local]  '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" $request_time';

server {
   access_log /var/log/nginx/timed.mysite.access.log timed_combined;

   location ^~ /css/ {
       root /var/lib/webapp/static;
       access_log off;
   }
   location / {
       proxy_pass http://localhost:8000;
   }
}

If you do that your access log file will look the same as before except now it will have a last column that contains the timing. Now, let your site spin for a couple of days/weeks/months and later download the access log:

$ rsync -avzP root@myserver.com:/var/log/nginx/timed.mysite.access.log .

Excellent, now download this script and save it next to your log file. When you run it you get a nice little menu that should be sufficiently self-explanatory:

$ python analyze.py timed.mysite.access.log
What do you want to know?

     1) Slowest performer
     2) Most common
     3) Total cumulative time

The most interesting one is probably the last one because that's where your web application spends most of its time. That's perhaps the place to start if you want to make your site faster or if you want to know what matters most in terms of test coverage. Here's a sample output from my site for "Total cumulative time":

GETS
537.582    /?replypath=/comment-20041222-x54i/comment-20041231-0gug
519.277    /?replypath=/comment-20041222-x54i
306.039    /
259.845    /rss.xml?oc=Django
251.064    /Bush-country/
233.165    /plog/blogitem-040601-1
224.459    /plog/blogitem-040601-1/?replypath=/c0610287425
186.032    /plog/interior-octopus/octopus.jpg?display=large
170.430    /plog/blogitem-040601-1/?replypath=/comment-20050714-12mf
POSTS
182.964    /plog/button-tag-in-IE
65.311     /plog/unicode-to-ascii
30.086     /plog/blogitem-040406-1/compressor
7.581      /plog/blogitem-040806-1/test-printsql
1.676      /gkc/callback
0.372      /plog/blogitem-040404-1/getCommentCookie
0.364      /plog/donecal-homepage-10k-req-per-sec/manage_editKeywords
0.363      /plog/blogitem-040627-1/previewComment
0.274      /plog/createelement-a
0.246      /plog/blogitem-20031027-2106/previewComment

I hope it helps. Perhaps other people can help me improve the script and later we can turn it into a package. One thing I would like to see, for example, is using the median to dampen crazy spikes (e.g. a URL that normally takes 10 milliseconds but just once takes 10 seconds).
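The median idea could be sketched roughly like this. It is not the script mentioned above, just a minimal stand-in that assumes log lines in the timed_combined format from the config; LINE_RE and summarize are names made up for illustration.

```python
import re
import statistics
from collections import defaultdict

# Assumed layout (timed_combined): ... "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent" $request_time
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '  # "$request"
    r'\d{3} \S+ '                                 # $status $body_bytes_sent
    r'"[^"]*" "[^"]*" '                           # referer and user agent
    r'(?P<seconds>\d+(?:\.\d+)?)\s*$'             # $request_time, last column
)


def summarize(lines):
    """Group request times by (method, path) and report count, total and median."""
    timings = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        key = (match.group("method"), match.group("path"))
        timings[key].append(float(match.group("seconds")))
    return {
        key: {
            "count": len(values),
            "total": sum(values),
            "median": statistics.median(values),
        }
        for key, values in timings.items()
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as log_file:
        stats = summarize(log_file)
    # Sort by median so a single 10 second outlier does not dominate the ranking
    for (method, path), s in sorted(stats.items(), key=lambda kv: -kv[1]["median"]):
        print(f"{s['median']:.3f}\t{s['total']:.3f}\t{method}\t{path}")
```

Sorting by the median instead of the total keeps a one-off spike from pushing an otherwise fast URL to the top of the list.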

I've finally managed to book my ticket to see Zappa. It's the Royal Academy of Music Manson Ensemble who play about 10 Frank Zappa classics. It's here in London on Baker Street.

The Royal Academy of Music website sucks. Its ticket booking part is completely broken. Fortunately I found a way to "hack" it so that I could get a ticket. And it only cost me £1 extra.

On that note, why isn't the box office open on weekends? And why is no one answering any of their phones on a Saturday?

To make a purchase you need an account. (If I wasn't lazy I would now link to multiple studies that have shown what a bad idea that is.) But you can't create an account because of a JavaScript bug that pops up and says something like "- Field not valid format". Obviously it doesn't tell you which field failed validation and no, I didn't enter anything in an invalid format. So, use a web browser where you can disable JavaScript and try to submit the form. (I use Firefox and the Web Developer extension.) Remember to re-enable JavaScript after you've created the account.

Now, when you submit the form it will just become a blank page with nothing on it. Don't worry. At this point they will have emailed you your password. Pick up that email and go here http://tickets.ram.ac.uk/peo/crm_login.asp to log in. Now, you can try to buy the ticket as normal and proceed to checkout.

How to book a ticket on the Royal Academy of Music's website

On the checkout page, even if you're logged in and type in everything correctly it will still respond with an error. One of those annoying errors that means you have to click the Back button with the risk of losing what you've typed in. The trick is that you have to select "I would like to donate". Something I genuinely don't mind but if they're going to endorse such crappy websites it hurts a little to be generous towards them. Anyway, select £1 as the donation and at this point you should be able to make the purchase.

Granted, these guys are awesome when it comes to music and I, a mortal web developer, can barely rip a CD. However, if selling tickets is something they intend to do more of, and if there's some sort of relationship between selling tickets, profit and happiness, I would urge them to re-evaluate their booking website.

PS. For the techy geekys, doing a W3C source validation on their site yields 184 errors and 8 warnings. Impressive!

wkhtmltopdf is by far the best tool available to make PDFs. Yes. I have tried ReportLab and PISA. ReportLab might be more powerful but what you gain in fine-grained control you lose in hours and hours of productivity.

Anyway, I've learned something about font-size shrinkage and using wkhtmltopdf. Basically, if you use a percentage to change a font size (Arial in this case) you get a PDF where the letters are unevenly spaced. It took me a while to figure out what the hell was going on until I changed the font-size from 90% to exactly 11px.

font-size: 90% ('font-size:90%'; the spots of red are my highlights of the ugly spacings)

font-size: 11px ('font-size:11px'; not perfect but much better)

So, at first I thought this was the first time wkhtmltopdf has disappointed me but I guess I'll just have to remember not to use percentages and continue to favor wkhtmltopdf as my choice of weapon in the PDF production world.
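If a stylesheet has many percentage sizes, the swap to pixels could be automated with something like the sketch below. It is only an illustration: the function name is made up, and the 12px base is an assumption (it happens to map 90% to the 11px used above). Real percentages are relative to the parent element's font size, so a single base only works for simple stylesheets.

```python
import re

# Matches declarations such as "font-size: 90%" or "font-size:87.5%"
FONT_SIZE_PERCENT = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)%")


def percent_font_sizes_to_px(css, base_px=12.0):
    """Rewrite percentage font sizes as whole pixel values.

    base_px is a hypothetical base size; adjust it to match the stylesheet.
    """
    def to_px(match):
        pixels = round(base_px * float(match.group(2)) / 100)
        return f"{match.group(1)}{pixels}px"

    return FONT_SIZE_PERCENT.sub(to_px, css)


if __name__ == "__main__":
    print(percent_font_sizes_to_px("td { font-family: Arial; font-size: 90%; }"))
```

The output would be written to the stylesheet that is handed to wkhtmltopdf, leaving the one used on the website untouched.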

TfL Traffic cameras on a Google map

Yesterday I found out that Transport for London has lifted all restrictions on commercial use of the data it makes available to developers.

For lack of better imagination I decided to attack the Live Traffic Cameras data and whipped up this little app: tflcameras.peterbe.com

It basically shows a map of London and then shows all the spots where traffic cameras are installed so that you can click on them. The data is updated every 3 hours I think but I haven't checked that claim yet. Use this if you're a London commuter and want to check the traffic before you hit the road.

Oh, and this app uses the browser's geolocation so that it knows where to zoom in first. But if you're not based in London it zooms in over Trafalgar Square by default.

I've learned a couple of things this week while deploying my first site to use a user-friendly OpenID.

My first revelation was when I realized that Google and Yahoo! have solved the usability stumbling block: you can use them as providers without having to know a personally unique URL. For example, for Yahoo! it's just http://yahoo.com which means that you don't need to offer a cryptic URL form and you can just show it as a logo image.

The second thing is that Google's hybrid OpenID + OAuth isn't as complicated as it sounds. It's basically a light extension to the OpenID "protocol" whereby you say, "while you're at it, also give me an OAuth token please so that I can connect back into Google's services later". What's important to understand though is that if you use this you need to know the "scope". The scope is a URL to a service. Google Docs is a service, for example, and you need to search the web to figure out what the scope URL is for that service.

The third revelation was when I understood the difference between Simple Registration Extension (SREG) and Attribute Exchange (AX). Basically, SREG was the first one and AX is the newer, more modern alternative. AX is better but some OpenID providers don't yet support it. Google, for example, only supports AX. The key to supporting not just Google's OpenID but any OpenID is that you can request both AX and SREG, and whichever one the provider supports will be returned.

The fourth thing that helped a lot was understanding that Google's OpenID has a bug in its implementation of Attribute Exchange. Actually, perhaps it's a deliberate design choice they've made but in my opinion a bad one. Unless you say you require email, firstname, lastname, country etc. it won't return them. If you use the if_available directive you won't get them. Another bug/bad design choice is that Google seems to not forward the country attribute. It can happily do first and last name but not country, even though the documentation claims it does.

The fifth thing is that python-openid is a lot easier to work with than you think. You don't need to do any crazy network checks or callbacks. For initiating the challenge all you're effectively doing is creating a long URL. If you don't like the API methods python-openid offers, just add your own parameters with:

redirect_url += '&openid.ax.mode=fetch_request' # etc.
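To make the "it's just a long URL" point concrete, here is a rough stdlib-only sketch that appends both an AX and an SREG request to a redirect URL. The helper names and the provider URL are made up for illustration, and it assumes the URL does not already carry these extension parameters. Everything is asked for as required rather than if_available, following the Google behaviour described above.

```python
from urllib.parse import urlencode

# Extension namespaces from the AX 1.0 and SREG 1.1 specifications
AX_NS = "http://openid.net/srv/ax/1.0"
SREG_NS = "http://openid.net/extensions/sreg/1.1"

# Attribute type URIs from axschema.org, keyed by the alias used in the request
AX_TYPES = {
    "email": "http://axschema.org/contact/email",
    "firstname": "http://axschema.org/namePerson/first",
    "lastname": "http://axschema.org/namePerson/last",
}


def extension_params(ax_required=("email", "firstname", "lastname"),
                     sreg_required=("email", "fullname")):
    """Build the query parameters that request attributes via both AX and SREG."""
    params = {
        "openid.ns.ax": AX_NS,
        "openid.ax.mode": "fetch_request",
        "openid.ax.required": ",".join(ax_required),
        "openid.ns.sreg": SREG_NS,
        "openid.sreg.required": ",".join(sreg_required),
    }
    for alias in ax_required:
        params["openid.ax.type." + alias] = AX_TYPES[alias]
    return params


def add_extensions(redirect_url, **kwargs):
    """Append the extension parameters to an existing OpenID redirect URL."""
    separator = "&" if "?" in redirect_url else "?"
    return redirect_url + separator + urlencode(extension_params(**kwargs))


if __name__ == "__main__":
    # provider.example is a placeholder, not a real OpenID endpoint
    print(add_extensions("https://provider.example/auth?openid.mode=checkid_setup"))
```

A provider that only understands one of the two extensions should simply ignore the other set of parameters, which is what makes asking for both worthwhile.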

After so many years since OpenID arrived, I'm only now excited about it. It's tonnes easier to implement than OAuth and now it's actually really pleasant to use as an end user.

I have a Zope web app that uses hand coded SQL (PostgreSQL) statements. Similar to the old PHP. My only excuse for not using an ORM was that this project started in 2005 and at the time SQLAlchemy seemed like a nerdy mess with more undocumented quirks than it was worth losing hair over.

Anyway, several statements use ILIKE to get around the problem of making things case insensitive. Something like the Django ORM uses UPPER to get around it. So I wonder how much ILIKE slows things down compared to UPPER and the indexed equality operator. Obviously, neither ILIKE nor UPPER can use the plain index on the column.

Long story short, here are the numbers for selecting on about 10,000 indexed records:

# ONE
EXPLAIN ANALYZE SELECT ... FROM ... WHERE name = 'Abbaye';
Average: 0.14 milliseconds

# TWO
EXPLAIN ANALYZE SELECT ... FROM ... WHERE  UPPER(name::text) = UPPER('Abbaye');
Average: 18.89 milliseconds

# THREE
EXPLAIN ANALYZE SELECT ... FROM ... WHERE  name ILIKE 'Abbaye';
Average: 24.63 milliseconds

UPPER vs. ILIKE

First of all, the conclusion is to use UPPER instead of ILIKE if you don't need wildcard pattern matching. Secondly, if at all possible try to use the equality operator first and foremost and only resort to the case insensitive one if you really need to.

Lastly, in PostgreSQL is there a quick and easy to use drop in alternative to make an equality operatorish thing for varchars that are case insensitive?

Importance of public URLs and how enterprisecarsales.com f's it up

A friend of mine found a nice car on www.enterprisecarsales.com, so she copied the current URL from the address bar and emailed it to me. The URL was: http://www.enterprisecarsales.com/carsales/vehicleDetails.do?carIndex=1&vin=1GCHC24U36E112452 which, by the look of it (notice the /vehicleDetails.do part of the URL), should take you to the car's details but instead takes you to a page that says "Sorry, we are unable to complete your page request since the page you are trying to access no longer is available." How stupid is that? Come on, web developers made that kind of mistake in 2001. Not 2010.

This means that people can't talk to each other about found matches on the site and this is something people want to do. Especially if you're going to spend $thousands on a car.

Come on Enterprise web team: install Django or something and give users what they want not your excuses.

Check out these all free icons

If you, like me, have various projects that do things like OAuth on Twitter or Google, or you have a development site that goes to PayPal, this might be useful. Say you're doing some Django development on http://localhost:8000/foo and click, for example, to do an OAuth on Twitter with an app you have there. Twitter will then redirect you back to the live site with which you've set it up. But you're doing local development, so you want to go back to http://localhost:8000/... instead.

Add this bookmarklet, "to localhost:8000", to your browser's Bookmarks toolbar and it does exactly that.

Here's its code in more verbose form:

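(function() {
   // Swap the current scheme and host for the local dev server, keeping path and query string
   var a = function() {
     location.href = window.location.href.replace(/^https?:\/\/[^\/]+\//,
            'http://localhost:8000/');
   };
   // In Firefox, defer the redirect to the next tick
   if (/Firefox/.test(navigator.userAgent)) {
     setTimeout(a, 0);
   } else {
     a();
   }
 })();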