Peterbe.com

A blog and website by Peter Bengtsson


If the number 1 rule for making faster websites is to "Minimize HTTP Requests", then, let's try it.

On this site, almost all pages are served entirely from memcache. Django renders the template with the database content and the generated HTML is cached. So I thought I'd insert a little post-processing script that converts all <img src="...something..."> into <img src="data:image/png;base64,iVBORw0KGgo...">, which basically means the HTML gets as fat as the sum of all referenced images combined.

It's either 10Kb HTML followed by (roughly) 10 x 30Kb images or it's 300Kb HTML and 0 images. The result is here: http://www.peterbe.com/about2 (open and view source)

You can read more about the Data URI scheme here if you're not familiar with how it works.

The code is a hack but that's after all what a personal web site is all about :)
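
To give a rough idea of what such a post-processing step can look like, here is a minimal sketch in Python. It is not the actual code running on this site. The function name inline_images and the read_bytes callback are made up for the example, and the regular expression only handles double-quoted src attributes.

```python
import base64
import mimetypes
import re

# Captures the src value of <img ... src="..."> tags (double quotes only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html, read_bytes):
    """Replace image URLs in `html` with base64 data URIs.

    `read_bytes` is a hypothetical callback that takes the src value
    and returns the image content as bytes (from disk, a CDN, etc.).
    """

    def replace(match):
        src = match.group(2)
        if src.startswith("data:"):
            return match.group(0)  # already inlined
        mime, _ = mimetypes.guess_type(src)
        if mime is None:
            return match.group(0)  # unknown file type, leave untouched
        try:
            payload = read_bytes(src)
        except OSError:
            return match.group(0)  # unreadable image, leave untouched
        encoded = base64.b64encode(payload).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{encoded}{match.group(3)}"

    return IMG_SRC.sub(replace, html)
```

Since the rendered HTML is cached, a step like this would only need to run when the cache is populated, not on every request.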

So, how much slower is it to serve? Well, actual server-side render time is obviously slower but it's a process you only have to do a small fraction of the total time since the HTML can be nicely cached.

Running..
ab -n 1000 -c 10 http://www.peterbe.com/about

BEFORE:

Document Path:          /about
Document Length:        12512 bytes

Concurrency Level:      10
Time taken for tests:   0.314 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      12779000 bytes
HTML transferred:       12512000 bytes
Requests per second:    3181.36 [#/sec] (mean)
Time per request:       3.143 [ms] (mean)
Time per request:       0.314 [ms] (mean, across all concurrent requests)
Transfer rate:          39701.75 [Kbytes/sec] received

AFTER:

Document Path:          /about2
Document Length:        306965 bytes

Concurrency Level:      10
Time taken for tests:   1.089 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      307117000 bytes
HTML transferred:       306965000 bytes
Requests per second:    918.60 [#/sec] (mean)
Time per request:       10.886 [ms] (mean)
Time per request:       1.089 [ms] (mean, across all concurrent requests)
Transfer rate:          275505.06 [Kbytes/sec] received

So, it's basically 292Mb transferred instead of 12Mb in the test and the requests per second is a third of what it used to be. But it's not too bad. And with web site optimization, what matters is the individual user's impression, not how much or how little the server can serve multiple users.
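
For anyone who wants to check that arithmetic, here is a small sketch that redoes it from the two ab outputs. The only assumption is that "Mb" is read as mebibytes (1024 x 1024 bytes); the helper names are made up for the example.

```python
MIB = 1024 ** 2  # ab reports bytes; "Mb" is read as mebibytes here

# Figures copied from the two ab runs
BEFORE = {"total_bytes": 12_779_000, "requests_per_second": 3181.36}
AFTER = {"total_bytes": 307_117_000, "requests_per_second": 918.60}


def mebibytes(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


def throughput_ratio(after, before):
    """How many requests/second remain, as a fraction of the old rate."""
    return after["requests_per_second"] / before["requests_per_second"]


print(f"before: {mebibytes(BEFORE['total_bytes']):.1f} MiB")
print(f"after:  {mebibytes(AFTER['total_bytes']):.1f} MiB")
print(f"requests/second ratio: {throughput_ratio(AFTER, BEFORE):.2f}")
```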

Next, how does the waterfall of this look?

BEFORE:

WebPagetest waterfall (before)

Pingdom Tools waterfall (before)

AFTER:

WebPagetest waterfall (after)

Pingdom Tools waterfall (after)

Note! All images when served individually (the "before" version) are all served from a fast CDN. The HTML is served from London, United Kingdom and the Webpagetest was run from Virginia, USA.

What can we conclude from this:

  • It worked! There are fewer requests. 18 requests become 6 requests.
  • The "Start Render" event happens significantly earlier.
  • The "Document Complete" event happens slightly earlier
  • The total file size goes from 286Kb to 283Kb!
  • Before: First load takes 2 seconds, repeated view takes 0.4 seconds
  • After: First load takes 2 seconds, repeated view takes 2 seconds :(
  • Pingdom Tools sums the kilobytes which gives a rounding error compared to WebPagetest

Some more thoughts and conclusions:

If you're wondering how the total file size is the same as before (sum of html + images) it's because all images are turned into base64 into one large document which gzip presumably does better on. If there were fewer images I'd suspect the second version would be slightly bigger in total.
Apparently the base64 version + gzip is supposed to be 2-5% bigger than the original JPG/PNG individually.
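
A quick way to get a feel for this is to measure it. The sketch below uses seeded random bytes as a stand-in for already-compressed image data (an assumption; real PNG/JPG files will behave a bit differently) and compares the raw size, the base64 size and the gzipped base64 size. The name size_report is made up for the example.

```python
import base64
import gzip
import random


def size_report(raw):
    """Return the byte sizes of raw data, its base64 form and gzipped base64."""
    encoded = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64_gzipped": len(gzip.compress(encoded)),
    }


# Seeded random bytes stand in for compressed image data (an assumption).
rng = random.Random(0)
fake_image = bytes(rng.getrandbits(8) for _ in range(30_000))

report = size_report(fake_image)
for name, size in report.items():
    print(f"{name}: {size} bytes")
```

Base64 alone makes the data a third bigger, and gzip wins most of that back because base64 text only uses 64 different characters.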

Don't do this at home kids if you don't have a good server-side cache and a good web server that serves the HTML gzipped.

Although the code I put in place to make this possible is, right now, pretty ugly it is after all pretty convenient to the developer because it's like a plugin you just add to the rendering. You don't even notice this going on in the template or in the view code. However...

More work is needed. And that is the IE <= 7 guys. Basically Internet Explorer 7 and worse don't support it at all so you need a shim for them that looks something like this:

<!--[if lte IE 7]>
<script>
// Swap every data URI back to the original image URL
$('img').each(function() {
  $(this).attr('src', $(this).data('orig-src'));
});
</script>
<![endif]-->


It would need some love and work but the principle is there and it's sound.

Or, just ignore them. After all, only 3% of my visitors are on IE8 and only 0.5% are on IE7. At least they can read the text. This brutal exclusion isn't always an option. But the shim is.

I think I'm going to keep it. The code needs to be packaged up and made neat before I stick to it. There are a lot more interesting things one can do with this. For example, you could, in a post processor, optimize the CSS used by inspecting the DOM to see which selectors can be dumped.

UPDATE

Some really valuable comments below have pointed out that using data URIs causes memory bloat in Gecko, which means that it might be particularly harmful for people with multiple tabs or using mobile devices.

Hmm... back to the drawing board a bit I guess.

"Gamification is the use of game-thinking and game mechanics in non-game contexts in order to engage users and solve problems" -- wikipedia

Gamification sneaks into a software developer's life whether he/she likes it or not. Some work for me, some don't.

What works for me

  1. PyPI downloads on my packages
    Although clouded with inaccuracies and possible false positives (someone's build script could be pip installing overzealously), seeing your download count go up means that people actually depend on your code. Most likely, they're not just downloading to admire it, they download to use it.

  2. Github followers and Starred projects
    Being followed on Github means people see your activity on their dashboard (aka. home page). Every commit and every gist you push gets potential eyes on it.
    When people star your project it probably means that they're thinking "oh neat! this could come in handy some day so I'll star it for now". That's kinda flattering to be honest.

  3. Twitter followers
    This doesn't apply to everyone of course but to me it does. I really try my best to write about work or code related stuff on Twitter and personal stuff on Facebook. Whenever a blog post of mine gets featured on HN or if I present at some conference I get a couple of new followers.
    Some people do a great job curating their followers, responding and keeping it very relevant. They deserve their followers.
    Yes, there are a lot of bogus Twitter accounts that follow you but since that happens to everyone it's easy to overlook. Since you probably skim through most of the "You have new follower(s)" emails, it's quite flattering when it's a real human being who does what you do or somewhat similar.

  4. Activity on Github projects
    This one is less about fame and fortune and more of a "damage prevention". Clicking into a project and seeing that the last commit was 3 years ago most definitely means the project is dead.
    I have some projects that I don't actively work on but the code might still be relevant and doesn't need much more maintenance. For those kinds of projects it's good to have some sporadic activity just to signal to people it's not completely abandoned.

  5. Hacker News posts and comments "Show HN: ..."
    I've now had quite a few posts to HN that get promoted to the front page. Whenever this happens you get those almost embarrassing spikes in your Google Analytics account.
    However, it happened. Enough people thought it was interesting to vote it up to the front page.
    It's important to not count the number of comments as a measure of "success" because oftentimes comments aren't simply constructive feedback but just comments on other comments.
    Keep this one simple, the fact that you have built something that is "Show HN:..." means you probably have worked hard.

What does NOT work for me

  1. Unit test code coverage metrics
    Test coverage percentages are quite a private matter. Kinda like your stool. Unless something amazing happened, keep it to yourself.
    It's nice to see a general increase of the total percentage but do not dare to obsess about it. What matters is that you look through the report and take note that what matters is covered. Coverage on code that is allowed to break and isn't embarrassing if it does, does not need to be green all the way. Who are you trying to impress? The intern you're mentoring or the family you don't have time to spend time with because you're hunting perfection?
    I must, however, admit that I too have in the past inserted pragma: no cover in my code. Also, being able to say that you have 100% test coverage on a lib can be good "advertisement" in your README as it instills confidence in your potential users.

  2. Number of tests
    When you realize that 1 nicely packaged integration test can test just as much as 22 anally verbose unit tests you realize that number of tests is a stupid measure.
    A lot of junior test driven developers write tests that cover circumstances that are just absurd. For example "what if I pass a floating point number instead of a URL string which it's supposed to be??".
    Remember, results and quality count. Having too many tests also means more things to slow you down when you refactor.

  3. Commit counts
    On projects with multiple contributors the commit count is not a measure of anything. It has no valuable implications or deductions. Adding a newline character to a README can be 1 count.
    If you skim through the commit log on a Github project you'll notice that surprisingly many commits are trivial stuff such as style semantics or updating a CREDITS file.
    Yes, someone has to do that stuff too and we're always appreciative of that but it's not a measure of excellence over others. It's just a count.

  4. Resolved bugs/issues count
    If this mattered and was a measure of anything you could simply swallow everything with a quick turnaround and resolve or close it.
    But not every bug deserves your attention. Even if it is a genuine bug it might still be really low priority, and working on it costs time and pulls focus away from much more important work.

  5. Number of releases
    It's nice to see projects making releases (or tags) but don't measure things by this. There's so much good quality software that doesn't really fit the release model.

From one of the monthly summary emails
Building a side project is fun. Launching it is fun. Improving and measuring it is fun. But marketing it is awful!

Marketing your side project means you're not coding, instead you're walking around the interwebs with your pants down trying your hardest to get people to not only try your little project but to also get beyond that by tweeting about it, Facebook status update about it, blog about it or use whatever devices inside it to help the viral spread. Now that! ...is freckin hard.

I'm struggling to even get my best friends and my wife to even try my side projects. I can't blame them, unlike a lemonade stand at a farmers market it's very impersonal. When I tried to get my buddies to try Around The World several did but only very briefly and granted some few did give me feedback but it's really not much to go by.

So, today I'm launching the start of my new web marketing strategy: Begging

Or rather, politely asking people to help me. Instead of using the usual "we" or "our" language I'm referring to it in first person instead. The platform for this strategy experiment is on HUGEpic and it looks like this: hugepic.io/yourhelp/

I recently built a feature into HUGEpic that once a month emails everyone who uploaded a picture a little summary of their upload and the number of hits and comments. Boldly, in the footer of this email, there's a link to the /yourhelp/ page (see screenshot above).

Let's see how this works out. Most likely it'll be just another noise in the highways of people's internet lives but perhaps it can become successful too.

Mind you, the motive for all of this is for my "insert-sideproject-name-here" to become successful. And by successful I mean popular and lots of traffic. None of my side projects make me any money which makes it easier to beg. However, none of them make any money for the people I'm asking for help. Perhaps that's what could be the version 2.0 of my web marketing strategy.

Example of embedding a picture
New feature just landed! Now you can embed pictures from HUGEpic so it can be on your own site. See example below.

So to do this I opted for the simplest solution possible. It's basically just an iframe to the regular URL but with ?embedded=1 set. What this does is that it removes all buttons except the zoom navigation buttons. There are some other configurable things like hide_download_counter=1|0 and hide_annotations=1|0. At the moment there's no UI to change these options but at least the functionality is there in case somebody wants it.
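
Until there's a UI for it, the embed code can be put together by hand. Here is a small sketch in Python of how that could be done. The query parameters are the ones mentioned above; the function names, the iframe dimensions and the example.com picture URL are all made up for illustration.

```python
import html
from urllib.parse import urlencode


def embed_url(picture_url, hide_download_counter=False, hide_annotations=False):
    """Build the embeddable URL for a picture page."""
    params = {"embedded": 1}
    if hide_download_counter:
        params["hide_download_counter"] = 1
    if hide_annotations:
        params["hide_annotations"] = 1
    return f"{picture_url}?{urlencode(params)}"


def embed_html(picture_url, width=600, height=400, **options):
    """Wrap the embeddable URL in an iframe tag (dimensions are arbitrary)."""
    src = html.escape(embed_url(picture_url, **options), quote=True)
    return f'<iframe src="{src}" width="{width}" height="{height}"></iframe>'


# Hypothetical picture URL, only for illustration
print(embed_html("https://example.com/picture/abc123", hide_annotations=True))
```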

Here's what it can look like

One particular little feature I think is neat is that whilst you're previewing your embedded code and you zoom in and pan around on your image, the position and zoom level are automatically inserted into the HTML code. The way this is done is by this pattern:

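// Poll the preview iframe every 2 seconds for its current position
setInterval(pluck_position, 2 * 1000);

function pluck_position() {
    // `iframe` is the jQuery-wrapped preview iframe
    var url = iframe[0].contentWindow.location.href;
    // The URL contains /<zoom>/<lat>/<lng>
    var match = url.match(/\/([0-9\.]+)\/([-0-9\.]+)\/([-0-9\.]+)/);
    if (!match) return;  // no position in the URL (yet)
    var zoom = parseFloat(match[1]);
    var lat = parseFloat(match[2]);
    var lng = parseFloat(match[3]);
    ...
}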

Code is here

To try it, click on any picture then click the little "Permalink" icon (the icon that looks like a chain link) in the upper right hand corner and follow the link that appears.

For example, here's an upload of a huge Minecraft world:

League of Friends
After about a month of weekend development the League of Friends is finally finished.

Usually on games like this, if it has a highscore list you might find yourself at number 3,405,912 and the people at the top of the highscore list are people you've never heard of so what's the point of comparing yourself with them?

Inviting someone by email
On Around The World, you select your own friends for your league. Everyone you invite gets an email asking if they want to accept it mutually. If you want to invite someone who isn't already on Around The World, you can type in their email address and complete an email that gets sent to that friend on your behalf from Around The World.

About Peter
Also with this, you can click on any of your travelling friends and get lots more details about their progress. It doesn't reveal anything about how smart or not smart that friend is so you never have to worry about looking stupid because it never reveals which easy questions you accidentally got wrong.

About 5 years ago I switched from Apache to Nginx. And with that switch I could practically stop stabbing my feet with HTTP accelerators like Squid and Varnish because Nginx serves files from the filesystem both faster and more efficiently than the accelerators. And, it's one less moving part that can go wrong.

Then in late 2010 Amazon introduced Custom Origins on their Amazon CloudFront CDN service. Compared to other competing CDNs I guess CloudFront loses some benchmarks and wins some others. Nevertheless, network latency is the speed freak's biggest enemy and CDNs are awesome.

With Custom Origin all you do is tell CloudFront to act as a "proxy". It takes a URL and replaces the domain name to go and fetch the original from your own server. For example...

  1. You prepare http://mydomain.com/static/foo.css
  2. You configure CloudFront and get your new domain (aka. "Distribution")
  3. You request http://efac1bef32rf3c.cloudfront.net/static/foo.css
  4. CloudFront fetches the resource from http://mydomain.com/static/foo.css and saves a copy
  5. CloudFront observes which cache headers were used and repeats them. Forever.

So, if I make my Nginx server serve /static/foo.css with:

Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Cache-Control: public

Then CloudFront will do the same and it means it will never come back to your Nginx again. In other words, your Nginx server serves the cacheable static assets once and all other requests are just the usual HTML and JSON and whatever your backend web server spits out.
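
To make the idea concrete, here is a toy model in Python of a cache that pulls from the origin on a miss and then serves its own copy until max-age runs out. It is only a sketch of the principle, not how CloudFront is implemented; a real CDN has many edge locations and its own eviction rules. The class name and the fetch_from_origin callback are made up for the example.

```python
import time


class PullThroughCache:
    """Toy model of a CDN edge that fetches from the origin on a miss."""

    def __init__(self, fetch_from_origin, clock=time.monotonic):
        # fetch_from_origin(path) must return (body, max_age_in_seconds)
        self._fetch = fetch_from_origin
        self._clock = clock
        self._store = {}  # path -> (body, expiry time)
        self.origin_hits = 0

    def get(self, path):
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh, the origin is not contacted
        body, max_age = self._fetch(path)
        self.origin_hits += 1
        self._store[path] = (body, now + max_age)
        return body
```

With a max-age of 315360000 seconds (10 years) the second branch is, for practical purposes, only taken once per file.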

So, what does this mean? It means that we can significantly re-think the way we write code that prepares and builds static assets. Instead of a complex build or a run-time process that ultimately writes files to the filesystem we can basically do it all in run-time and not worry about speed. E.g. something like this:

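# urls.py
from django.urls import re_path
from . import views

urlpatterns = [
    re_path(r'^static/(.*\.css)$', views.serve_css),
]

# views.py
import cssmin
from django import http

def serve_css(request, filename):
    response = http.HttpResponse(content_type="text/css")
    # Far-future cache header so the CDN only has to ask once
    response['Cache-Control'] = 'public, max-age=315360000'
    # NOTE: resolve and validate `filename` against your static root first
    with open(filename) as f:
        content = cssmin.cssmin(f.read())
    content = '/* copyright: you */\n%s' % content
    response.write(content)
    return response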

That's untested code that can be vastly improved but I hope you get the idea. Obviously there are lots more things you can and should do such as concatenating files.

So, what does this also mean? You don't need Nginx. At least not for serving static files faster. I've shown before that something like Nginx + uWSGI is "better" (faster and less memory) than something like Apache + mod_wsgi but oftentimes the difference is negligible.

I for one am not going to re-write all the various code I have for preparing optimal static assets hosting but I'll definitely keep this stuff in mind. After all, there are other nifty things Nginx can do too.

By the way, here's a really good diagram that explains CloudFront

UPDATE

Want to read this in Serbian? Thank you Anja Skrba for the translation!

First of all, this technique is only really applicable to apps where there's only one big HTML template which is then shuffled, parts hidden and parts visible, thanks to lots of Javascript. Those familiar with jQuery Mobile will have seen this.

On Around The World there are a lot of images. The majority of them you don't need to see immediately because only one screen is loaded at a time. The page structure looks like this:

<div class="section" id="page1">
  <h2>Page 1</h2>
  <img src="section-icon1.png">
</div>
<div class="section" id="page2" style="display:none">
  <h2>Page 2</h2>
  <img src="section-icon2.png">
</div>
<div class="section" id="page3" style="display:none">
  <h2>Page 3</h2>
  <img src="section-icon3.png">
</div>

So, if you load that you'll notice that your browser will download "section-icon1.png", "section-icon2.png" and "section-icon3.png" even though two of the images are not going to be displayed. Good for pre-loading the images when they're later needed but bad for the user experience since the browser will be busy downloading images rather than displaying the first visible section.

This is how I solve this; first I change the HTML to be this:

<div class="section" id="page1">
  <h2>Page 1</h2>
  <img src="." data-src="section-icon1.png" class="deferred">
</div>
<div class="section" id="page2" style="display:none">
  <h2>Page 2</h2>
  <img src="." data-src="section-icon2.png" class="deferred">
</div>
<div class="section" id="page3" style="display:none">
  <h2>Page 3</h2>
  <img src="." data-src="section-icon3.png" class="deferred">
</div>

And now for the magic that turns these img tags into real normal img tags. The truth is that the Javascript about loading individual sections is a bit more complicated but in its inner core it looks something like this:

// variable 'hash' is something like '#page2'
if ($(hash + '.section').length) {
  $('.section:visible').hide();
  $(hash + '.section').show();
  $('img.deferred', hash).each(function() {
    var el = $(this);
    el.attr('src', el.data('src'));
    el.removeClass('deferred');
  });
  ...

It makes the HTML slightly more complicated but the end result is great. It's not just useful for the first-time load but also applicable every time someone reloads the page.
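
If the HTML comes out of a server-side template, the first step (changing the img tags) could also be automated. The sketch below is one hypothetical way to do that in Python. The helper name defer_images is made up, and the regular expression only handles the simple <img src="..."> form used in the examples above.

```python
import re

# Matches only the plain <img src="..."> form used in the examples above.
IMG_TAG = re.compile(r'<img\s+src="([^"]+)"\s*>')


def defer_images(html):
    """Move the real image URL into data-src and mark the tag as deferred."""
    return IMG_TAG.sub(r'<img src="." data-src="\1" class="deferred">', html)


print(defer_images('<div class="section" id="page2"><img src="section-icon2.png"></div>'))
```

Tags that are already deferred no longer match the pattern, so running the function twice does no harm.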

Just in case this hits you too when you use CITEXT fields that were originally defined in a Postgres before version 9.1.

ProgrammingError: could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

This can happen if you use something like:

WHERE name='peter'


when field name is a case insensitive text field.

After some googling around and shooting in the dark I found that the only way to crack this is to run this command:

CREATE EXTENSION citext FROM unpackaged;

Hope that helps some poor schmuck with the same problem.

UPDATE

If you have problems applying this to new tables in Postgres 9.1 you might need to run this instead:

CREATE EXTENSION citext WITH SCHEMA public ;

The advantage with WebSockets (over AJAX) is basically that there's less HTTP overhead. Once the connection has been established, all future message passing is over a socket rather than new HTTP request/response calls. So, you'd assume that WebSockets can send and receive many more messages per unit time. Turns out that that's true. But there's a very bitter reality once you add latency into the mix.

So, I created a simple app that uses SockJS and an app that uses jQuery AJAX to see how they would perform under stress. Code is here. All it does is, basically, send a simple data structure to the server which echoes it back. As soon as the response comes back, it starts over. Over and over till it's done X number of iterations.

Here's the output when I ran this on localhost here on my laptop:

# /ajaxtest (localhost)
start!
Finished
10 iterations in 0.128 seconds meaning 78.125 messages/second
start!
Finished
100 iterations in 0.335 seconds meaning 298.507 messages/second
start!
Finished
1000 iterations in 2.934 seconds meaning 340.832 messages/second

# /socktest (localhost)
Finished
10 iterations in 0.071 seconds meaning 140.845 messages/second
start!
Finished
100 iterations in 0.071 seconds meaning 1408.451 messages/second
start!
Finished
1000 iterations in 0.466 seconds meaning 2145.923 messages/second

Wow! It's so fast that the rate doesn't even settle down. Back-of-an-envelope calculation tells me the WebSocket version is 5 times faster roughly. Again; wow!

Now reality kicks in! It's obviously unrealistic to test against localhost because it doesn't take latency into account. I.e. it doesn't take into account the long distance the data has to travel from the client to the server.

So, I deployed this test application on my server in London, England and hit it from my Firefox here in California, USA. Same number of iterations and I ran it a number of times to make sure I don't get hit by sporadic hiccups on the line. Here are the results:

# /ajaxtest (sockshootout.peterbe.com)
start!
Finished
10 iterations in 2.241 seconds meaning 4.462 messages/second
start!
Finished
100 iterations in 28.006 seconds meaning 3.571 messages/second
start!
Finished
1000 iterations in 263.785 seconds meaning 3.791 messages/second

# /socktest (sockshootout.peterbe.com) 
start!
Finished
10 iterations in 5.705 seconds meaning 1.752 messages/second
start!
Finished
100 iterations in 23.283 seconds meaning 4.295 messages/second
start!
Finished
1000 iterations in 227.728 seconds meaning 4.391 messages/second

Hmm... Not so cool. WebSockets are still slightly faster but the difference is negligible. WebSockets are roughly 10-20% faster than AJAX. With that small a difference I'm sure the benchmark is going to be vastly affected by other factors that make it unfair for one or the other, such as quirks in my particular browser or the slightest hiccup on the line.

What can we learn from this? Well, latency kills all the fun. Also, it means that you don't necessarily need to re-write your already working AJAX heavy app just to gain speed because even though it's ever so slightly faster, the switch from AJAX to WebSocket comes with other risks and challenges such as authentication cookies, having to deal with channel concurrency, load balancing on the server etc.
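
Why latency flattens the difference can be shown with a very simple model: each message has to wait for one full round trip plus some per-message overhead, so the rate is 1 / (round trip + overhead). The sketch below takes the overheads from the 1000-iteration localhost runs above. The model itself and the ~0.227 second round trip (227.728 seconds / 1000 iterations) are assumptions made for the illustration, and it ignores everything else that goes on in a browser and on a server.

```python
# Per-message overhead, estimated from the 1000-iteration localhost runs
AJAX_OVERHEAD = 2.934 / 1000  # seconds
SOCK_OVERHEAD = 0.466 / 1000  # seconds


def messages_per_second(round_trip_seconds, overhead_seconds):
    """Rate when every message waits for one round trip plus overhead."""
    return 1.0 / (round_trip_seconds + overhead_seconds)


def speedup(round_trip_seconds):
    """How many times faster the socket version is in this model."""
    sock = messages_per_second(round_trip_seconds, SOCK_OVERHEAD)
    ajax = messages_per_second(round_trip_seconds, AJAX_OVERHEAD)
    return sock / ajax


for rtt in (0.0, 0.01, 0.227):
    print(f"round trip {rtt:.3f}s: sockets are {speedup(rtt):.2f}x faster")
```

The longer the round trip, the less the per-message overhead matters, which is the same story the numbers above tell.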

Before you say it, yes I'm aware that WebSocket web apps come with other advantages such as being able to hold on to sockets and push data at will from the server. Those are juicy benefits but massive performance boosts ain't one.

Also, I bet that writing this means that peeps will come along and punch holes in my code and my argument. Something I welcome with open arms!

This is part 2. Part 1 is here about how I managed to make this site fast.

The web framework powering this site is Django and in front of that is Nginx which serves all the static content (once before Amazon CloudFront CDN takes over) and all non-static traffic is passed on to a uWSGI daemon which is running 6 worker processes. The database that stores the content is PostgreSQL and all caching is done in Redis. Actually another Redis database is used for other things such as maintaining a quick look-up index of keywords to primary keys so that I can quickly mesh together blog posts by keywords.

However, as we all know, the deciding factor of a web site's server-side speed is effectively the speed of the database or any other disk-bound I/O device. To remedy this I've set up some practical caching strategies which I'm quite happy with.

So, how fast is it? Here's an ab stress test against the home page with 10,000 requests spread across 10 concurrent users:

Document Path:          /
Document Length:        73272 bytes

Concurrency Level:      10
Time taken for tests:   4.426 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      734250000 bytes
HTML transferred:       732720000 bytes
Requests per second:    2259.59 [#/sec] (mean)
Time per request:       4.426 [ms] (mean)
Time per request:       0.443 [ms] (mean, across all concurrent requests)
Transfer rate:          162022.11 [Kbytes/sec] received

I could probably make that 2,300 requests/second to 3,000 or 4,000 if I just increase the number of workers. However, that costs memory and since I'm currently running 19 other uWSGI workers on this server that all (all 25) in total take up a steady 1.4 Gb I don't feel like increasing that number much more. Besides since this site doesn't really get any traffic, I'm not so concerned about massive throughput on concurrent benchmarks but more about serving each and every page as fast as possible the few times it's called.

Every single page on this site is behind some sort of internal cache. The only time PostgreSQL is involved in rendering a page is when it's first requested after a comment has been entered or I've added (or edited) a post. Thing is, I don't want to be inconvenienced by a stupid cache that forces me to wait an hour every time I change something. No, instead lots of Django database model signals are put in place that fire off cache invalidation when certain pieces of data are changed. You can see the code for that here.

So, for the home page for example: For each request, a small piece of Python code checks Redis for what the latest comment add-date is and based on that tells the Django page_cache decorator to either render the page as normal or to serve the whole HTML payload from Redis. In other words, on a successful cache "hit" it actually needs two Redis look-ups. Even that could be improved by sparing these look-ups and blindly serving from the worker's allocated Python memory instead, but that would make things fragile, hard to unit test and it would only make the benchmarks faster, which is not necessary.
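
The two look-ups can be sketched like this. It is not the actual code behind this site: a plain dict stands in for Redis, the key name latest_comment_add_date and the function names are made up, and building the cache key from the latest comment date is just one assumed way of wiring it together.

```python
import hashlib


def cache_key(path, latest_comment_date):
    """Derive a page cache key that changes whenever a comment is added."""
    raw = f"{path}:{latest_comment_date}"
    return "page:" + hashlib.md5(raw.encode("utf-8")).hexdigest()


def get_page(store, path, render):
    """Serve `path` from the cache, rendering only on a miss.

    `store` is any dict-like object (a stand-in for Redis here) and
    `render` is a callable that produces the HTML the slow way.
    """
    latest = store.get("latest_comment_add_date")  # look-up 1
    key = cache_key(path, latest)
    html = store.get(key)  # look-up 2
    if html is None:
        html = render()
        store[key] = html
    return html
```

Adding a comment changes the stored date, which changes the key, so the next request renders the page again without anything having to be deleted explicitly.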

The most important thing to optimize on a web site is the static content. Well, there's little point in serving the static content fast if it takes 3 seconds to say what static content to serve. Also, a fast website is likely to appear more favorable on the Google bot which effectively makes the site appear higher on Google searches.

In the next part, I'll try to share more in-depth technical bits and pieces of what I actually did. Although they're no secrets, I think some of them are best practice and even senior web developers sometimes get them wrong.