Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page! Currently only showing blog entries under the category: Web development. Clear filter

tl;dr Crash-stats is Mozilla's crash reporter dashboard. Simply fixing the static assets made the site 25% faster.

Before http://www.webpagetest.org/result/150820_X5_V5T/

After http://www.webpagetest.org/result/150824_7F_1C3Q/

(The "First Byte Time" is still terrible but that's for another discussion. We're working on a re-write of the underlying data model for that particular report.)

  • Note how the SpeedIndex dropped from 2823 to 2098 which basically means, you can see stuff sooner.

  • The Load Time used to be 5.7 seconds on average. Now it takes 3.5 seconds.

  • It used to weigh 717 KB to load the whole thing. Now it weighs 326 KB.

The only thing we changed was a long overdue correction of static asset headers and Gzip compression. Now, files with unique URLs (e.g. /static/CACHE/css/23a811f100bc.css) have maximum aggressive cache headers. And now all .js, .css and text/html is Gzipped.

Was it easy to do? Hell no!
Does it matter? Hell yeah! We don't have a lot of users or traffic on these reports but the people who use them do this for a living and making the site feel snappier for them would make their lives more productive.

Suppose that you've enabled gzip delivery of your site and its static assets. How do you test it?

One obvious way is to load the site with the developer tools in your browser and look at the headers there. Like this for example:
Is it gzip'ed? Yes!

Another more hard code and geek-power way is to simply use curl.

It goes without saying, the ideal way to set up Nginx is to make it optional. Don't upload a gzipped file to your server and force gzip down on every client. Instead, let something like Nginx handle it on-the-fly (don't worry, it's ultrafast).

So to see if gzip is working, take your URL and run these two commands:

$ curl -v --compressed http://www.example.com/page.cat > /dev/null

And look for this line:

< Content-Encoding: gzip

Also you should look for this line:

Content-Length: 2403

(number will obviously vary)

Now run the same curl command but without the --compressed. E.g.

$ curl -v http://www.example.com/page.cat > /dev/null

Now there won't be a Content-Encoding header in the response. It defaults to plain text.
Also, now look for the Content-Length and amuse yourself in the profit that this number is likely to be much larger than before.

Here's a realistic example:

With --compressed

$ curl -v --compressed https://crash-stats.allizom.org/home/products/Firefox > /dev/null
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 52.26.241.244...
* Connected to crash-stats.allizom.org (52.26.241.244) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: crash-stats.allizom.org
* Server certificate: DigiCert SHA2 Secure Server CA
* Server certificate: DigiCert Global Root CA
> GET /home/products/Firefox HTTP/1.1
> User-Agent: curl/7.37.1
> Host: crash-stats.allizom.org
> Accept: */*
> Accept-Encoding: deflate, gzip
>
< HTTP/1.1 200 OK
< Content-Encoding: gzip
< Content-Type: text/html; charset=utf-8
< Date: Thu, 20 Aug 2015 18:38:00 GMT
* Server nginx/1.6.3 is not blacklisted
< Server: nginx/1.6.3
< Set-Cookie: anoncsrf=yieBMvzCn4fO4lmMQbjuq0Cibl9s7oxG; expires=Thu, 20-Aug-2015 20:38:00 GMT; httponly; Max-Age=7200; Path=/; secure
< Vary: Accept-Encoding
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Length: 2403
< Connection: keep-alive
<
{ [data not shown]
100  2403  100  2403    0     0   4734      0 --:--:-- --:--:-- --:--:--  4730
* Connection #0 to host crash-stats.allizom.org left intact

Without

$ curl -v  https://crash-stats.allizom.org/home/products/Firefox > /dev/null
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 54.213.30.86...
* Connected to crash-stats.allizom.org (54.213.30.86) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: crash-stats.allizom.org
* Server certificate: DigiCert SHA2 Secure Server CA
* Server certificate: DigiCert Global Root CA
> GET /home/products/Firefox HTTP/1.1
> User-Agent: curl/7.37.1
> Host: crash-stats.allizom.org
> Accept: */*
>
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< Date: Thu, 20 Aug 2015 18:38:05 GMT
* Server nginx/1.6.3 is not blacklisted
< Server: nginx/1.6.3
< Set-Cookie: anoncsrf=evG8kmoXjHv5aeyFIQHxNcnGahdxwIOy; expires=Thu, 20-Aug-2015 20:38:05 GMT; httponly; Max-Age=7200; Path=/; secure
< Vary: Accept-Encoding
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Length: 12299
< Connection: keep-alive
<
{ [data not shown]
100 12299  100 12299    0     0  24314      0 --:--:-- --:--:-- --:--:-- 24306
* Connection #0 to host crash-stats.allizom.org left intact

These days, it's almost a painful experience reading newspaper news online.

It's just generally a pain to read these news sites. Because they are soooooo sloooooooow. Your whole browser is stuttering and coughing blood and every click feels like a battle.

The culprit is generally that these sites are so full of crap. Like click trackers, ads, analytics trackers, videos, lots of images and heavy Javascript.

Here's a slight comparison (in no particular order) of various popular news websites (using a DSL connection):

Site Load Time Requests Bytes in SpeedIndex
BBC News homepage 14.4s 163 1,316 KB 6935
Los Angeles Times 35.4s 264 1,909 KB 13530
The New York Times 28.6s 330 4,154 KB 17948
New York Post 40.7s 197 6,828 KB 13824
USA Today 19.6s 368 2,985 KB 3027
The Washington Times 81.7s 547 12,629 KB 18104
The Verge 18.9s 152 2,107 KB 7850
The Huffington Post 22.3s 213 3,873 KB 4344
CNN 45.9s 272 5,988 KB 12579

Wow! That's an average of...

  • 34.2 seconds per page
  • 278 requests per page
  • 4,643 KB per page

That is just too much. I appreciate that these news companies that make these sites need to make a living and most of the explanation of the slow-down is the ads. Almost always, if you click to open one of those pages, it's just a matter of sitting and waiting. It loading in a background tab takes so much resources you feel it in other tabs. And by the time you go to view and you start scrolling your computer starts to cough blood and crying for mercy.

But what about the user? Imagine if you simply pull out of some of the analytics trackers and opt for simple images for the ads and just simplify the documents down to the minimum? Google Search does this and they seem to do OK in terms of ad revenue.

An interesting counter-example; this page on nytimes.com is a long article and when you load it on WebPageTest it yields a result of 163 requests. But if you use your browser dev tools and keep an eye on all network traffic as you slowly scroll down the length of the article all the way down to the footer, you notice it downloads an additional 331 requests. It starts with about 2KB of data but by the time you've downloaded to the end and scrolled past all ads and links it's amassed about 5.5KB of data. I think that's fair. At least the inital feeling when you arrive on the page isn't that of disgust.

Another realization I've found whilst working on this summary is that oftentimes, sites that are REALLY really slow and horrible to use don't necessarily have super many external resources from different domains but they just have far far too much Javascript. This random page on Washington Times for example has 209 Javascript files that together weighs 9.4KB (roughly 8 times the amount of image data). Not only does that need to be downloaded. It also needs to be parsed and I bet a zillion event handlers and DOM manipulators kick in and basically make scrolling a crying game even on a fast desktop computer.

And then it hit me!

We've all been victims of websites annoyingly try to lure you to install their native app when trying to visit their website on a smartphone. The reason they're doing that is because they've been given a second chance to build something without eleventeen iframes and 4MB of Javascript so the native app feels a million better because they've started over and gotten things right.

My parting advice to newspapers who can do something about their website; Radically refactor your code. Question EVERYTHING and delete, remove and clean up things that doesn't add value. You might lose some stats and you might not be able to show as much ads but if your site loads faster you'll get a lot more visitors and a lot more potential to earn bigger $$$ on fewer more targetted ads.

the-toast.net on WebPageTest
The number one rule in the general best practice of web performance is "Minimize HTTP Requests".

Yeah, it's more important than anything else.

So, I took the biggest offender from the collected sites Number of Domains and ran it through WebPageTest. The result is almost comical.

That's amazing! I'm not even sure I could achieve that even if I tried. It's almost like those sickeningly infested Internet Explorer toolbars that are often a joke.

Some numbers to blow your mind (on a DSL connection from San Jose, CA, USA)...

  • Time to fully loaded: 85 seconds
  • Bytes in: 9,851 KB
  • Total requests: 1,390 (!!)
  • SpeedIndex: 10028 (my own site's 950)

What's mindblowing about how bad this site is that even on the repeated view, where a browser cache is supposed to help a lot, is that it takes 30.6 seconds and still makes 347 requests. Just wow!

Last year I put together a little experiment called AJAX or Not? and blogged about it here. The basic idea was to display 1,000 rows in a table. There are several ways of doing it but I decided to compare the following three patterns:

  1. Rendering the whole table in Django server-side and return the whole HTML document.
  2. Rendering a skeleton page, then load the table content as a big chunk of HTML via AJAX.
  3. Rendering a skeleton page, and let AngularJS load all the content of the table from the server as JSON and let AngularJS render it into the DOM.

It was clear as day that the server-side rendering version was hands down the fastest. And the AngularJS rendering the slowest.

Note! AngularJS is amazing and super flexible and powerful because you don't really need to worry about how to re-render once the data changes. This this really useful when you do things like loading more data from a remote endpoint or doing some in-page filtering.

Enter ReactJS

The point of AJAX or Not was not to compare Javascript frameworks but I had some time and I thought I'd write an equivalent version of the AngularJS one with ReactJS (version 0.13.3).

Anyway, here's the code and it's using the GitHub fetch polyfill to do the AJAX query. The AngularJS code is here and here and as you can see it's using track by on the ng-repeat.

WebPageTest
To measure the difference I ran a comparison in WebPageTest which I encourage you open and study for a bit. You can watch the video and download the video here.

Also, note that the Django rendered version loads jQuery. That's because the functionality dictates that clicking on a link should show a confirmation box before going to the link. I know, it's silly but it's very realistic that every page needs some Javascript functionality.

Executive summary...

  • Django server-side takes 0.8 seconds, ReactJS version takes 2.0 seconds and the AngularJS version takes 2.9 seconds.
  • The ReactJS version is the fastest to display something. It displays the header and the image first. Only by 0.2 seconds before the Django server-side version.
  • The AngularJS version causes a lot more CPU utilization. This might really matter when you're on a low-end smartphone.
  • The ReactJS causes twice as much CPU utilization than the server-side version. The AngularJS causes twice as much CPU utilization than the ReactJS version.
  • AngularJS is slightly larger than ReactJS + fetch but I don't think this has a huge effect on the total load time.

Some other thoughts...

  • The ReactJS code is all in one place more or less. That's neat! But it's pretty darn big in terms of number of lines. AngularJS code is split half in the Javascript code and half in the HTML.
  • It's clear, if you want a fast loading page, avoid Javascript as much as you possibly can.
  • This experiment is very optimized in how it gets the data to be displayed. In fact, the server-side rendering time is close to 0 seconds because the whole HTML blob is stored in memcache. A more realistic thing is that extracting the data would take a lot longer if the query isn't so easy to cache. That would be a huge disadvantage for the fully server-side rendered version since if the data query takes a long time you'll sit and stare at a white screen longer. Doing the AJAX approach would definitely be a nicer experience.
  • The difference isn't that big. Both fancy Javascript frameworks have amazing features that leaves jQuery in the dust but if you want your page to load crazy-fast, do as much server-side as you possibly can.

Premailer is a Python library for turning a HTML + CSS into HTML with all the CSS embedded as inline style attributes. This is sadly very necessary to ensure that your fancy HTML emails look spiffy across all email clients and email webapps.

So, last week I put together a little site to test the library via a browser: Premailer.io

It's just a simple webapp with a form where you can enter HTML in three different ways; textarea, by URL and by file upload.

You can also override all the possible advanced options that premailer supports.

What's kinda cool is that you can get a preview of how the HTML document will look like in an iframe that is dynamically loaded with the result from the conversion.

The webapp is of course open source and available on github.com/peterbe/premailer.io. The front-end is an AngularJS app and the build system is Lineman.js. The server is a Falcon server running on uWSGI via Nginx.

There's very little fancy here. There's no limitations or protections. I just hope it becomes handy for people to test premailer out.

The inspiration came from MailChimp's CSS Inliner Tool which is cute but very basic and doesn't allow you the same kinds of input.

If anybody with some AngularJS or highlight.js chops has time I'd love to help fix why the HTML is not syntax highlighted.

(For context, I released Autocompeter.com last week and now I'm thinking about improvements)

I posted a question on Twitter about which highlighting formatting people prefer and got some interesting feedback. More about that later.

The piece of feedback that really got my attention came from my friend Honza Král.
He wondered if not the whole word should be highlighted instead of just the beginning of the word.

I've actually been thinking about that too but never got around to trying it out. Until now.

Before

Before

After

After

What do you think?

I have the code in a branch and I'm still mulling it over. There's sort of a convention to just highlight based on what you've typed so far. I don't want to be too weird because when people don't feel familiar they don't like what they see even if the new actually is better.

In airmozilla the tests almost all derive from one base class whose tearDown deletes the automatically generated settings.MEDIA_ROOT directory and everything in it.

Then there's some code that makes sure a certain thing from the fixtures has a picture uploaded to it.

That means it has do that shutil.rmtree(directory) and that shutil.copy(src, dst) on almost every single test. Some might also not need or depend on it but it's conveninent to put it here.

Anyway, I thought this is all a bit excessive and I could probably optimize that by defining a custom test runner that is first responsible for creating a clean settings.MEDIA_ROOT with the necessary file in it and secondly, when the test suite ends, it deletes the directory.

But before I write that, let's measure how many gazillion milliseconds this is chewing up.

Basically, the tearDown was called 361 times and the _upload_media 281 times. In total, this adds to a whopping total of 0.21 seconds! (of the total of 69.133 seconds it takes to run the whole thing).

I think I'll cancel that optimization idea. Doing some light shutil operations are dirt cheap.

From the It-Depends-on-What-You're-Building department.

As a web developer you have a job:

  1. Display a certain amount of database data on the screen
  2. Do it as fast as possible

The first point is these days easily taken care of with the likes of Django or Rails which makes it über easy to write queries that you then use in templates to generate the HTML and voila you have a web page.

The second point is taken care of with a myriad of techniques. It's almost a paradox. The fastest way to render something on the screen is to generate everything on the server and send it wholesome. It means the browser can very quickly (and boosted by GPU) render something on the screen. But if you have a lot of data that needs to be displayed it's often better to send just a little bit of HTML and then let some Javascript kick in and take care of extracting the rest of the information using AJAX.

Here I have prepared three different versions of ways to display a bunch of information on the screen:

http://www.peterbe.com/ajaxornot/

Visual comparison on WebPagetest
What you should note and take away from this little experimental playground:

  1. All server-side work is done in Django but it's served straight out of memcache so it should be fast server-side.

  2. The content is NOT important. It's just a list of blog posts and their categories and keywords.

  3. To make it somewhat realistic, each version needs to 1) display a JPG and 2) have a Javascript onclick event that throws a confirm() dialog box.

  4. The AngularJS version loads significantly slower but it's not because AngularJS is slow, but because it's able to do so much more later. Loading a Javascript framework is like an investment. Big cost upfront and small cost later when you need more magic to happen without having a complete server refresh.

  5. View 1, 2 and 3 are all three imperfect versions but they illustrate the three major groups of solving the problem stated at the top of this blog post. The other views are attempts of optimizations.

  6. Clearly the "visually fastest" version is the optimization version 5 which is a fork of version 2 which loads, on the server-side, everything that is above the fold and then take care of the content below the fold with AJAX.
    See this visual comparison

  7. Optimization version 4 was a silly optimization. It depends on the fact that JSON is more "compact" than HTML. When you Gzip the content, the difference in size doesn't matter anymore. However, it's an interesting technique because it means you can do all business logic rendering stuff in one language without having to depend on AJAX.

  8. Open the various versions in your browser and try to "feel" how pages the load. Ask your inner gutteral heart which version you prefer; do you prefer a completely blank screen and a browser loading spinner or do you prefer to see some skeleton structure first whilst waiting for the bulk content comes in?

  9. See this as a basis of thoughts and demonstration. Remember the very first sentence in this blog post.

tl;dr Don't run ffmpeg over HTTP(S) and use ffmpegthumbnailer

UPDATE tl;dr Download the file then run ffmpeg with -ss HH:MM:SS first. Don't bother with ffmpegthumbnailer

At work I work on something called Air Mozilla. It's a site for hosting live video broadcasts and then archiving those so they can be retrieved later.

Unlike sites like YouTube we can't take a screencap from the video because many videos are future (aka. "upcoming") videos so instead we use a little placeholder thumbnail (for example, the Rust logo).

However, once it has been recorded we want to switch from the logo to an actual screen capture from the video itself. We set up a cronjob that uses ffmpeg to extract these as JPGs and then the users can go in and select whichever picture they like the best.

This is all work in progress by the way (as of December 2014).

One problem is that we have is that the command for extracting JPGs is really slow. So slow that we can't wrap the subprocess in a Django database connection because it's so slow that the database connection is often killed.

The command to extract them looks something like this:

ffmpeg -i https://cdnexample.com/url/to/file.mp4 -r 0.0143 /tmp/screencaps-%02d.jpg

Where the number r is based on the duration and how many pictures we want out. E.g. 0.0143 = 15 * 1049 where 15 is how many JPGs we want and 1049 is a duration of 17 minutes and 29 seconds.

The script I used first was: ffmpeg1.sh

My first experiment was to try to extract one picture at a time, hoping that way, internally, ffmpeg might be able to optimize something.

The second script I used was: ffmpeg2.sh

The third alternative was to try ffmpegthumbnailer which is an intricate wrapper on ffmpeg and it has the benefit that you can produce slightly higher picture quality too.

The third script I used was: ffmpeg3.sh

Bar chart comparing the 3 different scripts
And running these three depend very much on the state of my DSL at the time.

For a video clip that is 17 minutes long and a 138Mb mp4 file.

ffmpeg1.sh   2m0.847s
ffmpeg2.sh   11m46.734s
ffmpeg3.sh   0m29.780s

Clearly it's not efficient to do one screenshot at a time.
Because with ffmpegthumbnailer you can tell it not to reduce the picture quality the total weight of the produced JPGs from ffmpeg1.sh was 784Kb and the total weight from ffmpeg3.sh was 1.5Mb.

Just to try again, I ran a similar experiment with a 35 minutes long and 890Mb mp4 file. And this time I didn't bother with ffmpeg2.sh. The results were:

ffmpeg1.sh   18m21.330s
ffmpeg3.sh   2m48.656s

So that means that using ffmpegthumbnailer is about 5 times faster than ffmpeg. Huge difference!

And now, a curveball!

The reason for doing ffmpeg -i https://... was so that we don't have to first download the whole beast and run the command on a local file. However, in light of how so much longer this takes and my disdain to have to install and depend on a new tool (ffmpegthumbnailer) across all servers. Why not download the whole file and run the ffmpeg command locally.

So I download the file and it's slow because of my, currently, terrible home DSL. Then I run and time them again but just a local file instead:

ffmpeg1.sh   0m20.426s
ffmpeg3.sh   0m0.635s

Did you see that!? That's an insane difference. Clearly doing this command over HTTP(S) is a bad idea. It'll be worth downloading it first.

UPDATE

On Stackoverflow, LordNeckBeard gave a great tip of using the -ss option before in the input file and now it's much faster. At this point. I'm no longer interested in having to bother with ffmpegthumbnailer.

Let's fork ffmpeg2.sh into two versions.

ffmpeg2.1.sh same as ffmpeg2.sh but a downloaded file instead of a remote HTTPS URL.

ffmpeg2.2.sh as ffmpeg2.1.sh except we put the -ss HH:MM:SS before the input file.

Now, let's run them again on the 138Mb file:

# the 138Mb mp4.mp4 file
ffmpeg2.1.sh   2m10.898s
ffmpeg2.2.sh   0m0.672s

187 times faster

And again, I re-ran this again against a bigger file that is 1.4Gb:

# the 1.4Gb mp4-1.44Gb.mp4 file
ffmpeg2.1.sh   10m1.143s
ffmpeg2.2.sh   0m1.428s

420 times faster