Filtered by Web Performance


I think I might put my whole site behind a CDN

April 23, 2019
0 comments Web development, Nginx, Web Performance

tl;dr; I'm going to put this blog behind KeyCDN and I expect a 2-4x performance boost (on Time To First Byte).

Right now, requests to my blog go straight to an Nginx server in DigitalOcean in NYC, USA. The Nginx server, 99% of the time, serves the blog posts (and static assets) as index.html files straight from disk. If the request is GET /plog/some-slug it will look for a file called /path/to/cached/files/plog/some-slug/index.html (or index.html.br or index.html.gz depending on the user agent's Accept-Encoding header). Only if the file doesn't exist on disk does the request go through to Django (via Nginx's uwsgi module). All of it is served over HTTP/2 with SSL from LetsEncrypt.
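Not the actual Nginx config, but to make that lookup concrete, here is a minimal Python sketch of what it amounts to (the cache root and function name are made up for illustration):

import os

CACHE_ROOT = "/path/to/cached/files"  # hypothetical root, matching the example path above

def pick_cached_file(path, accept_encoding):
    """Roughly what the Nginx lookup does for e.g. GET /plog/some-slug."""
    base = os.path.join(CACHE_ROOT, path.lstrip("/"), "index.html")
    # Prefer Brotli, then gzip, depending on what the client says it accepts
    if "br" in accept_encoding and os.path.isfile(base + ".br"):
        return base + ".br"
    if "gzip" in accept_encoding and os.path.isfile(base + ".gz"):
        return base + ".gz"
    if os.path.isfile(base):
        return base
    return None  # nothing on disk; fall through to Django (uWSGI) to render it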

This has been working great but it's time to step it up. It's time to put the whole site behind a CDN. And I think I'm going to use KeyCDN for it.

In the past, the best practice was to serve your HTML document from your smart server (e.g. Django) and then put the static assets on a CDN. Like this:


<html>
  <link rel="stylesheet" href="https://myaccount123.cloudakamaifastlyflare.com/static/main.d910ef9a33.css">

...
<body>
  <img src="https://myaccount123.cloudakamaifastlyflare.com/images/hero.jpg">

...

But with HTTP/2, this becomes an anti-pattern for web performance because your client has already made an expensive HTTP/2 connection (and SSL negotiation) to https://yourcooldomain.com and now it's cheap to just download the rest. I used to do it like that too and I don't regret it. As a matter of fact, the HTML on https://songsear.ch comes straight from Nginx but all its images are (lazy) loaded via songsearch-2916.kxcdn.com. But I think, when time allows, I'll put all of Song Search behind a CDN too.

Basically, it's time to put the whole site behind a CDN. With smart purging techniques and smarter CDNs respecting your dynamic content cache control headers, it's time to share the load. ...all over the world.

CDN Choices

There are many sites that compare CDNs, but many are affiliated with, or even run by, one of them, so it's hard to get unbiased comparisons. For example, KeyCDN demonstrates they're the cheapest by comparing themselves with 5 others that they picked. (But mind you, that seems reasonably backed up by this comparison on cdn.reviews).

CDNPerf does a decent job with cool graphs and stuff. Incidentally, they rank my current favorite (KeyCDN) as the slowest compared to the well known giants that I compared it to.

CDNPerf graph

But mind you, the perf difference between KeyCDN and the winner (topmost in the graph as of today) is 36ms vs 47ms which are both fantastic numbers.

CDNPerf list

It's hard to compare CDNs because they're all pretty fast, and actually, they're all reasonably cheap. What really matters is the features, and that's a lot harder to compare. CloudFlare often comes up as a CDN provider with stellar features that impress me. I've never actually used them, but at least they mention "Fast cache purge" and "API programmability" as their key features. They don't mention Brotli caching, though, which I know is a feature KeyCDN supports.

KeyCDN has been great to me in the past when I've used it to host static assets. I'm familiar with their interface and they recently launched an API to do things like purge-by-tag and purge-by-URL. They're cheap, which matters because in this context it's all side-project stuff I want to put behind a CDN. They have a Python library which, although very rough around the edges, works. Also very important: I've communicated very successfully with them through their support and they've been responsive and helpful. So I'll go with KeyCDN.
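Purge-by-URL, for example, boils down to a single authenticated DELETE request against their API. Here's a rough sketch using plain requests rather than their Python library; the zone ID is made up and the exact endpoint and payload shape should be double-checked against KeyCDN's API documentation:

import requests

API_KEY = "your-keycdn-api-key"  # from the KeyCDN dashboard
ZONE_ID = "12345"  # made-up zone ID

def purge_urls(urls):
    # The API key is used as the HTTP Basic auth username, with an empty password
    response = requests.delete(
        f"https://api.keycdn.com/zones/purgeurl/{ZONE_ID}.json",
        auth=(API_KEY, ""),
        json={"urls": urls},
    )
    response.raise_for_status()
    return response.json()

purge_urls(["myzone-1234.kxcdn.com/plog/some-slug"])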

The Opportunity

Before I move my domain www.peterbe.com to become a CNAME for one of their CDN domains, I wanted to experiment a little and see how it works and what performance numbers I get for comparison. So I set up beta.peterbe.com and did some Django and Nginx wiring so it would work the same but with the difference that it goes through a CDN for everything.

Then I picked a random page and set up a Hyperping monitor from all of its available regions and let it brew for a while. Unfortunately, Hyperping doesn't let you compare two monitors side-by-side so you're going to have to use your own eyes to compare the graphs:

www means no CDN, just the origin Nginx
NOT behind a CDN (server is New York, USA)

beta means with a CDN in front
Behind a CDN

The "total Response Time" in Hyperping doesn't really make sense. They're an average across all regions it pings from. If you live in, for example, Germany; the only response time that matters to you is 1,215 ms versus 40 ms. Equally, if you live somewhere in New York, the only response time that matters to you is 20 ms versus 64 ms.

I actually ran another benchmark. I used Python like this:


import time

import requests

t0 = time.time()
r = requests.get('https://www.peterbe.com/plog/some-slug')
t1 = time.time()
print("Took", t1 - t0)

I did this from South Carolina which means my nearest KeyCDN edge location could be Atlanta, Miami, or New York. Either way, I'm reasonably near New York (compared to the rest of the world) so it'd be a fair performance comparison for all US east coast traffic. (Insert disclaimer here). It downloads the most recent blog posts, in repeated cycles, which gives the CDN a solid chance to warm up and then it compares the median of the last 100 downloads. The output of this is as follows:

beta
    COUNT               1854 (but only using the last 100)
    HIT RATIO           100.0%
    AVERAGE (all)       63.12ms
    MEDIAN (all)        61.89ms

www
    COUNT               1856 (but only using the last 100)
    HIT RATIO           100.0%
    AVERAGE             136.22ms
    MEDIAN              135.61ms

("HIT RATIO" for the non-CDN URL means it was served entirely without Djando server rendering)

What it means is that the median with a CDN is 62ms, and 135.6ms without. That's a 2x boost.

The crawler stats script is available here: github.com/peterbe/peterbecom-cdn-crawl and I would be thrilled if you can clone it and run it and report what numbers you get and where you're running it from.
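For what it's worth, the gist of the comparison is simple: time a bunch of repeated GETs, let the CDN warm up, and compare the median of the last 100. Something like this (a sketch, not the actual crawler script):

import statistics
import time

import requests

def median_response_time(url, cycles=200, keep=100):
    timings = []
    for _ in range(cycles):
        t0 = time.time()
        requests.get(url)
        timings.append((time.time() - t0) * 1000)  # milliseconds
    # Only the last `keep` downloads count, so the CDN cache has had time to warm up
    return statistics.median(timings[-keep:])

print("beta", median_response_time("https://beta.peterbe.com/plog/some-slug"))
print("www ", median_response_time("https://www.peterbe.com/plog/some-slug"))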

Notes and Conclusion

Mind you, 62ms vs. 136ms might sound like a silly difference if Webpagetest says it takes 700ms until the page is interactive (on an LTE connection). And this is a tiny super-optimized page. But never forget A) we can't all live in the US east-coast area and B) if the HTML can download marginally faster it allows the browser to parse it sooner and start downloading all the other stuff much sooner. It'll make a big difference! I'm sure you've all seen graphs like this:

Cold-cache MDN page on 4G
Imagine if all those static asset downloads could have started a whole second "to the left"

Of course a CDN is faster. It's no news. But it's also a hassle and it costs money. It's 2019 and most good CDNs now support Brotli, fast purge-by-url, and HTTP/2. It's time to make the switch! It's not like cache-invalidation is hard.

UPDATE April 23 2019 (same day)

KeyCDN has a neat-looking tool that is similar to Hyperping but more of a one-off kinda deal. It's called Performance Test and I wouldn't be surprised if it's biased as heck, because they probably run these pings from roughly the same locations as their edge locations. Anyway, the results are nevertheless juicy. Note the numbers in the last column, TTFB.

Performance Test without CDN
Performance Test without CDN

Performance Test with CDN
Performance Test with CDN

KeyCDN vs. DigitalOcean Nginx

April 12, 2019
0 comments Web development, Nginx, Web Performance

tl;dr; The global average response time of serving an image from my NYC DigitalOcean server is almost 10x that of serving it from a CDN.

KeyCDN is a CDN service that I use for side-projects. It's great. It has ~35 edge locations. I don't know much about how their web servers work but I can't imagine they're much different from the origin server. In principle.

The origin server is my DigitalOcean (6 vCPU, 16 GB RAM, Ubuntu 14) droplet. It's running an up-to-date CloudFlare build of Nginx and the static images are served straight from (SSD) disk with a 4 weeks TTL (max-age=2419200,public,immutable). The SSL is done with LetsEncrypt and I'm somewhat confident the Nginx is decently configured and uses HTTP/2.

So the CDN, on songsearch-2916.kxcdn.com, is basically configured to front any requests to songsear.ch. If the origin sends cache-control headers, KeyCDN knows it can hold on to the response for a while, but there's no guarantee it will for the full time specified in the cache-control. Either way, how does it compare?

The Experiment

I picked a random static asset URL. It's a 32 KB JPEG file. Its origin URL and its CDN URL are:

  1. https://songsear.ch/static/albums/2017/05/24/08/170630_300x300.jpg
  2. https://songsearch-2916.kxcdn.com/static/albums/2017/05/24/08/170630_300x300.jpg
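Before getting to the measurements, one quick sanity check is to look at the response headers on both URLs; the origin should show the long cache-control, and the CDN response typically includes an X-Cache header saying HIT or MISS. A rough sketch:

import requests

urls = (
    "https://songsear.ch/static/albums/2017/05/24/08/170630_300x300.jpg",
    "https://songsearch-2916.kxcdn.com/static/albums/2017/05/24/08/170630_300x300.jpg",
)
for url in urls:
    r = requests.get(url)
    print(url)
    print("  Cache-Control:", r.headers.get("Cache-Control"))
    print("  X-Cache:", r.headers.get("X-Cache"))  # expected on the CDN response, absent on the origin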

Next, I set up a Hyperping monitor on both URLs as GET requests. For the regions (the places Hyperping pings from), I picked the following:

  1. San Francisco, USA
  2. New York, USA
  3. London, United Kingdom
  4. Frankfurt, Germany
  5. Mumbai, India
  6. São Paulo, Brazil
  7. Sydney, Australia

(I wish I had selected all 12 possible regions when I started but now it's too late for lazy me)

Then, I let Hyperping GET these URLs for a while and behold, here are the numbers:

The Results

Average response time:

  • The CDN: 79 ms
  • The origin: 714 ms

That's almost a 10x difference!

Mind you, the "average response time" is across all regions. It doesn't reflect what people get. If 90% of your visitors are from Australia, the average response times would, of course, be very different. But as an example, the origin server is in New York and there, the average response time is 26 ms vs. 105 ms which is a 5x difference.

Here are some screenshots from Hyperping:

KeyCDN results
KeyCDN

Origin server
Origin server

Conclusion

KeyCDN's servers are clearly fast, and putting them in front is worth doing. It's unsurprising that the CDN performs better far away from New York, but it's surprising how much faster it is at serving than the origin even when pinged from New York (a 5x difference).

The site is still NOT fronted by a CDN because, apart from the images, almost all content is un-cacheable. However, I need to do more research and experimentation with putting everything behind a CDN and being meticulous with setting no-cache headers on dynamic stuff and using async tools to invalidate CDN caches when appropriate.
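On the Django side, that mostly comes down to being explicit with cache headers per view. A sketch using Django's stock cache decorators (the view names are made up; this isn't my actual code):

from django.views.decorators.cache import cache_control, never_cache

@cache_control(public=True, max_age=2419200, immutable=True)
def album_image(request, pk):
    ...  # long-lived, static-ish content the CDN can cache for 4 weeks

@never_cache
def search(request):
    ...  # dynamic content the CDN must never hold on to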

Optimize inlined SVG on developer.mozilla.org

April 4, 2019
0 comments Web development, Web Performance

tl;dr We could make the initial HTML document 40% smaller if we moved from inline SVG to external, optimized .svg static assets. But there are lots of caveats unless the SVG can be used as an image.

One of the many goals of MDN Web Docs this year is to make it faster. That makes users happier and as a side-effect, it makes Google happier. And hopefully, being faster will mean Google ranks us higher.

I'm still new to the MDN code base and there are many things we can do. One thing I noticed is that the site uses inline SVG. E.g.


<a href="/en-US/docs/Learn">References &amp; Guides
    <svg class="icon icon-caret-down" xmlns="http://www.w3.org/2000/svg" width="16" height="28" viewBox="0 0 16 28">
      <path d="M16 11a.99.99 0 0 1-.297.703l-7 7C8.516 18.89 8.265 19 8 19s-.516-.109-.703-.297l-7-7A.996.996 0 0 1 0 11c0-.547.453-1 1-1h14c.547 0 1 .453 1 1z"/>
    </svg>
</a>

The site uses HTTP/2, so the argument about reducing the number of requests is not valid. Well, with caveats. Browser support for HTTP/2 is getting really good. Definitely good enough to make it worth betting on.
It used to be that making static assets external was a trade-off: you can potentially avoid downloads entirely thanks to browser caching, and the initial HTML document becomes smaller, but all at the cost of more requests.

There are other, more subtle, differences with SVG. For example, the content of the SVG might depend on dynamic data. There might be other differences I'm not aware of, and I'm quick to admit that I don't know much about <use> and the like, but this article might be full of those details.

The Experiment

I wrote a script that opens https://developer.mozilla.org/en-US/, extracts every <svg> tag, and puts them on disk. E.g. svg.icon.icon-smile_fbf6292.svg. The filenames include a hash checksum of the content in case two <svg>s are different but have the same classList. Then it does the following (a rough sketch of these steps comes after the list):

  1. Run svgo on each .svg to create a .min.svg.
  2. Run zopfli on each .min.svg to create a .min.svg.gz
  3. Run brotli on each .min.svg to create a .min.svg.br
  4. Sum the total size of all the inline ones, the total size of all .min.svg, all .min.svg.gz, and all .min.svg.br.
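Steps 1-3 are roughly just shelling out to the three tools for each extracted file; something along these lines (a sketch, assuming svgo, zopfli, and brotli are installed and on $PATH; exact CLI flags can vary between versions):

import subprocess
from pathlib import Path

svg_dir = Path("svgs")  # hypothetical directory with the extracted .svg files
originals = [p for p in svg_dir.glob("*.svg") if not p.name.endswith(".min.svg")]

for svg in originals:
    minified = svg.with_suffix(".min.svg")
    # 1. Optimize the markup itself
    subprocess.run(["svgo", str(svg), "-o", str(minified)], check=True)
    # 2. Gzip with zopfli (writes <name>.gz next to the input)
    subprocess.run(["zopfli", str(minified)], check=True)
    # 3. Brotli
    subprocess.run(["brotli", "-f", "-o", str(minified) + ".br", str(minified)], check=True)

def total(pattern):
    return sum(p.stat().st_size for p in svg_dir.glob(pattern))

print("min.svg:   ", total("*.min.svg"))
print("min.svg.gz:", total("*.min.svg.gz"))
print("min.svg.br:", total("*.min.svg.br"))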

Results

Technique               Number    Total Bytes
Inline                  27        22,142 (21.6KB)
Optimized with svgo     15        14,566 (14.2KB)
Zopfli compressed       15        6,236 (6.1KB)
Brotli compressed       15        5,789 (5.7KB)

Conclusions and Caveats

For every single MDN page, we stand to make the initial HTML document 22KB smaller. Every time. Most web developers I know often Google for something and end up on MDN and do so frequently enough that there's a good chance for a warm browser cache.

But! This 22KB is uncompressed. Since the HTML documents are served gzipped, at a ratio of about 1:4, the total of the inline SVGs is roughly 5.6KB over the wire. At the time of writing, the MDN home page is 58,496 bytes decompressed and 14,570 bytes gzipped. So that means we stand to potentially strip away roughly 40% of the document size!
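The back-of-the-envelope math with those numbers, as a quick check:

inline_svg = 22_142  # bytes of inline SVG, uncompressed
page = 58_496        # MDN home page, decompressed
page_gz = 14_570     # MDN home page, gzipped

print(round(100 * inline_svg / page))           # ~38% of the decompressed document
print(round(100 * (inline_svg / 4) / page_gz))  # ~38% of the gzipped document, assuming a ~1:4 gzip ratio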

Second but! There are some non-trivial differences in usage of SVG. You can't simply replace...


<a href="/en-US/docs/Learn">References &amp; Guides
  <svg class="icon icon-caret-down" xmlns="http://www.w3.org/2000/svg" width="16" height="28" viewBox="0 0 16 28">
    <path d="M16 11a.99.99 0 0 1-.297.703l-7 7C8.516 18.89 8.265 19 8 19s-.516-.109-.703-.297l-7-7A.996.996 0 0 1 0 11c0-.547.453-1 1-1h14c.547 0 1 .453 1 1z"/>
  </svg>
</a>

...with...


<a href="/en-US/docs/Learn">References &amp; Guides
  <img src="/static/icon-caret.914d0e4.min.svg">
</a>

(Compare this and this)

You can, instead, use <svg><use xlink:href="/static/icon-caret.914d0e4.min.svg"/></svg> but it comes with its own host of challenges and problems (styling and IE support), and you still need the <svg> tag to do the wrapping in the first place, which adds bytes.

It's not always worth compressing tiny static assets. And it might be worth experimenting with what the CPU cost is for a low-performance mobile device to decompress the asset versus just eating the extra network download cost of leaving it uncompressed.

HTTP/2 is great in that it allows the browser to download external assets earlier, on the same open connection, as the initial HTML document. But it's not without risks and costs that need to be carefully considered.

Optimize DOM selector lookups by pre-warming by selectors' parents

February 11, 2019
0 comments Web development, Node, Web Performance, JavaScript

tl;dr; minimalcss 0.8.2 introduces a 20% post-processing optimization by grouping many CSS selectors under their parent CSS selectors as a pre-emptive cache.

The core of minimalcss is that it downloads a DOM tree, as HTML, parses it, and parses all the associated CSS stylesheets. These might come from <link rel="stylesheet"> or <style> tags.
Once the CSS stylesheets are turned into an AST, it loops over each and every CSS selector and asks a simple question: "Does this CSS selector exist in the DOM?". The equivalent is to open your browser's Web Console and type:

>>> document.querySelectorAll('div.foo span.bar b').length > 0
false

Based on these lookups (which are done with cheerio, by the way), minimalcss reduces the CSS, as an AST, and eventually spits the AST back out as a CSS string. The only problem is: it's slow. In the case of view-source:https://semantic-ui.com/, there are 6,784 of these selectors in the CSS it uses. What to do?

First of all, there isn't a lot you can do. This is the work that needs to be done. But one thing you can do is be smart about which selectors you look at and use a "decision cache" to pre-emptively draw conclusions. So, if this is what you have to check:

  1. #example .alternate.stripe
  2. #example .theming.stripe
  3. #example .solid .column p b
  4. #example .solid .column p

As you process the first one, you extract that the parent CSS selector is #example, and if that doesn't exist in the DOM, you can efficiently draw conclusions about all subsequent selectors that start with #example .... Granted, if the parents all exist you pay the penalty of doing an extra lookup. But that's a trade-off this optimization is worth.
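In other words, the cache is keyed on the parent selector. A language-agnostic sketch of the idea in Python (exists_in_dom() is a hypothetical stand-in for the real DOM lookup):

def build_checker(exists_in_dom):
    parent_cache = {}  # parent selector -> does it exist in the DOM?

    def check(selector):
        parent = selector.split()[0]  # e.g. "#example" from "#example .solid .column p"
        if parent not in parent_cache:
            parent_cache[parent] = exists_in_dom(parent)
        if not parent_cache[parent]:
            return False  # the parent isn't in the DOM, so no descendant selector can match
        if parent == selector:
            return True
        return exists_in_dom(selector)  # the extra-lookup penalty when the parent does exist

    return check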

Check out the comments where I tested a bloated page that uses Semantic-UI before and after. Instead of doing 3,285 of these document.querySelector(selector) calls, it's now able to come to the exact same conclusion with just 1,563 lookups.

Sadly, the majority of the processing time lies in network I/O and other overheads, but this work did reduce something that used to take 6.3s (median) to 5.1s (median).

How much HTML is too much for optimal web performance

October 17, 2018
4 comments Web development, Web Performance

Right off the bat; I don't know. All I know is that it's complicated.

I have this page, which is just a blog post page. It's entirely rendered on the server, comments and all. At the time of writing, the total size of the HTML document is 119KB (30KB gzipped). If you remove all the comments, which make up the bulk of the HTML, it shrinks to 31KB (7KB gzipped). Fair enough. That's 23KB less to download. But does it matter (much)?

Downloading

First of all, I noticed this:

Waterfall
WebPagetest with iPhone 6, 4G on the same US coast as the datacenter

That's a WebPagetest run using an iPhone 6 on 4G and, lemme emphasize this, it took 126ms to download the HTML document. If you subtract "DNS Lookup" (283ms), "Initial Connection" (1013ms), and "SSL Negotiation" (733ms), it took 684ms to serve the file, download it, and parse it. Remember, this is all on 4G. Pretty fast. In conclusion, there's probably not too much HTML in that page to download. This downloadingness is a fraction of the total "web performance cost". Let's dig deeper.

Note! With WebPagetest all those numbers like DNS Lookup, Initial Connection and SSL Negotiation are wildly unpredictable between tests. Chances are, the numbers are very different the next time you run a test using the exact same input. Who knows. Deep internet plumbings beyond the control of WebPagetest.

Note! I ran it one more time with the exact same parameters and this time it was 535ms (instead of 684ms) to serve, download, and parse.

Parsing & layout

Parsing is hard to measure but here's what I found when using the Google Chrome dev tools:

Google Chrome Performance devtools
Google Chrome Performance devtools

It says it took...

  • parsed HTML - 94ms
  • recalculate style - 43ms
  • layout - 386ms

That's half a second just loading and rendering. Definitely sucks. But note, this test uses 4x CPU slowdown and 3G simulation. So perhaps it's not so bad.

Let's try again with a smaller HTML document

So I butchered up a hybrid version that has almost the same HTML, except that all but 1 of those 166'ish div.comment DOM nodes are gone. It's now 31KB (7KB gzipped) to download instead of 119KB (30KB gzipped).

Same WebPagetest parameters but now with this smaller HTML document:

WebPagetest with a much smaller HTML footprint
WebPagetest with a much smaller HTML footprint

Now it says it only took 39ms to download and 232ms (it was 684ms before) to serve the file, download it, and parse it. Interesting!

Note! I ran it one more time with the exact same parameters and this time it was 237ms (instead of 232ms) to serve, download, and parse.

Clearly it's working. The smaller the HTML document the faster it performs. No surprise. But stick around for the conclusion.

Parsing & layout with a smaller HTML document

Check this out:

Google Chrome Performance devtools (smaller HTML document)
Google Chrome Performance devtools (smaller HTML document)

It says it took...

  • parsed HTML - 91ms
  • recalculate style - 6ms
  • layout - 29ms

Mind you, all of these numbers are at the mercy of what my laptop is up to at the moment as it can affect Chrome's rendering if it has, at that moment, less (or more) access to CPU and memory caching.

Either way, it parses + layout in 126ms instead of 523ms for the larger HTML document.

Side-by-side

The best test to see how much faster the smaller HTML document variant is, is to compare them side-by-side. It looks like this:

Side-by-side
Visual comparison on WebPagetest (using 4G)

Three major takeaways from this:

  1. The smaller HTML version starts rendering half a second before the original one.
  2. The complete time favors the smaller HTML version by 2.5 seconds, but that's possibly influenced more by the ads that load than by any slow layout rendering.
  3. This is using 4G which isn't unheard of but definitely much less common than better speeds.

Here they are compared on "Desktop" which appears to give the smaller HTML version a 0.2 second advantage:

Visual comparison on WebPagetest (using "Desktop")
Visual comparison on WebPagetest (using "Desktop")

And here are the Lighthouse reports side-by-side:

Lighthouses
Side-by-side using Lighthouse

Discussion

The above concludes rather unsurprisingly that a smaller HTML footprint downloads, parses and lays out quicker.

The killer reason that page is so large, with all those comments rendered in the original HTML, is simple: SEO. Google loves comments because comments indicate that the page is thriving and a place where people go, spend time, and stick around. I've experimented with this in the past and found that if I make the HTML document smaller (or load the rest after document load) the SEO takes a big hit. Yes, Google's bot renders with JavaScript, but not always, and even if it does, I assume it's smart enough to appreciate that content that is loaded async or post-DOMContentLoaded is less important and thus not what the page is about.

Regarding SEO, we know that Google loves fast sites. Especially for mobile. But content is still king my gut tells me. Left as an exercise to the reader to take a stand on this.

Another problem with lazy loading the comments (or whatever else might be applicable to your site) is that it might cause "flicker". I put that word in quotes because sometimes flicker is literally visual flicker and sometimes it's moments of browser sluggishness. The XHR request and the subsequent post-rendering cause a bunch of work that strains the browser and might make things unpleasant when your eyes and brain are in the midst of committing to consuming the page.

Basically, there are significant real benefits to not trying to squeeze every little millisecond out by making the HTML smaller upfront. Remember that the "smaller HTML" version in this test is drastic. I butchered it from 119KB to 31KB, which might be so drastic that it's not really applicable at all. In other words, had I reduced the HTML size by just 20%, it might not even register on the performance graph but could be significant in terms of SEO keywords.

Conclusion

The time it takes to make a web page useful to a user is a sum of all sorts of things. The size of the HTML document does matter, but remember that it's just one of multiple aspects to watch out for.

In conclusion, it's complicated and depends on your needs and context. I hope you can benefit a little bit from the metrics above.

An awesome snippet to web performance test a page programmatically

October 1, 2018
0 comments Web development, JavaScript, Web Performance

I found this in an issue discussing measuring page performance with puppeteer and it's pure gold. Especially because it's so accessible and easy to use.

Here's the code:


const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://www.peterbe.com/');

  console.log('\n==== performance.getEntries() ====\n');
  console.log(
    await page.evaluate(() =>
      JSON.stringify(performance.getEntries(), null, '  ')
    )
  );

  console.log('\n==== performance.toJSON() ====\n');
  console.log(
    await page.evaluate(() => JSON.stringify(performance.toJSON(), null, '  '))
  );

  console.log('\n==== page.metrics() ====\n');
  const perf = await page.metrics();
  console.log(JSON.stringify(perf, null, '  '));

  await browser.close();
}

run();

Network waterfall Google Chrome

When you run it you get this output: https://gist.github.com/peterbe/afb09bf9277e5fa9242f8d270c687640
To run it you need to have a decently up-to-date version of puppeteer installed.

I don't claim (far from it actually!) to understand all the metrics in there, but I believe this is basically what the Network panel in the Google Chrome Dev tools is built upon. But some details and facts are easy to figure out and use in your analysis. For example, getEntries() lists all the resources that had to be downloaded, in the order they were downloaded. Also, at the end of getEntries() you get the first-paint, which is often a useful metric.

Anyway, give it a spin. Wrap this up in a platform and see if you can build something really simple and really tailored to your web project's web performance testing.