Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under the category: Web Performance. Clear filter

Update to speed comparison for Redis vs PostgreSQL storing blobs of JSON

30 September 2019 2 comments   PostgreSQL, Django, Python, Web Performance, Nginx, Redis


Last week, I blogged about "How much faster is Redis at storing a blob of JSON compared to PostgreSQL?". Judging from a lot of comments, people misinterpreted this. (By the way, Redis is persistent). It's no surprise that Redis is faster.

However, it's a fact that I have do have a lot of blobs stored and need to present them via the web API as fast as possible. It's rare that I want to do relational or batch operations on the data. But Redis isn't a slam dunk for simple retrieval because I don't know if I trust its integrity with the 3GB worth of data that I both don't want to lose and don't want to load all into RAM.

But is it entirely wrong to look at WHICH database to get the best speed?

Reviewing this corner of Song Search helped me rethink this. PostgreSQL is, in my view, a better database for storing stuff. Redis is faster for individual lookups. But you know what's even faster? Nginx

Nginx??

The way the application works is that a React web app is requesting the Amazon product data for the sake of presenting an appropriate affiliate link. This is done by the browser essentially doing:

const response = await fetch('https://songsear.ch/api/song/5246889/amazon');

Internally, in the app, what it does is that it looks this up, by ID, on the AmazonAffiliateLookup ORM model. Suppose it wasn't there in the PostgreSQL, it uses the Amazon Affiliate Product Details API, to look it up and when the results come in it stores a copy of this in PostgreSQL so we can re-use this URL without hitting rate limits on the Product Details API. Lastly, in a piece of Django view code, it carefully scrubs and repackages this result so that only the fields used by the React rendering code is shipped between the server and the browser. That "scrubbed" piece of data is actually much smaller. Partly because it limits the results to the first/best match and it deletes a bunch of things that are never needed such as ProductTypeName, Studio, TrackSequence etc. The proportion is roughly 23x. I.e. of the 3GB of JSON blobs stored in PostgreSQL only 130MB is ever transported from the server to the users.

Again, Nginx?

Nginx has a built in reverse HTTP proxy cache which is easy to set up but a bit hard to do purges on. The biggest flaw, in my view, is that it's hard to get a handle of how much RAM this it's eating up. Well, if the total possible amount of data within the server is 130MB, then that is something I'm perfectly comfortable to let Nginx handle cache in RAM.

Good HTTP performance benchmarking is hard to do but here's a teaser from my local laptop version of Nginx:

▶ hey -n 10000 -c 10 https://songsearch.local/api/song/1810960/affiliate/amazon-itunes

Summary:
  Total:    0.9882 secs
  Slowest:  0.0279 secs
  Fastest:  0.0001 secs
  Average:  0.0010 secs
  Requests/sec: 10119.8265


Response time histogram:
  0.000 [1] |
  0.003 [9752]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.006 [108]   |
  0.008 [70]    |
  0.011 [32]    |
  0.014 [8] |
  0.017 [12]    |
  0.020 [11]    |
  0.022 [1] |
  0.025 [4] |
  0.028 [1] |


Latency distribution:
  10% in 0.0003 secs
  25% in 0.0006 secs
  50% in 0.0008 secs
  75% in 0.0010 secs
  90% in 0.0013 secs
  95% in 0.0016 secs
  99% in 0.0068 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0001 secs, 0.0279 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0026 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0011 secs
  resp wait:    0.0008 secs, 0.0001 secs, 0.0206 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0013 secs

Status code distribution:
  [200] 10000 responses

10,000 requests across 10 clients at rougly 10,000 requests per second. That includes doing all the HTTP parsing, WSGI stuff, forming of a SQL or Redis query, the deserialization, the Django JSON HTTP response serialization etc. The cache TTL is controlled by simply setting a Cache-Control HTTP header with something like max-age=86400.

Now, repeated fetches for this are cached at the Nginx level and it means it doesn't even matter how slow/fast the database is. As long as it's not taking seconds, with a long Cache-Control, Nginx can hold on to this in RAM for days or until the whole server is restarted (which is rare).

Conclusion

If you the total amount of data that can and will be cached is controlled, putting it in a HTTP reverse proxy cache is probably order of magnitude faster than messing with chosing which database to use.

A React vs. Preact case study for a widget

24 July 2019 0 comments   Javascript, Web Performance, ReactJS, Web development


tl;dr; The previous (React) total JavaScript bundle size was: 36.2K Brotli compressed. The new (Preact) JavaScript bundle size was: 5.9K. I.e. 6 times smaller. Also, it appears to load faster in WebPageTest.

I have this page that is a Django server-side rendered page that has on it a form that looks something like this:

<div id="root">  
  <form action="https://songsear.ch/q/">  
    <input type="search" name="term" placeholder="Type your search here..." />
    <button>Search</button>
  </form>  
</div>

It's a simple search form. But, to make it a bit better for users, I wrote a React widget that renders, into this document.querySelector('#root'), a near-identical <form> but with autocomplete functionality that displays suggestions as you type.

Anyway, I built that React bundle using create-react-app. I use the yarn run build command that generates...

Then, in Python, a piece of post-processing code copies the files from the build/static/ directory and inserts it into the rendered HTML file. The CSS gets injected as an inline <style> tag.

It's a simple little widget. No need for any service-workers or react-router or any global state stuff. (Actually, it only has 1 single runtime dependency outside the framework) I thought, how about moving this to Preact?

In comes preact-cli

The app used a couple of React hooks but they were easy to transform into class components. Now I just needed to run:

npx preact create --yarn widget name-of-my-preact-project
cd name-of-my-preact-project
mkdir src
cp ../name-of-React-project/src/App.js src/
code src/App.js

Then, I slowly moved over the src/App.js from the create-react-app project and slowly by slowly I did the various little things that you need to do. For example, to learn to build with preact build --no-prerender --no-service-worker and how I can override the default template.

Long story short, the new built bundles look like this:

(The polyfills.9168d.js gets injected as a script tag if window.fetch is falsy)

Unfortunately, when I did the move from React to Preact I did make some small fixes. Doing the "migration" I noticed a block of code that was never used so that gives the build bundle from Preact a slight advantage. But I think it's nominal.

In conclusion: The previous total JavaScript bundle size was: 36.2K (Brotli compressed). The new JavaScript bundle size was: 5.9K (Brotli compressed). I.e. 6 times smaller. But if you worry about the total amount of JavaScript to parse and execute, the size difference uncompressed was 129K vs. 18K. I.e. 7 times smaller. I can only speculate but I do suspect you need less CPU/battery to process 18K instead of 129K if CPU/batter matters more (or closer to) than network I/O.

WebPageTest - Visual Comparison - Mobile Slow 3G

Rendering speed difference

Rendering speed is so darn hard to measure on the web because the app is so small. Plus, there's so much else going on that matters.

However, using WebPageTest I can do a visual comparison with the "Mobile - Slow 3G" preset. It'll be a somewhat decent measurement of the total time of downloading, parsing and executing. Thing is, the server-side rended HTML form has a button. But the React/Preact widget that takes over the DOM hides that submit button. So, using the screenshots that WebPageTest provides, I can deduce that the Preact widget completes 0.8 seconds faster than the React widget. (I.e. instead of 4.4s it became 3.9s)

Truth be told, I'm not sure how predictable or reproducible is. I ran that WebPageTest visual comparison more than once and the results can vary significantly. I'm not even sure which run I'm referring to here (in the screenshot) but the React widget version was never faster.

Conclusion and thoughts

Unsurprisingly, Preact is smaller because you simply get less from that framework. E.g. synthetic events. I was lucky. My app uses onChange which I could easily "migrate" to onInput and I managed to get it to work pretty easily. I'm glad the widget app was so small and that I don't depend on any React specific third-party dependencies.

But! In WebPageTest Visual Comparison it was on "Mobile - Slow 3G" which only represents a small portion of the traffic. Mobile is a huge portion of the traffic but "Slow 3G" is not. When you do a Desktop comparison the difference is roughtly 0.1s.

Also, in total, that page is made up of 3 major elements

  1. The server-side rendered HTML
  2. The progressive JavaScript widget (what this blog post is about)
  3. A piece of JavaScript initiated banner ad

That HTML controls the "First Meaningful Paint" which takes 3 seconds. And the whole shebang, including the banner ad, takes a total of about 9s. So, all this work of rewriting a React app to Preact saved me 0.8s out of the total of 9s.

Web performance is hard and complicated. Every little counts, but keep your eye on the big ticket items assuming there's something you can do about them.

At the time of writing, preact-cli uses Preact 8.2 and I'm eager to see how Preact X feels. Apparently, since April 2019, it's in beta. Looking forward to giving it a try!

WebSockets vs. XHR 2019

05 May 2019 0 comments   Javascript, Web Performance, Web development

https://sockshootout.app/


Back in 2012, I did an experiment to compare if and/or how much faster WebSockets are compared to AJAX (aka. XHR). It would be a "protocol benchmark" to see which way was faster to schlep data back and forth between a server and a browser in total. The conclusion of that experiment was that WebSockets were faster but when you take latency into account, the difference was minimal. Considering the added "complexities" of WebSockets (keeping connections, results don't come where the request was made, etc.) it's not worth it.

But, 7 years later browsers are very different. Almost all browsers that support JavaScript also support WebSockets. HTTP/2 might make things better too. And perhaps the WebSocket protocol is just better implemented in the browsers. Who knows? An experiment knows.

So I made a new experiment with similar tech. The gist of the code is best explained with some code:

// Inside App.js

loopXHR = async count => {
  const res = await fetch(`/xhr?count=${count}`);
  const data = await res.json();
  const nextCount = data.count;
  if (nextCount) {
    this.loopXHR(nextCount);
  } else {
    this.endXHR();
  }
};

Basically, pick a big number (e.g. 100) and send that integer to the server which does this:

# Inside app.py 

# from the the GET querystring "?count=123"
count = self.get_argument("count")   
data = {"count": int(count) - 1}
self.write(json.dumps(data))

So the browser keeps sending the number back to the server that decrements it and when the server returns 0 the loop ends and you look how long the whole thing took.

Try It

The code is here: https://github.com/peterbe/sockshootout2019

And the demo app is here: https://sockshootout.app (Just press "Start!", wait and press it 2 or 3 more times)

Location, location, location

What matters is the geographical distance between you and the server. The server used in this experiment is in New York, USA.

What you'll find is that the closer you are to the server (lower latency) the better WebSocket performs. Here's what mine looks like:

My result between South Carolina, USA and New York, USA
My result between South Carolina, USA and New York, USA

Now, when I run the whole experiment all on my laptop the results look very different:

Running all locally
Running all locally

I don't have a screenshot for it but a friend of mine ran this from his location in Perth, Australia. There was no difference. If any difference it was "noise".

Same Conclusion?

Yes, latency matters most. The technique, for the benefit of performance, doesn't matter much.

No matter how fancy you're trying to be, what matters is the path the bytes have to travel. Or rather, the distance the bytes have to travel. If you're far away a large majority of the total time is sending and receiving the data. Not the time it takes the browser (or the server) to process it.

However, suppose you do have all your potential clients physically near the server, it might be beneficial to use WebSockets.

Thoughts and Conclusions

My original thought was to use WebSockets instead of XHR for an autocomplete widget. At almost every keystroke, you send it to the server and as search results come in, you update the search result display. Things like that need to be fast and "snappy". But that's not where WebSockets shine. They shine in their ability to actively await results without having a loop that periodically pulls. There's nothing wrong with WebSocket and it has its brilliant use cases.

In summary, don't bother just to get a single-digit percentage performance increase if the complexity of the code and infrastructure is non-trivial. Keep building cool stuff with WebSockets but if you expect one result per action, XHR is good enough.

Bonus

The experiment app does collect everyone's results (just the timings and IP) and I hope to find the time to process this and build graph a correlating the geographical distance compared to the difference between the two techniques. Watch this space!

By the way, if you do plan on writing some WebSocket implementation code I highly recommend Sockette. It's solid and easy to use.

KeyCDN vs AWS CloudFront

29 April 2019 1 comment   Web Performance , Web development


Before I commit to KeyCDN for my little blog I wanted to check if CloudFront is better. Why? Because I already have an AWS account set up, familiar with boto3, it's what we use for work, and it's AWS so it's usually pretty good stuff. As an attractive bonus, CloudFront has 44 edge locations (KeyCDN 34).

Price-wise it's hard to compare because the AWS CloudFront pricing page is hard to read because the costs are broken up by regions. KeyCDN themselves claim KeyCDN is about 2x cheaper than CloudFront. This seems to be true if you look at cdnoverview.com's comparison too. CloudFront seems to have more extra specific costs. For example, with AWS CloudFront you have to pay to invalidate the cache whereas that's free for KeyCDN.

I also ran a little global latency test comparing the two using Hyperping using 7 global regions. The results are as follows:

KeyCDN on Hyperping.io
KeyCDN on Hyperping.io

CloudFront on Hyperping.io
CloudFront on Hyperping.io

RegionKeyCDNCloudFrontWinner
London27 ms36 msKeyCDN
San Francisco29 ms46 msKeyCDN
Frankfurt47 ms1001 msKeyCDN
New York City52 ms68 msKeyCDN
São Paulo105 ms162 msKeyCDN
Sydney162 ms131 msCloudFront
Mumbai254 ms76 msCloudFront

Take these with a pinch of salt because it's only an average for the last 1 hour. Let's agree that they both faster than your regular Nginx server in a single location.

By the way, both KeyCDN and CloudFront support Brotli compression. For CloudFront, this was added in July 2018 and if your origin can serve according to Content-Encoding you simply tell CloudFront to cache based on that header.

Although I've never tried it CloudFront does have an API for doing cache invalidation (aka. purging) and you can use boto3 to do it but I've never tried it. For KeyCDN here's how you do cache invalidation with the python-keycdn-api:

api = keycdn.Api(settings.KEYCDN_API_KEY)
call = "zones/purgeurl/{}.json".format(settings.KEYCDN_ZONE_ID)
all_urls = [
    'origin.example.com/static/foo.css',
    'origin.example.com/static/foo.cssbr',
    'origin.example.com/images/foo.jpg',
]
params = {"urls": all_urls}
response = api.delete(call, params)
print(response)

I'm not in love with that API but I know it issues the invalidation fast whereas with CloudFront I heard it takes a while to take effect.

I think I might put my whole site behind a CDN

23 April 2019 0 comments   Web Performance, Nginx, Web development


tl;dr; I'm going to put this blog behind KeyCDN and I expect a 2-4x performance boost (on Time To First Byte).

Right now, requests to my blog go straight to an Nginx server in DigitalOcean in NYC, USA. The Nginx server, 99% of the time, serves the blog posts (and static assets) as index.html files straight from disk. If the request is GET /plog/some-slug it will search for a file called /path/to/cached/files/plog/some-slug/index.html (or index.html.br or index.html.gz depending on the user agent's Accept-Encoding header). Only if the file doesn't exist on disk, it goes through to Django (via uWSGI built into Nginx). All of it is done with HTTP/2 and uses LetsEncrypt for SSL.

This has been working great but it's time to step it up. It's time to put the whole site behind a CDN. And I think I'm going to use KeyCDN for it.

In the past, it used to be best-practice that you serve your HTML document from your smart server (e.g. Django) and then, for the static assets, you put in a CDN. Like this:

<html>
  <link rel="stylesheet" href="https://myaccount123.cloudakamaifastlyflare.com/static/main.d910ef9a33.css">

...
<body>
  <img src="https://myaccount123.cloudakamaifastlyflare.com/images/hero.jpg">

...

But with HTTP/2, this becomes an anti-pattern for web performance because your client has already made an expensive HTTP/2 connection (and SSL negotiation) to https://yourcooldomain.com and now it's cheap to just download the rest. I used to do it like that too and I don't regret it. As a matter of fact, on https://songsear.ch is straight to Nginx but all its images are (lazy) loaded via songsearch-2916.kxcdn.com. But I think, when time allows, I'll put all of Song Search behind a CDN too.

Basically, it's time to put the whole site behind a CDN. With smart purging techniques and smarter CDNs respecting your dynamic content cache control headers, it's time to share the load. ...all over the world.

CDN Choices

There are many sites that want to compare CDNs. But many are affiliated or even made by one of them. So it's hard to get comparisons. For example, KeyCDN demonstrates they're the cheapest by comparing themselves with 5 others that they picked. (But mind you, that seems reasonably backed up by this comparison on cdn.reviews).

CDNPerf does a decent job with cool graphs and stuff. Incidentally, they rank my current favorite (KeyCDN) as the slowest compared to the well known giants that I compared it to.

CDNPerf graph

But mind you, the perf difference between KeyCDN and the winner (topmost in the graph as of today) is 36ms vs 47ms which are both fantastic numbers.

CDNPerf list

It's hard to compare CDNs because they're all pretty fast, and actually, they're all reasonably cheap. What really matters is the features and that's a lot harder to compare. CloudFlare often comes up as a CDN provider with stellar features that impress me. I've never actually used them but at least they mention "Fast cache purge" and "API programmability" are their key features. But they also don't mention Brotli caching which I know is a feature KeyCDN supports.

KeyCDN has been great to me in the past when I've used it to CDN host static assets. I'm familiar with their interface and they recently launched an API to do things like purge-by-tag and purge-by-URL. They're cheap, which matters because in this context it's all side-project stuff I want to put behind a CDN. They have a Python library which, although very rough around the edges, it works. And also very important; I've communicated very successfully with them through their support and they've been responsive and helpful. So I'll go with KeyCDN.

The Opportunity

Before I move my domain www.peterbe.com to become a CNAME for one of their CDN domains, I wanted to experiment a little and see how it works and what performance numbers I get for comparison. So I set up beta.peterbe.com and did some Django and Nginx wiring so it would work the same but with the difference that it goes through a CDN for everything.

Then I picked a random page and set up a Hyperping monitor from all of its available regions and let it brew for a while. Unfortunately, Hyperping doesn't let you compare two monitors side-by-side so you're going to have to use your own eyes to compare the graphs:

www means no CDN, just the origin Nginx
NOT behind a CDN (server is New York, USA)

beta means with a CDN in front
Behind a CDN

The "total Response Time" in Hyperping doesn't really make sense. They're an average across all regions it pings from. If you live in, for example, Germany; the only response time that matters to you is 1,215 ms versus 40 ms. Equally, if you live somewhere in New York, the only response time that matters to you is 20 ms versus 64 ms.

I actually ran another benchmark. I used Python like this:

t0 = time.time()
r = requests.get('https://www.peterbe.com/plog/some-slug')
t1 = time.time()
print("Took", t1 - t0)

I did this from South Carolina which means my nearest KeyCDN edge location could be Atlanta, Miami, or New York. Either way, I'm reasonably near New York (compared to the rest of the world) so it'd be a fair performance comparison for all US east coast traffic. (Insert disclaimer here). It downloads the most recent blog posts, in repeated cycles, which gives the CDN a solid chance to warm up and then it compares the median of the last 100 downloads. The output of this is as follows:

beta
    COUNT               1854 (but only using the last 100)
    HIT RATIO           100.0%
    AVERAGE (all)       63.12ms
    MEDIAN (all)        61.89ms

www
    COUNT               1856 (but only using the last 100)
    HIT RATIO           100.0%
    AVERAGE             136.22ms
    MEDIAN              135.61ms

("HIT RATIO" for the non-CDN URL means it was served entirely without Djando server rendering)

What it means is that the median, with a CDN is: 62ms and 135.6ms without. That's a 2x boost.

The crawler stats script is available here: github.com/peterbe/peterbecom-cdn-crawl and I would be thrilled if you can clone it and run it and report what numbers you get and where you're running it from.

Notes and Conclusion

Mind you, 62ms vs. 136ms might sound like a silly difference if Webpagetest says it takes 700ms until the page is interactive (on an LTE connection). And this is a tiny super-optimized page. But never forget A) we can't all live in the US east-coast area and B) if the HTML can download marginally faster it allows the browser to parse it sooner and start downloading all the other stuff much sooner. It'll make a big difference! I'm sure you've all seen graphs like this:

Cold-cache MDN page on 4G
Imagine if all those static asset downloads could have started a whole second "to the left"

Of course a CDN is faster. It's no news. But it's also a hassle and it costs money. It's 2019 and most good CDNs now support Brotli, fast purge-by-url, and HTTP/2. It's time to make the switch! It's not like cache-invalidation is hard.

UPDATE April 23 2019 (same day)

KeyCDN has a neat looking tool that is similar to Hyperping but more of a one-off kinda deal. It's called Performance Test and I wouldn't be surprised it's biased as heck because they probably run these pings from the same location'ish as where they have the edge locations. Anyway, the results are nevertheless juicy. Note the last, TTFB column numbers.

Performance Test without CDN
Performance Test without CDN

Performance Test with CDN
Performance Test with CDN

KeyCDN vs. DigitalOcean Nginx

12 April 2019 0 comments   Web Performance, Nginx, Web development


tl;dr; The global average response time of serving an image from my NYC DigitalOcean server compared to a CDN is almost 10x.

KeyCDN is a CDN service that I use for side-projects. It's great. It has about ~35 edge locations. I don't know much about how their web servers work but I can't imagine it's much different from the origin server. In principle.

The origin server is my DigitalOcean (6 vCPU, 16 GB RAM, Ubuntu 14) droplet. It's running an up-to-date CloudFlare build of Nginx and the static images are served straight from (SSD) disk with a 4 weeks TTL (max-age=2419200,public,immutable). The SSL is done with LetsEncrypt and I'm somewhat confident the Nginx is decently configured and uses HTTP/2.

So the CDN, on songsearch-2916.kxcdn.com, is basically configured to front any requests to songsear.ch. If the origin has cache-control headers, KeyCDN knows it can hold on to it for a while, but it's not a guarantee that it will for the full time specified in the cache-control. Either way; how does it compare?

The Experiment

I picked a random static asset URL. It's a 32 KB JPEG file. Its origin URL and its CDN URL are:

  1. https://songsear.ch/static/albums/2017/05/24/08/170630_300x300.jpg
  2. https://songsearch-2916.kxcdn.com/static/albums/2017/05/24/08/170630_300x300.jpg

Next, I set up a Hyperping monitor on both URLs as GET requests. For the regions (regions from where Hyperping will do pings from), I picked the following:

  1. San Francisco, USA
  2. New York, USA
  3. London, United Kindom
  4. Frankfurt, Germany
  5. Mumbai, India
  6. São Paulo, Brazil
  7. Sydney, Australia

(I wish I had selected all 12 possible regions when I started but now it's too late for lazy me)

Then, I let Hyperping GET these URLs for a while and behold, here are the numbers:

The Results

Average response time:

That's a 10x difference!

Mind you, the "average response time" is across all regions. It doesn't reflect what people get. If 90% of your visitors are from Australia, the average response times would, of course, be very different. But as an example, the origin server is in New York and there, the average response time is 26 ms vs. 105 ms which is a 5x difference.

Here are some screenshots from Hyperping:

KeyCDN results
KeyCDN

Origin server
Origin server

Conclusion

KeyCDN's server is clearly fast and worth doing. It's unsurprising that it performs better far away from New York but it's surprising how much faster it is at serving than the origin when pinged from New York (5x difference).

The site is still NOT fronted by a CDN because, apart from the images, almost all content is un-cacheable. However, I need to do more research and experimentation with putting everything behind a CDN and being meticulous with setting no-cache headers on dynamic stuff and using async tools to invalidate CDN caches when appropriate.