Peterbe.com

A blog and website by Peter Bengtsson

Optimize inlined SVG on developer.mozilla.org

04 April 2019 0 comments   Web Performance, Web development


tl;dr We could make the initial HTML document 40% smaller if moved from inline SVG to external, optimized, .svg static assets. But there are lots of caveats unless the SVG can be used as an image.

One of the many goals of MDN Web Docs this year is to make it faster. That makes users happier and as a side-effect, it makes Google happier. And hopefully, being faster will mean Google ranks us higher.

I'm still new to the MDN code base and there are many things we can do. One thing I noticed is that the site uses inline SVG. E.g.

<a href="/en-US/docs/Learn">References &amp; Guides
    <svg class="icon icon-caret-down" xmlns="http://www.w3.org/2000/svg" width="16" height="28" viewBox="0 0 16 28">
      <path d="M16 11a.99.99 0 0 1-.297.703l-7 7C8.516 18.89 8.265 19 8 19s-.516-.109-.703-.297l-7-7A.996.996 0 0 1 0 11c0-.547.453-1 1-1h14c.547 0 1 .453 1 1z"/>
    </svg>
</a>

The site uses HTTP/2 so the argument of reducing the number of requests is not valid. Well, with caveats. Browser support for HTTP/2 is getting really good. Definitely good enough to make it worth betting on.
It used to be that there's a trade-off for making static assets external: you can potentially avoid downloads, at all, by browser caching and the initial HTML document becomes smaller. But all, at the cost of more requests.

There are other, more subtle, differences with SVG. For example, the content of the SVG might depend on dynamic data. There might be others that I'm not aware and I'm quick to admit that I don't know much about use and stuff but this article might be full of those details.

The Experiment

I wrote a script that opens https://developer.mozilla.org/en-US/ and extracts every <svg> tag and puts them on disk. E.g. svg.icon.icon-smile_fbf6292.svg. They have a hash checksum on the content in case two <svg>s are different (but with the same classList). Then it does the following:

  1. Run svgo on each .svg to create a .min.svg.
  2. Run zopfli on each .min.svg to create a .min.svg.gz
  3. Run brotli on each .min.svg to create a .min.svg.br
  4. Sum all inline ones total size, sum the size of all .min.svg, sum the size of all .min.svg.gz, sum the size of all .min.svg.br.

Results

Technique Number Total Bytes
Inline 27 22,142 (21.6KB)
Optimized with svgo 15 14,566 (14.2KB)
Zopfli compressed 15 6,236 (6.1KB)
Brotli compressed 15 5,789 (5.7KB)

Conclusions and Caveats

For every single MDN page, we stand to make the initial HTML document 22KB smaller. Every time. Most web developers I know often Google for something and end up on MDN and do so frequently enough that there's a good chance for a warm browser cache.

But! This 22KB is uncompressed. Since the HTML documents are served gzipped, at a ratio of about 1:4, the total inline SVGs is roughly 5.6KB. At the time of writing the MDN home page is 58,496 bytes decompressed and 14,570 bytes gzipped. So that means that we stand to potentially strip away 40% of the document size!

Second but! There are some non-trivial differences in usage of SVG. You can't simply replace...

<a href="/en-US/docs/Learn">References &amp; Guides
  <svg class="icon icon-caret-down" xmlns="http://www.w3.org/2000/svg" width="16" height="28" viewBox="0 0 16 28">
    <path d="M16 11a.99.99 0 0 1-.297.703l-7 7C8.516 18.89 8.265 19 8 19s-.516-.109-.703-.297l-7-7A.996.996 0 0 1 0 11c0-.547.453-1 1-1h14c.547 0 1 .453 1 1z"/>
  </svg>
</a>

...with...

<a href="/en-US/docs/Learn">References &amp; Guides
  <img src="/static/icon-caret.914d0e4.min.svg">
</a>

(Compare this and this)

You can, instead, use <svg><use xlink:href="/static/icon-caret.914d0e4.min.svg".></svg> but it comes with its own host of challenges and problems (styling and IE support) and you still need this <svg> tag to do the wrapping in the first place which adds bytes.

It's not always worth compressing tiny static assets. And it might be worth experimenting with what the CPU cost is for a low-performance mobile device to decompress the asset versus just eating the extra network download cost of leaving it uncompressed.

HTTP/2 is great in that it allows the browser to download external assets earlier, on the same open connection, as the initial HTML document. But it's not without risks and costs that need to be carefully considered.

Best way to count distinct indexed things in PostgreSQL

21 March 2019 1 comment   PostgreSQL, Django


tl;dr; SELECT COUNT(*) FROM (SELECT DISTINCT my_not_unique_indexed_column FROM my_table) t;

I have a table that looks like this:

songsearch=# \d main_songtexthash
            Table "public.main_songtexthash"
  Column   |           Type           | Collation | Nullable |
-----------+--------------------------+-----------+----------+
 id        | integer                  |           | not null |
 text_hash | character varying(32)    |           | not null |
 created   | timestamp with time zone |           | not null |
 modified  | timestamp with time zone |           | not null |
 song_id   | integer                  |           | not null |
Indexes:
    "main_songtexthash_pkey" PRIMARY KEY, btree (id)
    "main_songtexthash_song_id_key" UNIQUE CONSTRAINT, btree (song_id)
    "main_songtexthash_text_hash_c2771f1f" btree (text_hash)
    "main_songtexthash_text_hash_c2771f1f_like" btree (text_hash varchar_pattern_ops)
Foreign-key constraints:
    ...snip...

And the data looks something like this:

songsearch=# select text_hash, song_id from main_songtexthash limit 10;
            text_hash             | song_id
----------------------------------+---------
 6f98e1945e64353bead9d6ab47a7f176 | 2565031
 0c6662363aa4a340fea5efa24c98db76 |  486091
 a25af539b183cbc338409c7acecc6828 |     212
 5aaf561b38c251e7d863aae61fe1363f | 2141077
 6a221df60f7cbb8a4e604f87c9e3aec0 |  245186
 d2a0b5b3b33cdf5e03a75cfbf4963a6f | 1453382
 95c395dd78679120269518b19187ca80 |  981402
 8ab19b32b3be2d592aa69e4417b732cd |  616848
 8ab19b32b3be2d592aa69e4417b732cd |  243393
 01568f1f57aeb7a97e2544978fc93b4c |     333
(10 rows)

If you look carefully, you'll notice that every song_id has a different text_hash except two of them.
Song IDs 616848 and 243393 both have the same text_hash of value 8ab19b32b3be2d592aa69e4417b732cd.

Also, if you imagine this table only has 10 rows, you could quickly and easily conclude that there are 10 different song_id but 9 different distinct text_hash. However, how do you do this counting if the tables are large??

The Wrong Way

songsearch=# select count(distinct text_hash) from main_songtexthash;
  count
---------
 1825983
(1 row)

And the explanation and cost analysis is:

songsearch=# explain analyze select count(distinct text_hash) from main_songtexthash;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=44942.09..44942.10 rows=1 width=8) (actual time=40029.225..40029.226 rows=1 loops=1)
   ->  Seq Scan on main_songtexthash  (cost=0.00..40233.87 rows=1883287 width=33) (actual time=0.029..193.653 rows=1879521 loops=1)
 Planning Time: 0.059 ms
 Execution Time: 40029.250 ms
(4 rows)

Oh noes! A Sec Scan! Run!

The Right Way

Better explained in this blog post but basically, cutting to the chase, here's how you count on an indexed field:

songsearch=# select count(*) from (select distinct text_hash from main_songtexthash) t;
  count
---------
 1825983
(1 row)

And the explanation and cost analysis is:

songsearch=# explain analyze select count(*) from (select distinct text_hash from main_songtexthash) t;
                                                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=193871.20..193871.21 rows=1 width=8) (actual time=4894.555..4894.556 rows=1 loops=1)
   ->  Unique  (cost=0.55..172861.54 rows=1680773 width=33) (actual time=0.075..4704.741 rows=1825983 loops=1)
         ->  Index Only Scan using main_songtexthash_text_hash_c2771f1f on main_songtexthash  (cost=0.55..168153.32 rows=1883287 width=33) (actual time=0.074..4132.822 rows=1879521 loops=1)
               Heap Fetches: 1879521
 Planning Time: 0.082 ms
 Execution Time: 4894.581 ms
(6 rows)

Same exact result but ~5s instead of ~40s. I'll take that, thank you very much.

The Django Way

As a bonus: Django is smart. Here's how they do it:

>>> SongTextHash.objects.values('text_hash').distinct().count()
1825983

And, the SQL it generates to make that count looks very familiar:

SELECT COUNT(*) FROM (SELECT DISTINCT "main_songtexthash"."text_hash" AS Col1 FROM "main_songtexthash") subquery

Conclusion

Format numbers with numberWithCommas() or Number.toLocaleString()

05 March 2019 1 comment   Javascript, Web development

https://codepen.io/peterbe/pen/xBRGoN?editors=1011#0


In a highly unscientific survey of exactly 2 French native friends, I asked them what they think about formatting large numbers the "French way" versus doing it the "English way". In particular, if the rest of content/app is English, would it be jarring if the formatting of numbers was French. Both Adrian and Mathieu said they prefer displaying the number the French way even if the app/content is French.

If you have an English browser opening https://codepen.io/peterbe/pen/xBRGoN means it's going to display the two numbers the same way. But if you have a French locale in your browser it'll look like this:

French

Number.toLocaleString() is now universally supported so that's no longer a worry.

For years I was using a function like this:

/* http://stackoverflow.com/a/2901298 */
function numberWithCommas(x) {
  var parts = x.toString().split('.');
  parts[0] = parts[0].replace(/\B(?=(\d{3})+(?!\d))/g, ',');
  return parts.join('.');
}

numberWithCommas(100000000);
// "100,000,000"

jsperf
and it works and is reasonably fast too but it's so tempting to not use and instead stand on the shoulder-of-browsers to supply this functionality instead. But consider the alternative:

(100000000).toLocaleString();
// "100,000,000"

The thing that's always worried me is; What will someone's reaction be if the texts are in one locale and the formatting of numbers (and dates, etc!) are in another locale?

All 2 of the people I asked say they don't mind the mixing but admit that it's weird but that ultimately they prefer "their" format.

If you're non-English browser, what do you prefer? If you're a web usability expert, please, too, drop a comment to share what you think.

Django ORM optimization story on selecting the least possible

22 February 2019 16 comments   PostgreSQL, Python, Django, Web development


This an optimization story that should not surprise anyone using the Django ORM. But I thought I'd share because I have numbers now! The origin of this came from a real requirement. For a given parent model, I'd like to extract the value of the name column of all its child models, and the turn all these name strings into 1 MD5 checksum string.

Variants

The first attempted looked like this:

artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist):
    names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

The SQL used to generate this is as follows:

SELECT "main_song"."id", "main_song"."artist_id", "main_song"."name", 
"main_song"."text", "main_song"."language", "main_song"."key_phrases", 
"main_song"."popularity", "main_song"."text_length", "main_song"."metadata", 
"main_song"."created", "main_song"."modified", 
"main_song"."has_lastfm_listeners", "main_song"."has_spotify_popularity" 
FROM "main_song" WHERE "main_song"."artist_id" = 22729;

Clearly, I don't need anything but just the name column, version 2:

artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist).only("name"):
    names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

Now, the SQL used is:

SELECT "main_song"."id", "main_song"."name" 
FROM "main_song" WHERE "main_song"."artist_id" = 22729;

But still, since I don't really need instances of model class Song I can use the .values() method which gives back a list of dictionaries. This is version 3:

names = []
for song in Song.objects.filter(artist=a).values("name"):
    names.append(song["name"])
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

This time Django figures it doesn't even need the primary key value so it looks like this:

SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;

Last but not least; there is an even faster one. values_list(). This time it doesn't even bother to map the column name to the value in a dictionary. And since I only need 1 column's value, I can set flat=True. Version 4 looks like this:

names = []
for name in Song.objects.filter(artist=a).values_list("name", flat=True):
    names.append(name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()

Same SQL gets used this time as in version 3.

The benchmark

Hopefully this little benchmark script speaks for itself:

from songsearch.main.models import *

import hashlib


def f1(a):
    names = []
    for song in Song.objects.filter(artist=a):
        names.append(song.name)
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


def f2(a):
    names = []
    for song in Song.objects.filter(artist=a).only("name"):
        names.append(song.name)
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


def f3(a):
    names = []
    for song in Song.objects.filter(artist=a).values("name"):
        names.append(song["name"])
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


def f4(a):
    names = []
    for name in Song.objects.filter(artist=a).values_list("name", flat=True):
        names.append(name)
    return hashlib.md5("".join(names).encode("utf-8")).hexdigest()


artist = Artist.objects.get(name="Bad Religion")
print(Song.objects.filter(artist=artist).count())

print(f1(artist) == f2(artist))
print(f2(artist) == f3(artist))
print(f3(artist) == f4(artist))

# Reporting
import time
import random
import statistics

functions = f1, f2, f3, f4
times = {f.__name__: [] for f in functions}

for i in range(500):
    func = random.choice(functions)
    t0 = time.time()
    func(artist)
    t1 = time.time()
    times[func.__name__].append((t1 - t0) * 1000)

for name in sorted(times):
    numbers = times[name]
    print("FUNCTION:", name, "Used", len(numbers), "times")
    print("\tBEST", min(numbers))
    print("\tMEDIAN", statistics.median(numbers))
    print("\tMEAN  ", statistics.mean(numbers))
    print("\tSTDEV ", statistics.stdev(numbers))

I ran this on my PostgreSQL 11.1 on my MacBook Pro with Django 2.1.7. So the database is on localhost.

The results

276
True
True
True
FUNCTION: f1 Used 135 times
    BEST 6.309986114501953
    MEDIAN 7.531881332397461
    MEAN   7.834429211086697
    STDEV  2.03779968066591
FUNCTION: f2 Used 135 times
    BEST 3.039121627807617
    MEDIAN 3.7298202514648438
    MEAN   4.012803678159361
    STDEV  1.8498943539073027
FUNCTION: f3 Used 110 times
    BEST 0.9920597076416016
    MEDIAN 1.4405250549316406
    MEAN   1.5053835782137783
    STDEV  0.3523240470133114
FUNCTION: f4 Used 120 times
    BEST 0.9369850158691406
    MEDIAN 1.3251304626464844
    MEAN   1.4017681280771892
    STDEV  0.3391019435930447

Bar chart

Discussion

I guess the hashlib.md5("".join(names).encode("utf-8")).hexdigest() stuff is a bit "off-topic" but I checked and it's roughly 300 times faster than building up the names list.

It's clearly better to ask less of Python and PostgreSQL to get a better total time. No surprise there. What was interesting was the proportion of these differences. Memorize that and you'll be better equipped if it's worth the hassle of not using the Django ORM in the most basic form.

Also, do take note that this is only relevant in when dealing with many records. The slowest variant (f1) takes, on average, 7 milliseconds.

Summarizing the difference with percentages compared to the fastest variant:

UPDATE Feb 25 2019

James suggested, although a bit "missing the point", that it could be even faster if all the aggregation is pushed into the PostgreSQL server and then the only thing that needs to transfer from PostgreSQL to Python is the final result.

By the way, name column in this particular benchmark, when concatenated into one big string, is ~4KB. So, with variant f5 it only needs to transfer 32 bytes which will/would make a bigger difference if the network latency is higher.

Here's the whole script: https://gist.github.com/peterbe/b2b7ed95d422ab25a65639cb8412e75e

And the results:

276
True
True
True
False
False
FUNCTION: f1 Used 92 times
    BEST 5.928993225097656
    MEDIAN 7.311463356018066
    MEAN   7.594626882801885
    STDEV  2.2027017044658423
FUNCTION: f2 Used 75 times
    BEST 2.878904342651367
    MEDIAN 3.3979415893554688
    MEAN   3.4774907430013022
    STDEV  0.5120246550765524
FUNCTION: f3 Used 88 times
    BEST 0.9310245513916016
    MEDIAN 1.1944770812988281
    MEAN   1.3105544176968662
    STDEV  0.35922655625999383
FUNCTION: f4 Used 71 times
    BEST 0.7879734039306641
    MEDIAN 1.1661052703857422
    MEAN   1.2262606284987758
    STDEV  0.3561764250427344
FUNCTION: f5 Used 90 times
    BEST 0.7929801940917969
    MEDIAN 1.0334253311157227
    MEAN   1.1836051940917969
    STDEV  0.4001442703048186
FUNCTION: f6 Used 84 times
    BEST 0.80108642578125
    MEDIAN 1.1119842529296875
    MEAN   1.2281338373819988
    STDEV  0.37146893005516973

Result: f5 is takes 0.793ms and (the previous "winner") f4 takes 0.788ms.

I'm not entirely sure why f5 isn't faster but I suspect it's because the dataset is too small for it all to matter.

Compare:

songsearch=# explain analyze SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using main_song_ca949605 on main_song  (cost=0.43..229.33 rows=56 width=16) (actual time=0.014..0.208 rows=276 loops=1)
   Index Cond: (artist_id = 22729)
 Planning Time: 0.113 ms
 Execution Time: 0.242 ms
(4 rows)

with...

songsearch=# explain analyze SELECT md5(STRING_AGG("main_song"."name", '')) AS "names_hash" FROM "main_song" WHERE "main_song"."artist_id" = 22729;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=229.47..229.48 rows=1 width=32) (actual time=0.278..0.278 rows=1 loops=1)
   ->  Index Scan using main_song_ca949605 on main_song  (cost=0.43..229.33 rows=56 width=16) (actual time=0.019..0.204 rows=276 loops=1)
         Index Cond: (artist_id = 22729)
 Planning Time: 0.115 ms
 Execution Time: 0.315 ms
(5 rows)

I ran these two SQL statements about 100 times each and recorded their best possible execution times:

1) The plain SELECT - 0.99ms
2) The STRING_AGG - 1.06ms

So that accounts from ~0.1ms difference only! Which kinda matches the results seen above. All in all, I think the dataset is too small to demonstrate this technique. But, considering the chance that the complexity might not be linear with the performance benefit, it's still interesting.

Even though this tangent is a big off-topic, it is often a great idea to push as much work into the database as you can if applicable. Especially if it means you can transfer a lot less data eventually.

Experimenting with Nginx worker_processes

14 February 2019 0 comments   Linux, MacOSX, Nginx, Web development


I have Nginx 1.15.8 installed with Homebrew on my macOS. By default the /usr/local/etc/nginx/nginx.conf it set to...:

worker_processes  1;

But, from the documentation, it says:

"The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it)." (bold emphasis mine)

What is the ideal number for me? The performance of Nginx on my laptop doesn't really matter. But for my side-projects it's important to have a fast Nginx since it serves static HTML and lots of static assets. However, on my personal servers I have a bunch of other resource hungry stuff going on that I know is more likely to need the resources, like Elasticsearch and uwsgi.

To figure this out, I wrote a benchmark program that requested a small index.html about 10,000 times across 10 concurrent clients with hey.

hey -n 10000 -c 10 http://peterbecom.local/plog/variable_cache_control/awspa

I ran this 10 times between changing the worker_processes in the nginx.conf file. Here's the output:

1 WORKER PROCESSES
BEST  : 13,607.24 reqs/s

2 WORKER PROCESSES
BEST  : 17,422.76 reqs/s

3 WORKER PROCESSES
BEST  : 18,886.60 reqs/s

4 WORKER PROCESSES
BEST  : 19,417.35 reqs/s

5 WORKER PROCESSES
BEST  : 19,094.18 reqs/s

6 WORKER PROCESSES
BEST  : 19,855.32 reqs/s

7 WORKER PROCESSES
BEST  : 19,824.86 reqs/s

8 WORKER PROCESSES
BEST  : 20,118.25 reqs/s

Or, as a graph:

Graph

Now note, this is done here on my MacBook Pro. Not on my Ubuntu DigitalOcean servers. For now, I just want to get a feeling for how these numbers correlate.

Conclusion

The benchmark isn't good enough. The numbers are pretty stable but I'm doing this on my laptop with multiple browsers idling, Slack, and Spotify running. Clearly, the throughput goes up a bit when you allocate more workers but if anything can be learned from this, start with going beyond 1 for a quick fix and from there start poking and more exhaustive benchmarks. And don't forget, if you have time to go deeper on this, to look at the combination of worker_connections and worker_processes.

create-react-app, SCSS, and Bulmaswatch

12 February 2019 0 comments   Javascript, ReactJS, Web development

https://jenil.github.io/bulmaswatch/


1. Create a create-react-app first:

create-react-app myapp

2. Enter it and install node-sass and bulmaswatch

cd myapp
yarn add bulma bulmaswatch node-sass

3. Edit the src/index.js to import index.scss instead:

-import "./index.css";
+import "./index.scss";

4. "Rename" the index.css file:

git rm src/index.css 
touch src/index.scss
git add src/index.scss

5. Now edit the src/index.scss to look like this:

@import "node_modules/bulmaswatch/darkly/bulmaswatch";

This assumes your favorite theme was the darkly one. You can obviously change that later.

6. Run the app:

BROWSER=none yarn start

7. Open the browser at http://localhost:3000

CRA start

That's it! However, the create-react-app default look doesn't expose any of the cool stuff that Bulma can style. So let's rewrite our src/App.js by copying the minimal starter HTML from the Bulma documentation. So make the src/App.js component look something like this:

class App extends Component {
  render() {
    return (
      <section className="section">
        <div className="container">
          <h1 className="title">Hello World</h1>
          <p className="subtitle">
            My first website with <strong>Bulma</strong>!
          </p>
        </div>
      </section>
    );
  }
}

Now it'll look like this:

Bulma starter template

Yes, it's not much but it's a great start. Over to you to take this to infinity and beyond!

Not So Secret Sauce

In the rushed instructions above the choice of theme was darkly. But what you need to do next is go to https://jenil.github.io/bulmaswatch/, click around and eventually pick the one you like. Suppose you like spacelab, then you just change that @import ... line to be:

@import "node_modules/bulmaswatch/spacelab/bulmaswatch";