Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under the category: Web development. Clear filter

How MDN's site-search works

26 February 2021 3 comments   Web development, Django, Python, MDN


tl;dr; Periodically, the whole of MDN is built, by our Node code, in a GitHub Action. A Python script bulk-publishes this to Elasticsearch. Our Django server queries the same Elasticsearch via /api/v1/search. The site-search page is a static single-page app that sends XHR requests to the /api/v1/search endpoint. Search results' sort-order is determined by match and "popularity".

Jamstack'ing

The challenge with "Jamstack" websites is with data that is too vast and dynamic that it doesn't make sense to build statically. Search is one of those. For the record, as of Feb 2021, MDN consists of 11,619 documents (aka. articles) in English. Roughly another 40,000 translated documents. In English alone, there are 5.3 million words. So to build a good search experience we need to, as a static site build side-effect, index all of this in a full-text search database. And Elasticsearch is one such database and it's good. In particular, Elasticsearch is something MDN is already quite familiar with because it's what was used from within the Django app when MDN was a wiki.

Note: MDN gets about 20k site-searches per day from within the site.

Build

Diagram

When we build the whole site, it's a script that basically loops over all the raw content, applies macros and fixes, dumps one index.html (via React server-side rendering) and one index.json. The index.json contains all the fully rendered text (as HTML!) in blocks of "prose". It looks something like this:

{
  "doc": {
    "title": "DOCUMENT TITLE",
    "summary": "DOCUMENT SUMMARY",
    "body": [
      {
        "type": "prose", 
        "value": {
          "id": "introduction", 
          "title": "INTRODUCTION",
          "content": "<p>FIRST BLOCK OF TEXTS</p>"
       }
     },
     ...
   ],
   "popularity": 0.12345,
   ...
}

You can see one here: /en-US/docs/Web/index.json

Indexing

Next, after all the index.json files have been produced, a Python script takes over and it traverses all the index.json files and based on that structure it figures out the, title, summary, and the whole body (as HTML).

Next up, before sending this into the bulk-publisher in Elasticsearch it strips the HTML. It's a bit more than just turning <p>Some <em>cool</em> text.</p> to Some cool text. because it also cleans up things like <div class="hidden"> and certain <div class="notecard warning"> blocks.

One thing worth noting is that this whole thing runs roughly every 24 hours and then it builds everything. But what if, between two runs, a certain page has been removed (or moved), how do you remove what was previously added to Elasticsearch? The solution is simple: it deletes and re-creates the index from scratch every day. The whole bulk-publish takes a while so right after the index has been deleted, the searches won't be that great. Someone could be unlucky in that they're searching MDN a couple of seconds after the index was deleted and now waiting for it to build up again.
It's an unfortunate reality but it's a risk worth taking for the sake of simplicity. Also, most people are searching for things in English and specifically the Web/ tree so the bulk-publishing is done in a way the most popular content is bulk-published first and the rest was done after. Here's what the build output logs:

Found 50,461 (potential) documents to index
Deleting any possible existing index and creating a new one called mdn_docs
Took 3m 35s to index 50,362 documents. Approximately 234.1 docs/second
Counts per priority prefixes:
    en-us/docs/web                 9,056
    *rest*                         41,306

So, yes, for 3m 35s there's stuff missing from the index and some unlucky few will get fewer search results than they should. But we can optimize this in the future.

Searching

The way you connect to Elasticsearch is simply by a URL it looks something like this:

https://USER:PASSWD@HASH.us-west-2.aws.found.io:9243

It's an Elasticsearch cluster managed by Elastic running inside AWS. Our job is to make sure that we put the exact same URL in our GitHub Action ("the writer") as we put it into our Django server ("the reader").
In fact, we have 3 Elastic clusters: Prod, Stage, Dev.
And we have 2 Django servers: Prod, Stage.
So we just need to carefully make sure the secrets are set correctly to match the right environment.

Now, in the Django server, we just need to convert a request like GET /api/v1/search?q=foo&locale=fr (for example) to a query to send to Elasticsearch. We have a simple Django view function that validates the query string parameters, does some rate-limiting, creates a query (using elasticsearch-dsl) and packages the Elasticsearch results back to JSON.

How we make that query is important. In here lies the most important feature of the search; how it sorts results.

In one simple explanation, the sort order is a combination of popularity and "matchness". The assumption is that most people want the popular content. I.e. they search for foreach and mean to go to /en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach not /en-US/docs/Web/API/NodeList/forEach both of which contains forEach in the title. The "popularity" is based on Google Analytics pageviews which we download periodically, normalize into a floating-point number between 1 and 0. At the of writing the scoring function does something like this:

rank = doc.popularity * 10 + search.score

This seems to produce pretty reasonable results.

But there's more to the "matchness" too. Elasticsearch has its own API for defining boosting and the way we apply is:

This is then applied on top of whatever else Elasticsearch does such as "Term Frequency" and "Inverse Document Frequency" (tf and if). This article is a helpful introduction.

We're most likely not done with this. There's probably a lot more we can do to tune this myriad of knobs and sliders to get the best possible ranking of documents that match.

Web UI

The last piece of the puzzle is how we display all of this to the user. The way it works is that developer.mozilla.org/$locale/search returns a static page that is blank. As soon as the page has loaded, it lazy-loads JavaScript that can actually issue the XHR request to get and display search results. The code looks something like this:

function SearchResults() {
  const [searchParams] = useSearchParams();
  const sp = createSearchParams(searchParams);
  // add defaults and stuff here
  const fetchURL = `/api/v1/search?${sp.toString()}`;

  const { data, error } = useSWR(
    fetchURL,
    async (url) => {
      const response = await fetch(URL);
      // various checks on the response.statusCode here
      return await response.json();
    }
  );

  // render 'data' or 'error' accordingly here

A lot of interesting details are omitted from this code snippet. You have to check it out for yourself to get a more up-to-date insight into how it actually works. But basically, the window.location (and pushState) query string drives the fetch() call and then all the component has to do is display the search results with some highlighting.

The /api/v1/search endpoint also runs a suggestion query as part of the main search query. This extracts out interest alternative search queries. These are filtered and scored and we issue "sub-queries" just to get a count for each. Now we can do one of those "Did you mean...". For example: search for intersections.

In conclusion

There are a lot of interesting, important, and careful details that are glossed over here in this blog post. It's a constantly evolving system and we're constantly trying to improve and perfect the system in a way that it fits what users expect.

A lot of people reach MDN via a Google search (e.g. mdn array foreach) but despite that, nearly 5% of all traffic on MDN is the site-search functionality. The /$locale/search?... endpoint is the most frequently viewed page of all of MDN. And having a good search engine that's reliable is nevertheless important. By owning and controlling the whole pipeline allows us to do specific things that are unique to MDN that other websites don't need. For example, we index a lot of raw HTML (e.g. <video>) and we have code snippets that needs to be searchable.

Hopefully, the MDN site-search will elevate from being known to be very limited to something now that can genuinely help people get to the exact page better than Google can. Yes, it's worth aiming high!

downloadAndResize - Firebase Cloud Function to serve thumbnails

08 December 2020 0 comments   Web development, That's Groce!, Node, JavaScript


UPDATE 2020-12-30

With sharp after you've loaded the image (sharp(contents) ) make sure to add .rotate() so it automatically rotates the image correctly based on EXIF data.

UPDATE 2020-12-13

I discovered that sharp is much better than jimp. It's order of maginitude faster. And it's actually what the Firebase Resize Images extension uses. Code updated below.

I have a Firebase app that uses the Firebase Cloud Storage to upload images. But now I need thumbnails. So I wrote a cloud function that can generate thumbnails on-the-fly.

There's a Firebase Extension called Resize Images which is nicely done but I just don't like that strategy. At least not for my app. Firstly, I'm forced to pick the right size(s) for thumbnails and I can't really go back on that. If I pick 50x50, 1000x1000 as my sizes, and depend on that in the app, and then realize that I actually want it to be 150x150, 500x500 then I'm quite stuck.

Instead, I want to pick any thumbnail sizes dynamically. One option would be a third-party service like imgix, CloudImage, or Cloudinary but these are not free and besides, I'll need to figure out how to upload the images there. There are other Open Source options like picfit which you install yourself but that's not an attractive option with its implicit complexity for a side-project. I want to stay in the Google Cloud. Another option would be this AppEngine function by Albert Chen which looks nice but then I need to figure out the access control between that and my Firebase Cloud Storage. Also, added complexity.

As part of your app initialization in Firebase, it automatically has access to the appropriate storage bucket. If I do:

const storageRef = storage.ref();
uploadTask = storageRef.child('images/photo.jpg').put(file, metadata);
...

...in the Firebase app, it means I can do:

 admin
      .storage()
      .bucket()
      .file('images/photo.jpg')
      .download()
      .then((downloadData) => {
        const contents = downloadData[0];

...in my cloud function and it just works!

And to do the resizing I use Jimp which is TypeScript aware and easy to use. Now, remember this isn't perfect or mature but it works. It solves my needs and perhaps it will solve your needs too. Or, at least it might be a good start for your application that you can build on. Here's the function (in functions/src/index.ts):

interface StorageErrorType extends Error {
  code: number;
}

const codeToErrorMap: Map<number, string> = new Map();
codeToErrorMap.set(404, "not found");
codeToErrorMap.set(403, "forbidden");
codeToErrorMap.set(401, "unauthenticated");

export const downloadAndResize = functions
  .runWith({ memory: "1GB" })
  .https.onRequest(async (req, res) => {
    const imagePath = req.query.image || "";
    if (!imagePath) {
      res.status(400).send("missing 'image'");
      return;
    }
    if (typeof imagePath !== "string") {
      res.status(400).send("can only be one 'image'");
      return;
    }
    const widthString = req.query.width || "";
    if (!widthString || typeof widthString !== "string") {
      res.status(400).send("missing 'width' or not a single string");
      return;
    }
    const extension = imagePath.toLowerCase().split(".").slice(-1)[0];
    if (!["jpg", "png", "jpeg"].includes(extension)) {
      res.status(400).send(`invalid extension (${extension})`);
      return;
    }
    let width = 0;
    try {
      width = parseInt(widthString);
      if (width < 0) {
        throw new Error("too small");
      }
      if (width > 1000) {
        throw new Error("too big");
      }
    } catch (error) {
      res.status(400).send(`width invalid (${error.toString()}`);
      return;
    }

    admin
      .storage()
      .bucket()
      .file(imagePath)
      .download()
      .then((downloadData) => {
        const contents = downloadData[0];
        console.log(
          `downloadAndResize (${JSON.stringify({
            width,
            imagePath,
          })}) downloadData.length=${humanFileSize(contents.length)}\n`
        );

        const contentType = extension === "png" ? "image/png" : "image/jpeg";
        sharp(contents)
          .rotate()
          .resize(width)
          .toBuffer()
          .then((buffer) => {
            res.setHeader("content-type", contentType);
            // TODO increase some day
            res.setHeader("cache-control", `public,max-age=${60 * 60 * 24}`);
            res.send(buffer);
          })
          .catch((error: Error) => {
            console.error(`Error reading in with sharp: ${error.toString()}`);
            res
              .status(500)
              .send(`Unable to read in image: ${error.toString()}`);
          });
      })
      .catch((error: StorageErrorType) => {
        if (error.code && codeToErrorMap.has(error.code)) {
          res.status(error.code).send(codeToErrorMap.get(error.code));
        } else {
          res.status(500).send(error.message);
        }
      });
  });

function humanFileSize(size: number): string {
  if (size < 1024) return `${size} B`;
  const i = Math.floor(Math.log(size) / Math.log(1024));
  const num = size / Math.pow(1024, i);
  const round = Math.round(num);
  const numStr: string | number =
    round < 10 ? num.toFixed(2) : round < 100 ? num.toFixed(1) : round;
  return `${numStr} ${"KMGTPEZY"[i - 1]}B`;
}

Here's what a sample URL looks like.

I hope it helps!

I think the next thing for me to consider is to extend this so it uploads the thumbnail back and uses the getDownloadURL() of the created thumbnail as a redirect instead. It would be transparent to the app but saves on repeated views. That'd be a good optimization.

Popularity contest for your grocery list

21 November 2020 0 comments   Web development, Mobile, That's Groce!


tl;dr; Up until recently, when you started to type a new entry in your That's Groce shopping list, the suggestions that would appear weren't sorted intelligently. Now they are. They're sorted by popularity.

The whole point with the suggestions that appear is to make it easier for you to not have to type the rest. The first factor that decides which should appear is simply based on what you've typed so far. If you started typing ch we can suggest:

They all contain ch in some form (starting of words only). But space is limited and you can't show every suggestion. So, if you're going to cap it to only show, say, 4 suggestions; which ones should you show first?
I think the solution is to do it by frequency. I.e. items you often put onto the list.

How to calculate the frequency

The way That's Groce now does it is that it knows the exact times a certain item was added to the list. It then takes that list and applies the following algorithm:

For each item...

  1. Discard the dates older than 3 months
  2. Discard any duplicates from clusters (e.g. you accidentally added it and removed it and added it again within minutes)
  3. Calculate the distance (in seconds) between each add
  4. From the last 4 times it was added, take the median value of the distance between

So the frequency becomes a number of seconds. It should feel somewhat realistic. In my family, it actually checks out. We buy bananas every week but sometimes slightly more often than that and in our case, the number comes to ~6 days.

The results

Before sorting by popularity
Before sorting by popularity

After sorting by popularity
After sorting by popularity

Great! The chances of appreciating and picking one of the suggestions is greater if it's more likely to be what you were looking for. And things that have been added frequently in the past are more likely to be added again.

How to debug this

There's now a new page called the "Popularity contest". You get to it from the "List options" button in the upper right-hand corner. On its own, it's fairly "useless" because it just lists them. But it's nice to get a feeling for what your family most frequently add to the list. A lot more can probably be done to this page but for now, it really helps to back up the understanding of how the suggestions are sorted when you're adding new entries.

Popularity contest

If you look carefully at my screenshot here you'll notice two little bugs. There are actually two different entries for "Lemon 🍋" and that was from the early days when that could happen.
Also, another bug is that there's one entry called "Bananas" and one called "Bananas 🍌" which is also something that's being fixed in the latest release. My own family's list predates those fixes.

Hope it helps!

Generating random avatar images in Django/Python

28 October 2020 1 comment   Web development, Django, Python


tl;dr; <img src="/avatar.random.png" alt="Random avataaar"> generates this image:

Random avataaar
(try reloading to get a random new one. funny aren't they?)

When you use Gravatar you can convert people's email addresses to their mugshot.
It works like this:

<img src="https://www.gravatar.com/avatar/$(md5(user.email))">

But most people don't have their mugshot on Gravatar.com unfortunately. But you still want to display an avatar that is distinct per user. Your best option is to generate one and just use the user's name or email as a seed (so it's always random but always deterministic for the same user). And you can also supply a fallback image to Gravatar that they use if the email doesn't match any email they have. That's where this blog post comes in.

I needed that so I shopped around and found avataaars generator which is available as a React component. But I need it to be server-side and in Python. And thankfully there's a great port called: py-avataaars.

It depends on CairoSVG to convert an SVG to a PNG but it's easy to install. Anyway, here's my hack to generate random "avataaars" from Django:

import io
import random

import py_avataaars
from django import http
from django.utils.cache import add_never_cache_headers, patch_cache_control


def avatar_image(request, seed=None):
    if not seed:
        seed = request.GET.get("seed") or "random"

    if seed != "random":
        random.seed(seed)

    bytes = io.BytesIO()

    def r(enum_):
        return random.choice(list(enum_))

    avatar = py_avataaars.PyAvataaar(
        style=py_avataaars.AvatarStyle.CIRCLE,
        # style=py_avataaars.AvatarStyle.TRANSPARENT,
        skin_color=r(py_avataaars.SkinColor),
        hair_color=r(py_avataaars.HairColor),
        facial_hair_type=r(py_avataaars.FacialHairType),
        facial_hair_color=r(py_avataaars.FacialHairColor),
        top_type=r(py_avataaars.TopType),
        hat_color=r(py_avataaars.ClotheColor),
        mouth_type=r(py_avataaars.MouthType),
        eye_type=r(py_avataaars.EyesType),
        eyebrow_type=r(py_avataaars.EyebrowType),
        nose_type=r(py_avataaars.NoseType),
        accessories_type=r(py_avataaars.AccessoriesType),
        clothe_type=r(py_avataaars.ClotheType),
        clothe_color=r(py_avataaars.ClotheColor),
        clothe_graphic_type=r(py_avataaars.ClotheGraphicType),
    )
    avatar.render_png_file(bytes)

    response = http.HttpResponse(bytes.getvalue())
    response["content-type"] = "image/png"
    if seed == "random":
        add_never_cache_headers(response)
    else:
        patch_cache_control(response, max_age=60, public=True)

    return response

It's not perfect but it works. The URL to this endpoint is /avatar.<seed>.png and if you make the seed parameter random the response is always different.

To make the image not random, you replace the <seed> with any string. For example (use your imagination):

{% for comment in comments %}
  <img src="/avatar.{{ comment.user.id }}.png" alt="{{ comment.user.name }}">
  <blockquote>{{ comment.text }}</blockquote>
  <i>{{ comment.date }}</i>
{% endfor %}

I've put together this test page if you want to see more funny avatar combinations instead of doing work :)

That's Groce!

22 October 2020 0 comments   Web development, Family, Mobile, Preact, That's Groce!

https://thatsgroce.web.app/


tl;dr That's Groce! is: A mobile web app to help families do grocery shopping and meal planning. Developed out of necessity by a family (Peter and Ashley) and used daily in their home.

Hopefully, the About page explains what it does.

Sample list
Screenshot of a sample list

The backstory

We used to use Wunderlist, but that stopped working. Next, we tried Cozi and that worked for a while but it was buggy and annoying in so many ways. Finally, we gave up and decided to build our own. Exactly how we need it to be, as efficient as possible.

We also tried a couple of regular to-do list apps where you can have shared accounts but we wanted something perfectly tailored towards the specific needs of family grocery shopping (and meal planning). That's how That's Groce! was born.

The killer features

The about page does a good job of listing the killer features but let's emphasize it one more time.

It's not an app store app

Saved as Home screen app

You won't find it on the Apple App store. It's a web app that's been tailored to work well in mobile web browsers (iOS Safari) and you can use the "Add to Home screen" so it looks and acts like a regular app.
It would be nice to try to make it a regular native mobile app but that takes significant time which is hard to find but certainly something to aspire to if it can be done in a nice way.

"Really smart about suggestions"

What does that killer feature mean? (At the time of writing (Oct 2020), it isn't launched yet but the pieces are coming together.) Are there certain stables you buy recurringly? Like milk or bananas or Cheerios. If the app can start to see a pattern of commonly added items, it can suggest it immediately so when you're making your list on Monday morning, you just need to tap to add those.

Another important thing is that as you type, it can suggest many things based on the first or couple of characters you type in, but you can't suggest every single possible word so which one should you suggest first?
The way That's Groce! works is that it learns based on the number of times and how recently you add something to your list. As of today, look what happens when I type a on my list:

Suggestions based on typing 'a'
When I type a it suggests things that start with "A" but based on frequency.

The more you use it, the better the suggestions get.

Also, to get you started, over 100 items are preloaded as good suggestions but that's just to get you up and running. Once your family starts to use it, your own suggestions get better and better over time.

"Same order you usually walk through your grocery store"

This was important to us because we found we walk through the aisles in pretty much the same way. Every time. When you walk in you have your produce (veggies first, then fruit, then salad stuff) on the right. Then baked goods and deli. Then meats and alcohol. Etc. So if you can group your items based on these descriptions you can be really efficient with your list and it becomes a lot easier to cross off sections of the store and not have to scroll up and down or having to walk back to pick up that pizza dough all the way back at the deli section.

For this to work, you need to type in groups for your items. But you can call them whatever you like. If you want to type "Aisle 1", "Aisle 2", "Dairy stuff" you can. It's all up to you. Keep in mind that it might feel like a bit of up-front work at first, and it is, but your list is learning so you essentially only have to do it once.

Don't be a slave to your list!

If you do decide to try it, keep one thing in mind: You're in control. You don't need to type in perfect descriptions, amounts, groups, and quantities. If you don't know how to spell "bee-ar-naise sauce", don't worry about it. It's your list. You can type whatever you want or need. A lot of to-do lists invite you with complex options to organize the hell out of your list items. Don't do that. Think of That's Groce! as a fridge post-it note that you and your partner keep in their pocket that automatically synchronizes.

You can help

We built this for ourselves but it's built in a way that any family can use it and hopefully also be better organized. But once you sign in you can submit feedback for suggestions. And if you're into coding, the whole app is Open Source so it's fairly easy to modify the code or even host it yourself if you wanted to: https://github.com/peterbe/groce/

Also, if you do try it and like it, please consider going to the Share the ❤️ page and, you know, share it with friends. Much appreciated!

Progressive CSS rendering with or without data URLs

26 September 2020 0 comments   Web development, Web Performance, JavaScript


You can write your CSS so that it depends on images. Like this:

li.one {
  background-image: url("skull.png");
}

That means that the browser will do its best to style the li.one with what little it has from the CSS. Then, it'll ask the browser to go ahead and network download that skull.png URL.

But, another option is to embed the image as a data URL like this:

li.one{background-image:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAIAAAACACAYAAADDPmHL...rkJggg==)

As a block of CSS, it's much larger but it's one less network call. What if you know that skull.png will be needed? Is it faster to inline it or to leave it as a URL? Let's see!

First of all, I wanted to get a feeling for how much larger an image is in bytes if you transform them to data URLs. Check out this script's output:

▶ ./bin/b64datauri.js src/*.png src/*.svg
src/lizard.png       43,551     58,090     1.3x
src/skull.png        7,870      10,518     1.3x
src/clippy.svg       483        670        1.4x
src/curve.svg        387        542        1.4x
src/dino.svg         909        1,238      1.4x
src/sprite.svg       10,330     13,802     1.3x
src/survey.svg       2,069      2,786      1.3x

Basically, as a blob of data URL, the images become about 1.3x larger. Hopefully, with HTTP2, the headers are cheap for each URL downloaded over the network, but it's not 0. (No idea what the CPU-work multiplier is)

Experiment assumptions and notes

It's a fairly commonly known fact that data URLs have a CPU cost. That base64 needs to be decoded before the image can be decoded by the renderer. So let's stick to fairly small images.

The experiment

I made a page that looks like this:

li {
  background-repeat: no-repeat;
  width: 150px;
  height: 150px;
  margin: 20px;
  background-size: contain;
}
li.one {
  background-image: url("skull.png");
}
li.two {
  background-image: url("dino.svg");
}
li.three {
  background-image: url("clippy.svg");
}
li.four {
  background-image: url("sprite.svg");
}
li.five {
  background-image: url("survey.svg");
}
li.six {
  background-image: url("curve.svg");
}

and

<ol>
  <li class="one">One</li>
  <li class="two">Two</li>
  <li class="three">Three</li>
  <li class="four">Four</li>
  <li class="five">Five</li>
  <li class="six">Six</li>
</ol>

See the whole page here

The page also uses Bootstrap to make it somewhat realistic. Then, using minimalcss combine the external CSS with the CSS inline and produce a page that is just HTML + 1 <style> tag.

Now, based on that page, the variant is that each url($URL) in the CSS gets converted to url(data:mime/type;base64,blablabla...). The HTML is gzipped (and brotli compressed) and put behind a CDN. The URLs are:

Also, there's this page which is without the critical CSS inlined.

To appreciate what this means in terms of size on the HTML, let's compare:

Considering that gzip (accept-encoding: gzip,deflate) is almost always used by browsers, that means the page is 15KB more before it can be fully downloaded. (But, it's streamed so maybe the comparison is a bit flawed)

Analysis

WebPagetest.org results here. I love WebPagetest, but the results are usually a bit erratic to be a good enough for comparing. Maybe if you could do the visual comparison repeated times, but I don't think you can.

WebPagetest comparison
WebPagetest visual comparison

And the waterfalls...

WebPagetest waterfall, with regular URLs
With regular URLs

WebPagetest waterfall with data URLs
With data URLs

Fairly expected.

Next up, using Google Chrome's Performance dev tools panel. Set to 6x CPU slowdown and online with Fast 3G.

I don't know how to demonstrate this other than screenshots:

Performance with external images
Performance with external images

Performance with data URLs
Performance with data URLs

Those screenshots are rough attempts at showing the area when it starts to display the images.

Whole Performance tab with external images
Whole Performance tab with external images

Whole Performance tab with data URLs
Whole Performance tab with data URLs

I ran these things 2 times and the results were pretty steady.

I tried Lighthouse but the difference was indistinguishable.

Summary

Yes, inlining your CSS images is faster. But it's with a slim margin and the disadvantages aren't negligible.

This technique costs more CPU because there's a lot more base64 decoding to be done, and what if you have a big fat JavaScript bundle in there that wants a piece of the CPU? So ask yourself, how valuable is to not hog the CPU. Perhaps someone who understands the browser engines better can tell if the base64 decoding cost is spread nicely onto multiple CPUs or if it would stand in the way of the main thread.

What about anti-progressive rendering

When Facebook redesigned www.facebook.com in mid-2020 one of their conscious decisions was to inline the SVG glyphs into the JavaScript itself.

"To prevent flickering as icons come in after the rest of the content, we inline SVGs into the HTML using React rather than passing SVG files to <img> tags."

Although that comment was about SVGs in the DOM, from a JavaScript perspective, the point is nevertheless relevant to my experiment. If you look closely, at the screenshots above (or you open the URL yourself and hit reload with HTTP caching disabled) the net effect is that the late-loading images do cause a bit of "flicker". It's not flickering as in "now it's here", "now it's gone", "now it's back again". But it's flickering in that things are happening with progressive rendering. Your eyes might get tired and they say to your brain "Wake me up when the whole thing is finished. I can wait."

This topic quickly escalates into perceived performance which is a stratosphere of its own. And personally, I can only estimate and try to speak about my gut reactions.

In conclusion, there are advantages to using data URIs over external images in CSS. But please, first make sure you don't convert the image URLs in a big bloated .css file to data URLs if you're not sure they'll all be needed in the DOM.

Bonus!

If you're not convinced of the power of inlining the critical CSS, check out this WebPagetest run that includes the image where it references the whole bootstrap.min.css as before doing any other optimizations.

With baseline that isn't just the critical CSS
With baseline that isn't just the critical CSS