A blog and website by Peter Bengtsson

Writing a custom Datadog reporter using the Python API

21 May 2018 2 comments   Python

Datadog is an awesome sofware-as-a-service where you can aggregate and visualize statsd metrics sent from an application. For visualizing timings you create a time series graph. It can look something like this:

Time series

This time series looks sane because because it's timings made very very frequently. But what if it happens very rarely. Like once a day. Then, the graph doesn't look very useful. See this example:

"Rare time" series

Not only is it happening rarely, the amount of seconds is really quite hard to parse. I.e. what's 2.6 million milliseconds (answer is approximately 45 minutes). So to solve that I used the Datadog API. Now I can get a metric of every single point in milliseconds and I can make a little data table with human-readable dates and times.

The end result looks something like this:

|          WHEN           |        TIME AGO        |       TIME TOOK       |
| Mon 2018-05-21T17:00:00 | 2 hours 43 minutes ago | 23 minutes 32 seconds |
| Sun 2018-05-20T17:00:00 | 1 day 2 hours ago      | 20 seconds            |
| Sat 2018-05-19T17:00:00 | 2 days 2 hours ago     | 20 seconds            |
| Fri 2018-05-18T17:00:00 | 3 days 2 hours ago     | 2 minutes 24 seconds  |
| Wed 2018-05-16T20:00:00 | 4 days 23 hours ago    | 38 minutes 38 seconds |

It's not gorgeous and there are a lot of caveats but it's at least really easy to read. See the code here.

I don't think you can run this code since you don't have the same (hardcoded) metrics but hopefully it can serve as an example to whet your appetite.

What I'm going to do next, if I have time, is to run this as a Flask app instead that outputs a HTML table on a Herokup app or something.

To CDN assets or just HTTP/2

17 May 2018 1 comment   Web Performance, Javascript, Web development

tl;dr; I see little benefit in using a CDN at this point.

I took two random pages here on my blog. One and Another. Doesn't matter what they say but it's important to notice that they're extremely similar. No big pictures. Both have 1 banner ad each. Both served with HTTP/2. Neither have any blocking linked assets. I.e. there is no blocking <link ref="stylesheet" href="styles.css"> and the script tags are are either async or defer. Both pages reference one little .png that is not deliberately lazy loaded. That's the baseline.

The HTML document, in both URLs, is served with HTTP/2 but it references a the lazy loaded .css and (a bunch of) .js files, via a CDN. In other words, it looks like this:

▶ curl -v
> GET /plog/hashin-0.7.0 HTTP/2
< HTTP/2 200
<link rel="preload" href="" 
 as="style" onload="this.onload=null;this.rel='stylesheet'">
<script defer src=""></script>

So, is a an awesome CDN, but to a first-time visitor, that is going to require a DNS lookup and the creation of a new TCP connection that can be kept alive. The alternative to this is to not put any of the of the .png, .css or .js assets on a CDN. Basically, instead of <script src="">, just do <script src="/foo.js">.

CDNs are really important since latency is a killer to web performance and remember that "Use a CDN" is rule number 2 in the, now dated, YSlow ruleset. However, we're entering an era where HTTP/2 is becoming more and more available in mainstream browsers (hint: nearly 100% of visitors to my site are HTTP/2 support). Buuuuuut, the latency (DNS, connection and SSL negotiation) doesn't matter that much if you have already paid those costs to get to the origin web server ( in this example).

The Experiment

What I'm interested in seeing if there is a way to gauge/measure when it's best to use a CDN and when it's best to use the origin web server to serve all assets. My friend @stereobooster suggested: " is all you need"

Ok. Let's measure that then with and see what we can learn.

Here's a visual comparison of the two URLs when they both use CDN for the static assets.

Here's a visual comparison of one using a CDN for static assets and one does not.

You can see their webpagetests individually here and here.

Assets over CDN
Two connection prices paid. Downloads individual assets faster but ultimately takes a longer time.

One HTTP/2 connection only
Only 1 connection price paid. ALL assets downloaded sooner, albeit individually slower.


My web server is served from a highly optimized Nginx server in New York, USA. The two Webpagetest visual comparisons above are both done from Virgina, USA. But the killer feature of a CDN is that latency can be so much better thanks to edge locations of the CDN. In particular, KeyCDN have an edge location in Stockholm, Sweden. So what happens when you run the URLs from a Webpagetest machine in Stockholm, Sweden?

The both start to render at the same time (expected since the HTML document is still in New York, USA) but the (rougly) total time to download all the .css and .js is (about) 2.6 seconds when a CDN and 1.9 seconds without a CDN. In other words, despite the CDN geographically so much closer, the static assets are still available sooner without a CDN.

It's pretty clear at this point that it's not a good idea to use a CDN for static assets. Even if they're not critical. The "First Meaningful Paint" and "Time To Interactive" are about the same but when HTTP/2 can download all the .js files faster, their useful JavaScript can start being useful sooner with HTTP/2.

What Else

So in my site, it's easiest to host the whole site on an Nginx server in a Digital Ocean server. It's easy to invalidate its cache (just delete the file from disk and wait for Django to regenerate it). Another advantage with using plain Nginx is that I serve the HTML with Cache-Control headers and then do some post-processing of the .html file and since Nginx is disk-based, I don't have to update a CDN.

An alternative would be to put the whole site behind a CDN. That way, the initial HTML document can be served from a CDN edge location, using HTTP/2 and send the rest of the static assets on the same HTTP/2 connection. But this means that every single dynamic URL (e.g. HTTP POSTs or some per-user XHR requests) has to go via a CDN rather than going straight to the Nginx that is connected to the Django web server.

Last but not least, even though my Nginx server is on a decent machine and pretty well tuned, I very much doubt it's as fast and powerful as a KeyCDN or CloudFront or Akamai or Google Cloud CDN. Those servers are beasts! Mind you, the DNS + connection + SSL negotiation, when requesting from Stockholm, Sweden was about 0.75s to my Nginx in New York, USA. For the KeyCDN edge location the DNS + connection + SSL negotiation was about 0.52s. So not a huge difference actually.

Another important aspect is Service Workers. Perhaps I don't know how to hack it, but it doesn't work when you use differnet domains for the service worker .js file and the URIs it references.

In conclusion; I see little benefit in using a CDN at this point. Perhaps for larger assets like videos, GIFs or high-res images. HTTP/2 changes one of the major web performance rules. End of an era(?)

Rust > Go > Python parse millions of dates in CSV files

15 May 2018 8 comments   Python

It all started so innocently. The task at hand was to download an inventory of every single file ever uploaded to a public AWS S3 bucket. The way that works is that you download the root manifest.json. It references a boat load of .csv.gz files. So to go through every single file uploaded to the bucket, you read the manifest.json, the download each and every .csv.gz file. Now you can parse these and do something with each row. An example row in one of the CSV files looks like this:


In the Mozilla Buildhub what we do is we periodically do this, in Python (with asyncio), to spot if there are any files in the S3 bucket have potentially missed to record in an different database.
But ouf the 150 or so .csv.gz files, most of the files are getting old and in this particular application we can be certain it's unlikely to be relevant and can be ignored. To come to that conclusion you parse each .csv.gz file, parse each row of the CSV, extract the last_modified value (e.g. 2017-09-21T13:08:25.000Z) into a datetime.datetime instance. Now you can quickly decide if it's too old or recent enough to go through the other various checks.

So, the task is to parse 150 .csv.gz files totalling about 2.5GB with roughly 75 million rows. Basically parsing the date strings into datetime.datetime objects 75 million times.


What this script does is it opens, synchronously, each and every .csv.gz file, parses each records date and compares it to a constant ("Is this record older than 6 months or not?")

This is an extraction of a bigger system to just look at the performance of parsing all those .csv.gz files to figure out how many are old and how many are within 6 months. Code looks like this:

import datetime
import gzip
import csv
from glob import glob

cutoff = - datetime.timedelta(days=6 * 30)

def count(fn):
    count = total = 0
    with, 'rt') as f:
        reader = csv.reader(f)
        for line in reader:
            lastmodified = datetime.datetime.strptime(
            if lastmodified > cutoff:
                count += 1
            total += 1

    return total, count

def run():
    total = recent = 0
    for fn in glob('*.csv.gz'):
        if len(fn) == 39:  # filter out other junk files that don't fit the pattern
            t, c = count(fn)
            total += t
            recent += c

    print('{:.1f}%'.format(100 * recent / total))


Code as a gist here.

Only problem. This is horribly slow.

To reproduce this, I took a sample of 38 of these .csv.gz files and ran the above code with CPython 3.6.5. It took 3m44s on my 2017 MacBook Pro.

Let's try a couple low-hanging fruit ideas:

Hmm... Clearly this is CPU bound and using multiple processes is the ticket. But what's really holding us back is the date parsing. From the "Fastest Python datetime parser" benchmark the trick is to use ciso8601. Alright, let's try that. Diff:

< cutoff = - datetime.timedelta(days=6 * 30)
> import ciso8601
> cutoff = datetime.datetime.utcnow().replace(
>     tzinfo=datetime.timezone.utc
> ) - datetime.timedelta(days=6 * 30)
<             lastmodified = datetime.datetime.strptime(
<                 line[3],
<                 '%Y-%m-%dT%H:%M:%S.%fZ'
<             )
>             lastmodified = ciso8601.parse_datetime(line[3])

Version with ciso8601 here.

So what originally took 3 and a half minutes now takes 18 seconds. I suspect that's about as good as it gets with Python.
But it's not too shabby. Parsing 12,980,990 date strings in 18 seconds. Not bad.


My Go is getting rusty but it's quite easy to write one of these so I couldn't resist the temptation:

package main

import (

func count(fn string, index int) (int, int) {
    fmt.Printf("%d %v\n", index, fn)
    f, err := os.Open(fn)
    if err != nil {
    defer f.Close()
    gr, err := gzip.NewReader(f)
    if err != nil {
    defer gr.Close()

    cr := csv.NewReader(gr)
    rec, err := cr.ReadAll()
    if err != nil {
    var count = 0
    var total = 0
    layout := "2006-01-02T15:04:05.000Z"

    minimum, err := time.Parse(layout, "2017-11-02T00:00:00.000Z")
    if err != nil {

    for _, v := range rec {
        last_modified := v[3]

        t, err := time.Parse(layout, last_modified)
        if err != nil {
        if t.After(minimum) {
            count += 1
        total += 1
    return total, count

func FloatToString(input_num float64) string {
    // to convert a float number to a string
    return strconv.FormatFloat(input_num, 'f', 2, 64)

func main() {
    var pattern = "*.csv.gz"

    files, err := filepath.Glob(pattern)
    if err != nil {
    total := int(0)
    recent := int(0)
    for i, fn := range files {
        if len(fn) == 39 {
            // fmt.Println(fn)
            c, t := count(fn, i)
            total += t
            recent += c
    ratio := float64(recent) / float64(total)
    fmt.Println(FloatToString(100.0 * ratio))

Code as as gist here.

Using go1.10.1 I run go make main.go and then time ./main. This takes just 20s which is about the time it took the Python version that uses a process pool and ciso8601.

I showed this to my colleague @mostlygeek who saw my scripts and did the Go version more properly with its own repo.
At first pass (go build filter.go and time ./filter) this one clocks in at 19s just like my naive initial hack. However if you run this as time GOPAR=2 ./filter it will use 8 workers (my MacBook Pro as 8 CPUs) and now it only takes: 5.3s.

By the way, check out @mostlygeek's if you want to generate and download yourself a bunch of these .csv.gz files.


First @mythmon stepped up and wrote two versions. One single-threaded and one using rayon which will use all CPUs you have.

The version using rayon looks like this (single-threaded version here):

extern crate csv;
extern crate flate2;
extern crate serde_derive;
extern crate chrono;
extern crate rayon;

use std::env;
use std::fs::File;
use std::io;
use std::iter::Sum;

use chrono::{DateTime, Utc, Duration};
use flate2::read::GzDecoder;
use rayon::prelude::*;

#[derive(Debug, Deserialize)]
struct Row {
    bucket: String,
    key: String,
    size: usize,
    last_modified_date: DateTime<Utc>,
    etag: String,

struct Stats {
    total: usize,
    recent: usize,

impl Sum for Stats {
    fn sum<I: Iterator<Item=Self>>(iter: I) -> Self {
        let mut acc = Stats { total: 0, recent: 0 };
        for stat in  iter {
            acc.recent += stat.recent;

fn main() {
    let cutoff = Utc::now() - Duration::days(180);
    let filenames: Vec<String> = env::args().skip(1).collect();

    let stats: Stats = filenames.par_iter()
        .map(|filename| count(&filename, cutoff).expect(&format!("Couldn't read {}", &filename)))

    let percent = 100.0 * stats.recent as f32 / as f32;
    println!("{} / {} = {:.2}%", stats.recent,, percent);

fn count(path: &str, cutoff: DateTime<Utc>) -> Result<Stats, io::Error> {
    let mut input_file = File::open(&path)?;
    let decoder = GzDecoder::new(&mut input_file)?;
    let mut reader = csv::ReaderBuilder::new()

    let mut total = 0;
    let recent = reader.deserialize::<Row>()
        .flat_map(|row| row)  // Unwrap Somes, and skip Nones
        .inspect(|_| total += 1)
        .filter(|row| row.last_modified_date > cutoff)

    Ok(Stats { total, recent })

I installed it like this (I have rustc 1.26 installed):

▶ cargo build --release --bin single_threaded
▶ time ./target/release/single_threaded ../*.csv.gz

That finishes in 22s.

Now let's try the one that uses all CPUs in parallel:

▶ cargo build --release --bin rayon
▶ time ./target/release/rayon ../*.csv.gz

That took 5.6s

That's rougly 3 times faster than the best Python version.

When chatting with my teammates about this, I "nerd-sniped" in another colleague, Ted Mielczarek who forked Mike's Rust version.

Compile and running these two I get 17.4s for the single-threaded version and 2.5s for the rayon one.

In conclusion

  1. Simplest Python version: 3m44s
  2. Using PyPy (for Python 3.5): 2m30s
  3. Using asyncio: 3m37s
  4. Using concurrent.futures.ThreadPoolExecutor: 7m05s
  5. Using concurrent.futures.ProcessPoolExecutor: 1m5s
  6. Using ciso8601 to parse the dates: 1m08s
  7. Using ciso8601 and concurrent.futures.ProcessPoolExecutor: 18.4s
  8. Novice Go version: 20s
  9. Go version with parallel workers: 5.3s
  10. Single-threaded Rust version: 22s
  11. Parallel workers in Rust: 5.6s
  12. (Ted's) Single-threaded Rust version: 17.4s
  13. (Ted's) Parallel workers in Rust: 2.5s

Most interesting is that this is not surprising. Of course it gets faster if you use more CPUs in parallel. And of course a C binary to do a critical piece in Python will speed things up. What I'm personally quite attracted to is how easy it was to replace the date parsing with ciso8601 in Python and get a more-than-double performance boost with very little work.

Yes, I'm perfectly aware that these are not scientific conditions and the list of disclaimers is long and boring. However, it was fun! It's fun to compare and constrast solutions like this. Don't you think?

Webpack Bundle Analyzer for create-react-app

14 May 2018 0 comments   ReactJS, Javascript

webpack-bundle-analyzer is an awesome little program for understanding why and which parts of your bundled .js files are so big. It's a lot more advanced (and pretty) than source-map-explorer.

Thanks to this tip by @trevorwhealy you can now use webpack-bundle-analyzer on a create-react-app bundle. Yay!

Check out the report I made for the client side code of

Webpack bundle analyzed for

One thing I personally noticed from this is that the .png do take up quite a lot of kilobytes. And I'm quite that the whatwg-fetch polyfill uses 12KB before gzip.

Always return namespaces in Django REST Framework

11 May 2018 0 comments   Django, Python

By default, when you hook up a model to Django REST Framework and run a query in JSON format, what you get is a list. E.g.

For GET localhost:8000/api/mymodel/

  {"id": 1, "name": "Foo"},
  {"id": 2, "name": "Bar"},
  {"id": 3, "name": "Baz"}

This isn't great because there's no good way to include other auxiliary data points that are relevant to this query. In Elasticsearch you get something like this:

  "took": 106,
  "timed_out": false,
  "_shards": {},
  "hits": {
    "total": 0,
    "hits": [],
    "max_score": 1

Another key is that perhaps today you can't think of any immediate reason why you want to include some additonal meta data about the query, but perhaps some day you will.

The way to solve this in Django REST Framework is to override the list function in your Viewset classes.


from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer


from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer

    def list(self, request, *args, **kwargs):
        response = super().list(request, *args, **kwargs)
        # Where the magic happens!
        return response

Now, to re-wrap that, the is a OrderedDict which you can change. Here's one way to do it:

from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer

    def list(self, request, *args, **kwargs):
        response = super().list(request, *args, **kwargs) = {
        return response

And if you want to do the same the "detail API" where you retrieve a single model instance, you can add an override to the retrieve method:

def retrieve(self, request, *args, **kwargs):
    response = super().retrieve(request, *args, **kwargs) = {
    return response

That's it. Perhaps it's personal preference but if you, like me, prefers this style, this is how you do it. I like namespacing things instead of dealing with raw lists.

"Namespaces are one honking great idea -- let's do more of those!"

From import this

Note! This works equally when you enable pagination. Enabling pagination immediately changes the main result from a list to a dictionary. I.e. Instead of...

  {"id": 1, "name": "Foo"},
  {"id": 2, "name": "Bar"},
  {"id": 3, "name": "Baz"}

you now get...

  "count": 3,
  "next": null,
  "previous": null,
  "items": [
    {"id": 1, "name": "Foo"},
    {"id": 2, "name": "Bar"},
    {"id": 3, "name": "Baz"}

So if you apply the "trick" mentioned in this blog post you end up with...:

  "hits": {
    "count": 3,
    "next": null,
    "previous": null,
    "items": [
      {"id": 1, "name": "Foo"},
      {"id": 2, "name": "Bar"},
      {"id": 3, "name": "Baz"}

How I found out where a bash alias was set up

09 May 2018 0 comments   Linux

I wanted to install a command line tool called gg. But for some reason, gg was already tied to an alias. No problem, I'll just delete that alias. I looked in ~/.bash_profile and I looked in ~/.zshrc and it wasn't there!

But here's how I managed to figure out where it came from:

▶ which gg
gg: aliased to git gui citool

Then I copied the git gui citool part of that output and ran:

▶ rg --hidden 'git gui citool'
104:alias gg='git gui citool'
105:alias gga='git gui citool --amend'

A ha! So it was .oh-my-zsh/plugins/git/git.plugin.zsh that was the culprit. Totally forgot about the plugin. It's full of other useful aliases so I just commented out the one(s) I knew I don't need any more.

By the way rg, aka. ripgrep is probably one of the best tools I have. I use it so often that it's attached to my belt rather than in my toolbox.

Real minimal example of going from setState to MobX

04 May 2018 0 comments   ReactJS, Javascript

This is not meant as a tutorial on MobX but hopefully it can be inspirational for people who have grokked how React's setState works but now feel they need to move the state management in their React app out of the components.

To jump right in, here is a changeset that demonstrates how to replace setState with a MobX store:

It's a really simple Todo list application based on create-react-app. Not much to read into at this point.

Here are some caveats to be aware if you look at the diff and wonder...

Caveat last but not least... This diff does not much other than adding more library dependencies and fancy "observable arrays" that are hard to introspect with console.log debugging.
However, the intention is to...

  1. Add react-router to the mix so opening the Todo list is just one of many possible views.
  2. Now the Store.js file can be all about data. Data retrieval, storage, manipulation, mutation etc. The other components will be more simple since their only job is to render that's in the store and send events back to the store based on user actions.
  3. Note that the store is also put into window. That means I can open the web console and type store.items[2].text = "Test change" and simply by hitting enter the app re-renders to this change.

gtop is best

02 May 2018 0 comments   Javascript, MacOSX, Linux

To me, using top inside a Linux server via SSH is all muscle-memory and it's definitely good enough. On my Macbook when working on some long-running code that is resource intensive the best tool I know of is: gtop

gtop in action
gtop in action

I like it because it has the graphs I want and need. It splits up the work of each CPU which is awesome. That's useful for understanding how well a program is able to leverage more than one CPU process.

And it's really nice to have the list of Processes there to be able to quickly compare which programs are running and how that might affect the use of the CPUs.

Instead of listing alternatives I've tried before, hopefully this Reddit discussion has good links to other alternatives

Fastest Python datetime parser

02 May 2018 0 comments   Python

tl;dr; It's ciso8601.

I have this Python app that I'm working on. It has a cron job that downloads a listing of every single file listed in an S3 bucket. AWS S3 publishes a manifest of .csv.gz files. You download the manifest and for each hashhashash.csv.gz you download those files. Then my program reads these CSV files and it is able to ignore certain rows based on them being beyond the retention period. It basically parses the ISO formatted datetime as a string, compares it with a cutoff datetime.datetime instance and is able to quickly skip or allow it in for full processing.

At the time of writing, it's roughly 160 .csv.gz files weighing a total of about 2GB. In total it's about 50 millions rows of CSV. That means it's 50 million datetime parsings.

I admit, this cron job doesn't have to be super fast and it's OK if it takes an hour since it's just a cron job running on a server in the cloud somewhere. But I would like to know, is there a way to speed up the date parsing because that feels expensive to do in Python 50 million times per day.

Here's the benchmark:

import csv
import datetime
import random
import statistics
import time

import ciso8601

def f1(datestr):
    return datetime.datetime.strptime(datestr, '%Y-%m-%dT%H:%M:%S.%fZ')

def f2(datestr):
    return ciso8601.parse_datetime(datestr)

def f3(datestr):
    return datetime.datetime(

# Assertions
assert f1(
).strftime('%Y%m%d%H%M') == f2(
).strftime('%Y%m%d%H%M') == f3(
).strftime('%Y%m%d%H%M') == '201709211254'

functions = f1, f2, f3
times = {f.__name__: [] for f in functions}

with open('046444ae07279c115edfc23ba1cd8a19.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        func = random.choice(functions)
        t0 = time.clock()
        t1 = time.clock()
        times[func.__name__].append((t1 - t0) * 1000)

def ms(number):
    return '{:.5f}ms'.format(number)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', format(len(numbers), ','), 'times')
    print('\tBEST  ', ms(min(numbers)))
    print('\tMEDIAN', ms(statistics.median(numbers)))
    print('\tMEAN  ', ms(statistics.mean(numbers)))
    print('\tSTDEV ', ms(statistics.stdev(numbers)))

Yeah, it's a bit ugly but it works. Here's the output:

FUNCTION: f1 Used 111,475 times
    BEST   0.01300ms
    MEDIAN 0.01500ms
    MEAN   0.01685ms
    STDEV  0.00706ms
FUNCTION: f2 Used 111,764 times
    BEST   0.00100ms
    MEDIAN 0.00200ms
    MEAN   0.00197ms
    STDEV  0.00167ms
FUNCTION: f3 Used 111,362 times
    BEST   0.00300ms
    MEDIAN 0.00400ms
    MEAN   0.00409ms
    STDEV  0.00225ms

In summary:

Or, if you compare to the slowest (f1):


If you know with confidence that you don't want or need timezone aware datetime instances, you can use csiso8601.parse_datetime_unaware instead.

from the README:

"Please note that it takes more time to parse aware datetimes, especially if they're not in UTC. If you don't care about time zone information, use the parse_datetime_unaware method, which will discard any time zone information and is faster."

In my benchmark the strings I use look like this 2017-09-21T12:54:24.000Z. I added another function to the benmark that uses ciso8601.parse_datetime_unaware and it clocked in at the exact same time as f2.

The impressive first-meaningful-paint improvement of using minimalcss

24 April 2018 1 comment   Javascript, Web development

tl;dr; The critical CSS solution, using minimalcss, yields a 40% improvement in First Meaningful Paint and 90% improvement in the Time to Start Render.

About a month ago I enabled minimalcss here on my personal blog to properly test it in production. In that blog post I only looked at the difference in file size. This time, I'm testing the impact of it using I picked two blog posts (that don't have images), but both have some syntax highlighting of code CSS. One and Two. Doesn't matter what's in those two pages but they're relatively similar in shape and size.

In three experiments I compare, on 4G Chome...

  1. both optimized
  2. first one optimized only
  3. second one optimized only
  4. both not optimized

Note! In the following screenshots from Webpagetest the "Thumbnail interval" is set to 0.5 seconds.

Note2! These pages used HTTP/2 and the CSS stylesheets are loaded from a CDN.

BOTH pages optimized

Both pages optimized

Timings both optimized

They are about 0.3 seconds apart in their first meaningful paint.

This is the baseline comparison. It's not perfectly the same first-meaningful-paint but close enough. The point is what difference it makes later.

Only FIRST page optimized

First page optimized

Timings first page optimized

The optimized page is 0.7 seconds faster.

Only SECOND page optimized

Second page optimized

Timings second page optimized

The optimized page is 1.4 seconds faster.

Bonus! Both pages NOT optimized

Just to check that it all holds up. Here, both pages are compared with out the critical CSS optimization.

Both pages NOT optimized

Timings both NOT optimized

If we roughly average out the first paint on the sample where both were optimized (2.1 seconds), this time it's 2.8 seconds. So the optimization of the critical CSS with minimalcss roughly makes the first paint makes it 40% faster.


In the "First Meaningful Paint" is just one of many ways of measuring "success". I put that last word in quotation marks because this stuff is not trivial. Just because you manage to show anything to the user doesn't necessarily mean the user is happy if the user can't do what they want to do. And if you front-load a bunch of things with every trick in the book, you might have a lot of load in the background that might affect the scrolling with yank or flashing content.

I never really know which one to live by as a measure of success but the visual comparison timeline from Webpagetest definitely is fruitful and easy to understand. There is also "First Contentful Paint" which shows a slightly bigger difference between the optimized and the not-optimized pages.

For now, I'm going to call this a success. Adding minimalcss was a mixed bag of challenges. The execution of the script, on a server, requires the right amount of tooling and safeguards. After all, it depends on real web browser running inside a web server spawned from a background asynchronous message queue task with retries, logging, and metrics.

For one thing, if you just look at the visual comparisons and focus only on the rendered title the difference is not 40% faster. It's about 100% faster. That difference is explained by the word "meaningful" in First Meaningful Paint. The rest of the content is at the mercy of the banner ad and the remote loaded web fonts. For example, if you instead compare the "Time to Start Render" average timings of the optimized pages was 1.6 seconds vs 3.1 seconds for the not optimized pages. In other words, the improvement of First Meaningul Paint was 40% and the improvement of Time to Start Render was 93%.

Lastly, remember that what minimalcss (and other critical path CSS optimization tools) does is that it copies CSS from the .css files and includes it in the HTML document. That copying means the HTML document weighs 27KB more and still it wins.