Peterbe.com

A blog and website by Peter Bengtsson

Run something forever in bash until you want to stop it

13 February 2018 0 comments   Linux


I often use this in various projects. I find it very useful. Thought I'd share to see if others find it useful.

Running something forever

Suppose you have some command that you want to run a lot. One way is to do this:

$ ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command && \
  ./manage.py run-some-command

That runs the command 10 times. Clunky but effective.

Another alternative is to hijack the watch command. By default it waits 2 seconds between each run but if the command takes longer than 2 seconds, it'll just wait. Running...

$ watch ./manage.py run-some-command

Is almost the same as running...:

$ clear && sleep 2 && ./manage.py run-some-command && \
  clear && sleep 2 && ./manage.py run-some-command && \
  clear && sleep 2 && ./manage.py run-some-command && \
  clear && sleep 2 && ./manage.py run-some-command && \
  clear && sleep 2 && ./manage.py run-some-command && \
  clear && sleep 2 && ./manage.py run-some-command && \
  ...
  ...forever until you Ctrl-C it...

But that's clunky too because you might not want it to clear the screen between each run and you get an un-necessary delay between each run.

The biggest problem is that with using watch or copy-n-paste the command many times with && between is that if you need to stop it you have to Ctrl-C and that might kill the command at a precious time.

A better solution

The important thing is that if you want to stop the command repeater, is that it gets to finish what it's working on at the moment.

Here's a great and simple solution:

#!/usr/bin/env bash
set -eo pipefail

_stopnow() {
  test -f stopnow && echo "Stopping!" && rm stopnow && exit 0 || return 0
}

while true
do
    _stopnow
    # Below here, you put in your command you want to run:

    ./manage.py run-some-command
done

Save that file as run-forever.sh and now you can do this:

$ bash run-forever.sh

It'll sit there and do its thing over and over. If you want to stop it (from another terminal):

$ touch stopnow

(the file stopnow will be deleted after it's spotted once)

Getting fancy

Instead of taking this bash script and editing it every time you need it to run a different command you can make it a globally available command. Here's how I do it:

#!/usr/bin/env bash
set -eo pipefail


count=0

_stopnow() {
    count="$(($count+1))"
    test -f stopnow && \
      echo "Stopping after $count iterations!" && \
      rm stopnow && exit 0 || return 0
}

control_c()
# run if user hits control-c
{
  echo "Managed to do $count iterations"
  exit $?
}

# trap keyboard interrupt (control-c)
trap control_c SIGINT

echo "To stop this forever loop created a file called stopnow."
echo "E.g: touch stopnow"
echo ""
echo "Now going to run '$@' forever"
echo ""
while true
do
    _stopnow

    eval $@

    # Do this in case you accidentally pass an argument
    # that finishes too quickly.
    sleep 1

done

This code in a Gist here.

Put this file in ~/bin/run-forever.sh and chmod +x ~/bin/run-forever.sh.

Now you can do this:

$ run-forever.sh ./manage.py run-some-command

If the command you want to run, forever, requires an operator you have to wrap everything in single quotation marks. For example:

$ run-forever.sh './manage.py run-some-command && echo "Cooling CPUs..." && sleep 10' 

Be very careful with your add_header in Nginx! You might make your site insecure

11 February 2018 0 comments   Nginx, Web development, Linux


tl;dr; When you use add_header in a location block in Nginx, it undoes all "parent" add_header directives. Dangerous!

Gist of the problem is this:

There could be several add_header directives. These directives are inherited from the previous level if and only if there are no add_header directives defined on the current level.

From the documentation on add_header

The grand but subtle mistake

Basically, I had this:

server {
    server_name example.com;

    ...gzip...
    ...ssl...
    ...root...

    # Great security headers...
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    ...more security headers...

    location / {
        try_files    $uri /index.html;
    }
}

And when you curl it, you can see that it works:

$ curl -I https://example.com
[snip]
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=63072000; includeSubdomains; preload

The mistake I had, was that I added a new add_header inside a relevant location block. If you do that, all the other "global" add_headers are dropped.
E.g.

server {
    server_name example.com;

    ...gzip...
    ...ssl...
    ...root...

    # Great security headers...
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    ...more security headers...

    location / {
        try_files    $uri /index.html;
        # NOTE! Adding some more headers here
+       add_header X-debug-whats-going-on on; 
    }
}

Now, same curl command:

$ curl -I https://example.com
[snip]
X-debug-whats-going-on: on

Bad score on Observatory for www.peterbe.com
Yikes! Now those other useful security headers are gone!

Here are your options:

  1. Don't add headers like that inside location blocks. Yeah, that's not always a choice.
  2. Copy-n-paste all the general security add_header blocks into the location blocks where you have to have "custom" add_header entries.
  3. Use an include file, see below.

How to include files

First create a new file, like /etc/nginx/snippets/general-security-headers.conf then put this into it:

# Great security headers...
add_header X-Frame-Options SAMEORIGIN;
add_header X-XSS-Protection "1; mode=block";
...more security headers...
# More realistically, see https://gist.github.com/plentz/6737338

Now, instead of saying these add_header lines in your /etc/nginx/sites-enabled/example.conf change that to:

server {
    server_name example.com;

    ...gzip...
    ...ssl...
    ...root...

    include /etc/nginx/snippets/general-security-headers.conf;

    location / {
        try_files    $uri /index.html;
        # Note! This gets included *again* because
        # this location block needs its own custom add_header
        # directives.
        include /etc/nginx/snippets/general-security-headers.conf;
        # NOTE! Adding some more headers here
        add_header X-debug-whats-going-on on; 
    }
}

(You need to use your imagination that a real Nginx config site probably has many different more complex location directives)

It's arguably a bit clunky but it works and it's the best of both worlds. The right security headers for all locations and ability to set custom add_header directives for specific locations.

Discussion

I'm most disappointed in myself for not noticing. Not for not noticing this in the Nginx documentation, but that I didn't check my security headers on more than one path. But I'm also quite disappointed in Nginx for this rather odd behaviour. To quote my security engineer at Mozilla, April King:

"add" doesn't usually mean "subtract everything else"

She agreed with me that the way it works is counter-intuitive and showed me this snippet which uses include files the same way.

Show size of every PostgreSQL database you have

07 February 2018 0 comments   PostgreSQL


tl;dr; SELECT pg_database.datname, pg_database_size(pg_database.datname), pg_size_pretty(pg_database_size(pg_database.datname)) FROM pg_database ORDER by 2 DESC;

I recently had to transfer all my postgres data for my local databases on the laptop. Although I, unfortunately, can't remember what it was before I deleted a bunch of databases, now after some clean up, all my PostgreSQL databases weighs 12GB.

To find out the size of all your database, start psql like this:

$ psql postgres

Then run this:

SELECT pg_database.datname, 
pg_database_size(pg_database.datname), 
pg_size_pretty(pg_database_size(pg_database.datname)) 
FROM pg_database ORDER by 2 DESC;

Here's what it looked like on my laptop:

postgres=# SELECT pg_database.datname,
postgres-# pg_database_size(pg_database.datname),
postgres-# pg_size_pretty(pg_database_size(pg_database.datname))
postgres-# FROM pg_database ORDER by 2 DESC;
         datname          | pg_database_size | pg_size_pretty
--------------------------+------------------+----------------
 songsearch               |      10689224876 | 10194 MB
 air_mozilla_org          |        355639812 | 339 MB
 kl                       |        239297028 | 228 MB
 kl2                      |        239256068 | 228 MB
 peterbecom               |        191914500 | 183 MB
 kintobench               |        125968556 | 120 MB
 airmozilla               |         41640452 | 40 MB
 socorro_webapp           |         32530948 | 31 MB
 tecken                   |         26706092 | 25 MB
 dailycookie              |         12935684 | 12 MB
 autocompeter             |         12313092 | 12 MB
 socorro_integration_test |         11428356 | 11 MB
 breakpad                 |         11313668 | 11 MB
 test_peterbecom          |          9298436 | 9081 kB
 battleshits              |          9028100 | 8817 kB
 thuawood2                |          8716804 | 8513 kB
 thuawood                 |          8667652 | 8465 kB
 fastestdb                |          8012292 | 7825 kB
 premailer                |          7676420 | 7497 kB
 crontabber               |          7586308 | 7409 kB
 postgres                 |          7536812 | 7360 kB
 crontabber_exampleapp    |          7488004 | 7313 kB
 whatsdeployed            |          7414276 | 7241 kB
 socorro_test             |          7315972 | 7145 kB
 template1                |          7307780 | 7137 kB
 template0                |          7143940 | 6977 kB
(26 rows)

Component, component function or plain function in React

06 February 2018 0 comments   ReactJS


tl;dr; Use React.PureComponent (or React.Component) if your component contains, or might contain, non-trivial logic that might affect it rendering or not. For all other cases, use a function, especially if it's not React specific.

Your choices

When you have state, especially good old this.state and this.setState you have to use React.PureComponent (or React.Component if you must).

For stateless functions, where you're just getting some props in, perhaps massaging them and rendering some JSX, you have choices.
You can write a React component in these three different ways:

Component

class MyComponent extends React.PureComponent {
  render() {
    return <h1>Hello {this.props.name}</h1>
  }
}

Component function

const MyComponent = ({ name }) => {
  return <h1>Hello {name}</h1>
}

Plain function

const MyComponent = name => {
  return <h1>Hello {name}</h1>
}

The first two can be used like this:

return (
  <div>
    <MyComponent name="Peter"/>
  </div>
)

The last one can be called directly:

return (
  <div>
    {MyComponent("Peter")}/>
  </div>
)

To be exact, you can actually call the second, component function, like this too:

return (
  <div>
    {MyComponent({name: "Peter"})}/>
  </div>
)

Example CodeSandbox here.

Each one has its strength and weaknesses.

Pros & cons for class MyComponent extends React.PureComponent

Pros & cons for const MyComponent = (...) =>

Benchmarking the difference

I don't know with confidence if this is the right way to test this but I really wanted to avoid process.env.NODE_ENV==='development' and I wanted to run each variant a bunch of times, because it feels more realistic, so as to avoid the slowness of the initial mounting.

So I made an app that looks like this:

class Components extends React.Component {
  render() {
    return <Component100 count={this.props.count} />;
  }
}

export default Components;

class Component100 extends React.PureComponent {
  render() {
    return <Component99 count={this.props.count} />;
  }
}
class Component99 extends React.PureComponent {
  render() {
    return <Component98 count={this.props.count} />;
  }
}

//...
//...you can imagine...
//...

class Component1 extends React.PureComponent {
  render() {
    return <Component0 count={this.props.count} />;
  }
}

class Component0 extends React.PureComponent {
  render() {
    collect('Components', performance.now());
    return <h1>Component0: {this.props.count}</h1>;
  }
}

This long chain of components calling "sub-components" starts right after the prop at the top changes. In the App that parents all of the variants, when the state changes the props change and it trickles down to that final last component. By taking a timestamp right before changing the state and during that last render you get a rough timeline for how long the whole chain took to render.

See the variants here:

Perhaps it's best to skim the code of the App.js too. It's a bit messy and there's a bunch of whacky code that uses global window to log all the timestamps but the gist is that it measures the few milliseconds it takes before a re-render is triggered until the final components render function gets called.

The app has a little hacky interval function that randomly switches between the different variants every 2 seconds and every 300 milliseconds it clicks a button, which changes the state which triggers a re-render.

Benchmark results

Results

Component style Median Comparison
Components 3.46ms 100%
ComponentFunctions 3.04ms 14% faster
Functions 2.02ms 71% faster

This was done using React 16.2.0 with process.env.NODE_ENV === 'production' in Firefox 60.

Sample app
You can try for yourself here: https://peterbe.github.io/function-or-component/

It might break when you click Reset. If it doesn't work very well in github.io, just download it and test locally.

Discussion

Here's my rule of thumb, the life-cycle hooks are awesome. I often write a component, using ...extends React.PureComponent even though it could be a plain function. But over time, eventually you expand it and realize you need some life cycle hook. Or you might find that writing inline functions is getting messy. Or, you realize that this component is sometimes unnecessarily called by a more complicated parent, with the same props as last time!

The performance penalty, for using full React components, is small. It exists, but it's probably dwarfed by other costs such as mounting not to mention actual DOM updating. It's also very likely that your components could benefit more from avoiding render (which only shouldComponentUpdate really can do) than the cost of calling it. Meaning, if the slower component only has to render 500 times, marginally slower, than the function component rendering 1,000 times, then the slower sometimes-not-needing-to-render will eventually win the performance battle.

There is still value in the functional stateless component. See the pros & cons above. But one rule of thumb I have is that if the component is really simple and contains no fancy logic that might affect its rendering or not rendering, then use components as functions. They're "sending a message" (to the code reader) by being brief and simple. For example, I have this little snippet in my Common.js module:

export const formatFileSize = (bytes, decimals = 0) => {
  if (!bytes) return '0 bytes'
  const k = 1024
  const dm = decimals + 1 || 3
  const sizes = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
  const i = Math.floor(Math.log(bytes) / Math.log(k))
  return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + sizes[i]
}

It's got nothing to do with React and that becomes extra obvious simply my looking at it. It's cleary got just one job. It's used a lot and often by more complicated components.

Last but not least; I'm very aware that the much more experienced React gurus of the world have already said something similar but with more accuracy. But I didn't want to just blurt out my opinion without adding some meat and some numbers to it. And I've always disliked the confusion that there's a choice at all so hopefully this blog post will help someone else who still suffers from having to wonder when to use which.

This tweet sums it up well:
Craig Kerstiens tweet

Convert web page to PDF, nicely

04 February 2018 0 comments   Web development, Misc. links

http://pdf.fivefilters.org/simple-print/


I saw about this service on Hacker News and I'm impressed. It can convert any article-like web page into a PDF. Not the first time we've seen that but this service really gets it right.

Here's example, when I print one of my own blog posts:

Blog post on Simple Print
Blog post on Simple Print. PDF download

Blog post via Firefox print to PDF
Regular print to PDF

Does that look scrumptious? It drops one of the images but it really gets the layout right.

I'm not sure this beats the neat integration that Pocket has but it certainly is a nice hack. Which reminds me, I really need to improve my print.css stylesheet.

Fastest way to unzip a zip file in Python

31 January 2018 10 comments   Python


So the context is this; a zip file is uploaded into a web service and Python then needs extract that and analyze and deal with each file within. In this particular application what it does is that it looks at the file's individual name and size, compares that to what has already been uploaded in AWS S3 and if the file is believed to be different or new, it gets uploaded to AWS S3.

Uploads today
The challenge is that these zip files that come in are huuuge. The average is 560MB but some are as much as 1GB. Within them, there are mostly plain text files but there are some binary files in there too that are huge. It's not unusual that each zip file contains 100 files and 1-3 of those make up 95% of the zip file size.

At first I tried unzipping the file, in memory, and deal with one file at a time. That failed spectacularly with various memory explosions and EC2 running out of memory. I guess it makes sense. First you have the 1GB file in RAM, then you unzip each file and now you have possibly 2-3GB all in memory. So, the solution, after much testing, was to dump the zip file to disk (in a temporary directory in /tmp) and then iterate over the files. This worked much better but I still noticed the whole unzipping was taking up a huge amount of time. Is there perhaps a way to optimize that?

Baseline function

First it's these common functions that simulate actually doing something with the files in the zip file:

def _count_file(fn):
    with open(fn, 'rb') as f:
        return _count_file_object(f)

def _count_file_object(f):
    # Note that this iterates on 'f'.
    # You *could* do 'return len(f.read())'
    # which would be faster but potentially memory 
    # inefficient and unrealistic in terms of this 
    # benchmark experiment. 
    total = 0
    for line in f:
        total += len(line)
    return total

Here's the simplest one possible:

def f1(fn, dest):
    with open(fn, 'rb') as f:
        zf = zipfile.ZipFile(f)
        zf.extractall(dest)

    total = 0
    for root, dirs, files in os.walk(dest):
        for file_ in files:
            fn = os.path.join(root, file_)
            total += _count_file(fn)
    return total

If I analyze it a bit more carefully, I find that it spends about 40% doing the extractall and 60% doing the looping over files and reading their full length.

First attempt

My first attempt was to try to use threads. You create an instance of zipfile.ZipFile, extract every file name within and start a thread for each name. Each thread is given a function that does the "meat of the work" (in this benchmark, iterating over the file and getting its total size). In reality that function does a bunch of complicated S3, Redis and PostgreSQL stuff but in my benchmark I just made it a function that figures out the total length of file. The thread pool function:

def f2(fn, dest):

    def unzip_member(zf, member, dest):
        zf.extract(member, dest)
        fn = os.path.join(dest, member.filename)
        return _count_file(fn)

    with open(fn, 'rb') as f:
        zf = zipfile.ZipFile(f)
        futures = []
        with concurrent.futures.ThreadPoolExecutor() as executor:
            for member in zf.infolist():
                futures.append(
                    executor.submit(
                        unzip_member,
                        zf,
                        member,
                        dest,
                    )
                )
            total = 0
            for future in concurrent.futures.as_completed(futures):
                total += future.result()
    return total

Result: ~10% faster

Second attempt

So perhaps the GIL is blocking me. The natural inclination is to try to use multiprocessing to spread the work across multiple available CPUs. But doing so has the disadvantage that you can't pass around a non-pickleable object so you have to send just the filename to each future function:

def unzip_member_f3(zip_filepath, filename, dest):
    with open(zip_filepath, 'rb') as f:
        zf = zipfile.ZipFile(f)
        zf.extract(filename, dest)
    fn = os.path.join(dest, filename)
    return _count_file(fn)



def f3(fn, dest):
    with open(fn, 'rb') as f:
        zf = zipfile.ZipFile(f)
        futures = []
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for member in zf.infolist():
                futures.append(
                    executor.submit(
                        unzip_member_f3,
                        fn,
                        member.filename,
                        dest,
                    )
                )
            total = 0
            for future in concurrent.futures.as_completed(futures):
                total += future.result()
    return total

Result: ~300% faster

That's cheating!

The problem with using a pool of processors is that it requires that the original .zip file exists on disk. So in my web server, to use this solution, I'd first have to save the in-memory ZIP file to disk, then invoke this function. Not sure what the cost of that it's not likely to be cheap.

Well, it doesn't hurt to poke around. Perhaps, it could be worth it if the extraction was significantly faster.

But remember! This optimization depends on using up as many CPUs as it possibly can. What if some of those other CPUs are needed for something else going on in gunicorn? Those other processes would have to patiently wait till there's a CPU available. Since there's other things going on in this server, I'm not sure I'm willing to let on process take over all the other CPUs.

Conclusion

Doing it serially turns out to be quite nice. You're bound to one CPU but the performance is still pretty good. Also, just look at the difference in the code between f1 and f2! With concurrent.futures pool classes you can cap the number of CPUs it's allowed to use but that doesn't feel great either. What if you get the number wrong in a virtual environment? Of if the number is too low and don't benefit any from spreading the workload and now you're just paying for overheads to move the work around?

I'm going to stick with zipfile.ZipFile(file_buffer).extractall(temp_dir). It's good enough for this.

Want to try your hands on it?

I did my benchmarking using a c5.4xlarge EC2 server. The files can be downloaded from:

wget https://www.peterbe.com/unzip-in-parallel/hack.unzip-in-parallel.py
wget https://www.peterbe.com/unzip-in-parallel/symbols-2017-11-27T14_15_30.zip

The .zip file there is 34MB which is relatively small compared to what's happening on the server.

The hack.unzip-in-parallel.py is a hot mess. It contains a bunch of terrible hacks and ugly stuff but hopefully it's a start.

Make .local domains NOT slow in macOS

29 January 2018 0 comments   MacOSX, Linux


Problem

I used to have a bunch of domains in /etc/hosts like peterbecom.dev for testing Nginx configurations locally. But then it became impossible to test local sites in Chrome because an .dev is force redirected to HTTPS. No problem, so I use .local instead. However, DNS resolution was horribly slow. For example:

time curl -I http://peterbecom.local/about/minimal.css > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  1763    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
curl -I http://peterbecom.local/about/minimal.css > /dev/null  0.01s user 0.01s system 0% cpu 5.585 total

5.6 seconds to open a local file in Nginx.

Solution

Here's that one weird trick to solve it: Add an entry for IPv4 AND IPv6 in /etc/hosts.

So now I have:

▶ cat /etc/hosts | grep peterbecom
127.0.0.1       peterbecom.local
::1             peterbecom.local

Verification

Ah! Much better. Thing are fast again:

time curl -I http://peterbecom.local/about/minimal.css > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  1763    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl -I http://peterbecom.local/about/minimal.css > /dev/null  0.01s user 0.01s system 37% cpu 0.041 total

0.04 seconds instead of 5.6.

Even more aggressively trying to preload your next page load

22 January 2018 2 comments   Javascript, Web development


In 2014 I tried out an experiment to "Aggressively prefetching everything you might click". It was received with mixed reviews. Today, 4 years later, I stand by that experiment/solution and I even like it so much that I've decided to extend it.

How it works

The gist of the solution is that if you mouse hover over an internal link, with a 200ms delay, an XHR request is made to that URL as a simple GET. Suppose the XHR finishes loading in, say 300ms, and you eventually click the link, by the time it tries to load it, it loads it straight from your browser cache. You get that "instant load" feel and it makes navigating the site more enjoyable. Suppose that you're really fast with your mouse/trackpad and you click the link faster than 500ms (but slower than 200ms) the XHR request gets automatically cancelled by the browser. When your browser loads the new page, it basically has to start from scratch. No harm done. Just not as fast.

Sure, there is a chance that you hover over a link, and stay hovering for more than 200ms but then decide to not click on it. Then the XHR preload was a waste of resources.
But!! If you even have a mouse cursor, the chances that you're on a WiFi connected laptop.

None of this "kicks in" when you're on a mobile device. The onMouseOver event won't trigger. And, I dare to say that only on mobile devices does it strongly matter to reduce the stuff the client has to download. So what's the harm of forcing your laptop to download a couple of extra kilobytes? If you hover over the link, the chances are, after all, that you will click the link.

Even more aggressive

Today I decided to step it up even more. Now, after the HTML has been downloaded, the HTML downloaded is scanned with a regular expression for image URLs that sit on my CDN (where I host all images with far-future cache headers). The first 5 image URLs are preloaded so that when you eventually make that link click, not only is the page load instant, but most images are too.

What do you think? Too aggressive or genius?

Before hovering
Before hovering over the "About" link

After hovering
After hovering over the "About" link

Now, if I go ahead and make the click, the HTML load will be instant and the first 3 images will be instant too.

Show me the code!

It ain't pretty but it works: prefetcher.js

Yes, it's jQuery and I'm OK with that. Yes, the CDN domain name is hardcoded and if this was a work project I'd never do that. Heck, the ultimate reason I'm blogging about this is ultimately to share/teach. When you build something similar you can do it more robustly.

minimalcss 0.6.2 now strips all unused font faces

22 January 2018 0 comments   Node, Javascript, Web development


minimalcss is a Node API and cli app to analyze the minimal CSS needed for initial load. One of it's killer features is that all CSS parsing is done the "proper way". Meaning, it's reduced down to an AST that can be iterated over, mutated and serialized back to CSS as a string.

Thanks to this, together with my contributors @stereobooster and @lahmatiy, minimalcss can now figure out which @font-face rules are redundant and can be "safely" removed. It can make a big difference on web performance. Either because it prevents expensive network requests of downloading some https://fonts.gstatic.com/s/lato/v14/hash.woff2 or downloading base64 encoded fonts.

For example, this very blog uses Semantic UI which is a wonderful CSS framework. But it's quite expensive and contains a bunch of base64 encoded fonts. The Ratings module uses a @font-face rule that weighes about 15KB.

Sure, you don't have to download and insert semanticui.min.css in your HTML but it's just sooo convenient. Especially when there's tools like minimalcss that allows you to be "lazy" but get that perfect first load web performance thing.
So, the CSS when doing a search looks like this:

Unoptimized
126KB of CSS (gzipped) transferred and 827KB of CSS parsed.

Let's run this through minimalcss instead:

$ minimalcss.js --verbose -o /tmp/peterbe.search.css "https://www.peterbe.com/search?q=searching+for+something"
$ ls -lh /tmp/peterbe.search.css
-rw-r--r--  1 peterbe  wheel    27K Jan 22 09:59 /tmp/peterbe.search.css
$ head -n 14 /tmp/peterbe.search.css
/*
Generated 2018-01-22T14:59:05.871Z by minimalcss.
Took 4.43 seconds to generate 26.85 KB of CSS.
Based on 3 stylesheets totalling 827.01 KB.
Options: {
  "urls": [
    "https://www.peterbe.com/search?q=searching+for+something"
  ],
  "debug": false,
  "loadimages": false,
  "withoutjavascript": false,
  "viewport": null
}
*/

And let's simulate it being gzipped:

$ gzip /tmp/peterbe.search.css
$ ls -lh /tmp/peterbe.search.css.gz
-rw-r--r--  1 peterbe  wheel   6.0K Jan 22 09:59 /tmp/peterbe.search.css.gz

Wow! Instead of downloading 27KB you only need 6KB. CSS parsing isn't as expensive as JavaScript parsing but it's nevertheless a saving of 827KB - 27KB = 800KB of CSS for the browser to not have to worry about. That's awesome!

By the way, the produced minimal CSS contains a lot of license preamble as left over from the fact that the semanticui.min.css is made up of components. See the gist itself.
Out of the total size of 27KB (uncompressed) 8KB is just the license preambles. minimalcss does not attempt to touch that when it minifies but you could easily add your own little tooling to re-write it, since there's a lot of repetition and save another ~7KB. However, all that repetition compresses well so it might not be worth it.

Conditional aggregation in Django 2.0

12 January 2018 4 comments   PostgreSQL, Django, Python


Django 2.0 came out a couple of weeks ago. It now supports "conditional aggregation" which is SQL standard I didn't even know about.

Before

So I have a Django app which has an endpoint that generates some human-friendly stats about the number of uploads (and their total size) in various different time intervals.

First of all, this is how it set up the time intervals:

today = timezone.now()
start_today = today.replace(hour=0, minute=0, second=0)
start_yesterday = start_today - datetime.timedelta(days=1)
start_this_month = today.replace(day=1)
start_this_year = start_this_month.replace(month=1)

And then, for each of these, there's a little function that returns a dict for each time interval:

def count_and_size(qs, start, end):
    sub_qs = qs.filter(created_at__gte=start, created_at__lt=end)
    return {
        'count': sub_qs.count(),
        'total_size': sub_qs.aggregate(size=Sum('size'))['size'],
}

numbers['uploads'] = {
    'today': count_and_size(upload_qs, start_today, today),
    'yesterday': count_and_size(upload_qs, start_yesterday, start_today),
    'this_month': count_and_size(upload_qs, start_this_month, today),
    'this_year': count_and_size(upload_qs, start_this_year, today),
}

What you get is exactly 2 x 4 = 8 queries. One COUNT and one SUM for each time interval. E.g.

SELECT SUM("upload_upload"."size") AS "size" 
FROM "upload_upload" 
WHERE ("upload_upload"."created_at" >= ...

SELECT COUNT(*) AS "__count" 
FROM "upload_upload" 
WHERE ("upload_upload"."created_at" >= ...

...6 more queries...

Middle

Oops. I think this code comes from a slightly rushed job. We can do the COUNT and the SUM at the same time for each query.

# New, improved count_and_size() function!
def count_and_size(qs, start, end):
    sub_qs = qs.filter(created_at__gte=start, created_at__lt=end)
    return sub_qs.aggregate(
        count=Count('id'),
        total_size=Sum('size'),
    )

numbers['uploads'] = {
    'today': count_and_size(upload_qs, start_today, today),
    'yesterday': count_and_size(upload_qs, start_yesterday, start_today),
    'this_month': count_and_size(upload_qs, start_this_month, today),
    'this_year': count_and_size(upload_qs, start_this_year, today),
}

Much better, now there's only one query per time bucket. So 4 queries in total. E.g.

SELECT COUNT("upload_upload"."id") AS "count", SUM("upload_upload"."size") AS "total_size" 
FROM "upload_upload" 
WHERE ("upload_upload"."created_at" >= ...

...3 more queries...

After

But we can do better than that! Instead, we use conditional aggregation. The syntax gets a bit hairy because there's so many keyword arguments, but I hope I've indented it nicely so it's easy to see how it works:

def make_q(start, end):
    return Q(created_at__gte=start, created_at__lt=end)

q_today = make_q(start_today, today)
q_yesterday = make_q(start_yesterday, start_today)
q_this_month = make_q(start_this_month, today)
q_this_year = make_q(start_this_year, today)

aggregates = upload_qs.aggregate(
    today_count=Count('pk', filter=q_today),
    today_total_size=Sum('size', filter=q_today),

    yesterday_count=Count('pk', filter=q_yesterday),
    yesterday_total_size=Sum('size', filter=q_yesterday),

    this_month_count=Count('pk', filter=q_this_month),
    this_month_total_size=Sum('size', filter=q_this_month),

    this_year_count=Count('pk', filter=q_this_year),
    this_year_total_size=Sum('size', filter=q_this_year),
)
numbers['uploads'] = {
    'today': {
        'count': aggregates['today_count'],
        'total_size': aggregates['today_total_size'],
    },
    'yesterday': {
        'count': aggregates['yesterday_count'],
        'total_size': aggregates['yesterday_total_size'],
    },
    'this_month': {
        'count': aggregates['this_month_count'],
        'total_size': aggregates['this_month_total_size'],
    },
    'this_year': {
        'count': aggregates['this_year_count'],
        'total_size': aggregates['this_year_total_size'],
    },
}

Voila! One single query to get all those pieces of data.
The SQL sent to PostgreSQL looks something like this:

SELECT 
  COUNT("upload_upload"."id") FILTER (WHERE ("upload_upload"."created_at" >= ...)) AS "today_count", 
  SUM("upload_upload"."size") FILTER (WHERE ("upload_upload"."created_at" >= ...)) AS "today_total_size",

  COUNT("upload_upload"."id") FILTER (WHERE ("upload_upload"."created_at" >= ...)) AS "yesterday_count", 
  SUM("upload_upload"."size") FILTER (WHERE ("upload_upload"."created_at" >= ...)) AS "yesterday_total_size", 

  ...

FROM "upload_upload";

Is this the best thing to do? I'm starting to have my doubts.

Watch Out!

When I take this now 1 monster query for a spin with an EXPLAIN ANALYZE prefix I notice something worrying!

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=74.33..74.34 rows=1 width=16) (actual time=0.587..0.587 rows=1 loops=1)
   ->  Seq Scan on upload_upload  (cost=0.00..62.13 rows=813 width=16) (actual time=0.012..0.210 rows=813 loops=1)
 Planning time: 0.427 ms
 Execution time: 0.674 ms
(4 rows)

A sequential scan! That's terrible. The created_at column is indexed in a BTREE so why can't it use the index.

The short answer is: I don't know!
I've uploaded a reduced, but still complete, example demonstrating this in a gist. It's very similar to the example in the stackoverflow question I asked.

So what did I do? I went back to the "middle" solution. One SELECT query per time bucket. So 4 queries in total, but at least all 4 is able to use an index.