Peterbe.com

A blog and website by Peter Bengtsson

Create a large empty file for testing

Linux

Because I always end up Googling this and struggling to find it easily, I'm going to jot it down here so it's more present on the web for others (and myself!) to quickly find.

Suppose you want to test something like a benchmark; for example, a unit test that has to process a largish file. You can use the dd command which is available on macOS and most Linuxes.

▶ dd if=/dev/zero of=big.file count=1024 bs=1024

▶ ls -lh big.file
-rw-r--r--  1 peterbe  staff   1.0M Sep  8 15:54 big.file

So the count=1024 creates a 1MB file. To create a 500KB one you simply use...

▶ dd if=/dev/zero of=big.file count=500 bs=1024

▶ ls -lh big.file
-rw-r--r--  1 peterbe  staff   500K Sep  8 15:55 big.file

It creates a binary file so you can't cat view it. But if you try to use less, for example, you'll see this:

▶ less big.file
"big.file" may be a binary file.  See it anyway? [Enter]

^@^@^@...snip...^@^@^@
big.file (END)

Please post a comment if you have thoughts or questions.

Programmatically render a NextJS page without a server in Node

Web development, Node, JavaScript

https://github.com/peterbe/programmatically-render-next-page

If you use getServerSideProps() in Next you can render a page by visiting it. E.g. GET http://localhost:3000/mypages/page1
Or if you use getStaticProps() with getStaticPaths(), you can use npm run build to generate the HTML file (e.g. .next/server/pages directory).
But what if you don't want to start a server. What if you have a particular page/URL in mind that you want to generate but without starting a server and sending an HTTP GET request to it? This blog post shows a way to do this with a plain Node script.

Here's a solution to programmatically render a page:

#!/usr/bin/env node

import http from "http";

import next from "next";

async function main(uris) {
  const nextApp = next({});
  const nextHandleRequest = nextApp.getRequestHandler();
  await nextApp.prepare();

  const htmls = Object.fromEntries(
    await Promise.all(
      uris.map((uri) => {
        try {
          // If it's a fully qualified URL, make it its pathname
          uri = new URL(uri).pathname;
        } catch {}
        return renderPage(nextHandleRequest, uri);
      })
    )
  );
  console.log(htmls);
}

async function renderPage(handler, url) {
  const req = new http.IncomingMessage(null);
  const res = new http.ServerResponse(req);
  req.method = "GET";
  req.url = url;
  req.path = url;
  req.cookies = {};
  req.headers = {};
  await handler(req, res);
  if (res.statusCode !== 200) {
    throw new Error(`${res.statusCode} on rendering ${req.url}`);
  }
  for (const { data } of res.outputData) {
    const [, body] = data.split("\r\n\r\n");
    if (body) return [url, body];
  }
  throw new Error("No output data has a body");
}

main(process.argv.slice(2)).catch((err) => {
  console.error(err);
  process.exit(1);
});

To demonstrate I created this sample repo: https://github.com/peterbe/programmatically-render-next-page

Note, that you need to run npm run build first so Next can have all the static assets ready.

In conclusion

The alternative, in automation, would be run something like this:

▶ npm run build && npm run start &
▶ sleep 5  # give the server a chance to start
▶ xh http://localhost:3000/aboutus
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Tue, 06 Sep 2022 12:23:42 GMT
Etag: "m8ff9sdduo1hk"
Keep-Alive: timeout=5
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Powered-By: Next.js

<!DOCTYPE html><html><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><title>About Us page</title><meta name="description" content="We do things. I hope."/><link rel="icon" href="/favicon.ico"/><meta name="next-head-count" content="5"/><link rel="preload" href="/_next/static/css/ab44ce7add5c3d11.css" as="style"/><link rel="stylesheet" href="/_next/static/css/ab44ce7add5c3d11.css" data-n-g=""/><link rel="preload" href="/_next/static/css/ae0e3e027412e072.css" as="style"/><link rel="stylesheet" href="/_next/static/css/ae0e3e027412e072.css" data-n-p=""/><noscript data-n-css=""></noscript><script defer="" nomodule="" src="/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js"></script><script src="/_next/static/chunks/webpack-7ee66019f7f6d30f.js" defer=""></script><script src="/_next/static/chunks/framework-db825bd0b4ae01ef.js" defer=""></script><script src="/_next/static/chunks/main-3123a443c688934f.js" defer=""></script><script src="/_next/static/chunks/pages/_app-deb173bd80cbaa92.js" defer=""></script><script src="/_next/static/chunks/996-f1475101e84cf548.js" defer=""></script><script src="/_next/static/chunks/pages/aboutus-41b1f037d974ef60.js" defer=""></script><script src="/_next/static/REJUWXI26y-lp9JVmzJB5/_buildManifest.js" defer=""></script><script src="/_next/static/REJUWXI26y-lp9JVmzJB5/_ssgManifest.js" defer=""></script></head><body><div id="__next"><div class="Home_container__bCOhY"><main class="Home_main__nLjiQ"><h1 class="Home_title__T09hD">About Use page</h1><p class="Home_description__41Owk"><a href="/">Go to the <b>Home</b> page</a></p></main><footer class="Home_footer____T7K"><a href="/">Home page</a></footer></div></div><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{}},"page":"/aboutus","query":{},"buildId":"REJUWXI26y-lp9JVmzJB5","nextExport":true,"autoExport":true,"isFallback":false,"scriptLoader":[]}</script></body></html>

There are probably many great ideas that this can be used for. At work we use getServerSideProps() and we have too many pages to build them all statically. We need a solution like this to do custom analysis of the rendered HTML to check for broken links by analyzing every generated <a href> tag.

Please post a comment if you have thoughts or questions.

Join a list with a bitwise or operator in Python

Python

The bitwise OR operator in Python is often convenient when you want to combine multiple things into one thing. For example, with the Django ORM you might do this:

from django.db.models import Q

filter_ = Q(first_name__icontains="peter") | Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

See how it hardcodes the filtering on strings peter and ashley.
But what if that was a bit more complicated:

from django.db.models import Q

filter_ = Q(first_name__icontains="peter")
if include("ashley"):
    filter_ | = Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

So far, same functionality.

But what if the business logic is more complicated? You can't do this:

filter_ = None
if include("peter"):
    filter_ | = Q(first_name__icontains="peter")  # WILL NOT WORK
if include("ashley"):
    filter_ | = Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

What if the list of things you want to filter on depends on a list? You'd need to do the |= stuff "dynamically". One way to solve that is with functools.reduce. Suppose the list of things you want to bitwise-OR together is a list:

from django.db.models import Q
from operator import or_
from functools import reduce


def include(_):
    import random
    return random.random() > 0.5

filters = []
if include("peter"):
    filters.append(Q(first_name__icontains="peter"))
if include("ashley"):
    filters.append(Q(first_name__icontains="ashley"))

assert len(filters), "must have at least one filter"
filter_ = reduce(or_, filters)  # THE MAGIC!

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

And finally, if it's a list already:

from django.db.models import Q
from operator import or_
from functools import reduce

names = ["peter", "ashley"]
qs = [Q(first_name__icontains=x) for x in names]
filter_ = reduce(or_, qs)

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

Side note

Django's django.db.models.Q is actually quite flexible with used with MyModel.objects.filter(...) because this actually works:

from django.db.models import Q

def include(_):
    import random
    return random.random() > 0.5

filter_ = Q()  # MAGIC SAUCE
if include("peter"):
    filter_ |= Q(first_name__icontains="peter")
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

Please post a comment if you have thoughts or questions.

Comparing compression commands with hyperfine

Bash, MacOSX, Linux

Today I stumbled across a neat CLI for benchmark comparing CLIs for speed: hyperfine. By David @sharkdp Peter.
It's a great tool in your arsenal for quick benchmarks in the terminal.

It's written in Rust and is easily installed with brew install hyperfine. For example, let's compare a couple of different commands for compressing a file into a new compressed file. I know it's comparing apples and oranges but it's just an example:

hyperfine usage example
(click to see full picture)

It basically executes the following commands over and over and then compares how long each one took on average:

  • apack log.log.apack.gz log.log
  • gzip -k log.log
  • zstd log.log
  • brotli -3 log.log

If you're curious about the ~results~ apples vs oranges, the final result is:

▶ ls -lSh log.log*
-rw-r--r--  1 peterbe  staff    25M Jul  3 10:39 log.log
-rw-r--r--  1 peterbe  staff   2.4M Jul  5 22:00 log.log.apack.gz
-rw-r--r--  1 peterbe  staff   2.4M Jul  3 10:39 log.log.gz
-rw-r--r--  1 peterbe  staff   2.2M Jul  3 10:39 log.log.zst
-rw-r--r--  1 peterbe  staff   2.1M Jul  3 10:39 log.log.br

The point is that you type hyperfine followed by each command in quotation marks. The --prepare is run for each command and you can also use --cleanup="{cleanup command here}.

It's versatile so it doesn't have to be different commands but it can be: hyperfine "python optimization1.py" "python optimization2.py" to compare to Python scripts.

🎵 You can also export the output to a Markdown file. Here, I used:

▶ hyperfine "apack log.log.apack.gz log.log" "gzip -k log.log" "zstd log.log" "brotli -3 log.log" --prepare="rm -fr log.log.*" --export-markdown log.compress.md
▶ cat log.compress.md | pbcopy

and it becomes this:

Command Mean [ms] Min [ms] Max [ms] Relative
apack log.log.apack.gz log.log 291.9 ± 7.2 283.8 304.1 4.90 ± 0.19
gzip -k log.log 240.4 ± 7.3 232.2 256.5 4.03 ± 0.18
zstd log.log 59.6 ± 1.8 55.8 65.5 1.00
brotli -3 log.log 122.8 ± 4.1 117.3 132.4 2.06 ± 0.09

Please post a comment if you have thoughts or questions.

How to know if a PR has auto-merge enabled in a GitHub Action workflow

GitHub

tl;dr

      - name: Only if auto-merge is enabled
        if: ${{ github.event.pull_request.auto_merge }}
        run: echo "Auto-merge IS ENABLED"

      - name: Only if auto-merge is NOT enabled
        if: ${{ !github.event.pull_request.auto_merge }}
        run: echo "Auto-merge is NOT enabled"

The use case that I needed was that I have a workflow that does a bunch of things that aren't really critical to test the PR, but they also take a long time. In particular, every pull request deploys a "preview environment" so you get a "staging" site for each pull request. Well, if you know with confidence that you're not going to be clicking around on that preview/staging site, why bother deploying it (again)?

Also, a lot of PRs get the "Auto-merge" enabled because whoever pressed that button knows that as long as it builds OK, it's ready to merge in.

What's cool about the if: statements above is that they will work in all of these cases too:

on:
  workflow_dispatch:
  pull_request:
  push:
     branches:
       - main

I.e. if this runs because it was a push to main the line ${{ !github.event.pull_request.auto_merge }} will resolve to truthy. Same if you use the workflow dispatch from workflow_dispatch.

Please post a comment if you have thoughts or questions.

Auto-merge GitHub pull requests based on "partial required checks"

GitHub

Auto-merge is a fantastic GitHub Actions feature. You first need to set up some branch protections and then, as soon as you've created the PR you can press the "Enable auto-merge (squash)". It will ("Squash and merge") merge the PR as soon as all branch protection checks succeeded. Neat.

But what if you have a workflow that is made up of half critical and half not-so-important stuff. In particular, what if there's stuff in the workflow that is really slow and you don't want to wait. One example is that you might have a build-and-deploy workflow where you've decided that the "build" part of that is a required check, but the (slow) deployment is just a nice-to-have. Here's an example of that:

name: Build and Deploy stuff

on:
  workflow_dispatch:
  pull_request:


permissions:
  contents: read

jobs:
  build-stuff:
    runs-on: ubuntu-latest
    steps:
      - name: Slight delay
        run: sleep 5

  deploy-stuff:
    needs: build-stuff
    runs-on: ubuntu-latest
    steps:
      - name: Do something
        run: sleep 26

It's a bit artificial but perhaps you can see beyond that. What you can do is set up a required status check, as a branch protection, just for the build-stuff job.

Note how the job is made up of build-stuff and deploy-stuff, where the latter depends on the first. Now set up branch protection purely based on the build-stuff. This option should appear as you start typing buil there in the "Status checks that are required." section of Branch protections.

Branch protection

Now, when the PR is created it immediately starts working on that build-stuff job. While that's running you press the "Enable auto-merge (squash)" button:

Checks started

What will happen is that as soon as the build-stuff job (technically the full name becomes "Build and Deploy stuff / build-stuff") goes green, the PR is auto-merged. But the next (dependent) job deploy-stuff now starts so even if the PR is merged you still have an ongoing workflow job running. Note the little orange dot (instead of the green checkmark).

Still working

It's quite an advanced pattern and perhaps you don't have the use case yet, but it's good to know it's possible. What our use case at work was, was that we use auto-merge a lot in automation and our complete workflow depended on a slow step that is actually conditional (and a bit slow). So we didn't want the auto-merge to be delayed because of something that might be slow and might also turn out to not be necessary.

Please post a comment if you have thoughts or questions.