How to restore all unstaged files with git

February 8, 2024
0 comments GitHub, MacOSX, Linux

tl;dr git restore -- .

I can't believe I didn't know this! Maybe, at one point, I did, but I've since forgotten.

You're in a Git repo and you have edited 4 files and run git status and see this:


❯ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   four.txt
    modified:   one.txt
    modified:   three.txt
    modified:   two.txt

no changes added to commit (use "git add" and/or "git commit -a")

Suppose you realize: "Oh no! I didn't mean to make those changes in three.txt." You can restore that file by mentioning it by name:


❯ git restore three.txt

❯ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   four.txt
    modified:   one.txt
    modified:   two.txt

no changes added to commit (use "git add" and/or "git commit -a")

Now, suppose you realize you want to restore all of those modified files. How do you restore them all without mentioning each and every one by name? Simple:


❯ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   four.txt
    modified:   one.txt
    modified:   two.txt

no changes added to commit (use "git add" and/or "git commit -a")

❯ git restore -- .

❯ git status
On branch main
nothing to commit, working tree clean

The "trick" is: git restore -- .

As far as I understand, restore is the new word for checkout. You can equally run git checkout -- . too.
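
For completeness, here's roughly what the older-style equivalent looks like (the -- separates options from pathspecs, and . means every path under the current directory; you could equally pass a narrower pathspec, such as a subdirectory):


❯ git checkout -- .

❯ git status
On branch main
nothing to commit, working tree clean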

How slow is Node to Brotli decompress a file compared to not having to decompress?

January 19, 2024
3 comments Node, MacOSX, Linux

tl;dr; Not very slow.

At work, we have some very large .json files that get included in a Docker image. The Node server then opens these files at runtime and displays certain data from them. To keep the Docker image from being too large, we compress these .json files at build-time. We compress the .json files with Brotli to make a .json.br file. Then, in the Node server code, we read them in and decompress them at runtime. It looks something like this:


import fs from "fs";
import { brotliDecompressSync } from "zlib";

export function readCompressedJsonFile(xpath) {
  return JSON.parse(brotliDecompressSync(fs.readFileSync(xpath)));
}

The advantage of compressing them first, at build time (which is GitHub Actions), is that the Docker image becomes smaller, which is advantageous when shipping that image to a registry and asking Azure App Service to deploy it. But I was wondering: is this a smart trade-off? In a sense, why compromise on runtime (which faces users) to save time and resources at build-time, which happens mostly away from the eyes of users? The question was: how much overhead is it to have to decompress the files after their data has been read from disk into memory?
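
For context, the build-time side is equally small. Here's a minimal sketch (not our actual build script) of compressing one of these files with Node's zlib; the file names are just the ones used in the benchmark below:


import fs from "fs";
import { brotliCompressSync } from "zlib";

// Read the plain JSON and write a Brotli-compressed .br copy next to it
const raw = fs.readFileSync("pageinfo-en.json");
fs.writeFileSync("pageinfo-en.json.br", brotliCompressSync(raw));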

The benchmark

The files I test with are as follows:

ls -lh pageinfo*
-rw-r--r--  1 peterbe  staff   2.5M Jan 19 08:48 pageinfo-en-ja-es.json
-rw-r--r--  1 peterbe  staff   293K Jan 19 08:48 pageinfo-en-ja-es.json.br
-rw-r--r--  1 peterbe  staff   805K Jan 19 08:48 pageinfo-en.json
-rw-r--r--  1 peterbe  staff   100K Jan 19 08:48 pageinfo-en.json.br

There are 2 groups:

  1. Only English (en)
  2. English, Japanese, and Spanish (en-ja-es), which makes it roughly 3 times larger

And for each file, you can see the effect of having compressed them with Brotli.

  1. The smaller JSON file compresses 8x
  2. The larger JSON file compresses 9x

Here's the benchmark code:


import fs from "fs";
import { brotliDecompressSync } from "zlib";
import { Bench } from "tinybench";

const JSON_FILE = "pageinfo-en.json";
const BROTLI_JSON_FILE = "pageinfo-en.json.br";
const LARGE_JSON_FILE = "pageinfo-en-ja-es.json";
const BROTLI_LARGE_JSON_FILE = "pageinfo-en-ja-es.json.br";

function f1() {
  const data = fs.readFileSync(JSON_FILE, "utf8");
  return Object.keys(JSON.parse(data)).length;
}

function f2() {
  const data = brotliDecompressSync(fs.readFileSync(BROTLI_JSON_FILE));
  return Object.keys(JSON.parse(data)).length;
}

function f3() {
  const data = fs.readFileSync(LARGE_JSON_FILE, "utf8");
  return Object.keys(JSON.parse(data)).length;
}

function f4() {
  const data = brotliDecompressSync(fs.readFileSync(BROTLI_LARGE_JSON_FILE));
  return Object.keys(JSON.parse(data)).length;
}

console.assert(f1() === 2633);
console.assert(f2() === 2633);
console.assert(f3() === 7767);
console.assert(f4() === 7767);

const bench = new Bench({ time: 100 });
bench.add("f1", f1).add("f2", f2).add("f3", f3).add("f4", f4);
await bench.warmup(); // make results more reliable, ref: https://github.com/tinylibs/tinybench/pull/50
await bench.run();

console.table(bench.table());

Here's the output from tinybench:

┌─────────┬───────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name │ ops/sec │ Average Time (ns)  │  Margin  │ Samples │
├─────────┼───────────┼─────────┼────────────────────┼──────────┼─────────┤
│    0    │   'f1'    │  '179'  │  5563384.55941942  │ '±6.23%' │   18    │
│    1    │   'f2'    │  '150'  │ 6627033.621072769  │ '±7.56%' │   16    │
│    2    │   'f3'    │  '50'   │ 19906517.219543457 │ '±3.61%' │   10    │
│    3    │   'f4'    │  '44'   │ 22339166.87965393  │ '±3.43%' │   10    │
└─────────┴───────────┴─────────┴────────────────────┴──────────┴─────────┘

Note, this benchmark was done on my 2019 Intel MacBook Pro. That disk is not what we get with the Alpine Docker image (running inside Azure App Service). Testing that would be a different story. But, at least we can test it in Docker locally.

I created a Dockerfile that contains...

ARG NODE_VERSION=20.10.0

FROM node:${NODE_VERSION}-alpine

and ran the same benchmark in there by running docker compose up --build. The results are:

┌─────────┬───────────┬─────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name │ ops/sec │ Average Time (ns)  │  Margin  │ Samples │
├─────────┼───────────┼─────────┼────────────────────┼──────────┼─────────┤
│    0    │   'f1'    │  '151'  │ 6602581.124978315  │ '±1.98%' │   16    │
│    1    │   'f2'    │  '112'  │  8890548.4166656   │ '±7.42%' │   12    │
│    2    │   'f3'    │  '44'   │ 22561206.40002191  │ '±1.95%' │   10    │
│    3    │   'f4'    │  '37'   │ 26979896.599974018 │ '±1.07%' │   10    │
└─────────┴───────────┴─────────┴────────────────────┴──────────┴─────────┘

Analysis/Conclusion

First, focusing on the smaller file: processing the .json is 25% faster than the .json.br file.

Then, the larger file: Processing the .json is 16% faster than the .json.br file

So that's what we're paying for a smaller Docker image. Depending on the size of the .json file, your app runs ~20% slower at this operation. But remember, as a file on disk (in the Docker image), it's ~8x smaller.

I think, in conclusion: it's a small price to pay. It's worth doing, but it depends on your context.
Keep in mind the numbers: that ~300KB pageinfo-en-ja-es.json.br file was processed 37 times in one second, which works out to roughly 27 milliseconds (1000/37) per read, decompress, and parse!

The caveats

To repeat what was mentioned above: this was run on my Intel MacBook Pro. It's likely to behave differently in a real Docker image running inside Azure.

The thing I wonder about most is arguably something that doesn't actually matter. 🙃
When you ask it to read in a .json.br file, there's less data to pull from disk into memory. That's a win. You lose on CPU work but gain on disk I/O. But only the net end result matters, so in a sense that's just an "implementation detail".

Admittedly, I don't know what the macOS or Linux kernel does about caching between the physical disk and RAM for these files. The benchmark effectively asks "Hey, hard disk, please send me a file called ..." and this could be cached in some layer beyond my knowledge/comprehension. In a real production server, this only happens once, because once the whole file is read, decompressed, and parsed, it won't be asked for again. Like, ever. But in a benchmark, perhaps the very first read of the file is slower and all the other runs are unrealistically faster.

Feel free to clone https://github.com/peterbe/reading-json-files and mess around to run your own tests. Perhaps see what effect async can have. Or perhaps try it with Bun and its file system API.

Search hidden directories with ripgrep, by default

December 30, 2023
0 comments MacOSX, Linux

Do you use rg (ripgrep) all the time on the command line? Yes, so do I. An annoying problem with it is that, by default, it does not search hidden directories.

"A file or directory is considered hidden if its base name starts with a dot character (.)."

One such directory, which is very important in my git/GitHub-based projects (which is all of mine, by the way), is the .github directory. So I cd into a project and it finds nothing:


cd ~/dev/remix-peterbecom
rg actions/setup-node
# Empty! I.e. no results

It doesn't find anything because the file .github/workflows/test.yml is part of a hidden directory.

The quick solution to this is to use --hidden:


❯ rg --hidden actions/setup-node
.github/workflows/test.yml
20:        uses: actions/setup-node@v4

I find it very rare that I would not want to search hidden directories. So I added this to my ~/.zshrc file:


alias rg='rg --hidden'

Now, this happens:


❯ rg actions/setup-node
.github/workflows/test.yml
20:        uses: actions/setup-node@v4

With that being set, it's actually possible to "undo" the behavior. You can use --no-hidden:


❯ rg --no-hidden actions/setup-node

And that can be useful if there is a hidden directory that is not git-ignored yet, for example .download-cache/.
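
One caveat with that alias, going by how ripgrep treats hidden directories in general (not something I ran into in this example): with --hidden, rg will also descend into the .git/ directory itself, which is usually just noise. You can exclude it with a glob:


alias rg='rg --hidden --glob "!.git"'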

fnm is much faster than nvm.

December 28, 2023
1 comment Node, MacOSX

I used nvm so that when I cd into a different repo, it would automatically load the appropriate version of node (and npm). Simply by doing cd ~/dev/remix-peterbecom, for example, the node executable would become whatever version the optional file ~/dev/remix-peterbecom/.nvmrc specifies, for example v18.19.0.
And nvm helps you install and get your hands on various versions of node to be able to switch between them. Much more fine-tuned than brew install node20.

The problem with all of this is that it's horribly slow. Opening a new terminal is annoyingly slow because that triggers entering a directory, and nvm slowly does its thing.

The solution is to ditch it and go for fnm instead. Please, if you're an nvm user, do consider making this same jump in 2024.

Installation

Running curl -fsSL https://fnm.vercel.app/install | bash basically does some brew installing, figures out what shell you have, and edits your shell config. By default, it put:


export PATH="/Users/peterbe/Library/Application Support/fnm:$PATH"
eval "`fnm env`"

...into my .zshrc file. But, I later learned you need to edit the last line to:


-eval "`fnm env`"
+eval "$(fnm env --use-on-cd)"

so that it automatically activates immediately after you've cd'ed into a directory.
If you had direnv doing this, get rid of that. fnm does not need direnv.

Now, create a fresh new terminal and it should be set up, including tab completion. You can test it by typing fnm[TAB]. You'll see:


❯ fnm
alias                   -- Alias a version to a common name
completions             -- Print shell completions to stdout
current                 -- Print the current Node.js version
default                 -- Set a version as the default version
env                     -- Print and set up required environment variables for fnm
exec                    -- Run a command within fnm context
help                    -- Print this message or the help of the given subcommand(s)
install                 -- Install a new Node.js version
list         ls         -- List all locally installed Node.js versions
list-remote  ls-remote  -- List all remote Node.js versions
unalias                 -- Remove an alias definition
uninstall               -- Uninstall a Node.js version
use                     -- Change Node.js version

Usage

If you have .nvmrc files sprinkled about from before, fnm will read those. If you cd into a directory that contains a .nvmrc whose version fnm hasn't installed yet, you get this:


❯ cd ~/dev/GROCER/groce/
Can't find an installed Node version matching v16.14.2.
Do you want to install it? answer [y/N]:

Neat!

But if you want to set it up from scratch, go into your directory of choice, type:


fnm ls-remote

...to see what versions of node you can install. Suppose you want v20.10.0 in the current directory; do these two commands:


fnm install v20.10.0
echo v20.10.0 > .node-version

That's it!

Notes

  • I prefer that .node-version convention so I've been going around doing mv .nvmrc .node-version in various projects

  • fnm ls is handy to see which ones you've installed already

  • If you want to temporarily use a specific version, simply type fnm use v16.20.2, for example

  • I heard good things about volta too but got a bit nervous when I found out it gets involved in installing packages and not just versions of node.

  • fnm does not concern itself with upgrading your node versions. To get the latest version of node v21.x, it's up to you to check fnm ls-remote and compare that with the output of node --version.
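
For example, a quick spot-check might look something like this (the grep filter is just an illustration):


fnm ls-remote | grep v21 | tail -n 1
node --version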

Be careful with Date.toLocaleDateString() in JavaScript

May 8, 2023
4 comments Node, MacOSX, JavaScript

tl;dr; Always pass timeZone: "UTC" when calling Date.toLocaleDateString

The surprise

In my browser's web console:

>>> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
"26"

On my server located in the same time zone:

Welcome to Node.js v16.13.0.
Type ".help" for more information.
> process.env.TZ
undefined
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'26'

Here on my laptop:

Welcome to Node.js v16.20.0.
Type ".help" for more information.
> process.env.TZ
undefined
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'27'

What! Despite $TZ not being set, it formats according to something else.

02:50 Zulu is, for me in the US Eastern time zone, still the day before.

Why this matters

(Screenshot: server-side React errors in the web console)
I kept getting this production error from React saying that the SSR-rendered HTML differed from the client-side rendered HTML. Strangely, I could never reproduce it locally, and the error doesn't say what's different. All the Stack Overflow suggestions and Google results speak of the most basic, easy things to check. It's not unusual for this to happen when dealing with dates, because even though the database (PostgreSQL) stores the dates in full UTC, sometimes when data travels via app servers through JSON pipelines, date formatting can drop important bits.
But here, '2014-11-27T02:50:49Z' is specific.

What made this so incredibly hard to debug was that it worked on one page but not on the other even though the two had the same exact component code. I broke it apart thinking there was something nasty in the content of the Markdown-rendered HTML. No. The reason it only happened on some pages was that I had a function that looked like this:


export function formatDateBasic(date: string) {
  return new Date(date).toLocaleDateString("en-us", {
    year: "numeric",
    month: "long",
    day: "numeric",
  });
}

And different pages listed, almost non-deterministically, related content with different dates. So on one page, there might be a single date that formats differently in EDT (Eastern daylight-saving time) compared to UTC, and on another page, none. For example, Apr 1 at 18:00 Zulu is still Apr 1 in EDT.
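
To make that concrete, here's an illustration (the exact date is mine, not from the bug) from a Node REPL on a machine whose time zone resolves to America/New_York:


> new Date('2023-04-01T18:00:00Z').toLocaleDateString("en-us", {day: "numeric"})
'1'
> new Date('2023-04-01T02:00:00Z').toLocaleDateString("en-us", {day: "numeric"})
'31'

The first result agrees with UTC; the second one doesn't, and that's exactly the kind of difference that only shows up on some pages.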

The explanation

I'm sorry that I don't understand this better, but Node's implementation of Date.toLocaleDateString does more than depend on process.env.TZ. I think $TZ is just a way to gain control.

For example, start the node REPL like this:

On my Ubuntu 20.04 server:

$ TZ=utc node
Welcome to Node.js v16.20.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'27'

On my MacBook:

❯ TZ=utc node
Welcome to Node.js v16.13.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'27'

To find out what timezone your computer has:

On Ubuntu:

$ timedatectl
               Local time: Mon 2023-05-08 12:42:03 UTC
           Universal time: Mon 2023-05-08 12:42:03 UTC
                 RTC time: Mon 2023-05-08 12:42:04
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

On macOS:

❯ sudo systemsetup -gettimezone
Password:
Time Zone: America/New_York
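
And from inside Node (or the browser) itself, you can ask what time zone the Intl machinery resolved to:


> Intl.DateTimeFormat().resolvedOptions().timeZone
'America/New_York'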

The solution

Setting TZ is probably a good thing. That can get a bit tricky though. Your code needs to run consistently on your laptop, in GitHub Actions, on a VPS server, in an Edge cloud function, etc.

A better way is to force Date.toLocaleDateString to be fed a time zone. Now it's controlled at the highest level:


export function formatDateBasic(date: string) {
  return new Date(date).toLocaleDateString("en-us", {
    year: "numeric",
    month: "long",
    day: "numeric",
+   timeZone: "UTC"
  });
}

Now, it no longer depends on the OS it runs on.

On my Ubuntu server:

Welcome to Node.js v16.20.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric", timeZone: "UTC"})
'27'

On my macOS:

Welcome to Node.js v16.13.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric", timeZone: "UTC"})
'27'

Fun fact

I made it unnecessarily weird for myself in the debugging session when I figured out the timeZone option. What I ran was this:

Welcome to Node.js v16.13.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric", zimeZone: "UTC"})
'26'

I expected it to be '27' now, so why did it revert? Notice the typo? It says zimeZone instead of timeZone, and Date.toLocaleDateString won't throw an error for passing in options it doesn't expect.
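
A related detail worth knowing (my observation, on the same Node version and machine time zone as above): unknown option keys are silently ignored, but an invalid value for an option it does recognize throws a RangeError:


> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {zimeZone: "UTC"})
'26'
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {timeZone: "Mars"})
Uncaught RangeError: Invalid time zone specified: Mars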

How to count the most common lines in a file

October 7, 2022
0 comments Bash, MacOSX, Linux

tl;dr sort myfile.log | uniq -c | sort -n -r

I wanted to count recurring lines in a log file and started writing a complicated Python script, but then wondered if I could just do it with bash basics.
And after some poking and experimenting, I found a really simple one-liner that I'm going to try to remember for next time.

You can't argue with the nice results :)

cat myfile.log
one
two
three
one
two
one
once
one

▶ sort myfile.log | uniq -c | sort -n -r
   4 one
   2 two
   1 three
   1 once
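
Why the first sort? Because uniq -c only collapses adjacent identical lines, so the input has to be sorted first; the trailing sort -n -r then orders the counted lines numerically, biggest count first. Annotated, the same one-liner reads:


sort myfile.log |   # put identical lines next to each other
  uniq -c |         # collapse them, prefixing each with its count
  sort -n -r        # order by that count, numerically, descending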