Inside a step in a GitHub Action, I want to run a script, and depending on the outcome of that, maybe do some more things. Essentially, if the script fails, I want to print some extra user-friendly messages, but the whole Action should still fail with the same exit code.
In pseudo-code, this is what I want to achieve:
exit_code = that_other_script()
if exit_code > 0:
    print("Extra message if it failed")
exit(exit_code)
So here's how to do that with bash:
# In case set -e is on (as it is by default in GitHub Actions steps),
# make sure the step proceeds even if a command exits non-zero
set +e
./script/update-internal-links.js --check
exit_code=$?
if [ "$exit_code" -ne 0 ]; then
echo "Extra message here informing that the script failed"
exit $exit_code
fi
The origin, for me, was a GitHub Action that calls another script that might fail. If it fails, I wanted to print out a verbose extra hint to whoever looks at the output. Steps in GitHub Actions run with set -e by default, I believe, meaning that as soon as anything goes wrong the step exits, and only the steps marked with if: ${{ failure() }} run next.
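By the way, a variant that doesn't touch set +e at all is to hang the extra handling off a ||, since a failing command on the left-hand side of || doesn't trigger errexit. A minimal sketch of that, reusing the same script as above:

# set -e can stay on: a failing command on the left of || doesn't
# abort the step, and $? inside the block is that command's exit code
./script/update-internal-links.js --check || {
  exit_code=$?
  echo "Extra message here informing that the script failed"
  exit "$exit_code"
}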
tl;dr sort myfile.log | uniq -c | sort -n -r
I wanted to count recurring lines in a log file and started writing a complicated Python script, but then wondered if I could just do it with bash basics.
And after some poking and experimenting I found a really simple one-liner that I'm going to try to remember for next time:
You can't argue with the nice results :)
▶ cat myfile.log
one
two
three
one
two
one
once
one
▶ sort myfile.log | uniq -c | sort -n -r
4 one
2 two
1 three
1 once
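And if you want to play with the pipeline without digging up a log file, here's a tiny self-contained sketch (the sample lines are made up); the second variant tacks on head to only keep the most frequent lines:

# Feed some made-up lines straight into the same pipeline
printf 'one\ntwo\nthree\none\ntwo\none\nonce\none\n' | sort | uniq -c | sort -n -r

# Same thing, but only keep the top 3 most frequent lines
printf 'one\ntwo\nthree\none\ntwo\none\nonce\none\n' | sort | uniq -c | sort -n -r | head -n 3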
tl;dr; fd -I -t d node_modules | rg -v 'node_modules/(\w|@)' | xargs du -sh | sort -hr
It's very possible that there's a tool that does this, but if so please enlighten me.
The objective is to find which of all your various projects' node_modules directories is eating up the most disk space. The challenge is that you often have nested node_modules directories within them, and those shouldn't be included.
The command uses fd, which comes from brew install fd and is a fast alternative to the built-in find. Definitely worth investing in if you like to live fast on the command line. The other important command here is rg, which comes from brew install ripgrep and is a fast alternative to the built-in grep. Sure, I think one can use find and grep instead, but that can be left as an exercise for the reader (there's a rough sketch after the breakdown below).
▶ fd -I -t d node_modules | rg -v 'node_modules/(\w|@)' | xargs du -sh | sort -hr
1.1G    ./GROCER/groce/node_modules/
1.0G    ./SHOULDWATCH/youshouldwatch/node_modules/
826M    ./PETERBECOM/django-peterbecom/adminui/node_modules/
679M    ./JAVASCRIPT/wmr/node_modules/
546M    ./WORKON/workon-fire/node_modules/
539M    ./PETERBECOM/chiveproxy/node_modules/
506M    ./JAVASCRIPT/minimalcss-website/node_modules/
491M    ./WORKON/workon/node_modules/
457M    ./JAVASCRIPT/battleshits/node_modules/
445M    ./GITHUB/DOCS/docs-internal/node_modules/
431M    ./GITHUB/DOCS/docs/node_modules/
418M    ./PETERBECOM/preact-cli-peterbecom/node_modules/
418M    ./PETERBECOM/django-peterbecom/adminui0/node_modules/
399M    ./GITHUB/THEHUB/thehub/node_modules/
...
How it works:
- fd -I -t d node_modules: Find all directories called node_modules, but ignore any .gitignore directives in their parent directories.
- rg -v 'node_modules/(\w|@)': Exclude all matches where node_modules/ is followed by a @ or a word character ([A-Za-z0-9_]), i.e. the nested ones.
- xargs du -sh: For each line, run du -sh on it. That's like doing cd some/directory && du -sh, where du means "disk usage", -s means total, and -h means human-readable.
- sort -hr: Sort by the first column as a "human numeric sort", meaning it understands that "1M" is more than "20K".

Now, if I want to free up some disk space, I can look through the list and if I recognize a project I almost never work on anymore, I just send it to rm -fr.
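And for the curious: a rough, untested sketch of the find-and-grep exercise mentioned above. It leans on find's -prune, which stops find from descending into matched directories, so the nested node_modules never show up in the first place (and find doesn't respect .gitignore, so there's no equivalent of fd's -I to worry about):

# Print every top-level node_modules directory without descending into it,
# then measure and sort exactly like before
find . -type d -name node_modules -prune -print0 | xargs -0 du -sh | sort -hr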
Today I stumbled across a neat CLI for benchmark-comparing other CLIs for speed: hyperfine, by David Peter (@sharkdp).
It's a great tool in your arsenal for quick benchmarks in the terminal.
It's written in Rust and is easily installed with brew install hyperfine. For example, let's compare a couple of different commands for compressing a file into a new compressed file. I know it's comparing apples and oranges, but it's just an example:
It basically executes the following commands over and over and then compares how long each one took on average:
apack log.log.apack.gz log.log
gzip -k log.log
zstd log.log
brotli -3 log.log
If you're curious about the ~results~ apples vs oranges, the final result is:
▶ ls -lSh log.log*
-rw-r--r--  1 peterbe  staff   25M Jul  3 10:39 log.log
-rw-r--r--  1 peterbe  staff  2.4M Jul  5 22:00 log.log.apack.gz
-rw-r--r--  1 peterbe  staff  2.4M Jul  3 10:39 log.log.gz
-rw-r--r--  1 peterbe  staff  2.2M Jul  3 10:39 log.log.zst
-rw-r--r--  1 peterbe  staff  2.1M Jul  3 10:39 log.log.br
The point is that you type hyperfine followed by each command in quotation marks. The --prepare command is run before each timing run, and you can also use --cleanup="{cleanup command here}".
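Here's a made-up sketch of how the two flags pair up, reusing the gzip example from above (the file names are just the ones from this post):

# --prepare runs before each timing run; --cleanup runs after all the
# runs of a command are done. Both just remove the output file here.
hyperfine --prepare "rm -f log.log.gz" --cleanup "rm -f log.log.gz" "gzip -k log.log"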
It's versatile, so it doesn't have to be different commands; it can be, for example, hyperfine "python optimization1.py" "python optimization2.py" to compare two Python scripts.
🎵 You can also export the output to a Markdown file. Here, I used:
▶ hyperfine "apack log.log.apack.gz log.log" "gzip -k log.log" "zstd log.log" "brotli -3 log.log" --prepare="rm -fr log.log.*" --export-markdown log.compress.md
▶ cat log.compress.md | pbcopy
and it becomes this:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| apack log.log.apack.gz log.log | 291.9 ± 7.2 | 283.8 | 304.1 | 4.90 ± 0.19 |
| gzip -k log.log | 240.4 ± 7.3 | 232.2 | 256.5 | 4.03 ± 0.18 |
| zstd log.log | 59.6 ± 1.8 | 55.8 | 65.5 | 1.00 |
| brotli -3 log.log | 122.8 ± 4.1 | 117.3 | 132.4 | 2.06 ± 0.09 |
tl;dr; Here's a useful bash script to avoid starting something when it's already running as a ghost process.
Huey is a great little Python library for doing background tasks. It's like Celery but much lighter, faster, and easier to understand.
What cost me almost an hour of hair-tearing debugging today was that I didn't realize that a huey daemon process had gotten stuck in the background with code that wasn't updating as I made changes to the tasks.py file in my project. I just couldn't understand what was going on.
The way I start my project is with honcho, which is a Python Foreman clone. The Procfile looks something like this:
elasticsearch: cd /Users/peterbe/dev/PETERBECOM/elasticsearch-7.7.0 && ./bin/elasticsearch -q
web: ./bin/run.sh web
minimalcss: cd minimalcss && PORT=5000 yarn run start
huey: ./manage.py run_huey --flush-locks --huey-verbose
adminui: cd adminui && yarn start
pulse: cd pulse && yarn run dev
And you start that with simply typing:
honcho start
When you Ctrl-C, it kills all those processes but somehow somewhere it doesn't always kill everything. Restarting the computer isn't a fun alternative.
So, to prevent my sanity from draining, I wrote this script:
#!/usr/bin/env bash
set -eo pipefail
# This is used to make sure that, before you start huey,
# there isn't already one running in the background.
# It has happened that huey gets stuck lingering as a
# ghost, and it's hard to notice it sitting there
# lurking and being weird.
bad() {
echo "Huey is already running!"
exit 1
}
good() {
echo "Huey is NOT already running"
exit 0
}
ps aux | rg huey | rg -v 'rg huey' | rg -v 'huey-isnt-running.sh' && bad || good
(If you're wondering what rg is: it's short for ripgrep.)
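For what it's worth, if you'd rather not depend on ripgrep for this check, a rough equivalent using pgrep might look something like this (untested; it assumes the lingering process's command line contains run_huey):

#!/usr/bin/env bash
set -eo pipefail

# pgrep -f matches against the full command line, so a lingering
# "./manage.py run_huey ..." process will be found; pgrep also
# excludes itself, so no need for the 'rg -v' dance.
if pgrep -f run_huey > /dev/null; then
  echo "Huey is already running!"
  exit 1
fi
echo "Huey is NOT already running"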
And I changed my Procfile accordingly:
-huey: ./manage.py run_huey --flush-locks --huey-verbose
+huey: ./bin/huey-isnt-running.sh && ./manage.py run_huey --flush-locks --huey-verbose
There really isn't much rocket science or brain surgery in this blog post, but I hope it shows someone who's been in similar trenches that a simple bash script can make all the difference.