How to count the most common lines in a file
October 7, 2022
0 comments Bash, MacOSX, Linux
tl;dr sort myfile.log | uniq -c | sort -n -r
I wanted to count recurring lines in a log file and started writing a complicated Python script, but then wondered if I could just do it with bash basics.
And after some poking and experimenting, I found a really simple one-liner that I'm going to try to remember for next time:
You can't argue with the nice results :)
▶ cat myfile.log
one
two
three
one
two
one
once
one
▶ sort myfile.log | uniq -c | sort -n -r
4 one
2 two
1 three
1 once
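For what it's worth, the same counting can also be sketched in a single awk pass (my variation, not part of the one-liner above), which avoids the first sort. Here it recreates the sample file from above:

```shell
# Recreate the sample input from above.
printf 'one\ntwo\nthree\none\ntwo\none\nonce\none\n' > myfile.log

# Tally each distinct line in one pass, then sort by count, descending.
awk '{counts[$0]++} END {for (line in counts) print counts[line], line}' myfile.log | sort -nr
```

The counts come out the same; only the ordering of ties can differ from the sort | uniq -c version.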
Find the largest node_modules directories with bash
September 30, 2022
0 comments Bash, MacOSX, Linux
tl;dr; fd -I -t d node_modules | rg -v 'node_modules/(\w|@)' | xargs du -sh | sort -hr
It's very possible that there's a tool that does this, but if so please enlighten me.
The objective is to find which of all your various projects' node_modules
directory is eating up the most disk space.
The challenge is that often you have nested node_modules
within and they shouldn't be included.
The command uses fd, which comes from brew install fd and is a fast alternative to the built-in find. Definitely worth investing in if you like to live fast on the command line.
The other important command here is rg, which comes from brew install ripgrep and is a fast alternative to the built-in grep. Sure, I think one can use find and grep, but that can be left as an exercise to the reader.
▶ fd -I -t d node_modules | rg -v 'node_modules/(\w|@)' | xargs du -sh | sort -hr
1.1G    ./GROCER/groce/node_modules/
1.0G    ./SHOULDWATCH/youshouldwatch/node_modules/
826M    ./PETERBECOM/django-peterbecom/adminui/node_modules/
679M    ./JAVASCRIPT/wmr/node_modules/
546M    ./WORKON/workon-fire/node_modules/
539M    ./PETERBECOM/chiveproxy/node_modules/
506M    ./JAVASCRIPT/minimalcss-website/node_modules/
491M    ./WORKON/workon/node_modules/
457M    ./JAVASCRIPT/battleshits/node_modules/
445M    ./GITHUB/DOCS/docs-internal/node_modules/
431M    ./GITHUB/DOCS/docs/node_modules/
418M    ./PETERBECOM/preact-cli-peterbecom/node_modules/
418M    ./PETERBECOM/django-peterbecom/adminui0/node_modules/
399M    ./GITHUB/THEHUB/thehub/node_modules/
...
How it works:
- fd -I -t d node_modules: Find all directories called node_modules, but ignore any .gitignore directives in their parent directories.
- rg -v 'node_modules/(\w|@)': Exclude any match where node_modules/ is followed by a @ or a word character ([A-Za-z0-9_]), i.e. the nested node_modules directories.
- xargs du -sh: For each line, run du -sh on it. That's like doing cd some/directory && du -sh, where du means "disk usage", -s means total, and -h means human-readable.
- sort -hr: Sort by the first column as a "human numeric sort", meaning it understands that "1M" is more than "20K".
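Since the find and grep version was left as an exercise to the reader, here's my rough sketch of it (untested against the same project tree, so treat it as an assumption, not the canonical answer):

```shell
# Rough built-in equivalent of the fd/rg one-liner.
# find prints paths without a trailing slash, but the same
# "exclude nested node_modules" filter applies.
find . -type d -name node_modules \
  | grep -vE 'node_modules/(\w|@)' \
  | xargs du -sh \
  | sort -hr
```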
Now, if I want to free up some disk space, I can look through the list, and if I recognize a project I almost never work on any more, I just send it to rm -fr.
Create a large empty file for testing
September 8, 2022
0 comments Linux
Because I always end up Googling this and struggling to find it easily, I'm going to jot it down here so it's more present on the web for others (and myself!) to quickly find.
Suppose you want to test something like a benchmark; for example, a unit test that has to process a largish file. You can use the dd command, which is available on macOS and most Linuxes.
▶ dd if=/dev/zero of=big.file count=1024 bs=1024
▶ ls -lh big.file
-rw-r--r--  1 peterbe  staff   1.0M Sep  8 15:54 big.file
So count=1024 together with bs=1024 creates a 1MB file. To create a 500KB one, you simply use:
▶ dd if=/dev/zero of=big.file count=500 bs=1024
▶ ls -lh big.file
-rw-r--r--  1 peterbe  staff   500K Sep  8 15:55 big.file
It creates a binary file, so you can't view it with cat. But if you try to use less, for example, you'll see this:
▶ less big.file
"big.file" may be a binary file.  See it anyway?
[Enter]
^@^@^@...snip...^@^@^@
big.file (END)
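The arithmetic, spelled out: the resulting size is always count × bs bytes, so any combination that multiplies out the same produces the same file. A quick sketch:

```shell
# Size = count * bs bytes. Both commands produce a 1 MiB file;
# fewer, larger blocks is generally faster for big files.
dd if=/dev/zero of=a.file count=1024 bs=1024
dd if=/dev/zero of=b.file count=1 bs=1048576
ls -l a.file b.file
```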
Comparing compression commands with hyperfine
July 6, 2022
0 comments Bash, MacOSX, Linux
Today I stumbled across hyperfine, a neat CLI for benchmarking other CLIs against each other for speed. By David Peter (@sharkdp).
It's a great tool in your arsenal for quick benchmarks in the terminal.
It's written in Rust and is easily installed with brew install hyperfine. For example, let's compare a couple of different commands for compressing a file into a new compressed file. I know it's comparing apples and oranges, but it's just an example. It basically executes the following commands over and over and then compares how long each one took on average:
apack log.log.apack.gz log.log
gzip -k log.log
zstd log.log
brotli -3 log.log
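Put together, that's the following hyperfine invocation (the same one used further down, minus the Markdown-export flag), assuming all four compressors are installed. The sample-file step is mine; the original post used a real ~25 MB log file:

```shell
# Stand-in input file (assumption; substitute your own log file).
yes "Sample log line" | head -100000 > log.log

# Benchmark each compressor. --prepare runs before every timing run,
# deleting earlier output, since tools like gzip refuse to overwrite.
hyperfine \
  "apack log.log.apack.gz log.log" \
  "gzip -k log.log" \
  "zstd log.log" \
  "brotli -3 log.log" \
  --prepare="rm -fr log.log.*"
```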
If you're curious about the ~~results~~ apples vs. oranges, the final result is:
▶ ls -lSh log.log*
-rw-r--r--  1 peterbe  staff    25M Jul  3 10:39 log.log
-rw-r--r--  1 peterbe  staff   2.4M Jul  5 22:00 log.log.apack.gz
-rw-r--r--  1 peterbe  staff   2.4M Jul  3 10:39 log.log.gz
-rw-r--r--  1 peterbe  staff   2.2M Jul  3 10:39 log.log.zst
-rw-r--r--  1 peterbe  staff   2.1M Jul  3 10:39 log.log.br
The point is that you type hyperfine followed by each command in quotation marks. The --prepare command is run before each timing run, and you can also use --cleanup="{cleanup command here}".
It's versatile, so it doesn't have to be different commands; it can be, for example, hyperfine "python optimization1.py" "python optimization2.py" to compare two Python scripts.
You can also export the output to a Markdown file. Here, I used:
▶ hyperfine "apack log.log.apack.gz log.log" "gzip -k log.log" "zstd log.log" "brotli -3 log.log" --prepare="rm -fr log.log.*" --export-markdown log.compress.md
▶ cat log.compress.md | pbcopy
and it becomes this:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| apack log.log.apack.gz log.log | 291.9 ± 7.2 | 283.8 | 304.1 | 4.90 ± 0.19 |
| gzip -k log.log | 240.4 ± 7.3 | 232.2 | 256.5 | 4.03 ± 0.18 |
| zstd log.log | 59.6 ± 1.8 | 55.8 | 65.5 | 1.00 |
| brotli -3 log.log | 122.8 ± 4.1 | 117.3 | 132.4 | 2.06 ± 0.09 |
./bin/huey-isnt-running.sh - A bash script to prevent lurking ghosts
June 10, 2020
0 comments Python, Linux, Bash
tl;dr; Here's a useful bash script to avoid starting something when it's already running as a ghost process.
Huey is a great little Python library for doing background tasks. It's like Celery but much lighter, faster, and easier to understand.
What cost me almost an hour of hair-tearing debugging today was that I didn't realize that a huey daemon process had gotten stuck in the background with code that wasn't updating as I made changes to the tasks.py file in my project. I just couldn't understand what was going on.
The way I start my project is with honcho which is a Python Foreman clone. The Procfile
looks something like this:
elasticsearch: cd /Users/peterbe/dev/PETERBECOM/elasticsearch-7.7.0 && ./bin/elasticsearch -q
web: ./bin/run.sh web
minimalcss: cd minimalcss && PORT=5000 yarn run start
huey: ./manage.py run_huey --flush-locks --huey-verbose
adminui: cd adminui && yarn start
pulse: cd pulse && yarn run dev
And you start that with simply typing:
honcho start
When you Ctrl-C, it kills all those processes but somehow somewhere it doesn't always kill everything. Restarting the computer isn't a fun alternative.
So, to prevent my sanity from draining I wrote this script:
#!/usr/bin/env bash
set -eo pipefail
# This is used to make sure that before you start huey,
# there isn't already one running the background.
# It has happened that huey gets lingering stuck as a
# ghost and it's hard to notice it sitting there
# lurking and being weird.
bad() {
echo "Huey is already running!"
exit 1
}
good() {
echo "Huey is NOT already running"
exit 0
}
ps aux | rg huey | rg -v 'rg huey' | rg -v 'huey-isnt-running.sh' && bad || good
(If you're wondering what rg is: it's short for ripgrep.)
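For what it's worth, that last line of the script can also be sketched with pgrep (from procps) instead of ps + rg. This variant is my assumption, not what the post uses; the [r] bracket trick keeps the pattern from matching the checking process's own command line:

```shell
# Sketch: same check with pgrep. The regex [r]un_huey matches a real
# "run_huey" process, but the literal string "[r]un_huey" in this
# script's own command line does not match itself.
if pgrep -f "[r]un_huey" > /dev/null; then
    echo "Huey is already running!"
    exit 1
fi
echo "Huey is NOT already running"
```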
And I change my Procfile
accordingly:
-huey: ./manage.py run_huey --flush-locks --huey-verbose
+huey: ./bin/huey-isnt-running.sh && ./manage.py run_huey --flush-locks --huey-verbose
There really isn't much rocket science or brain surgery about this blog post, but I hope it inspires someone who's been in similar trenches: a simple bash script can make all the difference.
How I added brotli_static to nginx 1.17 in Ubuntu (Eoan Ermine) 19.10
April 9, 2020
0 comments Nginx, Linux
I knew I didn't want to download the sources to nginx
to install it on my new Ubuntu 19.10 server because I'll never have the discipline to remember to keep it upgraded. No, I'd rather just run apt update && apt upgrade
every now and then.
Why is this so hard?! All I need is the ability to set brotli_static on;
in my Nginx config so it'll automatically pick the .br
file if it exists on disk.
These instructions totally helped, but here they are specifically for my version (all run as root):
git clone --recursive https://github.com/google/ngx_brotli.git
apt install brotli
apt-get build-dep nginx
# Note the version of which nginx you have installed
nginx -v
# ...which informs which URL to wget
wget https://nginx.org/download/nginx-1.17.9.tar.gz
aunpack nginx-1.17.9.tar.gz
nginx -V 2>&1 >/dev/null | grep -o " --.*" | grep -oP '.+?(?=--add-dynamic-module)' | head -1 > nginx-1.17.9/build_args.txt
cd nginx-1.17.9/
./configure --with-compat $(cat build_args.txt) --add-dynamic-module=../ngx_brotli
make install
cp objs/ngx_http_brotli_filter_module.so /usr/lib/nginx/modules/
chmod 644 /usr/lib/nginx/modules/ngx_http_brotli_filter_module.so
cp objs/ngx_http_brotli_static_module.so /usr/lib/nginx/modules/
chmod 644 /usr/lib/nginx/modules/ngx_http_brotli_static_module.so
ls -l /etc/nginx/modules
Now I can edit my /etc/nginx/nginx.conf
(somewhere near the top) to:
load_module /usr/lib/nginx/modules/ngx_http_brotli_filter_module.so;
load_module /usr/lib/nginx/modules/ngx_http_brotli_static_module.so;
And test that it works:
nginx -t
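And for completeness, here's roughly where brotli_static on; ends up in a site config. This is a sketch; the server block details are made up for illustration:

```nginx
# Sketch of the relevant part of a server block (illustrative only).
server {
    # ...
    location / {
        # Serve foo.css.br instead of foo.css when the .br file exists
        # on disk and the client sends Accept-Encoding: br.
        brotli_static on;
    }
}
```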