A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under the category: Linux. Clear filter

And bash basics

16 October 2015 2 comments   MacOSX, Linux

It's one of those things; not hard to understand and certainly not an advanced trick but I sometimes see people miss out on this.

In bash there are sort of two ways of saying "Do this and then do that". You can either say "Do this and no matter what happens then do that" or you can say "Do this and if that worked also do that".


Suppose you have two command executables you want to run. They can succeed or fail.

$ echo "Do this and no matter what happens then do that"
$ ./command1 ; ./command2

If you run that, ./command2 will run even if ./command1 failed.
The other one is...

$ echo "Do this and if that worked also do that"
$ ./command1 && ./command2

You might recognize the && thing from JavaScript or Java or C or one of those. If you recognize it you might quickly also conclude that you can do this too:

$ echo "Do this and only if it failed do that"
$ ./command1 || ./command2

In this latter case only one of those (or none!) will succeed.

So when does this come in handy?

Here are some examples that I often use:

Meaning, I know my code is good to push, iff the tests pass

$ nosetests && git commit -a -m "some feature" && git push peterbe mybranch

Or if you might want to be alerted if something failed after the first command slowly takes its time to finish:

$ nosetests && say "Tests finished" || say "Work harder"

(say is an OSX specific command and not a built-in in bash)

The ; is useful when you don't care if the first command finished and this is more rare. For example:

$ rm static/ ; ./ collectstatic --noinput

Why bother?

Perhaps it goes without saying, the reason for doing all of these is generally when the first command takes a long time and you don't want to sit and wait till it's finished to run the second time. By "piping them together" like this, the second command will safely start as soon as possible whilst you go away and pay attention to something else.

mozjpeg installation and sample

10 October 2015 3 comments   Mozilla, Web development, Linux

I've written about mozjpeg before where I showed what it can do to a sample directory full of different kinds of JPEGs. But let's get more real. Let's actually install it and look at one thumbnail and one big photo.

To install, I used the pre-compiled binaries from this wonderful site. Like this:

# wget
# dpkg -i mozjpeg_3.1_amd64.deb
# ls -l /opt/mozjpeg/bin/cjpeg
-rwxr-xr-x 1 root root 50784 Sep  3 19:03 /opt/mozjpeg/bin/cjpeg

I don't know why the binary executable becomes called cjpeg but that's fine. Let's put it in $PATH so other users can execute it:

# cd /usr/local/bin
# ln -s /opt/mozjpeg/bin/cjpeg

Now, let's actually use it for something. First we need a realistic lossy thumbnail that we can optimize.

$ wget

This was one of the thumbnails from a previous post called Panasonic Lumix from 2008 or a iPhone 5S from 2014.

Let's optimize!

$ jpeg -outfile ebf08e64e80170dc009e97f6f9681ceb.moz.jpg -optimise ebf08e64e80170dc009e97f6f9681ceb.jpg
$ ls -l ebf08e64e80170dc009e97f6f9681ceb.*
-rw-rw-r-- 1 django django 11391 Sep 26 17:04 ebf08e64e80170dc009e97f6f9681ceb.jpg
-rw-r--r-- 1 django django  9414 Oct 10 01:40 ebf08e64e80170dc009e97f6f9681ceb.moz.jpg

Yay! It's 17.4% smaller. Saving 1.93Kb.

So what do they look like? See for yourself:

I have to zoom in (⌘-+) 3 times until I can see any difference. But remember, the saving isn't massive but the usecase here is a thumbnail.

So, let's do the same with a non-thumbnail. Some huge JPEG.

$ time cjpeg -outfile Lumix-2.moz.jpg -optimise Lumix-2.jpg
real    0m3.285s
user    0m3.122s
sys     0m0.080s
$ ls -l Lumix*
-rw-rw-r-- 1 django django 4880446 Sep 26 17:20 Lumix-2.jpg
-rw-rw-r-- 1 django django 1546978 Oct 10 02:02 Lumix-2.moz.jpg
$ ls -lh Lumix*
-rw-rw-r-- 1 django django 4.7M Sep 26 17:20 Lumix-2.jpg
-rw-rw-r-- 1 django django 1.5M Oct 10 02:02 Lumix-2.moz.jpg

In other words, from 4.7Mb to 1.5Mb. It's 68.3% the size of the original. And the visual difference?

Again, I have to zoom in 3 times to be able to tell any difference and even when I've done that it's hard to tell which is which.

In conclusion, let's go ahead and use mozjpeg to optimize thumbnails.

How to test if gzip is working on your site or app

20 August 2015 0 comments   Web development, Linux

Suppose that you've enabled gzip delivery of your site and its static assets. How do you test it?

One obvious way is to load the site with the developer tools in your browser and look at the headers there. Like this for example:
Is it gzip'ed? Yes!

Another more hard code and geek-power way is to simply use curl.

It goes without saying, the ideal way to set up Nginx is to make it optional. Don't upload a gzipped file to your server and force gzip down on every client. Instead, let something like Nginx handle it on-the-fly (don't worry, it's ultrafast).

So to see if gzip is working, take your URL and run these two commands:

$ curl -v --compressed > /dev/null

And look for this line:

< Content-Encoding: gzip

Also you should look for this line:

Content-Length: 2403

(number will obviously vary)

Now run the same curl command but without the --compressed. E.g.

$ curl -v > /dev/null

Now there won't be a Content-Encoding header in the response. It defaults to plain text.
Also, now look for the Content-Length and amuse yourself in the profit that this number is likely to be much larger than before.

Here's a realistic example:

With --compressed

$ curl -v --compressed > /dev/null
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying
* Connected to ( port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate:
* Server certificate: DigiCert SHA2 Secure Server CA
* Server certificate: DigiCert Global Root CA
> GET /home/products/Firefox HTTP/1.1
> User-Agent: curl/7.37.1
> Host:
> Accept: */*
> Accept-Encoding: deflate, gzip
< HTTP/1.1 200 OK
< Content-Encoding: gzip
< Content-Type: text/html; charset=utf-8
< Date: Thu, 20 Aug 2015 18:38:00 GMT
* Server nginx/1.6.3 is not blacklisted
< Server: nginx/1.6.3
< Set-Cookie: anoncsrf=yieBMvzCn4fO4lmMQbjuq0Cibl9s7oxG; expires=Thu, 20-Aug-2015 20:38:00 GMT; httponly; Max-Age=7200; Path=/; secure
< Vary: Accept-Encoding
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Length: 2403
< Connection: keep-alive
{ [data not shown]
100  2403  100  2403    0     0   4734      0 --:--:-- --:--:-- --:--:--  4730
* Connection #0 to host left intact


$ curl -v > /dev/null
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying
* Connected to ( port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate:
* Server certificate: DigiCert SHA2 Secure Server CA
* Server certificate: DigiCert Global Root CA
> GET /home/products/Firefox HTTP/1.1
> User-Agent: curl/7.37.1
> Host:
> Accept: */*
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< Date: Thu, 20 Aug 2015 18:38:05 GMT
* Server nginx/1.6.3 is not blacklisted
< Server: nginx/1.6.3
< Set-Cookie: anoncsrf=evG8kmoXjHv5aeyFIQHxNcnGahdxwIOy; expires=Thu, 20-Aug-2015 20:38:05 GMT; httponly; Max-Age=7200; Path=/; secure
< Vary: Accept-Encoding
< Vary: Cookie
< X-Frame-Options: DENY
< Content-Length: 12299
< Connection: keep-alive
{ [data not shown]
100 12299  100 12299    0     0  24314      0 --:--:-- --:--:-- --:--:-- 24306
* Connection #0 to host left intact

How I git

18 June 2015 1 comment   Linux, Python

tl;dr I use bgg to shortcut a lot of tedious git commands.

Once a certain pattern appears where you find yourself doing the same thing over and over the first thing that should spring to mind is: let's automate that!

So a couple of years ago I started writing simple Python scripts that would wrap various git operations so I could do things like G merge or G rebase. That has helped me tremendously and when I at first showed these scripts to some people I was amazed how unimpressed they were. I guess that's because they have their own scripts or a geeky reluctance to adopting someone elses shortcuts unless you've personally be apart of going from tedious to shortcut.

So, a crucial part of my work here at Mozilla is to look at a Bugzilla and start a topic branch based on it and when it's done, push that into a Pull Request on GitHub.

The first command is G start. It takes a single optional argument. If an argument is provided it has to be a Bugzilla bug number. If you supply a Bugzilla ID it will fetch the title of that bug (assuming you're online) and store that so that it can be used to mention it in the git commit message. For example:

(airmozilla):~/dev/MOZILLA/AIRMOZILLA/airmozilla (master)$ G start 1174316
You're currently on branch master
Summary ["Start duration fetching when stopping a live event"]:
Switched to a new branch 'bug-1174316-start-duration-fetching-when-stopping-a-live-event'

The git branch name becomes a "slugified" version of the bug summary. But note, it merely sets the default. I could override it if I want to.

Then you do some work on it and when you're done you type the next command; G commit. It basically runs git commit -a -m "..." using the bug number, the bug summary, optionally asking if you want to prefix the commit message with fixes and then pushed it to your fork. Example speaks for itself:

(airmozilla):~/dev/MOZILLA/AIRMOZILLA/airmozilla (bug-1174316-start-duration-fetching-when-stopping-a-live-event *)$ G commit
    bug 1174316 - Start duration fetching when stopping a live event

OK? [Y/n]
Add the 'fixes ' prefix? [N/y] y
NOW, feel free to run:

git checkout master
git merge bug-1174316-start-duration-fetching-when-stopping-a-live-event
git branch -d bug-1174316-start-duration-fetching-when-stopping-a-live-event


git push peterbe bug-1174316-start-duration-fetching-when-stopping-a-live-event

Run that push? [Y/n]
 * [new branch]      bug-1174316-start-duration-fetching-when-stopping-a-live-event -> bug-1174316-start-duration-fetching-when-stopping-a-live-event

You get the picture. It's interactive and mostly you just hit enter and it does stuff saving you copious milliseconds.

Other noteworthy commands:

G rebase - whilst on a branch, jumps over to the master branch, updates from the origin, then goes back to the branch you were on preparing you for an interactive git rebase.

G merge - goes over to the master branch, merges the branch you were on and if it works out, deletes the branch.

G getback - you're in a branch you know was merged (using GitHub's green merge button), it switches to the master branch, updates master and deletes the local topic branch (that was merged) and deletes the remote topic branch on your fork.

G cleanup [search] - you're on some other branch other than the one you search for. It finds that branch (if only 1 match) and does that G getback does.

G branches [search] - lists all your branches sorted by most recently worked on last also indicate how long ago you worked on it and if it has already been merged.

The reason I'm mentioning this isn't to convince you to use my tool to do your git but perhaps to inspire you to write your own scripts that wrap things you find yourself doing repetitively.

I know my own battle isn't over. I'm still finding things that I have to do additionally on an almost perfectly predictable basis. Thankfully I now have an infrastructure to add more scripting.

Fastest way to take screencaps out of videos

19 December 2014 0 comments   Mozilla, Web development, Linux

tl;dr Don't run ffmpeg over HTTP(S) and use ffmpegthumbnailer

UPDATE tl;dr Download the file then run ffmpeg with -ss HH:MM:SS first. Don't bother with ffmpegthumbnailer

At work I work on something called Air Mozilla. It's a site for hosting live video broadcasts and then archiving those so they can be retrieved later.

Unlike sites like YouTube we can't take a screencap from the video because many videos are future (aka. "upcoming") videos so instead we use a little placeholder thumbnail (for example, the Rust logo).

However, once it has been recorded we want to switch from the logo to an actual screen capture from the video itself. We set up a cronjob that uses ffmpeg to extract these as JPGs and then the users can go in and select whichever picture they like the best.

This is all work in progress by the way (as of December 2014).

One problem is that we have is that the command for extracting JPGs is really slow. So slow that we can't wrap the subprocess in a Django database connection because it's so slow that the database connection is often killed.

The command to extract them looks something like this:

ffmpeg -i -r 0.0143 /tmp/screencaps-%02d.jpg

Where the number r is based on the duration and how many pictures we want out. E.g. 0.0143 = 15 * 1049 where 15 is how many JPGs we want and 1049 is a duration of 17 minutes and 29 seconds.

The script I used first was:

My first experiment was to try to extract one picture at a time, hoping that way, internally, ffmpeg might be able to optimize something.

The second script I used was:

The third alternative was to try ffmpegthumbnailer which is an intricate wrapper on ffmpeg and it has the benefit that you can produce slightly higher picture quality too.

The third script I used was:

Bar chart comparing the 3 different scripts
And running these three depend very much on the state of my DSL at the time.

For a video clip that is 17 minutes long and a 138Mb mp4 file.   2m0.847s   11m46.734s   0m29.780s

Clearly it's not efficient to do one screenshot at a time.
Because with ffmpegthumbnailer you can tell it not to reduce the picture quality the total weight of the produced JPGs from was 784Kb and the total weight from was 1.5Mb.

Just to try again, I ran a similar experiment with a 35 minutes long and 890Mb mp4 file. And this time I didn't bother with The results were:   18m21.330s   2m48.656s

So that means that using ffmpegthumbnailer is about 5 times faster than ffmpeg. Huge difference!

And now, a curveball!

The reason for doing ffmpeg -i https://... was so that we don't have to first download the whole beast and run the command on a local file. However, in light of how so much longer this takes and my disdain to have to install and depend on a new tool (ffmpegthumbnailer) across all servers. Why not download the whole file and run the ffmpeg command locally.

So I download the file and it's slow because of my, currently, terrible home DSL. Then I run and time them again but just a local file instead:   0m20.426s   0m0.635s

Did you see that!? That's an insane difference. Clearly doing this command over HTTP(S) is a bad idea. It'll be worth downloading it first.


On Stackoverflow, LordNeckBeard gave a great tip of using the -ss option before in the input file and now it's much faster. At this point. I'm no longer interested in having to bother with ffmpegthumbnailer.

Let's fork into two versions. same as but a downloaded file instead of a remote HTTPS URL. as except we put the -ss HH:MM:SS before the input file.

Now, let's run them again on the 138Mb file:

# the 138Mb mp4.mp4 file   2m10.898s   0m0.672s

187 times faster

And again, I re-ran this again against a bigger file that is 1.4Gb:

# the 1.4Gb mp4-1.44Gb.mp4 file   10m1.143s   0m1.428s

420 times faster


08 December 2014 0 comments   Linux

When I build one of the biggest challenges was to image resizing of enourmous images. Primarily JPEGs.

The way Hugepic works is that it chops up images into tiles, but before it can crop and chop of the tiles it needs to resize the image to a certain size. Say 1024x1024. Now this is really slow and it's so CPU intensive that if you try to parallelize it you end up causing so much "swappage" that the time it takes to resize to large images in parallel is more than it takes to do them one at a time.

The tool I found that was the best possible was ImageMagick's tool convert.

Now there's a new tool that is much faster: vipsthumbnail

There are more comprehensive benchmarks abound the net, like this one for example, but here's a quick one to wet your appetite:

$ ls -lh 8/04/84c3e9.jpg
-rw-r--r--@ 1 peterbe  staff   253M Sep 16 12:00 8/04/84c3e9.jpg

$ time convert 8/04/84c3e9.jpg -resize 200 /tmp/converted-200.jpg
real    0m9.423s
user    0m8.893s
sys     0m0.521s

$ time vipsthumbnail 8/04/84c3e9.jpg -s 200x200 -o /tmp/vips.jpg
real    0m3.209s
user    0m3.051s
sys     0m0.138s

It supposedly has ports for Python but I'm quite happy to just a subprocess out to the command. You can install it on OSX with brew install vips.


Boy! Do I wish I knew about vips dzsave when I built

Bye bye AWS EC2, Hello Digital Ocean

03 November 2014 0 comments   Linux

Before trolls get inspired let me start with this: EC2 is awesome!

But, wanna know what's also awesome?: Digital Ocean

The reason I switched was two-fold: A) money and B) curiousity.

As part of a very generous special friendship I got a "m1.large" for free. That deal had to come to an end so I had to start paying that myself. It was well over $100 per month. I have about 10 servers running on that machine hovering around 3+Gb of RAM.

So I thought this is an excuse to do some spring cleaning and then switch to this newfangled Digital Ocean which is all SSD drives, got good reviews and has a fixed cost per month. First I decommissioned some servers and some sites that used to have multiple processors were reduced to just a single process. Now I got everything down to a steady 2+Gb.

I decided to splash out a bit and I went for the $40/month option which is 4GB, 2 core, 60GB SSD and 4TB transfer. Setting up all the servers on this new Ubuntu 14.04 was relatively easy (thank you pip freeze and rsync!).

So far, I have to say I'm wildly impressed. The interface is gorgeous. It's easy to do everything. I love that the price is fixed. That suits me more that corporations might care about but I'm just little old me.

If you get inspired to try it out please use my referral code. Then you get $10 free credit:

uwsgi and uid

03 November 2014 4 comments   Django, Linux, Python

So recently, I moved home for this blog. It used to be on AWS EC2 and is now on Digital Ocean. I wanted to start from scratch so I started on a blank new Ubuntu 14.04 and later rsync'ed over all the data bit by bit (no pun intended).

When I moved this site I copied the /etc/uwsgi/apps-enabled/peterbecom.ini file and started it with /etc/init.d/uwsgi start peterbecom. The settings were the same as before:

# this is /etc/uwsgi/apps-enabled/peterbecom.ini
virtualenv = /var/lib/django/django-peterbecom/venv
pythonpath = /var/lib/django/django-peterbecom
user = django
master = true
processes = 3
env = DJANGO_SETTINGS_MODULE=peterbecom.settings
module = django_wsgi2:application

But I kept getting this error:

Traceback (most recent call last):
  File "/var/lib/django/django-peterbecom/venv/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/", line 182, in _cursor
    self.connection = Database.connect(**conn_params)
  File "/var/lib/django/django-peterbecom/venv/local/lib/python2.7/site-packages/psycopg2/", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
psycopg2.OperationalError: FATAL:  Peer authentication failed for user "django"

What the heck! I thought. I was able to connect perfectly fine with the same config on the old server and here on the new server I was able to do this:

django@peterbecom:~/django-peterbecom$ source venv/bin/activate
(venv)django@peterbecom:~/django-peterbecom$ ./ shell
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from peterbecom.apps.plog.models import *
>>> BlogItem.objects.all().count()

Clearly I've set the right password in the settings/ file. In fact, I haven't changed anything and I pg_dump'ed the data over from the old server as is.

I edit edited the file psycopg2/ and added a print "DSN=", dsn and those details were indeed correct.
I'm running the uwsgi app as user django and I'm connecting to Postgres as user django.

Anyway, what I needed to do to make it work was the following change:

# this is /etc/uwsgi/apps-enabled/peterbecom.ini
virtualenv = /var/lib/django/django-peterbecom/venv
pythonpath = /var/lib/django/django-peterbecom
user = django
uid = django   # THIS IS ADDED
master = true
processes = 3
env = DJANGO_SETTINGS_MODULE=peterbecom.settings
module = django_wsgi2:application

The difference here is the added uid = django.

I guess by moving across (I'm currently on uwsgi I get a newer version of uwsgi or something that simply can't just take the user directive but needs the uid directive too. That or something else complicated to do with the users and permissions that I don't understand.

Hopefully, by having blogged about this other people might find it and get themselves a little productivity boost.

Do you curl a lot to check headers?

05 September 2014 5 comments   Linux

I, multiple times per day, find myself wanting to find out what headers I get back on a URL but I don't care about the response payload. The command to use then is:

curl -v > /dev/null

That'll print out all the headers sent and received. Nice and crips.

So because I type this every day I made it into a shortcut script

cd ~/bin
echo '#!/bin/bash
> set -x
> curl -v "$@" > /dev/null
> ' > c
chmod +x c

If it's not clear what the code looks like, it's this:

set -x
curl -v "$@" > /dev/null

Now I can just type:


Or if I want to add some extra request headers for example:

c -H 'User-Agent: foobar'

set -ex - The most useful bash trick of the year

31 August 2014 3 comments   Linux

I just learned a really good bash trick which is something I've wanted to have but didn't really appreciate that it was possible so I never even searched for it.

set -ex

Ok, one thing at a time.

set -e

What this does, at the top of your bash script is that it exits as soon as any line in the bash script fails.
Suppose you have a script like this:

git pull origin master
find . | grep '\.pyc$' | xargs rm

If the first line fails you don't want the second line to execute and you don't want the third line to execute either. The naive solution is to "and" them:

git pull origin master && find . | grep '\.pyc$' | xargs rm && ./

but now it's just getting silly. (and is it even working?)

What set -e does is that it exits if any of the lines fail.

set -x

What this does is that it prints each command that is going to be executed with a little plus.
The output can look something like this:

+ rm -f pg_all.sql pg_all.sql.gz
+ pg_dumpall
+ apack pg_all.sql.gz pg_all.sql
++ date +%A
+ s3cmd put --reduced-redundancy pg_all.sql.gz s3://db-backups-peterbe/Sunday/
pg_all.sql.gz -> s3://db-backups-peterbe/Sunday/pg_all.sql.gz  [part 1 of 2, 15MB]
 15728640 of 15728640   100% in    0s    21.22 MB/s  done
pg_all.sql.gz -> s3://db-backups-peterbe/Sunday/pg_all.sql.gz  [part 2 of 2, 14MB]
 14729510 of 14729510   100% in    0s    21.50 MB/s  done
+ rm pg_all.sql pg_all.sql.gz

...when the script looks like this:

set -ex
rm -f pg_all.sql pg_all.sql.gz
pg_dumpall > pg_all.sql
apack pg_all.sql.gz pg_all.sql
s3cmd put --reduced-redundancy pg_all.sql.gz s3://db-backups-peterbe/`date +%A`/
rm pg_all.sql pg_all.sql.gz

And to combine these two gems you simply put set -ex at the top of your bash script.

Thanks @bramwelt for showing me this one.


Checkout out