Peterbe.com

A blog and website by Peter Bengtsson

I, multiple times per day, find myself wanting to find out what headers I get back on a URL but I don't care about the response payload. The command to use then is:

curl -v http://www.peterbe.com/ > /dev/null

That'll print out all the headers sent and received. Nice and crips.

So because I type this every day I made it into a shortcut script

cd ~/bin
echo '#!/bin/bash
> set -x
> curl -v "$@" > /dev/null
> ' > c
chmod +x c

If it's not clear what the code looks like, it's this:

#!/bin/bash
set -x
curl -v "$@" > /dev/null

Now I can just type:

c http://www.peterbe.com

Or if I want to add some extra request headers for example:

c -H 'User-Agent: foobar' http://www.peterbe.com

Because this took me quite a while to figure out, I thought I'd share in case somebody else is falling into the same pit of confusion.

When you write an attribute directive in angularjs you might want to have it fed by an attribute value.
For example, something like this:

<div my-attribute="somevalue"></div>

How then do you create a new scope that takes that in? It's not obvious. Any here's how you do it:

app.directive('myAttribute', function() {
    return {
        restrict: 'A',
        scope: {
            myAttribute: '='
        },
        template: '<div style="font-weight:bold">{{ myAttribute | number:2 }}</div>'
    };
});

The trick to notice is that the "self attribute" because the name of the attribute in camel case.

Thanks @mythmon for helping me figure this out.

I just learned a really good bash trick which is something I've wanted to have but didn't really appreciate that it was possible so I never even searched for it.

set -ex

Ok, one thing at a time.

set -e

What this does, at the top of your bash script is that it exits as soon as any line in the bash script fails.
Suppose you have a script like this:

git pull origin master
find . | grep '\.pyc$' | xargs rm
./restart_server.sh

If the first line fails you don't want the second line to execute and you don't want the third line to execute either. The naive solution is to "and" them:

git pull origin master && find . | grep '\.pyc$' | xargs rm && ./restart_server.sh

but now it's just getting silly. (and is it even working?)

What set -e does is that it exists if any of the lines fail.

set -x

What this does is that it prints each command that is going to be executed with a little plus.
The output can look something like this:

+ rm -f pg_all.sql pg_all.sql.gz
+ pg_dumpall
+ apack pg_all.sql.gz pg_all.sql
++ date +%A
+ s3cmd put --reduced-redundancy pg_all.sql.gz s3://db-backups-peterbe/Sunday/
pg_all.sql.gz -> s3://db-backups-peterbe/Sunday/pg_all.sql.gz  [part 1 of 2, 15MB]
 15728640 of 15728640   100% in    0s    21.22 MB/s  done
pg_all.sql.gz -> s3://db-backups-peterbe/Sunday/pg_all.sql.gz  [part 2 of 2, 14MB]
 14729510 of 14729510   100% in    0s    21.50 MB/s  done
+ rm pg_all.sql pg_all.sql.gz

...when the script looks like this:

#!/bin/bash
set -ex
rm -f pg_all.sql pg_all.sql.gz
pg_dumpall > pg_all.sql
apack pg_all.sql.gz pg_all.sql
s3cmd put --reduced-redundancy pg_all.sql.gz s3://db-backups-peterbe/`date +%A`/
rm pg_all.sql pg_all.sql.gz

And to combine these two gems you simply put set -ex at the top of your bash script.

Thanks @bramwelt for showing me this one.

Do you want to display some code in a Keynote presentation?

It's easy. All you need is Homebrew installed.

First you need to install the program highlight.

$ brew install highlight

So you have a piece of code. For example some Python code. The take that snippet of code and save it to a file like code.py. Now all you need to do is run this:

$ highlight -O rtf code.py | pbcopy

Then, switch back into Keynote and simply paste.

But if you don't want to create a file of the snippet, simply copy the snippet from within your editor and run this:

$ pbpaste | highlight -S py -O rtf | pbcopy

The -S py means "syntax is py (for python)".

You can use highlight for a bunch of other things like creating HTML. See man highlight for more tips.

First of all, Dropbox is awesome. When I plug in my iPhone to my laptop it automatically backs up all my photos to Dropbox fastly and conveniently. If I want to share a photo or a movie I can easily get a "Share link" to show friends and family.

The only disadvantage with Dropbox is that you only get 5Gb for free. Upgrading to the 100Gb Pro account is $10 per month. Not a massive amount but my inner geek uses that as an excuse to hack around it.

So, what I do is set up a S3 bucket. Let's call it camera-uploads-peterbe and then I use s3cmd to upload all the pictures by month. Like this:

$ cd Dropbox/Camera\ Uploads/
$ s3cmd put --reduced-redundancy 2014-01-* s3://camera-uploads-peterbe/2014/
$ rm 2014-01-*

I'm sure that by writing this here, people will write comments saying much better ways to do properly offside backups and I'm eager to hear that but for me S3 feels great. It's tools I'm very familiar with. It's a very established and mature business so it's unlikely to go away any time soon. It's secure and it's incredibly cheap because there's virtually no transfer of these files.

Also, I like how it's clearly explicit and simple. I don't have to worry about some obscure background app not working properly and me not noticing.

Let's see what the future holds. At the moment I'm just worrying about storing the files but it's a bit clunky to retrieve individual files.

One of my most popular GitHub Open Source projects is premailer. It's a python library for combining HTML and CSS into HTML with all its CSS inlined into tags. This is a useful and necessary technique when sending HTML emails because you can't send those with an external CSS file (or even a CSS style tag in many cases).

The project has had 23 contributors so far and as always people come in get some itch they have scratched and then leave. I really try to get good test coverage and when people come with code I almost always require that it should come with tests too.

But sometimes you miss things. Also, this project was born as a weekend hack that slowly morphed into an actual package and its own repository and I bet there was code from that day that was never fully test covered.

So today I combed through the code and plugged all the holes where there wasn't test coverage.
Also, I set up Coveralls (project page) which is an awesome service that hooks itself up with Travis CI so that on every build and every Pull Request, the tests are run with --with-cover on nosetests and that output is reported to Coveralls.

The relevant changes you need to do are:

1) You need to go to coveralls.io (sign in with your GitHub account) and add the repo.
2) Edit your .travis.yml file to contain the following:

before_install:
    - pip install coverage
...
after_success:
    - pip install coveralls
    - coveralls

And you need to execute your tests so that coverage is calculated (the coverage module stores everything in a .coverage file which coveralls analyzes and sends). So in my case I change to this:

script:
    - nosetests premailer --with-cover --cover-erase --cover-package=premailer

3) You must also give coveralls some clues. So it reports on only the relevant files. Here's what mine looked like:

[run]
source = premailer

[report]
omit = premailer/test*

Now, I get to have a cute "coverage: 100%" badge in the README and when people post pull requests Coveralls will post a comment to reflect how the pull request changes the test coverage.

I am so grateful for all these wonderful tools. And it's all free too!

I just rolled out a change here on my personal blog which I hope will make my few visitors happy.

Basically; when you hover over a link (local link) long enough it prefetches it (with AJAX) so that if you do click it's hopefully already cached in your browser.

If you hover over a link and almost instantly hover out it cancels the prefetching. The assumption here is that if you deliberately put your mouse cursor over a link and proceed to click on it you want to go there. Because your hand is relatively slow I'm using the opportunity to prefetch it even before you have clicked. Some hands are quicker than others so it's not going to help for the really quick clickers.

What I also had to do was set a Cache-Control header of 1 hour on every page so that the browser can learn to cache it.

The effect is that when you do finally click the link, by the time your browser loads it and changes the rendered output it'll hopefully be able to do render it from its cache and thus it becomes visually ready faster.

Let's try to demonstrate this with this horrible animated gif:
(or download the screencast.mov file)

Screencast
1. Hover over a link (in this case the "Now I have a Gmail account" from 2004)
2. Notice how the Network panel preloads it
3. Click it after a slight human delay
4. Notice that when the clicked page is loaded, its served from the browser cache
5. Profit!

So the code that does is is quite simply:

$(function() {
  var prefetched = [];
  var prefetch_timer = null;
  $('div.navbar, div.content').on('mouseover', 'a', function(e) {
    var value = e.target.attributes.href.value;
    if (value.indexOf('/') === 0) {
      if (prefetched.indexOf(value) === -1) {
        if (prefetch_timer) {
          clearTimeout(prefetch_timer);
        }
        prefetch_timer = setTimeout(function() {
          $.get(value, function() {
            // necessary for $.ajax to start the request :(
          });
          prefetched.push(value);
        }, 200);
      }
    }
  }).on('mouseout', 'a', function(e) {
    if (prefetch_timer) {
      clearTimeout(prefetch_timer);
    }
  });
});

Also, available on GitHub.

I'm excited about this change because of a couple of reasons:

  1. On mobile, where you might be on a non-wifi data connection you don't want this. There you don't have the mouse event onmouseover triggering. So people on such devices don't "suffer" from this optimization.
  2. It only downloads the HTML which is quite light compared to static assets such as pictures but it warms up the server-side cache if needs be.
  3. It's much more targetted than a general prefetch meta header.
  4. Most likely content will appear rendered to your eyes faster.

So I have a massive chunk of JSON that a Django view is sending to a piece of Angular that displays it nicely on the page. It's big. 674Kb actually. And it's likely going to be bigger in the near future. It's basically a list of dicts. It looks something like this:

>>> pprint(d['events'][0])
{u'archive_time': None,
 u'archive_url': u'/manage/events/archive/1113/',
 u'channels': [u'Main'],
 u'duplicate_url': u'/manage/events/duplicate/1113/',
 u'id': 1113,
 u'is_upcoming': True,
 u'location': u'Cyberspace - Pacific Time',
 u'modified': u'2014-08-06T22:04:11.727733+00:00',
 u'privacy': u'public',
 u'privacy_display': u'Public',
 u'slug': u'bugzilla-development-meeting-20141115',
 u'start_time': u'15 Nov 2014 02:00PM',
 u'start_time_iso': u'2014-11-15T14:00:00-08:00',
 u'status': u'scheduled',
 u'status_display': u'Scheduled',
 u'thumbnail': {u'height': 32,
                u'url': u'/media/cache/e7/1a/e71a58099a0b4cf1621ef3a9fe5ba121.png',
                u'width': 32},
 u'title': u'Bugzilla Development Meeting'}

So I thought one hackish simplification would be to convert each of these dicts into an list with a known sort order. Something like this:

>>> event = d['events'][0]
>>> pprint([event[k] for k in sorted(event)])
[None,
 u'/manage/events/archive/1113/',
 [u'Main'],
 u'/manage/events/duplicate/1113/',
 1113,
 True,
 u'Cyberspace - Pacific Time',
 u'2014-08-06T22:04:11.727733+00:00',
 u'public',
 u'Public',
 u'bugzilla-development-meeting-20141115',
 u'15 Nov 2014 02:00PM',
 u'2014-11-15T14:00:00-08:00',
 u'scheduled',
 u'Scheduled',
 {u'height': 32,
  u'url': u'/media/cache/e7/1a/e71a58099a0b4cf1621ef3a9fe5ba121.png',
  u'width': 32},
 u'Bugzilla Development Meeting']

So I converted my sample events.json file like that:

$ l -h events*
-rw-r--r--  1 peterbe  wheel   674K Aug  8 14:08 events.json
-rw-r--r--  1 peterbe  wheel   423K Aug  8 15:06 events.optimized.json

Excitingly the file is now 250Kb smaller because it no longer contains all those keys.

Now, I'd also send the order of the keys so I could do something like this in the AngularJS code:

 .success(function(response) {
   events = []
   response.events.forEach(function(event) {
     var new_event = {}
     response.keys.forEach(function(key, i) {
       new_event[k] = event[i]
     })
   })
 })

Yuck! Nested loops! It was just getting more and more complicated.
Also, if there are keys that are not present in every element, it means I'd have to replace them with None.

At this point I stopped and I could smell the hackish stink of sulfur of the hole I was digging myself into.
Then it occurred to me, gzip is really good at compressing repeated things which is something we have plenty of in a document store type data structure that a list of dicts is.

So I packed them manually to see what we could get:

$ apack events.json.gz events.json
$ apack events.optimized.json.gz events.optimized.json

And without further ado...

$ l -h events*
-rw-r--r--  1 peterbe  wheel   674K Aug  8 14:08 events.json
-rw-r--r--  1 peterbe  wheel    90K Aug  8 14:20 events.json.gz
-rw-r--r--  1 peterbe  wheel   423K Aug  8 15:06 events.optimized.json
-rw-r--r--  1 peterbe  wheel    81K Aug  8 15:07 events.optimized.json.gz

Basically, all that complicated and slow hoopla for saving 10Kb. No thank you.

Thank you gzip for existing!

Lots of Eriks
Sometimes it feels like there's almost no unique names amongst my Facebook friends. Whenever I use the search function and start typing someones name at least two friends' names pop up.
However, I only have one friend called ├ůsa and one friend called Conrad. So it got me wondering how many of my friends names are unique?

So I pulled down the names of all my 500+ friends and counted them. The results were suprising!

0 in common 291
1 in common 53
2 in common 15
3 in common 10
4 in common 1
5 in common 3
6 in common 2

That means that 56% of my friends' names are unique! Much more than I expected. I should add, the bulk of my friends are primarily from two countries: USA and Sweden. From all over the world too but these are the most common. However, because of this I have some friends who have the locale's spelling. I.e. "Michael" in USA versus "Mikael" in Sweden.

So, if I manually go through the list and look at some names and come up with some aliases so that Eric and Erik counts as the same name, I get a lot less uniqueness. The list I made is available here and it's very much a quick judgement call based on a rough idea of the "stemming" of these names.

It actually didn't help that much. The spread now looks like this:

0 in common 260
1 in common 44
2 in common 14
3 in common 12
4 in common 6
5 in common 2
6 in common 1
7 in common 4

And the number of people with unique names goes down to 50.1%. Still more than half.

Last but not least, the most common names for me were:

  • Michael (8 people)
  • Mat (8)
  • Eric (8)
  • Christopher (8)
  • Dave (7)
  • Sara (6)

Yesterday I had the good fortune to present Crontabber to the The San Francisco Bay Area PostgreSQL Meetup Group organized by my friend Josh Berkus.

To spare you having to watch the whole presentation I'm going to summarize some of it here.

My colleague Lars also has a blog post about Crontabber and goes into a bit more details about the nitty gritty of using PostgreSQL.

What is crontabber?

It's a tool for running cron jobs. It's written in Python and PostgreSQL and it's a tool we need for Socorro which has lots and lots of stored procedures and cron jobs.

So it's basically a script that gets started by UNIX crontab and runs every 5 minutes. Internally it keeps an index of all the apps it needs to run and it manages dependencies between jobs and is self-healing meaning that if something goes wrong during run-time it just retries again and again until it works. Amongst many of the juicy features it offers on top of regular "cron wrappers" is something we call "backfilling".

Backfill jobs are jobs that happen on a periodic basis and get given a date. If all is working this date is the same as "now()" but if something was to go wrong it remembers all the dates that did not work and re-attempts with those exact dates. That means that you can guarantee that the job gets started with every date possible even if it needs to catch up on past dates.

There's plenty of documentation on how to install and create jobs but it all starts with a simple pip install crontabber.

To get a feel for how you write crontabber "apps", checkout the ones for Socorro or flick through the slides in the PDF.

Is it mature?

Yes! It all started in early 2012 as a part of the Socorro code base and after some hard months of it stabalizing and maturing we decided to extract it out of Socorro and into its own project on GitHub under the Mozilla Public Licence 2.0 licence. Now it stands on its own legs and has no longer anything to do with Socorro and can be used for anything and anyone who has a lot of complicated cron jobs that need to run in Python with a PostgreSQL connection. In Socorro we use it primarily for executing stored procedures but that's just one type of app. You can also make it call out on to anything the command line with a "@with_subprocess" decorator.

Is it finished?

No. It works really well for us but there's a decent list of features we want to add. The hope is that by open sourcing it we can get other organizations to adopt it and not only find bugs but also contribute code to add more juicy features.

One of the soon-to-come features we have in mind is to "internalize locking". At the moment you have to wrap it in a bash script that prevents it from being run concurrently. Crontabber is single-threaded and we don't have to worry about "dependent jobs" starting before "parent jobs" because the depdendencies and their state is stored in the database. But you might need to worry about the same job (the one due next) to be started concurrently. By internalizing the locking we can store, in the state database, that a particular job is being started on and thus not have to worry about it starting the same job twice.

I really hope this project can grow and continue to support us in our needs.