Peterbe.com

A blog and website by Peter Bengtsson

I was going to title this blog post "I don't want your stinkin' password!" but realised that this isn't the first site that uses entirely OpenID, OAuth and stuff.

On Around The World you can now log in with either your Google account, your Twitter account or simply by entering your email. It looks like this:

Sign-in screen Screenshot of email

What's neat about this is that it works independent of if you've signed in before (aka. log in) or if you're new (aka. register).

What's not so neat about it is that people might not recognize it. We're so used to both registration forms and log in forms to ask for passwords. Often, you can quickly tell of it's log in because you expect two input fields.

Another slight flaw with this is the fact that my emails usually take several tens of seconds to send. This is because they're sent by a cron job async. So, people who enter their email address might get disappointed if they don't get the email immediately.

Anyway, let's wait and see if people actually use it. At least it means you don't really need a third party service and you don't need to type in a password.

The Poincaré Conjecture
I just finished a wondeful book, The Poincare Conjecture: In Search of the Shape of the Universe by Donal O'Shea, and because I'm not very good at writing I'm just going to quote a good chunk:

Mathematics reminds us how much we depend on one another, both on the insight and imagination of those who have lived before us, and on those who comprise the social and cultural institutions, schools and universities, that give children an education that allows them to fully engage the ideas of their times. It is up to all of us to ensure that the legacy of our times is a society that stewards and develops our common mathematical inheritance. For mathematics is one of the quintessentially human activities that makes us more fully human and, in so doing, leads us to transcend ourselves.

Looking up at the night sky, at the distant stars and galaxies and clusters of galaxies, it is inconceivable to me that there are not other intelligences out there, some far different then us. Hundreds of years hence, if we ever develop technologies that enable us to meet and to communicate, we will discover that they will know, or want to lknow, that the only compact three-dimensional manifold in which every loop can be shrunk to a point is a three-sphere. Count on it.

There were lots of mathematical concepts in this book that I didn't understand, but these two paragraphs I surely understood.

(if you're wondering what you're doing here, jed is a hardcore text based editor for programmers)

Thanks to fellow Jed user and hacker Ullrich Horlacher I can now have local settings per directory.

I personally prefer 2 spaces in my Javascript. And thankfully most projects I work on agrees with that standard. However, I have one Mozilla project I work on which uses 4 spaces for indentation. So, what I've had to get used to to is to edit my ~/.jedrc every time I switch to work on that particular project. I change: variable C_INDENT = 2; to variable C_INDENT = 4; and then back again when switching to another project.

No more of that. Now I just add a file into the project root like this:

$ cd dev/airmozilla
$ cat .jed.sl
variable C_INDENT = 4;

And whenever I work on any file in that tree it applies the local override setting.

Here's how you can do that too:

First, put this code into your <your jed lib>/defaults.sl: (on my OSX, the jed lib is /usr/local/Cellar/jed/0.99-19/jed/lib/)

% load .jed.sl from current or parent directories
% but only if the user is the same
define load_local_config() {
  variable dir = getcwd();
  variable uid = getuid;
  variable jsl,st;
  while (dir != "/" and strlen(dir) > 1) {
    st = stat_file(dir);
    if (st == NULL) return;
    if (st.st_uid != uid) return;
    jsl = dir + "/.jed.sl";
    st = stat_file(jsl);
    if (st != NULL) {
      if (st.st_uid == uid) {
        pop(evalfile(jsl));
        return;
      }
    }
    dir = path_dirname(dir);
  }
}

Then add this to the bottom of your ~/.jedrc:

define startup_hook() {
  load_local_config(); % .jed.sl
}

Now, go into a directory where you want to make local settings, create a file called .jed.sl and fill it to your hearts content!

This took me by surprise today!

If you run this unit test, it actually passes with flying colors:

import unittest

class BadAssError(TypeError):
    pass

def foo():
    raise BadAssError("d'oh")

class Test(unittest.TestCase):

    def test(self):
        self.assertRaises(BadAssError, foo)
        self.assertRaises(TypeError, foo)
        self.assertRaises(Exception, foo)

if __name__ == '__main__':
    unittest.main()

Basically, assertRaises doesn't just take the exception that is being raised and accepts it, it also takes any of the raised exceptions' parents.

I've only tested it with Python 2.6 and 2.7. And the same works equally with unittest2.

I don't really know how I feel about this. It did surprise me when I was changing one of the exceptions and expected the old tests to break but they didn't. I mean, if I want to write a test that really makes sure the exception really is BadAssError it means I can't use assertRaises().

Being a recruiter is hard work. A lot of pressure and having to deal with people's egos. Although I have no plans to leave Mozilla any time soon, it's still some sort of value in seeing that my skills are sought after in the industry. That's why I haven't yet completely cancelled my LinkedIn membership.

When I get automated emails from bots that just scrape LinkedIn I don't bother. Sometimes I get emails from recruiters who have actually studied my profile (my blog, my projects, my github, etc) and then I do take the time to reply and say "Hi Name! Thank you for reaching out. It looks really exciting but it's not for me at the moment. Keep up the good work!"

Then there's this new trend where people appear to try to automate what the bots do by doing it manually but without actually reading anything. I understand that recruiters are under a lot of pressure to deliver and try to reach out to as many potential candidates as possible but my advice is: if you're going to do, do it properly. You'll reach fewer candidates but it'll mean so much more.

I got this email the other day about a job offer at LinkedIn:
Shaming a stressed out recruiter from LinkedIn

  • I have a Swedish background. Not "Sweetish". And what difference does that make?
  • I haven't worked on "FriedZopeBase" (which is on my github) for several years
  • I haven't worked on "IssueTrackerProduct" for several years
  • Let's not "review [my] current employment". That's for me to think about.

So what can we learn from this? Well, for starters if you're going pretend to have taken time, do it properly! If you don't have time to do in-depth research on a candidate, then don't pretend that you have.

I got another recruiter emailing me personally yesterday and it was short and sweet. No mention of free lunch or other superficial trappings. The only personal thing about it was that it had my first name. I actually bothered to reply to them and thank them for reaching out.

People rarely type in long URLs. Therefore it's unlikely that one little typo in that long URL is the the deciding factor whether you get a 200 Found or a 404 Not Found.

However, what people often do is type in a domain name and hit enter. Sometimes they fumble and miss a character or accidentally add an additional one and ultimately land on this error:

One little typo and it looks like your Internet is down

Another thing I often do is I type the start of the domain name and fumble with the Awesome Bar and accidentally try to reach just the start of the domain. Like www.mozill for example.

The browser should in these cases be able to recognize the mistake and offer a nice "Did you mean this domain?" button or something that makes it one click to correct the innocent fumble.

How it could do this would be quite simple. It could record every domain you've visited based on your history. Then it could compute a an Edit distance and if it finds exactly one suggestion, offer it.

Here's how you can use an Edit distance algorithm:

>>> from edit_distance import EditDistance
>>> ed = EditDistance(('www.peterbe.com', 'www.mozilla.org', 'news.ycombinator.com',
...                    'twitter.com', 'www.facebook.com', 'github.com'))
>>> ed.match('www.peterbe.cm')
[u'www.peterbe.com']
>>> ed.match('twittter.com')
['twitter.com']
>>> ed.match('www.faecbook.com')
['www.facebook.com']
>>> ed.match('github.comm')
['github.com']
>>> ed.match('neverheardof')
[]

Here's the implementation I used.

Of course, this functionality should only kick in in the most desperate of cases. Ie. the URL can't resolve to anything. If someone is clever enough to buy the domain name facebok.com they deserve their traffic. And equally, if you type something like ww.peterbe.com or wwww.peterbe.com I've already set that up redirect to www.peterbe.com.

Here's what it could look like instead:
Here's what the improved error page could look like

First of all, the title is perhaps misleading. Basically, don't put plain script tags that are not async in the head tag.

If you put a piece of javascript in the head of HTML page, the browser will start to download that and proceed down the lines of HTML and download other resources too as it encounters them such as the CSS files.

Then, when all javascript and CSS has been downloaded it will start rendering the page and when it does that it will download any images referenced in the HTML. At roughly the same time it will start to display things on the screen. But it won't do this until the CSS and Javascript has been downloaded.

To repeat: The browser screen will appear blank. It won't start downloading any images if downloading a javascript URL referenced in the head gets stuck.

Here are two perfectly good examples from this morning's routine hunt for news:

Wired.com is relying on some external resource to load (forever!) until anything is rendered.
Wired.com is guilty

getharvest.com depends on Rackspace's CDN before anything is displayed
getharvest.com is guilty

Here's what getharvest.com does in their HTML:

<!DOCTYPE html>
<html lang="en">

  <head>
    <script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script>
    <script type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>
  ...

Why it gets stuck on connecting to c761485.r85.cf2.rackcdn.com I just don't know. But it does. The Internet is like that oftentimes. You simply can't connect to otherwise perfectly configured web servers.

Update-whilst-writing-this-text! As I was writing this text I gave getharvest.com a second chance thinking that most likely the squirrels in my internet tubes will be back up and running to rackcdn.com but then this happened!

So, what's the right thing to do? Simple: don't rely on external resources. For example, can you move the Javascript script tag to the very very bottom of the HTML page. That way it will render as much as it possibly can whilst waiting for the Javascript resource to get unstuck. Or, almost equally you can keep the script tag in the <head> but then but in async attribute on it like this:

<script async type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>

Another thing you can do is not use an external resource URL (aka. third-party domain). Instead of using cdn.superfast.com/file.js you instead use /file.js. Sure, that fancy CDN might be faster at serving up stuff than your server but looking up a CDN's domain is costing one more DNS lookup which we know can be very expensive for that first-time impression.

I know I'm probably guilty of this new on some of my (now) older projects. For example, if you open aroundtheworldgame.com it won't render anything until it has managed to connect to maps.googleapis.com and dn4avfivo8r6q.cloudfront.net but that's more of an app rather than a web site.

By the way... I wrote some basic code to play around with how this actually works. I decided to put this up in case you want to experiment with it too: https://github.com/peterbe/slowpage

Thanks to Theo Spears awesome effort premailer now support CSS specificity. What that means is that when linked and inline CSS blocks are transformed into tag style attributes the order is preserved as you'd expect.

When the browser applies CSS to elements it does it in a specific order. For example if you have this CSS:

p.tag { color: blue; }
p { color: red; }

and this HTML:

<p>Regular text</p>
<p class="tag">Special text</p>

the browser knows to draw the first paragraph tag in red and the second paragraph in red. It does that because p.tag is more specific that p.

Before, it would just blindly take each selector and set it as a style tag like this:

<p style="color:red">Regular text</p>
<p style="color:blue; color:red" class="tag">Special text</p>


which is not what you want.

The code in action is here.

Thanks Theo!

This morning I came across this site on Hacker News. It's a cute site with some basic tips on how to make your sites faster.

It's very much a for-beginners document as all the tips are quite basic. For example it doesn't even mention the use of CDNs.

One tip in particular stood out to me: "it can be useful to minify your HTML with automated tools."
And it links to the htmlcompressor project. Ignore this advice.

What matters 10 times more is Gzip compression. This is usually very easy to set up with Nginx or Apache. It's not something you do in your web framework and if you don't have a web framework, you don't need to manully Gzip HTML files on the filesystem.

For example, downloading the home page here on my blog, at the time of writing, this is: 66,770 bytes big. Hefty, sure, but with all excess whitespace removed it reduces down to 59,356 bytes. But that really doesn't matter when you Gzip.

Gzipped from original version: 18,470 bytes
Gzipped from whitespace trimmed version: 18,086 bytes

The gain is 2% which is definitely not worth the hassle of adding a whitespace compressor.

If you use django-fancy-cache you can either run with stats or without. With stats, you can get a number of how many times a cache key "hits" and how many times it "misses". Keeping stats incurs a small performance slowdown. But how much?

I created a simple page that either keeps stats or ignores it. I ran the benchmark over Nginx and Gunicorn with 4 workers. The cache server is a memcached running on the same host (my OSX 10.7 laptop).

With stats:

Average: 768.6 requests/second
Median: 773.5 requests/second
Standard deviation: 14.0

Without stats:

Average: 808.4 requests/second
Median: 816.4 requests/second
Standard deviation: 30.0

That means, roughly that running with stats incurs a 6% slower performance.

The stats is completely useless to your users. The stats tool is purely for your own curiousity and something you can switch on and off easily.

Note: This benchmark assumes that the memcached server is running on the same host as the Nginx and the Gunicorn server. If there was more network in between, obviously all the .incr() commands would cause more performance slowdown.