Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page! Currently only showing blog entries under the category: Web development.

Being a recruiter is hard work. A lot of pressure and having to deal with people's egos. Although I have no plans to leave Mozilla any time soon, there's still some value in seeing that my skills are sought after in the industry. That's why I haven't yet completely cancelled my LinkedIn membership.

When I get automated emails from bots that just scrape LinkedIn, I don't bother replying. Sometimes I get emails from recruiters who have actually studied my profile (my blog, my projects, my GitHub, etc.) and then I do take the time to reply and say "Hi Name! Thank you for reaching out. It looks really exciting but it's not for me at the moment. Keep up the good work!"

Then there's this new trend where people appear to try to automate what the bots do by doing it manually but without actually reading anything. I understand that recruiters are under a lot of pressure to deliver and try to reach out to as many potential candidates as possible, but my advice is: if you're going to do it, do it properly. You'll reach fewer candidates but it'll mean so much more.

I got this email the other day about a job offer at LinkedIn:
Shaming a stressed out recruiter from LinkedIn

  • I have a Swedish background. Not "Sweetish". And what difference does that make?
  • I haven't worked on "FriedZopeBase" (which is on my github) for several years
  • I haven't worked on "IssueTrackerProduct" for several years
  • Let's not "review [my] current employment". That's for me to think about.

So what can we learn from this? Well, for starters, if you're going to pretend to have taken the time, do it properly! If you don't have time to do in-depth research on a candidate, then don't pretend that you have.

I got another recruiter emailing me personally yesterday and it was short and sweet. No mention of free lunch or other superficial trappings. The only personal thing about it was that it had my first name. I actually bothered to reply to them and thank them for reaching out.

First of all, the title is perhaps misleading. Basically, don't put plain script tags that are not async in the head tag.

If you put a piece of JavaScript in the head of an HTML page, the browser will start to download that and proceed down the HTML, downloading other resources, such as the CSS files, as it encounters them.

Then, when all the JavaScript and CSS has been downloaded, it will start rendering the page, and as it does that it will download any images referenced in the HTML. At roughly the same time it will start to display things on the screen. But it won't do any of this until the CSS and JavaScript have been downloaded.

To repeat: the browser screen will appear blank. It won't start downloading any images if downloading a JavaScript URL referenced in the head gets stuck.

Here are two perfectly good examples from this morning's routine hunt for news:

Wired.com relies on some external resource to load (forever!) before anything is rendered.
Wired.com is guilty

getharvest.com depends on Rackspace's CDN before anything is displayed
getharvest.com is guilty

Here's what getharvest.com does in their HTML:

<!DOCTYPE html>
<html lang="en">

  <head>
    <script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script>
    <script type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>
  ...

Why it gets stuck on connecting to c761485.r85.cf2.rackcdn.com I just don't know. But it does. The Internet is like that oftentimes. You simply can't connect to otherwise perfectly configured web servers.

Update-whilst-writing-this-text! As I was writing this text I gave getharvest.com a second chance, thinking that most likely the squirrels in my internet tubes would be back up and running to rackcdn.com, but then this happened!

So, what's the right thing to do? Simple: don't rely on external resources. For example, you can move the JavaScript script tag to the very, very bottom of the HTML page. That way it will render as much as it possibly can whilst waiting for the JavaScript resource to get unstuck. Or, almost equivalently, you can keep the script tag in the <head> but put an async attribute on it like this:

<script async type="text/javascript" src="http://c761485.r85.cf2.rackcdn.com/gascript.js"></script>

Another thing you can do is not use an external resource URL (a.k.a. a third-party domain). Instead of using cdn.superfast.com/file.js you use /file.js. Sure, that fancy CDN might be faster at serving up stuff than your server, but looking up a CDN's domain costs one more DNS lookup, which we know can be very expensive for that first-time impression.

I know I'm probably guilty of this myself on some of my (now) older projects. For example, if you open aroundtheworldgame.com it won't render anything until it has managed to connect to maps.googleapis.com and dn4avfivo8r6q.cloudfront.net, but that's more of an app than a web site.

By the way... I wrote some basic code to play around with how this actually works. I decided to put this up in case you want to experiment with it too: https://github.com/peterbe/slowpage
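If you don't feel like cloning that repo, here's a minimal sketch of the same idea (not the actual slowpage code; the page content and the 10 second delay are just made up for illustration): a tiny Flask app whose page references a script in the <head> that deliberately takes ages to respond. Open it in a browser and watch the screen stay blank, images and all, until the script finally arrives.

import time
from flask import Flask, Response

app = Flask(__name__)

PAGE = """<!DOCTYPE html>
<html>
  <head>
    <script type="text/javascript" src="/slow.js"></script>
  </head>
  <body>
    <h1>You won't see this until /slow.js has downloaded</h1>
    <img src="/some-image.png"> <!-- placeholder; watch when its request starts -->
  </body>
</html>"""

@app.route('/')
def index():
    return PAGE

@app.route('/slow.js')
def slow_js():
    time.sleep(10)  # pretend to be a CDN that gets stuck for 10 seconds
    return Response("console.log('finally!');",
                    mimetype='application/javascript')

if __name__ == '__main__':
    app.run()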

This morning I came across this site on Hacker News. It's a cute site with some basic tips on how to make your sites faster.

It's very much a for-beginners document as all the tips are quite basic. For example it doesn't even mention the use of CDNs.

One tip in particular stood out to me: "it can be useful to minify your HTML with automated tools."
And it links to the htmlcompressor project. Ignore this advice.

What matters 10 times more is Gzip compression. This is usually very easy to set up with Nginx or Apache. It's not something you do in your web framework, and if you don't have a web framework, you don't need to manually Gzip HTML files on the filesystem.

For example, the home page here on my blog is, at the time of writing, 66,770 bytes big. Hefty, sure, but with all excess whitespace removed it reduces down to 59,356 bytes. But that really doesn't matter when you Gzip.

Gzipped from original version: 18,470 bytes
Gzipped from whitespace trimmed version: 18,086 bytes

The gain is 2% which is definitely not worth the hassle of adding a whitespace compressor.
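If you want to reproduce numbers like these for your own pages, a rough sketch could look like this (Python 2 style to match the rest of this post; the URL is just an example, and zlib isn't exactly the gzip file format but it's close enough for a size comparison):

import re
import urllib
import zlib

html = urllib.urlopen('http://www.peterbe.com/').read()
# crude whitespace trimming between tags, roughly what a minifier would do
trimmed = re.sub(r'>\s+<', '><', html)

print 'Original:          %d bytes' % len(html)
print 'Trimmed:           %d bytes' % len(trimmed)
print 'Original, gzipped: %d bytes' % len(zlib.compress(html, 9))
print 'Trimmed, gzipped:  %d bytes' % len(zlib.compress(trimmed, 9))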

This personal blog site of mine uses django-fancy-cache and mincss.

What that means is that I can cache the whole output of every blog post for weeks, and when I do that I can first preprocess the HTML and convert every external CSS file into one inline STYLE block which only references selectors that are actually used.

To see it in action, right-click and select "View Page Source". You'll see something like this:

/*
Stats about using github.com/peterbe/mincss
-------------------------------------------
Requests:         1 (now: 0)
Before:           81Kb
After:            11Kb
After (minified): 11Kb
Saving:           70Kb
*/
section{display:block}html{font-size:100%;-webkit-text-size-adjust:100%;-ms-tex...

The reason the saving is so huge, in my case, is because I'm using the Twitter Bootstrap CSS framework, which is awesome but, like any framework, will inevitably contain a bunch of stuff that I don't use. Some stuff I don't use on any page at all. Some stuff is used only on some pages and some other stuff is used only on some other pages.

What I gain by this is faster page loads. What the browser does is: it gets a URL, downloads all the HTML, scans the HTML for referenced CSS (using the link tag) and downloads that too. Once all of that is downloaded, it starts to render the page. Approximately after that it starts to download all referenced JavaScript and starts evaluating and executing it.

By not having to download the CSS, the browser has one less thing to do. Only one request? Well, that request might be to a CDN (not a great idea actually), so even though it's just 1 request it will involve another DNS look-up.

Here's what the loading of the homepage looks like in Firefox from a US east coast IP.

Granted, a downloaded CSS file can be cached by the browser and used for other pages under the same domain. But on my blog the bounce rate is about 90%. That doesn't necessarily mean that visitors leave as soon as they arrive, but it does mean that they generally just read one page and then leave. Those 10% of visitors who visit more than one page will have to download the same chunk of CSS more than once. But mind you, it's not always the same chunk of CSS because it's different for different pages. And the amount of CSS that is now inline only adds about 2-3Kb to the HTML load when sent gzipped.

Getting to this point wasn't easy because I first had to develop mincss and django-fancy-cache and integrate it all. However, what this means is that you can have it done on your site too! All the code is Open Source and it's all Python and Django which are very popular tools.

First of all, to find out what mincss is read this blog post which explains what the heck this new Python tool is.

My personal website is an ideal candidate for using mincss because it uses an un-customized Bootstrap CSS which weighs over 80Kb (minified) and on every page hit, the rendered HTML is served directly from memcache so dynamic slowness is not a problem. With that, what I can do is run mincss just before the rendered (from Django) output HTML is stored in memcache. Also, what I can do is take ALL inline style blocks and all link tags and combine them into one big inline style block. That means that I can reduce any additional HTTP connections needed down to zero! Remember, "Minimize HTTP Requests" is the number one web performance optimization rule.
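To give a rough idea of what that post-processing step could look like, here's a simplified sketch using the Processor API from the mincss announcement post (not the exact code I run; the real thing hooks into django-fancy-cache's post-processing and handles more edge cases, and the regex here is deliberately naive):

import re
from mincss.processor import Processor

def inline_used_css(page_url, html):
    """Replace each external stylesheet link with an inline style block
    containing only the selectors the page actually uses."""
    p = Processor()
    p.process_url(page_url)
    p.process()
    for link in p.links:
        # link.href is the stylesheet URL, link.after is the reduced CSS.
        # Naive: assumes the href appears verbatim inside the link tag.
        link_tag = re.compile(
            r'<link[^>]+href=["\']?%s["\']?[^>]*>' % re.escape(link.href)
        )
        style_block = '<style type="text/css">%s</style>' % link.after
        html = link_tag.sub(style_block, html)
    return html

The output of a function like that is what goes into memcache.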

To get a preview of that, compare http://www.peterbe.com/about with http://www.peterbe.com/about3. Visually no difference. But view the source :)

Before:
Document size: Before

After:
Document size: After

Voila! One HTTP request less and 74Kb less!

As if that wasn't good enough, let's now take into account that the browser won't start rendering the page until the HTML and ALL CSS is "downloaded" and parsed. Without further ado, let's look at how much faster this is now:

Before:
Waterfall view: Before
report

After:
Waterfall view: After
report

How cool is that! The "Start Render" event is fired after 0.4 seconds instead of 2 seconds!

Note how the "Content Download" isn't really changing. That's because no matter what the CSS is, there's still a tonne of images yet to download.

That example page is interesting too because it contains a piece of Javascript that is fired on the window.onload that creates little permalink links into the document and the CSS it needs is protected thanks to the /* no mincss */ trick as you can see here.

The code that actually implements mincss here is still very rough and is going to need some more polishing up until I publish it further.

Anyway, I'm really pleased with the results. I'm going to tune the implementation a bit further and eventually apply this to all pages here on my blog. Yes, I understand that the CSS, if implemented as a link, can be reused thanks to the browser's cache but visitors of my site rarely check out more than one page. In fact, the number of "pages per visit" on my blog is 1.17 according to Google Analytics. Even if this number was bigger I still think it would be a significant web performance boost.

UPDATE

Steve Souders points out a flaw in the test. See his full comment below. Basically, what appears to happen in the first report is that IE8 downloads the file c98c3dfc8525.css twice even though it returns as a 200 the first time. No wonder that delays the "Start Render" time.

So, I re-ran the test with Firefox instead (still from the US East coast):

Before:
WebpageTest before (Firefox)
report

After:
WebpageTest after (Firefox)
report

That still shows a performance boost from 1.4 seconds down to 0.6 seconds when run using Firefox.

Perhaps it's a bug in Webpagetest or perhaps it's simply how IE8 works. In a sense it "simulates" the advantages of reducing the dependency on extra HTTP requests.

A project I started before Christmas (i.e. about a month ago) is now production ready.

mincss (code on github) is a tool that when given a URL (or multiple URLs) downloads that page and all its CSS and compares each and every selector in the CSS and finds out which ones aren't used. The outcome is a copy of the original CSS but with the selectors not found in the document(s) removed. It goes something like this:

>>> from mincss.processor import Processor
>>> p = Processor()
>>> p.process_url('http://www.peterbe.com')
>>> p.process()
>>> p.inlines
[]
>>> p.links
[<mincss.processor.LinkResult object at 0x10a3bbe50>, <mincss.processor.LinkResult object at 0x10a4d4e90>]
>>> one = p.links[0]
>>> one.href
'//d1ac1bzf3lrf3c.cloudfront.net/static/CACHE/css/c98c3dfc8525.css'
>>> len(one.before)
83108
>>> len(one.after)
10062
>>> one.after[:70]
u'header {display:block}html{font-size:100%;-webkit-text-size-adjust:100'

To whet your appetite, running it on any one of my pages here on my blog takes it from 82Kb down to 7Kb. Before you say anything; yes, I know it's because I'm using a massive (uncustomized) Twitter Bootstrap file that contains all sorts of useful CSS that I'm not using more than 10% of. And yes, those 10% on one page might be different from the 10% on another page and between them it's something like 15%. Add a third page and it's 20% etc. But, because I'm just doing one page at a time, I can be certain it will be enough.

One way of using mincss is to run it on the command line and look at the output, then audit it to give yourself an idea of which selectors aren't used. A safer way is to just do one page at a time.

The way it works is that it parses the CSS payload (from inline blocks or link tags) with a relatively advanced regular expression and then loops over each selector one at a time and runs it through cssselect (which uses lxml) to see if the selector is used anywhere in the document. If the selector isn't used, it's removed.
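In other words, the core loop boils down to something like this (heavily simplified; the real code also deals with media queries, comments, pseudo-classes and so on):

from cssselect import HTMLTranslator, SelectorError
from lxml import html as lxml_html

def used_selectors(html_source, selectors):
    """Return only those selectors that match something in the document."""
    tree = lxml_html.fromstring(html_source)
    translator = HTMLTranslator()
    kept = []
    for selector in selectors:
        try:
            xpath = translator.css_to_xpath(selector)
        except SelectorError:
            kept.append(selector)  # can't translate it, keep it to be safe
            continue
        if tree.xpath(xpath):
            kept.append(selector)  # at least one element matches
    return kept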

I know I'm not explaining it well so I put together a little example implementation which you can download and run locally just to see how it works.

Now, regarding JavaScript and DOM manipulation and stuff: there's not a lot you can do about that. If you know exactly what your JavaScript does, for example creating a div with class loggedin-footer, you can prepare your CSS to tell mincss to leave it alone by adding /* no mincss */ somewhere in the block. Again, look at the example implementation for how this can work.

An alternative is, instead of using urllib.urlopen(), to use a headless browser like PhantomJS, which will run the page with some JavaScript rendering, but you'll never cover all bases. For example, your page might have something like this:

$(function() {
  $.getJSON('/is-logged-in', function(res) {
    if (res.logged_in) {
      $('<div class="loggedin-footer">').appendTo($('#footer'));
    }
  });
});

But let's not focus on what it cannot do.

I think this can be a great tool for all of us who either just download a bloated CSS framework or have legacy CSS that hasn't been updated as new HTML is added and removed.

The code is Open Source (of course) and patiently awaiting your pull requests. There's almost full test coverage and there's still work to be done to improve the code such as finding more bugs and optimizing.

Using the proxy with '?MINCSS_STATS=1'
Also, there's a rough proxy server you can start that attempts to run it on any URL. You start it like this:

pip install Flask
cd mincss/proxy
python app.py

and then you just visit something like http://localhost:5000/www.peterbe.com/about and you can see it in action. That script needs some love since it's using lxml to render the processed output which does weird things to some DOM elements.

I hope it's of use to you.

UPDATE

Published a blog post about using mincss in action

UPDATE 2

mincss now supports downloading using PhantomJS, which means that JavaScript rendering will work. See this announcement

UPDATE 3

Version 0.8 is 500% faster now for large documents. Make sure you upgrade!

If the number 1 rule for making faster websites is to "Minimize HTTP Requests", then, let's try it.

On this site, almost all pages are served entirely from memcache. Django renders the template with the database content and the generated HTML is cached. So I thought I'd insert a little post-processing script that converts all <img src="...something..."> into <img src="data:image/png;base64,iVBORw0KGgo..."> which basically means the HTML gets as fat as the sum of all referenced images combined.

It's either 10Kb of HTML followed by (roughly) 10 x 30Kb images or it's 300Kb of HTML and 0 images. The result is here: http://www.peterbe.com/about2 (open and view source)

You can read more about the Data URI scheme here if you're not familiar with how it works.
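The post-processing itself boils down to something like this (a simplified sketch rather than my actual code, which is uglier; the base URL is just an example, and the data-orig-src attribute is there for the IE fallback discussed further down):

import base64
import mimetypes
import re
import urllib

def inline_images(html, base_url='http://www.peterbe.com'):
    """Swap every <img src="..."> for an inlined data URI."""
    def replacer(match):
        src = match.group(1)
        url = src if '://' in src else base_url + src
        payload = urllib.urlopen(url).read()
        content_type = mimetypes.guess_type(url)[0] or 'image/png'
        data_uri = 'data:%s;base64,%s' % (content_type,
                                          base64.b64encode(payload))
        return 'src="%s" data-orig-src="%s"' % (data_uri, src)

    return re.sub(r'src="([^"]+\.(?:png|jpg|jpeg|gif))"', replacer, html)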

The code is a hack but that's after all what a personal web site is all about :)

So, how much slower is it to serve? Well, actual server-side render time is obviously slower but it's a process you only have to do a small fraction of the time since the HTML can be nicely cached.

Running..
ab -n 1000 -c 10 http://www.peterbe.com/about

BEFORE:

Document Path:          /about
Document Length:        12512 bytes

Concurrency Level:      10
Time taken for tests:   0.314 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      12779000 bytes
HTML transferred:       12512000 bytes
Requests per second:    3181.36 [#/sec] (mean)
Time per request:       3.143 [ms] (mean)
Time per request:       0.314 [ms] (mean, across all concurrent requests)
Transfer rate:          39701.75 [Kbytes/sec] received

AFTER:

Document Path:          /about2
Document Length:        306965 bytes

Concurrency Level:      10
Time taken for tests:   1.089 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      307117000 bytes
HTML transferred:       306965000 bytes
Requests per second:    918.60 [#/sec] (mean)
Time per request:       10.886 [ms] (mean)
Time per request:       1.089 [ms] (mean, across all concurrent requests)
Transfer rate:          275505.06 [Kbytes/sec] received

So, it's basically 292Mb transferred instead of 12Mb in the test and the requests per second is a third of what it used to be. But it's not too bad. And with web site optimization, what matters is the individual user's impression, not how much or how little the server can serve to multiple users.

Next, how does the waterfall of this look?

BEFORE:

WebPagetest WebpageTest before

Pingdom Tools Pingdom Tools before

AFTER:

WebPagetest WebpageTest after

Pingdom Tools Pingdom Tools after

Note! All images, when served individually (the "before" version), are served from a fast CDN. The HTML is served from London, United Kingdom and the WebPagetest was run from Virginia, USA.

What can we conclude from this:

  • It worked! There are fewer requests. 18 requests become 6 requests.
  • The "Start Render" event fires significantly earlier.
  • The "Document Complete" event happens slightly earlier
  • The total file size goes from 286Kb to 283Kb!
  • Before: First load takes 2 seconds, repeated view takes 0.4 seconds
  • After: First load takes 2 seconds, repeated view takes 2 seconds :(
  • Pingdom Tools sums the kilobytes which gives a rounding error compared to WebPagetest

Some more thoughts and conclusions:

If you're wondering how the total file size is the same as before (sum of html + images) it's because all images are turned into base64 into one large document which gzip presumably does better on. If there were fewer images I'd suspect the second version would be slightly bigger in total.
Apparently the base64 version + gzip is supposed to be 2-5% bigger than the original JPG/PNG individually.
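If you want to sanity-check that figure against one of your own images, here's a quick sketch (the filename is just a placeholder, and zlib stands in for gzip):

import base64
import zlib

raw = open('some-image.png', 'rb').read()
encoded = base64.b64encode(raw)  # roughly 33% bigger than the binary

print 'Original PNG:         %d bytes' % len(raw)
print 'Base64:               %d bytes' % len(encoded)
print 'Base64 + gzip (zlib): %d bytes' % len(zlib.compress(encoded, 9))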

Don't do this at home kids if you don't have a good server-side cache and a good web server that serves the HTML gzipped.

Although the code I put in place to make this possible is, right now, pretty ugly, it is after all pretty convenient for the developer because it's like a plugin you just add to the rendering. You don't even notice this going on in the template or in the view code. However...

More work is needed. And that is for the IE <= 7 guys. Basically, Internet Explorer 7 and worse don't support it at all, so you need a shim for them that looks something like this:

<!--[if lte IE 7]>
<script>
// for old IE, swap each data URI back to the original image URL
$('img').each(function() {
  $(this).attr('src', $(this).data('orig-src'));
});
</script>
<![endif]-->


It would need some love and work but the principle is there and it's sound.

Or, just ignore them. After all, only 3% of my visitors are on IE8 and only 0.5% are on IE7. At least they can read the text. This brutal exclusion isn't always an option. But the shim is.

I think I'm going to keep it. The code needs to be packaged up and made neat before I stick with it. There are a lot more interesting things one can do with this. For example, you could, in a post-processor, optimize the CSS used by inspecting the DOM to see which selectors can be dumped.

UPDATE

Some really valuable comments below have pointed out that using data URIs causes memory bloat in Gecko, which means that it might be particularly harmful for people with multiple tabs or on mobile devices.

Hmm... back to the drawing board a bit I guess.

"Gamification is the use of game-thinking and game mechanics in non-game contexts in order to engage users and solve problems" -- wikipedia

Gamification sneaks into a software developer's life whether he/she likes it or not. Some of it works for me, some of it doesn't.

What works for me

  1. PyPI downloads on my packages
    Although clouded with inaccuracies and possible false positives (someone's build script could be pip installing overzealously), seeing your download count go up means that people actually depend on your code. Most likely, they're not just downloading it to admire it, they download it to use it.

  2. Github followers and Starred projects
    Being followed on Github means people see your activity on their dashboard (aka. home page). Every commit and every gist you push gets potential eyes on it.
    When people star your project it probably means that they're thinking "oh neat! this could come in handy some day so I'll star it for now". That's kinda flattering to be honest.

  3. Twitter followers
    This doesn't apply to everyone of course but to me it does. I really try my best to write about work or code related stuff on Twitter and personal stuff on Facebook. Whenever a blog post of mine gets featured on HN or if I present at some conference I get a couple of new followers.
    Some people do a great job curating their followers, responding and keeping it very relevant. They deserve their followers.
    Yes, there are a lot of bogus Twitter accounts that follow you but since that happens to everyone it's easy to ignore. Since you probably skim through most of the "You have new follower(s)" emails, it's quite flattering when it's a real human being who does what you do or something similar.

  4. Activity on Github projects
    This one is less about fame and fortune and more of a "damage prevention". Clicking into a project and seeing that the last commit was 3 years ago most definitely means the project is dead.
    I have some projects that I don't actively work on but the code might still be relevant and doesn't need much more maintenance. For those kinds of projects it's good to have some sporadic activity just to signal to people that it's not completely abandoned.

  5. Hacker News posts and comments "Show HN: ..."
    I've now had quite a few posts to HN that get promoted to the front page. Whenever this happens you get those almost embarrassing spikes in your Google Analytics account.
    However, it happened. Enough people thought it was interesting to vote it up to the front page.
    It's important to not count the number of comments as a measure of "success" because oftentimes comments aren't simply constructive feedback but just comments on other comments.
    Keep this one simple, the fact that you have built something that is "Show HN:..." means you probably have worked hard.

What does NOT work for me

  1. Unit test code coverage metrics
    Test coverage percentages are quite a private matter. Kinda like your stool. Unless something amazing happened, keep it to yourself.
    It's nice to see a general increase of the total percentage but do not dare to obsess about it. What matters is that you look through the report and take note that what matters is covered. Coverage on code that is allowed to break and isn't embarrassing if it does, does not need to be green all the way. Who are you trying to impress? The intern you're mentoring or the family you don't have time to spend time with because you're hunting perfection?
    I must, however, admit that I too have in the past inserted pragma: no cover in my code. Also, being able to say that you have 100% test coverage on a lib can be good "advertisement" in your README as it instills confidence in your potential users.

  2. Number of tests
    When you realize that 1 nicely packaged integration test can test just as much as 22 anally verbose unit tests you realize that number of tests is a stupid measure.
    A lot of junior test driven developers write tests that cover circumstances that are just absurd. For example "what if I pass a floating point number instead of a URL string which it's supposed to be??".
    Remember, results and quality count. Having too many tests also means more things to slow you down when you refactor.

  3. Commit counts
    On projects with multiple contributors, commit count is not a measure of anything. It has no valuable implications or deductions. Adding a newline character to a README can be 1 count.
    If you skim through the commit log on a Github project you'll notice that surprisingly many commits are trivial stuff such as style semantics or updating a CREDITS file.
    Yes, someone has to do that stuff too and we're always appreciative of that but it's not a measure of excellence over others. It's just a count.

  4. Resolved bugs/issues count
    If this mattered and was a measure of anything, you could simply swallow everything with a quick turnaround and resolve or close it.
    But not every bug deserves your attention. Even if it is a genuine bug it might still be really low priority, and working on it costs time and distracts focus away from much more important work.

  5. Number of releases
    It's nice to see projects making releases (or tags) but don't measure things by this. There's so much good quality software that doesn't really fit the release model.

From one of the monthly summary emails
Building a side project is fun. Launching it is fun. Improving and measuring it is fun. But marketing it is awful!

Marketing your side project means you're not coding; instead you're walking around the interwebs with your pants down trying your hardest to get people to not only try your little project but to also go beyond that by tweeting about it, posting a Facebook status update about it, blogging about it or using whatever devices inside it help the viral spread. Now that! ...is freckin hard.

I'm struggling to get even my best friends and my wife to try my side projects. I can't blame them; unlike a lemonade stand at a farmers market it's very impersonal. When I tried to get my buddies to try Around The World, several did, but only very briefly, and granted, a few did give me feedback, but it's really not much to go by.

So, today I'm launching the start of my new web marketing strategy: Begging

Or rather, politely asking people to help me. Instead of using the usual "we" or "our" language, I'm referring to it in the first person. The platform for this strategy experiment is HUGEpic and it looks like this: hugepic.io/yourhelp/

I recently built a feature into HUGEpic that once a month emails everyone who uploaded a picture a little summary of their upload and the number of hits and comments, and boldly in the footer of this email there's a link to the /yourhelp/ page (see screenshot above).

Let's see how this works out. Most likely it'll be just more noise in the highways of people's internet lives, but perhaps it can become successful too.

Mind you, the motive of all of this is for my "insert-sideproject-name-here" to become successful. And by successful I mean popular and getting lots of traffic. None of my side projects make me any money, which makes it easier to beg. However, none of them make any money for the people I'm asking for help either. Perhaps that's what could be version 2.0 of my web marketing strategy.

Example of embedding a picture
New feature just landed! Now you can embed pictures from HUGEpic on your own site. See the example below.

So to do this I opted for the simplest solution possible. It's basically just an iframe to the regular URL but with ?embedded=1 set. What this does is remove all buttons except the zoom navigation buttons. There are some other configurable things like hide_download_counter=1|0 and hide_annotations=1|0. At the moment there's no UI to change these options but at least the functionality is there in case somebody wants it.
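In the meantime, if you wanted to build the embed snippet yourself, it's really just a matter of constructing the query string; something like this hypothetical helper (the picture URL, dimensions and iframe attributes are made up, the parameter names are the ones mentioned above):

import urllib

def embed_snippet(picture_url, width=640, height=480, **options):
    options['embedded'] = 1  # hides all buttons except zoom navigation
    query = urllib.urlencode(options)
    return ('<iframe src="%s?%s" width="%s" height="%s" '
            'frameborder="0"></iframe>' % (picture_url, query, width, height))

print embed_snippet('http://hugepic.io/abc123/3.00/40.0/-20.0',
                    hide_download_counter=1, hide_annotations=1)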

Here's what it can look like

One particular little feature I think is neat is that whilst you're previewing your embedded code and you zoom in and pan around on your image, the position and zoom level are automatically inserted into the HTML code. The way this is done is with this pattern:

setInterval(pluck_position, 2 * 1000);

function pluck_position() {
    var url = iframe[0].contentWindow.location.href;
    var numbers = url.match(/\/([0-9\.]+)\/([-0-9\.]+)\/([-0-9\.]+)/g)[0];
    var zoom = parseFloat(numbers.split('/')[1]);
    var lat = parseFloat(numbers.split('/')[2]);
    var lng = parseFloat(numbers.split('/')[3]);
    ...
}

Code is here

To try it, click on any picture then click the little "Permalink" icon (the icon that looks like a chain link) in the upper right hand corner and follow the link that appears.

For example, here's an upload of a huge Minecraft world: