A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under the category: Web development. Clear filter

How to deploy a create-react-app

04 November 2016 0 comments   ReactJS, Javascript, Web development

First of all, create-react-app is an amazing kit. It's a zero configuration bundle that gives you a react app boilerplate with a dev server, linting and a deployment tool. All are awesome but not perfect.

I could go on giving this project praise but if you're here reading this you might be convinced already.

Anyway, the way you deploy a create-react-app project is actually stunningly simple, but there is one major caveat to look out for. Basically running yarn run build will first delete existing files in the ./build/ directory. Files that it indents to replace. For example your ./build/index.html or your ./build/static/js/main.94a86fe3.js.

So, what I suggest is that you deploy it like this:


# Go into the project where the package.json exists
cd myproject
# Upgrade any libraries
# Use 
yarn run build
mv build build_final

Note! This tip is only applicable if you deploy "in place" as opposed to building a whole new container/image and swapping an old container/image for a new one.

Now, for your Nginx point to the ./build_final directory instead. For example:

# /etc/nginx/sites-enabled/mysite.conf
server {
    root /full/path/to/myproject/build_final;

    location / {
        try_files $uri /index.html;
        add_header   Cache-Control public;
        expires      1d;

The whole point of this tip is that it's a good idea to not point Nginx to the ./build directory (but to a copy of it instead) because otherwise, during the seconds that yarn run build runs (1-5 seconds) a bunch of files will be missing and Nginx will send 404 errors to the clients unlucky enought to connect during the deployment.

django-html-validator - now locally, fast!

12 August 2016 1 comment   Django, Web development, Python

A couple of years ago I released a project called django-html-validator (GitHub link) and it's basically a Django library that takes the HTML generated inside Django and sends it in for HTML validation.

The first option is to send the HTML payload, over HTTPS, to Not only is this slow but it also means sending potentially revealing HTML. Ideally you don't have any passwords in your HTML and if you're doing HTML validation you're probably testing against some test data. But... it sucked.

The other alternative was to download a vnu.jar file from the project and executing it in a subprocess with java -jar vnu.jar /tmp/file.html. Problem with this is that it's really slow because java programs take such a long time to boot up.

But then, at the beginning of the year some contributors breathed fresh life into the project. Python 3 support and best of all; the ability to start the vnu.jar as a local server on http://localhost:8888 and HTTP post HTML over to that. Now you don't have to pay the high cost of booting up a java program and you don't have to rely on a remote HTTP call.

Now it becomes possible to have HTML validation checked on every rendered HTML response in the Django unit tests.

To try it, check out the new instructions on "Setting the vnu.jar path".

The contributor who's made this possible is Ville "scop" Skyttä, as well as others. Thanks!!

Premailer 3.0.0 - classes kept by default

07 June 2016 0 comments   Web development, Python

Today I released a new major version of premailer where the only difference is that one of the default options have changed from True to False.
The git commit for this change might look big but the only difference is that now, by default, the HTML class attribute is kept in the output HTML.

When premailer started, the land of HTML emails was very different. Basically, you used to not use CSS media queries, so, no reason to keep the class attribute. Now, these days, all pretty HTML emails need media queries and for that to work you need to have the class attribute kept in the HTML.

So fear not the major version upgrade! If you used to use premailer like this:

from premailer import Premailer

transformer = Premailer(html)
output_html = transformer.transform()

You now need to change it to:

from premailer import Premailer

transformer = Premailer(html, remove_classes=True)
output_html = transformer.transform()

As always, you can play with it on

CSS Bloat Comparison

03 June 2016 0 comments   Javascript, Web development

tl;dr; How much web performance negative overhead does including a CSS stylesheet (that you don't use) add to the rendering time? I don't know. But WebPagetest gives us some clues.

To jump straight to the results, checkout this video which is the slow motion rendering of 1 + 5 pages. Each page has one more big fat CSS stylesheet linked than the other. I.e. the 5th one links to 5 different .css URLs. There's also a 0th one which only loads 1 .css file which has nothing in it.

WebPagetest results
The full results are here.

I love CSS frameworks and use them ALL the time. But I'm also interested in web performance and using techniques like real-time static analysis to figure out what CSS that doesn't need to be loaded. Some of those techniques surely lead to less stuff needing to be downloaded but how big is the gain? Not sure.

So I made 6 pages that loads a CSS framework but doesn't actually use any of it. The browser will be instructed to download the file and parse it. That takes time and CPU-work will surely will have an effect on the total rendering time.

In every page, I lastly load a little piece of JavaScript just to make something appear on the page. That means, the page will not fully render until AFTER the it has loaded the .css files and one little .js file which prints something on the DOM. The reason there's a 5 second delay until it uses AJAX (fetch) to figure out their sizes is because I don't want that effect to affect the rendering on WebPagetest.

css Bytes


Notice that there are two cases of "outliers". According to the measurements, bloat3.html and bloat5.html take shorter time to render compared to their smaller files (bloat2.html and bloat4.html respectively). That seems to indicate that - even though WebPagetest does 3 runs each - that "network I/O luck" plays a big role.

Also, interesting is that the first file bloat1.html which loads 118Kb more CSS than bloat0.html but clearly it doesn't seem to have a big impact.


It does have an effect of reduced web performance. I.e. longer loading time. The page with just 1 .css file takes 0.5 seconds and the one with 5 .css files takes 0.8 seconds. However, the results also indicate that that much of the total time is spent waiting for the download. Once the brower has downloaded the payload, it appears to be very fast at parsing it to get ready to render the page accordingly. In other words, don't worry so much about the "bloat" of the content of the CSS file. Worry more about the excessive HTTP requests needed in total.

It would be interesting to inline every CSS file into the .html page and re-run. That means only the .html needs to be downloaded from the network and although the last one will be bigger, when all are gzipped the difference isn't huge.

In conclusion, I'm not sure it's a huge performance loss to add big bloated CSS frameworks to your site. Most likely your big wins lies with optimizing the images and JavaScript.


After publishing this, I decided to inline every .css as big <style> tags. For example, bloat4.html (View source).

Here's the result.

And here's the video.

What this proves is that the difference we saw earlier was almost entirely due to "network I/O luck". All pages are gzipped. The smallest one is only 0.19Kb and the largest one is 183Kb. But there's no noticable difference in the total time it takes to render these two. Basically, browser's ability to parse CSS is FAST! Don't worry so much about the size of the CSS payload itself. Go forth and make pretty web pages!

gg - A prototype to rule Git, GitHub and Bugzilla

06 May 2016 0 comments   Web development, Python

tl;dr; I'm starting a new "side-project". (I say side-project in quotation marks because I'm doing this for the sake of work ultimately). It's call gg and it's a command line program for doing various tasks to do with git, GitHub and Bugzilla.

Many years ago I noticed certain patterns of things I do. Usually work starts with a Bugzilla bug. I then need to make a new branch with the bug number in the branch name and when I'm done I need to push that branch to my fork and create a GitHub Pull Request. When it's been merged (or if I merge it manually myself) I have to go back to the master branch, fetch the upstream, delete the now merged branch and delete the remove branch. All of these things are tedious so I wrapped up my patterns in a little Python project called bgg, so I can do:

$ G start 123456789  # that's a bugzilla ID
# edit files and save
$ G commit
# now I wait for the Pull Request to be merged
$ G getback
# all things get cleaned up and I'm back on the master branch

This new project, gg, is a complete re-write of bgg but with some big changes:

  1. It's based on Click
  2. All sub-commands are separate projects with their own GitHub repo and PyPI submissions
  3. Instead of wrapping git commands in a subprocess, it uses GitPython.
  4. Audacious goals of major feature upgrades such as automatic bug/issue assignment, ability to see rebased branches and automatic Pull Request creation, etc.
  5. Audacious goal of documenting everything and succeed in getting other people to write their own plugins and share what they make. Or at least, make it easy to write your own plugin.

So far, I've only written 1 plugin. It's called gg-start. All it does is it creates branches for you. For example, you can type:

$ gg start
# or...
$ gg start
# or...
$ gg start

And it figures out a good branch name, remembers the issue title and checks out the new branch. All that stuff is saved in ~/.gg.json so that when you later (and this plugin hasn't been built yet) type gg commit it can use that title to automatically suggest a good git commit message and it should know where to push it and start the GitHub Pull Request etc.

My intention is to first get decent parity with bgg. So I'll need to create plugins called gg-commit, gg-branches, gg-rebase, gg-getback, gg-cleanup, gg-tag, gg-merge and gg-push. Once parity is achieved I'm going to add some more fancy features and work hard on making it clear how you can write your own plugin.

Wish me luck!

How to track Google Analytics pageviews on non-web requests (with Python)

03 May 2016 1 comment   Mozilla, Django, Web development, Python

tl;dr; Use raven's ThreadedRequestsHTTPTransport transport class to send Google Analytics pageview trackings asynchronously to Google Analytics to collect pageviews that aren't actually browser pages.

We have an API on our Django site that was not designed from the ground up. We had a bunch of internal endpoints that were used by the website. So we simply exposed those as API endpoints that anybody can query. All we did was wrap certain parts carefully as to not expose private stuff and we wrote a simple web page where you can see a list of all the endpoints and what parameters are needed. Later we added auth-by-token.

Now the problem we have is that we don't know which endpoints people use and, as equally important, which ones people don't use. If we had more stats we'd be able to confidently deprecate some (for easier maintanenace) and optimize some (to avoid resource overuse).

Our first attempt was to use statsd to collect metrics and display those with graphite. But it just didn't work out. There are just too many different "keys". Basically, each endpoint (aka URL, aka URI) is a key. And if you include the query string parameters, the number of keys just gets nuts. Statsd and graphite is better when you have about as many keys as you have fingers on one hand. For example, HTTP error codes, 200, 302, 400, 404 and 500.

Also, we already use Google Analytics to track pageviews on our website, which is basically a measure of how many people render web pages that have HTML and JavaScript. Google Analytic's UI is great and powerful. I'm sure other competing tools like Mixpanel, Piwik, Gauges, etc are great too, but Google Analytics is reliable, likely to stick around and something many people are familiar with.

So how do you simulate pageviews when you don't have JavaScript rendering? The answer; using plain HTTP POST. (HTTPS of course). And how do you prevent blocking on sending analytics without making your users have to wait? By doing it asynchronously. Either by threading or a background working message queue.

Threading or a message queue

If you have a message queue configured and confident in its running, you should probably use that. But it adds a certain element of complexity. It makes your stack more complex because now you need to maintain a consumer(s) and the central message queue thing itself. What if you don't have a message queue all set up? Use Python threading.

To do the threading, which is hard, it's always a good idea to try to stand on the shoulder of giants. Or, if you can't find a giant, find something that is mature and proven to work well over time. We found that in Raven.

Raven is the Python library, or "agent", used for Sentry, the open source error tracking software. As you can tell by the name, Raven tries to be quite agnostic of Sentry the server component. Inside it, it has a couple of good libraries for making threaded jobs whose task is to make web requests. In particuarly, the awesome ThreadedRequestsHTTPTransport. Using it basically looks like this:

import urlparse
from raven.transport.threaded_requests import ThreadedRequestsHTTPTransport

transporter = ThreadedRequestsHTTPTransport(

params = {
    ...more about this later...

def success_cb():
    print "Yay!"

def failure_cb(exception):
    print "Boo :("


The call isn't very different from regular plain old

About the parameters

This is probably the most exciting part and the place where you need some thought. It's non-trivial because you might need to put some careful thought into what you want to track.

Your friends is: This documentation page

There's also the Hit Builder tool where you can check that the values you are going to send make sense.

Some of the basic ones are easy:

"Protocol Version"

Just set to v=1

"Tracking ID"

That code thing you see in the regular chunk of JavaScript you put in the head, e.g tid=UA-1234-Z

"Data Source"

Optional word you call this type of traffic. We went with ds=api because we use it to measure the web API.

The user ones are a bit more tricky. Basically because you don't want to accidentally leak potentially sensitive information. We decided to keep this highly anonymized.

"Client ID"

A random UUID (version 4) number that identifies the user or the app. Not to be confused with "User ID" which is basically a string that identifies the user's session storage ID or something. Since in our case we don't have a user (unless they use an API token) we leave this to a new random UUID each time. E.g. cid=uuid.uuid4().hex This field is not optional.

"User ID"

Some string that identifies the user but doesn't reveal anything about the user. For example, we use the PostgreSQL primary key ID of the user as a string. It just means we can know if the same user make several API requests but we can never know who that user is. Google Analytics uses it to "lump" requests together. This field is optional.

Next we need to pass information about the hit and the "content". This is important. Especially the "Hit type" because this is where you make your manually server-side tracking act as if the user had clicked around on the website with a browser.

"Hit type"

Set this to t=pageview and it'll show up Google Analytics as if the user had just navigated to the URL in her browser. It's kinda weird to do this because clearly the user hasn't. Most likely she's used curl or something from the command line. So it's not really a pageview but, on our end, we have "views" in the webserver that produce information to the user. Some of it is HTML and some of it is JSON, in terms of output format, but either way they're sending us a URL and we respond with data.

"Document location URL"

The full absolute URL of that was used. E.g. So in our Django app we set this to dl=request.build_absolute_uri(). If you have a site where you might have multiple domains in use but want to collect them all under just 1 specific domain you need to set

"Document Host Name" and "Document Path"

I actually don't know what the point of this is if you've already set the "Document location URL".

"Document Title"

In Google Analytics you can view your Content Drilldown by title instead of by URL path. In our case we set this to a string we know from the internal Python class that is used to make the API endpoint. dt='API (%s)'%api_model.__class__.__name__.

There are many more things you can set, such as the clients IP, the user agent, timings, exceptions. We chose to NOT include the user's IP. If people using the JavaScript version of Google Analytics can set their browser to NOT include the IP, we should respect that. Also, it's rarely interesting to see where the requests for a web API because it's often servers' curl or requests that makes the query, not the human.

Sample implementation

Going back to the code example mentioned above, let's demonstrate a fuller example:

import urlparse
from raven.transport.threaded_requests import ThreadedRequestsHTTPTransport

transporter = ThreadedRequestsHTTPTransport(

# Remember, this is a Django, but you get the idea

if not domain or domain == 'auto':
    domain = RequestSite(request).domain

params = {
    'v': 1,
    'tid': settings.GOOGLE_ANALYTICS_ID,
    'dh': domain,
    't': 'pageview,
    'ds': 'api',
    'cid': uuid.uuid4().hext,
    'dp': request.path,
    'dl': request.build_request_uri(),
    'dt': 'API ({})'.format(model_class.__class__.__name__),
    'ua': request.META.get('HTTP_USER_AGENT'),

def success_cb():'Successfully informed Google Analytics (%s)', params)

def failure_cb(exception):


How to unit test this

The class we're using, ThreadedRequestsHTTPTransport has, as you might have seen, a method called async_send. There's also one, with the exact same signature, called sync_send which does the same thing but in a blocking fashion. So you could make your code look someting silly like this:

def send_tracking(page_title, request, async=True):
    # ...same as example above but wrapped in a function...
    function = async and transporter.async_send or transporter.sync_send

And then in your tests you pass in async=False instead.
But don't do that. The code shouldn't be sub-serviant to the tests (unless it's for the sake of splitting up monster-long functions).
Instead, I recommend you mock the inner workings of that ThreadedRequestsHTTPTransport class so you can make the whole operation synchronous. For example...

import mock
from django.test import TestCase
from django.test.client import RequestFactory

from import pageview_tracking

class TestTracking(TestCase):

    def test_pageview_tracking(self, rpost, aw):

        def mocked_queue(function, data, headers, success_cb, failure_cb):
            function(data, headers, success_cb, failure_cb)

        aw().queue.side_effect = mocked_queue

        request = RequestFactory().get('/some/page')
        with self.settings(GOOGLE_ANALYTICS_ID='XYZ-123'):
            pageview_tracking('Test page', request)

            # Now we can assert that '' was called.
            # Left as an exercise to the reader :)
            print rpost.mock_calls       

This is synchronous now and works great. It's not finished. You might want to write a side effect for the so you can have better control of that post. That'll also give you a chance to potentially NOT return a 200 OK and make sure that your failure_cb callback function gets called.

How to manually test this

One thing I was very curious about when I started was to see how it worked if you really ran this for reals but without polluting your real Google Analytics account. For that I built a second little web server on the side, whose address I used instead of So, change your code so that is not hardcoded but a variable you can change locally. Change it to http://localhost:5000/ and start this little Flask server:

import time
import random
from flask import Flask, abort, request

app = Flask(__name__)
app.debug = True

@app.route("/", methods=['GET', 'POST'])
def hello():
    print "- " * 40
    print request.method, request.path
    print "ARGS:", request.args
    print "FORM:", request.form
    print "DATA:", repr(
    if request.args.get('sleep'):
        sec = int(request.args['sleep'])
        print "** Sleeping for", sec, "seconds"
        print "** Done sleeping."
    if random.randint(1, 5) == 1:
    elif random.randint(1, 5) == 1:
        # really get it stuck now
    return "OK"

if __name__ == "__main__":

Now you get an insight into what gets posted and you can pretend that it's slow to respond. Also, you can get an insight into how your app behaves when this collection destination throws a 5xx error.

How to really test it

Google Analytics is tricky to test in that they collect all the stuff they collect then they take their time to process it and it then shows up the next day as stats. But, there's a hack! You can go into your Google Analytics account and click "Real-Time" -> "Overview" and you should see hits coming in as you're testing this. Obviously you don't want to do this on your real production account, but perhaps you have a stage/dev instance you can use. Or, just be patient :)

Web performance optimization's dark side

16 March 2016 0 comments   Web development

See this comment on Yoav Weiss's article on Preload: What Is It Good For?.

Xe (He or she) is being a bit of a jack ass and not respecting the fact that latency is still a big problem and the simple fact that a LOT of people still have slow Internet speeds. Even in the USA (which for the record generally sucks at broadband compared to many other western countries).

But the point being made here is that obsessing over saving milliseconds here and there drains the fun in web development.

I remember back in the days I used to love the web. Development was fun, entertaining, and provided many levels of enjoyment. To some extent it still is today. But it’s getting so obsessive that maybe it’s not me the one who needs counseling.


That’s how it feels with all these obsessive new techniques and tricks. Just to get that page loaded by an extra 6 milliseconds faster.

Can't think of a good defense to these comments. It's not going to stop me from trying to shave milliseconds here and there. Having a fast web app doesn't just make it faster. It makes it "better". When something feels fast, it feels like higher quality. And I think users of fast web apps have a more positive attitude towards features/bugs.

But xe is right. It does zap some of the fun of web development. It used to be about adding content and features. Now it's about this constant "dieting". We wouldn't have this problem if the sites didn't weight 2-3Mb of PNGs, 100Kb CSS and massive font files.

I too feel the blues sometimes especially since a lot of performance improvements are so hard to notice with a human pair of eyes. The point is. Uh. The point is.... Hmm.

The point is; the more advanced we make the web the harder it's going to be to keep up and every time a speed-freak blogs about some millisecond shavings, the beginner developers are going to think "Oh shit! I have to learn that too?!" But then again, pushing the envelope is just so much fun!

Ctags in Atom on OSX

26 February 2016 0 comments   MacOSX, Web development

Symbols View setting page
In Atom, by default there's a package called symbols-view. It basically allows you to search for particular functions, classes, variables etc. Most often not by typing but by search based on whatever word the cursor is currently on.

With this all installed and set up I can now press Cmd-alt-Down and it automatically jumps to the definition of that thing. If the result is ambiguous (e.g. two functions called get_user_profile) it'll throw up the usual search dialog at the top.

To have this set up you need to use something called ctags. It's a command line tool.

This Stack Overflow post helped tremendously. The ctags I had installed was something else (presumably put there by installing emacs). So I did:

$ brew install ctags

And then added
alias ctags="`brew --prefix`/bin/ctags" my ~/.bash_profile

Now I can run ctags -R . and it generates a binary'ish file called tags in the project root.

However, the index of symbols in a project greatly varies with different branches. So I need a different tags file for each branch. How to do that? By hihjacking the .git/hooks/post-checkout hook.

Now, this is where things get interesting. Every project has "junk". Stuff you have in your project that isn't files you're likely to edit. So you'll need to list those one by one. Anyway, here's what my post-checkout looks like:


set -x

ctags -R \
  --exclude=build \
  --exclude=.git \
  --exclude=webapp-django/static \
  --exclude=webapp-django/node_modules \

This'll be run every time I check out a branch, e.g. git checkout master.

Whatsdeployed on only one site

26 February 2016 0 comments   Python, Web development, Mozilla

Last year I developed a web app called "Whatsdeployed". It's one of those rare one-afternoon-hacks that turns out to be really really useful. I use it every [work]day. And I've heard many people say they use it too.

At the time I built it, it only supported comparing multiple instance. E.g. a production and a dev site. Or a test, stage and production. But oftentimes, especially for smaller projects, you might only just have your one deployed site.

So I've now made it possible so you can compare just 1 site against your master branch.

For example:


What these do, is simply comparing what git sha revision is deployed on those side-projects, compared to the latest git sha on the master branch on

How to no-mincss links with django-pipeline

03 February 2016 2 comments   Python, Web development, Django

This might be the kind of problem only I have, but I thought I'd share in case others are in a similar pickle.

Warming Up

First of all, the way my personal site works is that every rendered page gets cached as rendered HTML. Midway, storing the rendered page in the cache, an optimization transformation happens. It basically takes HTML like this:

<link rel="stylesheet" href="vendor.css">
<link rel="stylesheet" href="stuff.css">

into this:

/* optimized contents of vendor.css and stuff.css minified */

Just right-click and "View Page Source" and you'll see.

When it does this it also filters out CSS selectors in those .css files that aren't actually used in the rendered HTML. This makes the inlined CSS much smaller. Especially since so much of the CSS comes from a CSS framework.

However, there are certain .css files that have references to selectors that aren't in the generated HTML but are needed later when some JavaScript changes the DOM based on AJAX or user actions. For example, the CSS used by the Autocompeter widget. The program that does this CSS optimization transformation is called mincss and it has a feature where you can tell it to NOT bother with certain CSS selectors (using a CSS comment) or certain <link> tags entirely. It looks like this:

<link rel="stylesheet" href="ajaxstuff.css" data-mincss="no">

Where Does django-pipeline Come In?

So, setting that data-mincss="no" isn't easy when you use django-pipeline because you don't write <link ... in your Django templates, you write {% stylesheet 'name-of-bundle %}. So, how do you get it in?

Well, first let's define the bundle. In my case it looks like this:

  # Bundle of CSS that strictly isn't needed at pure HTML render-time
  'base_dynamic': {
        'source_filenames': (
        'extra_context': {
            'no_mincss': True,
        'output_filename': 'css/base-dynamic.min.css',

But that isn't enough. Next, I need to override how django-pipeline turn that block into a <link ...> tag. To do that, you need to create a directory and file called pipeline/css.html (or pipeline/css.jinja if you use Jinja rendering by default).

So take the default one from inside the pipeline package and copy it into your project into one of your apps's templates directory. For example, in my case, peterbecom/apps/base/templates/pipeline/css.jinja. Then, in that template add at the very end somehting like this:

{% if no_mincss %} data-mincss="no"{% endif %} />

The Point?

The point is that if you're in a similar situation where you want django-pipeline to output the <link> or <script> tag differently than it's capable of, by default, then this is a good example of that.