Peterbe.com

A blog and website by Peter Bengtsson

After a day of pushing 9 commits to a PR to finally get Travis to build a simple python package on python 2.6, 2.7, 3.3 and 3.4 I finally gave up and ripped out all of httpretty and replaced it with good old mock.patch()

I was getting all sorts of strange warnings in py3.3 and 3.4 got stuck all the time.
This is not the first time httpretty has been causing confusion so from now on I'm giving up on httpretty. Ithink it was too good to be true to work reliably. Honestly, it might be python's fault for not being better made available to cool libs like httpretty.

By the way, here's one of those errors where Python 3.4 just hangs which stopped being the case once I took out httpretty. And here you can see the clear failure to deactivate the monkeypatch even after the test is complete in Python 3.3.

Strangely this was hard. The solution looks easy but it took me a lot of time to get this right.

What I want to do is split a string up into a slice of parts. For example:

word := "peter"
for i := 1; i <= len(word); i++ {
    fmt.Println(word[0:i])
}

The output is:

p
pe
pet
pete
peter

Play here.

Now, what if the word you want to split contains non-ascii characters? As a string "é" is two characters internally. So you can't split them up. See this play and press "Run".

So, the solution is to iterate over the string and Go will give you an index of each unicode character. But you'll want to skip the first one.

word := "péter"
for i, _ := range word {
    if i > 0 {
        fmt.Println(word[0:i])
    }
}
fmt.Println(word)

You can play with it here in this play.

Now the output is the same as what we got in the first example but with unicode characters in it.

p
pé
pét
péte
péter

I bet this is obvious to the gurus who've properly read the documentation but it certainly took me some time to figure it out and so I thought I'd share it.

We're proud to announce that we've now published our first Roku channel; Air Mozilla

Browsing for Air Mozilla
We actually started this work in the third quarter of 2014 but the review process for adding a channel is really slow. The people we've talked to have been super friendly and provide really helpful feedback as to changes that need to be made. After the first submission, it took about a month for them to get back to us and after some procrastination we submitted it a second time about a month ago and yesterday we found out it's been fully published. I.e. gone live.

Obviously it would be nice if they could get back to us quicker but another thing they could improve is to appreciate that we're a team. All communication with Roku has been to just me and I always have to forward emails or add my teammates as CC when I communicate with them.

Anyway, now we can start on a version 2. We deliberately kept this first version ultra-simple just to prove that it's possible and not being held back due to feature creep.

What we're looking to add in version 2 are, in no particular order:

  1. Ability to navigate by search
  2. Ability to sign in and see restricted content
  3. Adding Trending events
  4. Ability to see what the upcoming events are

It's going to be much easier to find the energy to work on those features now that we know it's live.

Also, we currently have a problem watching live and archived streams on HTTPS. It's not a huge problem right now because we're not making any restricted content available and we're lucky in that the CDNs we use allow for HTTP traffic equally.

Remember, Air Mozilla is Open Source and we encourage people to jump in and contribute if you want to share your Python, design, Javascript or BrightScript skills.

By the way, the Air Mozilla Roku code is here and there's a README that'll get your started if you want to help out.

I'm a web engineer (aka. web developer) and I try my best to keep up with the latest and greatest in my field without minimal effort. I need good resources that pack a big bunch but doesn't take all day to digest.

Here are the newsletters I enjoy and almost depend on for my work (in no particular order):

  1. ng-newsletter
    "The free, weekly newsletter of the best AngularJS content on the web." Comes once a week with about 5 or so links to articles about AngularJS.

  2. Pycoder's Weekly
    "Your weekly dose of all things Python!" A mix of recent new projects that have become available on PyPI or GitHub. Also always a nice list of 3-6 articles revolving around Python.

  3. Web Development Reading List
    "A handcrafted, carefully selected list of web development related resources." Topics are usually split up by News, Generic/Tools, Web Performance, HTML/SVG, Javascript and CSS/Sass. Very high quality despite there being lots of links.

  4. The Changelog Weekly
    "Open Source moves fast. Keep up." They have a podcast too which comes out, I think, once a week too. The topics here are highly technical but often more broader and rather "about the tech" as opposed to "how the tech". They have a crips separation of curated content and sponsored content.

  5. Go Newsletter
    "A weekly newsletter about the Go programming language" This one I'm quite new to. It's jampacked with links to projects and articles.

  6. Explore GitHub
    It's configurable. I have it sent to me on a weekly basis. I get two sections "Trending repositories this week" and "Starred by people you follow this week". It's basically just links to GitHub projects. No articles or curated content. I guess it's maybe not a newsletter but it's so useful that I just have to include it.

Which ones do you depend on/love that I should consider?

If you haven't heard of jsDelivr then you've missed out on a great project! It's basically a free CDN to use for Open Source projects. I added my own yesterday and it was as easy as making a pull request with the initial file, some metadata and a file that tells them where to pick up new versions from (e.g. GitHub).

Anyway, they now host A LOT of files. 8,941 to be exact (1,927 unique file names), at the time of writing. So I thought I'd check out what the median size is.

The median size is: 7.4Kb

The average is 28.6Kb and the standard deviation is 73Kb! so we can basically ignore that and just focus on the median.

Check out the histogram

That's pretty big! If you exclude those bigger than 100Kb the median shrinks to 6.5Kb. Still pretty big.

I'm proud to say that my own is only 50% the size of the median size.

First of all; hashing is hard. But fortunately it gets a little bit easier if it doesn't have to cryptographic. A non-cryptographic hashing function is basically something that takes a string and converts it to another string in a predictable fashion and it tries to do it with as few clashes as possible and as fast as possible.

MD5 is a non-cryptographic hashing function. Unlike things like sha256 or sha512 the MD5 one is a lot more predictable.

Now, how do you make a hashing function that yields a string that is as short as possible? The simple answer is to make the output use as many different characters as possible. If a hashing function only returns integers you only have 10 permutations per character. If you instead use a-z and A-Z and 0-9 you now have 26 + 26 + 10 permutations per character.

A hex on the other hand only uses 0-9 and a-f which is only 10 + 6 permutations. So you need a longer string to be sure it's unique and can't clash with another hash output. Git for example uses a 40 character log hex string to prepresent a git commit. GitHub is using an appreviated version of that in some of the web UI of only 7 characters which they get away with because things are often in a context of a repo name or something like that. For example github.com/peterbe/django-peterbecom/commit/462ae0c

So, what other choices do you have when it comes to returning a hash output that is sufficiently long that it's "almost guaranteed" to be unique but sufficiently short that it becomes practical in terms of storage space? I have an app for example that turns URLs into unique IDs because they're shorter that way and more space efficient to store as values in a big database. One such solution is to use a base64 encoding.

Base64 uses a-zA-Z0-9 but you'll notice it doesn't have the "hashing" nature in that it's just a direct translation character by character. E.g.

>>> base64.encodestring('peterbengtsson')
'cGV0ZXJiZW5ndHNzb24=\n'
>>> base64.encodestring('peterbengtsson2')
'cGV0ZXJiZW5ndHNzb24y\n'

I.e. these two strings are different but suppose you were to take only the first 10 characters these would be the same. Basically, here's a terrible hashing function:

def hasher(s):  # this is not a good hashing function
    return base64.encodestring(s)[:10]

So, what we want is a hashing function that returns output that is short and very rarely clashing and does this as fast as possible.

To test this I wrote a script that tried a bunch of different ad-hoc hashing functions. I generate a list of 130,000+ different words with an average length of 15 characters. Then I loop over these words until a hashed output is repeated for a second time. And for each, I take the time it takes to generate the 130,000+ hashes and I multiply that with the total number of bytes. For example, if the hash output is 9 characters each in length that's (130000 * 9) / 1024 ~= 1142Kb. And if it took 0.25 seconds to generate all of those the combined score is 1142 * 0.24 ~= 286 bytes second.

Anyway, here are the results:

h11 100.00  0.217s  1184.4 Kb   257.52 kbs
h6  100.00  1.015s  789.6 Kb    801.52 kbs
h10 100.00  1.096s  789.6 Kb    865.75 kbs
h1  100.00  0.215s  4211.2 Kb   903.46 kbs
h4  100.00  1.017s  921.2 Kb    936.59 kbs

(kbs means "kilobytes seconds")

These are the functions that returned 0 clashes amongst 134,758 unique words. There were others too that I'm not bothering to include because they had clashes. So let's look at these functions:

def h11(w):
    return hashlib.md5(w).hexdigest()[:9]

def h6(w):
    h = hashlib.md5(w)
    return h.digest().encode('base64')[:6]

def h10(w):
    h = hashlib.sha256(w)
    return h.digest().encode('base64')[:6]

def h1(w):
    return hashlib.md5(w).hexdigest()

def h4(w):
    h = hashlib.md5(w)
    return h.digest().encode('base64')[:7]    

It's kinda arbitrary to say the "best" one is the one that takes the shortest time multipled by size. Perhaps the size matters more to you in that case the h6() function is better because it returns 6 character strings instead of 9 character strings in h11.

I'm apprehensive about publishing this blog post because I bet I'm doing this entirely wrong. Perhaps there are better ways to digest a hashing function that returns strings that don't need to be base64 encoded. I just haven't found any in the standard library yet.

When you write a Go webapp, you need to pick web "middleware" and a multiplexer. If you're used to things like Ruby on Rails or Django, they come with their own multiplexer built in. Also known as a router or URL router.
It's basically where you say...

If a request comes in with PATH that matches /some/page/:id then execute the SomePageHandler. And in SomePageHandler you want to get the actual value for that :id part of the PATH.

Anyway, in Go you're spoiled for choices and it took me some time to understand why there are so many and not just one blessed one. The reason for that is that they evolve and often in favor of getting faster and faster.
90% of the time, the I/O is the limiting factor in terms of performance, not the multiplexer, but if you have an endpoint that doesn't depend on much I/O and you're expecting hefty amounts of traffic then why chose a second best?

I first used gorilla/mux because that's what I saw being used in several places. It's very capable and makes it easy to specify which methods should be allowed for specific routes. A good thing about gorilla/mux is that it's compatible with the built-in http.Handler API. That means that can write a handler function that whose signature is w http.ResponseWriter, r *http.Request and if you want to get something out of the path you use vars := mux.Vars(request). For example:

package main

import (
    "fmt"
    "github.com/codegangsta/negroni"
    "github.com/gorilla/mux"
    "net/http"
)

func MyHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    fmt.Fprintf(w, "Hello %v\n", vars["id"])
}

func main() {
    mux := mux.NewRouter()
    mux.HandleFunc("/some/page/{id}", MyHandler).Methods("GET")
    n := negroni.Classic()
    n.UseHandler(mux)
    n.Run(":3000")
}

So apparently there's a much faster multiplexer that seems to be popular. It's called httprouter and it's quite pleasant to work with too except that you now have to change your handler to accept a third parameter so it now needs to accept a httprouter.Params parameter too. So, instead the handler(s) need to change. It looks like this:

package main

import (
    "fmt"
    "github.com/codegangsta/negroni"
    "github.com/julienschmidt/httprouter"
    "net/http"
)

func MyHandler(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
    fmt.Fprintf(w, "Hello %v\n", ps.ByName("id"))
}

func main() {
    mux := httprouter.New()
    mux.GET("/some/page/:id", MyHandler)
    n := negroni.Classic()
    n.UseHandler(mux)
    n.Run(":3000")
}

Not too shabby but I can't find out how to do more advanced patterns such as only matching digits e.g. you can do /some/page/{id:[0-9]+} with gorilla/mux.

There's even a benchmark by the author of httprouter that if it doesn't convince you that httprouter is fast, it'll convince you that there are a lot of multiplexers out there to chose from.

And then there's the new kid on the block. My new favorite: bone

It claims to be very fast too. Its README has a comparison between it and gorilla/mux and httprouter.

What attracted me to it is that it's fast and doesn't require the extract ps httprouter.Params on all my handlers. Instead you use val := bone.GetValue(req, "id") within the handler. Here's an example:

package main

import (
    "fmt"
    "github.com/codegangsta/negroni"
    "github.com/go-zoo/bone"
    "net/http"
)

func MyHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello %v\n", bone.GetValue(r, "id"))
}

func main() {
    mux := bone.New()
    mux.Get("/some/page/:id", http.HandlerFunc(MyHandler))
    n := negroni.Classic()
    n.UseHandler(mux)
    n.Run(":3000")
}

Yes, you have to manually wrap every handler function yourself but I don't mind that as much.

Now, for the moment you've all been waiting for, the benchmark.

I take the three sample apps here above and run them with wrk -c 100 -d10 -t10 "http://localhost:3000/some/page/123" at least three times each. The numbers I get are:

# averages across 3 runs each

gorilla/mux   17310.98 requests/sec  +0.7%
httprouter    19216.08 requests/sec  +11.8%
bone          17191.12 requests/sec  0%

I also did a similar benchmark but on a piece of code that doesn't use any parameters from the URL and that runs the server across 8 CPUs instead. The numbers I got from that was:

# averages across 3 runs each

gorilla/mux   43669.26 requests/sec  0%
httprouter    45087.31 requests/sec  +3.2%
bone          47929.04 requests/sec  +9.8%

I think that pretty much concludes that they're all plenty fast. Pick one that you like. I kind like bone myself because I don't need a third parameter on my handlers and the bone.GetValue(r, "id") API feels good.

However, httprouter has much better documentation and feels more mature and newer and fresher than gorilla/mux.

In airmozilla the tests almost all derive from one base class whose tearDown deletes the automatically generated settings.MEDIA_ROOT directory and everything in it.

Then there's some code that makes sure a certain thing from the fixtures has a picture uploaded to it.

That means it has do that shutil.rmtree(directory) and that shutil.copy(src, dst) on almost every single test. Some might also not need or depend on it but it's conveninent to put it here.

Anyway, I thought this is all a bit excessive and I could probably optimize that by defining a custom test runner that is first responsible for creating a clean settings.MEDIA_ROOT with the necessary file in it and secondly, when the test suite ends, it deletes the directory.

But before I write that, let's measure how many gazillion milliseconds this is chewing up.

Basically, the tearDown was called 361 times and the _upload_media 281 times. In total, this adds to a whopping total of 0.21 seconds! (of the total of 69.133 seconds it takes to run the whole thing).

I think I'll cancel that optimization idea. Doing some light shutil operations are dirt cheap.

From the It-Depends-on-What-You're-Building department.

As a web developer you have a job:

  1. Display a certain amount of database data on the screen
  2. Do it as fast as possible

The first point is these days easily taken care of with the likes of Django or Rails which makes it über easy to write queries that you then use in templates to generate the HTML and voila you have a web page.

The second point is taken care of with a myriad of techniques. It's almost a paradox. The fastest way to render something on the screen is to generate everything on the server and send it wholesome. It means the browser can very quickly (and boosted by GPU) render something on the screen. But if you have a lot of data that needs to be displayed it's often better to send just a little bit of HTML and then let some Javascript kick in and take care of extracting the rest of the information using AJAX.

Here I have prepared three different versions of ways to display a bunch of information on the screen:

http://www.peterbe.com/ajaxornot/

Visual comparison on WebPagetest
What you should note and take away from this little experimental playground:

  1. All server-side work is done in Django but it's served straight out of memcache so it should be fast server-side.

  2. The content is NOT important. It's just a list of blog posts and their categories and keywords.

  3. To make it somewhat realistic, each version needs to 1) display a JPG and 2) have a Javascript onclick event that throws a confirm() dialog box.

  4. The AngularJS version loads significantly slower but it's not because AngularJS is slow, but because it's able to do so much more later. Loading a Javascript framework is like an investment. Big cost upfront and small cost later when you need more magic to happen without having a complete server refresh.

  5. View 1, 2 and 3 are all three imperfect versions but they illustrate the three major groups of solving the problem stated at the top of this blog post. The other views are attempts of optimizations.

  6. Clearly the "visually fastest" version is the optimization version 5 which is a fork of version 2 which loads, on the server-side, everything that is above the fold and then take care of the content below the fold with AJAX.
    See this visual comparison

  7. Optimization version 4 was a silly optimization. It depends on the fact that JSON is more "compact" than HTML. When you Gzip the content, the difference in size doesn't matter anymore. However, it's an interesting technique because it means you can do all business logic rendering stuff in one language without having to depend on AJAX.

  8. Open the various versions in your browser and try to "feel" how pages the load. Ask your inner gutteral heart which version you prefer; do you prefer a completely blank screen and a browser loading spinner or do you prefer to see some skeleton structure first whilst waiting for the bulk content comes in?

  9. See this as a basis of thoughts and demonstration. Remember the very first sentence in this blog post.

tl;dr Don't run ffmpeg over HTTP(S) and use ffmpegthumbnailer

UPDATE tl;dr Download the file then run ffmpeg with -ss HH:MM:SS first. Don't bother with ffmpegthumbnailer

At work I work on something called Air Mozilla. It's a site for hosting live video broadcasts and then archiving those so they can be retrieved later.

Unlike sites like YouTube we can't take a screencap from the video because many videos are future (aka. "upcoming") videos so instead we use a little placeholder thumbnail (for example, the Rust logo).

However, once it has been recorded we want to switch from the logo to an actual screen capture from the video itself. We set up a cronjob that uses ffmpeg to extract these as JPGs and then the users can go in and select whichever picture they like the best.

This is all work in progress by the way (as of December 2014).

One problem is that we have is that the command for extracting JPGs is really slow. So slow that we can't wrap the subprocess in a Django database connection because it's so slow that the database connection is often killed.

The command to extract them looks something like this:

ffmpeg -i https://cdnexample.com/url/to/file.mp4 -r 0.0143 /tmp/screencaps-%02d.jpg

Where the number r is based on the duration and how many pictures we want out. E.g. 0.0143 = 15 * 1049 where 15 is how many JPGs we want and 1049 is a duration of 17 minutes and 29 seconds.

The script I used first was: ffmpeg1.sh

My first experiment was to try to extract one picture at a time, hoping that way, internally, ffmpeg might be able to optimize something.

The second script I used was: ffmpeg2.sh

The third alternative was to try ffmpegthumbnailer which is an intricate wrapper on ffmpeg and it has the benefit that you can produce slightly higher picture quality too.

The third script I used was: ffmpeg3.sh

Bar chart comparing the 3 different scripts
And running these three depend very much on the state of my DSL at the time.

For a video clip that is 17 minutes long and a 138Mb mp4 file.

ffmpeg1.sh   2m0.847s
ffmpeg2.sh   11m46.734s
ffmpeg3.sh   0m29.780s

Clearly it's not efficient to do one screenshot at a time.
Because with ffmpegthumbnailer you can tell it not to reduce the picture quality the total weight of the produced JPGs from ffmpeg1.sh was 784Kb and the total weight from ffmpeg3.sh was 1.5Mb.

Just to try again, I ran a similar experiment with a 35 minutes long and 890Mb mp4 file. And this time I didn't bother with ffmpeg2.sh. The results were:

ffmpeg1.sh   18m21.330s
ffmpeg3.sh   2m48.656s

So that means that using ffmpegthumbnailer is about 5 times faster than ffmpeg. Huge difference!

And now, a curveball!

The reason for doing ffmpeg -i https://... was so that we don't have to first download the whole beast and run the command on a local file. However, in light of how so much longer this takes and my disdain to have to install and depend on a new tool (ffmpegthumbnailer) across all servers. Why not download the whole file and run the ffmpeg command locally.

So I download the file and it's slow because of my, currently, terrible home DSL. Then I run and time them again but just a local file instead:

ffmpeg1.sh   0m20.426s
ffmpeg3.sh   0m0.635s

Did you see that!? That's an insane difference. Clearly doing this command over HTTP(S) is a bad idea. It'll be worth downloading it first.

UPDATE

On Stackoverflow, LordNeckBeard gave a great tip of using the -ss option before in the input file and now it's much faster. At this point. I'm no longer interested in having to bother with ffmpegthumbnailer.

Let's fork ffmpeg2.sh into two versions.

ffmpeg2.1.sh same as ffmpeg2.sh but a downloaded file instead of a remote HTTPS URL.

ffmpeg2.2.sh as ffmpeg2.1.sh except we put the -ss HH:MM:SS before the input file.

Now, let's run them again on the 138Mb file:

# the 138Mb mp4.mp4 file
ffmpeg2.1.sh   2m10.898s
ffmpeg2.2.sh   0m0.672s

187 times faster

And again, I re-ran this again against a bigger file that is 1.4Gb:

# the 1.4Gb mp4-1.44Gb.mp4 file
ffmpeg2.1.sh   10m1.143s
ffmpeg2.2.sh   0m1.428s

420 times faster