Peterbe.com

A blog and website by Peter Bengtsson

Now that Autocompeter.com is launched I can publish some preliminary benchmarks of "real" usage. It's all on my MacBook Pro on a local network with a local Redis but it's quite telling that it's pretty fast.

What I did was I started with a completely empty Redis database then I did the following things:

First of all, I bulk load in 1035 "documents" (110Kb of data). This takes about 0.44 seconds consistently!

  1. GET on the home page (not part of the API and thus quite unimportant in terms of performance)
  2. GET on a search with a single character ("p") expecting 10 results (e.g. /v1?d=mydomain&q=p)
  3. GET on a search with a full word ("python") expecting 10 results
  4. GET on a search with a full word that isn't in the index ("xxxxxxxx") expecting 0 results
  5. GET on a search with two words ("python", "te") expecting 4 results
  6. GET on a search with two words that aren't in the index ("xxxxxxx", "yyyyyy") expecting 0 results

In each benchmark I use wrk with 10 connections, lasting 5 seconds, using 5 threads.

And for each round I try with 1 processor, 2 processors and 8 processors (my laptop's max according to runtime.NumCPU()).

I ran it a bunch of times and recorded the last results for each number of processors.
The results are as follows:

Autocompeter Benchmark v1

Notes

  • Every search incurs a write in the form of incrementing a counter.
  • Searching on more than one word causes an ZINTERSTORE.
  • The home page does a bit more now since this benchmark was made. In particular looking for a secure cookie.
  • Perhaps interally Redis could get faster if you run the benchmarks repeatedly after the bulk load so it's internals could "warm up".
  • I used a pool of 100 Redis connections.
  • I honestly don't know if 10 connections, 5 seconds, 5 threads is an ideal test :)

Basically, this is a benchmark of Redis more so than Go but it's quite telling that running it in multiple processors does help.

If you're curious, the benchmark code is here and I'm sure there's things one can do to speed it up even more. It's just interesting that it's so freakin' fast out of the box!

In particular I'm very pleased with the fact that it takes less than half a second to bulk load in over 1,000 documents.

(For context, I released Autocompeter.com last week and now I'm thinking about improvements)

I posted a question on Twitter about which highlighting formatting people prefer and got some interesting feedback. More about that later.

The piece of feedback that really got my attention came from my friend Honza Král.
He wondered if not the whole word should be highlighted instead of just the beginning of the word.

I've actually been thinking about that too but never got around to trying it out. Until now.

Before

Before

After

After

What do you think?

I have the code in a branch and I'm still mulling it over. There's sort of a convention to just highlight based on what you've typed so far. I don't want to be too weird because when people don't feel familiar they don't like what they see even if the new actually is better.

For Autocompeter I develop with gulp. It's like Grunt but better.

One thing I wanted was that when it makes the src/autocompeter.js --> minify() --> dist/autocompeter.min.js step I also wanted to put in a little preample header into the minified file.

First I thought, since UglifyJS supports a --preamble option that that'd be the route to go. I didn't get very far.

Then I thought I had to write my own plugin. So I started reading the documentation about how to write a plugin and partially thinking "Oh I don't have time to do this" and also "Oh finally a chance to sit down and really understand how gulp plugins work". I was wrong. The documentation for writing plugins say:

"Your plugin shouldn't do things that other plugins are responsible for... ...It should not add headers, gulp-header does that"

Oh! So there is already a great plugin for this! Long story short; here's how I used it. The output is that the version number is now on the first line of autocompeter.min.js.

I'm starting to like gulp more and more. There's even a dedicate nice index of all available plugins.

One of the most constructive pieces of feedback I got when Autocompeter was on Hacker News was that when you type something with lots of predictable results the results overlay would "flicker".

E.g. you type "javascript" in a nice and steady pace and the overlay would shrink and grow and shrink and grow very rapidly.

The reason it happened was due to a bug in the javascript code that filtered results whilst waiting for the next AJAX request from the latest typed character. E.g. you type "ash" and the results comes back with "ashley", "ashes", "Ashford". Then you add a "l" so now we start a new AJAX query for "ashl" and whilst waiting for that output from the server we can start filtering out "ashes" and "Ashford" because we can pre-emptively know that that won't be in the new result set.

The bug was a bad function that filtered the existing results on a second rendering whilst waiting for the next AJAX. It was easy to fix and this is included in version 1.1.8.

The reason I failed to notice this was because I had inserted some necessary optimizations when the network latency was very very slow but hadn't tested it in a realistic network latency environment. E.g. a decent DSL connection but nevertheless something more advanced that just connecting to localhost.

About a year ago I found a great article about how to use Redis to index every prefix of every word as an index and thus a super fast way of building an autocomplete service. The idea is that you take all your titles and index them like this; if the title is "My Title" you store a key for m, my, t, ti, tit, titl and title. That means you can do very fast lookups as someone is typing unfinished words.

Anyway. I was running this merrily here on my personal blog but I liked it so much and I wanted to use it on aother site for work that I thought it'd be time to extract it into its own little microservice. All I needed was a name and my friend and colleague jezdez suggested I call it "autocompeter". So that it became.

The original implementation was written in Python and at the time I was learning Go and was eager to have something to build in Go. So I built this microservice in Go using the Negroni web framework.

The idea is that you own and run a website. You have a search feature on your website but you don't have a nifty autocomplete (aka. live search) thing on it. So, you send me all your titles, URLs and optionally their "popularity ranking" (basically a score). I'll index them on autocompeter.com under your domain. You have to sign in with GitHub to set up an API Auth Key.

Then, you put this into your HTML:

<script src="//cdn.jsdelivr.net/autocompeter/1/autocompeter.min.js"></script>
<script>
Autocompeter(document.querySelector('input[name="q"]');
</script>

Also, you'll need to download the CSS and put into your site. I don't recommend pointing to a CDN for CSS.

And that's all you have to do. The REST API, options for the Javascript integration and CSS integration in the documentation.

The Javascript is framework-free meaning it's just pure DOM manipulation and works in IE and modern browsers. The minified file is only 4.2Kb minified (2Kb gzipped).

All code is Open Source under a BSD license. Everything is free but there's no SLA as of yet.

I'm going to be blogging more and more about feature development, benchmarks and other curious things I learn developing this further.

After a day of pushing 9 commits to a PR to finally get Travis to build a simple python package on python 2.6, 2.7, 3.3 and 3.4 I finally gave up and ripped out all of httpretty and replaced it with good old mock.patch()

I was getting all sorts of strange warnings in py3.3 and 3.4 got stuck all the time.
This is not the first time httpretty has been causing confusion so from now on I'm giving up on httpretty. Ithink it was too good to be true to work reliably. Honestly, it might be python's fault for not being better made available to cool libs like httpretty.

By the way, here's one of those errors where Python 3.4 just hangs which stopped being the case once I took out httpretty. And here you can see the clear failure to deactivate the monkeypatch even after the test is complete in Python 3.3.

Strangely this was hard. The solution looks easy but it took me a lot of time to get this right.

What I want to do is split a string up into a slice of parts. For example:

word := "peter"
for i := 1; i <= len(word); i++ {
    fmt.Println(word[0:i])
}

The output is:

p
pe
pet
pete
peter

Play here.

Now, what if the word you want to split contains non-ascii characters? As a string "é" is two characters internally. So you can't split them up. See this play and press "Run".

So, the solution is to iterate over the string and Go will give you an index of each unicode character. But you'll want to skip the first one.

word := "péter"
for i, _ := range word {
    if i > 0 {
        fmt.Println(word[0:i])
    }
}
fmt.Println(word)

You can play with it here in this play.

Now the output is the same as what we got in the first example but with unicode characters in it.

p
pé
pét
péte
péter

I bet this is obvious to the gurus who've properly read the documentation but it certainly took me some time to figure it out and so I thought I'd share it.

We're proud to announce that we've now published our first Roku channel; Air Mozilla

Browsing for Air Mozilla
We actually started this work in the third quarter of 2014 but the review process for adding a channel is really slow. The people we've talked to have been super friendly and provide really helpful feedback as to changes that need to be made. After the first submission, it took about a month for them to get back to us and after some procrastination we submitted it a second time about a month ago and yesterday we found out it's been fully published. I.e. gone live.

Obviously it would be nice if they could get back to us quicker but another thing they could improve is to appreciate that we're a team. All communication with Roku has been to just me and I always have to forward emails or add my teammates as CC when I communicate with them.

Anyway, now we can start on a version 2. We deliberately kept this first version ultra-simple just to prove that it's possible and not being held back due to feature creep.

What we're looking to add in version 2 are, in no particular order:

  1. Ability to navigate by search
  2. Ability to sign in and see restricted content
  3. Adding Trending events
  4. Ability to see what the upcoming events are

It's going to be much easier to find the energy to work on those features now that we know it's live.

Also, we currently have a problem watching live and archived streams on HTTPS. It's not a huge problem right now because we're not making any restricted content available and we're lucky in that the CDNs we use allow for HTTP traffic equally.

Remember, Air Mozilla is Open Source and we encourage people to jump in and contribute if you want to share your Python, design, Javascript or BrightScript skills.

By the way, the Air Mozilla Roku code is here and there's a README that'll get your started if you want to help out.

I'm a web engineer (aka. web developer) and I try my best to keep up with the latest and greatest in my field without minimal effort. I need good resources that pack a big bunch but doesn't take all day to digest.

Here are the newsletters I enjoy and almost depend on for my work (in no particular order):

  1. ng-newsletter
    "The free, weekly newsletter of the best AngularJS content on the web." Comes once a week with about 5 or so links to articles about AngularJS.

  2. Pycoder's Weekly
    "Your weekly dose of all things Python!" A mix of recent new projects that have become available on PyPI or GitHub. Also always a nice list of 3-6 articles revolving around Python.

  3. Web Development Reading List
    "A handcrafted, carefully selected list of web development related resources." Topics are usually split up by News, Generic/Tools, Web Performance, HTML/SVG, Javascript and CSS/Sass. Very high quality despite there being lots of links.

  4. The Changelog Weekly
    "Open Source moves fast. Keep up." They have a podcast too which comes out, I think, once a week too. The topics here are highly technical but often more broader and rather "about the tech" as opposed to "how the tech". They have a crips separation of curated content and sponsored content.

  5. Go Newsletter
    "A weekly newsletter about the Go programming language" This one I'm quite new to. It's jampacked with links to projects and articles.

  6. Explore GitHub
    It's configurable. I have it sent to me on a weekly basis. I get two sections "Trending repositories this week" and "Starred by people you follow this week". It's basically just links to GitHub projects. No articles or curated content. I guess it's maybe not a newsletter but it's so useful that I just have to include it.

Which ones do you depend on/love that I should consider?

If you haven't heard of jsDelivr then you've missed out on a great project! It's basically a free CDN to use for Open Source projects. I added my own yesterday and it was as easy as making a pull request with the initial file, some metadata and a file that tells them where to pick up new versions from (e.g. GitHub).

Anyway, they now host A LOT of files. 8,941 to be exact (1,927 unique file names), at the time of writing. So I thought I'd check out what the median size is.

The median size is: 7.4Kb

The average is 28.6Kb and the standard deviation is 73Kb! so we can basically ignore that and just focus on the median.

Check out the histogram

That's pretty big! If you exclude those bigger than 100Kb the median shrinks to 6.5Kb. Still pretty big.

I'm proud to say that my own is only 50% the size of the median size.