01 September 2015 5 comments Web development, Mozilla
I'm currently working on a Django library that uses mozjpeg to optimize thumbnails that are generated from stored images. I first wanted to get a feel for how good
~/Downloads directory I have all sorts of "junk" from all sorts of saves and experiments. It'll work as a good testbed of relatively random JPEG images of all sorts of sizes and qualities. Without further ado, here are the results:
| FILENAME | OPTIMIZE | ORIGINAL | SAVING | PERCENT |
|---|---|---|---|---|
| 180697_1836563311933_3364808_n.jpg | 45.2Kb | 50.4Kb | 5.1Kb | 10.2% |
| 2014-03-20 17.35.39.jpg | 2040.1Kb | 2207.8Kb | 167.7Kb | 7.6% |
| 2015-03-04 21.18.16.jpg | 1521.5Kb | 1629.2Kb | 107.7Kb | 6.6% |
| 2015-03-04 21.19.16.jpg | 1602.4Kb | 1720.0Kb | 117.6Kb | 6.8% |
| 2015-03-04 21.23.16.jpg | 1181.7Kb | 1272.1Kb | 90.4Kb | 7.1% |
| 2015-03-05 06.03.00.jpg | 1426.7Kb | 1557.7Kb | 131.0Kb | 8.4% |
| 20150626_200629_001.jpg | 1566.4Kb | 1717.3Kb | 151.0Kb | 8.8% |
| 20150626_200631.jpg | 2157.6Kb | 2319.6Kb | 162.0Kb | 7.0% |
| Boba_Fett_by_RobD4E.jpg | 96.2Kb | 104.3Kb | 8.1Kb | 7.8% |
| Horse_Play.jpg | 170.4Kb | 185.2Kb | 14.9Kb | 8.0% |
| Image (107).jpg | 344.9Kb | 390.6Kb | 45.7Kb | 11.7% |
| Misc Candle Holder NECA FOTR Balrog Dec2002.jpg | 37.1Kb | 37.7Kb | 0.6Kb | 1.5% |
| Mozilla_Lightbeam.jpg | 55.1Kb | 79.7Kb | 24.6Kb | 30.8% |
| Photo on 12-17-14 at 5.55 PM.jpg | 168.5Kb | 187.7Kb | 19.2Kb | 10.2% |
| dev.jpg | 17.5Kb | 30.8Kb | 13.3Kb | 43.2% |
| dev2.jpg | 41.1Kb | 54.3Kb | 13.3Kb | 24.4% |
| dev3.jpg | 35.3Kb | 49.0Kb | 13.7Kb | 28.0% |
| dev4.jpg | 42.0Kb | 56.0Kb | 14.0Kb | 25.0% |
| dev5.jpg | 24.6Kb | 37.9Kb | 13.2Kb | 35.0% |
| dev6.jpg | 28.9Kb | 42.8Kb | 13.9Kb | 32.4% |
| hr_0570_220_135__0570220135006.jpg | 3124.3Kb | 3467.8Kb | 343.5Kb | 9.9% |
| hr_0570_220_158__0570220158006.jpg | 3010.0Kb | 3319.1Kb | 309.1Kb | 9.3% |
| hr_0570_220_175__0570220175006.jpg | 2245.5Kb | 2442.6Kb | 197.0Kb | 8.1% |
| hr_0570_227_599__0570227599006.jpg | 2561.7Kb | 2809.8Kb | 248.1Kb | 8.8% |
| hr_0596_622_701__0596622701006.jpg | 3238.8Kb | 3453.6Kb | 214.7Kb | 6.2% |
| hr_0596_623_849__0596623849006.jpg | 2902.9Kb | 3102.1Kb | 199.3Kb | 6.4% |
| hr_0622_219_873__0622219873006.jpg | 985.3Kb | 1066.9Kb | 81.7Kb | 7.7% |
| logo.jpg | 43.5Kb | 51.2Kb | 7.7Kb | 15.1% |
| mvm-header.jpg | 8.5Kb | 12.4Kb | 3.9Kb | 31.6% |
| mvm-postcard-picture.jpg | 72.2Kb | 73.4Kb | 1.3Kb | 1.7% |
| overhang_pixels.jpg | 3014.3Kb | 3370.8Kb | 356.4Kb | 10.6% |
| peterbe copy.jpg | 4.2Kb | 10.4Kb | 6.2Kb | 59.7% |
| peterbe.jpg | 36.7Kb | 44.3Kb | 7.5Kb | 17.0% |
| pjt-mcguinty-2.jpg | 96.8Kb | 101.6Kb | 4.8Kb | 4.8% |
| sl1.jpg | 28.7Kb | 35.4Kb | 6.7Kb | 18.9% |
That's a median of 9.3% (average of 15.3%) savings.
It's not very fast though. Some of the large files take more than a second. In total it took 23.7 seconds to create all of those optimized files. Do what you want with that fact, but bear in mind that these are hopefully "once in a lifetime" operations (depending on the ephemerality of your thumbnail storage). Mind you, the really large JPEGs skew that, since the median is 72.1 milliseconds and the average is 527.0 milliseconds. Also, when I look through the numbers I find that the large JPGs take the longest but have the least benefit in terms of byte savings.
Chris Adams, in the comment below, inspired me to compare my trials with jpegoptim and jpegrescan. So, I took my script that generated a directory of 45 JPEGs and changed it to use
- mozjpeg: the total size of that output directory is 34.1Mb and it took a total of 23.3 seconds (median 76.4 milliseconds).
- jpegoptim & jpegrescan: the total size of that output directory is 35.6Mb and it took a total of 4.6 seconds (median 32.1 milliseconds).
In other words, roughly speaking, mozjpeg is 4.2% more space effective, but the median time per file with jpegoptim & jpegrescan is 58% lower.
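As a quick check of the arithmetic, the two percentages can be reproduced from the totals and medians quoted above. This is only a sketch; the helper name `percent_less` is made up for illustration. Both figures come out when the difference is measured relative to the larger of the two numbers.

```python
def percent_less(smaller: float, larger: float) -> float:
    """Return how much smaller `smaller` is than `larger`, as a percentage of `larger`."""
    return (larger - smaller) / larger * 100


# Numbers copied from the two runs described above.
size_saving = percent_less(34.1, 35.6)  # total output size in Mb
time_saving = percent_less(32.1, 76.4)  # median time per file in milliseconds

print(f"mozjpeg output is {size_saving:.1f}% smaller")
print(f"jpegoptim & jpegrescan median time is {time_saving:.1f}% lower")
print(f"mozjpeg median time is {76.4 / 32.1:.1f}x as long")
```

Measured on the totals instead of the medians (23.3 versus 4.6 seconds), the speed gap is considerably larger, because the big files dominate the total.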
25 August 2015 0 comments Web development, Mozilla
tl;dr Crash-stats is Mozilla's crash reporter dashboard. Simply fixing the static assets made the site 25% faster.
(The "First Byte Time" is still terrible but that's for another discussion. We're working on a re-write of the underlying data model for that particular report.)
Note how the SpeedIndex dropped from 2823 to 2098 which basically means, you can see stuff sooner.
The Load Time used to be 5.7 seconds on average. Now it takes
It used to weigh 717 KB to load the whole thing. Now it weighs 326 KB.
The only thing we changed was a long overdue correction of static asset headers and Gzip compression. Now, files with unique URLs (e.g. /static/CACHE/css/23a811f100bc.css) have maximum aggressive cache headers. And now all text/html is Gzipped.
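To make the idea concrete, here is a minimal sketch of the decision logic, not the actual Crash-stats configuration (which most likely lives in the web server rather than in Python). The regular expression, the one-year lifetime, the `no-cache` fallback, and the extra content types in `should_gzip` are all assumptions made for the example.

```python
import re

# Assumption: a "unique URL" is one whose filename is a content hash,
# like /static/CACHE/css/23a811f100bc.css. The pattern is illustrative only.
HASHED_ASSET = re.compile(r"/static/CACHE/.+/[0-9a-f]{12}\.(css|js)$")

ONE_YEAR = 60 * 60 * 24 * 365  # seconds


def cache_headers(path: str) -> dict[str, str]:
    """Return caching headers for a request path (hypothetical helper)."""
    if HASHED_ASSET.search(path):
        # The URL changes whenever the content does, so it can be cached aggressively.
        return {"Cache-Control": f"public, max-age={ONE_YEAR}"}
    # Assumed fallback: let the browser revalidate everything else.
    return {"Cache-Control": "no-cache"}


def should_gzip(content_type: str) -> bool:
    """Compress text responses; formats like JPEG are already compressed."""
    mime = content_type.split(";")[0].strip()
    return mime in {"text/html", "text/css", "application/javascript"}
```

The reason hashed URLs can be cached "forever" is that a changed file gets a new hash and therefore a new URL, so a stale copy is never requested again.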
Was it easy to do? Hell no!
Does it matter? Hell yeah! We don't have a lot of users or traffic on these reports but the people who use them do this for a living and making the site feel snappier for them would make their lives more productive.
Starting today, (almost) all the thumbnails below the fold on Air Mozilla are not loaded.
The way it works is that I use a library called Lazyr.js which notices when you scroll down and, when certain pictures are about to come into view, it changes the
So it basically looks like this:
<img src="placeholder.png" data-lazyr="event4.png">
<img src="placeholder.png" data-lazyr="event5.png">
<img src="placeholder.png" data-lazyr="event6.png">
That means that to load this page it needs to download, only:
Only 4 images instead of the otherwise 6 (in this example).
When you scroll down to see the rest of the list, it then also downloads:
The actual numbers on Air Mozilla are that there are 10 events per page and I lazy load 6 of them.
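On the server side, producing that markup could look roughly like the sketch below. It is not the Air Mozilla template code; `render_thumbnails` and its parameters are hypothetical names, and the only thing taken from the post is the `data-lazyr` attribute and the 10-per-page, 6-lazy split.

```python
from html import escape


def render_thumbnails(urls: list[str], eager: int = 4, placeholder: str = "placeholder.png") -> str:
    """Render <img> tags; only the first `eager` thumbnails get a real src."""
    tags = []
    for index, url in enumerate(urls):
        if index < eager:
            # Above the fold: load the real picture straight away.
            tags.append(f'<img src="{escape(url)}">')
        else:
            # Below the fold: the real URL goes into data-lazyr, as in the markup above.
            tags.append(f'<img src="{escape(placeholder)}" data-lazyr="{escape(url)}">')
    return "\n".join(tags)


page = render_thumbnails([f"event{n}.png" for n in range(1, 11)])
print(page)
```

Because every lazy image shares the same placeholder URL, the browser downloads the placeholder only once no matter how many thumbnails are deferred.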
You can see the results when comparing this WebPageTest with this one.
There is more work to do though. At the moment, the thumbnails in the sidebar (Trending and Upcoming events) are above the fold when you're browsing but below the fold when you're viewing an individual event. That's something I have yet to implement.
05 March 2015 2 comments Mozilla
We're proud to announce that we've now published our first Roku channel: Air Mozilla
We actually started this work in the third quarter of 2014 but the review process for adding a channel is really slow. The people we've talked to have been super friendly and have provided really helpful feedback as to changes that need to be made. After the first submission, it took about a month for them to get back to us and after some procrastination we submitted it a second time about a month ago and yesterday we found out it's been fully published. I.e. gone live.
Obviously it would be nice if they could get back to us quicker but another thing they could improve is to appreciate that we're a team. All communication with Roku has been to just me and I always have to forward emails or add my teammates as CC when I communicate with them.
Anyway, now we can start on a version 2. We deliberately kept this first version ultra-simple just to prove that it's possible and to avoid being held back by feature creep.
What we're looking to add in version 2 are, in no particular order:
- Ability to navigate by search
- Ability to sign in and see restricted content
- Adding Trending events
- Ability to see what the upcoming events are
It's going to be much easier to find the energy to work on those features now that we know it's live.
Also, we currently have a problem watching live and archived streams on HTTPS. It's not a huge problem right now because we're not making any restricted content available and we're lucky in that the CDNs we use allow for HTTP traffic equally.
By the way, the Air Mozilla Roku code is here and there's a README that'll get you started if you want to help out.
19 December 2014 1 comment Linux, Web development, Mozilla
tl;dr Don't run ffmpeg over HTTP(S) and use ffmpegthumbnailer
UPDATE tl;dr Download the file then run ffmpeg with -ss HH:MM:SS first. Don't bother with ffmpegthumbnailer
At work I work on something called Air Mozilla. It's a site for hosting live video broadcasts and then archiving those so they can be retrieved later.
Unlike sites like YouTube we can't take a screencap from the video because many videos are future (aka. "upcoming") videos so instead we use a little placeholder thumbnail (for example, the Rust logo).
However, once it has been recorded we want to switch from the logo to an actual screen capture from the video itself. We set up a cronjob that uses ffmpeg to extract these as JPGs and then the users can go in and select whichever picture they like the best.
This is all work in progress by the way (as of December 2014).
One problem we have is that the command for extracting JPGs is really slow. So slow that we can't wrap the subprocess in a Django database connection, because the database connection is often killed before the command finishes.
The command to extract them looks something like this:
ffmpeg -i https://cdnexample.com/url/to/file.mp4 -r 0.0143 /tmp/screencaps-%02d.jpg
Where the number r is based on the duration and how many pictures we want out. E.g. 0.0143 = 15 / 1049 where 15 is how many JPGs we want and 1049 is a duration of 17 minutes and 29 seconds, in seconds.
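The rate calculation is easy to get backwards, so here it is spelled out as a small sketch. The function name `screencap_command` is made up for the example; the URL and output pattern are the ones from the command above. Nothing is executed, the snippet only builds and prints the command line.

```python
import shlex


def screencap_command(url: str, duration_seconds: int, count: int = 15) -> list[str]:
    """Build the ffmpeg argument list that extracts `count` evenly spaced JPGs."""
    # Output frames per second: pictures wanted divided by seconds of video.
    rate = count / duration_seconds
    return ["ffmpeg", "-i", url, "-r", f"{rate:.4f}", "/tmp/screencaps-%02d.jpg"]


duration = 17 * 60 + 29  # 17 minutes and 29 seconds
command = screencap_command("https://cdnexample.com/url/to/file.mp4", duration)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` would run it; building the list separately makes the rate easy to inspect first.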
The script I used first was: ffmpeg1.sh
My first experiment was to try to extract one picture at a time, hoping that way, internally, ffmpeg might be able to optimize something.
The second script I used was: ffmpeg2.sh
The third alternative was to try ffmpegthumbnailer which is an intricate wrapper on ffmpeg and it has the benefit that you can produce slightly higher picture quality too.
The third script I used was: ffmpeg3.sh
And running these three depends very much on the state of my DSL at the time.
For a video clip that is 17 minutes long and a 138Mb mp4 file.
Clearly it's not efficient to do one screenshot at a time.
ffmpegthumbnailer you can tell it not to reduce the picture quality the total weight of the produced JPGs from
ffmpeg1.sh was 784Kb and the total weight from
Just to try again, I ran a similar experiment with a 35-minute, 890Mb mp4 file. And this time I didn't bother with ffmpeg2.sh. The results were:
So that means that using ffmpegthumbnailer is about 5 times faster than ffmpeg. Huge difference!
And now, a curveball!
The reason for doing ffmpeg -i https://... was so that we don't have to first download the whole beast and run the command on a local file. However, in light of how much longer this takes, and my disdain for having to install and depend on a new tool (ffmpegthumbnailer) across all servers, why not download the whole file and run the ffmpeg command locally?
So I downloaded the file, which was slow because of my, currently, terrible home DSL. Then I ran and timed them again, but against a local file instead:
Did you see that!? That's an insane difference. Clearly doing this command over HTTP(S) is a bad idea. It'll be worth downloading it first.
On Stackoverflow, LordNeckBeard gave a great tip of using the -ss option before the input file and now it's much faster. At this point, I'm no longer interested in having to bother with ffmpegthumbnailer.
Let's fork ffmpeg2.sh into two versions.
- ffmpeg2.1.sh: same as ffmpeg2.sh but with a downloaded file instead of a remote HTTPS URL.
- ffmpeg2.2.sh: same as ffmpeg2.1.sh except we put the -ss HH:MM:SS before the input file.
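For illustration, here is roughly what the ffmpeg2.2.sh approach amounts to, written as a sketch that only builds the commands. The contents of the real scripts are not shown here, so the even spacing of the timestamps, the use of -vframes 1 to grab a single frame, and the function names are all assumptions.

```python
def timestamps(duration_seconds: int, count: int = 15) -> list[str]:
    """Return `count` evenly spaced HH:MM:SS positions, skipping 00:00:00."""
    step = duration_seconds / (count + 1)
    result = []
    for i in range(1, count + 1):
        seconds = int(i * step)
        hours, minutes, secs = seconds // 3600, seconds % 3600 // 60, seconds % 60
        result.append(f"{hours:02d}:{minutes:02d}:{secs:02d}")
    return result


def seek_commands(path: str, duration_seconds: int, count: int = 15) -> list[list[str]]:
    """One ffmpeg invocation per picture, with -ss placed BEFORE -i."""
    return [
        # -ss before -i makes ffmpeg seek in the input instead of decoding up to that point.
        ["ffmpeg", "-ss", stamp, "-i", path, "-vframes", "1", f"/tmp/screencaps-{n:02d}.jpg"]
        for n, stamp in enumerate(timestamps(duration_seconds, count), start=1)
    ]


commands = seek_commands("mp4.mp4", 1049)
```

Each entry in `commands` can be handed to `subprocess.run`. Placing -ss after the input instead would make ffmpeg decode everything before the requested position, which is the slow path.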
Now, let's run them again on the 138Mb file:
# the 138Mb mp4.mp4 file
187 times faster
And I re-ran this against a bigger file that is 1.4Gb:
# the 1.4Gb mp4-1.44Gb.mp4 file
420 times faster
12 June 2014 2 comments Python, Mozilla, PostgreSQL
Yesterday I had the good fortune to present Crontabber to the San Francisco Bay Area PostgreSQL Meetup Group organized by my friend Josh Berkus.
To spare you having to watch the whole presentation I'm going to summarize some of it here.
My colleague Lars also has a blog post about Crontabber and goes into a bit more details about the nitty gritty of using PostgreSQL.
What is crontabber?
It's a tool for running cron jobs. It's written in Python and PostgreSQL and it's a tool we need for Socorro, which has lots and lots of stored procedures and cron jobs.
So it's basically a script that gets started by UNIX crontab and runs every 5 minutes. Internally it keeps an index of all the apps it needs to run and it manages dependencies between jobs and is self-healing meaning that if something goes wrong during run-time it just retries again and again until it works. Amongst many of the juicy features it offers on top of regular "cron wrappers" is something we call "backfilling".
Backfill jobs are jobs that happen on a periodic basis and get given a date. If all is working, this date is the same as "now()", but if something were to go wrong it remembers all the dates that did not work and re-attempts with those exact dates. That means that you can guarantee that the job gets started with every date possible even if it needs to catch up on past dates.
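The backfilling idea can be sketched in a few lines of plain Python. This is not crontabber's API or implementation; `dates_to_run` and `run_backfill` are hypothetical names, and the sketch assumes a daily job whose last successful date is remembered between runs.

```python
import datetime


def dates_to_run(last_success: datetime.date, today: datetime.date) -> list[datetime.date]:
    """Return every date after the last success, oldest first, up to and including today."""
    pending = []
    day = last_success + datetime.timedelta(days=1)
    while day <= today:
        pending.append(day)
        day += datetime.timedelta(days=1)
    return pending


def run_backfill(job, last_success: datetime.date, today: datetime.date) -> datetime.date:
    """Call job(date) for each pending date and stop at the first failure.

    Returns the new "last success" date so the next run resumes where this one stopped.
    """
    for day in dates_to_run(last_success, today):
        try:
            job(day)
        except Exception:
            break  # the failed date stays pending and is retried on the next run
        last_success = day
    return last_success
```

When everything works, the only pending date is today. After an outage, the next successful run is called once for each missed date, in order, before it reaches today.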
There's plenty of documentation on how to install and create jobs but it all starts with a simple pip install crontabber.
To get a feel for how you write crontabber "apps", check out the ones for Socorro or flick through the slides in the PDF.
Is it mature?
Yes! It all started in early 2012 as a part of the Socorro code base and after some hard months of stabilizing and maturing we decided to extract it out of Socorro and into its own project on GitHub under the Mozilla Public License 2.0. Now it stands on its own legs, no longer has anything to do with Socorro, and can be used by anyone who has a lot of complicated cron jobs that need to run in Python with a PostgreSQL connection. In Socorro we use it primarily for executing stored procedures but that's just one type of app. You can also make it call out to anything on the command line with a "@with_subprocess" decorator.
Is it finished?
No. It works really well for us but there's a decent list of features we want to add. The hope is that by open sourcing it we can get other organizations to adopt it and not only find bugs but also contribute code to add more juicy features.
One of the soon-to-come features we have in mind is to "internalize locking". At the moment you have to wrap it in a bash script that prevents it from being run concurrently. Crontabber is single-threaded and we don't have to worry about "dependent jobs" starting before "parent jobs" because the dependencies and their state are stored in the database. But you might need to worry about the same job (the one due next) being started concurrently. By internalizing the locking we can store, in the state database, that a particular job is being worked on and thus not have to worry about starting the same job twice.
I really hope this project can grow and continue to support us in our needs.