I'm currently working on a Django library that uses mozjpeg to optimize thumbnails that are generated from stored images. I first wanted to get a feel for how good mozjpeg really is.

In my ~/Downloads directory I have all sorts of "junk" from various saves and experiments. It works as a good testbed of relatively random JPEG images in all sorts of sizes and qualities. Without further ado, here are the results:

FILENAME                                         OPTIMIZED   ORIGINAL     SAVING  PERCENT
-----------------------------------------------------------------------------------------
180697_1836563311933_3364808_n.jpg                  45.2Kb     50.4Kb      5.1Kb    10.2%
2014-03-20 17.35.39.jpg                           2040.1Kb   2207.8Kb    167.7Kb     7.6%
2015-03-04 21.18.16.jpg                           1521.5Kb   1629.2Kb    107.7Kb     6.6%
2015-03-04 21.19.16.jpg                           1602.4Kb   1720.0Kb    117.6Kb     6.8%
2015-03-04 21.23.16.jpg                           1181.7Kb   1272.1Kb     90.4Kb     7.1%
2015-03-05 06.03.00.jpg                           1426.7Kb   1557.7Kb    131.0Kb     8.4%
20150626_200629_001.jpg                           1566.4Kb   1717.3Kb    151.0Kb     8.8%
20150626_200631.jpg                               2157.6Kb   2319.6Kb    162.0Kb     7.0%
Boba_Fett_by_RobD4E.jpg                             96.2Kb    104.3Kb      8.1Kb     7.8%
Horse_Play.jpg                                     170.4Kb    185.2Kb     14.9Kb     8.0%
Image (107).jpg                                    344.9Kb    390.6Kb     45.7Kb    11.7%
Misc Candle Holder NECA FOTR Balrog Dec2002.jpg     37.1Kb     37.7Kb      0.6Kb     1.5%
Mozilla_Lightbeam.jpg                               55.1Kb     79.7Kb     24.6Kb    30.8%
Photo on 12-17-14 at 5.55 PM.jpg                   168.5Kb    187.7Kb     19.2Kb    10.2%
dev.jpg                                             17.5Kb     30.8Kb     13.3Kb    43.2%
dev2.jpg                                            41.1Kb     54.3Kb     13.3Kb    24.4%
dev3.jpg                                            35.3Kb     49.0Kb     13.7Kb    28.0%
dev4.jpg                                            42.0Kb     56.0Kb     14.0Kb    25.0%
dev5.jpg                                            24.6Kb     37.9Kb     13.2Kb    35.0%
dev6.jpg                                            28.9Kb     42.8Kb     13.9Kb    32.4%
hr_0570_220_135__0570220135006.jpg                3124.3Kb   3467.8Kb    343.5Kb     9.9%
hr_0570_220_158__0570220158006.jpg                3010.0Kb   3319.1Kb    309.1Kb     9.3%
hr_0570_220_175__0570220175006.jpg                2245.5Kb   2442.6Kb    197.0Kb     8.1%
hr_0570_227_599__0570227599006.jpg                2561.7Kb   2809.8Kb    248.1Kb     8.8%
hr_0596_622_701__0596622701006.jpg                3238.8Kb   3453.6Kb    214.7Kb     6.2%
hr_0596_623_849__0596623849006.jpg                2902.9Kb   3102.1Kb    199.3Kb     6.4%
hr_0622_219_873__0622219873006.jpg                 985.3Kb   1066.9Kb     81.7Kb     7.7%
logo.jpg                                            43.5Kb     51.2Kb      7.7Kb    15.1%
mvm-header.jpg                                       8.5Kb     12.4Kb      3.9Kb    31.6%
mvm-postcard-picture.jpg                            72.2Kb     73.4Kb      1.3Kb     1.7%
overhang_pixels.jpg                               3014.3Kb   3370.8Kb    356.4Kb    10.6%
peterbe copy.jpg                                     4.2Kb     10.4Kb      6.2Kb    59.7%
peterbe.jpg                                         36.7Kb     44.3Kb      7.5Kb    17.0%
pjt-mcguinty-2.jpg                                  96.8Kb    101.6Kb      4.8Kb     4.8%
sl1.jpg                                             28.7Kb     35.4Kb      6.7Kb    18.9%

That's a median of 9.3% (average of 15.3%) savings.
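The summary line can be rechecked from the PERCENT column of the table. This is a minimal sketch using Python's standard library; the values are copied from the rows above in order, and the variable name is only illustrative.

```python
from statistics import mean, median

# PERCENT column of the table above, in row order.
percents = [
    10.2, 7.6, 6.6, 6.8, 7.1, 8.4, 8.8, 7.0, 7.8, 8.0, 11.7, 1.5,
    30.8, 10.2, 43.2, 24.4, 28.0, 25.0, 35.0, 32.4, 9.9, 9.3, 8.1,
    8.8, 6.2, 6.4, 7.7, 15.1, 31.6, 1.7, 10.6, 59.7, 17.0, 4.8, 18.9,
]

print(f"median {median(percents):.1f}%, average {mean(percents):.1f}%")
# prints "median 9.3%, average 15.3%"
```

The gap between the two numbers comes from a handful of small files with very large relative savings, which pull the average up but barely move the median.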

It's not very fast though. Some of the large files take more than a second. In total it took 23.7 seconds to create all of those optimized files. Make of that what you will, but bear in mind that these are hopefully "once in a lifetime" operations (depending on the ephemerality of your thumbnail storage). Mind you, the really large JPEGs skew that total: the median is 72.1 milliseconds and the average is 527.0 milliseconds. Also, when I look through the numbers I find that the large JPEGs take the longest but benefit the least in terms of percentage saved.
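For anyone who wants to take similar measurements, here is a rough sketch of timing one optimization run and computing the savings. It is not the script used for the numbers above. The helper names are hypothetical, and `mozjpeg_command` assumes that mozjpeg's `jpegtran` build is on your PATH under that name; adjust the binary and flags to match your install.

```python
import os
import subprocess
import time


def timed_optimize(command, source, destination):
    """Run `command` (a list of arguments) and report time and byte savings."""
    t0 = time.perf_counter()
    subprocess.run(command, check=True)
    elapsed = time.perf_counter() - t0
    original = os.path.getsize(source)
    optimized = os.path.getsize(destination)
    return {
        "seconds": elapsed,
        "original": original,
        "optimized": optimized,
        "percent": 100 * (original - optimized) / original,
    }


def mozjpeg_command(source, destination, binary="jpegtran"):
    # Assumption: `binary` is the jpegtran that ships with mozjpeg.
    # -optimize, -copy none and -outfile are standard jpegtran switches.
    return [binary, "-optimize", "-copy", "none", "-outfile", destination, source]
```

Calling `timed_optimize(mozjpeg_command(src, dst), src, dst)` for every file and collecting the `seconds` values gives the per-file timings from which a median and average can be taken.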

UPDATE

Chris Adams, in the comment below, inspired me to compare my trials with jpegoptim and jpegrescan. So, I took my script that generated a directory of 45 JPEGs and changed it to use jpegoptim and jpegrescan.

The mozjpeg total size of that output directory is 34.1Mb and it took a total of 23.3 seconds (median 76.4 milliseconds).

The jpegoptim & jpegrescan total size of that output directory is 35.6Mb and it took a total of 4.6 seconds (median 32.1 milliseconds).

In other words, roughly speaking, mozjpeg's output is 4.2% smaller, but jpegoptim & jpegrescan take 58% less time per file going by the medians (76.4 vs. 32.1 milliseconds).
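The arithmetic behind that comparison, as a small sketch using only the figures quoted in this update. The 58% appears to come from the medians; the total run times give a much larger gap, since the biggest files dominate the totals.

```python
# Figures quoted in the update above.
mozjpeg = {"size_mb": 34.1, "total_s": 23.3, "median_ms": 76.4}
jpegoptim = {"size_mb": 35.6, "total_s": 4.6, "median_ms": 32.1}


def percent_less(smaller, larger):
    """How many percent `smaller` is below `larger`."""
    return 100 * (larger - smaller) / larger


size_saving = percent_less(mozjpeg["size_mb"], jpegoptim["size_mb"])
median_time_saving = percent_less(jpegoptim["median_ms"], mozjpeg["median_ms"])
total_time_ratio = mozjpeg["total_s"] / jpegoptim["total_s"]

print(f"mozjpeg output is {size_saving:.1f}% smaller")
print(f"jpegoptim & jpegrescan median time is {median_time_saving:.1f}% lower")
print(f"mozjpeg total time is {total_time_ratio:.1f}x as long")
# prints 4.2%, 58.0% and 5.1x respectively
```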

Comments

Chris Adams

Have you compared mozjpeg's lossless optimization to other tools like jpegoptim / jpegrescan? I was recently testing that (https://gist.github.com/acdha/d85c927d35ee6df2c57d#file-optimize-images-sh) and found similar results (big savings for smaller files, uncompelling time/benefit trade-off for larger ones).

One optimization which I've considered but have not yet implemented was basically a hybrid approach: generate a thumbnail quickly with limited optimization, queue a task to run the optimizer, and serve it with a lower TTL until the optimized version is available.

Peter Bengtsson

See update above. Thanks for the comment.

I would advise against doing the optimization in a background thread because the complexity of that is enormous. Also, you probably have a CDN that works like CloudFront in that it picks up each thumbnail once and caches it under that filename for a long time. So unless you change the thumbnail file name after the background task is done, the CDN will have a stale copy.
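To illustrate the file name point: one common way to make sure a CDN never serves a stale copy is to derive part of the thumbnail's name from its bytes, so that an optimized version automatically gets a new URL. This is only a sketch of that idea, not something the library does; `versioned_name` and its parameters are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename, content, digest_length=10):
    """Return a name like 'thumb.<digest>.jpg' where <digest> depends on the bytes."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    # Keep the directory and extension, insert the digest before the suffix.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

The catch is that every page that references the thumbnail has to learn the new name, which is part of the complexity mentioned above.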

If speed is a concern, and you can't wait 76 milliseconds (or less!) for each thumbnail, you could perhaps create the thumbnails as part of the CMS or some other script that doesn't mind waiting.

Chris Adams

Yeah, I wouldn't recommend that as a general strategy, but my rationale was roughly that while I have a CDN, it has a hit rate somewhere around 60%, so all but the consistently most popular content refreshes more regularly than the TTLs. Having a queued worker look at the more popular images and optimize them really aggressively would be a nice way to gradually improve things over time.

I agree, though, that this is an edge case. The first thing I was looking into doing was much easier, optimizing the DZI files we use with OpenSeadragon (http://openseadragon.github.io) since those are already generated off-line and stored durably.

The other thing I wanted to look into was the latency of calling a binary vs. doing this in process to see if it'd be worth using a cffi binding to optimize the images in memory (Pillow -> mozjpeg -> disk).

Peter Bengtsson

One thing to consider too, and maybe it's off-topic: if you hand over the optimization work to someone like Kraken.io, then you have all the network overhead to worry about, and that might turn ~70ms into several seconds.

Peter Bengtsson

With the UPDATE numbers above in mind, I'm inclined to conclude that you should use whichever is easiest to install on your server.
