I would advise against doing the optimization in a background thread, because the complexity of that is enormous. Also, you probably have a CDN that works like CloudFront in that it picks up each thumbnail once and caches it under that filename for a long time. So unless you change the thumbnail filename after the background task is done, the CDN will keep serving a stale copy.
If speed is a concern, and you can't wait 76 milliseconds (or less!) for each thumbnail, you could perhaps create the thumbnails as part of the CMS or some other script that doesn't mind waiting.
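If you do want to go that route, the filename change is the key part. A minimal sketch, assuming you control the published URL (the naming scheme here is just an example): embed a short content hash in the filename, so the optimized file is published under a new name and the CDN's long-lived copy of the old one simply stops being requested.

    import hashlib

    def thumbnail_name(original_name, data):
        # e.g. thumbnail_name("photo.jpg", thumb_bytes) -> "photo.3fa4b1c2.jpg"
        digest = hashlib.sha1(data).hexdigest()[:8]
        stem, _, ext = original_name.rpartition(".")
        return "%s.%s.%s" % (stem, digest, ext)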
Have you compared mozjpeg's lossless optimization to other tools like jpegoptim / jpegrescan? I was recently testing that (https://gist.github.com/acdha/d85c927d35ee6df2c57d#file-optimize-images-sh) and found similar results (big savings for smaller files, an uncompelling time/benefit trade-off for larger ones).
One optimization which I've considered but have not yet implemented is basically a hybrid approach: generate a thumbnail quickly with limited optimization, queue a task to run the optimizer, and serve the quick version with a lower TTL until the optimized version is available.
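Roughly what I had in mind, as a sketch; the in-process queue, the jpegtran invocation, and the TTL values are all placeholders rather than anything I've actually deployed:

    import os
    import queue
    import subprocess
    import threading

    optimize_queue = queue.Queue()

    def optimizer_worker():
        while True:
            path = optimize_queue.get()
            # Heavier lossless pass with mozjpeg's jpegtran; any optimizer works here.
            subprocess.run(
                ["jpegtran", "-copy", "none", "-optimize",
                 "-outfile", path + ".opt", path],
                check=True,
            )
            os.replace(path + ".opt", path)    # atomic swap once optimized
            open(path + ".done", "w").close()  # marker: this file is done
            optimize_queue.task_done()

    threading.Thread(target=optimizer_worker, daemon=True).start()

    def cache_ttl(path):
        # Serve a short TTL until the optimizer has finished with this file.
        return 86400 if os.path.exists(path + ".done") else 300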
Yeah, I wouldn't recommend that as a general strategy, but my rationale was roughly that while I have a CDN, its hit rate is only somewhere around 60%, so all but the consistently most popular content gets refreshed from origin more often than the TTLs alone would suggest. Having a queued worker that picks out the more popular images and optimizes them really aggressively would be a nice way to gradually improve things over time.
I agree, though, that this is an edge case. The first thing I was looking into was much easier: optimizing the DZI files we use with OpenSeadragon (http://openseadragon.github.io), since those are already generated offline and stored durably.
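For that offline case, a one-shot batch pass is about all it takes. A sketch, assuming mozjpeg's jpegtran is on PATH and the usual OpenSeadragon tile layout (the directory name is made up):

    import pathlib
    import subprocess

    def optimize_dzi_tiles(tiles_dir):
        for tile in pathlib.Path(tiles_dir).rglob("*.jpg"):
            tmp = tile.with_suffix(".tmp")
            subprocess.run(
                ["jpegtran", "-copy", "none", "-optimize", "-progressive",
                 "-outfile", str(tmp), str(tile)],
                check=True,
            )
            # Keep whichever file is smaller; a lossless pass can occasionally lose.
            if tmp.stat().st_size < tile.stat().st_size:
                tmp.replace(tile)
            else:
                tmp.unlink()

    optimize_dzi_tiles("myimage_files")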
The other thing I wanted to look into was the latency of calling a binary versus doing this in-process, to see whether it'd be worth using a cffi binding to optimize the images in memory (Pillow -> mozjpeg -> disk).
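As a cheap baseline before writing any cffi code, you can at least avoid temp files by piping Pillow's output straight into mozjpeg's cjpeg and timing that; a rough sketch, with the binary name, size, and quality settings as assumptions:

    import io
    import subprocess
    from PIL import Image

    def thumbnail_via_cjpeg(src_path, dst_path, size=(256, 256), quality=75):
        # Resize with Pillow, serialize as PPM (which cjpeg reads on stdin),
        # and let mozjpeg's cjpeg do the encoding, with no intermediate file.
        im = Image.open(src_path).convert("RGB")
        im.thumbnail(size)
        buf = io.BytesIO()
        im.save(buf, format="PPM")
        result = subprocess.run(
            ["cjpeg", "-quality", str(quality), "-optimize"],
            input=buf.getvalue(),
            stdout=subprocess.PIPE,
            check=True,
        )
        with open(dst_path, "wb") as f:
            f.write(result.stdout)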
One more thing to consider, and maybe it's off-topic: if you hand the optimization work over to a service like Kraken.io, you have all the network overhead to worry about, and that might turn ~70ms into several seconds.
See update above. Thanks for the comment.