Gzip rules the world of optimization, often

09 August 2014   4 comments   Python, Javascript

Powered by Fusion×

So I have a massive chunk of JSON that a Django view is sending to a piece of Angular that displays it nicely on the page. It's big. 674Kb actually. And it's likely going to be bigger in the near future. It's basically a list of dicts. It looks something like this:

>>> pprint(d['events'][0])
{u'archive_time': None,
 u'archive_url': u'/manage/events/archive/1113/',
 u'channels': [u'Main'],
 u'duplicate_url': u'/manage/events/duplicate/1113/',
 u'id': 1113,
 u'is_upcoming': True,
 u'location': u'Cyberspace - Pacific Time',
 u'modified': u'2014-08-06T22:04:11.727733+00:00',
 u'privacy': u'public',
 u'privacy_display': u'Public',
 u'slug': u'bugzilla-development-meeting-20141115',
 u'start_time': u'15 Nov 2014 02:00PM',
 u'start_time_iso': u'2014-11-15T14:00:00-08:00',
 u'status': u'scheduled',
 u'status_display': u'Scheduled',
 u'thumbnail': {u'height': 32,
                u'url': u'/media/cache/e7/1a/e71a58099a0b4cf1621ef3a9fe5ba121.png',
                u'width': 32},
 u'title': u'Bugzilla Development Meeting'}

So I thought one hackish simplification would be to convert each of these dicts into an list with a known sort order. Something like this:

>>> event = d['events'][0]
>>> pprint([event[k] for k in sorted(event)])
 u'Cyberspace - Pacific Time',
 u'15 Nov 2014 02:00PM',
 {u'height': 32,
  u'url': u'/media/cache/e7/1a/e71a58099a0b4cf1621ef3a9fe5ba121.png',
  u'width': 32},
 u'Bugzilla Development Meeting']

So I converted my sample events.json file like that:

$ l -h events*
-rw-r--r--  1 peterbe  wheel   674K Aug  8 14:08 events.json
-rw-r--r--  1 peterbe  wheel   423K Aug  8 15:06 events.optimized.json

Excitingly the file is now 250Kb smaller because it no longer contains all those keys.

Now, I'd also send the order of the keys so I could do something like this in the AngularJS code:

 .success(function(response) {
   events = []
   response.events.forEach(function(event) {
     var new_event = {}
     response.keys.forEach(function(key, i) {
       new_event[k] = event[i]

Yuck! Nested loops! It was just getting more and more complicated.
Also, if there are keys that are not present in every element, it means I'd have to replace them with None.

At this point I stopped and I could smell the hackish stink of sulfur of the hole I was digging myself into.
Then it occurred to me, gzip is really good at compressing repeated things which is something we have plenty of in a document store type data structure that a list of dicts is.

So I packed them manually to see what we could get:

$ apack events.json.gz events.json
$ apack events.optimized.json.gz events.optimized.json

And without further ado...

$ l -h events*
-rw-r--r--  1 peterbe  wheel   674K Aug  8 14:08 events.json
-rw-r--r--  1 peterbe  wheel    90K Aug  8 14:20 events.json.gz
-rw-r--r--  1 peterbe  wheel   423K Aug  8 15:06 events.optimized.json
-rw-r--r--  1 peterbe  wheel    81K Aug  8 15:07 events.optimized.json.gz

Basically, all that complicated and slow hoopla for saving 10Kb. No thank you.

Thank you gzip for existing!


Anonymous Coward
In my experiments with obese JSON, I found that lzma beats gzip and bzip2 on extremely small or large data. Otherwise, bzip2 always beats gzip. Unfortunately, only gzip is the standard across web browsers, right?
Peter Bengtsson
Yes, gzip is the only standard that browsers use.
Hi, instead of using apack, you can use the zlib module in the standard library which would do the same from within python.
Stephen Chung
When you use GZip, all your "keys" are essentially free except for one copy. So your gzipped vs gzipped optimized should be the zipped sizes of all the keys, which you're going to send over the wire in one way or another. In other words, the optimized version has no benefit over the version with keys when gzipped.
Thank you for posting a comment

Your email will never ever be published

Related posts

Common names amongst my Facebook friends 26 June 2014
Aggressively prefetching everything you might click 20 August 2014
json-schema-reducer 02 August 2016
How to no-mincss links with django-pipeline 03 February 2016
mozjpeg installation and sample 10 October 2015
Examples of mozjpeg savings 01 September 2015
Crash-stats just became a whole lot faster 25 August 2015
How to test if gzip is working on your site or app 20 August 2015
Introducing optisorl 18 August 2015
Python slow-down of exception handling or condition checking 14 May 2015
AJAX or not 22 December 2014
Optimizing MozTrap 04 June 2014