A blog and website by Peter Bengtsson

Switching from AWS S3 (boto3) to Google Cloud Storage (google-cloud-storage) in Python

12 October 2018 0 comments   Python

I'm in the midst of rewriting a big app that currently uses AWS S3 and will soon be switched over to Google Cloud Storage. This blog post is a rough attempt to log various activities in both Python libraries:

Disclaimer: I'm manually copying these snippets from a real project and I have to manually scrub the code clean of unimportant quirks, hacks, and other unrelated things that would just add noise.



$ pip install boto3
$ emacs ~/.aws/credentials


$ pip install google-cloud-storage
$ cat ./google_service_account.json

Note: You need to create a service account, which gives you a .json file to download. Make sure you pass its path when you create a client.

I suspect there are more/other ways to do this with environment variables alone but I haven't got there yet.

Making a "client"


Note, there are easier shortcuts for this but with this pattern you have full control over things like read_timeout, connect_timeout, etc. with that config_params keyword.

import boto3
from botocore.config import Config

def get_s3_client(region_name=None, **config_params):
    options = {"config": Config(**config_params)}
    if region_name:
        options["region_name"] = region_name
    session = boto3.session.Session()
    return session.client("s3", **options)


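from google.cloud import storage

def get_gcs_client():
    return storage.Client.from_service_account_json(
        # Placeholder: the path to the service account .json file you downloaded
        "./google_service_account.json"
    )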

Checking if a bucket exists and if you have access to it

boto3 (for s3_client here, see above)

from botocore.exceptions import ClientError, EndpointConnectionError

# BucketHardError and BucketSoftError are the app's own exception classes.
try:
    s3_client.head_bucket(Bucket=bucket_name)
except ClientError as exception:
    if exception.response["Error"]["Code"] in ("403", "404"):
        raise BucketHardError(
            f"Unable to connect to bucket={bucket_name!r} "
            f"ClientError ({exception.response!r})"
        )
    raise
except EndpointConnectionError:
    raise BucketSoftError(
        f"Unable to connect to bucket={bucket_name!r}"
    )
else:
    print("It exists and we have access to it.")


from google.api_core.exceptions import BadRequest

try:
    gcs_client.get_bucket(bucket_name)
except BadRequest as exception:
    raise BucketHardError(
        f"Unable to connect to bucket={bucket_name!r}, "
        f"because bucket not found due to {exception}"
    )
else:
    print("It exists and we have access to it.")

Checking if an object exists


from botocore.exceptions import ClientError

def key_existing(client, bucket_name, key):
    """return a tuple of (
        key's size if it exists or 0,
        S3 key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    try:
        response = client.head_object(Bucket=bucket_name, Key=key)
        return response["ContentLength"], response.get("Metadata")
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "404":
            return 0, None
        # Anything other than "not found" is unexpected, so let it bubble up.
        raise

Note, if you do this a lot and often find that the object doesn't exist, then using list_objects_v2 is probably faster.
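Here is a minimal sketch of that list_objects_v2 alternative. The function name key_size_via_listing is made up for illustration, and client is assumed to be a boto3 S3 client like the one from get_s3_client above. A listing only tells you the size, not the user metadata, so it is not a drop-in replacement for head_object.

```python
def key_size_via_listing(client, bucket_name, key):
    """Return the object's size if it exists, else 0.

    `client` is assumed to be a boto3 S3 client.
    """
    # A prefix search can also return other keys that merely start with
    # `key`, so compare each returned key for an exact match.
    response = client.list_objects_v2(Bucket=bucket_name, Prefix=key)
    for obj in response.get("Contents", []):
        if obj["Key"] == key:
            return obj["Size"]
    return 0
```

The upside is that a missing key is just an empty listing rather than an exception you have to catch.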


def key_existing(client, bucket_name, key):
    """return a tuple of (
        key's size if it exists or 0,
        S3 key metadata
    )
    If the object doesn't exist, return None for the metadata.
    """
    bucket = client.get_bucket(bucket_name)
    blob = bucket.get_blob(key)
    if blob:
        return blob.size, blob.metadata
    return 0, None

Uploading a file with a special Content-Encoding

Note: You have to use your imagination with regards to the source. In this example, I'm assuming that the source is a file on disk and that it might have already been compressed with gzip.


def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)

    metadata = metadata or {}

    # boto3 will raise a botocore.exceptions.ParamValidationError
    # error if you try to do something like:
    #  s3.put_object(Bucket=..., Key=..., Body=..., ContentEncoding=None)
    # ...because apparently 'NoneType' is not a valid type.
    # We /could/ set it to something like '' but that feels like an
    # actual value/opinion. Better just avoid if it's not something
    # really real.
    extras = {}
    if content_type:
        extras["ContentType"] = content_type
    if compressed:
        extras["ContentEncoding"] = "gzip"
    if metadata:
        extras["Metadata"] = metadata

    with open(file_path, "rb") as f:
        s3_client.put_object(Bucket=bucket_name, Key=key_name, Body=f, **extras)


def upload(file_path, bucket_name, key_name, metadata=None, compressed=False):
    content_type = get_key_content_type(key_name)

    metadata = metadata or {}
    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.blob(key_name)

    if content_type:
        blob.content_type = content_type
    if compressed:
        blob.content_encoding = "gzip"
    blob.metadata = metadata

    with open(file_path, "rb") as f:
        blob.upload_from_file(f)

Downloading and uncompressing a gzipped object


from io import BytesIO
from gzip import GzipFile
from botocore.exceptions import ClientError

from .utils import iter_lines

def get_stream(bucket_name, key_name):
    try:
        response = s3_client.get_object(Bucket=bucket_name, Key=key_name)
    except ClientError as exception:
        if exception.response["Error"]["Code"] == "NoSuchKey":
            raise KeyHardError("key not in bucket")
        raise

    stream = response["Body"]
    # But if the content encoding is gzip we have to re-wrap the stream.
    if response.get("ContentEncoding") == "gzip":
        body = response["Body"].read()
        bytestream = BytesIO(body)
        stream = GzipFile(None, "rb", fileobj=bytestream)

    for line in iter_lines(stream):
        yield line.decode("utf-8")


from io import BytesIO

from .utils import iter_lines

def get_stream(bucket_name, key_name):
    bucket = gcs_client.get_bucket(bucket_name)
    blob = bucket.get_blob(key_name)
    if blob is None:
        raise KeyHardError("key not in bucket")

    bytestream = BytesIO()
    blob.download_to_file(bytestream)
    # Rewind so that reading starts from the beginning of what was downloaded.
    bytestream.seek(0)

    for line in iter_lines(bytestream):
        yield line.decode("utf-8")

Note that here blob.download_to_file works a bit like requests.get() in that it automatically notices the Content-Encoding metadata and does the gunzip on the fly.
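The snippets above import iter_lines from a project-local utils module that isn't shown. As a rough sketch, assuming it simply yields newline-separated lines from a binary file-like object, it could look something like this. The maybe_gunzip helper is a hypothetical name that isolates the re-wrapping step from the boto3 example.

```python
from gzip import GzipFile
from io import BytesIO


def iter_lines(stream, chunk_size=1024):
    """Yield lines (bytes, without the trailing newline) from a binary stream."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        # The last piece may be an incomplete line; keep it for the next round.
        pending = lines.pop()
        for line in lines:
            yield line
    if pending:
        yield pending


def maybe_gunzip(raw_bytes, content_encoding=None):
    """Wrap raw bytes in a file-like object, gunzipping if the encoding says so."""
    bytestream = BytesIO(raw_bytes)
    if content_encoding == "gzip":
        return GzipFile(None, "rb", fileobj=bytestream)
    return bytestream
```

Reading in chunks keeps the line splitting independent of whether the underlying stream is a plain BytesIO or a GzipFile.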


It's not fair to compare them on style because I think boto3 came out of boto which probably started back in the day when Google was just web search and web emails.

I wanted to include a section about how to unit test against these. Especially how to mock them. But what I had for a draft was getting ugly. Yes, it works for the testing needs I have in my app but it's very personal taste (aka. appropriate for the context) and admittedly quite messy.

Fancy linkifying of text with Bleach and domain checks (with Python)

10 October 2018 2 comments   Python, Web development

Bleach is awesome. Thank you for it @willkg! It's a Python library for sanitizing text as well as "linkifying" text for HTML use. For example, consider this:

>>> import bleach
>>> bleach.linkify("Here is some text with a")
'Here is some text with a <a href="" rel="nofollow"></a>.'

Note that sanitizing is a separate thing, but if you're curious, consider this example:

>>> bleach.linkify(bleach.clean("Here is <script> some text with a"))
'Here is &lt;script&gt; some text with a <a href="" rel="nofollow"></a>.'

With that output you can confidently template interpolate that string straight into your HTML.

Getting fancy

That's a great start but I wanted more. For one, I don't always want the rel="nofollow" attribute on all links, in particular not for links that are within the site. Secondly, a lot of things look like a domain but aren't. For example This is a the start which would naively become...:

>>> bleach.linkify("This is a the start")
'This is a <a href="" rel="nofollow"></a> the start'

...because looks like a domain.

So here is how I use it on this site to linkify blog comments:

from urllib.parse import urlparse

import requests
from requests.exceptions import ConnectionError

# `settings` is the project's settings object (not shown here).

def custom_nofollow_maker(attrs, new=False):
    href_key = (None, u"href")

    if href_key not in attrs:
        return attrs

    if attrs[href_key].startswith(u"mailto:"):
        return attrs

    p = urlparse(attrs[href_key])
    if p.netloc not in settings.NOFOLLOW_EXCEPTIONS:
        # Before we add the `rel="nofollow"` let's first check that this is a
        # valid domain at all.
        root_url = p.scheme + "://" + p.netloc
        try:
            response = requests.head(root_url)
            if response.status_code == 301:
                redirect_p = urlparse(response.headers["location"])
                # If the only difference is that it redirects to https instead
                # of http, then amend the href.
                if (
                    redirect_p.scheme == "https"
                    and p.scheme == "http"
                    and p.netloc == redirect_p.netloc
                ):
                    attrs[href_key] = attrs[href_key].replace("http://", "https://")

        except ConnectionError:
            # Returning None tells Bleach not to turn this text into a link.
            return None

        rel_key = (None, u"rel")
        rel_values = [val for val in attrs.get(rel_key, "").split(" ") if val]
        if "nofollow" not in [rel_val.lower() for rel_val in rel_values]:
            rel_values.append("nofollow")
        attrs[rel_key] = " ".join(rel_values)

    return attrs

html = bleach.linkify(text, callbacks=[custom_nofollow_maker])

This is basically taking the default nofollow callback and extending it a bit.

By the way, here is the complete code I use for sanitizing and linkifying blog comments here on this site: render_comment_text.


This is slow because it requires network IO every time a piece of text needs to be linkified (if it has domain looking things in it) but that's best alleviated by only doing it once and either caching it or persistently storing the cleaned and rendered output.

Also, the check uses try: requests.head() except requests.exceptions.ConnectionError: as the method to see if the domain works. I considered doing a whois lookup or something but that felt a little wrong because just because a domain exists doesn't mean there's a website there. Either way, it could be that the domain/URL is perfectly fine but, in that very unlucky instant you checked, your own server's internet or some DNS lookup thing is busted. Perhaps it would be better to wrap it in a retry and do try: requests.head() except requests.exceptions.RetryError: instead.
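One way to picture the retry idea is a small hand-rolled helper rather than the retry machinery built into requests. This is only a sketch: head_with_retries is a hypothetical name, and the do_head argument is assumed to be something like requests.head. The default retry_on is Python's built-in ConnectionError, so with requests you would pass retry_on=(requests.exceptions.ConnectionError,) instead.

```python
import time


def head_with_retries(do_head, url, attempts=3, delay=0.5, retry_on=(ConnectionError,)):
    """Call do_head(url), retrying when one of the `retry_on` errors is raised.

    Returns whatever do_head returns, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            return do_head(url)
        except retry_on:
            # Pause before the next attempt, but not after the last one.
            if attempt + 1 < attempts:
                time.sleep(delay)
    return None
```

A None return can then be treated the same way the callback treats a ConnectionError today: don't linkify.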

Lastly, the business logic I chose was to rewrite all http:// to https:// only if the URL http://domain does a 301 redirect to https://domain. So if the original link was it leaves it as is. Perhaps a fancier version would be to look at the domain name ending. For example HEAD 301 redirects to so you could use the fact that "".endswith("").
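To make that business logic concrete, here is a sketch of the decision on its own, separated from the network call. The function name upgraded_href, the allow_subdomain_change flag and the example.com URLs are all made up for illustration. The subdomain check is just one guess at what the "fancier version" could look like.

```python
from urllib.parse import urlparse


def upgraded_href(href, redirect_location, allow_subdomain_change=False):
    """Return href rewritten to https:// if the 301 only changed the scheme.

    Otherwise return href unchanged.
    """
    original = urlparse(href)
    redirect = urlparse(redirect_location)
    if original.scheme != "http" or redirect.scheme != "https":
        return href
    same_host = original.netloc == redirect.netloc
    # Hypothetical "fancier" rule: also accept a redirect to a subdomain,
    # e.g. example.com -> www.example.com.
    related_host = allow_subdomain_change and redirect.netloc.endswith(
        "." + original.netloc
    )
    if same_host or related_host:
        return href.replace("http://", "https://", 1)
    return href
```

Keeping this as a pure function makes it easy to unit test without touching the network.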

UPDATE Oct 10 2018

Moments after publishing this, I discovered a bug where it would fail badly if the text contained a URL with an ampersand in it. Turns out, it was a known bug in Bleach. It only happens when you try to pass a filter to the bleach.Cleaner() class.

So I simplified my code and now things work. Apparently, using bleach.Cleaner(filters=[...]) is faster so I'm losing that. But, for now, that's OK in my context.

Also, in another later fix, I improved the function some more by avoiding non-HTTP links (with the exception of mailto: and tel:). Otherwise it would attempt to run requests.head('ssh://') which doesn't make sense.

The ideal number of workers in Jest

08 October 2018 0 comments   Python, ReactJS

tl;dr; Use --runInBand when running jest in CI and use --maxWorkers=3 on your laptop.

Running out of memory on CircleCI

We have a test suite that covers 236 tests across 68 suites and runs mainly a bunch of enzyme rendering of React components but also some plain old JavaScript function tests. We hit a problem where tests utterly failed in CircleCI due to running out of memory. Several individual tests, before it gave up or failed, reported taking up to 45 seconds.
Turns out, jest tried to use 36 workers because the Docker OS it was running was reporting 36 CPUs.

circleci@9e4c489cf76b:~/repo$ node
> var os = require('os')
> os.cpus().length
36

After forcibly setting --maxWorkers=2 to the jest command, the tests passed and it took 20 seconds. Yay!

But that got me thinking, what is the ideal number of workers when I'm running the suite here on my laptop? To find out, I wrote a Python script that would wrap the call CI=true yarn run test --maxWorkers=%(WORKERS) repeatedly and report which number is ideal for my laptop.

After leaving it running for a while it spits out this result:

3 8.47s
4 8.59s
6 9.12s
5 9.18s
2 9.51s
7 10.14s
8 10.59s
1 13.80s

The conclusion is vague. There is some benefit to using some small number greater than 1. If you attempt a bigger number it might backfire and take longer than necessary and if you do do that your laptop is likely to crawl and cough.
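The script itself isn't included here, but a rough sketch of how such a wrapper could look is below. It is not the script that produced the numbers above. The function names are made up, and it assumes the suite is started with yarn run test and that CI=true is passed through the environment, as described.

```python
import os
import statistics
import subprocess
import time


def time_command(command, runs=3):
    """Run `command` (a list of args) `runs` times; return the median seconds."""
    env = {**os.environ, "CI": "true"}
    durations = []
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(
            command,
            check=True,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - t0)
    return statistics.median(durations)


def rank_worker_counts(make_command, worker_counts, runs=3):
    """Return [(workers, median_seconds), ...] sorted fastest first."""
    results = [(w, time_command(make_command(w), runs)) for w in worker_counts]
    return sorted(results, key=lambda pair: pair[1])


def jest_command(workers):
    # Assumption: this is how the test suite is invoked in the project.
    return ["yarn", "run", "test", f"--maxWorkers={workers}"]
```

Calling rank_worker_counts(jest_command, range(1, 9)) and printing each pair would give a listing in the same shape as the one above. Using the median rather than the mean makes a single slow outlier run matter less.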

Notes and conclusions

Inline scripts in create-react-app 2.0 and CSP hashes

05 October 2018 0 comments   Web development, Javascript, ReactJS


My understanding of how to generate the CSP nonces was wrong. What I initially posted was a confusion between nonces and hashes. Sorry. The blog post has been updated to use hashing.


Shortly after publishing this I changed my mind entirely. I decided I don't want any inline scripts no matter how small. Reasons are: 1) with HTTP2 it's cheap to send another file and thus that critical precious first HTML document becomes smaller and 2) when you load it as an external you have the power to load it async if it's applicable.

Check out this new script, it's hackish but works: uninline_scripts.js


I have an app that is hosted on github-pages and because I can't control Content Security Policy HTTP headers I have to do it with a <meta http-equiv="Content-Security-Policy" content="${csp}"> tag in the HTML. That's working fine and the way I do it is that I have a script that looks like this:

#!/usr/bin/env node
const fs = require("fs");
const crypto = require("crypto");

const CSP_TEMPLATE = `
default-src 'none';
connect-src 'self';
img-src 'self' https://*;
script-src 'self'%SCRIPT_HASHES%;
style-src 'self' 'unsafe-inline';
font-src 'self' data:;
manifest-src 'self'
`.trim();

const htmlFile = process.argv[2];
if (!htmlFile) throw new Error("missing file argument");
let html = fs.readFileSync(htmlFile, "utf8");

let hashes = "";
let csp = CSP_TEMPLATE;
// Only matches inline <script>...</script> tags, not <script src="...">
const matches = html.match(/<script>.*<\/script>/g);
if (matches) {
  matches.forEach(scriptTag => {
    const hash = crypto.createHash("sha256");
    hash.update(scriptTag.replace(/<script>/, "").replace("</script>", ""));
    // Note: this yields a hex string; the .toString("base64") below does not
    // re-encode it. CSP hash sources are specified as base64 digests, which
    // hash.digest("base64") would produce directly.
    const digest = hash.digest("hex");
    hashes += ` 'sha256-${digest.toString("base64")}'`;
  });
}
csp = csp.replace(/%SCRIPT_HASHES%/, hashes);

const metatag = `
  <meta http-equiv="Content-Security-Policy" content="${csp}">
`
  .replace(/\n/g, "")
  .trim();
if (html.indexOf(metatag) > -1)
  throw new Error("already has CSP metatag in HTML");
const anchor = '<meta charset="utf-8">';
const newHtml = html.replace(anchor, `${anchor}${metatag}`);
fs.writeFileSync(htmlFile, newHtml, "utf8");

Laugh all you like at my hurried node scripting but it works. It finds any <script>ANYTHING</script> tags (which means it disregards any <script src="... tags), calculates a sha256 hash string out of it and then puts that into the CSP block.
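For comparison, here is a minimal Python sketch of the same hashing step. One thing worth flagging: the CSP specification defines a hash source as the algorithm name followed by the base64-encoded digest, whereas the script above ends up emitting a hex digest. The function names here are made up for illustration.

```python
import base64
import hashlib
import re


def csp_hash_source(script_body):
    """Return a CSP hash source like 'sha256-<base64 digest>' for a script body."""
    digest = hashlib.sha256(script_body.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


def inline_script_hashes(html):
    """Hash the body of every attribute-less <script>...</script> tag."""
    # Tags with attributes (e.g. <script src="...">) are deliberately skipped.
    bodies = re.findall(r"<script>(.*?)</script>", html, flags=re.DOTALL)
    return [csp_hash_source(body) for body in bodies]
```

The hash has to be computed over exactly the characters between the opening and closing tags, whitespace included, or the browser will refuse to run the script.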

The output becomes something like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Security-Policy"
      content="default-src 'none';script-src 'self' 'sha256-bb84aa7f904e73495b9e99f08531053f3a86f3c1b2e232e3abbac252bf723f1f';">

I don't know if I've done it right but at least what didn't use to work now works; the page loads in my browsers now.

An awesome snippet to web performance test a page programmatically

01 October 2018 0 comments   Web development, Javascript, Web Performance

I found this in an issue discussing measuring page performance with puppeteer and it's pure gold. Especially because it's so accessible and easy to use.

Here's the code:

const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Put the URL of the page you want to test here
  await page.goto('');

  console.log('\n==== performance.getEntries() ====\n');
  console.log(
    await page.evaluate(() =>
      JSON.stringify(performance.getEntries(), null, '  ')
    )
  );

  console.log('\n==== performance.toJSON() ====\n');
  console.log(
    await page.evaluate(() => JSON.stringify(performance.toJSON(), null, '  '))
  );

  console.log('\n==== page.metrics() ====\n');
  const perf = await page.metrics();
  console.log(JSON.stringify(perf, null, '  '));

  await browser.close();
}

run();



When you run it you get this output:
To run it you need to have a decently up-to-date version of puppeteer installed.

I don't claim (far from it actually!) to understand all the metrics points in there but I believe this is basically what the Network panel in the Google Chrome Dev tools is built upon. But some details and facts are easy to figure out and use in your analysis. For example, the fact that the getEntries() lists all the resources that had to be downloaded in the order they were downloaded. Also, at the end of getEntries() you get the first-paint which is often a useful metric.

Anyway, give it a spin. Wrap this up in a platform and see if you can build something really simple and really tailored to your web project's performance testing.

Comparing KeyCDN and DigitalOcean's new Spaces CDN

28 September 2018 0 comments   Web development

The News

This week DigitalOcean added a free CDN option for retrieving public files on their Spaces product. If you haven't heard about it, Spaces is like AWS S3 but from DigitalOcean instead. You use the same tools as you use for S3, like s3cmd. It's super easy to use in the admin control panel and you can do all sorts of neat and nifty things via the web. And the documentation about how to set up and use s3cmd is very good.

If we just focus on the CDN functionality, it's just a URL to a distributed resource that can be reached and retrieved faster because the server you get it from is hopefully geographically as near to you as possible. That's also what AWS CloudFront, Akamai CDN, and KeyCDN do. However, what you can do with the likes of KeyCDN is that you can have what's called a "pull zone". You basically host the files on your regular server (aka. the "origin server") and the CDN will automatically pick them up from there.

With DigitalOcean Spaces CDN...

1) GET
2) CDN doesn't have it because it's never been requested before
3) CDN does GET and serves it
4) If it didn't exist on the client gets a 404 Not Found.

With KeyCDN CDN...

1) GET
2) CDN doesn't have it because it's never been requested before
3) CDN does GET and serves it
4) If it didn't exist the client gets a 404 Not Found

The critical difference is that DigitalOcean Spaces CDN won't go and check your web server. It's not enough to have the files on disk on your web server. You have to upload them to Spaces.

That's often not a problem because disks are not something you want to rely on anyway. It's better to store all files to something like Spaces or S3 and then feel free to just throw away the web server and recreate it. However, it's an important distinction.

On Performance

It's hard to measure these things but when shopping around for a CDN provider/solution you want to make sure you get one that is fast and reliable. There are lots of sites that compare CDNs, but I often find that they either don't have the CDN I could choose or there's something else weird about the comparison.

So I set up a test using Hyperping. I created a URL that uses KeyCDN and a URL that uses DigitalOcean Spaces CDN. It's the same 32KB JPEG image. Then, I created two monitors on Hyperping, both from San Francisco, New York, London, Frankfurt, Mumbai, São Paulo and Sydney.

Now they've been doing GET requests to these respective URLs for about 24h and the results so far are as follows:

Side by side

I don't know how much response times are "skewed" because of the long tail of response times for Mumbai and Sydney.
But if you take an average of the times for San Francisco, New York, London and Frankfurt you get:

KeyCDN on Hyperping

DigitalOcean Spaces on Hyperping

In Conclusion

Being able to offload all the files from disk and put them somewhere safe is an important feature. In my side project Song Search I currently host about 7GB of images (~500k files) directly on disk and it's making me nervous. I need to move them off to something like Spaces. But it's cumbersome to have to make sure every single file generated on disk is correctly synced, especially if post-processing is happening with the file using the local file system. (This project runs mozjpeg and guetzli every time.)

Either way, this is a non-trivial dev-ops topic with many angles and opinions. I just thought I'd share about my quick research and performance testing.

And last but not least, the difference between 20ms and 30ms isn't important if 90+% of your visitors are from the US. All of these considerations depend on your context.

UPDATE - Oct 1, 2018

I'll continue to take the risk of looking like an idiot who doesn't understand networks and the science of data analysis.

I wrote a Python script which downloads random images from each CDN URL. You leave it running for a long time and hopefully the median will start to even out and be less affected by your own network saturation.
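The script isn't shown here, so the following is only a sketch of the general approach and not the code that produced the numbers below. The function names and the shape of urls_by_domain (a dict mapping each CDN domain to a list of image URLs on it) are assumptions for illustration.

```python
import random
import statistics
import time
import urllib.request


def download_seconds(url):
    """Download `url` completely and return how long it took, in seconds."""
    t0 = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()
    return time.perf_counter() - t0


def compare(urls_by_domain, iterations=100, timer=download_seconds):
    """Return {domain: (median_seconds, mean_seconds)} over `iterations` rounds."""
    timings = {domain: [] for domain in urls_by_domain}
    for _ in range(iterations):
        # Alternate between the domains on every round so that a temporary
        # network hiccup affects all of them roughly equally.
        for domain, urls in urls_by_domain.items():
            timings[domain].append(timer(random.choice(urls)))
    return {
        domain: (statistics.median(values), statistics.mean(values))
        for domain, values in timings.items()
    }
```

Reporting both the median and the mean is useful because a handful of very slow downloads will pull the mean up while barely moving the median.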

I ran it on my laptop; from South Carolina, USA:

DOMAIN                                             MEDIAN     MEAN
(DigitalOcean Spaces CDN)                          0.140s     0.335s
(KeyCDN)                                           0.165s     0.250s
2,705 iterations.

And my colleague Mathieu ran it; from Barcelona, Spain:

DOMAIN                                             MEDIAN     MEAN
(DigitalOcean Spaces CDN)                          1.027s     1.168s
(KeyCDN)                                           0.574s     0.648s
134 iterations.

And my colleague Ethan ran it; from New York City, USA:

DOMAIN                                             MEDIAN     MEAN
(DigitalOcean Spaces CDN)                          0.335s     0.595s
(KeyCDN)                                           0.066s     0.076s
122 iterations.

I think that if you're in Barcelona you're so far away from the nearest edge location that the latency is "king of the difference". Mathieu found the KeyCDN median download times to be almost twice as good!

If you're in New York, where I suspect DigitalOcean has an edge location and I know KeyCDN has one, the latency probably matters less and what matters more is the speed of the CDN web servers. In Ethan's case (only 122 measurement points) the median for KeyCDN is almost five times better! Not sure what that means but it definitely suggests that this can matter a lot.

Also, if you're in Barcelona you would definitely want to download the little JPEGs half a second faster if you could.

Merge two arrays without duplicates in JavaScript

20 September 2018 0 comments   Javascript

Here's how you do it if you don't care about the order:

const array1 = [1, 2, 3];
const array2 = [2, 3, 4];
console.log([...new Set([...array1, ...array2])]);
// prints [1, 2, 3, 4]

It merges the two arrays first. Then it creates a set out of that merged array and lastly converts the set back into an array.

I searched for a solution and all I found was dated or wrong. This oneliner works and I'm using it to make it possible to add a list of product versions to another list and I don't want to mutate existing arrays because of React state stuff.

If you want to see the ES5 version, check out this Babel repl.

A darn good search filter function in JavaScript

12 September 2018 0 comments   Web development, Javascript

Demo here. The demo uses React and a list of blog post titles that get immediately filtered when you type in a search. I.e. you have the whole list but show less when a search term is entered.

That the demo uses React isn't important. What's important is the search function. It looks like this:

function filterList(q, list) {
  function escapeRegExp(s) {
    return s.replace(/[-/\\^$*+?.()|[\]{}]/g, "\\$&");
  }
  const words = q
    .split(/\s+/g)
    .map(s => s.trim())
    .filter(s => !!s);
  const hasTrailingSpace = q.endsWith(" ");
  const searchRegex = new RegExp(
    words
      .map((word, i) => {
        if (i + 1 === words.length && !hasTrailingSpace) {
          // The last word - ok with the word being "startswith"-like
          return `(?=.*\\b${escapeRegExp(word)})`;
        } else {
          // Not the last word - expect the whole word exactly
          return `(?=.*\\b${escapeRegExp(word)}\\b)`;
        }
      })
      .join("") + ".+",
    // Case-insensitive so that "javasc" matches "JavaScript"
    "i"
  );
  return list.filter(item => {
    return searchRegex.test(item.title);
  });
}
In action
I use this in a single-page content management app. There's a list of records and a search input. Every character you put into the search bar updates the list of records shown.

What it does is that it allows you to search texts based on multiple whole words. But the key feature is that the last word doesn't have to be whole. For example, it will positively match "This is a blog post about JavaScript" if the search is "post javascript" or "post javasc". But it won't match on "pos blog".

The idea is that if a user has typed in a full word followed by a space, all previous words need to be matched fully. For example if the input is "java " it won't match on "This is a blog post about JavaScript" because the word java, alone, isn't in the search text.

Sure, there are different ways to write this but I think this functionality is good for this kind of filtering search. A different implementation would have a function that returns the regex and then it can be used both for filtering and for highlighting.
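For anyone who wants to play with the matching rules outside the browser, here is a rough Python port, split into a function that returns the regex and one that filters, along the lines of the "different implementation" mentioned above. The function names are made up, and it filters plain title strings rather than objects with a title property.

```python
import re


def build_search_regex(q):
    """Build a regex where every word must match whole, except possibly the last."""
    words = q.split()
    has_trailing_space = q.endswith(" ")
    parts = []
    for i, word in enumerate(words):
        escaped = re.escape(word)
        if i + 1 == len(words) and not has_trailing_space:
            # The last word may be a prefix of a longer word.
            parts.append(rf"(?=.*\b{escaped})")
        else:
            # Every other word has to appear as a whole word.
            parts.append(rf"(?=.*\b{escaped}\b)")
    return re.compile("".join(parts) + ".+", re.IGNORECASE)


def filter_titles(q, titles):
    regex = build_search_regex(q)
    return [title for title in titles if regex.search(title)]
```

Each word becomes its own lookahead, which is why the order of the words in the search doesn't matter.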

Hope it helps.

Replace an item in an array, by number, without mutation in JavaScript (ES6)

23 August 2018 0 comments   Javascript

Suppose you have an array like this:

const items = ["B", "M", "X"];

And now you want to replace that second item ("J" instead of "M") and suppose that you already know its position, as opposed to finding its position by doing an Array.prototype.find.

Here's how you do it:

const index = 1;
const replacementItem = "J";

const newArray = Object.assign([], items, {[index]: replacementItem});

console.log(items); // ["B", "M", "X"]
console.log(newArray); //  ["B", "J", "X"]

Wasn't immediately obvious to me but writing it down will help me remember.


There's a much faster way and that's to use slice and it actually looks nicer too:

function replaceAt(array, index, value) {
  const ret = array.slice(0);
  ret[index] = value;
  return ret;
}
const newArray = replaceAt(items, index, "J");

See this codepen.

django-pipeline and Zopfli

15 August 2018 0 comments   Python, Web development, Django

tl;dr; I wrote my own extension to django-pipeline that uses Zopfli to create .gz files from static assets collected in Django. Here's the code.

Nginx and Gzip

What I wanted was to continue to use django-pipeline which does a great job of reading a settings.BUNDLES setting and generating things like /static/js/myapp.min.a206ec6bd8c7.js. It has configurable options to not just make those files but also generate /static/js/myapp.min.a206ec6bd8c7.js.gz which means that with gzip_static in Nginx, Nginx doesn't have to Gzip compress static files on-the-fly but can basically just read it from disk. Nginx doesn't care how the file got there but an immediate advantage of preparing the file on disk is that the compression can be higher (smaller .gz files). That means smaller responses to be sent to the client and less CPU work needed from Nginx. Your job is to set gzip_static on; in your Nginx config (per location) and make sure every compressible file exists on disk with the same name but with the .gz suffix.

In other words, when the client does GET /static/foo.js, Nginx quickly does a read on the file system to see if there exists a ROOT/static/foo.js.gz and, if so, returns that. If the file doesn't exist, and you have gzip on; in your config, Nginx will read ROOT/static/foo.js into memory, compress it (usually with a lower compression level) and return that. Nginx takes care of figuring out whether to do this, at all, dynamically by reading the Accept-Encoding header from the request.


The best solution today to generate these .gz files is Zopfli. Zopfli is slower than good old regular gzip but the files get smaller. To manually compress a file you can install the zopfli executable (e.g. brew install zopfli or apt install zopfli) and then run zopfli $ROOT/static/foo.js which creates a $ROOT/static/foo.js.gz file.

So your task is to build some pipelining code that generates a .gz version of every static file your Django server creates.
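Stripped of any Django integration, that task boils down to something like the sketch below. It is not the extension described in this post. It uses the standard library's gzip as a stand-in (Zopfli would produce smaller files), and the function name and the set of suffixes are assumptions for illustration.

```python
import gzip
from pathlib import Path

# Assumption: which file types are worth pre-compressing.
COMPRESSIBLE_SUFFIXES = {".js", ".css", ".html", ".svg"}


def write_gz_siblings(static_root):
    """Write a `<name>.gz` file next to every compressible file under static_root."""
    written = []
    for path in sorted(Path(static_root).rglob("*")):
        if path.is_file() and path.suffix in COMPRESSIBLE_SUFFIXES:
            gz_path = path.with_name(path.name + ".gz")
            gz_path.write_bytes(gzip.compress(path.read_bytes(), compresslevel=9))
            written.append(gz_path)
    return written
```

Because the .gz file sits next to the original with the same name plus the suffix, it is exactly what gzip_static in Nginx looks for.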
At first I tried django-static-compress which has an extension to regular Django staticfiles storage. The default staticfiles storage is and that's what django-static-compress extends.

But I wanted more. I wanted all the good bits from django-pipeline (minification, hashes in filenames, concatenation, etc.) Also, in django-static-compress you can't control the parameters to zopfli such as the number of iterations. And with django-static-compress you have to install Brotli which I can't use because I don't want to compile my own Nginx.


So I wrote my own little mashup. I took some ideas from how django-pipeline does regular gzip compression as a post-process step. And in my case, I never want to bother with any of the other files that are put into the settings.STATIC_ROOT directory from the collectstatic command.

Here's my implementation: Check it out. It's very tailored to my personal preferences and usecase but it works great. To use it, I have this in my STATICFILES_STORAGE = ""

I know what you're thinking

Why not try to get this into django-pipeline or into django-static-compress? The answer is frankly laziness. Hopefully someone else can pick up this task. I have fewer and fewer projects where I use Django to handle static files. These days most of my projects are single-page apps that are 100% static and use Django for XHR requests to get the data.