Hosting Django static images with Amazon Cloudfront (CDN) using django-static

Friday, Jul 9, 2010
4 comments Django

About a month ago I added a new feature to django-static that makes it possible to define a function that every file handled by django-static goes through.

First of all, a quick recap. django-static is a Django plugin that you use from your templates to reference static media. It takes care of giving each file the optimum name for static serving and, where applicable, compresses the file by trimming whitespace and the like. For more info, see The awesomest way possible to serve your static stuff in Django with Nginx.

The new, popular kid on the block for CDN (Content Delivery Network) is Amazon Cloudfront. It's a service sitting on top of the already proven Amazon S3 service, which is a cloud file storage solution. What a CDN does is register a domain for your resources such that, with some DNS tricks, users of a resource URL download it from the geographically nearest server. So if you live in Sweden you might download myholiday.jpg from a server in Frankfurt, and if you live in North Carolina, USA, you might download the very same picture from Virginia, USA. That ensures that the distance to the resource is minimized. If you're not convinced or sure about how CDNs work, check out THE best practice guide for faster webpages by Steve Souders (it's number two).

A disadvantage with Amazon Cloudfront is that it's unable to negotiate with the client to compress downloadable resources with GZIP. GZIPping a resource is considered a bigger optimization win than using a CDN. So I continue to serve my static CSS and JavaScript files from my Nginx but put all the images on Amazon Cloudfront. How to do this with django-static? Easy: add this to your settings:


DJANGO_STATIC = True
# ...other DJANGO_STATIC_... settings...
# equivalent of 'from cloudfront import file_proxy' in this PYTHONPATH
DJANGO_STATIC_FILE_PROXY = 'cloudfront.file_proxy'

Then you need to write the function that gets a chance to do something with every static resource that django-static prepares. Here's a naive first version:


# in cloudfront.py

conversion_map = {} # global variable
def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg','gif','png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)
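
To see how this proxy behaves without touching Amazon at all, here is a minimal sketch that swaps the uploader for a stub. The stub, the example.cloudfront.net hostname, the file paths and the upload_calls list are all made up for illustration, and it assumes that later calls from django-static arrive with only the uri:

```python
import os

conversion_map = {}  # maps the local URI to the CDN URL
upload_calls = []    # hypothetical: records which files the stub "uploaded"

def _upload_to_cloudfront(filepath):
    # Hypothetical stub; the real function talks to Amazon.
    upload_calls.append(filepath)
    return 'http://example.cloudfront.net/' + os.path.basename(filepath)

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)

# First call for a new image: the file is "uploaded" and the CDN URL returned.
first = file_proxy('/img/logo.123.png', new=True,
                   filepath='/tmp/static/img/logo.123.png')
# A later call with only the uri: the remembered CDN URL is reused.
second = file_proxy('/img/logo.123.png')
# CSS is not in the image extension list, so its uri comes back untouched.
css = file_proxy('/css/base.123.css', new=True,
                 filepath='/tmp/static/css/base.123.css')
```

Here first and second are both the stub's CDN URL, css is still '/css/base.123.css', and the stub was only called once.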

The files are only sent through the function _upload_to_cloudfront() the first time they're "massaged" by django-static. On subsequent calls nothing is done to the file, because django-static remembers, and sticks to, the way it dealt with the file the first time. In other words: after you have restarted your Django server, the first render of a template prepares each file and checks its timestamp; on later renders django-static skips that check to save time and just passes through the resulting file name. If this is all confusing, you can start with a much simpler proxy function that looks like this:


def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    print "Debugging and learning"
    print uri
    print "New", new,
    print "Filepath", filepath,
    print "Changed", changed,
    print "Other arguments:", kwargs
    return uri

The function to upload to Amazon Cloudfront is pretty straightforward thanks to the boto project. Here's my version:


import os
import re
from django.conf import settings
import boto

_cf_connection = None
_cf_distribution = None

def _upload_to_cloudfront(filepath):
   global _cf_connection
   global _cf_distribution

   if _cf_connection is None:
       _cf_connection = boto.connect_cloudfront(settings.AWS_ACCESS_KEY,
                                                settings.AWS_ACCESS_SECRET)

   if _cf_distribution is None:
       _cf_distribution = _cf_connection.create_distribution(
           origin='%s.s3.amazonaws.com' % settings.AWS_STORAGE_BUCKET_NAME,
           enabled=True,
           comment=settings.AWS_CLOUDFRONT_DISTRIBUTION_COMMENT)

   # now we can delete any old versions of the same file that have the
   # same name but a different timestamp
   basename = os.path.basename(filepath)
   object_regex = re.compile(r'%s\.(\d+)\.%s' % \
       (re.escape('.'.join(basename.split('.')[:-2])),
        re.escape(basename.split('.')[-1])))
   for obj in _cf_distribution.get_objects():
       match = object_regex.findall(obj.name)
       if match:
           old_timestamp = int(match[0])
           new_timestamp = int(object_regex.findall(basename)[0])
           if new_timestamp == old_timestamp:
               # an exact copy already exists
               return obj.url()
           elif new_timestamp > old_timestamp:
               # we've come across the same file but with an older timestamp
               #print "DELETE!", obj.name
               obj.delete()
               break

   # Still here? That means that the file wasn't already in the distribution

   # images are binary files, so open in binary mode
   fp = open(filepath, 'rb')

   # Because the name will always contain a timestamp we set faaar future
   # caching headers (315360000 seconds is ten 365-day years). Doesn't matter
   # exactly as long as it's really far future.
   headers = {'Cache-Control':'max-age=315360000, public',
              'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT',
              }

   #print "\t\t\tAWS upload(%s)" % basename
   obj = _cf_distribution.add_object(basename, fp, headers=headers)
   return obj.url()
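
The timestamp matching in the middle of that function is easier to follow in isolation. Here is a small sketch of just that part, using the same pattern as above; the helper name timestamp_regex, the local path and the older and unrelated file names are invented for the example:

```python
import os
import re

def timestamp_regex(basename):
    # 'name.<timestamp>.ext' -> a pattern that matches 'name.<digits>.ext'
    # and captures the digits.
    parts = basename.split('.')
    return re.compile(r'%s\.(\d+)\.%s' % (
        re.escape('.'.join(parts[:-2])),
        re.escape(parts[-1]),
    ))

basename = os.path.basename('/tmp/static/ctw-screenshot.1242930552.png')
regex = timestamp_regex(basename)

# The timestamp of the file we're about to upload.
new_timestamp = int(regex.findall(basename)[0])

# An object already in the distribution with the same name but an older
# timestamp: this is the one that would be deleted.
older = regex.findall('ctw-screenshot.1242930000.png')

# A different image never matches, so it's left alone.
unrelated = regex.findall('other-image.1242930552.png')
```

Since new_timestamp is larger than int(older[0]), the loop in the function above would take the delete branch for that object.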

Moving on, unfortunately this isn't good enough. You see, as soon as you have issued an upload to Amazon Cloudfront you get a full URL for the resource, but if it's a new distribution it will take a little while until the DNS propagates and becomes globally available. Therefore, the URL that you get back will most likely give you a 404 Page not found if you try it immediately.

So to solve this problem I wrote a simple alternative to the Python dict() type that works roughly the same, except that what myinstance.get(key) returns depends on time: a value that has just been set is held back until a timeout has passed. I use 1 hour; the example here uses 10 seconds. It works something like this:


>>> slow_map = SlowMap(10)
>>> slow_map['key'] = "Value"
>>> print(slow_map.get('key'))
None
>>> from time import sleep
>>> sleep(10)
>>> print(slow_map.get('key'))
Value

And here's the code for that:


from time import time

class SlowMap(object):
   """
   >>> slow_map = SlowMap(60)
   >>> slow_map[key] = value
   >>> print slow_map.get(key)
   None

   Then 60 seconds go past:
   >>> slow_map.get(key)
   value

   """
   def __init__(self, timeout_seconds):
       self.timeout = timeout_seconds

       self.guard = dict()
       self.data = dict()

   def get(self, key, default=None):
       value = self.data.get(key)
       if value is not None:
           return value

       if key not in self.guard:
           # never set, so there's nothing to hold back or release
           return default

       value, expires = self.guard[key]

       if expires < time():
           # good to release
           self.data[key] = value
           del self.guard[key]
           return value
       else:
           # held back
           return default

   def __setitem__(self, key, value):
       self.guard[key] = (value, time() + self.timeout)
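
The last step is to use a time-gated map in place of the plain dict in the proxy function. Here is a rough sketch of that wiring which can be tried without waiting an hour. It uses a compact variant of the class; the names GatedMap and FakeClock, the clock argument, the stub uploader and its example.cloudfront.net hostname are all assumptions made for this example and not part of django-static or boto:

```python
from time import time

class GatedMap(object):
    """Compact variant of SlowMap; `clock` is an added hook for testing."""
    def __init__(self, timeout_seconds, clock=time):
        self.timeout = timeout_seconds
        self.clock = clock
        self.guard = {}  # key -> (value, release time), still held back
        self.data = {}   # key -> value, released entries only

    def get(self, key, default=None):
        if key in self.data:
            return self.data[key]
        if key not in self.guard:
            return default
        value, release_at = self.guard[key]
        if release_at < self.clock():
            # the timeout has passed, good to release
            self.data[key] = value
            del self.guard[key]
            return value
        return default

    def __setitem__(self, key, value):
        self.guard[key] = (value, self.clock() + self.timeout)

class FakeClock(object):
    """Hypothetical stand-in for time() that only moves when told to."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now

clock = FakeClock()
conversion_map = GatedMap(60 * 60, clock=clock)  # hold URLs back for 1 hour

def _upload_to_cloudfront(filepath):
    # Hypothetical stub standing in for the real boto upload.
    return 'http://example.cloudfront.net/' + filepath.split('/')[-1]

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)

# Right after the upload the local uri is still served.
before = file_proxy('/img/a.1.png', new=True, filepath='/tmp/img/a.1.png')
# Just over an hour later the Cloudfront URL takes over.
clock.now += 60 * 60 + 1
after = file_proxy('/img/a.1.png')
```

Until the hour is up, visitors keep getting the image from your own server; only afterwards does the template output switch to the CDN URL.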

With all of that ready, willing and able, you should now be able to serve your images from Amazon Cloudfront simply by doing this in your Django templates:


{% staticfile "/img/mysprite.gif" %}

To test this I've deployed this technique on my ~~money making site~~ code guinea pig Crosstips. Go ahead, visit that site and use Firebug or view the source and check out the URLs used for the images. They look something like this: http://dpv9al5z7o7rq.cloudfront.net/ctw-screenshot.1242930552.png

If you want to look at the code I use for Crosstips, download this file. It's generic enough for anybody who wants to achieve the same thing.

Have fun and happy CDN'ing!

Here's a screenshot of the wonderful Amazon AWS Console

Comments

David De Sousa July 9, 2010

is there a way to upload all the FileFields and the ImageFields to the Amazon Cloudfront using this app? that's one thing I need to do in a near future

Peter Bengtsson July 9, 2010

There's an app called django-storage which I've used in another project to upload FileFields to Amazon S3. If it doesn't have Cloudfront support yet, that package would be the best place to start.

David De Sousa July 9, 2010

thanks, I'll look up to it.

Steve Schwarz July 21, 2010

Nice post and explanation. I just came across django-queued-storage that has a slightly different approach to storing data on Cloudfront: http://github.com/seanbrant/django-queued-storage

Not sure if the celery scheduled task responsible for pushing to Cloudfront provides all the functionality of your SlowMap though.
