Django

Hosting Django static images with Amazon Cloudfront (CDN) using django-static


9th of July 2010

About a month ago I added a new feature to django-static that makes it possible to define a function that every file handled by django-static goes through.

First of all, a quick recap. django-static is a Django plugin that you use from your templates to reference static media. django-static takes care of giving the file the optimum name for static serving and, if applicable, compresses the file by trimming whitespace and the like. For more info, see The awesomest way possible to serve your static stuff in Django with Nginx.

The new, popular kid on the block for CDN (Content Delivery Network) is Amazon Cloudfront. It's a service sitting on top of the already proven Amazon S3 service, which is a cloud file storage solution. What a CDN does is register a domain for your resources such that, with some DNS tricks, users of a resource URL download it from the geographically nearest server. So if you live in Sweden you might download myholiday.jpg from a server in Frankfurt, and if you live in North Carolina, USA you might download the very same picture from Virginia, USA. That ensures that the distance to the resource is minimized. If you're not convinced or sure about how CDNs work, check out THE best practice guide for faster webpages by Steve Souders (it's number two).

A disadvantage with Amazon Cloudfront is that it's unable to negotiate with the client to compress downloadable resources with GZIP. GZIPping a resource is considered a bigger optimization win than using a CDN. So I continue to serve my static CSS and Javascript files from my Nginx but put all the images on Amazon Cloudfront. How to do this with django-static? Easy: add this to your settings:

 DJANGO_STATIC = True
 ...other DJANGO_STATIC_... settings...
 # equivalent of 'from cloudfront import file_proxy' in this PYTHONPATH
 DJANGO_STATIC_FILE_PROXY = 'cloudfront.file_proxy'

Then you need to write the function that gets a chance to do something with every static resource that django-static prepares. Here's a naive first version:

 # in cloudfront.py

 conversion_map = {} # global variable
 def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
     if filepath and (new or changed):
         if filepath.lower().split('.')[-1] in ('jpg','gif','png'):
             conversion_map[uri] = _upload_to_cloudfront(filepath)
     return conversion_map.get(uri, uri)
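
To try the proxy logic without an AWS account, here is a minimal sketch. The uploader is a stand-in (`_fake_upload` and the example.cloudfront.net domain are made up for illustration), and the calls at the bottom only imitate how I'd expect django-static to invoke the function. It's written so that it also runs on Python 3.

```python
# Sketch only: a stand-in replaces the real uploader so the proxy logic
# can be exercised locally. The CDN domain below is made up.
import os

conversion_map = {}  # uri -> CDN url, filled in as files are "uploaded"

IMAGE_EXTENSIONS = ('jpg', 'gif', 'png')


def _fake_upload(filepath):
    # Hypothetical replacement for _upload_to_cloudfront()
    return 'http://example.cloudfront.net/' + os.path.basename(filepath)


def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in IMAGE_EXTENSIONS:
            conversion_map[uri] = _fake_upload(filepath)
    # anything that was never uploaded (CSS, Javascript) keeps its local URI
    return conversion_map.get(uri, uri)


# a call for a freshly prepared image
print(file_proxy('/img/logo.123.png', new=True,
                 filepath='/tmp/img/logo.123.png'))
# a later call with only the uri: the map remembers the answer
print(file_proxy('/img/logo.123.png'))
# stylesheets are left alone
print(file_proxy('/css/site.123.css', new=True,
                 filepath='/tmp/css/site.123.css'))
```

The first two calls print the fake CDN URL and the last one prints the unchanged local URI, which is the split described above: images go to the CDN, everything else stays on Nginx.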

The files are only sent through the function _upload_to_cloudfront() the first time they're "massaged" by django-static. On subsequent calls nothing is done to the file, since django-static remembers, and sticks to, the way it dealt with it the first time. Basically, the first time a template is rendered after you have restarted your Django server, the file is prepared and its timestamp is checked. On later renders, to save time, django-static doesn't check the file again and just passes through the resulting file name. If this is all confusing you can start with a much simpler proxy function that looks like this:

 def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
     print "Debugging and learning"
     print uri
     print "New", new,
     print "Filepath", filepath,
     print "Changed", changed,
     print "Other arguments:", kwargs
     return uri

The function to upload to Amazon Cloudfront is pretty straightforward thanks to the boto project. Here's my version:

 import os
 import re
 from django.conf import settings
 import boto

 _cf_connection = None
 _cf_distribution = None

 def _upload_to_cloudfront(filepath):
    global _cf_connection
    global _cf_distribution

    if _cf_connection is None:
        _cf_connection = boto.connect_cloudfront(settings.AWS_ACCESS_KEY,
                                                 settings.AWS_ACCESS_SECRET)

    if _cf_distribution is None:
        _cf_distribution = _cf_connection.create_distribution(
            origin='%s.s3.amazonaws.com' % settings.AWS_STORAGE_BUCKET_NAME,
            enabled=True,
            comment=settings.AWS_CLOUDFRONT_DISTRIBUTION_COMMENT)

    # now we can delete any old versions of the same file that have the
    # same name but a different timestamp
    basename = os.path.basename(filepath)
    object_regex = re.compile('%s\.(\d+)\.%s' % \
        (re.escape('.'.join(basename.split('.')[:-2])),
         re.escape(basename.split('.')[-1])))
    for obj in _cf_distribution.get_objects():
        match = object_regex.findall(obj.name)
        if match:
            old_timestamp = int(match[0])
            new_timestamp = int(object_regex.findall(basename)[0])
            if new_timestamp == old_timestamp:
                # an exact copy already exists
                return obj.url()
            elif new_timestamp > old_timestamp:
                # we've come across the same file but with an older timestamp
                #print "DELETE!", obj.name
                obj.delete()
                break

    # Still here? That means that the file wasn't already in the distribution

    fp = open(filepath, 'rb')  # images are binary files

    # Because the name will always contain a timestamp we set faaar future
    # caching headers. Doesn't matter exactly as long as it's really far future.
    headers = {'Cache-Control':'max-age=315360000, public',
               'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT',
               }

    #print "\t\t\tAWS upload(%s)" % basename
    obj = _cf_distribution.add_object(basename, fp, headers=headers)
    return obj.url()
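
The timestamp comparison in the middle of that function is the fiddly part, so here is the same idea pulled out into a standalone sketch that can be run without boto. The helper names `timestamp_regex` and `compare` are mine and purely illustrative; they are not part of django-static or boto.

```python
import re


def timestamp_regex(basename):
    # 'ctw-screenshot.1242930552.png' -> stem 'ctw-screenshot', extension 'png'
    parts = basename.split('.')
    stem = '.'.join(parts[:-2])
    extension = parts[-1]
    # capture the numeric timestamp sitting between stem and extension
    return re.compile(
        r'%s\.(\d+)\.%s' % (re.escape(stem), re.escape(extension)))


def compare(new_basename, existing_name):
    """Describe an already uploaded name relative to the new file:
    'same', 'older', 'newer', or None if it's a different file."""
    regex = timestamp_regex(new_basename)
    match = regex.findall(existing_name)
    if not match:
        return None
    old_timestamp = int(match[0])
    new_timestamp = int(regex.findall(new_basename)[0])
    if old_timestamp == new_timestamp:
        return 'same'
    elif old_timestamp < new_timestamp:
        return 'older'
    return 'newer'


new_file = 'ctw-screenshot.1242930552.png'
print(compare(new_file, 'ctw-screenshot.1242930552.png'))  # exact copy
print(compare(new_file, 'ctw-screenshot.1200000000.png'))  # stale upload
print(compare(new_file, 'logo.1242930552.png'))            # unrelated file
```

In the upload function, 'same' corresponds to returning the existing URL and 'older' to deleting the stale object before uploading. Note that findall() isn't anchored, so a name that merely ends with the same stem would also match; for a handful of image files that hasn't been a problem for me.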

Moving on, unfortunately this isn't good enough. You see, as soon as you have issued an upload to Amazon Cloudfront you get a full URL for the resource, but if it's a new distribution it will take a little while until the DNS propagates and the URL becomes globally available. Therefore, the URL that you get back will most likely give you a 404 Page not found if you try it immediately.

So to solve this problem I wrote a simple alternative to the Python dict() type that works roughly the same, except that myinstance.get(key) only returns a value once a set amount of time has passed since it was stored. I use 1 hour for Cloudfront; with a 10 second delay it works something like this:

 >>> slow_map = SlowMap(10)
 >>> slow_map['key'] = "Value"
 >>> print slow_map.get('key')
 None
 >>> from time import sleep
 >>> sleep(10)
 >>> print slow_map.get('key')
 Value

And here's the code for that:

 from time import time

 class SlowMap(object):
    """
    >>> slow_map = SlowMap(60)
    >>> slow_map[key] = value
    >>> print slow_map.get(key)
    None

    Then 60 seconds go past:
    >>> slow_map.get(key)
    value

    """

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds

        self.guard = dict()
        self.data = dict()

    def get(self, key, default=None):
        value = self.data.get(key)
        if value is not None:
            return value

        if key not in self.guard:
            # never set, nothing to hold back or release
            return default
        value, expires = self.guard[key]

        if expires < time():
            # good to release
            self.data[key] = value
            del self.guard[key]
            return value
        else:
            # held back
            return default

    def __setitem__(self, key, value):
        self.guard[key] = (value, time() + self.timeout)
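
What's left is to use a SlowMap instead of the plain dict in the proxy function. Here is a sketch of how I'd wire it together. To keep it runnable on its own it carries a condensed copy of SlowMap, and it makes a few assumptions that are not in the code above: the `clock` argument (so the hour can be simulated rather than waited for), the `make_file_proxy` helper, and the fake uploader with its example.cloudfront.net URL are all hypothetical.

```python
from time import time


class SlowMap(object):
    """Condensed copy of the class above. `clock` is an extra, hypothetical
    argument so the delay can be simulated in a test."""

    def __init__(self, timeout_seconds, clock=time):
        self.timeout = timeout_seconds
        self.clock = clock
        self.guard = {}
        self.data = {}

    def get(self, key, default=None):
        if key in self.data:
            return self.data[key]
        if key not in self.guard:
            return default
        value, expires = self.guard[key]
        if expires < self.clock():
            # good to release
            self.data[key] = value
            del self.guard[key]
            return value
        # held back
        return default

    def __setitem__(self, key, value):
        self.guard[key] = (value, self.clock() + self.timeout)


def make_file_proxy(upload, conversion_map):
    # Hypothetical helper: builds the proxy around any uploader and map
    def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
        if filepath and (new or changed):
            if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
                conversion_map[uri] = upload(filepath)
        # until the delay has passed this falls back to the local uri
        return conversion_map.get(uri, uri)
    return file_proxy


now = [1000.0]  # simulated clock, in seconds
slow_map = SlowMap(60 * 60, clock=lambda: now[0])
file_proxy = make_file_proxy(
    lambda filepath: 'http://example.cloudfront.net/logo.123.png', slow_map)

first = file_proxy('/img/logo.123.png', new=True,
                   filepath='/tmp/logo.123.png')
now[0] += 60 * 60 + 1  # pretend a bit more than an hour has gone by
later = file_proxy('/img/logo.123.png')
print(first)
print(later)
```

The first call still answers with the local URI, so visitors are served from Nginx while the DNS settles. After the simulated hour the same URI resolves to the Cloudfront URL.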

With all of that ready, willing and able, you should now be able to serve your images from Amazon Cloudfront simply by doing this in your Django templates:

 {% staticfile "/img/mysprite.gif" %}

To test this I've deployed this technique on Crosstips, my money-making site and guinea pig for new code. Go ahead, visit that site and use Firebug or view the source and check out the URLs used for the images. They look something like this: http://dpv9al5z7o7rq.cloudfront.net/ctw-screenshot.1242930552.png

If you want to look at the code I use for Crosstips, download this file. It's generic enough for anybody who wants to achieve the same thing.

Have fun and happy CDN'ing!

[Screenshot: the wonderful Amazon AWS Console]



Comment

David De Sousa - 9th July 2010
is there a way to upload all the FileFields and the ImageFields to the Amazon Cloudfront using this app? that's one thing I need to do in a near future
Peter Bengtsson - 9th July 2010
There's an app called django-storage which I've used in another project to upload FileFields to Amazon S3. If it doesn't have Cloudfront support yet, that package would be the best place to start.
David De Sousa - 9th July 2010
thanks, I'll look up to it.
Steve Schwarz - 21st July 2010
Nice post and explanation. I just came across django-queued-storage that has a slightly different approach to storing data on Cloudfront: http://github.com/seanbrant/django-queued-storage

Not sure if the celery scheduled task responsible for pushing to Cloudfront provides all the functionality of your SlowMap though.
 