Peterbe.com

A blog and website by Peter Bengtsson


Hosting Django static images with Amazon Cloudfront (CDN) using django-static

09 July 2010 4 comments   Django


About a month ago I added a new feature to django-static that makes it possible to define a function that every file handled by django-static goes through.

First of all a quick recap. django-static is a Django plugin that you use from your templates to reference static media. django-static takes care of giving the file the optimum name for static serving and if applicable compresses the file by trimming all whitespace and what not. For more info, see The awesomest way possible to serve your static stuff in Django with Nginx

The new, popular, kid on the block for CDN (Content Delivery Network) is Amazon Cloudfront. It's a service sitting on top of the already proven Amazon S3 service which is a cloud file storage solution. What a CDN does is that it registers a domain for your resources such that, with some DNS tricks, users of this resource URL download it from the geographically nearest server. So if you live in Sweden you might download myholiday.jpg from a server in Frankfurt and if you live in North Carolina, USA you might download the very same picture from Virginia, USA. That ensures that the distance to the resource is minimized. If you're not convinced or sure about how CDNs work check out THE best practice guide for faster webpages by Steve Souders (it's number two).

A disadvantage with Amazon Cloudfront is that it's unable to negotiate with the client to compress downloadable resources with GZIP. GZIPping a resource is considered a bigger optimization win than using a CDN. So, I continue to serve my static CSS and Javascript files from my Nginx but put all the images on Amazon Cloudfront. How to do this with django-static? Easy: add this to your settings:

DJANGO_STATIC = True
...other DJANGO_STATIC_... settings...
# equivalent of 'from cloudfront import file_proxy' in this PYTHONPATH
DJANGO_STATIC_FILE_PROXY = 'cloudfront.file_proxy'

Then you need to write the function that gets a chance to do something with every static resource that django-static prepares. Here's a naive first version:

# in cloudfront.py

conversion_map = {} # global variable
def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg','gif','png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)

The files are only sent through the function _upload_to_cloudfront() the first time they're "massaged" by django-static. On subsequent calls nothing is done to the file, since django-static remembers, and sticks to, the way it dealt with it the first time. Basically, the first time a template is rendered after you have restarted your Django server, the file is prepared and its timestamp is checked; on later renderings, to save time, django-static doesn't check the file again and just passes through the resulting file name. If this is all confusing you can start with a much simpler proxy function that looks like this:

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    print "Debugging and learning"
    print uri
    print "New", new,
    print "Filepath", filepath,
    print "Changed", changed,
    print "Other arguments:", kwargs
    return uri
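If you want to see the "only uploaded the first time" behaviour without touching AWS at all, here's a minimal sketch of the naive proxy with a stub uploader. The fake_upload function, the example.cloudfront.net domain and the file paths are all made up for this illustration; only the file_proxy signature comes from the code above.

```python
# Stand-in for _upload_to_cloudfront(); it only records what it was asked
# to upload and returns a made-up CDN URL (hypothetical domain).
uploads = []

def fake_upload(filepath):
    uploads.append(filepath)
    return 'http://example.cloudfront.net/' + filepath.split('/')[-1]

conversion_map = {}  # uri -> CDN URL

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    # Only new or changed image files are sent to the uploader
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = fake_upload(filepath)
    # Everything else falls back to the URI django-static prepared
    return conversion_map.get(uri, uri)

# First rendering: the file is new, so it gets "uploaded"
first = file_proxy('/img/logo.123.png', new=True,
                   filepath='/tmp/cache/img/logo.123.png')
# Later renderings: no filepath is needed, the mapping is reused
second = file_proxy('/img/logo.123.png')
# Non-images are passed through untouched
css = file_proxy('/css/screen.123.css', new=True,
                 filepath='/tmp/cache/css/screen.123.css')
```

The image ends up with the CDN URL on both calls but is only uploaded once, and the CSS file keeps its local URL.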

The function to upload to Amazon Cloudfront is pretty straightforward thanks to the boto project. Here's my version:

import os
import re
from django.conf import settings
import boto

_cf_connection = None
_cf_distribution = None

def _upload_to_cloudfront(filepath):
   global _cf_connection
   global _cf_distribution

   if _cf_connection is None:
       _cf_connection = boto.connect_cloudfront(settings.AWS_ACCESS_KEY,
                                                settings.AWS_ACCESS_SECRET)

   if _cf_distribution is None:
       _cf_distribution = _cf_connection.create_distribution(
           origin='%s.s3.amazonaws.com' % settings.AWS_STORAGE_BUCKET_NAME,
           enabled=True,
           comment=settings.AWS_CLOUDFRONT_DISTRIBUTION_COMMENT)

   # now we can delete any old versions of the same file that have the
   # same name but a different timestamp
   basename = os.path.basename(filepath)
   object_regex = re.compile('%s\.(\d+)\.%s' % \
       (re.escape('.'.join(basename.split('.')[:-2])),
        re.escape(basename.split('.')[-1])))
   for obj in _cf_distribution.get_objects():
       match = object_regex.findall(obj.name)
       if match:
           old_timestamp = int(match[0])
           new_timestamp = int(object_regex.findall(basename)[0])
           if new_timestamp == old_timestamp:
               # an exact copy already exists
               return obj.url()
           elif new_timestamp > old_timestamp:
               # we've come across the same file but with an older timestamp
               #print "DELETE!", obj.name
               obj.delete()
               break

   # Still here? That means that the file wasn't already in the distribution

   fp = open(filepath, 'rb')  # images are binary data

   # Because the name will always contain a timestamp we set faaar future
   # caching headers. Doesn't matter exactly as long as it's really far future.
   headers = {'Cache-Control':'max-age=315360000, public',
              'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT',
              }

   #print "\t\t\tAWS upload(%s)" % basename
   obj = _cf_distribution.add_object(basename, fp, headers=headers)
   return obj.url()
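The regular expression in the middle of that function is the least obvious part, so here's a small standalone sketch of what it does. It builds the same pattern from a file name of the form name.timestamp.ext and shows which other object names it would match. The helper name timestamp_regex and the sample file names are made up for this illustration.

```python
import os
import re

def timestamp_regex(basename):
    # 'ctw-screenshot.1242930552.png' -> name part and extension, with the
    # timestamp in the middle replaced by a (\d+) capture group
    parts = basename.split('.')
    return re.compile(r'%s\.(\d+)\.%s' % (
        re.escape('.'.join(parts[:-2])),
        re.escape(parts[-1]),
    ))

basename = os.path.basename(
    '/tmp/cache-forever/img/ctw-screenshot.1242930552.png')
regex = timestamp_regex(basename)

# The file's own timestamp
own = regex.findall(basename)
# An older upload of the same image: same name, different timestamp
older = regex.findall('ctw-screenshot.1242930000.png')
# A different image never matches
other = regex.findall('logo.1242930552.png')
```

Comparing int(older[0]) with int(own[0]) is what lets the upload function decide whether an object in the distribution is an exact copy or a stale version to delete.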

Moving on, unfortunately this isn't good enough. You see, from the time you have issued an upload to Amazon Cloudfront you immediately get a full URL for the resource but if it's a new distribution it will take a little while until the DNS propagates and becomes globally available. Therefore, the URL that you get back will most likely yield you a 404 Page not found if you try it immediately.

So to solve this problem I wrote a simple alternative to the Python dict type that works roughly the same, except that myinstance.get(key) only returns the value once a set amount of time has passed since it was set (1 hour in my case). It works something like this:

>>> slow_map = SlowMap(10)
>>> slow_map['key'] = "Value"
>>> print slow_map.get('key')
None
>>> from time import sleep
>>> sleep(10)
>>> print slow_map.get('key')
Value

And here's the code for that:

from time import time

class SlowMap(object):
   """
   >>> slow_map = SlowMap(60)
   >>> slow_map[key] = value
   >>> print slow_map.get(key)
   None

   Then 60 seconds goes past:
   >>> slow_map.get(key)
   value

   """
   def __init__(self, timeout_seconds):
       self.timeout = timeout_seconds

       self.guard = dict()
       self.data = dict()

   def get(self, key, default=None):
       value = self.data.get(key)
       if value is not None:
           return value

       if key not in self.guard:
           # never set, nothing to hold back or release
           return default

       value, expires = self.guard[key]

       if expires < time():
           # good to release
           self.data[key] = value
           del self.guard[key]
           return value
       else:
           # held back
           return default

   def __setitem__(self, key, value):
       self.guard[key] = (value, time() + self.timeout)
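To tie the pieces together, here's a rough sketch of how the proxy function could use a SlowMap instead of a plain dict, so that the Cloudfront URL is only handed out once the hour has passed. To make it possible to try without waiting, this version of SlowMap takes a clock argument, which is an addition for this sketch and not part of the class above. The upload is replaced by a made-up URL.

```python
import time

class SlowMap(object):
    """Compact restatement of the class above. The `clock` argument is an
    addition for this sketch so that time can be faked."""
    def __init__(self, timeout_seconds, clock=time.time):
        self.timeout = timeout_seconds
        self.clock = clock
        self.guard = {}
        self.data = {}

    def get(self, key, default=None):
        if key in self.data:
            return self.data[key]
        if key not in self.guard:
            return default
        value, expires = self.guard[key]
        if expires < self.clock():
            # good to release
            self.data[key] = value
            del self.guard[key]
            return value
        # held back
        return default

    def __setitem__(self, key, value):
        self.guard[key] = (value, self.clock() + self.timeout)

now = [1000.0]  # fake "current time" in seconds
conversion_map = SlowMap(60 * 60, clock=lambda: now[0])

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            # placeholder for _upload_to_cloudfront(filepath)
            conversion_map[uri] = ('http://example.cloudfront.net/' +
                                   filepath.split('/')[-1])
    return conversion_map.get(uri, uri)

before = file_proxy('/img/a.1.png', new=True, filepath='/tmp/img/a.1.png')
now[0] += 60 * 60 + 1  # a bit more than an hour later
after = file_proxy('/img/a.1.png')
```

Until the hour is up the template keeps getting the local URL served by Nginx, and after that it switches to the CDN URL.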

With all of that ready willing and able you should now be able to serve your images from Amazon Cloudfront simply by doing this in your Django templates:

{% staticfile "/img/mysprite.gif" %}

To test this I've deployed this technique on my money-making site and code guinea pig, Crosstips. Go ahead, visit that site and use Firebug or view the source and check out the URLs used for the images. They look something like this: http://dpv9al5z7o7rq.cloudfront.net/ctw-screenshot.1242930552.png

If you want to look at my code used for Crosstips download this file. It's pretty generic to anybody who wants to achieve the same thing.

Have fun and happy CDN'ing!

[Screenshot: the wonderful Amazon AWS Console]

Correction: running Django tests with MongoDB is NOT slow

30 May 2010 1 comment   Django, MongoDB


At Euro DjangoCon I met lots of people and talked a lot about MongoDB as the backend. I even did a presentation on the subject which led to a lot of people asking me more questions about MongoDB.

I did mention to some people that one of the drawbacks of using MongoDB, which doesn't have transactions, is that you have to create and destroy the collections (like SQL tables) for every single test run. I thought this was slow. It's not.

Today I've been doing some more profiling and testing and debugging and I can conclude that it's not a problem. Creating the database has a slight delay but it's something you only have to do once and actually it's very fast. Here's how I tear down the collections in between each test:

class BaseTest(TestCase):

   def tearDown(self):
       for name in self.database.collection_names():
           if name not in ('system.indexes',):
               self.database.drop_collection(name)
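If you don't have a MongoDB at hand, the pattern can still be tried out with a stand-in. In this sketch FakeDatabase is a made-up in-memory class that only mimics the two methods used in the tearDown above (collection_names() and drop_collection()), and TalkTest is a hypothetical test case.

```python
import unittest

class FakeDatabase(object):
    """In-memory stand-in for a MongoDB database (hypothetical, it only
    mimics the two methods the tearDown uses)."""
    def __init__(self):
        self.collections = {'system.indexes': []}

    def collection_names(self):
        return list(self.collections)

    def drop_collection(self, name):
        del self.collections[name]

database = FakeDatabase()

class BaseTest(unittest.TestCase):
    database = database

    def tearDown(self):
        # same loop as above: drop everything except the system collection
        for name in self.database.collection_names():
            if name not in ('system.indexes',):
                self.database.drop_collection(name)

class TalkTest(BaseTest):
    def test_insert(self):
        self.database.collections['talks'] = [{'title': 'MongoDB'}]
        self.assertEqual(len(self.database.collections['talks']), 1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TalkTest)
result = unittest.TestResult()
suite.run(result)
remaining = database.collection_names()
```

After the run only 'system.indexes' is left, so the next test starts from a clean slate.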

For example, running the tests of one of my apps looks like this:

$ ./manage.py test myapp
...........lots.............
----------------------------------------------------------------------
Ran 55 tests in 3.024s

So, don't fear writing lots of individual unit tests. MongoDB will not slow you down.

"Using MongoDB in your Django app - implications and benefits"

25 May 2010 7 comments   Django

http://www.peterbe.com/plog/using-mongodb-in-your-django-app/django-mongodb-html5-slides/html5.html


Straight from DjangoCon 2010 here in Berlin. Slides from my talk on "Using MongoDB in your Django app - implications and benefits" are available as an HTML5 web page so you'll need one of those fancy browsers like Chrome to be able to view it. Sorry.

mongoengine vs. django-mongokit

24 May 2010 3 comments   Python, Django


django-mongokit is the project you want to use if you want to connect your Django project to your MongoDB database via the pymongo Python wrapper. An alternative (dare I say competing alternative) is MongoEngine, which is a bridge between Django and pymongo directly. The immediate difference you notice is the syntax. django-mongokit looks like MongoKit syntax and MongoEngine looks like the Django ORM. They both accomplish pretty much the same thing. So, which one is fastest?

First of all, remember this post, where I showed how django-mongokit sped past the SQL ORM like a lightning bullet? Well, it appears MongoEngine is even faster.

[Chart: mongoengine vs. django-mongokit]

That's an average of 23% faster for all three operations!

Review: Django 1.1 Testing and Debugging

20 May 2010 0 comments   Django

http://www.packtpub.com/django-1-1-testing-and-debugging/book


The lovely people of Packt Publishing asked me to review Karen Tracey's latest book Django 1.1 Testing and Debugging.

I didn't actually read the book but rather skimmed it, apart from some selected parts and from what I read it's obvious that Karen has an ability to write to people who are not experts on the subject. Years of being a top contributor on the Django users mailing list must have something to do with it.

But here's the cracker. I didn't learn anything from this book (actually, I wasn't aware of the pp command in the pdb debugger). Is that a complaint about the book? No! It just means that the book was aimed at beginners and apparently I'm not a beginner any more. Great!

One thing I would have liked to see is more about testing strategy since this is something beginners often have problems with. I don't know if there even is such a term as "testing strategy" but I'm referring to the thinking behind what to test and, more importantly sometimes, what not to test. Beginners have a tendency to write tests for the most specific things and thus spend all their time assuring the most unrealistic scenarios are covered. Also, a lot of beginner tests I see check basic things like types which the semi-compiler will just automatically cover for you. Perhaps for a beginner, just getting some tests up and running is a big step forward.

I'm a little bit disappointed that my lovely gorun wasn't mentioned in the book :) Perhaps the next version Karen?

Who was logged in during a Django exception

15 April 2010 5 comments   Django


For lack of a fancier solution here's how I solved the problem of knowing who was logged in when an error occurred. I'm building an intranet-like system for a close group of people and if an error occurs I get an email that reminds me to add more tests. So I fix the bugs and upgrade the server. But I often want to know what poor sucker was logged in at the time the exception happened so that I can email them and say something like "Hi! I noticed you stumbled across a bug. My bad. Just wanted to let you know I've fixed that now"

So to do this I installed a silly little piece of middleware:

import logging
from django.conf import settings
class ExceptionExtraMiddleware(object):
   def process_exception(self, request, exception):
       if settings.DEBUG:
           return
       try:
           logged_in_info = ''
           if request.user and request.user.is_authenticated():
               logged_in_info = "%s" % request.user
               if request.user.email:
                   logged_in_info += ' %s' % request.user.email
               if request.user.first_name or request.user.last_name:
                   logged_in_info += ' (%s %s)' % \
                     (request.user.first_name, request.user.last_name)
           if logged_in_info:
               request.META['ERROR-X-LOGGED-IN'] = logged_in_info
       except:
           # don't make matters worse in these sensitive times
           logging.debug("Unable to debug who was logged in", exc_info=True)

This means that when I get an email with the traceback and snapshot of the request object I get this included:

...
'ERROR-X-LOGGED-IN': u'anita (Anita Test)',
...
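The string building inside that middleware is easy to try on its own. Here's a small sketch that pulls it out into a function and feeds it a fake user. Both describe_user and FakeUser are made up for this illustration; the fake user has an is_authenticated() method because that's what the middleware above calls.

```python
class FakeUser(object):
    """Minimal stand-in for a Django user (hypothetical)."""
    def __init__(self, username, email='', first_name='', last_name='',
                 authenticated=True):
        self.username = username
        self.email = email
        self.first_name = first_name
        self.last_name = last_name
        self._authenticated = authenticated

    def is_authenticated(self):
        return self._authenticated

    def __str__(self):
        return self.username

def describe_user(user):
    # Same logic as in the middleware, returning '' for anonymous users
    if not (user and user.is_authenticated()):
        return ''
    info = '%s' % user
    if user.email:
        info += ' %s' % user.email
    if user.first_name or user.last_name:
        info += ' (%s %s)' % (user.first_name, user.last_name)
    return info

anita = describe_user(FakeUser('anita', first_name='Anita',
                               last_name='Test'))
with_email = describe_user(FakeUser('bob', email='bob@example.com'))
anonymous = describe_user(FakeUser('', authenticated=False))
```

The first call gives the same 'anita (Anita Test)' as in the email snippet above, and an anonymous user gives an empty string so nothing gets added to request.META.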

UPDATE

The code above had a bug in it. Doing an if on request.user will return true even if there is no logged in user. The safest thing is to change it to:

if request.user and request.user.is_authenticated():

fcgi vs. gunicorn vs. uWSGI

09 April 2010 30 comments   Python, Django, Linux


uwsgi is the latest and greatest WSGI server and promises to be the fastest possible way to run Nginx + Django. Proof here. But! Is it that simple? Especially if you're involving Django herself.

So I set out to benchmark good old threaded fcgi and gunicorn and then, with a source compiled nginx with the uwsgi module baked in, I also benchmarked uwsgi. The first mistake I made was testing a Django view that was using sessions and other crap. I profiled the view to make sure it wouldn't be the bottleneck as it appeared to take only 0.02 seconds each. However, with fcgi, gunicorn and uwsgi I kept being stuck on about 50 requests per second. Why? 1/0.02 = 50.0!!! Clearly the slowness of the Django view was the bottleneck (for the curious, what took all of 0.02 was the need to create new session keys and putting them into the database).
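The arithmetic behind that ceiling fits in a couple of lines. This is just a sketch of the reasoning, assuming the requests effectively end up being handled one after the other; the function name is made up.

```python
def max_requests_per_second(seconds_per_request):
    # If every request occupies the server for this long and requests are
    # effectively handled one at a time, this is the upper limit
    return 1.0 / seconds_per_request

ceiling = max_requests_per_second(0.02)
# How long 1000 requests would take at that rate, in seconds
seconds_for_1000 = 1000 / ceiling
```

So no matter how fast the server in front is, a view that takes 0.02 seconds caps this kind of benchmark at about 50 requests per second.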

So I wrote a really dumb Django view with no sessions middleware enabled. Now we're getting some interesting numbers:

fcgi (threaded)              640 r/s
fcgi (prefork 4 processors)  240 r/s (*)
gunicorn (2 workers)         1100 r/s
gunicorn (5 workers)         1300 r/s
gunicorn (10 workers)        1200 r/s (?!?)
uwsgi (2 workers)            1800 r/s
uwsgi (5 workers)            2100 r/s
uwsgi (10 workers)           2300 r/s

(* this made my computer exceptionally sluggish as CPU went through the roof)

If you're wondering why the numbers appear to be rounded it's because I ran the benchmark multiple times and guesstimated an average (also obviously excluded the first run).

Misc notes

Conclusion

gunicorn is the winner in my eyes. It's easy to configure and get up and running and certainly fast enough, and I don't have to worry about stray threads being created willy nilly like with threaded fcgi. uwsgi is definitely worth coming back to the day I need to squeeze out a few more requests per second but right now it just feels too inconvenient as I can't convince my sys admins to maintain compiled versions of nginx for the little extra benefit.

Having said that, the day uwsgi becomes available as a Debian package I'm all over it like a dog on an ass-flavored cookie.

And the "killer benefit" with gunicorn is that I can predict the memory usage. I found, on my laptop: 1 worker = 23Mb, 5 workers = 82Mb, 10 workers = 155Mb and these numbers stayed like that very predictably which means I can decide quite accurately how much RAM I should let Django (ab)use.
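To show what "predict" can look like in practice, here's a small sketch that draws a straight line through the first and last of those three measurements and uses it to estimate the memory for any number of workers. The linear model is my simplification for this example and the function names are made up.

```python
# workers -> megabytes, the three measurements from my laptop
measurements = {1: 23, 5: 82, 10: 155}

def per_worker_mb(measured):
    # Slope of a straight line through the smallest and largest setup
    low, high = min(measured), max(measured)
    return (measured[high] - measured[low]) / float(high - low)

slope = per_worker_mb(measurements)
base = measurements[1] - slope  # what's left when the workers are removed

def estimate_mb(workers):
    return base + slope * workers

estimate_for_5 = estimate_mb(5)
estimate_for_20 = estimate_mb(20)
```

Each extra worker costs a little under 15 megabytes, and the estimate for 5 workers lands within a megabyte of the 82 I actually measured.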

UPDATE:

Since this was published we, in my company, have changed all Djangos to run over uWSGI. It's proven faster than any alternatives and extremely stable. We actually started using it before it was merged into core Nginx but considering how important this is and how many sites we have it's not been a problem to run our own Nginx package.

Hail uWSGI!

Voila! Now feel free to flame away about the inaccuracies and what multitude of more wheels and knobs I could/should twist to get even more juice out.

The awesomest way possible to serve your static stuff in Django with Nginx

24 March 2010 19 comments   Django

http://github.com/peterbe/django-static


I'm the proud creator of django-static which is a Django app that takes care of how you serve your static media the best way possible. Although some of these things are subjective, generally this is the ideal checklist for serving your static media:

  1. Cache headers must be set to infinity
  2. URLs must be unique so that browsers never have to depend on refreshing
  3. The developer (who decided which media to include) should not have to worry himself with deployment
  4. The developer/artist (who makes the media) should not have to worry himself with deployment
  5. All Javascript and CSS must be whitespace optimized in a safe way and served with Gzip
  6. All images referenced inside CSS should be taken care of too
  7. It must be possible to combine multiple resources of Javascript or CSS into one
  8. It must be possible to easily test production deployment in development environment without too much effort
  9. A sysadmin shouldn't have to understand a developer's Django application
  10. A development environment must be unhindered by this optimization
  11. Processing overhead must be kept to a minimum
  12. Must be possible to easily say which resources can be whitespace optimized and which can not

So let's get started setting all of this up in your Django + Nginx environment. Let's start with the Django side of things.

Download and install django-static by first running something like easy_install django-static, then add django_static to INSTALLED_APPS in your settings.py and add this to enable django-static:

DJANGO_STATIC = True

Then edit your base.html template from this:

<html>
<link rel="stylesheet" href="/css/screen.css">
<link rel="stylesheet" href="/css/typo.css">
<link rel="stylesheet" href="/css/print.css" media="print">
<body>
<img src="/img/logo.png" alt="Logo">
{% block body %}{% endblock %}
<script type="text/javascript" src="/js/site.js"></script>
<script type="text/javascript">
window.onload = function() {
   dostuff();
};
</script>
</body>
</html>

To this new optimized version:

{% load django_static %}
<html>
{% slimall %}
<link rel="stylesheet" href="/css/screen.css">
<link rel="stylesheet" href="/css/typo.css">
<link rel="stylesheet" href="/css/print.css" media="print">
{% endslimall %}
<body>
<img src="{% staticfile "/img/logo.png" %}" alt="Logo">
{% block body %}{% endblock %}
<script type="text/javascript" src="{% slimfile "/js/site.js" %}"></script>
<script type="text/javascript">
{% slimcontent %}
window.onload = function() {
   dostuff();
};
{% endslimcontent %}
</script>
</body>
</html>

django_static when loaded offers you the following tags:

  1. staticfile <filename>
  2. slimfile <filename>
  3. slimcontent ... endslimcontent
  4. staticall ... endstaticall
  5. slimall ... endslimall

All the tags with the word slim are copies of their equivalents without it, except that on the way to publication they also attempt to whitespace optimize the content. Now, rendering this, what do you get? It will look something like this if you view the rendered source:

<html>
<link rel="stylesheet" href="/css/screen_typo.1269174558.css">
<link rel="stylesheet" href="/css/print.1269178381.css" media="print">
<body>
<img src="/img/logo.1269170122.png" alt="Logo">
[[[ MAIN CONTENT SNIPPED ]]]
<script type="text/javascript" src="/js/site.1269198161.js"></script>
<script type="text/javascript">
window.onload=function(){dostuff()};
</script>
</body>
</html>

As you can see timestamps are put into the URLs. These timestamps are the modification time of the files which means that you never run the risk of serving an old file by an already used name.
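To make the naming idea concrete, here's a small sketch of how a file name can be stamped with the file's modification time. This is not the actual code inside django-static, just an illustration of the technique; the function name timestamped_name is made up.

```python
import os
import tempfile

def timestamped_name(filepath):
    # 'logo.png' with mtime 1269170122 -> 'logo.1269170122.png'
    root, ext = os.path.splitext(os.path.basename(filepath))
    mtime = int(os.stat(filepath).st_mtime)
    return '%s.%d%s' % (root, mtime, ext)

# Create a throwaway file and pin its modification time so the result is
# predictable
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'logo.png')
with open(path, 'wb') as f:
    f.write(b'fake image bytes')
os.utime(path, (1269170122, 1269170122))

name = timestamped_name(path)

# "Editing" the file later changes the mtime and therefore the name
os.utime(path, (1269179999, 1269179999))
new_name = timestamped_name(path)
```

Because the name changes whenever the file changes, a browser that cached the old name forever simply never asks for it again.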

The next step is to wire this up in your Nginx. Here is the relevant configuration:

location ^~ /css/  {
    root /var/mydjangosite/media;
    expires max;
    access_log off;
}
location ^~ /js/  {
    root /var/mydjangosite/media;
    expires max;
    access_log off;
}
location ^~ /img/  {
    root /var/mydjangosite/media;
    expires max;
    access_log off;
}

That wasn't particularly pretty. Besides, as we haven't done any configuration yet, this means that files like print.1269178381.css have been created inside your media files directory. Since these files are sooner or later going to be obsolete and they're never going to get included in your source control we probably want to put them somewhere else. Add this setting to your 'settings.py':

DJANGO_STATIC_SAVE_PREFIX = '/tmp/cache-forever'

That means that all the whitespace optimized files are put in this place instead. And the files that aren't whitespace optimized have symlinks into this directory.

The next problem with the Nginx config lines is that we're repeating ourselves for each prefix. Let's instead set a general prefix with this config:

DJANGO_STATIC_NAME_PREFIX = '/cache-forever'

And with that in place you can change your Nginx config to this:

location ^~ /cache-forever/  {
    root /tmp;
    expires max;
    access_log off;
}
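It's worth checking that the two settings and the Nginx root line up, since all three have to agree for a file to be found. Here's a sketch of that check. Nginx's root directive simply appends the request URI to the root directory; how django-static lays out sub-directories below the save prefix is an assumption in this sketch, and the three helper functions are made up.

```python
import posixpath

DJANGO_STATIC_SAVE_PREFIX = '/tmp/cache-forever'
DJANGO_STATIC_NAME_PREFIX = '/cache-forever'
NGINX_ROOT = '/tmp'  # the 'root' of the location block above

def public_url(path):
    # What ends up in the rendered HTML
    return DJANGO_STATIC_NAME_PREFIX + path

def saved_file(path):
    # Where the prepared file is assumed to be written on disk
    return posixpath.join(DJANGO_STATIC_SAVE_PREFIX, path.lstrip('/'))

def nginx_file(uri):
    # Nginx 'root': the full request URI is appended to the root directory
    return posixpath.join(NGINX_ROOT, uri.lstrip('/'))

path = '/css/print.1269178381.css'
url = public_url(path)
on_disk = saved_file(path)
served_from = nginx_file(url)
```

With these values the URL /cache-forever/css/print.1269178381.css is looked up by Nginx in exactly the place where the file was saved.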

django-static is wired up to depend on slimmer if available but you can use different ones, namely Yahoo! YUI Compressor and Google Closure Tools. So, let's use YUI Compressor for whitespace optimizing the CSS and Google Closure for whitespace optimizing the Javascript. Add this to your 'settings.py':

DJANGO_STATIC_CLOSURE_COMPILER = '/var/lib/stuff/compiler.jar'
DJANGO_STATIC_YUI_COMPRESSOR = '/var/lib/stuff/yuicompressor-2.4.2.jar'

Now we get the best possible whitespace optimization and a really neat Nginx configuration. Lastly (and this is optional) we might want to serve the static media from a different domain name, as the browser won't download more than two resources at a time from the same domain. Or even better, you might have a dedicated domain name that never accepts or sends any cookie headers (for example, Yahoo! uses yimg.com). This is accomplished with this setting:

DJANGO_STATIC_MEDIA_URL = 'http://static.peterbe.com' # no trailing slash

Now you're ready to go! Every single item on the big list above is checked. With a few easy steps and some modifications to your templates you can get the simplest yet best performing setup for your static media. As an example, study the static media URLs and headers of crosstips.org.

Some people might prefer to use a remote CDN to host the static media. This is something django-static is currently not able to do but I'm more than happy to accept patches and ideas from people who want to use it in production and who are eager to help. Everything else still applies. We would just need a callback function that can handle the network copy.

UPDATE

At the time of writing, version 1.3.7 has 93% test coverage. The number of lines of tests is double that of the actual code itself. Code: 661 lines. Tests: 1358 lines.

Speed test between django_mongokit and postgresql_psycopg2

09 March 2010 15 comments   Python, Django

http://github.com/peterbe/django-mongokit


Following on from yesterday's blog about How and why to use django-mongokit I extended the exampleproject which is inside the django-mongokit project with another app called exampleapp_sql which does the same thing as the exampleapp but does it with SQL instead. Then I added a very simple benchmarker app in the same project and wrote three functions:

  1. One to create 10/100/500/1000 instances of my class
  2. One to edit one field of all 10/100/500/1000 instances
  3. One to delete each of the 10/100/500/1000 instances

[Chart: speed test between django_mongokit and postgresql_psycopg2]

The results can speak for themselves:

# 10
mongokit django_mongokit.mongodb
Creating 10 talks took 0.0108649730682 seconds
Editing 10 talks took 0.0238521099091 seconds
Deleting 10 talks took 0.0241661071777 seconds
IN TOTAL 0.058883190155 seconds

sql django.db.backends.postgresql_psycopg2
Creating 10 talks took 0.0994439125061 seconds
Editing 10 talks took 0.088721036911 seconds
Deleting 10 talks took 0.0888710021973 seconds
IN TOTAL 0.277035951614 seconds

# 100
mongokit django_mongokit.mongodb
Creating 100 talks took 0.114995002747 seconds
Editing 100 talks took 0.181537866592 seconds
Deleting 100 talks took 0.13414812088 seconds
IN TOTAL 0.430680990219 seconds

sql django.db.backends.postgresql_psycopg2
Creating 100 talks took 0.856637954712 seconds
Editing 100 talks took 1.16229200363 seconds
Deleting 100 talks took 0.879518032074 seconds
IN TOTAL 2.89844799042 seconds

# 500
mongokit django_mongokit.mongodb
Creating 500 talks took 0.505300998688 seconds
Editing 500 talks took 0.809900999069 seconds
Deleting 500 talks took 0.65673494339 seconds
IN TOTAL 1.97193694115 seconds

sql django.db.backends.postgresql_psycopg2
Creating 500 talks took 4.4399368763 seconds
Editing 500 talks took 5.72280597687 seconds
Deleting 500 talks took 4.34039878845 seconds
IN TOTAL 14.5031416416 seconds

# 1000
mongokit django_mongokit.mongodb
Creating 1000 talks took 0.957674026489 seconds
Editing 1000 talks took 1.60552191734 seconds
Deleting 1000 talks took 1.28869891167 seconds
IN TOTAL 3.8518948555 seconds

sql django.db.backends.postgresql_psycopg2
Creating 1000 talks took 8.57405209541 seconds
Editing 1000 talks took 14.8357069492 seconds
Deleting 1000 talks took 11.9729249477 seconds
IN TOTAL 35.3826839924 seconds

On average, MongoDB is 7 times faster.
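For the curious, here's how that number can be worked out from the totals above: divide the SQL total by the MongoDB total for each batch size and take the average of the four ratios.

```python
# batch size -> (MongoKit total, SQL total) in seconds, copied from above
totals = {
    10: (0.058883190155, 0.277035951614),
    100: (0.430680990219, 2.89844799042),
    500: (1.97193694115, 14.5031416416),
    1000: (3.8518948555, 35.3826839924),
}

# How many times faster MongoDB was for each batch size
ratios = dict((size, sql / mongo) for size, (mongo, sql) in totals.items())
average = sum(ratios.values()) / len(ratios)

for size in sorted(ratios):
    print('%4d talks: %.1f times faster' % (size, ratios[size]))
print('On average: %.1f times faster' % average)
```

Note how the gap widens with the batch size, from a bit under 5 times for 10 talks to a bit over 9 times for 1000.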

All in all it doesn't really mean that much. We expect MongoDB to be faster than PostgreSQL because what it lacks in features it makes up for in speed. It's interesting to see it in action and nice to see that MongoKit is fast enough to benefit from the database's speed.

As always with benchmarks: Lies, lies and more damn lies! This doesn't really compare apples for apples but hopefully with django-mongokit the comparison is becoming more fair. Also, you're free to fork the project on github and do your optimizations and re-run the tests yourself.

How and why to use django-mongokit (aka. Django to MongoDB)

08 March 2010 9 comments   Python, Django

http://github.com/peterbe/django-mongokit


Here I'm going to explain how to combine Django and MongoDB using MongoKit and django-mongokit.

MongoDB is a document store built for high speed and high concurrency with a very good redundancy story. It's an alternative to the relational databases (e.g. MySQL) that Django's ORM (Object Relational Mapping) is tightly coupled with. The equivalent for a document store is called an ODM (Object Document Mapping), for lack of a better acronym. That's where MongoKit comes in. It's written in Python, it connects to the MongoDB database using a library called pymongo, and it turns data from MongoDB into instances of classes you have defined. MongoKit has nothing to do with Django. That's where django-mongokit comes in. Written by yours truly.

So we start by defining a MongoKit subclass:

import datetime
from mongokit import Document

class Computer(Document):

    structure = {
      'make': unicode,
      'model': unicode,
      'purchase_date': datetime.datetime,
      'cpu_ghz': float,
    }

    validators = {
      'cpu_ghz': lambda x: x > 0,
      'make': lambda x: x.strip(),
    }

    default_values = {
      'purchase_date': datetime.datetime.utcnow,
    }

    use_dot_notation = True

    indexes = [
      {'fields': ['make']},
    ]

All of these class attributes are features of MongoKit. Their names are so obvious that they need no explanation, except perhaps 'use_dot_notation': it makes it possible to access data in the structure with a dot on the instance rather than the normal dictionary lookup method. Now let's work with this class on the shell. Important: to actually try this you need to have MongoDB and pymongo installed, and MongoDB up and running:

>>> from mongokit import Connection
>>> conn = Connection()
>>> from mymodels import Computer
>>> conn.register([Computer])
>>> database = conn.mydb # will be created if it didn't exist
>>> collection = database.mycollection # equivalent of a SQL table
>>> instance = collection.Computer()
>>> instance.make = u"Apple"
>>> instance.model = u"G5"
>>> instance.cpu_hrz = 2.66  # typo for cpu_ghz, hence 'cpu_ghz': None in the output
>>> instance.save()
>>>
>>> type(instance)
<class 'mymodels.Computer'>
>>> instance
{'model': u'G5', 'make': u'Apple', '_id':
ObjectId('4b9244989d40b334b4000000'), 'cpu_ghz': None,
'purchase_date': datetime.datetime(2010, 3, 6, 12, 3, 8, 281905)}
>>>

As you can see it's pretty easy to work with and it just feels so pythonic and obvious. What you get is something that works just like a normal base class with some extra sugar, plus the fact that it can save the data persistently and does so efficiently and redundantly (assuming you do some work on your MongoDB to set it up with replication and/or sharding). Now let's look at retrieval which, as per the design principles of MongoKit, follows the basic interface of pymongo. To learn about querying you can skim the MongoKit documentation but really the thing to read is the pymongo documentation which MongoKit layers thinly:

>>> from mongokit import Connection
>>> conn = Connection()
>>> from mymodels import Computer
>>> conn.register([Computer])
>>> database = conn.mydb
>>> collection = database.mycollection
>>> instances = collection.Computer.find()
>>> type(instances)
<class 'mongokit.generators.MongoDocumentCursor'>
>>> list(instances)[0]
{u'cpu_ghz': None, u'model': u'G5', u'_id':
ObjectId('4b9244989d40b334b4000000'), u'purchase_date':
datetime.datetime(2010, 3, 6, 12, 3, 8, 281000), u'make': u'Apple'}
>>> collection.Computer.find().count()
1
>>> collection.Computer.one() == list(collection.Computer.find())[0]
True

The query methods one() and find() can take search parameters that limit what you get back. They are quite similar to the objects.get() and objects.filter() methods of Django's default Manager, which should make you feel at home.

So, what would it take to be able to do this MongoKit business in a running Django so that you can write Django views and templates that interface with your Mongo "documents"? Answer: use django-mongokit. django-mongokit is a thin wrapper around MongoKit that makes it just slightly more convenient to use MongoKit in a Django environment. The primary tasks django-mongokit takes care of are: (1) the connection and (2) giving your classes a _meta class attribute. Especially important regarding the connection is that django-mongokit takes care of setting up and destroying a test database for you when running your tests. And since it's all in one place you don't have to worry about creating various connections to MongoKit in your views or management commands. Let's first define the database in your settings.py file:

DATABASES = {
    'default': {
        'ENGINE': 'sqlite3',
        'NAME': 'example-sqlite3.db',
    },
    'mongodb': {
        'ENGINE': 'django_mongokit.mongodb',
        'NAME': 'mydb',
    },
}

Then, with that in place all you need to get a connection are these lines:

>>> from django_mongokit import get_database
>>> database = get_database()

The reason it's a function and not an instance is that the database is going to be different depending on whether you're running tests or running in production/development mode. Had we imported a database instance instead of a function to get a database instance, the code would need to know what database you want when the python files are imported, which is something that happens before we even know what you're doing with the imported code. django-mongokit also gives you the connection instance which you'll need to register your own models:

>>> from django_mongokit import connection
>>> connection.register([Computer])

But I recommend, as a best practice, that you always register your models right after you have defined them. This brings us to the DjangoDocument class, so let's get straight into it, this time in the models.py file inside a Django app you've just created:

import datetime
from django_mongokit import connection
from django_mongokit.document import DjangoDocument

class Computer(DjangoDocument): # notice difference from above
    class Meta:
        verbose_name_plural = "Computerz"

    structure = {
      'make': unicode,
      'model': unicode,
      'purchase_date': datetime.datetime,
      'cpu_ghz': float,
    }

    validators = {
      'cpu_ghz': lambda x: x > 0,
      'make': lambda x: x.strip(),
    }

    default_values = {
      'purchase_date': datetime.datetime.utcnow,
    }

    use_dot_notation = True

    indexes = [
      {'fields': ['make']},
    ]

connection.register([Computer])

That's now all you need to get on with your code. The DjangoDocument class offers a few more gems that make your life easier such as handling signals and registering itself in a global variable (import django_mongokit.document.model_names and inspect). See the django-mongokit README file for more information.

So, what's so great about this setup? It's by personal taste but for me it's simplicity and purity. I like the thin layer MongoKit adds on top of pure pymongo that becomes oh so practical such as helping you make sure you only store what you said you would and it's easier to work with class instances you can see the definition of than it is to work with dictionaries and lists.

And here's one of MongoKit's best selling points for me: the few times you need speed, speed and more speed it's possible to go straight to the source without doing any wrapping. This is the equivalent of how you sometimes run raw SQL queries in Django which, let's be honest, does happen quite frequently when the project becomes non-trivial. Django's ORM has the ability to turn the output of a raw SQL query into objects, and with MongoKit when you go straight into MongoDB you get pure Python dictionaries which you can use to create instances. Here's an example where you can't query for what you're looking for, so you might be trawling through thousands of documents:

>>> from some.thirdparty import my_kind_of_cpu
>>> computers = []
>>> for item in collection.find():
...     # can't use dot notation when it's not a document
...     cpu = item['cpu_ghz']
...     if my_kind_of_cpu(cpu):
...         computers.append(collection.Computer(item))
...

A use case for this is when you want to store different types of documents in the same collection and by a value extracted from a raw query you only turn selected few results into mapped instances. More about that in a later post maybe.
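Until then, here's a rough sketch of that use case in plain Python, without any MongoDB involved. The list of dictionaries stands in for what a raw query would return, the Computer class is a made-up stand-in for a mapped document class, and the 'type' key is a hypothetical way of telling the different kinds of documents apart.

```python
class Computer(dict):
    """Stand-in for a mapped document class (hypothetical). It only adds
    dot notation on top of a plain dictionary."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

# What a raw query on a mixed collection might hand back: plain dicts
raw_documents = [
    {'type': 'computer', 'make': 'Apple', 'cpu_ghz': 2.66},
    {'type': 'printer', 'make': 'HP'},
    {'type': 'computer', 'make': 'Dell', 'cpu_ghz': 1.8},
]

def my_kind_of_cpu(ghz):
    # made-up selection rule for this sketch
    return ghz >= 2.0

computers = []
for item in raw_documents:
    # cheap checks on the raw dictionaries first...
    if item.get('type') != 'computer':
        continue
    if my_kind_of_cpu(item['cpu_ghz']):
        # ...and only the selected few get turned into mapped instances
        computers.append(Computer(item))
```

Only the one Apple ends up as a Computer instance; the printer and the slow Dell never pay the cost of being wrapped.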