How and why to use django-mongokit (aka. Django to MongoDB)

08 March 2010   10 comments   Python, Django

http://github.com/peterbe/django-mongokit


Here I'm going to explain how to combine Django and MongoDB using MongoKit and django-mongokit.

MongoDB is a document store built for high speed and high concurrency, with a very good redundancy story. It's an alternative to relational databases (e.g. MySQL), which Django is tightly coupled to through its ORM (Object Relational Mapper). The document-store counterpart of an ORM is what's now called an ODM (Object Document Mapper), for lack of a better acronym. That's where MongoKit comes in. It's written in Python, it connects to MongoDB using a library called pymongo, and it turns the data from MongoDB into instances of classes you have defined. MongoKit has nothing to do with Django, though. That's where django-mongokit comes in, written by yours truly.
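To make the ODM idea concrete without needing MongoDB running, here is a minimal pure-Python sketch of the mapping step: a raw document (a plain dict, like the ones pymongo hands back) becomes an instance of a class you defined, and back again. This is only an illustration under that assumption; the dataclass, `from_raw()` and `to_raw()` are hypothetical names, not MongoKit's API, and MongoKit does this differently.

```python
from dataclasses import asdict, dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class Computer:
    """A plain Python class standing in for a mapped document."""
    make: str
    model: str
    cpu_ghz: Optional[float] = None
    purchase_date: Optional[datetime] = None


def from_raw(raw: dict) -> Computer:
    """Turn a raw document (a plain dict, as a driver returns it) into an instance.

    Keys the class doesn't declare, such as MongoDB's '_id', are dropped here.
    """
    known = {f.name for f in fields(Computer)}
    return Computer(**{k: v for k, v in raw.items() if k in known})


def to_raw(obj: Computer) -> dict:
    """Turn an instance back into a plain dict, ready to be stored."""
    return asdict(obj)


raw = {"_id": "abc123", "make": "Apple", "model": "G5", "cpu_ghz": 2.66}
computer = from_raw(raw)
print(computer.make, computer.cpu_ghz)  # Apple 2.66
```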

So we start by defining a subclass of MongoKit's Document class:

import datetime
from mongokit import Document

class Computer(Document):

    structure = {
      'make': unicode,
      'model': unicode,
      'purchase_date': datetime.datetime,
      'cpu_ghz': float,
    }

    validators = {
      'cpu_ghz': lambda x: x > 0,
      'make': lambda x: x.strip(),
    }

    default_values = {
      'purchase_date': datetime.datetime.utcnow,
    }

    use_dot_notation = True

    indexes = [
      {'fields': ['make']},
    ]
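What `use_dot_notation` and `default_values` buy you can be shown without MongoDB. The following is a hypothetical pure-Python sketch (not MongoKit's implementation; `DotDocument` is a made-up name) of a dict subclass where declared keys can be read and written as attributes and callable defaults are evaluated once per new instance. It uses Python 3's `str` where the class above uses Python 2's `unicode`.

```python
import datetime


class DotDocument(dict):
    """Hypothetical dict subclass showing the idea behind use_dot_notation
    and default_values. This is not MongoKit's implementation."""

    structure = {}
    default_values = {}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Every declared key exists in the dict, even before it is set.
        for key in self.structure:
            self.setdefault(key, None)
        # Fill in defaults; callables are called once per new instance.
        for key, default in self.default_values.items():
            if self[key] is None:
                self[key] = default() if callable(default) else default

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self.structure:
            return self[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Declared keys go into the dict; anything else is a plain attribute.
        if name in self.structure:
            self[name] = value
        else:
            super().__setattr__(name, value)


class Computer(DotDocument):
    structure = {
        "make": str,
        "model": str,
        "purchase_date": datetime.datetime,
        "cpu_ghz": float,
    }
    default_values = {
        # A callable default, playing the role of datetime.datetime.utcnow above.
        "purchase_date": lambda: datetime.datetime.now(datetime.timezone.utc),
    }


c = Computer()
c.make = "Apple"  # same as c["make"] = "Apple"
print(c.make, c["make"])  # Apple Apple
```

Note that with this sketch a misspelled name such as `c.cpu_hrz = 2.66` is silently stored as an ordinary attribute and never reaches the dict, so it would not be part of what gets saved.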

All of these class attributes are features of MongoKit, and their names are mostly self-explanatory. The possible exception is use_dot_notation: it lets you access the data in the structure with a dot on the instance (instance.make) rather than the normal dictionary lookup (instance['make']). Now let's work with this class in the shell. Important: to actually try this you need MongoDB and pymongo installed, and MongoDB up and running:

>>> from mongokit import Connection
>>> conn = Connection()
>>> from mymodels import Computer
>>> conn.register([Computer])
>>> database = conn.mydb # will be created if it didn't exist
>>> collection = database.mycollection # equivalent of a SQL table
>>> instance = collection.Computer()
>>> instance.make = u"Apple"
>>> instance.model = u"G5"
>>> instance.cpu_ghz = 2.66
>>> instance.save()
>>>
>>> type(instance)
<class 'mymodels.Computer'>
>>> instance
{'model': u'G5', 'make': u'Apple', '_id':
ObjectId('4b9244989d40b334b4000000'), 'cpu_ghz': 2.66,
'purchase_date': datetime.datetime(2010, 3, 6, 12, 3, 8, 281905)}
>>>

As you can see, it's pretty easy to work with, and it feels Pythonic and obvious. What you get is something that works just like an instance of a normal class, with some extra sugar, plus the ability to save the data persistently, efficiently and redundantly (assuming you've done the work of setting up your MongoDB with replication and/or sharding). Now let's look at retrieval, which, as per the design principles of MongoKit, follows the basic interface of pymongo. To learn about querying you can skim the MongoKit documentation, but really the thing to read is the pymongo documentation, since MongoKit is only a thin layer on top of it:

>>> from mongokit import Connection
>>> conn = Connection()
>>> from mymodels import Computer
>>> conn.register([Computer])
>>> database = conn.mydb
>>> collection = database.mycollection
>>> instances = collection.Computer.find()
>>> type(instances)
<class 'mongokit.generators.MongoDocumentCursor'>
>>> list(instances)[0]
{u'cpu_ghz': 2.66, u'model': u'G5', u'_id':
ObjectId('4b9244989d40b334b4000000'), u'purchase_date':
datetime.datetime(2010, 3, 6, 12, 3, 8, 281000), u'make': u'Apple'}
>>> collection.Computer.find().count()
1
>>> collection.Computer.one() == list(collection.Computer.find())[0]
True

The query methods one() and find() can take search parameters that limit what you get back. They are quite similar to the get() and filter() methods on Django's default manager (objects.get() and objects.filter()), so they should feel familiar.
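The shape of those search parameters is a plain dict, pymongo style. As a rough mental model only, here is a toy in-memory matcher over a list of dicts that supports equality plus a handful of comparison operators. `matches()`, `find()` and `one()` here are hypothetical helpers, not pymongo or MongoKit code; the real server supports far more, and MongoKit's own `one()` may differ in details such as which exception it raises.

```python
import operator

# Toy subset of pymongo-style operators; real MongoDB supports many more.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def matches(doc, query):
    """True if dict 'doc' satisfies 'query': plain values mean equality,
    a dict value like {"$gt": 2.0} applies the operators it names."""
    for key, condition in query.items():
        value = doc.get(key)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op != "$ne" and value is None:
                    return False  # missing values never compare as greater/less
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            return False
    return True


def find(docs, query=None):
    """Every matching document, roughly like Django's objects.filter()."""
    return [doc for doc in docs if matches(doc, query or {})]


def one(docs, query=None):
    """The single match or None; more than one match is an error here."""
    found = find(docs, query)
    if len(found) > 1:
        raise LookupError("more than one document matched %r" % (query,))
    return found[0] if found else None


computers = [
    {"make": "Apple", "model": "G5", "cpu_ghz": 2.66},
    {"make": "Dell", "model": "Dimension", "cpu_ghz": 1.8},
]
print(one(computers, {"cpu_ghz": {"$gt": 2.0}})["model"])  # G5
```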

So, what would it take to do this MongoKit business in a running Django project, so that you can write Django views and templates that work with your Mongo "documents"? Answer: use django-mongokit. django-mongokit is a thin wrapper around MongoKit that makes it slightly more convenient to use MongoKit in a Django environment. The primary tasks django-mongokit takes care of are (1) the connection and (2) giving your classes a _meta class attribute. Especially important regarding the connection is that django-mongokit sets up and destroys a test database for you when you run your tests. And since it's all in one place, you don't have to worry about creating separate MongoDB connections in your views or management commands. Let's first define the database in your settings.py file:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'example-sqlite3.db',
    },
    'mongodb': {
        'ENGINE': 'django_mongokit.mongodb',
        'NAME': 'mydb',
    },
}

Then, with that in place, all you need to get a database are these lines:

>>> from django_mongokit import get_database
>>> database = get_database()

The reason it's a function and not an instance is that the database is going to differ depending on whether you're running tests or running in production/development mode. Had we imported a database instance instead of a function that returns one, the code would need to know which database you want at the moment the Python files are imported, which happens before we even know what you're doing with the imported code. django-mongokit also gives you the connection instance, which you'll need to register your own models:

>>> from django_mongokit import connection
>>> connection.register([Computer])
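The difference between deciding at import time and deciding at call time, which is why get_database() is a function, can be sketched in plain Python. Everything here is hypothetical: `_testing`, `start_test_mode()` and `get_database_name()` are made-up names, and the `test_` prefix mirrors Django's convention for SQL test databases rather than anything django-mongokit is confirmed to use.

```python
DATABASE_NAME = "mydb"  # in a real project this would come from settings.DATABASES

_testing = False  # hypothetical flag that a test runner would flip before tests run


def start_test_mode():
    """Hypothetical hook a test runner would call."""
    global _testing
    _testing = True


def get_database_name():
    """Decide which database to use at call time, not at import time."""
    if _testing:
        return "test_" + DATABASE_NAME
    return DATABASE_NAME


# The eager alternative: a value computed while the module is being imported,
# before anybody has had a chance to say "we're running tests".
EAGER_NAME = get_database_name()

start_test_mode()
print(EAGER_NAME, get_database_name())  # mydb test_mydb
```

The eagerly computed name is stuck with the production database, while the function picks up the switch.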

As a best practice, I recommend always registering your models right after you've defined them. This brings us to the DjangoDocument class, so let's get straight into it, this time in the models.py file of a Django app you've just created:

import datetime
from django_mongokit import connection
from django_mongokit.document import DjangoDocument

class Computer(DjangoDocument): # notice difference from above
    class Meta:
        verbose_name_plural = "Computerz"

    structure = {
      'make': unicode,
      'model': unicode,
      'purchase_date': datetime.datetime,
      'cpu_ghz': float,
    }

    validators = {
      'cpu_ghz': lambda x: x > 0,
      'make': lambda x: x.strip(),
    }

    default_values = {
      'purchase_date': datetime.datetime.utcnow,
    }

    use_dot_notation = True

    indexes = [
      {'fields': ['make']},
    ]

connection.register([Computer])

That's now all you need to get on with your code. The DjangoDocument class offers a few more gems that make your life easier, such as handling signals and registering itself in a global variable (see model_names in django_mongokit.document). See the django-mongokit README file for more information.
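To give a feel for what "a _meta class attribute" and "registering itself in a global variable" mean, here is a hypothetical pure-Python sketch using `__init_subclass__`. It is not how django-mongokit implements DjangoDocument: `RegisteredDocument`, `Options` and this `model_names` list are illustrative assumptions, the real `model_names` may store something different, and the defaults only imitate Django's (lower-cased class name, plural with an "s").

```python
model_names = []  # hypothetical global registry of document class names


class Options:
    """Hypothetical stand-in for a Django-style _meta object."""

    def __init__(self, cls, meta):
        self.object_name = cls.__name__
        # Fall back to Django-like defaults when Meta doesn't say otherwise.
        self.verbose_name = getattr(meta, "verbose_name", cls.__name__.lower())
        self.verbose_name_plural = getattr(
            meta, "verbose_name_plural", self.verbose_name + "s"
        )


class RegisteredDocument:
    """Hypothetical base class: every subclass gets a _meta built from its
    inner Meta class and is recorded in model_names."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._meta = Options(cls, cls.__dict__.get("Meta"))
        model_names.append(cls.__name__)


class Computer(RegisteredDocument):
    class Meta:
        verbose_name_plural = "Computerz"


class Printer(RegisteredDocument):
    pass


print(model_names)  # ['Computer', 'Printer']
print(Computer._meta.verbose_name_plural, Printer._meta.verbose_name_plural)  # Computerz printers
```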

So, what's so great about this setup? It's a matter of personal taste, but for me it's the simplicity and purity. I like the thin layer MongoKit adds on top of pure pymongo; it's oh so practical, for example in helping you make sure you only store what you said you would. And it's easier to work with class instances whose definition you can see than with plain dictionaries and lists.
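"Only store what you said you would" boils down to checking a document against its declared structure and validators before saving. Below is a toy version of such a check, reusing the structure and validators from the Computer class (with Python 3 `str` in place of `unicode`). `validate()` and `StructureError` are hypothetical names, and treating `None` as "not set" is a choice made for this sketch; MongoKit's real validation has its own rules and error types.

```python
import datetime


class StructureError(Exception):
    """Raised when a document doesn't match its declared structure."""


STRUCTURE = {
    "make": str,
    "model": str,
    "purchase_date": datetime.datetime,
    "cpu_ghz": float,
}
VALIDATORS = {
    "cpu_ghz": lambda x: x > 0,
    "make": lambda x: x.strip(),  # an empty or blank string is falsy, so it fails
}


def validate(doc, structure, validators=None):
    """Toy check: no undeclared keys, right types, validators pass.

    None counts as "not set" and is skipped by the type and validator checks.
    """
    unknown = set(doc) - set(structure) - {"_id"}
    if unknown:
        raise StructureError("unexpected keys: %s" % sorted(unknown))
    for key, expected in structure.items():
        value = doc.get(key)
        if value is not None and not isinstance(value, expected):
            raise StructureError(
                "%s should be %s, got %r" % (key, expected.__name__, value)
            )
    for key, check in (validators or {}).items():
        value = doc.get(key)
        if value is not None and not check(value):
            raise StructureError("%s failed validation: %r" % (key, value))


validate({"make": "Apple", "model": "G5", "cpu_ghz": 2.66}, STRUCTURE, VALIDATORS)
```

A typo such as `cpu_hrz` is rejected as an unexpected key instead of quietly ending up in the database.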

And here's one of MongoKit's best selling points for me: the few times you need speed, speed and more speed, you can go straight to the source without any wrapping. This is the equivalent of running raw SQL queries in Django, which, let's be honest, happens quite frequently once a project becomes non-trivial. Django's ORM can turn the output of raw SQL into objects; with MongoKit, when you go straight to MongoDB you get plain Python dictionaries that you can use to create instances. Here's an example where you can't express what you're looking for as a query and instead have to trawl through thousands of documents:

>>> from some.thirdparty import my_kind_of_cpu
>>> computers = []
>>> for item in collection.find():
...     # can't use dot notation when it's not a document
...     cpu = item['cpu_ghz']
...     if my_kind_of_cpu(cpu):
...         computers.append(collection.Computer(item))
...

A use case for this is when you store different types of documents in the same collection and, based on a value extracted by a raw query, turn only a select few of the results into mapped instances. More about that in a later post, maybe.
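The pattern of scanning cheap raw dicts and wrapping only the interesting ones, with several document types sharing one collection, can be sketched without MongoDB. Assumptions to flag: the `doc_type` field, the `DOC_CLASSES` registry and `wrap_selected()` are all hypothetical, and the dict subclasses stand in for real MongoKit documents.

```python
# Raw documents as a driver would return them: plain dicts, several kinds in one collection.
RAW = [
    {"doc_type": "computer", "make": "Apple", "cpu_ghz": 2.66},
    {"doc_type": "printer", "make": "HP"},
    {"doc_type": "computer", "make": "Dell", "cpu_ghz": 1.6},
]


class Computer(dict):
    """Hypothetical mapped class; a real one would be a MongoKit Document."""

    @property
    def is_fast(self):
        return self["cpu_ghz"] >= 2.0


class Printer(dict):
    pass


# Hypothetical registry mapping the 'doc_type' value to a class.
DOC_CLASSES = {"computer": Computer, "printer": Printer}


def wrap_selected(raw_docs, wanted_type, predicate):
    """Scan cheap raw dicts; only the ones that pass become mapped instances."""
    cls = DOC_CLASSES[wanted_type]
    return [
        cls(doc)
        for doc in raw_docs
        if doc.get("doc_type") == wanted_type and predicate(doc)
    ]


fast = wrap_selected(RAW, "computer", lambda doc: doc["cpu_ghz"] > 2.0)
print(fast[0]["make"], len(fast))  # Apple 1
```

The predicate runs on plain dicts, so the cost of building instances is only paid for the documents you actually keep.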

Comments

Michał Pasternak
How can you have speed in "for item in collection.find()" loop? It is O(n) when you use such approach with SQL. How can it be different for MongoDB?
Peter Bengtsson
It's also O(n), sure, and only the details will tell whether it's faster than SQL, but the point was that by doing that you get raw data from the database without turning it into mapped objects. In that example code the variable 'item' would be a dict, not a fancy class instance with methods and stuff.
Gudbergur Erlendsson
I think it's lacking DRY. This is more beautiful IMO:

#models.py
from mongokit import *
class Computer(DjangoDocument): # notice difference from above
    make = UnicodeField(validate=lambda x: x.strip(), index=True)
    model = UnicodeField()
    purchase_date = DateTimeField(default=datetime.datetime.utcnow)
    cpu_ghz = FloatField(validate=lambda x: x>0)

    class Meta:
        verbose_name_plural = "Computerz"

and then

from models import Computer
computer=Computer()
computer.make = u"Apple"
computer.model = u"G5"
computer.cpu_hrz = 2.66
computer.save()

and the library should handle the connection internally by default, but expose the connection objects when required.
Aakarsh.
Hello Gudbergur,
I have tried to use MongoKit, and I understand that the unicode part has to be done in models.py.
Can you please tell me where I should write this piece of code you mentioned here:

from models import Computer
computer=Computer()
computer.make = u"Apple"
computer.model = u"G5"
computer.cpu_hrz = 2.66
computer.save()

Is it the Python shell itself or models.py?
Thanks in advance
Lior Gradstein
You made an error in one of the assignments: you wrote cpu_hrz, so in the display of cpu_ghz you get None.
drozzy
I support @Gudbergur Erlendsson's statement.

The lack of "django-like" model definitions is what I noticed first. Also the use of DjangoDocument - within the django_mongokit module is a little redundant don't you think?

I am also not sure why I need an intermediate layer, this thing called "MongoKit". Just integrate everything into one module for django and I will gladly use it.

In any case - this looks great keep up the great work!
megido
Hi!
Could you please explain why should I use MongoKit, but not just PyMongo? Results coming from MongoDB already are objects when using PyMongo.
Peter Bengtsson
For structure. And to be able to attach that structure to Python classes, so you can have methods that relate to the data in an object-oriented way.
Sunshine
How could any of this be better sttead? It couldn't.
Anuj
how to support nested structures ?