Peterbe.com

A blog and website by Peter Bengtsson


django-html-validator - now locally, fast!

12 August 2016 1 comment   Python, Web development, Django


A couple of years ago I released a project called django-html-validator (GitHub link) and it's basically a Django library that takes the HTML generated inside Django and sends it in for HTML validation.

The first option is to send the HTML payload, over HTTPS, to https://validator.nu/. Not only is this slow but it also means sending potentially revealing HTML. Ideally you don't have any passwords in your HTML and if you're doing HTML validation you're probably testing against some test data. But... it sucked.

The other alternative was to download a vnu.jar file from the github.com/validator/validator project and execute it in a subprocess with java -jar vnu.jar /tmp/file.html. The problem with this is that it's really slow, because Java programs take such a long time to boot up.

But then, at the beginning of the year, some contributors breathed fresh life into the project. Python 3 support and, best of all, the ability to start the vnu.jar as a local server on http://localhost:8888 and HTTP POST HTML over to that. Now you don't have to pay the high cost of booting up a Java program and you don't have to rely on a remote HTTP call.

Now it becomes possible to have HTML validation checked on every rendered HTML response in the Django unit tests.
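For example, here's a rough sketch of what "POST HTML over to that" can look like. This isn't django-html-validator's actual code; the ?out=json query string and the Content-Type convention come from the validator.nu web service interface:

import requests

def validate_html(html):
    # Assumes you've started the validator locally with something like:
    #   java -cp vnu.jar nu.validator.servlet.Main 8888
    response = requests.post(
        'http://localhost:8888/?out=json',
        data=html.encode('utf-8'),
        headers={'Content-Type': 'text/html; charset=utf-8'},
    )
    response.raise_for_status()
    return response.json().get('messages', [])

# e.g. in a Django test:
# messages = validate_html(self.client.get('/').content.decode('utf-8'))
# errors = [m for m in messages if m.get('type') == 'error']
# assert not errors, errors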

To try it, check out the new instructions on "Setting the vnu.jar path".

The contributor who made this possible is Ville "scop" Skyttä, along with others. Thanks!!

How to track Google Analytics pageviews on non-web requests (with Python)

03 May 2016 0 comments   Python, Web development, Django, Mozilla


tl;dr; Use raven's ThreadedRequestsHTTPTransport transport class to send pageview trackings asynchronously to Google Analytics, so you can collect pageviews that aren't actually browser pages.

We have an API on our Django site that was not designed as an API from the ground up. We had a bunch of internal endpoints that were used by the website. So we simply exposed those as API endpoints that anybody can query. All we did was wrap certain parts carefully so as to not expose private stuff, and we wrote a simple web page where you can see a list of all the endpoints and what parameters are needed. Later we added auth-by-token.

Now the problem we have is that we don't know which endpoints people use and, equally important, which ones people don't use. If we had more stats we'd be able to confidently deprecate some (for easier maintenance) and optimize others (to avoid resource overuse).

Our first attempt was to use statsd to collect metrics and display those with graphite. But it just didn't work out. There are just too many different "keys". Basically, each endpoint (aka URL, aka URI) is a key. And if you include the query string parameters, the number of keys just gets nuts. Statsd and graphite are better when you have about as many keys as you have fingers on one hand. For example, HTTP status codes: 200, 302, 400, 404 and 500.

Also, we already use Google Analytics to track pageviews on our website, which is basically a measure of how many people render web pages that have HTML and JavaScript. The Google Analytics UI is great and powerful. I'm sure other competing tools like Mixpanel, Piwik, Gauges, etc. are great too, but Google Analytics is reliable, likely to stick around and something many people are familiar with.

So how do you simulate pageviews when you don't have JavaScript rendering? The answer: plain HTTP POST (over HTTPS, of course). And how do you prevent blocking on sending analytics without making your users wait? By doing it asynchronously, either with threading or with a background worker and a message queue.

Threading or a message queue

If you have a message queue configured and are confident it runs reliably, you should probably use that. But it adds a certain element of complexity. It makes your stack more complex because now you need to maintain one or more consumers plus the central message queue itself. What if you don't have a message queue all set up? Use Python threading.

To do the threading, which is hard, it's always a good idea to try to stand on the shoulders of giants. Or, if you can't find a giant, find something that is mature and proven to work well over time. We found that in Raven.

Raven is the Python library, or "agent", used for Sentry, the open source error tracking software. As you can tell by the name, Raven tries to be quite agnostic of Sentry, the server component. Inside it, it has a couple of good utilities for making threaded jobs whose task is to make web requests. In particular, the awesome ThreadedRequestsHTTPTransport. Using it basically looks like this:

import urlparse
from raven.transport.threaded_requests import ThreadedRequestsHTTPTransport

transporter = ThreadedRequestsHTTPTransport(
    urlparse.urlparse('https://ssl.google-analytics.com/collect'),
    timeout=5
)

params = {
    ...more about this later...
}

def success_cb():
    print "Yay!"

def failure_cb(exception):
    print "Boo :("

transporter.async_send(
    params,
    headers,  # a plain dict of HTTP headers
    success_cb,
    failure_cb
)

The call isn't very different from regular plain old requests.post.

About the parameters

This is probably the most exciting part and the place where you need to think. It's non-trivial because you need to put some careful thought into what you want to track.

Your friend is: This documentation page

There's also the Hit Builder tool where you can check that the values you are going to send make sense.

Some of the basic ones are easy:

"Protocol Version"

Just set to v=1

"Tracking ID"

That code thing you see in the regular chunk of JavaScript you put in the head, e.g. tid=UA-1234-Z

"Data Source"

An optional word for what you call this type of traffic. We went with ds=api because we use it to measure the web API.

The user ones are a bit more tricky. Basically because you don't want to accidentally leak potentially sensitive information. We decided to keep this highly anonymized.

"Client ID"

A random UUID (version 4) that identifies the user or the app. Not to be confused with "User ID", which is basically a string that identifies the user's session storage ID or something. Since in our case we don't have a user (unless they use an API token) we set this to a new random UUID each time. E.g. cid=uuid.uuid4().hex. This field is not optional.

"User ID"

Some string that identifies the user but doesn't reveal anything about the user. For example, we use the PostgreSQL primary key ID of the user as a string. It just means we can know if the same user makes several API requests, but we can never know who that user is. Google Analytics uses it to "lump" requests together. This field is optional.
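In the Django case, that can boil down to something like this when building the params dict in the sample implementation below (a sketch; how you resolve the user depends on your auth, e.g. the API token):

# Only set 'uid' when we actually know the user; the primary key groups
# requests together without revealing who the user is.
if getattr(request, 'user', None) and request.user.is_authenticated():
    params['uid'] = str(request.user.pk)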

Next we need to pass information about the hit and the "content". This is important. Especially the "Hit type", because this is where you make your manual server-side tracking act as if the user had clicked around on the website with a browser.

"Hit type"

Set this to t=pageview and it'll show up in Google Analytics as if the user had just navigated to the URL in her browser. It's kinda weird to do this because clearly the user hasn't. Most likely she's used curl or something from the command line. So it's not really a pageview but, on our end, we have "views" in the webserver that produce information to the user. Some of it is HTML and some of it is JSON, in terms of output format, but either way they're sending us a URL and we respond with data.

"Document location URL"

The full absolute URL that was used. E.g. https://www.example.com/page?foo=bar. So in our Django app we set this to dl=request.build_absolute_uri(). If you have a site where you might have multiple domains in use but want to collect them all under just 1 specific domain you need to set dh=example.com.

"Document Host Name" and "Document Path"

I actually don't know what the point of this is if you've already set the "Document location URL".

"Document Title"

In Google Analytics you can view your Content Drilldown by title instead of by URL path. In our case we set this to a string we know from the internal Python class that is used to make the API endpoint. dt='API (%s)'%api_model.__class__.__name__.

There are many more things you can set, such as the client's IP, the user agent, timings, exceptions. We chose to NOT include the user's IP. If people using the JavaScript version of Google Analytics can set their browser to NOT include the IP, we should respect that. Also, it's rarely interesting to see where the requests for a web API come from, since it's often a server's curl or requests call that makes the query, not a human.

Sample implementation

Going back to the code example mentioned above, let's demonstrate a fuller example:

import logging
import urlparse
import uuid

from django.conf import settings
# RequestSite lives in django.contrib.sites.models in older Django versions
from django.contrib.sites.requests import RequestSite

from raven.transport.threaded_requests import ThreadedRequestsHTTPTransport

logger = logging.getLogger(__name__)

transporter = ThreadedRequestsHTTPTransport(
    urlparse.urlparse('https://ssl.google-analytics.com/collect'),
    timeout=5
)

# Remember, this is inside a Django view, so `request` and `model_class`
# come from the surrounding code, but you get the idea

domain = settings.GOOGLE_ANALYTICS_DOMAIN
if not domain or domain == 'auto':
    domain = RequestSite(request).domain

params = {
    'v': 1,
    'tid': settings.GOOGLE_ANALYTICS_ID,
    'dh': domain,
    't': 'pageview',
    'ds': 'api',
    'cid': uuid.uuid4().hex,
    'dp': request.path,
    'dl': request.build_absolute_uri(),
    'dt': 'API ({})'.format(model_class.__class__.__name__),
    'ua': request.META.get('HTTP_USER_AGENT'),
}

headers = {}  # nothing special needed; requests form-encodes the params dict

def success_cb():
    logger.info('Successfully informed Google Analytics (%s)', params)

def failure_cb(exception):
    logger.exception(exception)

transporter.async_send(
    params,
    headers,
    success_cb,
    failure_cb
)

How to unit test this

The class we're using, ThreadedRequestsHTTPTransport, has, as you might have seen, a method called async_send. There's also one with the exact same signature called sync_send, which does the same thing but in a blocking fashion. So you could make your code look something silly like this:

def send_tracking(page_title, request, async=True):
    # ...same as example above but wrapped in a function...
    function = async and transporter.async_send or transporter.sync_send
    function(
        params,
        headers,
        success_cb,
        failure_cb
    )

And then in your tests you pass in async=False instead.
But don't do that. The code shouldn't be subservient to the tests (unless it's for the sake of splitting up monster-long functions).
Instead, I recommend you mock the inner workings of that ThreadedRequestsHTTPTransport class so you can make the whole operation synchronous. For example...

import mock
from django.test import TestCase
from django.test.client import RequestFactory

from where.you.have import pageview_tracking


class TestTracking(TestCase):

    @mock.patch('raven.transport.threaded_requests.AsyncWorker')
    @mock.patch('requests.post')
    def test_pageview_tracking(self, rpost, aw):

        def mocked_queue(function, data, headers, success_cb, failure_cb):
            function(data, headers, success_cb, failure_cb)

        aw().queue.side_effect = mocked_queue

        request = RequestFactory().get('/some/page')
        with self.settings(GOOGLE_ANALYTICS_ID='XYZ-123'):
            pageview_tracking('Test page', request)

            # Now we can assert that 'requests.post' was called.
            # Left as an exercise to the reader :)
            print rpost.mock_calls       

This is synchronous now and works great. It's not finished. You might want to write a side effect for the requests.post so you can have better control of that post. That'll also give you a chance to potentially NOT return a 200 OK and make sure that your failure_cb callback function gets called.
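For example, here's a minimal sketch of such a side effect, plugged into the same mock-based test as above, assuming the transport treats a non-2xx response (e.g. via raise_for_status) as a failure:

def mocked_post(url, **options):
    # Pretend the collect endpoint is having a bad day.
    response = mock.Mock()
    response.status_code = 500
    response.raise_for_status.side_effect = Exception('500 from collect endpoint')
    return response

rpost.side_effect = mocked_post
pageview_tracking('Test page', request)
# ...and now assert that the failure path (e.g. logger.exception) was taken.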

How to manually test this

One thing I was very curious about when I started was to see how it worked if you really ran this for reals but without polluting your real Google Analytics account. For that I built a second little web server on the side, whose address I used instead of https://ssl.google-analytics.com/collect. So, change your code so that https://ssl.google-analytics.com/collect is not hardcoded but a variable you can change locally. Change it to http://localhost:5000/ and start this little Flask server:

import time
import random
from flask import Flask, abort, request

app = Flask(__name__)
app.debug = True

@app.route("/", methods=['GET', 'POST'])
def hello():
    print "- " * 40
    print request.method, request.path
    print "ARGS:", request.args
    print "FORM:", request.form
    print "DATA:", repr(request.data)
    if request.args.get('sleep'):
        sec = int(request.args['sleep'])
        print "** Sleeping for", sec, "seconds"
        time.sleep(sec)
        print "** Done sleeping."
    if random.randint(1, 5) == 1:
        abort(500)
    elif random.randint(1, 5) == 1:
        # really get it stuck now
        time.sleep(20)
    return "OK"

if __name__ == "__main__":
    app.run()

Now you get an insight into what gets posted and you can pretend that it's slow to respond. Also, you can get an insight into how your app behaves when this collection destination throws a 5xx error.

How to really test it

Google Analytics is tricky to test in that they collect all the stuff they collect then they take their time to process it and it then shows up the next day as stats. But, there's a hack! You can go into your Google Analytics account and click "Real-Time" -> "Overview" and you should see hits coming in as you're testing this. Obviously you don't want to do this on your real production account, but perhaps you have a stage/dev instance you can use. Or, just be patient :)

How to no-mincss links with django-pipeline

03 February 2016 2 comments   Python, Web development, Django


This might be the kind of problem only I have, but I thought I'd share in case others are in a similar pickle.

Warming Up

First of all, the way my personal site works is that every rendered page gets cached as rendered HTML. Midway, as the rendered page is being stored in the cache, an optimization transformation happens. It basically takes HTML like this:

<html>
<link rel="stylesheet" href="vendor.css">
<link rel="stylesheet" href="stuff.css">
<body>...</body>
</html>

into this:

<html>
<style>
/* optimized contents of vendor.css and stuff.css minified */
</style>
<body>...</body>
</html>

Just right-click and "View Page Source" and you'll see.

When it does this it also filters out CSS selectors in those .css files that aren't actually used in the rendered HTML. This makes the inlined CSS much smaller. Especially since so much of the CSS comes from a CSS framework.

However, there are certain .css files that have references to selectors that aren't in the generated HTML but are needed later when some JavaScript changes the DOM based on AJAX or user actions. For example, the CSS used by the Autocompeter widget. The program that does this CSS optimization transformation is called mincss and it has a feature where you can tell it to NOT bother with certain CSS selectors (using a CSS comment) or certain <link> tags entirely. It looks like this:

<link rel="stylesheet" href="ajaxstuff.css" data-mincss="no">

Where Does django-pipeline Come In?

So, setting that data-mincss="no" isn't easy when you use django-pipeline, because you don't write <link ... in your Django templates, you write {% stylesheet 'name-of-bundle' %}. So, how do you get it in?

Well, first let's define the bundle. In my case it looks like this:

PIPELINE_CSS = {
    ...
    # Bundle of CSS that strictly isn't needed at pure HTML render-time
    'base_dynamic': {
        'source_filenames': (
            'css/transition.css',
            'autocompeter/autocompeter.min.css',
        ),
        'extra_context': {
            'no_mincss': True,
        },
        'output_filename': 'css/base-dynamic.min.css',
    },
    ...
}

But that isn't enough. Next, I need to override how django-pipeline turns that bundle into a <link ...> tag. To do that, you need to create a directory and file called pipeline/css.html (or pipeline/css.jinja if you use Jinja rendering by default).

So take the default one from inside the pipeline package and copy it into one of your apps' templates directories in your project. For example, in my case, peterbecom/apps/base/templates/pipeline/css.jinja. Then, in that template, add at the very end something like this:

{% if no_mincss %} data-mincss="no"{% endif %} />
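For reference, the end of the copied template then ends up looking roughly like this (the href/media/title parts are whatever your copied default template already had, and may differ between pipeline versions; only the no_mincss condition is new):

{# pipeline/css.jinja #}
<link href="{{ url }}" rel="stylesheet" type="text/css"
  {% if media %}media="{{ media }}" {% endif %}
  {% if title %}title="{{ title }}" {% endif %}
  {% if no_mincss %}data-mincss="no"{% endif %} />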

The Point?

The point is that if you're in a similar situation where you want django-pipeline to output the <link> or <script> tag differently than it does by default, then this is a good example of how to do that.

Headsupper.io

05 December 2015 0 comments   Python, Web development, Django, Javascript, ReactJS


tl;dr

Headsupper.io is a free GitHub webhook service that emails people when commits contain the configurable keyword "headsup".

Introduction

Headsupper.io is great for when you have a GitHub project with multiple people working on it and when you make a commit you want to notify other people by email.

Basically, you set up a GitHub Webhook, on pushes, to push to https://headsupper.io and then it'll parse the incoming push and its commits and look for certain things in the commit message. By default, it'll look for the word "headsup". For example, a git commit message might look like this:

fixes #123 - more juice in the Saab headsup! will require updating

Or you can use the multi-line approach where the first line is short and sweet and, after the break, a bit more elaborate:

bug 1234567 - tea kettle upgrade 2.1

Headsup: Next time you git pull from master, remember to run 
peep install on the requirements.txt file since this commit 
introduces a bunch of crazy dependency changes.

Git commits that come through that don't have any match on this word will simply be ignored by Headsupper.

How you use it

Maybe paradoxically, you need to authenticate with your GitHub account but that's in read-only mode and does NOT set up the Webhook for you. The reason you have to authenticate to prepare a configuration on headsupper.io is to tie the configuration to a real user.

Once you've authenticated you get the option to create your first configuration, and then you have to enter at least these three pieces of information:

  1. The GitHub "full name". This is the org name, slash, repo name. E.g. peterbe/django-peterbecom or mozilla/socorro.
  2. Pick a secret. Remember what you typed, because you'll need to type in this same secret when you set up the Webhook on your GitHub project's Webhooks page. (This is used to checksum and verify the source of the Webhook push)
  3. Who to send to. A list of email addresses separated with a newline or a semi-colon.

Once you've set that up, you'll need to go to your GitHub project's Settings page and enter a new Webhook. The URL you need to type in is https://headsupper.io and for the "Secret" type in that secret you used earlier. That's it!
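By the way, here's how such a secret is typically used. This is a sketch of the general GitHub webhook convention, not necessarily Headsupper's exact code: GitHub signs the raw request body with HMAC-SHA1 and sends the result in the X-Hub-Signature header, which the receiving Django view can verify:

import hashlib
import hmac

def valid_signature(request, secret):
    # GitHub signs the raw request body with the shared secret.
    header = request.META.get('HTTP_X_HUB_SIGNATURE', '')
    expected = 'sha1=' + hmac.new(
        secret.encode('utf-8'), request.body, hashlib.sha1
    ).hexdigest()
    return hmac.compare_digest(header, expected)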

Rules and options

The word that triggers is configurable by you. The default is headsup. And by default, it's case insensitive. You can change that so it's case sensitive. Also, the word has to be word delimited on the left (e.g. by a space or a newline character) and on the right it needs to be followed by a space, a : or a !. So this won't match: theheadsup: or headsupper.
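In other words, the matching rule is roughly equivalent to a regular expression like this (a sketch, not the actual implementation):

import re

trigger = 'headsup'  # the configurable word
pattern = re.compile(
    r'(^|\s)' + re.escape(trigger) + r'(\s|:|!)',
    re.IGNORECASE,  # the default; drop this flag for case sensitive matching
)

assert pattern.search('fixes #123 - more juice headsup! will require updating')
assert not pattern.search('theheadsup: something')
assert not pattern.search('headsupper')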

Other optional things you can configure are:

That last option, Only send when a new tag is created, is interesting. I added that option because at work, we make production server releases by pushing a git tag. When a tag is pushed, all those commits are sent to the continuous deployment service which makes a server upgrade. This means you get a chance to enter a heads up message to be emailed to the people who care about new deployments going out.

How it was built

It's a mix between Django and ReactJS. The whole client-side app is built statically with Webpack in ES6. It's served as static files through Nginx. But Nginx makes an exception for all URLs that start with /api or /accounts. The /api/* URLs are used for loading and setting JSON. The /accounts/* URLs are used for the GitHub OAuth endpoints.

What's interesting about this architecture is that it's using HTTP cookies, not API tokens. Cookies are quite good in that they're established and the browser does all the automated work of keeping them secure and making each request potentially authenticated.

Here's the relevant React code and here's the relevant Django code that processes the Webhook.

The whole project is available on: https://github.com/peterbe/headsupper.

Also, I made a demo at the November Mozilla Beer and Tell.

Django forms and making datetime inputs localized

04 December 2015 2 comments   Python, Django


tl;dr

To change from one timezone aware datetime to another, turn it into a naive datetime and then use pytz's localize() method to convert it back to the timezone you want it to be.

Introduction

Suppose you have a Django form where you allow people to enter a date, e.g. 2015-06-04 13:00. You have to save it timezone aware, because you have settings.USE_TZ on and it's just much better to store things as timezone aware dates.

By default, if you have settings.USE_TZ and no timezone information is in the string that the django.form.fields.DateTimeField parses, it will use settings.TIME_ZONE and that timezone might be different from what it really should be. For example, in my case, I have an app where you can upload a CSV file full of information about events. These events belong to a venue which I have in the database. Every venue has a timezone, e.g. Europe/Berlin or US/Pacific. So if someone uploads a CSV file for the Berlin location, 2015-06-04 13:00 means 13:00 o'clock in Berlin. I don't care where the server is hosted and what its settings.TIME_ZONE is. I need to make that input timezone aware specifically for Europe/Berlin.

Examples

Suppose you have settings.TIME_ZONE == 'US/Pacific' and you let the django.form.fields.DateTimeField do its magic you get something you don't want:

>>> from django.conf import settings
>>> settings.TIME_ZONE
'US/Pacific'
>>> assert settings.USE_TZ
>>> from django.forms.fields import DateTimeField
>>> DateTimeField().clean('2015-06-04 13:00')
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)

See! That's wrong. Sort of. Not Django's fault. What I need to do is to convert that datetime object into one that is timezone aware on the Europe/Berlin timezone.

In old versions of pytz, specifically <=2014.2 you could do this:

>>> import pytz
>>> pytz.VERSION
'2014.2'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>)

But in modern versions of pytz you can't do that, because if you don't use the pytz.timezone instance's localize() method it will use the default offset, which might be one of those crazy "Local Mean Time" offsets they used 100 years ago. E.g.

>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>)

See, it's that crazy LMT+0:53:00 that's so often talked about on Stack Overflow!

Here's the trick

The trick is to use pytz.timezone(MY TIME ZONE NAME).localize(MY NAIVE DATETIME OBJECT). When you use the .localize() method pytz can use the date to make sure it uses the right conversion for that named timezone.

And in the case of our overly smart django.form.fields.DateTimeField it means we need to convert it back into a naive datetime object and then localize it.

>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date = date.replace(tzinfo=None)
>>> date
datetime.datetime(2015, 6, 4, 13, 0)
>>> tz = pytz.timezone('Europe/Berlin')
>>> tz.localize(date)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)
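Wrapped up as a little helper, the trick might look something like this (a sketch; venue.timezone here is just a hypothetical attribute holding a name like 'Europe/Berlin'):

import pytz

def localize_to(dt, tz_name):
    # Strip whatever tzinfo the datetime has and re-localize it
    # to the named timezone.
    naive = dt.replace(tzinfo=None)
    return pytz.timezone(tz_name).localize(naive)

# e.g. when processing an uploaded CSV row for a venue:
# aware = localize_to(form.cleaned_data['event_time'], venue.timezone)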

That was much harder than it needed to be. Timezones are hard. Especially when you have the human element of people typing in things and just, rightfully, expect the system to figure it out and get it right.

I hope this helps the next schmuck who has/had to set aside an hour to figure this out.

django-pipeline + django-jinja

04 October 2015 2 comments   Django


Do you have django-jinja in your Django 1.8 project to help you with your Jinja2 integration, and do you use django-pipeline for your static assets?
If so, you need to tie them together by passing pipeline.templatetags.ext.PipelineExtension "to your Jinja2 environment". But how? Here's how:

# in your settings.py


from django_jinja.builtins import DEFAULT_EXTENSIONS

TEMPLATES = [
    {
        'BACKEND': 'django_jinja.backend.Jinja2',
        'APP_DIRS': True,
        'OPTIONS': {
            'match_extension': '.jinja',
            'context_processors': [
                ...
            ],
            'extensions': DEFAULT_EXTENSIONS + [
                'pipeline.templatetags.ext.PipelineExtension',
            ],
        }
    },
    ...
]

Now, in your template you simply use the {% stylesheet '...' %} or {% javascript '...' %} tags in your .jinja templates without the {% load pipeline %} stuff.
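For example, assuming you have bundles called 'base' defined in PIPELINE_CSS and PIPELINE_JS, a .jinja template can just do:

<head>
  ...
  {% stylesheet 'base' %}
  {% javascript 'base' %}
</head>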

It took me a little while to figure that out so I hope it helps someone else googling around for a similar solution.

django-semanticui-form

14 September 2015 2 comments   Python, Django

https://github.com/peterbe/django-semanticui-form


I'm working on a (side)project in Django that uses the awesome Semantic UI CSS framework. This project has some Django forms that are rendered on the server, and so I can't let Django render the form HTML with its default markup or else the CSS framework can't do its magic.

The project is called django-semanticui-form and it's a fork from django-bootstrap-form.

It doesn't come with the Semantic UI CSS files at all. That's up to you. Semantic UI is available as a big fat bundle (i.e. one big .css file) but generally you just pick the components you want/need. To use it in your Django templates, simply create a django.forms.Form instance and render it like this:

{% load semanticui %}

<form>
  {{ myform | semanticui }}
</form>
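For completeness, myform here is just any plain django.forms.Form instance; e.g. a hypothetical form and view could look like this:

from django import forms
from django.shortcuts import render


class ContactForm(forms.Form):
    name = forms.CharField(max_length=100)
    email = forms.EmailField()
    message = forms.CharField(widget=forms.Textarea)


def contact(request):
    return render(request, 'contact.html', {'myform': ContactForm()})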

The project is very quickly put together. The elements I intend to render seem to work, but you might find that certain input elements don't work as nicely. However, if you want to help on the project, it's really easy to write and run tests. And Travis and automatic PyPI deployment are all set up, so pull requests should be easy.

Use closure for your Django context processors

09 May 2015 11 comments   Python, Django


The idea with template context processors in Django is to inject some default things to be available when rendering a template that is rendered with a request.

I.e. instead of...:

from django.conf import settings
from django.shortcuts import render

def view1(request):
    context = {
        'name': 'View 1', 
        'on_dev_server': request.get_host() in settings.DEV_HOSTNAMES
    }
    return render(request, 'view1.html', context)

def view2(request):
    context = {
        'name': 'View 2', 
        'other': 'things', 
        'on_dev_server': request.get_host() in settings.DEV_HOSTNAMES
    }
    return render(request, 'view2.html', context)

And in your nominal templates/base.html you might have something like this:

  ...
  <footer>
  <p>&copy; You 2015</p>
  {% if on_dev_server %}
    <p color="red">Note! We're currently on a dev server!</p>
  {% endif %}
  </footer>
  ...

Instead you do this trick; in your settings.py you write down the list of defaults plus the one you want to always have available:

TEMPLATE_CONTEXT_PROCESSORS = (
    "django.contrib.auth.context_processors.auth",
    "django.template.context_processors.static",
    "myproject.myapp.context_processors.debug_info",
)

And to accompany that you define your myproject/myapp/context_processors.py like so:

from django.conf import settings

def debug_info(request):
    return {
        'on_dev_server': request.get_host() in settings.DEV_HOSTNAMES,
    }

So far so good.

However, there's a problem with this. Two problems in fact.

First problem is that when all the templates in your big complicated website renders, it's quite possible that some pages don't need everything you set up in your context processors. That might mean a heck of a lot of extra computation when it won't ever be displayed.

For example, I have a project where most pages have a sidebar where I show "Trending Events", which is something I compute in a context_processors.py function called def sidebar_events(request):. But the sidebar is not always shown, and on the pages where it's not shown it's a waste to compute the stuff that sidebar_events computes. Also, I have management pages which use a totally different base.html template. So there's a big chance you're wasting precious CPU.

Another problem is that of code readability (aka. how frustrating this is to debug for someone else, or for yourself after months of idle activity). If you're skimming through your base.html and you see this "random" variable called on_dev_server, it's very hard to tell where the heck that's defined. Grepping the whole source code is one way to go. A much better way to solve that problem would be sensible namespace naming.

And also, by being too liberal with globally scoped variables there's a chance you might clash with a different piece of functionality that uses the same variable names. That chance is smaller when you use namespaces.

So, to remedy this, let your template context processor functions return closures. It wraps the request automagically.

Let's rewrite our trivial example from above, the context_processors.py should now look like this:

from django.conf import settings

def debug_info(request):
    def inner():
        return {
            'on_dev_server': request.get_host() in settings.DEV_HOSTNAMES,
        }
    return {'debug_info': inner}

Now executing that becomes more optional and more deliberate in the template instead. E.g.

  ...
  <footer>
  <p>&copy; You 2015</p>
  {% set debug_info = debug_info() %}
  {% if debug_info['on_dev_server'] %}
    <p color="red">Note! We're currently on a dev server!</p>
  {% endif %}
  </footer>
  ...

This makes it more explicit, which is a good thing. It also has the potential to be avoided if the stuff in there isn't needed in some templates.

Almost premature optimization

02 January 2015 0 comments   Python, Web development, Django


In airmozilla the tests almost all derive from one base class whose tearDown deletes the automatically generated settings.MEDIA_ROOT directory and everything in it.

Then there's some code that makes sure a certain thing from the fixtures has a picture uploaded to it.

That means it has to do that shutil.rmtree(directory) and that shutil.copy(src, dst) on almost every single test. Some might not need or depend on it, but it's convenient to put it here.

Anyway, I thought this is all a bit excessive and I could probably optimize that by defining a custom test runner that is first responsible for creating a clean settings.MEDIA_ROOT with the necessary file in it and secondly, when the test suite ends, it deletes the directory.

But before I write that, let's measure how many gazillion milliseconds this is chewing up.
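The measurement itself was nothing fancy; a crude sketch of it (names approximate, not the actual airmozilla code) looks like this:

import os
import shutil
import time

from django.conf import settings
from django.test import TestCase

timings = {'tearDown': 0.0, '_upload_media': 0.0}


class BaseTestCase(TestCase):

    def tearDown(self):
        t0 = time.time()
        if os.path.isdir(settings.MEDIA_ROOT):
            shutil.rmtree(settings.MEDIA_ROOT)
        timings['tearDown'] += time.time() - t0
        super(BaseTestCase, self).tearDown()

    def _upload_media(self, src, dst):
        t0 = time.time()
        shutil.copy(src, dst)
        timings['_upload_media'] += time.time() - t0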

Basically, the tearDown was called 361 times and the _upload_media 281 times. In total, this adds up to a whopping 0.21 seconds! (out of the 69.133 seconds it takes to run the whole thing).

I think I'll cancel that optimization idea. Doing some light shutil operations is dirt cheap.

uwsgi and uid

03 November 2014 4 comments   Python, Linux, Django


So recently, I moved home for this blog. It used to be on AWS EC2 and is now on Digital Ocean. I wanted to start from scratch so I started on a blank new Ubuntu 14.04 and later rsync'ed over all the data bit by bit (no pun intended).

When I moved this site I copied the /etc/uwsgi/apps-enabled/peterbecom.ini file and started it with /etc/init.d/uwsgi start peterbecom. The settings were the same as before:

# this is /etc/uwsgi/apps-enabled/peterbecom.ini
[uwsgi]
virtualenv = /var/lib/django/django-peterbecom/venv
pythonpath = /var/lib/django/django-peterbecom
user = django
master = true
processes = 3
env = DJANGO_SETTINGS_MODULE=peterbecom.settings
module = django_wsgi2:application

But I kept getting this error:

Traceback (most recent call last):
...
  File "/var/lib/django/django-peterbecom/venv/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 182, in _cursor
    self.connection = Database.connect(**conn_params)
  File "/var/lib/django/django-peterbecom/venv/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
psycopg2.OperationalError: FATAL:  Peer authentication failed for user "django"

What the heck! I thought. I was able to connect perfectly fine with the same config on the old server and here on the new server I was able to do this:

django@peterbecom:~/django-peterbecom$ source venv/bin/activate
(venv)django@peterbecom:~/django-peterbecom$ ./manage.py shell
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from peterbecom.apps.plog.models import *
>>> BlogItem.objects.all().count()
1040

Clearly I've set the right password in the settings/local.py file. In fact, I haven't changed anything and I pg_dump'ed the data over from the old server as is.

I edited the file psycopg2/__init__.py and added a print "DSN=", dsn and those details were indeed correct.
I'm running the uwsgi app as user django and I'm connecting to Postgres as user django.

Anyway, what I needed to do to make it work was the following change:

# this is /etc/uwsgi/apps-enabled/peterbecom.ini
[uwsgi]
virtualenv = /var/lib/django/django-peterbecom/venv
pythonpath = /var/lib/django/django-peterbecom
user = django
uid = django   # THIS IS ADDED
master = true
processes = 3
env = DJANGO_SETTINGS_MODULE=peterbecom.settings
module = django_wsgi2:application

The difference here is the added uid = django.

I guess by moving across (I'm currently on uwsgi 1.9.17.1-debian) I get a newer version of uwsgi or something that simply can't just take the user directive but needs the uid directive too. That or something else complicated to do with the users and permissions that I don't understand.

Hopefully, by having blogged about this other people might find it and get themselves a little productivity boost.