Securing a Django Site in Production
I was setting up a Django site for somebody recently and got asked the question, “is it possible for someone to hack my site?”. The answer, of course, is yes. To some degree, this is unavoidable. If somebody is willing to expend the time, effort and money, it is almost impossible to have a complex site that is perfectly secure. Even security “experts” can get it wrong. However, this got me thinking about the steps to secure a Django site.
Django does a good job of being reasonably secure by default. Unlike some other frameworks where you have to explicitly use CSRF tokens, Django uses them unless you tell it not to. Django escapes data from your templates automatically and is generally safe from SQL injection. The framework contains the building blocks to build a secure site, but quite often the site is deployed on a shaky foundation.
Securing the admin
For maximum security, the Django admin site should probably always be deployed on a web server running HTTPS. There’s a good guide on setting up SSL for the admin. Redirecting requests for /admin to HTTPS is one way. Another way is to set up the admin on a subdomain like admin.example.com and serve only that subdomain over HTTPS. This is what it looks like in Nginx:
server {
    listen 80;
    server_name example.com www.example.com;
    ...
}
server {
    listen 443;
    server_name admin.example.com;
    ssl_certificate sslcert.crt;
    ssl_certificate_key sslcert.key;
    ...
}
Using this, you can proxy to two different Django instances: one handles the site over HTTP and one handles just the admin over HTTPS. Depending on your exact setup, you probably also want to mark the admin’s session cookie as secure so the browser never sends it over plain HTTP.
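As a rough sketch of what "mark the cookie as secure" could look like, here is a settings fragment for the HTTPS-only admin instance. This is an assumption about the setup, not the configuration used on the original site, and `CSRF_COOKIE_SECURE` only exists in newer Django releases than the ones discussed in this post.

```python
# settings.py fragment for the HTTPS-only admin instance (a sketch).

# Ask the browser to send the session cookie over HTTPS connections only.
SESSION_COOKIE_SECURE = True

# Same idea for the CSRF cookie. Assumption: your Django version
# supports this setting; older releases simply ignore it.
CSRF_COOKIE_SECURE = True
```

Keep these out of the plain HTTP instance's settings, otherwise its session cookie would never be sent back and logins there would stop working.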
While the admin always needs security, some sites could also benefit from security outside of the admin if they’re handling user details, email addresses or other things. As an application developer, you need to build in that security because Django doesn’t know what you need to protect. Just remember the next time you log in to your Django admin screen on a wifi hotspot at Starbucks that anybody can run something like Firesheep or Wireshark and capture your credentials. It’s amazing how many notable sites get this wrong. It reminds me of the wall of sheep.
Securing the server
It is amazing how many people put out a server with an inadequate firewall. Either they leave their database port wide open, memcached port open (this is REALLY bad; see here) or in some other way greatly increase the possible attack surface. While I generally knew what Amazon Web Services (AWS) could do as far as hosting, I had never used them until recently and I was impressed by their security. AWS makes configuring the firewall super easy and by default, only port 22 is open and SSH only accepts keys, not passwords. That’s fairly secure by default! It gives a simple web GUI to open select ports and only to select machines. For example, if you host your database on a different server than your web server, only the web server should be able to connect to the database, not the whole internet. Also, Amazon S3 can serve its files over HTTPS. It’s a rather handy feature. I expect Rackspace is fairly similar in most regards.
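One quick way to sanity check a firewall is to try connecting to the ports that should be closed, from a machine outside the network. The sketch below does that with plain sockets. The port list holds common defaults (MySQL, PostgreSQL, memcached) and the function names are made up for this example, so adjust both to your own stack.

```python
import socket

# Common default ports that usually should not be reachable from the
# public internet. Illustrative only; adjust for your own services.
SENSITIVE_PORTS = {
    3306: "MySQL",
    5432: "PostgreSQL",
    11211: "memcached",
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def exposed_ports(host, ports=SENSITIVE_PORTS):
    """Return the subset of `ports` that accept connections on `host`."""
    return {port: name for port, name in ports.items()
            if is_port_open(host, port)}
```

Calling `exposed_ports("www.example.com")` from outside the firewall should ideally return an empty dict. Only scan hosts you are responsible for.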
Django security update
There were a couple of fixes and changes in Django 1.2.5, but the main change was the removal of the CSRF exception for AJAX requests. The decision to remove the exception, despite the backwards incompatibility, was the right move considering that the assumption that an XMLHttpRequest could only come from the browser is no longer true (was it ever?). However, this release makes me wonder how many site authors didn’t bother to change much and just put @csrf_exempt above their web services to get their site working again quickly with the new version.
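Instead of reaching for @csrf_exempt, a client can usually echo the CSRF token back in a request header. The sketch below shows the idea for a scripted client. It assumes Django's default cookie name `csrftoken` and the `X-CSRFToken` header; `csrf_headers` is a hypothetical helper written for this example, not part of Django.

```python
from http.cookies import SimpleCookie

CSRF_COOKIE_NAME = "csrftoken"    # Django's default cookie name
CSRF_HEADER_NAME = "X-CSRFToken"  # header checked by the CSRF middleware


def csrf_headers(set_cookie_value):
    """Build request headers that echo the CSRF cookie back to the server.

    `set_cookie_value` is the raw value of a Set-Cookie response header.
    """
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    morsel = cookie.get(CSRF_COOKIE_NAME)
    if morsel is None:
        raise ValueError("no %s cookie found" % CSRF_COOKIE_NAME)
    token = morsel.value
    return {
        # Send the cookie back as a browser would...
        "Cookie": "%s=%s" % (CSRF_COOKIE_NAME, token),
        # ...and repeat the token in the header so the two can be compared.
        CSRF_HEADER_NAME: token,
    }
```

The returned dict can be merged into the headers of any subsequent POST made by the client.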
Note: I secured the WordPress admin using the guide here and the WordPress HTTPS plugin. It’s a self-signed cert so I’m only getting maybe 75% of the security pixie dust, but I can deal with that.
Edit (September 14, 2011): Take a look through the Django security docs which your humble blogger helped write.
Lessons learned with RabbitMQ & Celery
A couple months ago, I posted about using asynchronous jobs with RabbitMQ and Celery. This is a follow-up with some lessons I learned the hard way.
Celery settings for good performance
Do not run millions of jobs with DEBUG = True. With debugging on, Django keeps a record of every SQL query it runs, so you will run out of memory, even if you have 48GB of it. On top of that, you might want to consider the celeryd option --maxtasksperchild.
Be extra careful with CELERY_SEND_TASK_ERROR_EMAILS = True. I sent 9000 emails to myself in a couple minutes. My phone which syncs my email really didn’t like it. I’m running with CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True and I’m looking to get a dashboard view of it with django-sentry. I think I’m almost there.
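Pulled together, the settings above might look something like the fragment below. The setting names are the ones Celery used around the time of this post, and the numeric value is purely illustrative, so treat it as a starting point rather than a recommendation.

```python
# settings.py fragment (a sketch; values are illustrative).

# With DEBUG = True Django remembers every SQL query, which adds up
# quickly when a worker runs millions of tasks.
DEBUG = False

# Settings equivalent of celeryd's --maxtasksperchild: recycle each
# worker process after this many tasks to cap slow memory growth.
CELERYD_MAX_TASKS_PER_CHILD = 1000

# Leave this off unless you really want one email per failed task.
CELERY_SEND_TASK_ERROR_EMAILS = False

# Keep failures in the result backend even for tasks that ignore results.
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True
```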
Persistence & disk space
RabbitMQ stores messages intelligently so you don’t have to keep track of them. It’s very good at this. However, problems can arise when you’re queuing tasks faster than you’re processing them. Use rabbitmqctl which ships with RabbitMQ. If you see things like this:
% /usr/sbin/rabbitmqctl list_queues
Listing queues ...
celery 9958124
celeryevent 6841
...done.
There are probably going to be some issues. Ten million messages have to be stored somewhere. By default on CentOS, they’re stored in /var. RabbitMQ really doesn’t like it when it runs out of disk space for persistent messages, so be careful.
The new persistence engine in RabbitMQ 2.x handles this much better than before. In 1.x, the persistence log has to copy itself over every so often and copying multi-GB files all the time really slows the queue to a halt and adds to the problem of not processing tasks fast enough. On top of this, RabbitMQ writes a ton of logs, which is a good thing, but can backfire when disk runs out.
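To catch a backlog before the disk fills up, the output of rabbitmqctl list_queues can be parsed and checked against a threshold. The following is a minimal sketch: both function names and the 100,000 message threshold are made up for this example, and the parser only assumes the simple "name count" layout shown in the listing earlier in this section.

```python
def parse_list_queues(output):
    """Turn `rabbitmqctl list_queues` output into {queue_name: depth}."""
    depths = {}
    for line in output.splitlines():
        parts = line.split()
        # Data lines look like "<queue name> <message count>"; the
        # "Listing queues ..." and "...done." banner lines are skipped.
        if len(parts) == 2 and parts[1].isdigit():
            depths[parts[0]] = int(parts[1])
    return depths


def backed_up(depths, threshold=100000):
    """Return the names of queues holding more than `threshold` messages."""
    return sorted(name for name, count in depths.items() if count > threshold)
```

The text to parse could come from something like `subprocess.check_output(["/usr/sbin/rabbitmqctl", "list_queues"])`, run from cron with an alert whenever `backed_up` returns anything.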
Task sets
Celery’s task sets work like magic. Instead of this:
from tasks import process_item
for item in items:
    process_item.delay(item)
Use this:
from celery.task.sets import TaskSet
from tasks import process_item
job = TaskSet(tasks=[process_item.subtask((item,)) for item in items])
job.apply_async()
Note: the first parameter to subtask is a tuple of arguments to process_item.
General tips
- If you can make your tasks idempotent, meaning they can be run with the same parameters multiple times without any additional side effects, your life will be a lot easier. Django’s get_or_create works wonders.
- Try to break tasks into smaller subtasks. Instead of one 45 minute task, break it into 2,000 tasks that take a second or two.
- If you are clever with your logging, debugging things will be a lot easier. This is generally always true, but it becomes much more apparent with celery’s concurrency.
RPC4Django v0.1.8 is Available
Go get it!
Changes
- Added cross referenced Sphinx based documentation
- Fixed bug #570852 which caused incompatibilities with MongoDB because of the name clash with the variable is_rpcmethod.
- Fixed bug #658788 which caused CSRF issues with serve_rpc_request.
- Added out of the box authentication as per the blueprint on Launchpad.
Django-taggit versus Django-tagging
For some time, there was one re-usable Django tagging application, django-tagging, and that was it. If you didn’t like it, you rolled your own. It was certainly a decent application. You could pretty easily tag anything and it provided some decent features out of the box. However, very recently, another app showed up: django-taggit. This post is going to compare and contrast the two and explain why I decided to switch my work project to django-taggit.
Old school tagging
Back in March when we were just starting out, I went with django-tagging because that was the only tagging app around. Back then I didn’t think about it, but now 6 months later there have been no new updates or releases to django-tagging. I think this really led to the creation of django-taggit. Django-tagging had some nice features that were pretty useful, like a template tag for a tag cloud. While django-tagging got me up and running quickly, it wasn’t without its annoyances. Deleting an object left dangling references to its tags: because django-tagging used generic foreign keys, the tag relations were not deleted along with the object. There was an issue filed for this back in 2008 that was never resolved. It resulted in me having to override the delete method of every object that got tagged. If I had 3 objects that got tagged, I repeated the same snippet 3 times!
# Every tagged model with django-tagging needs this code to properly clean up
def delete(self):
    # Delete all associated tags.
    Tag.objects.update_tags(self, None)
    super(MyModel, self).delete()
The fact that such seemingly important functionality had not been added since 2008 pointed to the fact that django-tagging had been left fallow for a while.
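One way to avoid pasting that delete override into every model is a small mixin. The sketch below is not something django-tagging ships: `TagCleanupMixin` and its `tag_manager` attribute are hypothetical, and the `Fake...` classes are stand-ins so the example runs without Django. In a real project `tag_manager` would be `Tag.objects` and the base class a Django model.

```python
class TagCleanupMixin(object):
    """Clear an object's tags before deleting it."""

    # Hypothetical hook; with django-tagging this would be Tag.objects.
    tag_manager = None

    def delete(self, *args, **kwargs):
        # Same call as in the snippet above: passing None removes all tags.
        self.tag_manager.update_tags(self, None)
        super(TagCleanupMixin, self).delete(*args, **kwargs)


# --- Stand-ins so the sketch runs on its own -------------------------

class FakeTagManager(object):
    """Records which objects had their tags cleared."""

    def __init__(self):
        self.cleared = []

    def update_tags(self, obj, tag_names):
        if tag_names is None:
            self.cleared.append(obj)


class FakeModel(object):
    """Plays the role of a Django model with a delete() method."""

    deleted = False

    def delete(self):
        self.deleted = True


class Post(TagCleanupMixin, FakeModel):
    tag_manager = FakeTagManager()
```

The mixin has to come first in the list of base classes so that its delete runs before the model's own.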
New school tagging
At Djangocon, I first heard about django-taggit. Immediately, I liked the fact that the docs were a little bit more fleshed out than django-tagging’s docs. In addition, I found the search API for taggit to be much more intuitive.
## Django-tagging: requires a level of indirection through the TaggedItem model
from tagging.models import Tag, TaggedItem
from myapp.blog.models import Post
hacking_tag = Tag.objects.get(name='hacking')
TaggedItem.objects.get_by_model(Post, hacking_tag)
## Django-taggit: look directly through the model you are searching
from myapp.blog.models import Post
Post.objects.filter(tags__name__in=["hacking"])
The little things
The nicest part about django-taggit is that it integrates much better with the admin. To tag an object in the admin with django-tagging, I would need to figure out the primary key id of the object I want to tag, go to the TaggedItem admin and then tag it by id. It was unintuitive and error prone. With django-taggit, it’s as easy as editing an object the normal way. A “tags” field shows up and it explains that it simply accepts a comma-separated list.
The one feature I liked from django-tagging that taggit doesn’t implement is the tag cloud. I can understand that different folks want clouds done slightly differently and that it’s not a feature that has one right way to do it. However, it was pretty damn convenient.
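Rolling a basic tag cloud by hand is not much code once the tag counts are available. The sketch below uses simple linear scaling into a handful of font-size buckets. It is one possible approach, not how django-tagging did it, and `tag_cloud` is a name made up for this example.

```python
def tag_cloud(counts, steps=4):
    """Map each tag to a size bucket from 1 (least used) to `steps` (most used).

    `counts` is a dict of {tag_name: number_of_tagged_objects}.
    """
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    if lowest == highest:
        # Every tag is equally popular, so they all get the smallest size.
        return {tag: 1 for tag in counts}
    span = highest - lowest
    cloud = {}
    for tag, count in counts.items():
        # Linear scaling: lowest count -> bucket 1, highest -> bucket `steps`.
        cloud[tag] = 1 + (count - lowest) * (steps - 1) // span
    return cloud
```

A template can then turn the bucket number into a CSS class such as `tag-size-3` and let the stylesheet decide what each size looks like.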
All told, django-taggit seems to do the job that I want it to do and it stays out of the way otherwise. It’s much more intuitive to set up and use. It’s actively maintained and the docs are better. There’s nothing not to like. At the same time, I don’t want to say it’s the end-all. I’d love to see the django-tagging guys come back with some great new features because apps can always be better.
Django Security Update September 2010 Edition
Yesterday, the Django team released a security update. The post basically says it all. If you are on Django 1.2.1, UPDATE NOW!
The details
The issue is a standard non-persistent cross site scripting (XSS) exploit. Django implicitly trusted the cross site request forgery token, which is supposed to be a hexdigest based on your SECRET_KEY in settings.py, and echoed it back into the page without escaping it. However, cookies are simply stored on the client filesystem and they should generally be considered untrusted user input.
Exploit howto
First, set up a simple Django project that has the admin enabled. Visit your simple website and you’ll be issued a CSRF token that is saved in your cookie. Simply edit the token (with Edit this Cookie for Chrome maybe) and enter a script tag and save it. Reload the page and the script tag you entered will get echoed back unescaped and executed.
Edit: I should note that this vulnerability affects any form that uses a CSRF token, not just the admin.
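The underlying mistake is easy to reproduce outside of Django. The sketch below renders a hidden form field from a cookie value, once naively and once with escaping. The function names are invented for this example and this is not Django's actual code or its actual fix, just an illustration of why a cookie has to be treated like any other user input.

```python
import re
from html import escape

FIELD = "<input type='hidden' name='csrfmiddlewaretoken' value='%s' />"


def render_csrf_field_unsafe(token):
    """Echo the cookie value straight into the page (the vulnerable way)."""
    return FIELD % token


def render_csrf_field_safe(token):
    """Escape the cookie value before putting it into the page."""
    return FIELD % escape(token, quote=True)


def looks_like_token(token):
    """Extra check, assuming a real token only contains letters and digits."""
    return re.match(r"\A[0-9a-zA-Z]+\Z", token) is not None
```

With a tampered cookie such as `'><script>alert(1)</script>`, the unsafe version closes the attribute and injects a live script tag, while the safe version renders the same characters as inert text.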


