Django password security
Revision 16453 of Django improved the security of the password algorithm for the first time since the 0.90 days, years ago. This is a brief discussion of that change and of Django password schemes in general.
Worth its salt
Most people know that “good” passwords are at least 8 characters long and contain at least one uppercase letter, one lowercase letter, and one number. Let’s ignore special characters for now. This yields about 47 bits of entropy. The entire set of 8 character passwords could be reversed in about 36 hours assuming ~1B hashes per second. You could just burn your new rainbow table to a DVD and break everyone’s password. Easy as pie.
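Here is a back-of-the-envelope check of those numbers. It is only a sketch: the alphabet (letters and digits), the fixed length of 8, and the hash rate are the assumptions stated above, and the function names are mine. The hour count is very sensitive to the hash rate you plug in, so treat it as an order-of-magnitude estimate.

```python
import math
import string

# Assumed alphabet: uppercase, lowercase and digits (no special characters).
ALPHABET = string.ascii_letters + string.digits
LENGTH = 8
HASHES_PER_SECOND = 1_000_000_000  # the ~1B hashes/s assumed in the text


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy, in bits, of a uniformly random password."""
    return length * math.log2(alphabet_size)


def hours_to_exhaust(candidates: int, rate: int) -> float:
    """Hours needed to hash every candidate at `rate` hashes per second."""
    return candidates / rate / 3600


if __name__ == "__main__":
    total = keyspace(len(ALPHABET), LENGTH)
    print(f"{total:,} candidate passwords")
    print(f"{entropy_bits(len(ALPHABET), LENGTH):.1f} bits of entropy")
    print(f"{hours_to_exhaust(total, HASHES_PER_SECOND):.1f} hours at the assumed rate")
```

Whatever rate you pick, the point stands: the whole space falls in hours or days, not years.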
Unfortunately for password crackers, this hasn’t worked for years because of salted passwords. Django uses salted passwords: when a user types in their password, that user’s random salt is prepended to it, the result is hashed, and the hash is compared with the salted hash stored in the database. It works like this:
sha1(salt + password) = salted_hashed_password
Each user gets a different random salt, so a leaked password database cannot be easily reversed using a rainbow table. Django switched from using a 5 character salt composed of [a-f0-9] (20 bits of entropy) to a 12 character salt made up of [a-zA-Z0-9] (over 71 bits). Formerly, the salt was simply the first 5 characters of a sha1 hash of a call to random.random(), for about a million unique possible salts. Breaking an old password database was about a million times harder than breaking unsalted passwords, which made it prohibitive but not impossible. With the new salt, the number of possible salts is vastly larger.
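To make the scheme concrete, here is a minimal sketch of both salt styles and the salted hash. This is my own illustration of the description above, not Django's actual code: the function names are hypothetical, and I am assuming `random.SystemRandom` as the "system source of randomness".

```python
import hashlib
import random
import string

SALT_CHARS = string.ascii_letters + string.digits  # [a-zA-Z0-9]


def old_style_salt() -> str:
    """Old scheme as described: first 5 hex characters of sha1(random.random())."""
    seed = str(random.random()).encode("utf-8")
    return hashlib.sha1(seed).hexdigest()[:5]


def new_style_salt(length: int = 12) -> str:
    """New scheme as described: 12 characters from [a-zA-Z0-9], OS randomness."""
    rng = random.SystemRandom()
    return "".join(rng.choice(SALT_CHARS) for _ in range(length))


def salted_sha1(salt: str, password: str) -> str:
    """sha1(salt + password), returned as a hex digest."""
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same password hashes differently for every user because salts differ.
    for _ in range(2):
        salt = new_style_salt()
        print(salt, salted_sha1(salt, "hunter2"))
```

Two users with the same password end up with different stored hashes, which is exactly what defeats a precomputed table.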
Remaining weaknesses
If your database leaks, salted passwords will protect the entire set of hashes from being reversed at once. However, they will not protect a specific hash from being reversed, since the salt is stored in the clear. An attacker does not need to reverse every hash to do some damage; reversing the administrator’s password is enough. If you look at what gets stored in Django’s User table, it looks like this:
sqlite> SELECT username, password FROM auth_user LIMIT 1;
admin|sha1$760d0$87b86efd5a9b6f614a9854fda98471ab82d1a
The password field stores the hashing method (sha1), the salt, and the hashed password, separated by ‘$’. Given the user’s salt, we could easily check all 8 character passwords for that salt in the same 36 hours. This doesn’t change whether the salt is 5 or 12 characters. Because sha1 is designed to be “fast” (it is also used for things like checksums), it doesn’t offer much protection here. A better solution is a “slow” hashing algorithm designed specifically for password hashing, like bcrypt or PBKDF2.
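A rough sketch of what that attack looks like, and of the "slow" alternative, using only the standard library. The helper names, the tiny candidate list, and the PBKDF2 iteration count are my own assumptions for illustration; this is not how Django implements any of it.

```python
import hashlib
import hmac


def split_stored(stored: str):
    """Split 'algorithm$salt$hash' into its three parts."""
    algorithm, salt, digest = stored.split("$", 2)
    return algorithm, salt, digest


def check_sha1(stored: str, candidate: str) -> bool:
    """Check one guess against a stored sha1 entry, using the salt in the clear."""
    algorithm, salt, digest = split_stored(stored)
    if algorithm != "sha1":
        raise ValueError("this sketch only handles sha1 entries")
    guess = hashlib.sha1((salt + candidate).encode("utf-8")).hexdigest()
    return hmac.compare_digest(guess, digest)


def crack(stored: str, candidates):
    """Try each candidate in turn; one cheap sha1 call per guess."""
    for candidate in candidates:
        if check_sha1(stored, candidate):
            return candidate
    return None


def slow_hash(password: str, salt: str, iterations: int = 100_000) -> str:
    """PBKDF2-HMAC-SHA256; the iteration count here is an arbitrary assumption."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
    )
    return raw.hex()


if __name__ == "__main__":
    salt = "760d0"
    stored = "sha1$%s$%s" % (salt, hashlib.sha1((salt + "Passw0rd").encode()).hexdigest())
    print(crack(stored, ["letmein", "Passw0rd", "hunter2"]))
```

With PBKDF2 every single guess costs the attacker all of those iterations, so the same search becomes proportionally more expensive.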
More generally
There have been numerous tickets (#5787, #5600, #15367), several proposals, and even a project that duckpunches Django to add bcrypt. Parts of these proposals (namely using a system source of randomness where available, and a longer salt) have already been implemented as part of changeset 16453. A long-term solution is to make the hashing algorithm pluggable, similar to the way database backends are pluggable. This would make it easy to swap out a particular algorithm if weaknesses are discovered, and would let different installations use different algorithms based on their requirements.
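To show what "pluggable" could mean in practice, here is a small sketch of a hasher registry keyed by the algorithm name that already prefixes each stored password. Everything here (the `HASHERS` dict, `register`, `encode`, `verify`, the algorithm labels) is hypothetical and only illustrates the idea; it is not an existing Django API.

```python
import hashlib
import hmac

# Hypothetical registry: algorithm name -> function(salt, password) -> hex digest
HASHERS = {}


def register(name):
    """Decorator that adds a hashing function to the registry under `name`."""
    def decorator(func):
        HASHERS[name] = func
        return func
    return decorator


@register("sha1")
def sha1_hasher(salt, password):
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()


@register("pbkdf2_sha256")
def pbkdf2_hasher(salt, password):
    # Iteration count is an arbitrary value chosen for this sketch.
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt.encode("utf-8"), 10_000)
    return raw.hex()


def encode(algorithm, salt, password):
    """Build the 'algorithm$salt$hash' string (salt must not contain '$')."""
    return "$".join([algorithm, salt, HASHERS[algorithm](salt, password)])


def verify(stored, password):
    """Look up the hasher named in the stored value and compare digests."""
    algorithm, salt, _digest = stored.split("$", 2)
    return hmac.compare_digest(encode(algorithm, salt, password), stored)
```

Because the algorithm name travels with each stored value, old sha1 entries keep verifying while new passwords can be written with a stronger hasher.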
Django security overview docs
A brief security overview I collaborated on made it into the Django trunk! Let me know if you think I’m missing something.
If I somehow manage to get diligent, this might be just a precursor for a talk at Djangocon.
Django-taggit versus Django-tagging
For some time, there was one reusable Django tagging application, django-tagging, and that was it. If you didn’t like it, you rolled your own. It was certainly a decent application: you could pretty easily tag anything, and it provided some decent features out of the box. However, very recently, another app showed up: django-taggit. This post compares and contrasts the two and explains why I decided to switch my work project to django-taggit.
Old school tagging
Back in March when we were just starting out, I went with django-tagging because that was the only tagging app around. Back then I didn’t think about it, but now, 6 months later, there have been no new updates or releases to django-tagging. I think this really led to the creation of django-taggit. Django-tagging had some nice features that were pretty useful, like a template tag for a tag cloud. While django-tagging got me up and running quickly, it wasn’t without its annoyances. Deleting an object left dangling references to its tags: because django-tagging used generic foreign keys, the tag relationships were not deleted along with the object. There was an issue filed for this back in 2008 that was never resolved. It resulted in me having to override the delete method of every object that got tagged. If I had 3 objects that got tagged, I repeated the same snippet 3 times!
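# Every tagged model with django-tagging needs this code to properly clean up
from django.db import models
from tagging.models import Tag

class MyModel(models.Model):
    def delete(self):
        # Delete all associated tags before deleting the object itself.
        Tag.objects.update_tags(self, None)
        super(MyModel, self).delete()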
The fact that such seemingly important functionality had not been added since 2008 pointed to the fact that django-tagging had been left fallow for a while.
New school tagging
At Djangocon, I first heard about django-taggit. Immediately, I liked the fact that the docs were a little bit more fleshed out than django-tagging’s docs. In addition, I found the search API for taggit to be much more intuitive.
## Django-tagging: requires a level of indirection through the TaggedItem model
from tagging.models import Tag, TaggedItem
from myapp.blog.models import Post
hacking_tag = Tag.objects.get(name='hacking')
TaggedItem.objects.get_by_model(Post, hacking_tag)
## Django-taggit: look directly through the model you are searching
from myapp.blog.models import Post
Post.objects.filter(tags__name__in=["hacking"])
The little things
The nicest part about django-taggit is that it integrates much better with the admin. To tag an object in the admin with django-tagging, I would need to figure out the primary key id of the object I want to tag and then go to the TaggedItem admin and then tag it by id. It was unintuitive and error prone. With django-taggit, it’s as easy as editing an object the normal way. A “tags” field shows up and it explains that it simply accepts a comma separated list.
The one feature I liked from django-tagging that taggit doesn’t implement is the tag cloud. I can understand that different folks want clouds done slightly differently and that it’s not a feature that has one right way to do it. However, it was pretty damn convenient.
All told, django-taggit seems to do the job that I want it to do and it stays out of the way otherwise. It’s much more intuitive to set up and use. It’s actively maintained and the docs are better. There’s nothing not to like. At the same time, I don’t want to say it’s the be-all and end-all. I’d love to see the django-tagging guys come back with some great new features because apps can always be better.
Django Security Update September 2010 Edition
Yesterday, the Django team released a security update. The post basically says it all. If you are on Django 1.2.1, UPDATE NOW!
The details
The issue is a standard non-persistent cross site scripting (XSS) exploit. Django explicitly trusted the cross site request forgery token which is supposed to be a hexdigest based on your SECRET_KEY in settings.py. However, cookies are simply stored on the client filesystem and they should generally be considered untrusted user input.
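A minimal sketch of the underlying mistake, using plain string formatting and the standard library's `html.escape`. This is not Django's actual template code or its patch; the two function names are hypothetical and only contrast echoing a cookie value verbatim with escaping it first.

```python
from html import escape

FIELD = "<input type='hidden' name='csrfmiddlewaretoken' value='%s' />"


def render_token_unsafe(token: str) -> str:
    """Echo the cookie value straight into the page (the vulnerable pattern)."""
    return FIELD % token


def render_token_escaped(token: str) -> str:
    """Treat the cookie as untrusted input and HTML-escape it, quotes included."""
    return FIELD % escape(token, quote=True)


if __name__ == "__main__":
    # A tampered cookie value that closes the attribute and injects a script tag.
    tampered = "'><script>alert(1)</script>"
    print(render_token_unsafe(tampered))
    print(render_token_escaped(tampered))
```

In the first output the script tag survives intact; in the second it is rendered inert because the angle brackets and quotes are replaced with HTML entities.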
Exploit howto
First, set up a simple Django project that has the admin enabled. Visit your simple website and you’ll be issued a CSRF token that is saved in your cookie. Simply edit the token (with Edit this Cookie for Chrome, maybe), enter a script tag, and save it. Reload the page and the script tag you entered will get echoed back unescaped and executed.
Edit: I should note that this vulnerability affects any form that uses a CSRF token, not just the admin.
Django and asynchronous jobs
I’m going to focus on RabbitMQ & Celery: what they buy you, and why and when to use them. Eric’s blog always seems to be a couple months ahead of mine, so instead of rehashing what he wrote, I’ll let you read his blog and then come back and finish my complementary post. Go there now. I’ll wait.
Here’s the mini-rehash for those who didn’t actually go read it. Basically, RabbitMQ is a general purpose message queue system based on the AMQP protocol. When you need something to get executed, you simply queue up a message with RabbitMQ. It won’t get lost if Rabbit restarts, so you can be pretty confident that your tasks will eventually get executed or you’ll be able to find out why they didn’t. Celery (and django-celery) is a Python library for queuing up tasks to be executed and for actually executing these tasks asynchronously. It gets its tasks from RabbitMQ, although it does work with other queues as well.
Why RabbitMQ & Celery?
Not everyone needs to execute asynchronous tasks. If you’re reading this and asking yourself “why would I ever need that,” you probably don’t need it. My project involves data analysis that takes minutes for each item in the database. In addition, I periodically re-analyze items in the database. At first, I built a cronjob (using Django management commands) which would wake up and check for new items queued in the database every minute. Then, I built a second cronjob for re-analysis. As the system grew, this became hokey and had issues anytime the database crashed during this analysis.
Anytime your Django view connects to some external service that may or may not be up (source control, web services) or may take a while (shell commands, massive database queries, etc.), it’s probably best done asynchronously. It’s also great for anything that needs to be executed periodically at a particular interval (caching, expensive calculations). Celery also provides built-in ways to retry failed tasks a specified number of times at a specified interval which is particularly useful for ensuring that things actually get done.
Setup & integration with Django
My Django views didn’t change much after I switched to Celery. Instead of adding an item to the database when I want to queue up an item for analysis, I simply queue up the execution of a @task with django-celery. The particular tasks that your application can handle are simply placed in a tasks.py file in your Django applications. To process the tasks, you simply run python manage.py celeryd. It can also be set up to run at boot time using init.d (Ubuntu/Debian, Redhat/Fedora). Celeryd workers can connect to RabbitMQ from the same server or from one or more different servers in order to distribute the load.
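## old views.py
from django.shortcuts import render_to_response

def queue_task(request, data):
    """
    A cronjob will poll for new queued items periodically and process them
    """
    item = NewQueueItem(data)
    item.save()
    return render_to_response('success.html')

## new views.py
from django.shortcuts import render_to_response
from tasks import process_item

def queue_task(request, data):
    # Hand the work to a Celery worker instead of saving a row for cron to find
    process_item.delay(data)
    return render_to_response('success.html')

## tasks.py
from celery.decorators import task

# Retry a failed task up to 3 times, waiting 5 minutes between attempts
@task(max_retries=3, default_retry_delay=5*60)
def process_item(data, **kwargs):
    ...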
The nitty gritty
One useful bit that I had to dig to find in the Celery documentation was on locking a particular task so that two different workers don’t work on the same thing at the same time. This could happen in my application if two users queued up analysis on the same item for example. The trick is to use Django’s cache framework and lock a particular item in the cache.
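The locking trick can be sketched roughly like this. To keep the example self-contained I am using a tiny in-memory stand-in for the cache; it only mimics the behavior the trick relies on, namely that `add()` stores a key and returns True only if the key is not already present. The class, the lock key format, the expiry, and `analyze_item` are all hypothetical names of mine. In a real project you would use Django's cache with a shared backend such as memcached, since an in-process dict is not visible to other workers.

```python
import time


class InMemoryCache:
    """Stand-in with add/delete semantics like a cache backend (single process only)."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout):
        """Store `key` only if absent or expired; return True if it was stored."""
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


cache = InMemoryCache()
LOCK_EXPIRE = 60 * 5  # assumed: lock expires after 5 minutes if a worker dies


def analyze_item(item_id, analyze):
    """Run `analyze(item_id)` unless another worker already holds the lock."""
    lock_id = "analysis-lock-%s" % item_id
    if not cache.add(lock_id, "locked", LOCK_EXPIRE):
        return False  # someone else is already analyzing this item
    try:
        analyze(item_id)
    finally:
        cache.delete(lock_id)  # always release, even if analysis fails
    return True
```

The expiry matters: if a worker crashes while holding the lock, the item is not locked forever, only until the timeout passes.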
A word on Djangocon
Unlike last year, I convinced work to send me to Djangocon in Portland next week. I’ll probably do some live blogging on some of the interesting topics. If you’ll also be there, say hi!


