Djangocon Day One

As I promised, here’s some (semi-live) blogging from Djangocon.

The first talk of the day was Scaling the World’s Largest Django Application given by the guys at Disqus (slides). The basic gist is how they scaled up Django to handle a bajillion requests per month. One thing that I noticed is that to reach this scale, they are effectively throwing away referential integrity in their database. When asked about it, one of the guy mentioned that they have some scripts to verify that things have some level of integrity which I thought was a little scary. I was happy to see that they are basically measuring everything. On top of the usual continuous integration stuff like unit tests, lint (Pyflakes, actually) they are logging every query and every traceback using a package they opensourced named Sentry. In addition, the brief mention of the save() method’s concurrency issues was interesting.

Jeff Balogh’s talk on Switching addons.mozilla.org [AMO] from CakePHP to Django (slides – pdf) was on the similar subject of switching the high traffic site AMO to Django from PHP. My favorite point from this talk is how they handle the classic stale DB replication problem of a user submitting new data (to the master) and then not seeing their data (from the slave). Basically, they use a custom Django router that detects a POST and switches that user’s session to always read and write from the master which is pretty damn clever. Mozilla also de-normalized their database in order store a reference to the latest add-on (a classic problem). However, Jeff did mention that they might switch that to storing that info in cache. The main gist of this talk was cache everything.

Russell Keith-Magee’s So you want to be a core developer? and James Bennett’s Topics of Interest (going on now — see live blogging) are both on the future of Django and how to get involved and what needs to change in the Django community. The Django people need more people to get involved. James said there are only 14 committing developers and Russell said that to get Django 1.2 out the door he had to review tickets for 5 hours a night for 2 weeks because nobody else was doing it. As James said, their bus numbers — the number of people who get hit by a bus and then you’re screwed — is frighteningly small. There’s only 2 people who know Django Oracle support.

Themes from the con
  • Git seems to be winning in the DVCS space. It really seems like git and github are taking over and Bazaar and Mercurial are being left at the wayside. I’ve seen tons of links to github and zero to bitbucket or launchpad. James Bennett just now briefly mentioned bitbucket and launchpad.
  • Django developers think that database referential integrity is overrated. Multiple people mentioned that integrity is a farce when things scale insanely.
  • Deploy early and often. The Mozilla guys deploy at least weekly. The Disqus guys deploy daily or more often. Although Jeff Balogh from Mozilla didn’t say it, I wonder if this means that like Disqus they are running out of trunk rather than branching and releasing.
  • Celery (see my previous post) is awesome and everybody seems to be adopting it.
  • If you aren’t using Pip and Virtualenv, you should (previous post). However, you probably shouldn’t deploy a production box from Pip and Pypi.

Extending Distutils for Repeatable Builds

Distutils is Python’s built-in mechanism for packaging and installing Python modules. It is very convenient for packaging up your source code, scripts and other files and creating a distribution to be uploaded to pypi as I’ve mentioned before. Distutils was discussed (pdf) at PyCon last year and it looks like there are efforts afoot to improve it to add some much needed features like unittesting and metadata. Add-on packages like pip add additional features like uninstallation and dependency management but nothing guarantees that your users have it. Although Python’s packaging and distribution model beats PHP’s hands down, there is still a lot of room for improvement to make it seamless.

Release management

In essence, these issues and enhancements boil down to making release management easier. When releasing your package, you want to make sure that it contains all the appropriate files, is tested and can be installed easily. Distutils helps with the installation, pip with the dependencies and virtualenv (a topic for a later post) helps a lot with testing package interactions. But what about unittests? What about cleaning up after setup.py? What about generating documentation or other files?

Extending distutils

Until all these features get put into distutils, you have to extend it yourself in setup.py. Fortunately, this is not very complicated and can buy you some reliability in your build process. Adding a command like python setup.py test is pretty trivial:

The same sort of functionality could be used to verify any prerequisites not already checked by distutils or pip, generate documentation without external dependencies like Make (although Django supports Python 2.3 before this functionality was available) or to create a uniform way to take source control diffs and submit patches. Executing these commands from one place makes the whole process more consistent and easy to understood. Hopefully the new enhancements to distutils will make the process even better.

Packaging and Sharing Django Applications

When I packaged up the first version of rpc4django and sent it to the package index (pypi) with distutils, I thought I was doing everything right. I’d packaged my django application (if you’re unclear on applications vs. projects, see the terminology) in such a way that it could be used with a variety of projects and required almost no configuration to get off the ground. I filled out all the metadata in the distutils setup() function and I made sure that my package could be installed via easy_install. I even implemented a fix that I stole from Django’s setup.py to make sure the “install data” (including Django templates) get installed in the right place on the flawed Mac OS X versions of python. Little did I know, I was just getting started.

Easy Install

Easy install is a pretty convenient way to install packages and making your packages easier to install will make sure that they get installed more frequently by more users. Provided you have used distutils or setuptools for your python package and have a working setup.py, you should be able to easily upload your package to the python package index and install it with easy install. Where easy install doesn’t shine, however, is how it automatically creates .egg files out of packages.

Making eggs out of libraries seems like a great idea on paper, but when a Django application is packaged as an egg, the default settings.py TEMPLATE_LOADERS will not be able to load it. The additional loader django.template.loaders.eggs.load_template_source must be enabled. Easy install will only create an egg if setuptools “determines” that it can safely create it. Alternatively, the zip_safe setting can be set to False in the setuptools.setup() function. To make it easier on the users, making the easy install unpack all your templates (one day setup tools will identify Django apps) will require less configuration and get more users.

This issue, which I noticed after the first rpc4django release will be fixed in the next release in a few days.

Package Index (Pypi)

I got my module into pypi and then I realized that there’s a lot of issues that go with it. I made sure that setuptools/distutils summary and description were set, but until I read the somewhat erroneously named cheeseshop tutorial, I didn’t understand that pypi sometimes subtly and sometimes overtly steers people towards “better” packages.

Start with the distutils tutorial on distributing packages.This will show you what needs to go into the setup.py and how to register, which really is the first step. However, people browsing pypi can come across a well documented library with installation instructions and installers directly on the pypi page like setuptools, or they come across a listing that is little more than link to the package’s webpage. To get that pretty documentation, you have to spend a few minutes to learn reST and that has to be set into your long_description in setup.py. To steal a good tip from setuptools, put the structured text in the README and read it into your setup.py.

Cheesecake

The last aspect for today has to do with searching for packages on pypi. When you search, the results are returned based on a “score”. At first, like other search engines, I thought this score might be some sort of relevance number related to the search. It isn’t. It is the cheesecake index. This score is based on the quality (Kwalitee) of the package [Edit: apparently the score is based on relevance and the score listed is not based on the cheesecake index]. If python setup.py install works with your package, it helps your score. If you have unit tests and a docs/ directory, you get some more points. If you get a good pylint score or your code complies with PEP8, you get some more.

Overall, it’s not a bad idea. Linting and pep8 and a changelog file don’t make a package great, but easy installation, good documentation and unit tests are probably pretty strongly correlated with a higher quality package.

Upcoming RPC4Django Release

By my own calculations, rpc4django-0.1.0 got about 54% of the cheesecake index and that merits the #7 spot when people search for “rpc”. Even before I knew about cheesecake, I was running pep8 and pylint and I’m pretty good about installation, documentation and unit testing. Expect improvements in the next version along with the following features (which will be in my CHANGELOG for an extra couple cheesecake points):

  • reST documentation instead of what I rolled myself
  • the rpc method summary could use some reST
  • fix my easy install problems with templates
  • package improvements, including to my pypi page
  • test compatibility with more systems and python versions — it looks good down to 2.4, but there are some unit test failures related to the way exceptions are presented with repr()
  • analyzing post data to see if it is json or xml to improve the dispatching library compatibility — it’ll still use content type if it is set properly
  • if I get a chance, I’ll add a javascript library that allows testing the methods directly from the method summary