Extending Distutils for Repeatable Builds

Distutils is Python’s built-in mechanism for packaging and installing Python modules. It is very convenient for packaging up your source code, scripts and other files and creating a distribution to be uploaded to pypi as I’ve mentioned before. Distutils was discussed (pdf) at PyCon last year and it looks like there are efforts afoot to improve it to add some much needed features like unittesting and metadata. Add-on packages like pip add additional features like uninstallation and dependency management but nothing guarantees that your users have it. Although Python’s packaging and distribution model beats PHP’s hands down, there is still a lot of room for improvement to make it seamless.

Release management

In essence, these issues and enhancements boil down to making release management easier. When releasing your package, you want to make sure that it contains all the appropriate files, is tested and can be installed easily. Distutils helps with the installation, pip with the dependencies and virtualenv (a topic for a later post) helps a lot with testing package interactions. But what about unittests? What about cleaning up after setup.py? What about generating documentation or other files?

Extending distutils

Until all these features get put into distutils, you have to extend it yourself in setup.py. Fortunately, this is not very complicated and can buy you some reliability in your build process. Adding a command like python setup.py test is pretty trivial:

The same sort of functionality could be used to verify any prerequisites not already checked by distutils or pip, generate documentation without external dependencies like Make (although Django supports Python 2.3 before this functionality was available) or to create a uniform way to take source control diffs and submit patches. Executing these commands from one place makes the whole process more consistent and easy to understood. Hopefully the new enhancements to distutils will make the process even better.

Using Django for Intranet Applications

One thing I end up doing quite a bit of at work is developing custom web applications that sit on the company intranet. These aren’t your basic timecard application or company web portal. Instead, these customized applications do everything from reports and dashboards from our bug tracking software to providing a web service API to our test case management solution. This post is about chronicling some of our forays into intranet apps.

Every intranet app needs authentication

If your company is anything like mine, you have a huge Active Directory system and your webservers are protected by single-sign-on (SSO). Django plugs into that nicely with the RemoteUserMiddleware so that users don’t have to remember Yet Another Password for your app. My password to our off the shelf commercial bug tracking software is still “welcome” and my password to our “Agile” project and task tracker is “test”. I’m already on the corporate intranet. Why should I have to authenticate twice? With Django and RemoteUserMiddleware, your users are automatically logged in if they’re authenticated. It seems relatively trivial, but it greatly enhances the user experience to not have to remember the password to the /admin site.

Who needs /admin

It seems that virtually every application has some sort of need for admin functionality. We used to deploy phpMyAdmin on most of our web hosts to administer our content. It was dove hunting with a bazooka. There was little fine grained control and we ran into the same issue of re-authenticating to access the admin site. Now an admin site is not unique to Django and it exists in virtually every major web framework in every major language. This point can be taken as an overwhelming endorsement of using some framework over using no framework at all. However, Django’s admin site is easy and very customizable in case you need to pretty up your admin site because it will be accessed by a wider audience.

Intranet apps have a way of becoming internet apps

You never know management decides some piece of software that was never intended to be used externally suddenly is a must have for some outside group and it needs to be internationalized (into Japanese?!). Suddenly, all the code has to go through open source compliance, export compliance, code scans, and due to licensing restrictions management doesn’t want to ship with MySQL. Unfortunately, that software was written in Java and PHP with no framework and quite a few open source libraries that we couldn’t ship. The transition was much more painful than if we would have just used Django from the start.

Merging and splitting apps

I’ve released and deployed a number of web applications, but the real nightmare comes when apps are merged together or one app is broken apart. This is where Python’s packaging and the Django concept of splitting your project into multiple individual apps really shines. I’ve run into this from a couple of different sides. I’ve had to take multiple PHP applications and merge them together into a single deployment. There was a huge mess with including common code and the solutions aren’t great. You’re either messing with include_dir in php.ini — a huge nightmare when managing multiple deployments with different includes, or you’re stuck modifying every include statement to pickup libraries from a common location. Splitting apps up runs into similar issues. Packaging separable components into different apps used by your project really is the way to go and Django works with this very well.

I’ve developed web apps in Java, Perl and PHP and by far I’ve been happiest with Python and Django. There are other great frameworks out there for these languages and using them could definitely help alleviate some of these issues, but the Django solution fits together better than anything I’ve seen from Struts, CodeIgniter or one of the dozens of other frameworks out there. For me, there has to be a pretty compelling argument not to do new apps with Django.

Lessons Learned: Releasing A Django Application

As you may be aware, I released a Django application, RPC4Django, publicly a little while ago. This is what I learned.

Make It Easy

You efforts should be targeted at making adoption of your software as easy as possible. When developers look for a package that accomplishes a goal, they aren’t going to spend much time looking at any individual package. Usually a quick google search or a search in pypi and then they might spend a few minutes glancing at a package before moving on to the next one. There are a couple keys to making it easy:

  • Have easy documentation on installation, configuration and the license.
    This should be on both your webpage as well as on pypi. More on this later.
  • Give demo code or better yet, an actual demo.
    Quite often the one sentence summary on pypi (if you don’t provide a long description) is not sufficient to tell people why they should use your package.
  • List the package’s compatiblity.
    Considering very few packages are compatible between 2.5 and 3.0 or even 2.3 and 2.5, I’m very surprised by this. Nothing is more frustrating than thinking you’ve found what you need just to have installation (or worse, execution) fail.
Test Early and Test Often

When I released RPC4Django, it had a reasonable, but not great, unit tests with it. With the future releases, those improved significantly and a few more bugs were caught. Being able to quickly test python 2.4, 2.5 and 2.6 by having a fully built unit test suite was awesome. However, unit tests don’t catch everything. One bug that I did not catch for a while was a bug relating to how Django worked with mod_wsgi on Apache. No amount of unit testing or Django views testing (which is sort of a hybrid of unit and integration testing) would have caught this. It only got caught when the code was pushed to a production server.

Documentation Conundrums

Documentation usually requires keeping data in a number of places. Any project of considerable size has READMEs, a license file, HTML documentation, tutorials, a setup.py long description, class and module documentation, and more. There should be as few authoritative places for package information as possible. In the first version of RPC4Django, I rolled my own HTML documentation and and README basically told the user to read the real documentation. This solved the whole problem of duplicated documentation, but in a rather unenlightened way.

After reading more exhaustively about reST and looking at how other packages solve this problem, I found a better way. With reST, it is possible to include other documents like so:

What this allows for and what RPC4Django does now is to enable one document to be built of many documents. In this way, I keep the license in one file, the installation in another, the changelog in another and they all are included into my README which can be used to generate my HTML documentation.

If my project was even larger, I might do what both the Python project and the Django project do. They include reST documents in their source tree. These include walkthroughs, tutorials, introductions to classes and more. Sphinx, a python package built to document the python library itself, can create attractive documentation hierarchies directly from simple text documents (which are themselves human readable!). Python uses it for all of their documentation and the Django project uses it for basically the entire documentation section of their webpage — including tutorials. You wouldn’t know it since the pages are pretty attractive but they appear to be generated directly out of subversion on a nightly basis. This keeps all the documentation together and makes sure your website, HTML documentation and source documentation stay in sync.

Don’t Get Discouraged

Just because your package is now on pypi doesn’t mean people are going to flock to it and download it. I think I’ve put together a pretty solid and useful package but I have only a few downloaders and I haven’t gotten much feedback. I intend to power through and start a new project.

Deploying Django Powered Web Applications

Terminology

Firstly, there’s a little problem of terminology to take care of. What many people call a web application, Django calls a “project”. Instead, the Django team uses the term “application” to describe a web application component that can be deployed into one of many projects. To describe it in the wordpress paradigm, wordpress could be a Django project (if it weren’t written in php) but the blogging component, the tagging component and the themes manager might all be separate Django applications.

This distinction really pushes the concept of re-using components. For example, once some one were to write a tagging Django application, the same app could be deployed for photo tagging, blog tagging and other types of content management. These applications are supposed to be completely contained and include their seed data (fixtures), database models, and templates.

The issue and what brings me to the main part of my post is what to do with the media? Should an application include its own images, css, javascript? What’s the best way to deploy them in a convenient way for being served?

Deployment

Packaging python modules is a relatively trivial task and there is a well defined approach for it: distutils. This creates a more or less standard installer that allows anyone to install your package with a single command. From there, it can be put into the python package index and easy_install can install your python package (and dependency packages) with a single command. This works great for python packages like Django and BeautifulSoup, but how would it work for a whole web application? It made me wonder how Ellington, the flagship Django product, does it.

When distributing a python module, it makes sense to support as many platforms and configurations as possible. However, when deploying a full web application into a production environment, it makes sense to restrict your platform to what is tried and true. Ellington’s website, for example states:

Ellington takes advantage of the most secure and flexible open-source technology available: Apache for web serving, Python for programming, PostgreSQL for data, all optimized to work together on a stable, high-performance Linux platform.

Ellington isn’t intended to run on a wide variety of platforms even though it probably could. It is meant to run a production grade newspaper and therefore they specify its exact dependencies — probably the specific versions of apache, postgres, python and even linux!

So what have I learned about deployment and what do I do with all the media? After browsing django-users and the blogosphere, sticking with distutils is a great idea.  I think that packaging each separate Django application is good idea. Each package is completely self contained and includes its media, templates, and code. In addition, the project settings should be minimal and possibly contained in another easily deployed python module controlled by distutils. In terms of fitting the whole thing together and deploying end to end, this is where the native package manager comes into play. This is the best way to manage both python and external dependencies. Rpm, msi or deb installers could fetch all the appropriate python modules (your Django applications), install the right version of your database and web server, sync your database, create the symbolic links to your media and even fill out your basic settings. For larger installations that require the database to be split from the python code and from the static media, this process still makes sense with few changes.