Why You Should Be Using Pip and Virtualenv

In a previous post, I promised to write about Pip and Virtualenv and I’m now finally making good on that promise. Others have done this before, but I think I have a little to add. If you develop a Python module and you don’t test it with virtualenv, don’t make your next release until you do.

Configuring the environment

Virtualenv creates a Python environment that is segregated from your system-wide Python installation. In this way, you can test your module without any external packages mucking up the result, add different versions of dependency packages and generally verify the exact set of requirements for your package.

To create the virtual environment:
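Something along these lines does the trick (a minimal sketch; I’m assuming virtualenv is already installed, and testarea is just the directory name I chose):

    virtualenv testarea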

This creates a directory testarea/ that contains directories for installing modules and a Python executable. Using the virtual environment:
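Roughly (on Linux or OS X, where the activate script lives under bin/):

    source testarea/bin/activate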

Sourcing activate will set environment variables so that only modules installed under testarea/ are used. After setting up the environment, any desired packages can be installed (from PyPI):
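For example, pulling in Django (any PyPI package name works here):

    pip install Django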

Packages can also be uninstalled, specific versions can be installed or packages can be installed from the file system, URLs or directly from source control:
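A few illustrative invocations (the version number, path and URLs below are placeholders, not recommendations):

    pip uninstall Django
    pip install Django==1.0.2
    pip install /path/to/MyPackage-1.0.tar.gz
    pip install http://example.com/MyPackage-1.0.tar.gz
    pip install -e git+https://github.com/django/django.git#egg=Django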

Pip is worth using over easy_install for its uninstall capabilities alone, but I should mention that pip is actively maintained while setuptools is mostly dead.

When you’re done with the virtual environment, simply deactivate it:
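The activate script adds a deactivate command to your shell for exactly this:

    deactivate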

Do it for the tests

Testing with virtualenv

While the segregated environment that virtualenv provides is extremely well suited to getting the correct environment up and running, it is just as well suited to testing your application under a variety of different package configurations. With pip and virtualenv, testing your application under three different versions of Django is a snap and it doesn’t affect your system environment in the slightest.
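As a sketch, one round of testing against a particular Django version looks something like this (the version number and the manage.py test runner are just stand-ins for whatever your project uses); repeat with a fresh environment for each version you want to cover:

    virtualenv django-1.0
    source django-1.0/bin/activate
    pip install Django==1.0.2
    python manage.py test
    deactivate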

Dependencies made easy

My favorite feature of pip is the ability to create a requirements file based on a set of packages installed in your virtual environment (or your global site-packages). Creating a requirements file can be done automatically using the freeze command for pip:
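For example, capturing the current environment to a file (requirements.txt is the conventional name, not a requirement):

    pip freeze > requirements.txt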

Note that wsgiref will always appear in pip’s output; it is a standard library package that ships package metadata, so pip lists it alongside your installed packages. The requirements file is used as follows:
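Assuming the file from the freeze example above:

    pip install -r requirements.txt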

The requirements file can be version controlled both to aid in installation and to capture the exact versions of your dependencies directly where they are used rather than after the fact in documentation that can easily become out of date. The requirements file can be used to rebuild a virtual environment or to deploy a virtual environment into the machine’s site-packages. Pip and virtualenv are exceptionally easy to use and there’s really no excuse for a Python packager not to use them.

Note: I’m working on a fairly large application for work. When it is finished, I will release a post-mortem that will also function as an update to my post about packaging and distributing.

Extending Distutils for Repeatable Builds

Distutils is Python’s built-in mechanism for packaging and installing Python modules. It is very convenient for packaging up your source code, scripts and other files and creating a distribution to be uploaded to PyPI, as I’ve mentioned before. Distutils was discussed (pdf) at PyCon last year and it looks like there are efforts afoot to improve it and add some much-needed features like unit testing and metadata. Add-on packages like pip add features like uninstallation and dependency management, but nothing guarantees that your users have them. Although Python’s packaging and distribution model beats PHP’s hands down, there is still a lot of room for improvement to make it seamless.

Release management

In essence, these issues and enhancements boil down to making release management easier. When releasing your package, you want to make sure that it contains all the appropriate files, is tested and can be installed easily. Distutils helps with the installation, pip with the dependencies, and virtualenv (a topic for a later post) helps a lot with testing package interactions. But what about unit tests? What about cleaning up after setup.py? What about generating documentation or other files?

Extending distutils

Until all these features get put into distutils, you have to extend it yourself in setup.py. Fortunately, this is not very complicated and can buy you some reliability in your build process. Adding a command like python setup.py test is pretty trivial:
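Here is a minimal sketch of what that can look like; the mypackage name and the tests module are placeholders for your own project layout:

    # setup.py
    import unittest
    from distutils.core import setup, Command

    class TestCommand(Command):
        """Run the project's test suite via `python setup.py test`."""
        description = "run the test suite"
        user_options = []  # this command takes no options

        def initialize_options(self):
            pass

        def finalize_options(self):
            pass

        def run(self):
            # Assumes the tests live in an importable `tests` module/package.
            import tests
            suite = unittest.TestLoader().loadTestsFromModule(tests)
            unittest.TextTestRunner(verbosity=2).run(suite)

    setup(
        name='mypackage',
        version='0.1',
        packages=['mypackage'],
        cmdclass={'test': TestCommand},  # wires up `python setup.py test`
    )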

The same sort of functionality could be used to verify any prerequisites not already checked by distutils or pip, generate documentation without external dependencies like Make (although Django supports Python 2.3 before this functionality was available) or create a uniform way to take source control diffs and submit patches. Executing these commands from one place makes the whole process more consistent and easier to understand. Hopefully the new enhancements to distutils will make the process even better.

Deploying Django-Powered Web Applications

Terminology

Firstly, there’s a little problem of terminology to take care of. What many people call a web application, Django calls a “project”. Instead, the Django team uses the term “application” to describe a web application component that can be deployed into one of many projects. To describe it in the WordPress paradigm, WordPress could be a Django project (if it weren’t written in PHP), but the blogging component, the tagging component and the themes manager might all be separate Django applications.

This distinction really pushes the concept of re-using components. For example, once someone writes a tagging Django application, the same app could be deployed for photo tagging, blog tagging and other types of content management. These applications are supposed to be completely self-contained and include their seed data (fixtures), database models, and templates.

The issue, and what brings me to the main part of this post, is what to do with the media. Should an application include its own images, CSS and JavaScript? What’s the best way to deploy them so they can be conveniently served?

Deployment

Packaging Python modules is a relatively trivial task and there is a well-defined approach for it: distutils. This creates a more or less standard installer that allows anyone to install your package with a single command. From there, it can be put into the Python Package Index, and easy_install can install your Python package (and its dependencies) with a single command. This works great for Python packages like Django and BeautifulSoup, but how would it work for a whole web application? It made me wonder how Ellington, the flagship Django product, does it.

When distributing a Python module, it makes sense to support as many platforms and configurations as possible. However, when deploying a full web application into a production environment, it makes sense to restrict your platform to what is tried and true. Ellington’s website, for example, states:

Ellington takes advantage of the most secure and flexible open-source technology available: Apache for web serving, Python for programming, PostgreSQL for data, all optimized to work together on a stable, high-performance Linux platform.

Ellington isn’t intended to run on a wide variety of platforms even though it probably could. It is meant to run a production-grade newspaper, and therefore they specify its exact dependencies: probably the specific versions of Apache, PostgreSQL, Python and even Linux!

So what have I learned about deployment, and what do I do with all the media? After browsing django-users and the blogosphere, sticking with distutils is a great idea. I think that packaging each separate Django application is a good idea. Each package is completely self-contained and includes its media, templates, and code. In addition, the project settings should be minimal and possibly contained in another easily deployed Python module controlled by distutils. In terms of fitting the whole thing together and deploying end to end, this is where the native package manager comes into play. This is the best way to manage both Python and external dependencies. RPM, MSI or deb installers could fetch all the appropriate Python modules (your Django applications), install the right version of your database and web server, sync your database, create the symbolic links to your media and even fill out your basic settings. For larger installations that require the database to be split from the Python code and from the static media, this process still makes sense with few changes.
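As a sketch of what a self-contained application package might look like under distutils (the tagging application, package name and file layout here are hypothetical):

    # setup.py for a hypothetical reusable tagging application
    from distutils.core import setup

    setup(
        name='django-tagging-example',
        version='0.1',
        packages=['tagging'],
        # Ship the templates, fixtures and static media alongside the code so
        # the application is completely self-contained.
        package_data={
            'tagging': [
                'templates/tagging/*.html',
                'fixtures/*.json',
                'media/css/*.css',
                'media/js/*.js',
                'media/img/*',
            ],
        },
    )

The project-level pieces (settings, symbolic links to media, database setup) would then be handled by the native package manager as described above.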