Using Django for Intranet Applications

One thing I end up doing quite a bit of at work is developing custom web applications that sit on the company intranet. These aren’t your basic timecard application or company web portal. Instead, these customized applications do everything from reports and dashboards from our bug tracking software to providing a web service API to our test case management solution. This post is about chronicling some of our forays into intranet apps.

Every intranet app needs authentication

If your company is anything like mine, you have a huge Active Directory system and your webservers are protected by single-sign-on (SSO). Django plugs into that nicely with the RemoteUserMiddleware so that users don’t have to remember Yet Another Password for your app. My password to our off the shelf commercial bug tracking software is still “welcome” and my password to our “Agile” project and task tracker is “test”. I’m already on the corporate intranet. Why should I have to authenticate twice? With Django and RemoteUserMiddleware, your users are automatically logged in if they’re authenticated. It seems relatively trivial, but it greatly enhances the user experience to not have to remember the password to the /admin site.

Who needs /admin

It seems that virtually every application has some sort of need for admin functionality. We used to deploy phpMyAdmin on most of our web hosts to administer our content. It was dove hunting with a bazooka. There was little fine grained control and we ran into the same issue of re-authenticating to access the admin site. Now an admin site is not unique to Django and it exists in virtually every major web framework in every major language. This point can be taken as an overwhelming endorsement of using some framework over using no framework at all. However, Django’s admin site is easy and very customizable in case you need to pretty up your admin site because it will be accessed by a wider audience.

Intranet apps have a way of becoming internet apps

You never know management decides some piece of software that was never intended to be used externally suddenly is a must have for some outside group and it needs to be internationalized (into Japanese?!). Suddenly, all the code has to go through open source compliance, export compliance, code scans, and due to licensing restrictions management doesn’t want to ship with MySQL. Unfortunately, that software was written in Java and PHP with no framework and quite a few open source libraries that we couldn’t ship. The transition was much more painful than if we would have just used Django from the start.

Merging and splitting apps

I’ve released and deployed a number of web applications, but the real nightmare comes when apps are merged together or one app is broken apart. This is where Python’s packaging and the Django concept of splitting your project into multiple individual apps really shines. I’ve run into this from a couple of different sides. I’ve had to take multiple PHP applications and merge them together into a single deployment. There was a huge mess with including common code and the solutions aren’t great. You’re either messing with include_dir in php.ini — a huge nightmare when managing multiple deployments with different includes, or you’re stuck modifying every include statement to pickup libraries from a common location. Splitting apps up runs into similar issues. Packaging separable components into different apps used by your project really is the way to go and Django works with this very well.

I’ve developed web apps in Java, Perl and PHP and by far I’ve been happiest with Python and Django. There are other great frameworks out there for these languages and using them could definitely help alleviate some of these issues, but the Django solution fits together better than anything I’ve seen from Struts, CodeIgniter or one of the dozens of other frameworks out there. For me, there has to be a pretty compelling argument not to do new apps with Django.

Django authentication and mod_wsgi

While I was setting up the RPC4Django authenticated demo site (user = pass = rpc4django, self signed certificate), I ran into an interesting problem. There is a well documented way to use the Django auth database for HTTP basic authentication with apache/mod_python, but an authentication handler for mod_wsgi was not built into Django. After some investigation, I found the part of the mod_wsgi documentation on Django authentication. However, I was curious why a mod_python authentication handler existed in the Django code line, but no such mod_wsgi handler existed despite the fact that mod_wsgi is now the preferred method of Django production deployment.

A mod_wsgi authentication handler

This investigation led me to Django ticket #10809 which contains a patch for a mod_wsgi authentication handler. I found this interesting and I tried to install it into the demo site. At the time, I hadn’t even implemented the .wsgi authentication script and I was using an Apache htpasswd file in addition to the Django auth database. However, when I attempted to install the mod_wsgi handler, I ran into a number of issues. Firstly, mod_wsgi does not seem to be able to import a python module from the python path. Therefore, the line: WSGIAuthUserScript django.contrib.auth.handlers.modwsgi yields:

This led me to believe that mod_wsgi literally opens and reads the WSGIAuthUserScript. The problem with this is that DJANGO_SETTINGS_MODULE is not set before that is called and mod_wsgi does not pass environment variables that are set to the auth script. As per the documentation:

Any configuration defined by SetEnv directives is not passed in the ‘environ’ dictionary because doing so would allow users to override the configuration specified in such a way from a ‘.htaccess’ file. Configuration should as a result be placed into the script file itself.

.
This means that any attempt at a generic mod_wsgi authentication handler is moot since it cannot be configured to connect to the correct project’s auth database.

The solution

The solution is simple and not very gratifying: write an auth script specific for your application. This is what the demo site does now.

httpd.conf:

auth.wsgi:

Update (October 9, 2009)

After some discussion on Django ticket #10809 with the author of mod_wsgi, I submitted a patch that includes a Django mod_wsgi auth handler. Hopefully it gets accepted soon.

Lessons Learned: Releasing A Django Application

As you may be aware, I released a Django application, RPC4Django, publicly a little while ago. This is what I learned.

Make It Easy

You efforts should be targeted at making adoption of your software as easy as possible. When developers look for a package that accomplishes a goal, they aren’t going to spend much time looking at any individual package. Usually a quick google search or a search in pypi and then they might spend a few minutes glancing at a package before moving on to the next one. There are a couple keys to making it easy:

  • Have easy documentation on installation, configuration and the license.
    This should be on both your webpage as well as on pypi. More on this later.
  • Give demo code or better yet, an actual demo.
    Quite often the one sentence summary on pypi (if you don’t provide a long description) is not sufficient to tell people why they should use your package.
  • List the package’s compatiblity.
    Considering very few packages are compatible between 2.5 and 3.0 or even 2.3 and 2.5, I’m very surprised by this. Nothing is more frustrating than thinking you’ve found what you need just to have installation (or worse, execution) fail.
Test Early and Test Often

When I released RPC4Django, it had a reasonable, but not great, unit tests with it. With the future releases, those improved significantly and a few more bugs were caught. Being able to quickly test python 2.4, 2.5 and 2.6 by having a fully built unit test suite was awesome. However, unit tests don’t catch everything. One bug that I did not catch for a while was a bug relating to how Django worked with mod_wsgi on Apache. No amount of unit testing or Django views testing (which is sort of a hybrid of unit and integration testing) would have caught this. It only got caught when the code was pushed to a production server.

Documentation Conundrums

Documentation usually requires keeping data in a number of places. Any project of considerable size has READMEs, a license file, HTML documentation, tutorials, a setup.py long description, class and module documentation, and more. There should be as few authoritative places for package information as possible. In the first version of RPC4Django, I rolled my own HTML documentation and and README basically told the user to read the real documentation. This solved the whole problem of duplicated documentation, but in a rather unenlightened way.

After reading more exhaustively about reST and looking at how other packages solve this problem, I found a better way. With reST, it is possible to include other documents like so:

What this allows for and what RPC4Django does now is to enable one document to be built of many documents. In this way, I keep the license in one file, the installation in another, the changelog in another and they all are included into my README which can be used to generate my HTML documentation.

If my project was even larger, I might do what both the Python project and the Django project do. They include reST documents in their source tree. These include walkthroughs, tutorials, introductions to classes and more. Sphinx, a python package built to document the python library itself, can create attractive documentation hierarchies directly from simple text documents (which are themselves human readable!). Python uses it for all of their documentation and the Django project uses it for basically the entire documentation section of their webpage — including tutorials. You wouldn’t know it since the pages are pretty attractive but they appear to be generated directly out of subversion on a nightly basis. This keeps all the documentation together and makes sure your website, HTML documentation and source documentation stay in sync.

Don’t Get Discouraged

Just because your package is now on pypi doesn’t mean people are going to flock to it and download it. I think I’ve put together a pretty solid and useful package but I have only a few downloaders and I haven’t gotten much feedback. I intend to power through and start a new project.

Packaging and Sharing Django Applications

When I packaged up the first version of rpc4django and sent it to the package index (pypi) with distutils, I thought I was doing everything right. I’d packaged my django application (if you’re unclear on applications vs. projects, see the terminology) in such a way that it could be used with a variety of projects and required almost no configuration to get off the ground. I filled out all the metadata in the distutils setup() function and I made sure that my package could be installed via easy_install. I even implemented a fix that I stole from Django’s setup.py to make sure the “install data” (including Django templates) get installed in the right place on the flawed Mac OS X versions of python. Little did I know, I was just getting started.

Easy Install

Easy install is a pretty convenient way to install packages and making your packages easier to install will make sure that they get installed more frequently by more users. Provided you have used distutils or setuptools for your python package and have a working setup.py, you should be able to easily upload your package to the python package index and install it with easy install. Where easy install doesn’t shine, however, is how it automatically creates .egg files out of packages.

Making eggs out of libraries seems like a great idea on paper, but when a Django application is packaged as an egg, the default settings.py TEMPLATE_LOADERS will not be able to load it. The additional loader django.template.loaders.eggs.load_template_source must be enabled. Easy install will only create an egg if setuptools “determines” that it can safely create it. Alternatively, the zip_safe setting can be set to False in the setuptools.setup() function. To make it easier on the users, making the easy install unpack all your templates (one day setup tools will identify Django apps) will require less configuration and get more users.

This issue, which I noticed after the first rpc4django release will be fixed in the next release in a few days.

Package Index (Pypi)

I got my module into pypi and then I realized that there’s a lot of issues that go with it. I made sure that setuptools/distutils summary and description were set, but until I read the somewhat erroneously named cheeseshop tutorial, I didn’t understand that pypi sometimes subtly and sometimes overtly steers people towards “better” packages.

Start with the distutils tutorial on distributing packages.This will show you what needs to go into the setup.py and how to register, which really is the first step. However, people browsing pypi can come across a well documented library with installation instructions and installers directly on the pypi page like setuptools, or they come across a listing that is little more than link to the package’s webpage. To get that pretty documentation, you have to spend a few minutes to learn reST and that has to be set into your long_description in setup.py. To steal a good tip from setuptools, put the structured text in the README and read it into your setup.py.

Cheesecake

The last aspect for today has to do with searching for packages on pypi. When you search, the results are returned based on a “score”. At first, like other search engines, I thought this score might be some sort of relevance number related to the search. It isn’t. It is the cheesecake index. This score is based on the quality (Kwalitee) of the package [Edit: apparently the score is based on relevance and the score listed is not based on the cheesecake index]. If python setup.py install works with your package, it helps your score. If you have unit tests and a docs/ directory, you get some more points. If you get a good pylint score or your code complies with PEP8, you get some more.

Overall, it’s not a bad idea. Linting and pep8 and a changelog file don’t make a package great, but easy installation, good documentation and unit tests are probably pretty strongly correlated with a higher quality package.

Upcoming RPC4Django Release

By my own calculations, rpc4django-0.1.0 got about 54% of the cheesecake index and that merits the #7 spot when people search for “rpc”. Even before I knew about cheesecake, I was running pep8 and pylint and I’m pretty good about installation, documentation and unit testing. Expect improvements in the next version along with the following features (which will be in my CHANGELOG for an extra couple cheesecake points):

  • reST documentation instead of what I rolled myself
  • the rpc method summary could use some reST
  • fix my easy install problems with templates
  • package improvements, including to my pypi page
  • test compatibility with more systems and python versions — it looks good down to 2.4, but there are some unit test failures related to the way exceptions are presented with repr()
  • analyzing post data to see if it is json or xml to improve the dispatching library compatibility — it’ll still use content type if it is set properly
  • if I get a chance, I’ll add a javascript library that allows testing the methods directly from the method summary