Updates April 2010 Edition

Django tickets

There’s been only a little movement on the ticket (#13101) I patched for 1.2. However, there’s been some new developments on the ticket (#10809) I patched regarding authentication with mod_wsgi. There’s been a suggestion to add group based authorization to Django’s mod_wsgi auth handler. There’s still some debate as to whether to use Django groups or Django permissions.

Edit (November 30, 2012): Issue #10809 finally made it into trunk and the release notes for Django 1.5.

django-pyodbc is dead?

In a previous post, I talked about getting involved in django-pyodbc development. We are using django-pyodbc at work but the project is languishing a little bit. The project has never had a formal release, the documentation (other than source documentation) is a little light, and despite patches being submitted to get the code in shape for Django’s upcoming 1.2 release, nothing has been checked in by the developers. In fact, there’s been nothing on the project from the developers since January. I emailed the developers a few days ago offering to help and I haven’t heard anything back yet. I’d much rather keep the project together, but if I continue to get nothing I will probably branch the code line and begin development and maintenance. I’m not looking forward to having to find a Windows box on which to setup multiple versions of SQL Server but I’m hoping to be able to virtualize it.

Edit (June 23, 2010): The developers have gotten involved again and I killed my fork of the project.

RPC4Django updates

I’m planning to put some effort into RPC4Django this weekend and make a release in the next week or two. The main features I’m looking at is the existing blueprint in Launchpad to handle authentication out of the box. Other than that, I got a little feedback on the HTTP access control functionality back in January that I need to test. I also plan to rip out the existing documentation and go to a Sphinx based system. We’ve been using Sphinx at work and I’ve been very impressed with its capabilities.

RPC4Django v0.1.6 Feature Requests

A number of feature requests have come in and I am going to outline what I’m going to try to get done. I hope to put out a release next week (I’ve said that before), but that greatly depends on what features go into it.

Access to HttpRequest

One user submitted feature that seems particularly useful is the ability to access variables and data in the HttpRequest that dispatches the RPC method. For example:

This RPC method will be authenticated, but it cannot tell who the user is for example. It does not have the ability to read the REMOTE_USER variable. This would be particularly useful if the RPC method was submitting data and needed to know who the submitter was. To rectify this situation, I am proposing sending the HttpRequest object as a keyword argument:

Methods that need access can accept all the additional keyword arguments and methods that don’t care can be written as normal and do not need to change.

Work with the cross site request forgery framework

Django has added a new module to protect against cross site request forgery (CSRF). Unfortunately, if a user adds the django.middleware.csrf.CsrfViewMiddleware to their project, it will break RPC4Django. CSRF protection works by ensuring that all GET requests are side-effect free and then adding a little bit of random data (the CSRF token) to every form that submits via POST. The server can then verify that the correct random data was POSTed and this ensures that the form was submitted from your site and not some remote site.

Unfortunately, RPC is usually intended to be used in a cross site fashion. The fix is relatively painless, but requires some adversarial testing to make it work under Django 1.0, Django 1.1 and Django 1.2. The code (views.py) will look something like this:

I need to test it out very thoroughly and on a number of setups, but I think this code should do it.

HTTP access control

Usually the Javascript XmlHttpRequest object is subject to the same origin policy. This is a security measure that ensures that Javascript will only access data from the same domain, protocol and port as the initial request. There have always been hacks to get around this and the browser vendors are finally providing a uniform, secure way around.

Basically, RPC4Django will need to respond to an HTTP OPTIONS request with the header Access-Control-Allow-Origin specifying what domains are allowed to send requests. Both allowing HTTP access control and which domains are allowed will be options in settings.py.

Patches and feature requests

As always, I am happy to accept patches, but this is what I’ll be working on next week. Please let me know if there are any other important features that you would like to see in RPC4Django.

Using Django for Intranet Applications

One thing I end up doing quite a bit of at work is developing custom web applications that sit on the company intranet. These aren’t your basic timecard application or company web portal. Instead, these customized applications do everything from reports and dashboards from our bug tracking software to providing a web service API to our test case management solution. This post is about chronicling some of our forays into intranet apps.

Every intranet app needs authentication

If your company is anything like mine, you have a huge Active Directory system and your webservers are protected by single-sign-on (SSO). Django plugs into that nicely with the RemoteUserMiddleware so that users don’t have to remember Yet Another Password for your app. My password to our off the shelf commercial bug tracking software is still “welcome” and my password to our “Agile” project and task tracker is “test”. I’m already on the corporate intranet. Why should I have to authenticate twice? With Django and RemoteUserMiddleware, your users are automatically logged in if they’re authenticated. It seems relatively trivial, but it greatly enhances the user experience to not have to remember the password to the /admin site.

Who needs /admin

It seems that virtually every application has some sort of need for admin functionality. We used to deploy phpMyAdmin on most of our web hosts to administer our content. It was dove hunting with a bazooka. There was little fine grained control and we ran into the same issue of re-authenticating to access the admin site. Now an admin site is not unique to Django and it exists in virtually every major web framework in every major language. This point can be taken as an overwhelming endorsement of using some framework over using no framework at all. However, Django’s admin site is easy and very customizable in case you need to pretty up your admin site because it will be accessed by a wider audience.

Intranet apps have a way of becoming internet apps

You never know management decides some piece of software that was never intended to be used externally suddenly is a must have for some outside group and it needs to be internationalized (into Japanese?!). Suddenly, all the code has to go through open source compliance, export compliance, code scans, and due to licensing restrictions management doesn’t want to ship with MySQL. Unfortunately, that software was written in Java and PHP with no framework and quite a few open source libraries that we couldn’t ship. The transition was much more painful than if we would have just used Django from the start.

Merging and splitting apps

I’ve released and deployed a number of web applications, but the real nightmare comes when apps are merged together or one app is broken apart. This is where Python’s packaging and the Django concept of splitting your project into multiple individual apps really shines. I’ve run into this from a couple of different sides. I’ve had to take multiple PHP applications and merge them together into a single deployment. There was a huge mess with including common code and the solutions aren’t great. You’re either messing with include_dir in php.ini — a huge nightmare when managing multiple deployments with different includes, or you’re stuck modifying every include statement to pickup libraries from a common location. Splitting apps up runs into similar issues. Packaging separable components into different apps used by your project really is the way to go and Django works with this very well.

I’ve developed web apps in Java, Perl and PHP and by far I’ve been happiest with Python and Django. There are other great frameworks out there for these languages and using them could definitely help alleviate some of these issues, but the Django solution fits together better than anything I’ve seen from Struts, CodeIgniter or one of the dozens of other frameworks out there. For me, there has to be a pretty compelling argument not to do new apps with Django.

Django authentication and mod_wsgi

While I was setting up the RPC4Django authenticated demo site (user = pass = rpc4django, self signed certificate), I ran into an interesting problem. There is a well documented way to use the Django auth database for HTTP basic authentication with apache/mod_python, but an authentication handler for mod_wsgi was not built into Django. After some investigation, I found the part of the mod_wsgi documentation on Django authentication. However, I was curious why a mod_python authentication handler existed in the Django code line, but no such mod_wsgi handler existed despite the fact that mod_wsgi is now the preferred method of Django production deployment.

A mod_wsgi authentication handler

This investigation led me to Django ticket #10809 which contains a patch for a mod_wsgi authentication handler. I found this interesting and I tried to install it into the demo site. At the time, I hadn’t even implemented the .wsgi authentication script and I was using an Apache htpasswd file in addition to the Django auth database. However, when I attempted to install the mod_wsgi handler, I ran into a number of issues. Firstly, mod_wsgi does not seem to be able to import a python module from the python path. Therefore, the line: WSGIAuthUserScript django.contrib.auth.handlers.modwsgi yields:

This led me to believe that mod_wsgi literally opens and reads the WSGIAuthUserScript. The problem with this is that DJANGO_SETTINGS_MODULE is not set before that is called and mod_wsgi does not pass environment variables that are set to the auth script. As per the documentation:

Any configuration defined by SetEnv directives is not passed in the ‘environ’ dictionary because doing so would allow users to override the configuration specified in such a way from a ‘.htaccess’ file. Configuration should as a result be placed into the script file itself.

.
This means that any attempt at a generic mod_wsgi authentication handler is moot since it cannot be configured to connect to the correct project’s auth database.

The solution

The solution is simple and not very gratifying: write an auth script specific for your application. This is what the demo site does now.

httpd.conf:

auth.wsgi:

Update (October 9, 2009)

After some discussion on Django ticket #10809 with the author of mod_wsgi, I submitted a patch that includes a Django mod_wsgi auth handler. Hopefully it gets accepted soon.

RPC and Authentication

I’m working on adding support for authenticated service calls to RPC4Django built on top of Django’s user authentication. While doing this, I took a brief look around at how other projects implemented authentication for XMLRPC or JSONRPC. Without exception, they all implemented it such that the username and password was part of the RPC call like so:

Some of them abstracted the actual username and password checking into a decorator, but in the end, the RPC call had the username and password in the parameters. It seemed bulky and out of place. This led to an analysis about authentication and authorization and what should be handled where. As a little spoiler, I don’t like the idea of sending the username and password in the RPC parameters one bit.

Authentication & Authorization

In applications, authentication is the process that confirms the identity of the user. Usually this takes the form of a login form, HTTP basic authentication, or something similar. Authorization is the process to determine whether the user has sufficient privileges to perform the specified action. This takes the form of permission checks based on the authenticated user. Therefore, authentication must come before authorization.

Fortunately, Django’s user authentication helps with both authentication and authorization. The authenticate method checks a username and password against the set of Django users and gets the user object if everything goes well. Once this user object is retrieved, permissions can be checked using the has_perm method. Django has a pretty easy way to create new permissions based on your application’s logic. Permissions have to be checked at the specific method level since permissions are closely tied to the application logic. I like the idea of abstracting much of it into a decorator though. The only remaining question is: where does the username and password come from?

An Example from the Real World

Why should every RPC method need to be specially written to accept the login credentials and authenticate the user? This makes the method only usable as an RPC method and not useful at all to the rest of the project which is bad for code reuse. Amazon s3, a commercial web service for storing files, is a perfect example of the proper way to authenticate and authorize users. With s3, the login information is contained in the HTTP header in a manner similar to HTTP basic authentication and in this way the request can be rejected earlier based on login credentials before the request even routes to the proper method requested. Permission checking, seeing whether the user is allowed to store new files for example, still needs to be done at the method level but at least the identity of the user is known.

Implementation and Demo

For RPC4Django, I’m proposing that authentication be handled at a higher level — with basic HTTP authentication for example. To illustrate this, I set up an https RPC4Django demo site that requires a username and password (rpc4django/rpc4django). The demo site requires that you accept a self-signed certificate. Using python, it is possible to send authenticated requests like so:

The next step is to modify RPC4Django to actually be able to specify permissions for specific methods and to actually log in the users. Expect a release this week.