Django authentication and mod_wsgi

While I was setting up the RPC4Django authenticated demo site (user = pass = rpc4django, self-signed certificate), I ran into an interesting problem. There is a well-documented way to use the Django auth database for HTTP basic authentication with Apache/mod_python, but Django did not include an authentication handler for mod_wsgi. After some investigation, I found the section of the mod_wsgi documentation on Django authentication. However, I was curious why a mod_python authentication handler existed in the Django codebase but no mod_wsgi handler did, even though mod_wsgi is now the preferred method of deploying Django in production.

A mod_wsgi authentication handler

This investigation led me to Django ticket #10809, which contains a patch for a mod_wsgi authentication handler. I found this interesting and tried to install it on the demo site. At the time, I hadn’t even implemented the .wsgi authentication script, and I was using an Apache htpasswd file in addition to the Django auth database. However, when I attempted to install the mod_wsgi handler, I ran into a number of issues. First, mod_wsgi does not seem to be able to import a Python module from the Python path. Therefore, the line WSGIAuthUserScript django.contrib.auth.handlers.modwsgi yields:

This led me to believe that mod_wsgi literally opens and reads the file named by WSGIAuthUserScript. The problem with this is that DJANGO_SETTINGS_MODULE is not set before the script is loaded, and mod_wsgi does not pass environment variables set with SetEnv to the auth script. As per the documentation:

Any configuration defined by SetEnv directives is not passed in the ‘environ’ dictionary because doing so would allow users to override the configuration specified in such a way from a ‘.htaccess’ file. Configuration should as a result be placed into the script file itself.

This means that any attempt at a generic mod_wsgi authentication handler is moot since it cannot be configured to connect to the correct project’s auth database.

The solution

The solution is simple and not very gratifying: write an auth script specific to your application. This is what the demo site does now.

httpd.conf:

auth.wsgi:
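The demo site’s actual script isn’t reproduced here. What follows is a minimal sketch of an application-specific auth.wsgi, under stated assumptions. It implements the check_password(environ, user, password) hook that mod_wsgi calls when AuthBasicProvider is set to wsgi. The hook returns True (allow), False (wrong password), or None (no such user). The project path, the settings module mysite.settings, and the django_lookup helper are hypothetical placeholders. The Django calls assume a modern release, because django.setup() exists only from 1.7. The matching httpd.conf directives appear as comments at the top.

```python
# auth.wsgi: hypothetical application-specific auth script for mod_wsgi.
#
# Matching httpd.conf directives (paths and realm are placeholders):
#
#   <Location /RPC2>
#       AuthType Basic
#       AuthName "RPC4Django demo"
#       AuthBasicProvider wsgi
#       WSGIAuthUserScript /path/to/auth.wsgi
#       Require valid-user
#   </Location>

import os
import sys

# mod_wsgi does not pass SetEnv values to auth scripts, so the configuration
# has to live in this file. Both values below are hypothetical placeholders.
PROJECT_DIR = "/path/to/project"
if PROJECT_DIR not in sys.path:
    sys.path.append(PROJECT_DIR)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")


def django_lookup(username):
    """Return the active Django user named `username`, or None (hypothetical helper)."""
    # Imported lazily so Django is only loaded when Apache calls the hook.
    import django
    from django.apps import apps

    if not apps.ready:
        django.setup()  # Django 1.7+

    from django.contrib.auth import get_user_model
    from django.db import connection

    User = get_user_model()
    try:
        return User.objects.get(username=username, is_active=True)
    except User.DoesNotExist:
        return None
    finally:
        # Avoid leaving a database connection open between auth requests.
        connection.close()


def check_password(environ, user, password, lookup=django_lookup):
    """mod_wsgi hook: True = allow, False = bad password, None = unknown user."""
    account = lookup(user)
    if account is None:
        return None
    return bool(account.check_password(password))
```

The extra lookup parameter has a default, so mod_wsgi can still call the hook with three arguments. The parameter also lets the password logic be exercised without a database.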

Update (October 9, 2009)

After some discussion on Django ticket #10809 with the author of mod_wsgi, I submitted a patch that includes a Django mod_wsgi auth handler. Hopefully it gets accepted soon.

RPC4Django 0.1.5 is Available

Go get it!

I finally completed the version of RPC4Django that uses Django’s authentication system. I blogged about authenticated RPC services previously, and in reality the changes weren’t too major. The only thing I haven’t decided on is what to do when a user executes a method without sufficient privileges. Currently, RPC4Django returns HTTP status code 403 (Forbidden), but that seems almost RESTful. Depending on the feedback I receive, I may change that to return an RPC fault, which is more RPC-like.
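For comparison, here is a sketch of the RPC-style option using only Python’s standard xmlrpc.client module. With the current behaviour, the response is just an HTTP 403 status line. With the fault option, the response is an ordinary 200 whose XML-RPC body is a fault. The fault code 403 and the message text are illustrative choices. They are not confirmed to be what RPC4Django uses.

```python
import xmlrpc.client


def permission_fault_body(method_name):
    """Serialize an XML-RPC fault for a method the user may not call."""
    fault = xmlrpc.client.Fault(
        403,  # illustrative code; XML-RPC does not standardize fault codes
        "Insufficient privileges to call %s" % method_name,
    )
    # dumps() accepts a Fault instance; methodresponse=True wraps it in a
    # <methodResponse> element, as a server reply would be.
    return xmlrpc.client.dumps(fault, methodresponse=True)
```

On the client side, xmlrpc.client.loads() (and therefore any ServerProxy call) raises this as an xmlrpc.client.Fault exception, carrying faultCode and faultString. An HTTP 403, by contrast, surfaces as a transport-level ProtocolError.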

In addition, someone contacted me about RPC4Django and Unicode, so I decided to do some testing. As far as I can tell, it supports Unicode fully without any problems. I wrote some unit tests to verify this and to make sure it continues to support Unicode in the future.
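RPC4Django’s own test cases aren’t shown here. As a sketch of the idea, a round-trip check with the standard library’s xmlrpc.client and json modules confirms that non-ASCII strings survive serialization in both wire formats. The sample strings and the echo method name are arbitrary.

```python
import json
import unittest
import xmlrpc.client

# Arbitrary samples: Latin accents, CJK, Cyrillic and a non-BMP character.
SAMPLES = ["café", "日本語", "Здравствуй", "emoji 😀"]


class UnicodeRoundTripTest(unittest.TestCase):
    def test_xmlrpc_round_trip(self):
        for text in SAMPLES:
            payload = xmlrpc.client.dumps((text,), "echo")
            params, method = xmlrpc.client.loads(payload)
            self.assertEqual(params, (text,))
            self.assertEqual(method, "echo")

    def test_jsonrpc_round_trip(self):
        for text in SAMPLES:
            request = json.dumps({"id": 1, "method": "echo", "params": [text]})
            self.assertEqual(json.loads(request)["params"], [text])
```

Saved as a module, this runs with `python -m unittest`.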

Changes
  • Authenticated view that ties in with Django’s auth system
  • Added Unicode unit test cases to verify that RPC4Django supports Unicode (it does!)
  • Added authenticated demo site (user = pass = rpc4django, self-signed certificate)
  • Improved the documentation stylesheet

RPC and Authentication

I’m working on adding support for authenticated service calls to RPC4Django, built on top of Django’s user authentication. While doing this, I took a brief look at how other projects implemented authentication for XMLRPC or JSONRPC. Without exception, they all implemented it so that the username and password were part of the RPC call, like so:
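For illustration, a call in that style might look like the following. The method name myapp.get_item and its arguments are hypothetical. Serializing the call with Python’s xmlrpc.client shows the credentials travelling inside the RPC payload, mixed in with the real argument.

```python
import xmlrpc.client


def credentials_in_params_request(username, password, item_id):
    """Hypothetical call in the credentials-as-parameters style."""
    # Every method must accept (username, password, ...) before its real
    # arguments, so the password ends up in the RPC body itself.
    return xmlrpc.client.dumps((username, password, item_id), "myapp.get_item")
```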

Some of them abstracted the actual username and password checking into a decorator, but in the end, the RPC call had the username and password in its parameters. It seemed bulky and out of place. This led me to analyze authentication and authorization, and which of them should be handled where. As a little spoiler, I don’t like the idea of sending the username and password in the RPC parameters one bit.

Authentication & Authorization

In applications, authentication is the process that confirms the identity of the user. Usually this takes the form of a login page, HTTP basic authentication, or something similar. Authorization is the process of determining whether the user has sufficient privileges to perform the specified action, typically by checking permissions for the authenticated user. Therefore, authentication must come before authorization.

Fortunately, Django’s user authentication helps with both authentication and authorization. The authenticate function checks a username and password against the set of Django users and returns the user object if everything goes well. Once this user object is retrieved, permissions can be checked using its has_perm method. Django has a pretty easy way to create new permissions based on your application’s logic. Permissions have to be checked at the level of the specific method, since they are closely tied to the application logic, though I like the idea of abstracting much of that check into a decorator. The only remaining question is: where do the username and password come from?
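Here is a minimal sketch of that decorator idea, which is independent of RPC4Django’s actual code. The wrapped function receives an already-authenticated user object as its first argument. The decorator performs only the authorization step, by calling has_perm. The names requires_perm, PermissionDenied and add_testcase, and the permission string, are all hypothetical. In Django, the user would come from request.user or from django.contrib.auth.authenticate(). Django’s own permission_required decorator does the equivalent for views.

```python
import functools


class PermissionDenied(Exception):
    """Raised when an authenticated user lacks the required permission."""


def requires_perm(perm):
    """Authorization only: assumes `user` was authenticated upstream."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if user is None or not user.has_perm(perm):
                raise PermissionDenied(perm)
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_perm("testcases.add_testcase")
def add_testcase(user, name):
    # Ordinary application logic, equally usable from views, RPC or scripts.
    return "%s created %s" % (user.username, name)
```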

An Example from the Real World

Why should every RPC method need to be specially written to accept the login credentials and authenticate the user? This makes the method usable only as an RPC method and not at all useful to the rest of the project, which is bad for code reuse. Amazon S3, a commercial web service for storing files, is a perfect example of the proper way to authenticate and authorize users. With S3, the login information is carried in the HTTP headers, in a manner similar to HTTP basic authentication. As a result, a request can be rejected based on its credentials before it is even routed to the requested method. Permission checking (for example, whether the user is allowed to store new files) still needs to be done at the method level, but at least the identity of the user is known.
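The same idea can be sketched as WSGI middleware. Credentials arrive in the Authorization header, using HTTP Basic here rather than S3’s actual signature scheme. A request with missing or bad credentials gets a 401 before it is ever routed to an RPC method. The check_credentials callable and the realm name are hypothetical stand-ins for something like Django’s authenticate(). Under mod_wsgi, the Authorization header only reaches the application when WSGIPassAuthorization On is set.

```python
import base64
import binascii


def parse_basic_auth(header):
    """Return (username, password) from a Basic Authorization header, or None."""
    if not header or not header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header[6:].strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None


def basic_auth_middleware(app, check_credentials, realm="RPC"):
    """Wrap a WSGI app so unauthenticated requests never reach it."""
    def wrapped(environ, start_response):
        creds = parse_basic_auth(environ.get("HTTP_AUTHORIZATION"))
        if creds is None or not check_credentials(*creds):
            start_response("401 Unauthorized", [
                ("Content-Type", "text/plain"),
                ("WWW-Authenticate", 'Basic realm="%s"' % realm),
            ])
            return [b"Authentication required\n"]
        # Identity is known from here on; permission checks stay in the methods.
        environ["REMOTE_USER"] = creds[0]
        return app(environ, start_response)
    return wrapped
```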

Implementation and Demo

For RPC4Django, I’m proposing that authentication be handled at a higher level, with HTTP basic authentication for example. To illustrate this, I set up an HTTPS RPC4Django demo site that requires a username and password (rpc4django/rpc4django). The demo site requires that you accept a self-signed certificate. Using Python, it is possible to send authenticated requests like so:
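Here is a sketch with Python 3’s standard xmlrpc.client. ServerProxy accepts HTTP Basic credentials embedded in the URL (user:pass@host). An ssl.SSLContext with verification switched off will accept a self-signed certificate. The URL in the usage comment is a placeholder, not the demo site’s real address. Turning off certificate checks is only reasonable for a throwaway demo like this one.

```python
import ssl
import xmlrpc.client
from urllib.parse import urlsplit


def make_proxy(url, allow_self_signed=False):
    """Build an XML-RPC proxy; Basic credentials go in the URL as user:pass@host."""
    context = None
    if allow_self_signed and urlsplit(url).scheme == "https":
        # Accept a self-signed certificate: no hostname or chain verification.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return xmlrpc.client.ServerProxy(url, context=context)


# Usage (placeholder URL; substitute the real endpoint):
#   proxy = make_proxy("https://rpc4django:rpc4django@example.com/RPC2",
#                      allow_self_signed=True)
#   print(proxy.system.listMethods())
```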

The next step is to modify RPC4Django so that permissions can be specified for individual methods and users are actually logged in. Expect a release this week.

RPC4Django 0.1.4 is Available

Go get it!

I attempted to reproduce the bug reported last week, but I was unsuccessful. A related bug, intermittent and difficult to reproduce, still seems to be plaguing the Django project as well. The bug is especially curious since the code that renders the docstrings as reStructuredText is very standard and is used exactly as the docutils documentation describes. In the meantime, I’ve provided a workaround for those who are seeing this bug: it catches the docutils exception and simply displays plain text instead of reStructuredText. I added a new BUGS.txt to the Subversion tree to track this bug, which is the only known bug in RPC4Django.

Changes
  • Provided a workaround for the bug related to Django bug #6681.
  • Provided the settings.py option RPC4DJANGO_RESTRICT_REST, which stops RPC4Django from attempting to render any of the method summary docstrings as reStructuredText.
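Here is a minimal sketch of the workaround’s shape, which is not RPC4Django’s actual code. It tries to render a docstring with docutils’ publish_parts. It falls back to escaped plain text if rendering fails or if a restrict flag is set. The restrict_rest parameter stands in for RPC4DJANGO_RESTRICT_REST; how the real setting is read is not shown. Catching every exception is a deliberately broad assumption, because the intermittent docutils error was never pinned down. With this approach, a missing docutils install lands in the same fallback.

```python
import html


def rest_to_html(text):
    """Render reStructuredText to an HTML fragment with docutils (imported lazily)."""
    from docutils.core import publish_parts
    return publish_parts(text, writer_name="html")["fragment"]


def render_docstring(doc, restrict_rest=False, renderer=rest_to_html):
    """Return HTML for a method docstring, degrading to plain text."""
    plain = "<pre>%s</pre>" % html.escape(doc or "")
    if restrict_rest:
        # Analogous to RPC4DJANGO_RESTRICT_REST: never call docutils.
        return plain
    try:
        return renderer(doc or "")
    except Exception:
        # Any rendering failure (including ImportError) falls back to plain text.
        return plain
```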

RESTful Django Powered Web Services

What is REST

REST is an alternative to RPC-based web services such as JSONRPC, XMLRPC and SOAP. Instead of using HTTP POST for all of its requests like RPC services do (JSONRPC’s proposed GET implementation excepted), it uses the full range of HTTP methods. It usually includes GET, POST, PUT, DELETE and other methods to achieve different results on the same resource. As a result, it needs fewer URIs than an RPC-style design that has one URI per action.

Some people think any web service that makes various services available at URIs is REST. It isn’t. Some people make a service at one URI for getting an object, another for saving the object, another for getting a list of objects, and another for a list of objects matching certain criteria. This is just RPC outside the realm of a specific protocol (like XMLRPC). If people are going to use simple HTTP RPC requests to get all their data without following any specific pattern, they’d be better off with a real RPC implementation.

How is it Better (or Worse)

REST has a lot going for it. Because it is a little more “native” to the HTTP protocol, caching can work very efficiently. Depending on language support, it may be easier to work with a REST interface than with a more complex RPC-specific protocol. Its simplicity can be very beautiful. The RESTful idea of making your data available as a “resource” that links via hypertext to more resources can make REST very powerful.

Instead of

GET http://example.com/testcase/56

<testcase>
  <results>
    <result>1</result>
    <result>2</result>
  </results>
</testcase>

You have

GET http://example.com/testcase/56

<testcase>
  <results>
    <result>http://example.com/result/1</result>
    <result>http://example.com/result/2</result>
  </results>
</testcase>

In the second example, a client can build a full test case object by requesting the testcase resource and then requesting each result resource it links to. Unlike in the first example, the client would not need to know anything special about test cases or the specific domain.
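Here is a sketch of such a client, using only xml.etree from the standard library. It parses the testcase document from the second example and dereferences each result link. The fetch function is a hypothetical stand-in for an HTTP GET such as urllib.request.urlopen(url).read(). It is injected so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

TESTCASE_XML = """<testcase>
  <results>
    <result>http://example.com/result/1</result>
    <result>http://example.com/result/2</result>
  </results>
</testcase>"""


def load_testcase(document, fetch):
    """Build a plain dict from a testcase resource by following its result links."""
    root = ET.fromstring(document)
    links = [el.text.strip() for el in root.iter("result")]
    # Only the element holding links needs to be known; what a result
    # contains is defined by its own resource, fetched on demand.
    return {"results": [fetch(url) for url in links]}
```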

RPC also has a lot going for it, and there are some cases where I would pick it over REST. Caching is not always very important, and when it isn’t, the benefits of REST are less apparent. Most RPC protocols can already construct objects from web service calls out of the box (in SOAP’s case, very complex objects). They also usually provide introspection methods or WSDL to discover which services are available. These would need to be built by a crafty REST service developer. RPC, however, doesn’t take much advantage of the HTTP protocol, since most requests are just POST requests with an RPC payload. At the same time, every HTTP implementation supports POST, and not all of them support PUT or DELETE.
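For clients that can only send GET and POST, a common workaround is to tunnel the intended method through a POST. One way to do this is an X-HTTP-Method-Override header. That header is a widely used convention rather than part of the HTTP specification. The WSGI middleware below is a hypothetical sketch of how a REST layer could honour it.

```python
ALLOWED_OVERRIDES = {"PUT", "DELETE", "PATCH"}


def method_override_middleware(app):
    """Let a POST stand in for a method the client cannot send directly."""
    def wrapped(environ, start_response):
        if environ.get("REQUEST_METHOD") == "POST":
            override = environ.get("HTTP_X_HTTP_METHOD_OVERRIDE", "").upper()
            if override in ALLOWED_OVERRIDES:
                # Shallow copy so the caller's environ dict is left untouched.
                environ = dict(environ, REQUEST_METHOD=override)
        return app(environ, start_response)
    return wrapped
```

The override is honoured only for POST and only for an explicit allow-list, so a GET can never be turned into a DELETE.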

Next Steps

Django has a few libraries to help with REST interfaces, but nothing I’ve seen is that great. I am going to look into creating one or contributing to an existing project. Here are some things I’d like to see in a REST API:

  • In the Django 1.1 development version, PUT, DELETE, OPTIONS and HEAD are available in django.test.client.Client. A REST interface should use these methods by default and have another mechanism for clients that do not support these lesser-used HTTP methods.
  • caching and ETags
  • different output formats (eg. XML and JSON)
  • service/resource discovery or introspection (similar to WSDL or system.listMethods)
  • a client library that can generate complex native Python objects given a URI
  • models and other sources of data as REST resources
  • integration at some point with the Django trunk!
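Of the wish-list items above, caching with ETags is the easiest to sketch with the standard library. The server derives an ETag from the response body. When a request’s If-None-Match header matches that ETag, the server answers 304 Not Modified with no body. Hashing the body with SHA-1 is one choice among many; a version number or modification timestamp works too. The function name is hypothetical. Django’s ConditionalGetMiddleware implements a similar mechanism.

```python
import hashlib


def conditional_get(body, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    headers = [("ETag", etag)]
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            # The client's cached copy is current: skip resending the body.
            return "304 Not Modified", headers, b""
    headers.append(("Content-Type", "application/xml"))
    return "200 OK", headers, body
```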
What’s Already Out There

There are a few Django projects for making data available via REST. These efforts seem to have stalled or are still in their infancy.

  • Django model views — A GSoC project to make Django model data available via REST. This project never seemed to get far off the ground. I don’t think it has been updated much since 2007.
  • Django REST interface — Another GSoC project to create RESTful interfaces. There seem to be some active users of this one, and it seems more fully featured than the model views project above. However, it has stalled and there hasn’t been much work on it in the past few years.
  • Django RESTAPI — Another project to make models available via REST. This project seems to have been updated more recently and it seems OK, but it still isn’t ready for prime time or available on PyPI.
  • RESTinPy — A SourceForge project that makes data available via REST. This project seems somewhat advanced, but it hasn’t been updated since the first cut was put onto SourceForge and PyPI.
  • DAPI — Another model-to-REST mapping module
Reading

Edit (July 15, 2010): I wrote an update involving Piston, a popular REST framework for Django.
Update (September 7, 2011): There were some updates on what folks in the community were using at DjangoCon.