Signing and Verifying Python Packages with PGP

When I first showed Pip, the Python package installer, to a coworker a few years ago, his first reaction was that he didn’t think it was a good idea to run code downloaded from the Internet as root without looking at it first. He’s got a point. Paul McMillan dedicated part of his PyCon talk to this subject.

Python package management vs. Linux package management

To illustrate the security concerns, it helps to contrast how Python modules are usually installed with how Apt or Yum do it for Linux distributions. Debian and Red Hat distros usually ship the PGP public keys for their packages with the distribution. Provided you installed a legitimate Linux distribution, you get the right PGP keys and every package downloaded through Apt/Yum is PGP checked. This means that the package is signed using the private key for that distribution, and you can verify that the exact package was signed and has not been modified. The package manager checks the signature and warns you when it does not match.

Pip and Easy Install don’t do any of that. They download packages in plaintext (which would be fine if every package were PGP signed and checked) and they download the checksums of the package in plaintext too. If you manually point Pip at a PyPI repository over HTTPS (say crate.io), it does not check the certificate. If you are on an untrusted network, it would not be hard for an attacker to intercept requests to PyPI, download the package, add malicious code to setup.py and recalculate the checksum before passing the modified package on to you.
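A checksum only protects you if the expected value reaches you over a channel the attacker cannot tamper with. Here is a minimal sketch of that check, assuming you obtained the SHA-256 digest from a trusted source; the function names are hypothetical and this is not what Pip does internally.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_trusted_digest(path, trusted_hex_digest):
    """Compare the file's digest with one obtained over a trusted channel."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, trusted_hex_digest.lower())
```

If the digest is fetched over the same plaintext connection as the package, this check detects corruption but not a deliberate attack.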

I think the big users of Python, like the Mozillas of the world, run their own PyPI servers and only load a subset of packages into them. I’ve heard of other shops making RPMs or DEBs out of Python packages. That’s what I often do. It lets you leverage the infrastructure of your distribution, and the signing and checking infrastructure is already there. However, if you don’t want to do that, you can always PGP sign and verify your packages, which is what the rest of this post is about.

Verifying a package

Relatively few packages on the cheeseshop (PyPI) are PGP signed. For this example, I’ll use rpc4django, a package I release, and GNU Privacy Guard (GPG), a PGP implementation. The PGP signature of the package (rpc4django-0.1.12.tar.gz.asc) can be downloaded along with the package (rpc4django-0.1.12.tar.gz). If you simply attempt to verify it, you’ll probably get a message like this:

This message lets you know that the signature was made using PGP at the given date, but without the public key there is no way to verify that this package has not been modified since the author (me) signed it. So the next step is to get the public key for the package:

If you hit “1”, you will import the key. Re-running the verify command will now properly verify the package:

The fact that ten different Python modules will probably be signed by ten different PGP keys is a problem and I’m not sure there’s a way to make that easier. In addition, my key is probably not in your web of trust; nobody who you trust has signed my public key. So when you verify the signature, you will probably also see a message like this.

This means that I need to get my key signed by more people and you need to expand your web of trust.
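The verification step described above can be scripted. This is a minimal sketch, assuming gpg is installed and on your PATH and that the signer’s public key has already been imported; the function names are hypothetical.

```python
import subprocess


def build_verify_command(signature_path, package_path):
    """Build the GnuPG command that checks a detached signature."""
    return ["gpg", "--verify", signature_path, package_path]


def signature_is_valid(signature_path, package_path):
    """Return True if gpg exits with status 0 for this signature.

    Assumes gpg is on PATH and the signer's public key is in the keyring.
    """
    result = subprocess.run(
        build_verify_command(signature_path, package_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For the example package, the call would be signature_is_valid("rpc4django-0.1.12.tar.gz.asc", "rpc4django-0.1.12.tar.gz"). A successful check only says the signature matches a key in your keyring; whether you should trust that key is still the web of trust question.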

Signing a package

Signing a package is easy and it is done as part of the upload process to PyPI. This assumes you have PGP all set up already. I haven’t done this in about a month so I hope the command is right.

There are additional options, such as choosing which key signs the package, but the signing part is easy.
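Separately from the upload command, a detached signature can also be produced directly with GnuPG. The sketch below is not the upload step itself; it assumes gpg is on your PATH, and the key ID argument and function names are hypothetical.

```python
import subprocess


def build_sign_command(package_path, key_id):
    """Build a gpg command that writes an ASCII-armored detached signature."""
    return [
        "gpg",
        "--armor",                          # text (.asc) output instead of binary
        "--local-user", key_id,             # which private key signs the file
        "--output", package_path + ".asc",  # where the signature is written
        "--detach-sign", package_path,
    ]


def sign_package(package_path, key_id):
    """Run gpg and raise CalledProcessError if signing fails."""
    subprocess.run(build_sign_command(package_path, key_id), check=True)
    return package_path + ".asc"
```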

However, how many people actually verify the signature? Almost nobody. The package managers (Pip/Easy Install) don’t, and you probably just use one of them.

The future of Python packaging

So what can we do? I tried to work on this at the PythonSD meetup but I didn’t get very far, partly because it is a tough problem and partly because there was more chatting than coding. As a concrete proposal, I think we need to get PGP verification into Pip and solve issue #425. This probably means making python-gnupg a prerequisite for Pip (at least for PGP verification). Step two is to add certificate verification. Python 3 already supports certificate checking through OpenSSL. Python 2 might have to use something like the Requests library. Step three is to get a proper certificate on PyPI.
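For the certificate verification step, here is a minimal sketch of what checking looks like with the Python 3 standard library. It assumes a Python 3 version that provides ssl.create_default_context; the function names are hypothetical and this is not Pip’s implementation.

```python
import ssl
import urllib.request


def make_verifying_context():
    """Return an SSL context that verifies certificates and host names."""
    return ssl.create_default_context()


def fetch(url, timeout=10):
    """Download `url`, failing if the server's certificate does not verify."""
    context = make_verifying_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```

With this context, a server presenting a self-signed or mismatched certificate causes the request to fail instead of silently succeeding.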

Edit: Updated command to upload signed package

Edit (January 2018): This five-year-old post is massively outdated. I recommend taking a look at the Python packaging and distributing docs, which are much better now. The commands I typically run to distribute a package are:

Securing a Django Site in Production

Edit (2020): This is pretty outdated. Instead, probably the best resource is the Django deployment checklist.

I was setting up a Django site for somebody recently and got asked the question, “is it possible for someone to hack my site?”. The answer, of course, is yes. To some degree, this is unavoidable. If somebody is willing to expend the time, effort and money, it is almost impossible to have a complex site that is perfectly secure. Even security “experts” can get it wrong. However, this got me thinking about the steps to secure a Django site.

Django does a good job of being reasonably secure by default. Unlike some other frameworks where you have to explicitly use CSRF tokens, Django uses them unless you tell it not to. Django escapes data from your templates automatically and is generally safe from SQL injection. The framework contains the building blocks to build a secure site, but quite often the site is deployed on a shaky foundation.

Securing the admin

For maximum security, the Django admin site should probably always be deployed on a web server running HTTPS. There’s a good guide on setting up SSL for the admin. Redirecting requests for /admin to HTTPS is one way. Another way is to set up the admin on a subdomain like admin.example.com and serve that subdomain only over HTTPS. This is what it looks like in Nginx:

Using this, you can proxy to two different Django instances: one handles the site over HTTP and one handles just the admin over HTTPS. Depending on your exact setup, you probably also want to mark the cookie as secure.
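Marking the cookies as secure is done in the Django settings. This fragment is a sketch for the settings module of the HTTPS-only admin instance, assuming that instance has its own settings file.

```python
# settings.py fragment for the Django instance served only over HTTPS.

# Only send the session cookie over HTTPS connections.
SESSION_COOKIE_SECURE = True

# Only send the CSRF cookie over HTTPS connections.
CSRF_COOKIE_SECURE = True
```

Setting these on an instance that is still reachable over plain HTTP would break logins there, which is one reason to split the admin into its own instance.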

While the admin always needs security, some sites could also benefit from security outside of the admin if they’re handling user details, email addresses or other sensitive data. As an application developer, you need to build in that security yourself; Django doesn’t know what you need to protect. Just remember the next time you log in to your Django admin screen on a wifi hotspot at Starbucks that anybody can run something like Firesheep or Wireshark and capture your credentials. It’s amazing how many notable sites get this wrong. It reminds me of the wall of sheep.

Securing the server

It is amazing how many people put out a server with an inadequate firewall. They leave their database port wide open, leave the memcached port open (this is REALLY bad; see here) or in some other way greatly increase the possible attack surface. While I generally knew what Amazon Web Services (AWS) could do as far as hosting, I had not used them until recently and I was impressed by their security. AWS makes configuring the firewall super easy: by default, only port 22 is open and SSH accepts only keys, not passwords. That’s fairly secure by default! It gives you a simple web GUI to open select ports, and only to select machines. For example, if you host your database on a different server than your web server, only the web server should be able to connect to the database, not the whole internet. Amazon S3 can also serve its files over HTTPS, which is a rather handy feature. I expect Rackspace is fairly similar in most regards.

Django security update

There were a couple of fixes and changes in Django 1.2.5, but the main change was removing the CSRF exemption for AJAX requests. The decision to remove the exemption, despite the backwards incompatibility, was the right move considering that the assumption that XMLHttpRequests could only come from the browser is no longer true (was it ever?). However, this release makes me wonder how many site authors just put @csrf_exempt above their web services to get their site working again quickly with the new version.

Note: I secured the WordPress admin using the guide here and the WordPress HTTPS plugin. It’s a self-signed cert so I’m only getting maybe 75% of the security pixie dust, but I can deal with that.

Edit (September 14, 2011): Take a look through the Django security docs which your humble blogger helped write.

RPC4Django Update October 2009

A user has requested that RPC4Django support HTTP access control. This is the new preferred method where newer browsers are allowed to make cross domain AJAX requests (with specific constraints) without having to resort to hacks and workarounds like dynamic script tags. I also want to work on JSON class hinting, which is not currently supported. I’m shooting to get this going in the next week before I leave for a Mexican vacation. Swine flu has made the Mexican resorts very reasonable.

Weird Issue on Chrome

In addition, I have noticed that the authenticated demo site does not work in Google Chrome. Is anyone else experiencing this? Any idea why? Chrome has no problem with the demo site that does not run SSL.

RPC and Authentication

I’m working on adding support for authenticated service calls to RPC4Django built on top of Django’s user authentication. While doing this, I took a brief look around at how other projects implemented authentication for XMLRPC or JSONRPC. Without exception, they all implemented it such that the username and password were part of the RPC call like so:

Some of them abstracted the actual username and password checking into a decorator, but in the end, the RPC call had the username and password in the parameters. It seemed bulky and out of place. This led to an analysis about authentication and authorization and what should be handled where. As a little spoiler, I don’t like the idea of sending the username and password in the RPC parameters one bit.
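To make the pattern concrete, here is a hypothetical sketch of an RPC method that takes credentials as parameters. The method, the user store and the names are invented for illustration and are not taken from any of the projects mentioned.

```python
# Hypothetical user store for illustration only; a real service would
# check against a proper user database with hashed passwords.
_USERS = {"alice": "s3cret"}


def add(username, password, a, b):
    """RPC method whose signature is polluted by login credentials."""
    if _USERS.get(username) != password:
        raise PermissionError("invalid credentials")
    return a + b
```

Every caller, including other code in the same project, now has to supply a username and password just to add two numbers.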

Authentication & Authorization

In applications, authentication is the process that confirms the identity of the user. Usually this takes the form of a login form, HTTP basic authentication, or something similar. Authorization is the process to determine whether the user has sufficient privileges to perform the specified action. This takes the form of permission checks based on the authenticated user. Therefore, authentication must come before authorization.

Fortunately, Django’s user authentication helps with both authentication and authorization. The authenticate method checks a username and password against the set of Django users and returns the user object if everything goes well. Once this user object is retrieved, permissions can be checked using the has_perm method. Django has a pretty easy way to create new permissions based on your application’s logic. Permissions have to be checked at the specific method level since permissions are closely tied to the application logic. I like the idea of abstracting much of it into a decorator though. The only remaining question is: where do the username and password come from?
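Here is a minimal sketch of what such a permission decorator could look like. It only relies on the user object having a has_perm method, as Django users do; the decorator name, the FakeUser stand-in and the permission string are hypothetical, and this is not RPC4Django’s actual implementation.

```python
import functools


def require_permission(permission):
    """Reject the call unless `user.has_perm(permission)` is true."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if not user.has_perm(permission):
                raise PermissionError("missing permission: " + permission)
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


class FakeUser:
    """Stand-in for a Django user, exposing only has_perm()."""

    def __init__(self, permissions):
        self._permissions = set(permissions)

    def has_perm(self, permission):
        return permission in self._permissions


@require_permission("files.add_file")
def store_file(user, name):
    return "stored " + name
```

The decorated method never sees a password. It receives a user that was already authenticated somewhere else and only decides whether that user may perform the action.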

An Example from the Real World

Why should every RPC method need to be specially written to accept the login credentials and authenticate the user? Doing so makes the method usable only as an RPC method and useless to the rest of the project, which is bad for code reuse. Amazon S3, a commercial web service for storing files, is a perfect example of the proper way to authenticate and authorize users. With S3, the login information is carried in the HTTP headers in a manner similar to HTTP basic authentication. The request can therefore be rejected based on the login credentials before it is even routed to the requested method. Permission checking, such as seeing whether the user is allowed to store new files, still needs to be done at the method level, but at least the identity of the user is known.
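To show how credentials can be pulled out of the headers before any method is dispatched, here is a minimal sketch that parses an HTTP basic authentication header. It illustrates plain basic auth, not S3’s own signing scheme, and the function name is hypothetical.

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The username cannot contain a colon, so split on the first one only.
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```

A request whose header fails to parse, or whose credentials do not authenticate, can be rejected with a 401 response before the RPC payload is even decoded.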

Implementation and Demo

For RPC4Django, I’m proposing that authentication be handled at a higher level, with basic HTTP authentication for example. To illustrate this, I set up an HTTPS RPC4Django demo site that requires a username and password (rpc4django/rpc4django). The demo site requires that you accept a self-signed certificate. Using Python, it is possible to send authenticated requests like so:
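A minimal sketch of such a client, written for Python 3’s xmlrpc.client, which accepts basic authentication credentials embedded in the URL. The host name and path used in the usage note are placeholders, not the real demo site, and the helper names are hypothetical.

```python
from urllib.parse import quote
from xmlrpc.client import ServerProxy


def authenticated_url(host, path, username, password):
    """Embed HTTP basic auth credentials in an HTTPS URL."""
    return "https://{}:{}@{}{}".format(
        quote(username, safe=""), quote(password, safe=""), host, path
    )


def make_proxy(host, path, username, password):
    """Create an XML-RPC proxy; no request is sent until a method is called."""
    return ServerProxy(authenticated_url(host, path, username, password))
```

For example, make_proxy("rpc.example.com", "/RPC2", "rpc4django", "rpc4django") returns a proxy whose method calls carry the credentials in the Authorization header instead of in the RPC parameters. A self-signed certificate like the demo’s will fail default verification unless the client is told to trust it.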

The next step is to modify RPC4Django so that permissions can be specified for individual methods and users are actually logged in. Expect a release this week.