Signing and Verifying Python Packages with PGP

When I first showed Pip, the Python package installer, to a coworker a few years ago, his first reaction was that it didn’t seem like a good idea to run code downloaded from the Internet as root without looking at it first. He’s got a point. Paul McMillan dedicated part of his PyCon talk to this subject.

Python package management vs. Linux package management

To illustrate the security concerns, it helps to contrast how Python modules are usually installed with how Apt or Yum install packages for Linux distributions. Debian and Red Hat distros usually ship the PGP keys for their packages with the distribution. Provided you installed a legitimate Linux distribution, you get the right PGP keys, and every package downloaded through Apt/Yum is checked against them. Each package is signed with the distribution’s private key, so you can verify that the exact package you downloaded was signed by the distribution and has not been modified since. The package manager performs this check and warns you when the signature does not match.

Pip and Easy Install don’t do any of that. They download packages in plaintext (which would be fine if every package were PGP signed and checked), and they download the package checksums in plaintext too. If you manually point Pip at a PyPI repository over HTTPS (say crate.io), it does not check the certificate. On an untrusted network, it would not be hard for an attacker to intercept requests to PyPI, download the real package, add malicious code to setup.py, recalculate the checksum, and serve the malicious package in place of the original.
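To make the checksum problem concrete, here is a minimal sketch of why a checksum fetched over the same untrusted channel as the package proves nothing. The package contents are made up, and SHA-256 is chosen only for illustration; the point holds for any unkeyed hash.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest, standing in for the checksum published next to a package."""
    return hashlib.sha256(data).hexdigest()


def passes_checksum(data: bytes, published: str) -> bool:
    """The only check a checksum-only installer can make."""
    return checksum(data) == published


# What the real index would serve (contents are invented for this example).
original = b"from setuptools import setup\nsetup(name='example')\n"
original_sum = checksum(original)

# An attacker on the network appends code and recomputes the checksum.
tampered = original + b"import os  # a malicious payload would go here\n"
tampered_sum = checksum(tampered)

# The tampered download is internally consistent, so the check passes.
print(passes_checksum(tampered, tampered_sum))  # True
# It fails only against a checksum the attacker could not rewrite.
print(passes_checksum(tampered, original_sum))  # False
```

A signature differs from a checksum because the attacker cannot recompute it without the signer's private key.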

I think the big users of Python, like the Mozillas of the world, run their own PyPI servers and only load a subset of packages into them. I’ve heard of other shops making RPMs or DEBs out of Python packages. That’s what I often do. It lets you leverage the infrastructure of your distribution, where the signing and checking machinery is already in place. However, if you don’t want to do that, you can always PGP sign and verify your packages, which is what the rest of this post is about.

Verifying a package

There are relatively few packages on the cheeseshop (PyPI) that are PGP signed. For this example, I’ll use rpc4django, a package I release, and GNU Privacy Guard (GPG), a PGP implementation. The PGP signature of the package (rpc4django-0.1.12.tar.gz.asc) can be downloaded along with the package (rpc4django-0.1.12.tar.gz). If you simply attempt to verify it, you’ll probably get a message like this:

This message lets you know that the signature was made using PGP at the given date, but without the public key there is no way to verify that this package has not been modified since the author (me) signed it. So the next step is to get the public key for the package:

If you hit “1”, you will import the key. Re-running the verify command will now properly verify the package:

The fact that ten different Python modules will probably be signed by ten different PGP keys is a problem, and I’m not sure there’s a way to make that easier. In addition, my key is probably not in your web of trust; that is, nobody you trust has signed my public key. So when you verify the signature, you will probably also see a message like this.

This means that I need to get my key signed by more people and you need to expand your web of trust.
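The manual verification step above can also be scripted. This is a rough sketch that shells out to `gpg --verify` with a detached signature and the package file; it assumes the gpg executable is on your PATH and that the signer's public key is already in your keyring. The function names are my own for illustration.

```python
import subprocess


def build_verify_command(signature_path, package_path):
    """Build the argument list for `gpg --verify <signature> <package>`."""
    return ["gpg", "--verify", signature_path, package_path]


def signature_is_valid(signature_path, package_path):
    """Return True when gpg exits with status 0 for this signature and file.

    Assumes gpg is installed and the signer's public key has been imported.
    """
    result = subprocess.run(
        build_verify_command(signature_path, package_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

Calling `signature_is_valid("rpc4django-0.1.12.tar.gz.asc", "rpc4django-0.1.12.tar.gz")` would then stand in for the manual check. A passing result only says the signature matches a key in your keyring; whether you should trust that key is still the web of trust question.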

Signing a package

Signing a package is easy, and it is done as part of uploading to PyPI. This assumes you already have PGP set up. I haven’t done this in about a month, so I hope the command is right.

There are additional options, such as choosing which key signs the package, but the signing part is easy.

However, how many people actually verify the signature? Almost nobody. The package managers (Pip and Easy Install) don’t, and you probably just use one of them.

The future of Python packaging

So what can we do? I tried to work on this at the PythonSD meetup but I didn’t get very far, partly because it is a tough problem and partly because there was more chatting than coding. As a concrete proposal, I think we need to get PGP verification into Pip and solve issue #425. This probably means making python-gnupg a prerequisite for Pip (at least for PGP verification). Step two is to add certificate verification. Python 3 already supports certificate checking through OpenSSL. Python 2 might have to use something like the Requests library. Step three is to get a proper certificate on PyPI.
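For the certificate verification step, here is a minimal sketch of what a verifying download looks like with the present-day Python 3 standard library. It is not Pip's code, and the helper names are made up for this example.

```python
import ssl
import urllib.request


def make_verifying_context():
    """Return an SSL context that requires a valid, hostname-matching certificate."""
    context = ssl.create_default_context()
    # These are already the defaults; they are set explicitly to show the intent.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def fetch(url):
    """Download url over HTTPS, raising an error if the certificate does not verify."""
    with urllib.request.urlopen(url, context=make_verifying_context()) as response:
        return response.read()
```

With a context like this, a connection to a server presenting an invalid or mismatched certificate fails with an exception instead of silently proceeding.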

Edit: Updated command to upload signed package

Edit (January 2018): This five-year-old post is massively outdated. I recommend taking a look at the Python packaging and distributing docs, which are much better now. The commands I typically run to distribute a package are:

RPC4Django updates November 2011 edition

I released v0.1.10 of RPC4Django. I fixed an issue so that setup.py has no requirements on anything outside of the standard library and I set the project up such that python setup.py test runs the unit tests.

The bigger change is that I moved the project from Launchpad to GitHub. I’ve already been using GitHub quite a bit and I thought that I’d bite the bullet and do the move. While I liked Launchpad, I think it is better suited to larger projects that will use features like Blueprints and Translations. For a small project like RPC4Django, GitHub’s code-centric approach works better.

Djangocon 2011 Day Three

I know Djangocon has been over for a week, but I didn’t get a chance to talk about day three and specifically Paul McMillan’s excellent security talk. I also think it’s interesting that Djangocon seems to correlate with security releases (2011, 2010).

Timing attacks

Paul demonstrated a timing attack against password reset, the feature that mails a user a one-time link for resetting their password. The attack could guess that link with fewer requests than brute force, which would have to try every possible combination. It worked by measuring how much longer requests took when more of the characters in the URL were correct. I spoke with Paul and he said that this attack works best locally and would be hard to execute remotely, because variability in network latency makes the timing differences difficult to measure. While the attack is not completely practical, a lot of people use shared or cloud hosting, which lets attackers reduce that variability by setting up attack servers in the same network.
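To see where the timing difference comes from, here is a small sketch of my own (not code from the talk). An ordinary early-exit comparison does more work the more leading characters of the guess are correct, so an attacker can recover a token one character at a time instead of guessing the whole thing. The token below is hypothetical. The standard library's `hmac.compare_digest` avoids the early exit.

```python
import hmac


def naive_compare_steps(guess, secret):
    """Count the character comparisons an early-exit equality check performs."""
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            break
    return steps


def constant_time_equal(guess, secret):
    """Compare without stopping at the first mismatch."""
    return hmac.compare_digest(guess.encode(), secret.encode())


secret = "a1b2c3d4"  # hypothetical reset token
print(naive_compare_steps("zzzzzzzz", secret))  # 1: wrong at the first character
print(naive_compare_steps("a1b2zzzz", secret))  # 5: four correct, then a mismatch
print(constant_time_equal("a1b2c3d4", secret))  # True
```

The step count stands in for elapsed time here; on a real server the difference per character is tiny, which is why network jitter makes the remote attack hard.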

Paul also demonstrated a timing attack which leaked some information about whether a username was valid in the system.

Securing Django in production

Even if Django is completely secure (which nothing truly is), mistakes can be made in deployment. Paul recommended an app called django-secure, which checks for common misconfigurations. In addition, he said that the login URL should always be throttled to prevent password guessing. The Django security docs, which your humble blogger helped write, recommend that among a number of other things. They are worth a read.

Password issues

I posted a primer about Django passwords last month. Paul had some more things to say about it. Firstly, database dumps, backups, and initial data that contain hashed passwords should not be public (for example, on GitHub). As I mentioned in the primer, eight-character passwords hashed with Django’s current algorithm (SHA-1) can be brute forced in a matter of hours in the worst case. So if you accidentally leak a backup, as a number of high-profile sites have done, consider those passwords broken.

The fix for the password problem is to use a “slower” hashing algorithm designed for hashing passwords. I spoke with Paul after the talk, and one of the roadblocks to using something like bcrypt is its reliance on C extensions, which the Django core team is reluctant to introduce. However, they are really trying to get something better into Django core for 1.4.
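As an illustration of a deliberately slow password hash that needs nothing outside the Python standard library, here is a sketch using PBKDF2 via `hashlib.pbkdf2_hmac`. To be clear, this is PBKDF2 rather than bcrypt, it is not Django's implementation, and the iteration count is an assumed value you would tune for your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; higher means slower guessing


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a random salt if needed."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Each guess against a leaked hash now costs the attacker the full iteration count, where a single unsalted or lightly salted SHA-1 costs one hash operation.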

Miscellaneous

There were a number of other recommendations including:

  • Be careful where you store pickled data (cache, /tmp, etc.). Pickled objects can contain executable code.
  • Use the proper cryptographic functions available in Django and Python, including random.SystemRandom, django.utils.crypto.constant_time_compare, and django.utils.crypto.salted_hmac.
  • Be careful when deploying HTTPS to make sure it is done properly.
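The first recommendation, about pickled data, is easy to demonstrate. In this sketch of my own, a hypothetical `Payload` class tells pickle to call a function of its choosing when the data is loaded. A harmless built-in is used here; an attacker who can write to your cache or /tmp would pick something far less friendly.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle invokes a callable when loaded."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])"; an attacker would choose
        # something like os.system instead of a harmless built-in.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # the call happens during unpickling
print(result)  # [1, 2, 3]
```

Note that loading the blob never produces a `Payload` at all; whoever wrote the bytes decided what code ran.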

It’s good to hear that security people are going over Django with a fine-toothed comb.

Djangocon 2011 Day Two

I enjoyed some brief time traveling when Jacob showed what Django looked like in 2005 or so. It has come a long way.

OMG APIs

I attended two talks on APIs today: Isaac Kelly’s talk on Tastypie and Tareque Hossain’s talk on the Promises & Lies in REST. Tareque’s talk covered PBS’s use of Piston and the changes they had to make to it (presumably because the core has not been updated). A number of new projects in the Django/REST space have cropped up in addition to Tastypie, such as Django REST framework and dj-webmachine. At last year’s Djangocon, Eric Holscher (I think) mentioned that there seemed to be agreement on Piston for Django REST interfaces. Now the REST community is fragmenting a little and using a variety of different tools and methodologies.

Tareque recommended a number of practices in his talk that I would say are not very RESTful, such as including the status code in the data (as opposed to just using the HTTP status code), putting the API version in the URL (a good idea, but maybe it belongs in a header), or putting the desired output format in the URL (.xml, .json, etc.) as opposed to in an HTTP header. Perhaps “not very RESTful” is not the right way to think about it, though. In his talk, Isaac said that “Restish is enough” and maybe that is the answer. If you’re doing most of the RESTful things, you’re Doing Things Right. On the other hand, once you say “Restish is enough” you’re basically admitting that everybody does REST differently and that divergence in REST interfaces is going to continue for at least the foreseeable future.
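For the output format point, here is a rough sketch of choosing the response format from the HTTP Accept header instead of a URL suffix. It is not taken from either talk or from any of the libraries mentioned. The function names and the `SUPPORTED` table are invented for illustration, and the parsing is deliberately simplified (it ignores partial wildcards such as application/*).

```python
SUPPORTED = {"application/json": "json", "application/xml": "xml"}  # hypothetical


def parse_accept(header):
    """Return media types from an Accept header, most preferred first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # the default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        entries.append((quality, media_type))
    # sorted() is stable, so types with equal weight keep their header order.
    ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [media_type for _, media_type in ranked]


def choose_format(accept_header, default="json"):
    """Pick the first supported format the client prefers, else the default."""
    for media_type in parse_accept(accept_header):
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
        if media_type == "*/*":
            return default
    return default


print(choose_format("application/xml;q=0.9, application/json"))  # json
print(choose_format("application/xml"))  # xml
```

The URL stays the same for every representation of the resource, which is the argument for negotiating the format in the header.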