Code Happiness

While I was reading Delivering Happiness on my vacation, I got to thinking about how it applies to me. Tony Hsieh claims that a key factor in Zappo’s success was making sure his employees, customers and suppliers were happy. So what are the best ways to keep developers, management and customers happy in a software setting?

For open source projects, there is often no management. Developers are giving away their work so you would assume that if they weren’t happy they could just move on to a different project. Users of OSS are usually happy to get something for nothing but at times I’ve seen them get picky and demanding. The situation for companies is quite a bit different. Quite often management is telling a developer what to do. Marketing or sales has promised the world and customers are unhappy at what is delivered. Senior management is breathing down the neck of middle management to get the project done faster, cheaper and better. So why are so many talented people willing to give their time — which is worth a fortune — to open source and at the same time companies struggle to keep their developers happy and engaged? Why also are a lot of users (especially power users) satisfied with open source but not satisfied with commercial software? How can management simultaneously keep developers happy and also ship products on time and on budget?

Understanding and recognition

In a commercial setting, there’s often a barrier between who worked on the code and who the users (customers) are. Developers like to be recognized for their hard work like they are in open source. It gives them a sense of pride. It certainly does for me. Now I’m not suggesting that companies should tear down the wall between developers and customers completely and always have an easy direct line of interaction between them. However, there should probably be a door on that wall. Developers may be surprised at how their creations are being used and they can come to a better agreement about new features or a change of direction. Users also may better understand why a feature was done a certain way instead of simply thinking that the software was done poorly. Open source usually has a direct line but once a project is big enough there is also a users group to field the easier questions and concerns. This satisfies developers’ desire for recognition while also allowing for good communication with customers

Ownership – the key to happiness

The idea of having developers interfacing with and understanding the wants of customers suggests that the developers, not strictly management should be at least partially in charge of what features get developed. If management can’t convince developers that their work is important, the project is doomed to failure. Nothing can lower the morale of a team of developers faster than knowing that they are working on something that they view as valueless. People want to feel that their work is useful. Problems of code ownership are for the most part unique to commercial software since there’s always a sense of ownership with open source.

On the other side of the same coin, why have Fedex days at Atlassian and 20% time at Google been so successful at motivating developers even when management is removed from control? The answer, I think, is three-fold. Firstly, I think concept of pride and recognition play into it. Secondly, developers like to have that control and not always have management dictate to them. The other reason is that when a developer works on a codebase more than a certain amount, they become uniquely qualified to make certain improvements: refactors that only they can make or features that only they could think of. In a word, they become the master of their domain and they just need the time to show it.

There’s an entire TED talk on the topics of autonomy, mastery and purpose that is worth watching. While the conclusion is that those are better motivators, I think they also lead to more happiness.

So what about management?

So if management isn’t really as important in telling developers what to do, do we still need them? While open source projects get by just fine without them, I think managers are necessary for commercial projects. However, instead of telling people what to do, I think managers (and developers) would be happier if they functioned more like idea-men. They plant the seeds of ideas into the minds of developers and see what takes root. While emersed in a project, occasionally it is difficult to sit back and take in the big picture. However, somebody needs to keep it in sight while also having an eye for improvements.

What to do about it?

One other message from Hsieh’s book is that it is never too late for a change. If your management is draconion and completely removes you (the developer) from the process, it might be time to explore other options. Likewise, if you’re a manager and you basically function as a middleman and your superiors yell at you to go yell at your subordinates, there are better opportunities out there. You might also take these things into account as a consumer of software as well. You might just end up a little happier.

Update (March 15, 2011): There’s a good article at NYT about Google’s approach to management.

Here’s a picture of some of my friends on a trike taxi in the Philippines.

RPC4Django Updates March 2011 Edition

I’ve been ignoring RPC4Django for a while, and I figured it was time to revisit it. There have been a couple bug reports as well as a bug reported against South. On a slight tangent, South works amazingly well. Getting back to RPC4Django, there is also a merge request on Launchpad to “allow specific methods to be available at specific URLs”. It sounds like it might be useful. What do you — the nebulous community — think? You can take a look at the code here.

I’ll be out of town for the next month on vacation through South-East Asia. I’ll be sure to post a picture or two.

Securing a Django Site in Production

Edit (2020): This is pretty outdated. Instead, probably the best resource is the Django deployment checklist.

I was setting up a Django site for somebody recently and got asked the question, “is it possible for someone to hack my site?”. The answer, of course, is yes. To some degree, this is unavoidable. If somebody is willing to expend the time, effort and money, it is almost impossible to have a complex site that is perfectly secure. Even security “experts” can get it wrong. However, this got me thinking about the steps to secure a Django site.

Django does a good job of being reasonably secure by default. Unlike some other frameworks where you have to explicitly use CSRF tokens, Django uses them unless you tell it not to. Django escapes data from your templates automatically and is generally safe from SQL injection. The framework contains the building blocks to build a secure site, but quite often the site is deployed on a shaky foundation.

Securing the admin

For maximum security, the Django admin site should probably always be deployed on a web server running HTTPS. There’s a good guide on setting up SSL for the admin. Redirecting requests for /admin to HTTPS is one way. Another way is setup the admin on a subdomain like admin.example.com and handle them like that. This is what it looks like in Nginx:

Using this, you can proxy to two different Django instances: one handles the site over HTTP and one handles just the admin over HTTPS. Depending on your exact setup, you probably also want to mark the cookie as secure.

While the admin always needs security, some sites could also benefit from security outside of the admin if they’re handling user details, email addresses or other things. As an application developer, you need to build in that security — Django doesn’t know what you need to protect. Just remember the next time you login to your Django admin screen on a wifi hotspot at Starbucks that anybody can run something like Firesheep or Wireshark and capture your credentials. It’s amazing how many notable sites get this wrong. It reminds me of the wall of sheep.

Securing the server

It is amazing how many people put out a server with an inadequate firewall. Either they leave their database port wide open, memcached port open (this is REALLY bad — see here) or in some other way greatly increase the possible attack surface. While I generally knew what Amazon Web Services (AWS) could do as far as hosting, I had never used them before recently and I was impressed by their security. AWS makes configuring the firewall super easy and by default, only port 22 is open and SSH only accepts keys not passwords. That’s fairly secure by default! It gives a simple web GUI to open select ports and only to select machines. For example, if you host your database on a different server than your web server, only the web server should be able to connect to the database, not the whole internet. Also, Amazon S3 can serve its files over HTTPS as well. It’s a rather handy feature. I expect Rackspace is fairly similar in most regards.

Django security update

There were a couple fixes and changes in Django 1.2.5, but the main change was to CSRF exceptions to AJAX requests. The decision to remove the exception — despite backwards incompatibility — was the right move considering that the assumption that XmlHttpRequests could only come from the browser is no longer true (was it ever?). However, this release makes me wonder how many site authors didn’t bother to change much and just put @csrf_exempt above their web services just to get their site working again quickly with the new version.

Note: I secured the wordpress admin using the guide here and the WordPress HTTPS plugin. It’s a self-signed cert so I’m only getting maybe 75% of the security pixie dust, but I can deal with that.

Edit (September 14, 2011): Take a look through the Django security docs which your humble blogger helped write.

Lessons learned with RabbitMQ & Celery

A couple months ago, I posted about using asynchronous jobs with RabbitMQ and Celery. This is a follow-up with some lessons I learned the hard way.

Celery settings for good performance

Do not run millions of jobs with DEBUG = True. You will run out of memory — even if you have 48GB of it. On top of that, you might want to consider the celeryd option –maxtasksperchild.

Be extra careful with CELERY_SEND_TASK_ERROR_EMAILS = True. I sent 9000 emails to myself in a couple minutes. My phone which syncs my email really didn’t like it. I’m running with CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True and I’m looking to get a dashboard view of it with django-sentry. I think I’m almost there.

Persistence & disk space

RabbitMQ stores messages intelligently so you don’t have to keep track of them. It’s very good at this. However, problems can arise when you’re queuing tasks faster than you’re processing them. Use rabbitmqctl which ships with RabbitMQ. If you see things like this:

There’s probably going to be some issues. Ten million messages have to be stored somewhere. By default on CentOS, they’re stored in /var. RabbitMQ really doesn’t like it when you run out of disk space for it to write persistent messages so be careful.

The new persistence engine in RabbitMQ 2.x handles this much better than before. In 1.x, the persistence log has to copy itself over every so often and copying multi-GB files all the time really slows the queue to a halt and adds to the problem of not processing tasks fast enough. On top of this, RabbitMQ writes a ton of logs, which is a good thing, but can backfire when disk runs out.

Task sets

Celery’s task sets work like magic. Instead of this:

Use this:

Note: the first parameter to subtask is a tuple of arguments to process_item.

General tips
  • If you can make your tasks re-entrant — meaning they can be run with the same parameters multiple times without any side effects — your life will be a lot easier. Django’s get_or_create works wonders.
  • Try to break tasks into smaller subtasks. Instead of one 45 minute task, break it into 2,000 tasks that take a second or two.
  • If you are clever with your logging, debugging things will be a lot easier. This is generally always true, but it becomes much more apparent with celery’s concurrency.