Working with World Bank Data in R

Although I generally stick to Python, I am going to go off on a tangent about statistics, data sets and R. You’ve been warned.

Getting the data

Last week, the World Bank released some of its underlying data that it uses as development indicators. The data is fairly clean and easy to work with. I grabbed the USA data in Excel format and transposed it (using “paste special”) so that each year was a row instead of having the years as columns. Then I saved it as a CSV file on my desktop.

Working with the data in R

R is a programming language that focuses on statistics and data visualization. Unlike Python, R has a number of useful functions for statistics as built-ins to the language. These features allow you to easy find means, minimums, maximums, standard deviations, summarize data sets, plot graphs and more. Working with the data is very interesting and it provides a good way to learn R.

First off, you can read in the CSV file saved easily.

The variable usa contains all columns of data and the columns can be accessed easily:

Plotting with R

Visualizing the data is the real interesting aspect and this is where R really shines. First we need to get the columns we want to graph.

There are some missing data points in both the population and energy use columns for the most recent years. It is possible that that data hasn’t yet been collected and verified. By coercing the data into an integer vector any non-integer data points will be converted into the R NA type. While similar to null or Python’s None, this type indicates that the data is not available and it will be ignored in plotting. Once the data is ready, it can be plotted easily.

When I saw the resulting graph I thought to myself: WOW, that’s a lot of energy. I don’t think I use multiple tons of oil per year, but I assume this also includes industrial, commercial and military usage. Still, that’s a lot of energy. It’s interesting to note that the peak of US energy usage was 1978 and then there’s the subsequent decline due to the energy crisis. The next thing I thought about was how energy usage has leveled off while population has continued to grow. So I decided to put population on the same chart.

US Energy Usage
While the leveling of energy usage may not be as amazing as I thought due to the fact that a significant percentage of it must be industrial use which is probably declining, it is still interesting and fairly impressive. While the population has continued to grow fairly linearly, energy usage is flat or slightly less than it was 35 years ago. I guess those slightly more efficient water heaters and refrigerators are paying off.

3 thoughts on “Working with World Bank Data in R

  1. Hey David,

    Interesting post you’ve got there. Playing around with World Bank data looks like it could be a lot of fun.

    However, at the end of your post you seem to suggest that the energy usage has not increased with population. However, what you have plotted is (AFAIK) the energy usage per capita – so the total energy use has in fact increased a lot, I do not see what is impressive about that.


  2. While the population figure may not be too relevant to the energy use per capita (doh!), I still think there’s something impressive that people today with all their computers and electronic devices are using the same or less energy than 35 years ago.

Comments are closed.