The GitHub public timeline is up on BigQuery, and I decided to play around with it. I created this visualization, which is a first (alright, more like eighth) try at measuring “how social” a project really is. The colors correspond to different programming languages, and the size of each arc is based on the number of distinct collaborators on a project.
Other attempts
I thought about looking only at pull requests. That does uncover some interesting projects with a lot of pull requests, but I think it penalizes projects like Linux, which doesn’t really use pull requests. Also, I wasn’t sure code submissions alone were exactly what I was looking for. I also briefly looked at only merged pull requests. I ended up filtering out projects with no stated language, partly because a number of them just had common names (e.g. “test”).
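To make the counting idea concrete, here is a rough sketch in Python. It is not the BigQuery query behind the visualization. The record layout (`repo`, `language`, `actor`, `type`) and the sample rows are assumptions made up for illustration, not the real timeline schema. It counts distinct actors per project across all event types and skips projects with no stated language.

```python
from collections import defaultdict

# Hypothetical event records; field names and values are illustrative assumptions.
events = [
    {"repo": "rails", "language": "Ruby", "actor": "alice", "type": "PushEvent"},
    {"repo": "rails", "language": "Ruby", "actor": "bob", "type": "WatchEvent"},
    {"repo": "rails", "language": "Ruby", "actor": "alice", "type": "IssuesEvent"},
    {"repo": "django", "language": "Python", "actor": "carol", "type": "ForkEvent"},
    {"repo": "test", "language": None, "actor": "dave", "type": "PushEvent"},
    {"repo": "test", "language": "", "actor": "erin", "type": "PushEvent"},
]


def distinct_collaborators(records):
    """Map (language, repo) to the number of distinct actors seen on that repo.

    Records with a missing or empty language are skipped.
    """
    actors = defaultdict(set)
    for record in records:
        if not record.get("language"):
            continue  # drop projects with no stated language
        actors[(record["language"], record["repo"])].add(record["actor"])
    return {key: len(names) for key, names in actors.items()}


counts = distinct_collaborators(events)
```

Restricting the loop to a single event type (say, only pull request events) would give the pull-request-only variant described above. Counting every event type is what lets watchers, forkers and commenters count toward a project too.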
While doing this, I figured I would have heard of the most social projects, but I also expected a lot of projects I’d never heard of, for a variety of reasons. Some projects get a lot of watchers, forks, comments or issues from a group of people entirely separate from the ones I follow.
Most social projects by language
The visualization has the full dataset, but here’s a taste of the data:
- C – php-src, linux, mruby
- C++ – mosh, mysql, fr_public
- JavaScript – bootstrap, meteor, jquery-file-upload
- PHP – symfony, codeigniter, foundation
- Python – django, legit, flask
- Ruby – sample_app, rails, first_app