Data analysis of GitHub contributions reveals unexpected gender bias
With over 12 million users, GitHub is one of the largest online communities for collaborating on development projects. Now a team of researchers has done an exhaustive analysis of millions of GitHub pull requests for open source projects, trying to discover whether the contributions of women were accepted less often than the contributions of men. What they discovered was that women's contributions were actually accepted more often than men's—but only if the women had gender-neutral profiles. Women whose GitHub profiles revealed their genders had a much harder time.
The researchers are American computer scientists whose work was approved by an Institutional Review Board (IRB), a group that determines whether experiments on human subjects are ethical. They've published a pre-print of their GitHub analysis on PeerJ today and offered a detailed look at how they did it.
Finding men and women on GitHub
First, they needed a dataset. Luckily, the GHTorrent dataset contains public data on GitHub users, pull requests, and projects up to April 1, 2015. The group writes that they "augmented this GHTorrent data by mining GitHub’s webpages for information about each pull request status, description, and comments." But they had one problem: GitHub profiles do not include gender information. So the researchers determined the genders of over 1.4 million users by linking their e-mail addresses with Google+ profiles that list a gender. They write: