Curry On
July 15-16th, 2019

Getting everything wrong without doing anything right! On the perils of large-scale analysis of GitHub data
Jan Vitek
Northeastern University


GitHub has a wealth of data, and the temptation to mine those data for insights about the software development process is irresistible. This talk is a cautionary tale of what can go wrong if care and healthy skepticism are not applied to the results obtained from data torture.

I will tell you about a study that aimed to link the choice of programming language to software defects and how that study failed at more or less every juncture. This talk will touch on how reproduction studies can help us regain trust in the results we cite and on how to make your work reproducible.


Jan Vitek is a Professor of Computer Science at Northeastern University. He holds degrees from the University of Geneva (PhD’99, BS’89) and the University of Victoria (MS’95). Professor Vitek works on topics related to the design and implementation of programming languages. In the Ovm project, he led the implementation of the first real-time Java virtual machine to be successfully flight-tested. Together with Noble and Potter, he proposed a concept that became known as Ownership Types. Prof. Vitek was one of the designers of the Thorn language. He works on gaining a better understanding of the JavaScript language and is now looking at supporting scalable data analysis in R. Prof. Vitek chaired ACM SIGPLAN; he was the Chief Scientist at Fiji Systems and part of the founding team at, a vice chair of AITO; a vice chair of IFIP WG 2.4. He chaired SPLASH, PLDI, ECOOP, ISMM and LCTES and was program chair of ESOP, ECOOP, VEE, Coordination, and TOOLS. Vitek has started a number of successful workshop series, including MOS on Mobile Objects, IWACO, STOP, and TRANSACT. He is or has been on the steering committees of ECOOP, JTRES, TRANSACT, ICFP, OOPSLA, POPL, PLDI and LCTES.