Expanding The Scope Of Reproducibility Research Through Data Analysis Replications
- Jake Hofman,
- Daniel G. Goldstein,
- Siddhartha Sen,
- Forough Poursabzi-Sangdeh
The Web Conference Workshop on Innovative Ideas in Data Science (IID)
In recent years, researchers in several scientific disciplines have become concerned with published studies replicating less often than expected. A positive side effect of this concern is an increased appreciation for replicating other researchers’ work as a vital part of the scientific process. To date, many such efforts have come from the experimental sciences, where replication entails running new experiments, generating new data, and analyzing it. In this article, we emphasize not experimental replication but data analysis replication. We do so for three reasons. First, experimental replication excludes entire classes of publications that do not run experiments or even collect original data (for example, papers that make use of economic data, census data, municipal data, and the like). Second, experimental replication may in some cases be a needlessly high bar: there is great value in replicating the data analyses of published experimental work. As analytical replications require a lower investment of time and money than experimental replications, their adoption should expand the number and variety of scientific reproducibility studies undertaken. Third, we propose educating undergraduate students to perform data analysis replications, which has scalable benefits for both the students themselves and the broader research community. In our talk we will provide details of a pilot program we created to teach undergraduates the skills necessary to conduct data analysis replications, and include a case study of the first set of students who completed this program and attempted to replicate a widely cited social science paper on policing.