Jan 16, 2018

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

Buckheit & Donoho (1995)

Non-reproducible research

Duke Potti scandal

  • Potti, Dressman, Bild, & Riedel (2006)

  • Baggerly & Coombes (2009)

Microarray studies

Twenty articles about microarray profiling, published between Jan 2005 and Dec 2006 in Nature Genetics (2015 Impact Factor: 31.616), were examined for reproducibility.

Bible code

  • Witztum, Rips, & Rosenberg (1994)

  • McKay, Bar-Natan, Bar-Hillel, & Kalai (1999)

Why reproducible research

  • Reproducibility is a foundation of science: results that others can verify are what allow scientific knowledge to accumulate.

  • Greater research impact.

  • Better work habits improve the quality of research.

  • Better teamwork. For you as graduate students, it means better communication with your advisor.

    while true  
      Stud: "that idea you told me to try - it doesn't work!"  
      Prof: "ok. how about trying this instead."
    end

    Unless you can reproduce the computing environment (algorithms, dataset, tuning parameters) for others, they have no way to help you.
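One way to make the computing environment concrete is to write it down alongside the results. The following is a minimal sketch, not a prescribed workflow: the function `describe_run`, the field names, and the example values ("lasso", the tiny inline dataset, `lambda`, the seed) are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import platform


def describe_run(algorithm, dataset_bytes, tuning_parameters, seed):
    """Collect what someone else would need in order to rerun an analysis."""
    return {
        "algorithm": algorithm,
        # A checksum identifies the exact dataset without copying it around.
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "tuning_parameters": tuning_parameters,
        "seed": seed,
        # Software versions are part of the environment too.
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


# Hypothetical example values, for illustration only.
manifest = describe_run(
    algorithm="lasso",
    dataset_bytes=b"x,y\n1,2\n3,4\n",
    tuning_parameters={"lambda": 0.1},
    seed=2018,
)

# Serialize with sorted keys so the file diffs cleanly under version control.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Sending a record like this along with "it doesn't work" gives an advisor or collaborator the algorithm, data, and tuning parameters needed to rerun the same computation.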

How to be reproducible in statistics?

When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures.

Buckheit & Donoho (1995)

Tools for reproducible research

  • Version control: Git+GitHub.

  • Distribute method implementations, e.g., R packages, on GitHub or Bitbucket.

  • Dynamic documents: RMarkdown for R, or Jupyter for Julia/Python/R.

  • Docker containers for reproducing a computing environment.

  • Cloud computing tools.

  • We are going to practice reproducible research now by making your homework reproducible with Git, GitHub, and RMarkdown.

References

Baggerly, K. A., & Coombes, K. R. (2009). Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. The Annals of Applied Statistics, 3(4), 1309–1334. https://doi.org/10.1214/09-AOAS291

Buckheit, J., & Donoho, D. (1995). WaveLab and reproducible research. In A. Antoniadis & G. Oppenheim (Eds.), Wavelets and statistics (Vol. 103, pp. 55–81). Springer New York. https://doi.org/10.1007/978-1-4612-2544-7_5

McKay, B., Bar-Natan, D., Bar-Hillel, M., & Kalai, G. (1999). Solving the Bible code puzzle. Statistical Science, 14(2), 150–173. https://doi.org/10.1214/ss/1009212243

Potti, A., Dressman, H. K., Bild, A., & Riedel, R. F. (2006). Genomic signatures to guide the use of chemotherapeutics. Nature Medicine, 12(11), 1294–1300. https://doi.org/10.1038/nm1491

Witztum, D., Rips, E., & Rosenberg, Y. (1994). Equidistant letter sequences in the Book of Genesis. Statistical Science, 9(3), 429–438. https://doi.org/10.1214/ss/1177010393