
1 Introduction

The field of reproducible science, as implemented in RStudio, is moving very fast, and it is hard to keep up with new R packages and tools. The instructor aims to provide students with relevant online resources that complement the class materials.

2 Podcast

The challenge of reproducing results from ten-year-old code:

3 Library of open-access books

Please find here a suite of open-access books written with the R bookdown package, which are relevant to the EEB603 - Reproducible Science course.

4 R keypoints

Here are three important points for R coding:

  1. R’s basic data types are character, numeric, integer, complex, and logical (R also has a sixth, rarely used type, raw). Read more about R data types here.
  2. R’s basic data structures include the vector, list, matrix, data frame, and factor. Some of these structures require all members to be of the same data type (e.g., vectors, matrices), while others permit multiple data types (e.g., lists, data frames). Read more about R data structures here.
  3. R objects may have attributes, such as names, dimensions, and class. Read more about R objects here.
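The three points above can be checked directly at the console. The following is a small base-R sketch; the object names and values are illustrative only. Note that `typeof()` reports numeric values as `"double"`, the internal name of R's default numeric type.

```r
# 1. Basic data types: typeof() reports how R stores each value internally
values <- list("sagebrush", 3.14, 5L, 2 + 3i, TRUE)  # 5L is an integer literal
types <- sapply(values, typeof)
print(types)  # "character" "double" "integer" "complex" "logical"

# 2. Basic data structures
v  <- c(12.1, 8.4, 15.0)                       # vector: a single type only
m  <- matrix(1:6, nrow = 2)                    # matrix: 2-D, a single type only
l  <- list(site = "Boise", n = 4L, ok = TRUE)  # list: mixed types allowed
df <- data.frame(species = c("sagebrush", "vanilla"),
                 count = c(10L, 4L))           # data frame: columns may differ
f  <- factor(c("low", "high", "low"), levels = c("low", "high"))

print(sapply(l, typeof))  # each list element keeps its own type

# Combining types in one vector silently coerces them to a common type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))  # "character"

# 3. Attributes: names, dimensions, and class
names(v) <- c("plot1", "plot2", "plot3")
print(attributes(v))  # a list holding the $names attribute
print(dim(m))         # 2 3
print(class(df))      # "data.frame"
print(levels(f))      # factor levels are stored as an attribute
```

Coercion in `c()` is a common source of silent bugs, which is why checking `typeof()` or `str()` on imported data is a useful habit.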

5 R Markdown

Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. Please find below additional key resources on this topic:
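As a minimal sketch of the workflow, assuming the rmarkdown package is installed and Pandoc is available (it ships with RStudio), the code below writes a small R Markdown file and renders it with `rmarkdown::render()`. The file name `minimal.Rmd` is hypothetical, and PDF output is left out because it additionally requires a LaTeX installation.

```r
library(rmarkdown)

# Write a minimal R Markdown file: YAML header, Markdown text, one R chunk
rmd_file <- file.path(tempdir(), "minimal.Rmd")
writeLines(c(
  "---",
  "title: \"A minimal R Markdown document\"",
  "output: html_document",
  "---",
  "",
  "Markdown supports *italics*, **bold**, and [links](https://rmarkdown.rstudio.com).",
  "",
  "```{r summary-cars}",
  "summary(cars)",
  "```"
), rmd_file)

# render() runs the R chunks with knitr, then converts the result with Pandoc.
# Passing a vector of format names renders the same source to several formats.
if (pandoc_available()) {
  outputs <- render(rmd_file,
                    output_format = c("html_document", "word_document"),
                    quiet = TRUE)
  print(basename(outputs))
}
```

Because the code and the text live in one source file, re-rendering after a data or code change regenerates every output consistently.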

6 Writing books and complex/large documents

R Markdown can be used to produce basic HTML, PDF, and Word documents; however, more complex and larger projects can become difficult to manage in a single R Markdown file. The bookdown package addresses this limitation and offers several key improvements:

  • Books and reports can be built from multiple R Markdown files.
  • Additional formatting features are added, such as cross-referencing and automatic numbering of figures, equations, and tables.
  • Documents can easily be exported in a range of formats suitable for publishing, including PDF, e-books, and HTML websites.

A book dedicated to this topic can be accessed here and the package can be downloaded here.
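The multi-file workflow can be sketched as follows, assuming the bookdown package and Pandoc are installed. The folder and file names are hypothetical. `\@ref()` is bookdown's cross-referencing syntax: a figure produced by a chunk labelled `speed` is referenced as `\@ref(fig:speed)`.

```r
library(bookdown)

# A throwaway book folder; all file names here are hypothetical
book_dir <- file.path(tempdir(), "mini-book")
dir.create(book_dir, showWarnings = FALSE)

# index.Rmd carries the YAML metadata for the whole book
writeLines(c(
  "---",
  "title: \"A Minimal Book\"",
  "site: bookdown::bookdown_site",
  "---",
  "",
  "# Preface {-}",
  "",
  "This book is assembled from several R Markdown files."
), file.path(book_dir, "index.Rmd"))

# A second file: a chapter with a numbered, cross-referenced figure
writeLines(c(
  "# Results {#results}",
  "",
  "Figure \\@ref(fig:speed) shows stopping distance against speed.",
  "",
  "```{r speed, fig.cap = \"Stopping distance versus speed.\"}",
  "plot(cars)",
  "```"
), file.path(book_dir, "01-results.Rmd"))

# render_book() merges the .Rmd files (index.Rmd first, then the others
# in alphabetical order) and renders them as a single book into _book/
build_book <- function(dir) {
  old_wd <- setwd(dir)
  on.exit(setwd(old_wd))
  # A fresh environment keeps render_book() from clearing the caller's objects
  render_book("index.Rmd", output_format = "bookdown::gitbook",
              envir = new.env())
}

if (rmarkdown::pandoc_available()) build_book(book_dir)
```

Splitting chapters into separate files keeps each one small, while numbering and cross-references are still resolved across the whole book.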

7 Notebook

An R Notebook is an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input.

You can create a new notebook in RStudio with the menu command File -> New File -> R Notebook, or by using the html_notebook output type in your document’s YAML metadata.
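Besides the RStudio menu, a notebook can also be created from code. This sketch, assuming rmarkdown and Pandoc are installed and using a hypothetical file name, sets `output: html_notebook` in the YAML header and renders the file; notebook output is written with the `.nb.html` extension.

```r
library(rmarkdown)

# Write a small notebook source file (the name is hypothetical)
nb_file <- file.path(tempdir(), "analysis.Rmd")
writeLines(c(
  "---",
  "title: \"An R Notebook\"",
  "output: html_notebook",
  "---",
  "",
  "```{r}",
  "plot(pressure)",
  "```"
), nb_file)

# With no output_format argument, render() uses the format in the YAML header
if (pandoc_available()) {
  nb_html <- render(nb_file, quiet = TRUE)
  print(basename(nb_html))
}
```

In RStudio, the same file can instead be run chunk by chunk, with each result shown directly beneath its chunk.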

8 Homework assignments on RPubs

A large number of homework assignments have been published on RPubs (https://RPubs.com), a free publishing platform provided by RStudio, which suggests that R Markdown is easy and convenient enough for students to use for their homework. As a TA or GA, you might even consider asking your students to complete their homework in R Markdown and publish it on RPubs!

9 Journals

Submitting scientific manuscripts written in R Markdown is still challenging; however, the R rticles package was designed to simplify the creation of documents that conform to the submission standards of academic journals. The package provides a suite of custom R Markdown LaTeX formats and templates for the following journals and publishers relevant to the EEB program:

  • Biometrics articles
  • Elsevier journal submissions
  • Frontiers articles
  • MDPI journal submissions
  • PeerJ articles
  • PNAS articles
  • Royal Society Open Science journal submissions
  • Sage journal submissions
  • Springer journal submissions
  • The R Journal articles
  • Taylor & Francis articles
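A new manuscript can be started from one of these templates with `rmarkdown::draft()`. The sketch below assumes rticles is installed and uses the PeerJ template (template folder `peerj`) with a hypothetical file name; template folder names may differ between rticles versions, so listing them first is a useful check.

```r
library(rmarkdown)

# List the template folders shipped with the installed rticles version
templates <- list.files(system.file("rmarkdown", "templates", package = "rticles"))
print(templates)

# Create a PeerJ manuscript skeleton ("my_article" is a hypothetical name).
# Depending on the template, draft() may create a folder of the same name
# holding the .Rmd file plus the LaTeX class and style files it needs.
article <- draft(file.path(tempdir(), "my_article"),
                 template = "peerj", package = "rticles",
                 edit = FALSE)
print(article)
```

Rendering the drafted `.Rmd` then produces a PDF laid out by the journal's LaTeX class, which requires a LaTeX installation.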

10 Preparing literature

The doi2bib web tool allows you to generate BibTeX-formatted references from DOIs.
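doi2bib is a web tool, but the same result can also be obtained from R through DOI content negotiation, where doi.org returns BibTeX when a request asks for the `application/x-bibtex` media type. This is a swapped-in technique, not doi2bib's own interface. The helper names `doi_url()` and `doi_to_bibtex()` are hypothetical, and the call requires internet access and R 4.0 or later (for the `headers` argument of `download.file()`).

```r
# Hypothetical helper: build the DOI resolver URL
doi_url <- function(doi) paste0("https://doi.org/", doi)

# Hypothetical helper: ask doi.org for BibTeX via HTTP content negotiation
doi_to_bibtex <- function(doi) {
  dest <- tempfile(fileext = ".bib")
  utils::download.file(doi_url(doi), dest, quiet = TRUE, method = "libcurl",
                       headers = c(Accept = "application/x-bibtex"))
  paste(readLines(dest, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

# Example with the DOI of Baker (2016), reference [3] in the list below;
# returns NA instead of failing when there is no internet connection
bib <- tryCatch(doi_to_bibtex("10.1038/533452a"),
                error = function(e) NA_character_)
if (!is.na(bib)) cat(bib, "\n")
```

The returned text can be appended to the `.bib` file named in an R Markdown document's `bibliography` field.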

11 Interactive tutorials

The R learnr package allows you to turn R Markdown documents into interactive tutorials. Tutorials consist of content along with interactive components for checking and reinforcing understanding. Tutorials can include any or all of the following:

  • Narrative, figures, illustrations, and equations.
  • Code exercises (R code chunks that users can edit and execute directly).
  • Quiz questions.
  • Videos (currently supported services include YouTube and Vimeo).
  • Interactive Shiny components.

Please see the full documentation (with examples) at https://rstudio.github.io/learnr/
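A minimal tutorial combining a code exercise and a quiz question might look like the sketch below, assuming learnr is installed. The file name and question text are hypothetical; the `exercise = TRUE` chunk option and the `quiz()`, `question()`, and `answer()` functions come from learnr.

```r
# Write a minimal learnr tutorial (file name and content are hypothetical)
tutorial_file <- file.path(tempdir(), "r-basics-tutorial.Rmd")
writeLines(c(
  "---",
  "title: \"R basics\"",
  "output: learnr::tutorial",
  "runtime: shiny_prerendered",
  "---",
  "",
  "```{r setup, include = FALSE}",
  "library(learnr)",
  "```",
  "",
  "## Exercise",
  "",
  "Create a vector with the numbers 1 to 10 and compute its mean.",
  "",
  "```{r mean-exercise, exercise = TRUE}",
  "# Your code here",
  "```",
  "",
  "## Quiz",
  "",
  "```{r quiz}",
  "quiz(",
  "  question(\"Which function reports the internal type of an R object?\",",
  "    answer(\"class()\"),",
  "    answer(\"typeof()\", correct = TRUE)",
  "  )",
  ")",
  "```"
), tutorial_file)

# Launch the tutorial in a local Shiny session (interactive use only)
if (interactive()) {
  rmarkdown::run(tutorial_file)
}
```

Because the tutorial runs as a Shiny document, students can edit and execute the exercise code in the browser and receive immediate feedback on the quiz.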

13 Special Issue on Reproducibility and Research Integrity

BMC Research Notes published a special issue on Reproducibility and research integrity in 2022. You can consult all the publications here.

14 Publications used in this class

Here is a list of the publications used in this class.

[1] M. Cargill and P. O’Connor. Writing Scientific Research Articles: Strategy and Steps. Wiley-Blackwell.

[2] L. Allen, J. Scott, A. Brand, et al. “Publishing: Credit where credit is due”. In: Nature 508 (2014), pp. 312-313. DOI: 10.1038/508312a.

[3] M. Baker. “1,500 scientists lift the lid on reproducibility”. In: Nature 533.7604 (May 2016), pp. 452-454. DOI: 10.1038/533452a. https://doi.org/10.1038/533452a.

[4] R. Barron, P. Martinez, M. Serpe, et al. “Development of an In Vitro Method of Propagation for Artemisia tridentata subsp. tridentata to Support Genome Sequencing and Genotype-by-Environment Research”. In: Plants 9.12 (2020). ISSN: 2223-7747. DOI: 10.3390/plants9121717. https://www.mdpi.com/2223-7747/9/12/1717.

[5] R. E. Bone, J. A. C. Smith, N. Arrigo, et al. “A macro-ecological perspective on crassulacean acid metabolism (CAM) photosynthesis evolution in Afro-Madagascan drylands: Eulophiinae orchids as a case study”. In: New Phytologist 208.2 (2015). 2015-19192, pp. 469-481. ISSN: 1469-8137. DOI: 10.1111/nph.13572. http://dx.doi.org/10.1111/nph.13572.

[6] J. Boulter, M. L. Orozco Morales, N. Principe, et al. “What is Kindness in Science and why does it matter?” In: Immunology & Cell Biology 101.2 (2023), pp. 97-103. DOI: https://doi.org/10.1111/imcb.12580. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/imcb.12580. https://onlinelibrary.wiley.com/doi/abs/10.1111/imcb.12580.

[7] British Ecological Society, ed. A Guide to Data Management in Ecology and Evolution. British Ecological Society. 2014.

[8] British Ecological Society, ed. A Guide to Getting Published in Ecology and Evolution. British Ecological Society. 2014.

[9] British Ecological Society, ed. A Guide to Peer Review in Ecology and Evolution. British Ecological Society. 2014.

[10] British Ecological Society, ed. A Guide to Reproducible Code in Ecology and Evolution. British Ecological Society. 2014.

[11] S. R. Carroll, E. Herczog, M. Hudson, et al. “Operationalizing the CARE and FAIR Principles for Indigenous data futures”. In: Scientific Data 8.1 (Apr. 2021), p. 108. ISSN: 2052-4463. DOI: 10.1038/s41597-021-00892-0. https://doi.org/10.1038/s41597-021-00892-0.

[12] H. J. Cole, D. G. Gomes, and J. R. Barber. “EcoCountHelper: an R package and analytical pipeline for the analysis of ecological count data using GLMMs, and a case study of bats in Grand Teton National Park”. In: PeerJ 10 (Dec. 2022), p. e14509. DOI: 10.7717/peerj.14509. https://doi.org/10.7717/peerj.14509.

[13] D. L. Donoho. “An invitation to reproducible computational research”. In: Biostatistics 11.3 (2010), pp. 385-388. DOI: 10.1093/biostatistics/kxq028. http://dx.doi.org/10.1093/biostatistics/kxq028.

[14] P. Ellestad, M. A. Pérez-Farrera, and S. Buerki. “Genomic Insights into Cultivated Mexican Vanilla planifolia Reveal High Levels of Heterozygosity Stemming from Hybridization”. In: Plants 11.16 (Aug. 2022), p. 2090. DOI: 10.3390/plants11162090. https://doi.org/10.3390/plants11162090.

[15] C. W. Fox and C. S. Burns. “The relationship between manuscript title structure and success: editorial decisions and citation performance for an ecological journal”. In: Ecology and Evolution 5.10 (2015), pp. 1970-1980. DOI: 10.1002/ece3.1480. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/ece3.1480. https://onlinelibrary.wiley.com/doi/abs/10.1002/ece3.1480.

[16] L. P. Freedman, I. M. Cockburn, and T. S. Simcoe. “The Economics of Reproducibility in Preclinical Research”. In: PLOS Biology 13.6 (Jun. 2015), p. e1002165. DOI: 10.1371/journal.pbio.1002165. https://doi.org/10.1371/journal.pbio.1002165.

[17] C. Gandrud. Reproducible Research with R and RStudio. Ed. by C. Gandrud. CRC Press, 2015, p. 294. ISBN: 1466572841.

[18] C. Gandrud. repmis: Miscellaneous Tools for Reproducible Research. R package version 0.5. 2016. https://CRAN.R-project.org/package=repmis.

[19] Q. Groom, L. Weatherdon, and I. R. Geijzendorffer. “Is citizen science an open science in the case of biodiversity observations?” In: Journal of Applied Ecology 54.2 (2017), pp. 612-617. DOI: https://doi.org/10.1111/1365-2664.12767. eprint: https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/1365-2664.12767. https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/1365-2664.12767.

[20] G. Yu, D. K. Smith, H. Zhu, et al. “ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data”. In: Methods in Ecology and Evolution 8.1 (2017), pp. 28-36. DOI: 10.1111/2041-210X.12628. eprint: https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041-210X.12628. https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12628.

[21] J. Hester, G. Csárdi, H. Wickham, et al. remotes: R Package Installation from Remote Repositories, Including GitHub. R package version 2.1.0. 2019. https://github.com/r-lib/remotes#readme.

[22] N. L. Kerr. “HARKing: Hypothesizing After the Results are Known”. In: Personality and Social Psychology Review 2.3 (1998), pp. 196-217. https://doi.org/10.1207/s15327957pspr0203_4.

[23] S. Y. Khoo. “Article Processing Charge Hyperinflation and Price Insensitivity: An Open Access Sequel to the Serials Crisis”. In: LIBER Quarterly: The Journal of the Association of European Research Libraries 29.1 (May 2019), pp. 1-18. DOI: 10.18352/lq.10280. https://liberquarterly.eu/article/view/10729.

[24] F. Markowetz. “Five selfish reasons to work reproducibly”. In: Genome Biology 16.1 (Dec. 2015), p. 274. ISSN: 1474-760X. DOI: 10.1186/s13059-015-0850-7. https://doi.org/10.1186/s13059-015-0850-7.

[25] M. R. Munafo, B. A. Nosek, D. V. M. Bishop, et al. “A manifesto for reproducible science”. In: Nature Human Behaviour 1 (Jan. 2017), p. 0021. ISSN: 2397-3374. DOI: 10.1038/s41562-016-0021.

[26] B. A. Nosek, G. Alter, G. C. Banks, et al. “Promoting an open research culture”. In: Science 348.6242 (2015), pp. 1422-1425. ISSN: 0036-8075. DOI: 10.1126/science.aab2374. eprint: http://science.sciencemag.org/content/348/6242/1422.full.pdf. http://science.sciencemag.org/content/348/6242/1422.

[27] B. A. Nosek, C. R. Ebersole, A. C. DeHaven, et al. “The preregistration revolution”. In: Proceedings of the National Academy of Sciences 115.11 (2018), pp. 2600-2606. ISSN: 0027-8424. DOI: 10.1073/pnas.1708274114. eprint: http://www.pnas.org/content/115/11/2600.full.pdf. http://www.pnas.org/content/115/11/2600.

[28] B. A. Nosek, J. R. Spies, and M. Motyl. “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability”. In: Perspectives on Psychological Science 7.6 (2012), pp. 615-631. DOI: 10.1177/1745691612459058. https://doi.org/10.1177/1745691612459058.

[29] R. D. Peng and S. C. Hicks. “Reproducible Research: A Retrospective”. In: Annual Review of Public Health 42.1 (2021). PMID: 33467923, pp. 79-93. DOI: 10.1146/annurev-publhealth-012420-105110. https://doi.org/10.1146/annurev-publhealth-012420-105110.

[30] R Core Team. foreign: Read Data Stored by ‘Minitab’, ‘S’, ‘SAS’, ‘SPSS’, ‘Stata’, ‘Systat’, ‘Weka’, ‘dBase’, … R package version 0.8-75. 2020. https://CRAN.R-project.org/package=foreign.

[31] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2019. https://www.R-project.org/.

[32] RStudio Team. RStudio: Integrated Development Environment for R. RStudio, PBC. Boston, MA, 2020. http://www.rstudio.com/.

[33] D. Sarewitz. “The pressure to publish pushes down quality”. In: Nature 533.7602 (May 2016), p. 147. DOI: 10.1038/533147a. https://doi.org/10.1038/533147a.

[34] J. F. Smith, T. H. Parker, S. Nakagawa, et al. “Promoting Transparency in Evolutionary Biology and Ecology”. In: Systematic Botany 41.3 (Jul. 2016), pp. 495-497. ISSN: 0363-6445. DOI: 10.1600/036364416X692262. http://www.bioone.org/doi/abs/10.1600/036364416X692262.

[35] D. E. Soltis, V. B. Smocovitis, K. K. Pham, et al. “Rethinking the Ph.D. dissertation in botany: Widening the circle”. In: American Journal of Botany 110.3 (2023), p. e16136. DOI: https://doi.org/10.1002/ajb2.16136. eprint: https://bsapubs.onlinelibrary.wiley.com/doi/pdf/10.1002/ajb2.16136. https://bsapubs.onlinelibrary.wiley.com/doi/abs/10.1002/ajb2.16136.

[36] A. Trisovic, M. K. Lau, T. Pasquier, et al. “A large-scale study on research code quality and execution”. In: Scientific Data 9.1 (Feb. 2022), p. 60. ISSN: 2052-4463. DOI: 10.1038/s41597-022-01143-6. https://doi.org/10.1038/s41597-022-01143-6.

[37] J. Troudet, R. Vignes-Lebbe, P. Grandcolas, et al. “The Increasing Disconnection of Primary Biodiversity Data from Specimens: How Does It Happen and How to Handle It?” In: Systematic Biology (2018), p. syy044. DOI: 10.1093/sysbio/syy044. http://dx.doi.org/10.1093/sysbio/syy044.

[38] K. Wagenknecht, T. Woods, F. G. Sanz, et al. “EU-Citizen.Science: A Platform for Mainstreaming Citizen Science and Open Science in Europe”. In: Data Intelligence 3.1 (Feb. 2021), pp. 136-149. ISSN: 2641-435X. DOI: 10.1162/dint_a_00085. eprint: https://direct.mit.edu/dint/article-pdf/3/1/136/1893818/dint_a_00085.pdf. https://doi.org/10.1162/dint_a_00085.

[39] A. S. Wagner, L. K. Waite, M. Wierzba, et al. “FAIRly big: A framework for computationally reproducible processing of large-scale data”. In: Scientific Data 9.1 (Mar. 2022), p. 80. ISSN: 2052-4463. DOI: 10.1038/s41597-022-01163-2. https://doi.org/10.1038/s41597-022-01163-2.

[40] H. Wickham. Advanced R. Chapman & Hall/CRC The R Series. Taylor & Francis, 2014. ISBN: 9781466586963. https://books.google.com/books?id=PFHFNAEACAAJ.

[41] H. Wickham and J. Bryan. readxl: Read Excel Files. R package version 1.3.1. 2019. https://CRAN.R-project.org/package=readxl.

[42] H. Wickham and G. Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 1st. O’Reilly Media, Inc., 2017. ISBN: 1491910399, 9781491910399. http://r4ds.had.co.nz.

[43] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, et al. “The FAIR Guiding Principles for scientific data management and stewardship”. In: Scientific Data 3.1 (Mar. 2016), p. 160018. ISSN: 2052-4463. DOI: 10.1038/sdata.2016.18. https://doi.org/10.1038/sdata.2016.18.

[44] J. W. Williams, A. Taylor, K. A. Tolley, et al. “Shifts to open access with high article processing charges hinder research equity and careers”. In: Journal of Biogeography 50.9 (Jul. 2023), pp. 1485-1489. DOI: 10.1111/jbi.14697. https://doi.org/10.1111/jbi.14697.

[45] J. M. A. Wojahn, S. J. Galla, A. E. Melton, et al. “G2PMineR: A Genome to Phenome Literature Review Approach”. In: Genes 12.2 (Feb. 2021), p. 293. DOI: 10.3390/genes12020293. https://doi.org/10.3390/genes12020293.

15 Appendix 1

Citations of all R packages used to generate this report.
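Package citations like these can be generated rather than typed by hand. As a sketch, assuming knitr is installed, `knitr::write_bib()` writes BibTeX entries for a set of packages, and base R's `citation()` with `toBibtex()` covers a single package. This is an alternative to knitcitations, not necessarily how this appendix was produced; the output file name is hypothetical.

```r
library(knitr)

# Packages to cite; here, a subset of those listed in this appendix
pkgs <- c("knitr", "rmarkdown")

# Write BibTeX entries for the packages to a .bib file
bib_file <- file.path(tempdir(), "packages.bib")
write_bib(pkgs, file = bib_file)

# Each package's manual entry is keyed "R-<package>" by default
bib_lines <- readLines(bib_file)
print(grep("^@", bib_lines, value = TRUE))

# Base R alternative for a single package
print(toBibtex(citation("rmarkdown")))
```

Regenerating the file whenever packages are updated keeps the cited version numbers in step with the versions actually used to build the report.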

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.21. 2023. https://CRAN.R-project.org/package=rmarkdown.

[2] C. Boettiger. knitcitations: Citations for Knitr Markdown Files. R package version 1.0.12. 2021. https://github.com/cboettig/knitcitations.

[3] M. C. Koohafkan. kfigr: Integrated Code Chunk Anchoring and Referencing for R Markdown Documents. R package version 1.2.1. 2021. https://github.com/mkoohafkan/kfigr.

[4] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2022. https://www.R-project.org/.

[5] H. Wickham, J. Bryan, and M. Barrett. usethis: Automate Package and Project Setup. R package version 2.1.6. 2022. https://CRAN.R-project.org/package=usethis.

[6] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.0.9. 2022. https://CRAN.R-project.org/package=dplyr.

[7] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.4. 2022. https://CRAN.R-project.org/package=devtools.

[8] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. ISBN 978-1138700109. Boca Raton, Florida: Chapman and Hall/CRC, 2016. https://bookdown.org/yihui/bookdown.

[9] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.33. 2023. https://CRAN.R-project.org/package=bookdown.

[10] Y. Xie. Dynamic Documents with R and knitr. 2nd. ISBN 978-1498716963. Boca Raton, Florida: Chapman and Hall/CRC, 2015. https://yihui.org/knitr/.

[11] Y. Xie. formatR: Format R Code Automatically. R package version 1.12. 2022. https://github.com/yihui/formatR.

[12] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014.

[13] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.42. 2023. https://yihui.org/knitr/.

[14] Y. Xie and J. Allaire. tufte: Tufte’s Styles for R Markdown Documents. R package version 0.12. 2022. https://github.com/rstudio/tufte.

[15] Y. Xie, J. Allaire, and G. Grolemund. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC, 2018. ISBN: 9781138359338. https://bookdown.org/yihui/rmarkdown.

[16] Y. Xie, C. Dervieux, and E. Riederer. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC, 2020. ISBN: 9780367563837. https://bookdown.org/yihui/rmarkdown-cookbook.

[17] H. Zhu. kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.3.4. 2021. https://CRAN.R-project.org/package=kableExtra.