The field of reproducible science as implemented in RStudio is moving very fast, and it is hard to keep up with new R packages and tools. The instructor aims to provide students with relevant online resources that complement the class materials.
The challenge of reproducing results from ten-year-old code:
Please find here a suite of open-access books written with the R bookdown package, which are relevant to the EEB603 - Reproducible Science course.
Here are three important points for R coding:
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. Please find below additional key resources on this topic:
R Markdown can be used to produce basic HTML, PDF, and Word documents; however, more complex and larger projects can become difficult to manage in a single R Markdown file. The bookdown package addresses this limitation, and offers several key improvements:
A book dedicated to this topic can be accessed here and the package can be downloaded here.
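As a rough illustration of how bookdown splits a large project across several R Markdown files, a `_bookdown.yml` configuration file might look like the sketch below. The book name and chapter file names are hypothetical placeholders, not files from this course.

```yaml
# _bookdown.yml (minimal sketch; all file names are hypothetical)
book_filename: "my-book"   # base name of the merged output file
rmd_files:                 # chapters are merged in the order listed
  - index.Rmd
  - 01-introduction.Rmd
  - 02-methods.Rmd
```

With a file like this in the project root, each chapter lives in its own `.Rmd` file and bookdown merges them in the listed order when the book is rendered.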
An R Notebook is an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input.
You can create a new notebook in RStudio with the menu command File -> New File -> R Notebook, or by using the html_notebook output type in your document’s YAML metadata.
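For example, the YAML metadata at the top of a notebook file could look like the following minimal sketch. The title is a placeholder; only the `output` field is needed to select the notebook format.

```yaml
---
title: "My analysis notebook"   # placeholder title
output: html_notebook           # renders the document as an R Notebook
---
```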
Many homework assignments have been published on https://RPubs.com (a free publishing platform provided by RStudio), which suggests that R Markdown is easy and convenient enough for students to use for their homework. As a TA or GA, you might even consider asking your students to complete their homework in R Markdown and publish it on RPubs!
Submitting scientific manuscripts written in R Markdown is still challenging; however, the rticles package was designed to simplify the creation of documents that conform to submission standards for academic journals. The package provides a suite of custom R Markdown LaTeX formats and templates for the following journals/publishers that are relevant to the EEB program:
The doi2bib web tool generates BibTeX-formatted references from DOIs.
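To show how a journal template and a BibTeX file can fit together, here is a minimal sketch of a manuscript's YAML metadata. It assumes the `rticles::plos_article` format purely as an example, and `references.bib` is a hypothetical file holding BibTeX entries (for instance, entries generated from DOIs). Each rticles template defines its own author and affiliation fields, so those are left out here.

```yaml
---
title: "Manuscript title (placeholder)"
bibliography: references.bib    # hypothetical BibTeX file with the cited references
output: rticles::plos_article   # example journal format; swap in the one you need
---
```

Entries in the BibTeX file can then be cited in the text with the usual R Markdown syntax, for example `[@key]`, where `key` is the entry's citation key.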
The learnr package lets you turn R Markdown documents into interactive tutorials. Tutorials consist of content along with interactive components for checking and reinforcing understanding. Tutorials can include any or all of the following:
Please see the full documentation (with examples) at this url: https://rstudio.github.io/learnr/
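As a starting point, a learnr tutorial is an ordinary R Markdown file whose YAML metadata selects the tutorial format, as in this minimal sketch. The title is a placeholder.

```yaml
---
title: "My first tutorial"    # placeholder title
output: learnr::tutorial      # tutorial output format provided by learnr
runtime: shiny_prerendered    # tutorials run as interactive Shiny documents
---
```

Code chunks marked with the `exercise=TRUE` chunk option then become interactive exercises in the rendered tutorial.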
BMC Research Notes published a special issue on Reproducibility and research integrity in 2022. You can consult all the publications here.
Here is a list of the publications used in this class.
[1] M. Cargill and P. O’Connor. Writing Scientific Research Articles: Strategy and Steps. Wiley-Blackwell.
[2] L. Allen, J. Scott, A. Brand, et al. “Publishing: Credit where credit is due”. In: Nature 508 (2014), pp. 312-313. DOI: 10.1038/508312a.
[3] M. Baker. “1,500 scientists lift the lid on reproducibility”. In: Nature 533.7604 (May. 2016), pp. 452-454. DOI: 10.1038/533452a. https://doi.org/10.1038/533452a.
[4] R. Barron, P. Martinez, M. Serpe, et al. “Development of an In Vitro Method of Propagation for Artemisia tridentata subsp. tridentata to Support Genome Sequencing and Genotype-by-Environment Research”. In: Plants 9.12 (2020). ISSN: 2223-7747. DOI: 10.3390/plants9121717. https://www.mdpi.com/2223-7747/9/12/1717.
[5] R. E. Bone, J. A. C. Smith, N. Arrigo, et al. “A macro-ecological perspective on crassulacean acid metabolism (CAM) photosynthesis evolution in Afro-Madagascan drylands: Eulophiinae orchids as a case study”. In: New Phytologist 208.2 (2015). 2015-19192, pp. 469-481. ISSN: 1469-8137. DOI: 10.1111/nph.13572. http://dx.doi.org/10.1111/nph.13572.
[6] J. Boulter, M. L. Orozco Morales, N. Principe, et al. “What is Kindness in Science and why does it matter?” In: Immunology & Cell Biology 101.2 (2023), pp. 97-103. DOI: 10.1111/imcb.12580. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/imcb.12580. https://onlinelibrary.wiley.com/doi/abs/10.1111/imcb.12580.
[7] British Ecological Society, ed. A Guide to Data Management in Ecology and Evolution. British Ecological Society. 2014.
[8] British Ecological Society, ed. A Guide to Getting Published in Ecology and Evolution. British Ecological Society. 2014.
[9] British Ecological Society, ed. A Guide to Peer Review in Ecology and Evolution. British Ecological Society. 2014.
[10] British Ecological Society, ed. A Guide to Reproducible Code in Ecology and Evolution. British Ecological Society. 2014.
[11] S. R. Carroll, E. Herczog, M. Hudson, et al. “Operationalizing the CARE and FAIR Principles for Indigenous data futures”. In: Scientific Data 8.1 (Apr. 2021), p. 108. ISSN: 2052-4463. DOI: 10.1038/s41597-021-00892-0. https://doi.org/10.1038/s41597-021-00892-0.
[12] H. J. Cole, D. G. Gomes, and J. R. Barber. “EcoCountHelper: an R package and analytical pipeline for the analysis of ecological count data using GLMMs, and a case study of bats in Grand Teton National Park”. In: PeerJ 10 (Dec. 2022), p. e14509. DOI: 10.7717/peerj.14509. https://doi.org/10.7717/peerj.14509.
[13] D. L. Donoho. “An invitation to reproducible computational research”. In: Biostatistics 11.3 (2010), pp. 385-388. DOI: 10.1093/biostatistics/kxq028. eprint: /oup/backfile/content_public/journal/biostatistics/11/3/10.1093/biostatistics/kxq028/2/kxq028.pdf. http://dx.doi.org/10.1093/biostatistics/kxq028.
[14] P. Ellestad, M. A. Pérez-Farrera, and S. Buerki. “Genomic Insights into Cultivated Mexican Vanilla planifolia Reveal High Levels of Heterozygosity Stemming from Hybridization”. In: Plants 11.16 (Aug. 2022), p. 2090. DOI: 10.3390/plants11162090. https://doi.org/10.3390/plants11162090.
[15] C. W. Fox and C. S. Burns. “The relationship between manuscript title structure and success: editorial decisions and citation performance for an ecological journal”. In: Ecology and Evolution 5.10 (2015), pp. 1970-1980. DOI: 10.1002/ece3.1480. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/ece3.1480. https://onlinelibrary.wiley.com/doi/abs/10.1002/ece3.1480.
[16] L. P. Freedman, I. M. Cockburn, and T. S. Simcoe. “The Economics of Reproducibility in Preclinical Research”. In: PLOS Biology 13.6 (Jun. 2015), p. e1002165. DOI: 10.1371/journal.pbio.1002165. https://doi.org/10.1371/journal.pbio.1002165.
[17] C. Gandrud. Reproducible Research with R and RStudio. Ed. by C. Gandrud. CRC Press, 2015, p. 294. ISBN: 1466572841.
[18] C. Gandrud. repmis: Miscellaneous Tools for Reproducible Research. R package version 0.5. 2016. https://CRAN.R-project.org/package=repmis.
[19] Q. Groom, L. Weatherdon, and I. R. Geijzendorffer. “Is citizen science an open science in the case of biodiversity observations?” In: Journal of Applied Ecology 54.2 (2017), pp. 612-617. DOI: 10.1111/1365-2664.12767. eprint: https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/1365-2664.12767. https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/1365-2664.12767.
[20] G. Yu, D. K. Smith, H. Zhu, et al. “ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data”. In: Methods in Ecology and Evolution 8.1 (2017), pp. 28-36. DOI: 10.1111/2041-210X.12628. eprint: https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041-210X.12628. https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12628.
[21] J. Hester, G. Csárdi, H. Wickham, et al. remotes: R Package Installation from Remote Repositories, Including GitHub. R package version 2.1.0. 2019. https://github.com/r-lib/remotes#readme.
[22] N. L. Kerr. “HARKing: Hypothesizing After the Results are Known”. In: Personality and Social Psychology Review 2.3 (1998), pp. 196-217. https://doi.org/10.1207/s15327957pspr0203_4.
[23] S. Y. Khoo. “Article Processing Charge Hyperinflation and Price Insensitivity: An Open Access Sequel to the Serials Crisis”. In: LIBER Quarterly: The Journal of the Association of European Research Libraries 29.1 (May. 2019), pp. 1-18. DOI: 10.18352/lq.10280. https://liberquarterly.eu/article/view/10729.
[24] F. Markowetz. “Five selfish reasons to work reproducibly”. In: Genome Biology 16.1 (Dec. 2015), p. 274. ISSN: 1474-760X. DOI: 10.1186/s13059-015-0850-7. https://doi.org/10.1186/s13059-015-0850-7.
[25] M. R. Munafò, B. A. Nosek, D. V. M. Bishop, et al. “A manifesto for reproducible science”. In: Nature Human Behaviour 1.0021 (Jan. 2017), p. 0021. ISSN: 2397-3374. DOI: 10.1038/s41562-016-0021.
[26] B. A. Nosek, G. Alter, G. C. Banks, et al. “Promoting an open research culture”. In: Science 348.6242 (2015), pp. 1422-1425. ISSN: 0036-8075. DOI: 10.1126/science.aab2374. eprint: http://science.sciencemag.org/content/348/6242/1422.full.pdf. http://science.sciencemag.org/content/348/6242/1422.
[27] B. A. Nosek, C. R. Ebersole, A. C. DeHaven, et al. “The preregistration revolution”. In: Proceedings of the National Academy of Sciences 115.11 (2018), pp. 2600-2606. ISSN: 0027-8424. DOI: 10.1073/pnas.1708274114. eprint: http://www.pnas.org/content/115/11/2600.full.pdf. http://www.pnas.org/content/115/11/2600.
[28] B. A. Nosek, J. R. Spies, and M. Motyl. “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability”. In: Perspectives on Psychological Science 7.6 (2012), pp. 615-631. DOI: 10.1177/1745691612459058. https://doi.org/10.1177/1745691612459058.
[29] R. D. Peng and S. C. Hicks. “Reproducible Research: A Retrospective”. In: Annual Review of Public Health 42.1 (2021). PMID: 33467923, pp. 79-93. DOI: 10.1146/annurev-publhealth-012420-105110. eprint: https://doi.org/10.1146/annurev-publhealth-012420-105110. https://doi.org/10.1146/annurev-publhealth-012420-105110.
[30] R Core Team. foreign: Read Data Stored by ‘Minitab’, ‘S’, ‘SAS’, ‘SPSS’, ‘Stata’, ‘Systat’, ‘Weka’, ‘dBase’, … R package version 0.8-75. 2020. https://CRAN.R-project.org/package=foreign.
[31] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2019. https://www.R-project.org/.
[32] RStudio Team. RStudio: Integrated Development Environment for R. RStudio, PBC. Boston, MA, 2020. http://www.rstudio.com/.
[33] D. Sarewitz. “The pressure to publish pushes down quality”. In: Nature 533.7602 (May. 2016), pp. 147-147. DOI: 10.1038/533147a. https://doi.org/10.1038/533147a.
[34] J. F. Smith, T. H. Parker, S. Nakagawa, et al. “Promoting Transparency in Evolutionary Biology and Ecology”. In: Systematic Botany 41.3 (Jul. 2016), pp. 495-497. ISSN: 0363-6445. DOI: 10.1600/036364416X692262. http://www.bioone.org/doi/abs/10.1600/036364416X692262.
[35] D. E. Soltis, V. B. Smocovitis, K. K. Pham, et al. “Rethinking the Ph.D. dissertation in botany: Widening the circle”. In: American Journal of Botany 110.3 (2023), p. e16136. DOI: 10.1002/ajb2.16136. eprint: https://bsapubs.onlinelibrary.wiley.com/doi/pdf/10.1002/ajb2.16136. https://bsapubs.onlinelibrary.wiley.com/doi/abs/10.1002/ajb2.16136.
[36] A. Trisovic, M. K. Lau, T. Pasquier, et al. “A large-scale study on research code quality and execution”. In: Scientific Data 9.1 (Feb. 2022), p. 60. ISSN: 2052-4463. DOI: 10.1038/s41597-022-01143-6. https://doi.org/10.1038/s41597-022-01143-6.
[37] J. Troudet, R. Vignes-Lebbe, P. Grandcolas, et al. “The Increasing Disconnection of Primary Biodiversity Data from Specimens: How Does It Happen and How to Handle It?” In: Systematic Biology (2018), p. syy044. DOI: 10.1093/sysbio/syy044. eprint: /oup/backfile/content_public/journal/sysbio/pap/10.1093_sysbio_syy044/3/syy044.pdf. http://dx.doi.org/10.1093/sysbio/syy044.
[38] K. Wagenknecht, T. Woods, F. G. Sanz, et al. “EU-Citizen.Science: A Platform for Mainstreaming Citizen Science and Open Science in Europe”. In: Data Intelligence 3.1 (Feb. 2021), pp. 136-149. ISSN: 2641-435X. DOI: 10.1162/dint_a_00085. eprint: https://direct.mit.edu/dint/article-pdf/3/1/136/1893818/dint_a_00085.pdf. https://doi.org/10.1162/dint_a_00085.
[39] A. S. Wagner, L. K. Waite, M. Wierzba, et al. “FAIRly big: A framework for computationally reproducible processing of large-scale data”. In: Scientific Data 9.1 (Mar. 2022), p. 80. ISSN: 2052-4463. DOI: 10.1038/s41597-022-01163-2. https://doi.org/10.1038/s41597-022-01163-2.
[40] H. Wickham. Advanced R. Chapman & Hall/CRC The R Series. Taylor & Francis, 2014. ISBN: 9781466586963. https://books.google.com/books?id=PFHFNAEACAAJ.
[41] H. Wickham and J. Bryan. readxl: Read Excel Files. R package version 1.3.1. 2019. https://CRAN.R-project.org/package=readxl.
[42] H. Wickham and G. Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 1st. O’Reilly Media, Inc., 2017. ISBN: 1491910399, 9781491910399. http://r4ds.had.co.nz.
[43] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, et al. “The FAIR Guiding Principles for scientific data management and stewardship”. In: Scientific Data 3.1 (Mar. 2016), p. 160018. ISSN: 2052-4463. DOI: 10.1038/sdata.2016.18. https://doi.org/10.1038/sdata.2016.18.
[44] J. W. Williams, A. Taylor, K. A. Tolley, et al. “Shifts to open access with high article processing charges hinder research equity and careers”. In: Journal of Biogeography 50.9 (Jul. 2023), pp. 1485-1489. DOI: 10.1111/jbi.14697. https://doi.org/10.1111/jbi.14697.
[45] J. M. A. Wojahn, S. J. Galla, A. E. Melton, et al. “G2PMineR: A Genome to Phenome Literature Review Approach”. In: Genes 12.2 (Feb. 2021), p. 293. DOI: 10.3390/genes12020293. https://doi.org/10.3390/genes12020293.
Citations of all R packages used to generate this report.
[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.21. 2023. https://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for Knitr Markdown Files. R package version 1.0.12. 2021. https://github.com/cboettig/knitcitations.
[3] M. C. Koohafkan. kfigr: Integrated Code Chunk Anchoring and Referencing for R Markdown Documents. R package version 1.2.1. 2021. https://github.com/mkoohafkan/kfigr.
[4] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2022. https://www.R-project.org/.
[5] H. Wickham, J. Bryan, and M. Barrett. usethis: Automate Package and Project Setup. R package version 2.1.6. 2022. https://CRAN.R-project.org/package=usethis.
[6] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.0.9. 2022. https://CRAN.R-project.org/package=dplyr.
[7] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.4. 2022. https://CRAN.R-project.org/package=devtools.
[8] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. ISBN 978-1138700109. Boca Raton, Florida: Chapman and Hall/CRC, 2016. https://bookdown.org/yihui/bookdown.
[9] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.33. 2023. https://CRAN.R-project.org/package=bookdown.
[10] Y. Xie. Dynamic Documents with R and knitr. 2nd. ISBN 978-1498716963. Boca Raton, Florida: Chapman and Hall/CRC, 2015. https://yihui.org/knitr/.
[11] Y. Xie. formatR: Format R Code Automatically. R package version 1.12. 2022. https://github.com/yihui/formatR.
[12] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014.
[13] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.42. 2023. https://yihui.org/knitr/.
[14] Y. Xie and J. Allaire. tufte: Tufte’s Styles for R Markdown Documents. R package version 0.12. 2022. https://github.com/rstudio/tufte.
[15] Y. Xie, J. Allaire, and G. Grolemund. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC, 2018. ISBN: 9781138359338. https://bookdown.org/yihui/rmarkdown.
[16] Y. Xie, C. Dervieux, and E. Riederer. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC, 2020. ISBN: 9780367563837. https://bookdown.org/yihui/rmarkdown-cookbook.
[17] H. Zhu. kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.3.4. 2021. https://CRAN.R-project.org/package=kableExtra.