This section summarizes the procedures applied in this project to sequence, phase and annotate the sagebrush genome, based on the individual line G2_b24_1. Click here to learn more about G2_b24_1. Below, we briefly present our predicted sequence data and biomass requirements for each sequencing technology.
Our sequencing and assembly strategy is described in Section 2; a summary of the sequence data and their associated NGS platforms is provided below (see Table 1.1 for biomass requirements and Table 1.2 for sequencing data per technology).
We estimate that ca. 120 g of leaf biomass is necessary for genome sequencing, phasing and annotation. Table 1.1 summarizes the sequencing technologies applied in this project, their purpose and their biomass requirements. These figures do not account for a preliminary DNA extraction trial (to conduct PacBio sequencing).
Type | Purpose | Biomass per unit | Total units and biomass | Number of plantlets |
---|---|---|---|---|
Illumina HiSeq | Genome size and complexity (incl. haploid draft genome) | 1 Illumina library = 20 mg | 1 library = 20 mg | 1 |
PacBio sequencing | De novo genome assembly | 1 cell = 20 g | 5 cells = 100 g | 125 |
Proximity ligation (Hi-C for phasing genome) | Phasing | 1 Illumina library = 6 g | 3 libraries = 18 g | 23 |
RNA-seq | Annotation | 1 library = 20 mg | 1 library = 20 mg | 1 |
For Illumina HiSeq, although we will produce only one library, it will be sequenced across five HiSeq runs.
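The totals in Table 1.1 can be re-derived with a few lines of arithmetic. The sketch below is written in Python purely for illustration. The per-plantlet yield of 800 mg is not stated in this report; it is an assumption inferred from the table (100 g of biomass for 125 plantlets), and all names are hypothetical.

```python
# Biomass per unit (in mg) and number of units, taken from Table 1.1.
REQUIREMENTS_MG = {
    "Illumina HiSeq": (20, 1),
    "PacBio sequencing": (20_000, 5),
    "Proximity ligation (Hi-C)": (6_000, 3),
    "RNA-seq": (20, 1),
}

# ASSUMPTION: 800 mg of leaf biomass per plantlet (inferred, not stated).
PLANTLET_YIELD_MG = 800


def biomass_summary(requirements=REQUIREMENTS_MG, yield_mg=PLANTLET_YIELD_MG):
    """Return {technology: (total biomass in mg, plantlets needed)}."""
    summary = {}
    for tech, (unit_mg, n_units) in requirements.items():
        total_mg = unit_mg * n_units
        plantlets = -(-total_mg // yield_mg)  # ceiling division on integers
        summary[tech] = (total_mg, plantlets)
    return summary


summary = biomass_summary()
total_g = sum(total_mg for total_mg, _ in summary.values()) / 1000
for tech, (total_mg, plantlets) in summary.items():
    print(f"{tech}: {total_mg / 1000:g} g, {plantlets} plantlet(s)")
print(f"Total leaf biomass: {total_g:g} g")
```

Under that assumed yield, the script reproduces the plantlet counts of Table 1.1, and the summed biomass (118.04 g) is consistent with the estimate of ca. 120 g.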
The amount of data (in Gbp) produced per sequencing technology for de novo genome assembly is provided in Table 1.2, together with estimates of haploid genome coverage (x). We aim to sequence the sagebrush genome at a coverage of 50-100x. Please see Wet-lab procedures for more details on these data.
Type | Purpose | Data (Gbp)/Run | N. runs | Total data (Gbp) | Haploid genome coverage (x) |
---|---|---|---|---|---|
Illumina HiSeq | Genome size and complexity (incl. haploid draft genome) | 105 | 5 | 525 | 116.7 |
PacBio sequencing | De novo genome assembly | 50 | 5 | 250 | 55.6 |
Proximity ligation (Hi-C for phasing genome) | Phasing | 30 | 3 | 90 | 20.0 |
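The coverage column of Table 1.2 follows from dividing the total data by the haploid genome size. The Python sketch below illustrates that calculation. The haploid genome size of 4.5 Gbp is not stated in this report; it is an assumption back-calculated from the table (525 Gbp / 116.7x), and the names are hypothetical.

```python
# ASSUMPTION: haploid genome size in Gbp, back-calculated from Table 1.2.
HAPLOID_GENOME_GBP = 4.5

# (data per run in Gbp, number of runs), taken from Table 1.2.
RUNS = {
    "Illumina HiSeq": (105, 5),
    "PacBio sequencing": (50, 5),
    "Proximity ligation (Hi-C)": (30, 3),
}


def haploid_coverage(gbp_per_run, n_runs, genome_gbp=HAPLOID_GENOME_GBP):
    """Coverage (x) = total sequenced bases / haploid genome size."""
    return gbp_per_run * n_runs / genome_gbp


coverage = {tech: round(haploid_coverage(*spec), 1) for tech, spec in RUNS.items()}
for tech, cov in coverage.items():
    print(f"{tech}: {cov}x")
```

With this assumed genome size, both the Illumina and PacBio data sets meet the 50x lower bound on their own, whereas the Hi-C data are intended for phasing rather than for coverage.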
Completion of the wet-lab work by Dovetail Genomics detailed here is predicted to take 28 weeks (7 months) upon receipt of the biomass. To gain insights into the timetable for biomass production, please click here.
The genome of G2_b24_1 will be characterized by counting chromosomes (based on root squashes), inferring 2C genome size using flow cytometry (based on root and leaf tissues), and estimating genome size and complexity by applying a k-mer approach to Illumina HiSeq data (5 runs, each 2x150 bp; see Table 1.2 for more details). In addition, the Illumina data will be used to assemble a haploid draft genome (which should have a coverage of ca. 100x). The sequencing will be outsourced to GENEWIZ.
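To illustrate the principle behind a k-mer based genome size estimate, the Python sketch below counts k-mers in a set of reads and divides the total number of k-mers by the depth of the main peak of the k-mer histogram. This is a toy illustration under simplifying assumptions (no reverse-complement handling, a single dominant peak, a fixed error cut-off), not the pipeline used in this project; the function names and the `min_depth` parameter are hypothetical.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    ASSUMPTION: k-mers seen fewer than `min_depth` times are treated as
    likely sequencing errors and ignored.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)  # depth shared by most distinct k-mers
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy input: the same 6 bp "read" sequenced three times.
toy_reads = ["ACGTAC"] * 3
toy_hist = kmer_histogram(toy_reads, k=3)
print(estimate_genome_size(toy_hist))
```

On real data the histogram would come from a dedicated k-mer counter run on the HiSeq reads. In a heterozygous or polyploid genome the histogram can show several peaks, so the choice of the peak that represents the haploid depth needs care.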
Version information about R, the operating system (OS) and attached or loaded R packages. This appendix was generated using sessionInfo().
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gargle_0.5.0 formattable_0.2.0.1 leaflet_2.0.3
## [4] googlesheets4_0.2.0 kableExtra_1.2.1 dplyr_1.0.2
## [7] kfigr_1.2 scales_1.1.1 lubridate_1.7.9.2
## [10] MASS_7.3-53 forcats_0.5.0 TreeTools_1.4.0
## [13] ggridges_0.5.2 stringr_1.4.0 ape_5.4-1
## [16] ggtree_2.0.4 ggpubr_0.4.0 ggplot2_3.3.3
## [19] chisq.posthoc.test_0.1.2 DT_0.16 lsmeans_2.30-0
## [22] emmeans_1.5.2-1 lmtest_0.9-38 zoo_1.8-8
## [25] pscl_1.5.5 RColorBrewer_1.1-2 gplots_3.1.0
## [28] devtools_2.3.2 usethis_2.0.0 formatR_1.7
## [31] knitcitations_1.0.10 bookdown_0.21 rmarkdown_2.6
## [34] knitr_1.30
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.2.1 fastmatch_1.1-0
## [4] plyr_1.8.6 igraph_1.2.6 lazyeval_0.2.2
## [7] crosstalk_1.1.0.1 digest_0.6.27 htmltools_0.5.0
## [10] fansi_0.4.1 magrittr_2.0.1 memoise_1.1.0
## [13] openxlsx_4.2.2 remotes_2.2.0 R.utils_2.10.1
## [16] askpass_1.1 prettyunits_1.1.1 colorspace_2.0-0
## [19] rvest_0.3.6 haven_2.3.1 rbibutils_1.4
## [22] xfun_0.20 callr_3.5.1 crayon_1.3.4
## [25] jsonlite_1.7.2 phangorn_2.5.5 glue_1.4.2
## [28] gtable_0.3.0 webshot_0.5.2 R.cache_0.14.0
## [31] car_3.0-10 pkgbuild_1.2.0 abind_1.4-5
## [34] mvtnorm_1.1-1 bibtex_0.4.2.3 rstatix_0.6.0
## [37] Rcpp_1.0.5 viridisLite_0.3.0 xtable_1.8-4
## [40] tidytree_0.3.3 foreign_0.8-75 bit_4.0.4
## [43] htmlwidgets_1.5.3 httr_1.4.2 ellipsis_0.3.1
## [46] pkgconfig_2.0.3 R.methodsS3_1.8.1 tidyselect_1.1.0
## [49] rlang_0.4.10 munsell_0.5.0 cellranger_1.1.0
## [52] tools_3.6.1 cli_2.2.0 generics_0.1.0
## [55] broom_0.7.1 evaluate_0.14 yaml_2.2.1
## [58] RefManageR_1.2.12 processx_3.4.5 bit64_4.0.5
## [61] fs_1.5.0 zip_2.1.1 caTools_1.18.0
## [64] purrr_0.3.4 nlme_3.1-149 R.oo_1.24.0
## [67] xml2_1.3.2 compiler_3.6.1 rstudioapi_0.13
## [70] curl_4.3 testthat_3.0.1 ggsignif_0.6.0
## [73] treeio_1.10.0 tibble_3.0.4 stringi_1.5.3
## [76] highr_0.8 ps_1.5.0 desc_1.2.0
## [79] lattice_0.20-41 Matrix_1.2-18 vctrs_0.3.6
## [82] pillar_1.4.7 lifecycle_0.2.0 BiocManager_1.30.10
## [85] Rdpack_2.1 estimability_1.3 data.table_1.13.6
## [88] bitops_1.0-6 gbRd_0.4-11 R6_2.5.0
## [91] KernSmooth_2.23-17 rio_0.5.16 codetools_0.2-16
## [94] sessioninfo_1.1.1 gtools_3.8.2 assertthat_0.2.1
## [97] pkgload_1.1.0 openssl_1.4.3 rprojroot_2.0.2
## [100] withr_2.3.0 parallel_3.6.1 hms_0.5.3
## [103] quadprog_1.5-8 grid_3.6.1 tidyr_1.1.2
## [106] coda_0.19-4 rvcheck_0.1.8 carData_3.0-4
## [109] googledrive_1.0.1