Evaluating Bioinformatic Pipeline Performance for Forensic Microbiome Analysis*,†,‡

Kaszubinski, Sierra F.; Pechal, Jennifer L.; Schmidt, Carl J.; Jordan, Heather R.; Benbow, Mark E.; Meek, Mariah H.

dc.contributor.author	Kaszubinski, Sierra F.
dc.contributor.author	Pechal, Jennifer L.
dc.contributor.author	Schmidt, Carl J.
dc.contributor.author	Jordan, Heather R.
dc.contributor.author	Benbow, Mark E.
dc.contributor.author	Meek, Mariah H.
dc.date.accessioned	2020-03-17T18:33:16Z
dc.date.available	WITHHELD_13_MONTHS
dc.date.available	2020-03-17T18:33:16Z
dc.date.issued	2020-03
dc.identifier.citation	Kaszubinski, Sierra F.; Pechal, Jennifer L.; Schmidt, Carl J.; Jordan, Heather R.; Benbow, Mark E.; Meek, Mariah H. (2020). "Evaluating Bioinformatic Pipeline Performance for Forensic Microbiome Analysis*,†,‡." Journal of Forensic Sciences 65(2): 513-525.
dc.identifier.issn	0022-1198
dc.identifier.issn	1556-4029
dc.identifier.uri	https://hdl.handle.net/2027.42/154468
dc.description.abstract	Microbial communities have potential evidential utility for forensic applications. However, bioinformatic analysis of high‐throughput sequencing data varies widely among laboratories, and these differences can affect inferred microbial community composition and downstream analyses. To illustrate the importance of standardizing methodology, we compared analyses of postmortem microbiome samples across several bioinformatic pipelines while varying minimum library size (the minimum number of sequences per sample) and sample size. Given the same input sequence data, three open‐source bioinformatic pipelines, MG‐RAST, mothur, and QIIME2, produced significantly different relative abundance, alpha‐diversity, and beta‐diversity results. Increasing minimum library size and sample size increased the number of low‐abundance and infrequent taxa detected. Our results show that bioinformatic pipeline and parameter choice affect results in important ways. Given the growing potential application of forensic microbiology to the criminal justice system, continued research on standardizing computational methodology will be important for downstream applications.
dc.publisher	John Wiley & Sons Ltd
dc.subject.other	next‐generation sequencing
dc.subject.other	forensic science
dc.subject.other	bioinformatic pipelines
dc.subject.other	forensic microbiology
dc.subject.other	postmortem microbiome
dc.subject.other	microbial communities
dc.title	Evaluating Bioinformatic Pipeline Performance for Forensic Microbiome Analysis*,†,‡
dc.type	Article
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Science (General)
dc.subject.hlbtoplevel	Science
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/154468/1/jfo14213_am.pdf
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/154468/2/jfo14213.pdf
dc.identifier.doi	10.1111/1556-4029.14213
dc.identifier.source	Journal of Forensic Sciences
dc.owningcollname	Interdisciplinary and Peer-Reviewed
