
Estimation of DNA contamination and its sources in genotyped samples

dc.contributor.author: Zajac, Gregory J. M.
dc.contributor.author: Fritsche, Lars G.
dc.contributor.author: Weinstock, Joshua S.
dc.contributor.author: Dagenais, Susan L.
dc.contributor.author: Lyons, Robert H.
dc.contributor.author: Brummett, Chad M.
dc.contributor.author: Abecasis, Gonçalo R.
dc.date.accessioned: 2019-11-12T16:23:41Z
dc.date.available: WITHHELD_14_MONTHS
dc.date.available: 2019-11-12T16:23:41Z
dc.date.issued: 2019-12
dc.identifier.citation: Zajac, Gregory J. M.; Fritsche, Lars G.; Weinstock, Joshua S.; Dagenais, Susan L.; Lyons, Robert H.; Brummett, Chad M.; Abecasis, Gonçalo R. (2019). "Estimation of DNA contamination and its sources in genotyped samples." Genetic Epidemiology 43(8): 980-995.
dc.identifier.issn: 0741-0395
dc.identifier.issn: 1098-2272
dc.identifier.uri: https://hdl.handle.net/2027.42/152029
dc.description.abstract: Array genotyping is a cost‐effective and widely used tool that enables assessment of up to millions of genetic markers in hundreds of thousands of individuals. Genotyping array data are typically highly accurate but sensitive to mixing of DNA samples from multiple individuals before or during genotyping. Contaminated samples can lead to genotyping errors and consequently cause false positive signals or reduce power of association analyses. Here, we propose a new method to identify contaminated samples and the sources of contamination within a genotyping batch. Through analysis of array intensity and genotype data from intentionally mixed samples and 22,366 samples of the Michigan Genomics Initiative, an ongoing biobank‐based study, we show that our method can reliably estimate contamination. We also show that identifying sources of contamination can implicate problematic sample processing steps and guide process improvements. Compared to existing methods, our approach can estimate the proportion of contaminating DNA more accurately, eliminate the need for external databases of allele frequencies, and provide contamination estimates that are more robust to the ancestral origin of the contaminating sample.
dc.publisher: Wiley Periodicals, Inc.
dc.subject.other: quality control
dc.subject.other: genotyping array
dc.subject.other: genome‐wide association study
dc.subject.other: DNA contamination
dc.subject.other: biobank
dc.title: Estimation of DNA contamination and its sources in genotyped samples
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Molecular, Cellular and Developmental Biology
dc.subject.hlbsecondlevel: Biological Chemistry
dc.subject.hlbsecondlevel: Genetics
dc.subject.hlbtoplevel: Health Sciences
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/152029/1/gepi22257_am.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/152029/2/gepi22257.pdf
dc.identifier.doi: 10.1002/gepi.22257
dc.identifier.source: Genetic Epidemiology
dc.owningcollname: Interdisciplinary and Peer-Reviewed

