Estimation of DNA contamination and its sources in genotyped samples
dc.contributor.author | Zajac, Gregory J. M. | |
dc.contributor.author | Fritsche, Lars G. | |
dc.contributor.author | Weinstock, Joshua S. | |
dc.contributor.author | Dagenais, Susan L. | |
dc.contributor.author | Lyons, Robert H. | |
dc.contributor.author | Brummett, Chad M. | |
dc.contributor.author | Abecasis, Gonçalo R. | |
dc.date.accessioned | 2019-11-12T16:23:41Z | |
dc.date.available | WITHHELD_14_MONTHS | |
dc.date.available | 2019-11-12T16:23:41Z | |
dc.date.issued | 2019-12 | |
dc.identifier.citation | Zajac, Gregory J. M.; Fritsche, Lars G.; Weinstock, Joshua S.; Dagenais, Susan L.; Lyons, Robert H.; Brummett, Chad M.; Abecasis, Gonçalo R. (2019). "Estimation of DNA contamination and its sources in genotyped samples." Genetic Epidemiology 43(8): 980-995. | |
dc.identifier.issn | 0741-0395 | |
dc.identifier.issn | 1098-2272 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/152029 | |
dc.description.abstract | Array genotyping is a cost‐effective and widely used tool that enables assessment of up to millions of genetic markers in hundreds of thousands of individuals. Genotyping array data are typically highly accurate but sensitive to mixing of DNA samples from multiple individuals before or during genotyping. Contaminated samples can lead to genotyping errors and consequently cause false positive signals or reduce power of association analyses. Here, we propose a new method to identify contaminated samples and the sources of contamination within a genotyping batch. Through analysis of array intensity and genotype data from intentionally mixed samples and 22,366 samples of the Michigan Genomics Initiative, an ongoing biobank‐based study, we show that our method can reliably estimate contamination. We also show that identifying sources of contamination can implicate problematic sample processing steps and guide process improvements. Compared to existing methods, our approach can estimate the proportion of contaminating DNA more accurately, eliminate the need for external databases of allele frequencies, and provide contamination estimates that are more robust to the ancestral origin of the contaminating sample. | |
dc.publisher | Wiley Periodicals, Inc. | |
dc.subject.other | quality control | |
dc.subject.other | genotyping array | |
dc.subject.other | genome‐wide association study | |
dc.subject.other | DNA contamination | |
dc.subject.other | biobank | |
dc.title | Estimation of DNA contamination and its sources in genotyped samples | |
dc.type | Article | |
dc.rights.robots | IndexNoFollow | |
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | |
dc.subject.hlbsecondlevel | Biological Chemistry | |
dc.subject.hlbsecondlevel | Genetics | |
dc.subject.hlbtoplevel | Health Sciences | |
dc.subject.hlbtoplevel | Science | |
dc.description.peerreviewed | Peer Reviewed | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/152029/1/gepi22257_am.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/152029/2/gepi22257.pdf | |
dc.identifier.doi | 10.1002/gepi.22257 | |
dc.identifier.source | Genetic Epidemiology | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |