Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results

dc.contributor.author: Chen, Zhongsheng
dc.contributor.author: Boehnke, Michael
dc.contributor.author: Fuchsberger, Christian
dc.date.accessioned: 2020-02-05T15:06:26Z
dc.date.available: WITHHELD_12_MONTHS
dc.date.available: 2020-02-05T15:06:26Z
dc.date.issued: 2020-01
dc.identifier.citation: Chen, Zhongsheng; Boehnke, Michael; Fuchsberger, Christian (2020). "Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results." Genetic Epidemiology 44(1): 41-51.
dc.identifier.issn: 0741-0395
dc.identifier.issn: 1098-2272
dc.identifier.uri: https://hdl.handle.net/2027.42/153654
dc.description.abstract: Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep‐coverage (~82×) exome and low‐coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta‐analysis has similar power to joint analysis in deep‐coverage sequence data but can be less powerful in low‐coverage sequence data. Given similar data processing and quality control steps, we recommend single‐study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep‐coverage data.
dc.publisher: Wiley Periodicals, Inc.
dc.subject.other: meta‐analysis
dc.subject.other: joint analysis
dc.subject.other: rare variants
dc.subject.other: Sequencing studies
dc.title: Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Biological Chemistry
dc.subject.hlbsecondlevel: Genetics
dc.subject.hlbsecondlevel: Molecular, Cellular and Developmental Biology
dc.subject.hlbtoplevel: Health Sciences
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/153654/1/gepi22261_am.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/153654/2/gepi22261.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/153654/3/gepi22261-sup-0002-final_revised_supp_figures_7_19_2019.pdf
dc.identifier.doi: 10.1002/gepi.22261
dc.identifier.source: Genetic Epidemiology
dc.owningcollname: Interdisciplinary and Peer-Reviewed
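
The abstract reports nonreference concordance with a set of highly accurate genotypes but does not define the metric. The following is a minimal sketch under one common definition, which may differ from the one the authors used: among sites where both genotypes are present and at least one of them is not homozygous reference, the fraction where the two genotypes agree. The function name, the 0/1/2 alternate-allele-count coding, and the use of `None` for missing genotypes are assumptions made for illustration.

```python
from typing import Optional, Sequence


def nonreference_concordance(
    called: Sequence[Optional[int]],
    truth: Sequence[Optional[int]],
) -> float:
    """Fraction of matching genotypes among sites that are not hom-ref in both sets.

    Genotypes are coded as alternate-allele counts (0, 1, 2); None marks a
    missing genotype. This coding is an illustrative assumption.
    """
    if len(called) != len(truth):
        raise ValueError("genotype sequences must have the same length")

    compared = 0
    matched = 0
    for c, t in zip(called, truth):
        if c is None or t is None:
            continue  # skip sites missing in either call set
        if c == 0 and t == 0:
            continue  # skip sites that are homozygous reference in both
        compared += 1
        if c == t:
            matched += 1

    if compared == 0:
        raise ValueError("no nonreference sites to compare")
    return matched / compared


# Toy example: four comparable nonreference sites, three of which agree.
called = [0, 1, 2, 0, 1, None, 0]
truth = [0, 1, 2, 1, 1, 2, 0]
print(nonreference_concordance(called, truth))
```

Excluding sites that are homozygous reference in both sets keeps the metric from being dominated by the many sites where a rare variant is absent in nearly every sample.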

