Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results

Chen, Zhongsheng; Boehnke, Michael; Fuchsberger, Christian

dc.contributor.author	Chen, Zhongsheng
dc.contributor.author	Boehnke, Michael
dc.contributor.author	Fuchsberger, Christian
dc.date.accessioned	2020-02-05T15:06:26Z
dc.date.available	WITHHELD_12_MONTHS
dc.date.available	2020-02-05T15:06:26Z
dc.date.issued	2020-01
dc.identifier.citation	Chen, Zhongsheng; Boehnke, Michael; Fuchsberger, Christian (2020). "Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results." Genetic Epidemiology 44(1): 41-51.
dc.identifier.issn	0741-0395
dc.identifier.issn	1098-2272
dc.identifier.uri	https://hdl.handle.net/2027.42/153654
dc.description.abstract	Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep‐coverage (~82×) exome and low‐coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta‐analysis has similar power to joint analysis in deep‐coverage sequence data but can be less powerful in low‐coverage sequence data. Given similar data processing and quality control steps, we recommend single‐study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep‐coverage data.
dc.publisher	Wiley Periodicals, Inc.
dc.subject.other	meta‐analysis
dc.subject.other	joint analysis
dc.subject.other	rare variants
dc.subject.other	Sequencing studies
dc.title	Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results
dc.type	Article
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Biological Chemistry
dc.subject.hlbsecondlevel	Genetics
dc.subject.hlbsecondlevel	Molecular, Cellular and Developmental Biology
dc.subject.hlbtoplevel	Health Sciences
dc.subject.hlbtoplevel	Science
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153654/1/gepi22261_am.pdf
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153654/2/gepi22261.pdf
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153654/3/gepi22261-sup-0002-final_revised_supp_figures_7_19_2019.pdf
dc.identifier.doi	10.1002/gepi.22261
dc.identifier.source	Genetic Epidemiology
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name:: gepi22261_am.pdf
Size:: 580.4KB
Format:: PDF

Name:: gepi22261.pdf
Size:: 1.164MB
Format:: PDF

Name:: gepi22261-sup-0002-final_revis ...
Size:: 5.106MB
Format:: PDF