Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results
dc.contributor.author | Chen, Zhongsheng | |
dc.contributor.author | Boehnke, Michael | |
dc.contributor.author | Fuchsberger, Christian | |
dc.date.accessioned | 2020-02-05T15:06:26Z | |
dc.date.available | WITHHELD_12_MONTHS | |
dc.date.available | 2020-02-05T15:06:26Z | |
dc.date.issued | 2020-01 | |
dc.identifier.citation | Chen, Zhongsheng; Boehnke, Michael; Fuchsberger, Christian (2020). "Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results." Genetic Epidemiology 44(1): 41-51. | |
dc.identifier.issn | 0741-0395 | |
dc.identifier.issn | 1098-2272 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/153654 | |
dc.description.abstract | Individual sequencing studies often have limited sample sizes and so limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, jointly calling all samples together is the gold standard strategy but can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep‐coverage (~82×) exome and low‐coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta‐analysis has similar power to joint analysis in deep‐coverage sequence data but can be less powerful in low‐coverage sequence data. Given similar data processing and quality control steps, we recommend single‐study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequency in deep‐coverage data. | |
dc.publisher | Wiley Periodicals, Inc. | |
dc.subject.other | meta‐analysis | |
dc.subject.other | joint analysis | |
dc.subject.other | rare variants | |
dc.subject.other | Sequencing studies | |
dc.title | Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results | |
dc.type | Article | |
dc.rights.robots | IndexNoFollow | |
dc.subject.hlbsecondlevel | Biological Chemistry | |
dc.subject.hlbsecondlevel | Genetics | |
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | |
dc.subject.hlbtoplevel | Health Sciences | |
dc.subject.hlbtoplevel | Science | |
dc.description.peerreviewed | Peer Reviewed | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153654/1/gepi22261_am.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153654/2/gepi22261.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153654/3/gepi22261-sup-0002-final_revised_supp_figures_7_19_2019.pdf | |
dc.identifier.doi | 10.1002/gepi.22261 | |
dc.identifier.source | Genetic Epidemiology | |
dc.identifier.citedreference | Ma, C., Blackwell, T., Boehnke, M., & Scott, L. J., GoT2D Investigators. ( 2013 ). Recommended joint and meta‐analysis strategies for case‐control association testing of single low‐count variants. Genetic Epidemiology, 37 ( 6 ), 539 – 550. https://doi.org/10.1002/gepi.21742 | |
dc.identifier.citedreference | McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., … DePristo, M. A. ( 2010 ). The Genome Analysis Toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Research, 20, 1297 – 1303. https://doi.org/10.1101/gr.107524.110 | |
dc.identifier.citedreference | Okada, Y., Momozawa, Y., Sakaue, S., Kanai, M., Ishigaki, K., Akiyama, M., … Kamatani, Y. ( 2018 ). Deep whole‐genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nature Communications, 9 ( 1 ), 1631. https://doi.org/10.1038/s41467‐018‐03274‐0 | |
dc.identifier.citedreference | Paltoo, D. N., Rodriguez, L. L., Feolo, M., Gillanders, E., Ramos, E. M., Rutter, J. L., & Caulder, M. ( 2014 ). Data use under the NIH GWAS data sharing policy and future directions. Nature Genetics, 46 ( 9 ), 934 – 938. https://doi.org/10.1038/ng.3062 | |
dc.identifier.citedreference | Lee, S., Abecasis, G. R., Boehnke, M., & Lin, X. ( 2014 ). Rare‐variant association analysis: Study designs and statistical tests. The American Journal of Human Genetics, 95 ( 1 ), 5 – 23. https://doi.org/10.1016/j.ajhg.2014.06.009 | |
dc.identifier.citedreference | Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T., … MacArthur, D. G. ( 2016 ). Analysis of protein‐coding genetic variation in 60,706 humans. Nature, 536 ( 7616 ), 285 – 291. https://doi.org/10.1038/nature19057 | |
dc.identifier.citedreference | Li, H., & Durbin, R. ( 2009 ). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25 ( 14 ), 1754 – 1760. https://doi.org/10.1093/bioinformatics/btp324 | |
dc.identifier.citedreference | Li, Y., Sidore, C., Kang, H. M., Boehnke, M., & Abecasis, G. R. ( 2011 ). Low‐coverage sequencing: Implications for design of complex trait association studies. Genome Research, 21, 940 – 951. https://doi.org/10.1101/gr.117259.110 | |
dc.identifier.citedreference | Lin, D. Y., & Zeng, D. ( 2010 ). Meta‐analysis of genome‐wide association studies: No efficiency gain in using individual participant data. Genetic Epidemiology, 34 ( 1 ), 60 – 66. https://doi.org/10.1002/gepi.20435 | |
dc.identifier.citedreference | Luo, Y., de Lange, K. M., Jostins, L., Moutsianas, L., Randall, J., Kennedy, N. A., … Serra, E. G. ( 2017 ). Exploring the genetic architecture of inflammatory bowel disease by whole‐genome sequencing identifies association at ADCY7. Nature Genetics, 49 ( 2 ), 186. https://doi.org/10.1038/ng.3761 | |
dc.identifier.citedreference | Mägi, R., Horikoshi, M., Sofer, T., Mahajan, A., Kitajima, H., Franceschini, N., … Morris, A. P. ( 2017 ). Trans‐ethnic meta‐regression of genome‐wide association studies accounting for ancestry increases power for discovery and improves fine‐mapping resolution. Human Molecular Genetics, 26 ( 18 ), 3639 – 3650. https://doi.org/10.1093/hmg/ddx280 | |
dc.identifier.citedreference | Zuk, O., Schaffner, S. F., Samocha, K., Do, R., Hechter, E., Kathiresan, S., … Lander, E. S. ( 2014 ). Searching for missing heritability: Designing rare variant association studies. Proceedings of the National Academy of Sciences, 111 ( 4 ), E455 – E464. https://doi.org/10.1073/pnas.1322563111 | |
dc.identifier.citedreference | Xu, C., Wu, K., Zhang, J. G., Shen, H., & Deng, H. W. ( 2017 ). Low‐, high‐coverage, and two‐stage DNA sequencing in the design of the genetic association study. Genetic Epidemiology, 41 ( 3 ), 187 – 197. https://doi.org/10.1002/gepi.22015 | |
dc.identifier.citedreference | Willer, C. J., Li, Y., & Abecasis, G. R. ( 2010 ). METAL: Fast and efficient meta‐analysis of genomewide association scans. Bioinformatics, 26 ( 17 ), 2190 – 2191. https://doi.org/10.1093/bioinformatics/btq340 | |
dc.identifier.citedreference | Van der Auwera, G. A., Carneiro, M. O., Hartl, C., Poplin, R., Del Angel, G., Levy‐Moonshine, A., … DePristo, M. A. ( 2013 ). From FastQ data to high‐confidence variant calls: The genome analysis toolkit best practices pipeline. Current Protocols in Bioinformatics, 43 ( 1 ), 11.10.1 – 11.10.33. https://doi.org/10.1002/0471250953.bi1110s43 | |
dc.identifier.citedreference | Tang, Z. Z., & Lin, D. Y. ( 2015 ). Meta‐analysis for discovering rare‐variant associations: Statistical methods and software programs. The American Journal of Human Genetics, 97 ( 1 ), 35 – 53. https://doi.org/10.1016/j.ajhg.2015.05.001 | |
dc.identifier.citedreference | 1000 Genomes Project Consortium ( 2015 ). A global reference for human genetic variation. Nature, 526 ( 7571 ), 68 – 74. https://doi.org/10.1038/nature15393 | |
dc.identifier.citedreference | Auer, P. L., Reiner, A. P., Wang, G., Kang, H. M., Abecasis, G. R., Altshuler, D., … Leal, S. M. ( 2016 ). Guidelines for large‐scale sequence‐based complex trait association studies: Lessons learned from the NHLBI exome sequencing project. The American Journal of Human Genetics, 99 ( 4 ), 791 – 801. https://doi.org/10.1016/j.ajhg.2016.08.012 | |
dc.identifier.citedreference | Browning, B. L., & Yu, Z. ( 2009 ). Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false‐positive associations for genome‐wide association studies. The American Journal of Human Genetics, 85 ( 6 ), 847 – 861. https://doi.org/10.1016/j.ajhg.2009.11.004 | |
dc.identifier.citedreference | DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., & McKenna, A. ( 2011 ). A framework for variation discovery and genotyping using next‐generation DNA sequencing data. Nature Genetics, 43 ( 5 ), 491. https://doi.org/10.1038/ng.806 | |
dc.identifier.citedreference | Flannick, J., Fuchsberger, C., Mahajan, A., Teslovich, T. M., Agarwala, V., Gaulton, K. J., … Koistinen, H. A. ( 2017 ). Sequence data and association statistics from 12,940 type 2 diabetes cases and controls. Scientific Data, 4, 170179. https://doi.org/10.1038/sdata.2017.179 | |
dc.identifier.citedreference | Fuchsberger, C., Flannick, J., Teslovich, T. M., Mahajan, A., Agarwala, V., Gaulton, K. J., … Koistinen, H. A. ( 2016 ). The genetic architecture of type 2 diabetes. Nature, 536 ( 7614 ), 41 – 47. https://doi.org/10.1038/nature18642 | |
dc.identifier.citedreference | Hindorff, L. A., MacArthur, J., Wise, A., Junkins, H. A., Hall, P. N., Klemm, A. K., & Manolio, T. A. ( 2012 ). A catalog of published genome‐wide association studies. NHGRI. Available at: www.ebi.ac.uk/gwas/diagram | |
dc.identifier.citedreference | Jiang, W., Chen, S. Y., Wang, H., Li, D. Z., & Wiens, J. J. ( 2014 ). Should genes with missing data be excluded from phylogenetic analyses? Molecular Phylogenetics and Evolution, 80, 308 – 318. https://doi.org/10.1016/j.ympev.2014.08.006 | |
dc.identifier.citedreference | Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. ( 2015 ). An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data. Genome Research, 25, 918 – 925. https://doi.org/10.1101/gr.176552.114 | |
dc.identifier.citedreference | Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., … Lin, X. ( 2012 ). Optimal unified approach for rare‐variant association testing with application to small‐sample case‐control whole‐exome sequencing studies. The American Journal of Human Genetics, 91 ( 2 ), 224 – 237. https://doi.org/10.1016/j.ajhg.2012.06.007 | |
dc.identifier.citedreference | Lee, S., Teslovich, T. M., Boehnke, M., & Lin, X. ( 2013 ). General framework for meta‐analysis of rare variants in sequencing association studies. The American Journal of Human Genetics, 93 ( 1 ), 42 – 53. https://doi.org/10.1016/j.ajhg.2013.05.010 | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |