A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data

Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J.; Lu, Qing

dc.contributor.author	Wei, Changshuai	en_US
dc.contributor.author	Li, Ming	en_US
dc.contributor.author	He, Zihuai	en_US
dc.contributor.author	Vsevolozhskaya, Olga	en_US
dc.contributor.author	Schaid, Daniel J.	en_US
dc.contributor.author	Lu, Qing	en_US
dc.date.accessioned	2014-12-09T16:53:58Z
dc.date.available	WITHHELD_13_MONTHS	en_US
dc.date.available	2014-12-09T16:53:58Z
dc.date.issued	2014-12	en_US
dc.identifier.citation	Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J.; Lu, Qing (2014). "A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data." Genetic Epidemiology 38(8): 699–708.	en_US
dc.identifier.issn	0741-0395	en_US
dc.identifier.issn	1098-2272	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/109631
dc.description.abstract	With advancements in next‐generation sequencing technology, massive amounts of sequencing data are being generated, which offer a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, high‐dimensional sequencing data pose a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU‐SEQ, for the high‐dimensional association analysis of sequencing data. Based on a nonparametric U‐statistic, WU‐SEQ makes no assumptions about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU‐SEQ outperformed the commonly used sequence kernel association test (SKAT) when the underlying assumptions were violated (e.g., when the phenotype followed a heavy‐tailed distribution). Even when the assumptions were satisfied, WU‐SEQ still attained performance comparable to SKAT. Finally, we applied WU‐SEQ to sequencing data from the Dallas Heart Study (DHS) and detected an association between ANGPTL4 and very‐low‐density lipoprotein cholesterol.	en_US
dc.publisher	Wiley‐Interscience	en_US
dc.subject.other	Weighted U‐Statistic	en_US
dc.subject.other	Next‐Generation Sequencing	en_US
dc.subject.other	Rare Variants	en_US
dc.title	A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data	en_US
dc.type	Article	en_US
dc.rights.robots	IndexNoFollow	en_US
dc.subject.hlbsecondlevel	Biological Chemistry	en_US
dc.subject.hlbsecondlevel	Genetics	en_US
dc.subject.hlbsecondlevel	Molecular, Cellular and Developmental Biology	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.subject.hlbtoplevel	Health Sciences	en_US
dc.description.peerreviewed	Peer Reviewed	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/109631/1/gepi21864.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/109631/2/gepi21864-sup-0002-SuppMat.pdf
dc.identifier.doi	10.1002/gepi.21864	en_US
dc.identifier.source	Genetic Epidemiology	en_US
dc.identifier.citedreference	Shieh GS, Johnson RA, Frees EW. 1994. Testing independence of bivariate circular data and weighted degenerate U‐statistics. Stat Sin 4(2): 729–747.	en_US
dc.identifier.citedreference	Serfling R. 1981. Approximation Theorems of Mathematical Statistics (Wiley Series in Probability and Statistics). New York: Wiley‐Interscience.	en_US
dc.identifier.citedreference	Kryukov GV, Pennacchio LA, Sunyaev SR. 2007. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet 80(4): 727–739.	en_US
dc.identifier.citedreference	Ladouceur M, Dastani Z, Aulchenko YS, Greenwood CMT, Richards JB. 2012. The empirical power of rare variant association methods: results from Sanger sequencing in 1998 individuals. PLoS Genet 8(2): e1002496.	en_US
dc.identifier.citedreference	Lee S, Wu MC, Lin XH. 2012. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4): 762–775.	en_US
dc.identifier.citedreference	Li HZ. 2012. U‐statistics in genetic association studies. Hum Genet 131(9): 1395–1401.	en_US
dc.identifier.citedreference	Li BS, Leal SM. 2008. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83(3): 311–321.	en_US
dc.identifier.citedreference	Li M, Ye CY, Fu WJ, Elston RC, Lu Q. 2011. Detecting genetic interactions for quantitative traits with U‐statistics. Genet Epidemiol 35(6): 457–468.	en_US
dc.identifier.citedreference	Lin DY, Tang ZZ. 2011. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet 89(3): 354–367.	en_US
dc.identifier.citedreference	Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5(2): e1000384.	en_US
dc.identifier.citedreference	Morgenthaler S, Thilly WG. 2007. A strategy to discover genes that carry multi‐allelic or mono‐allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res Fundam Mol Mech Mutagen 615(1–2): 28–56.	en_US
dc.identifier.citedreference	Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho‐Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ. 2011. Testing for an unusual distribution of rare variants. PLoS Genet 7(3): e1001322.	en_US
dc.identifier.citedreference	Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome‐wide association studies. Nat Genet 38(8): 904–909.	en_US
dc.identifier.citedreference	Pritchard JK. 2001. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 69(1): 124–137.	en_US
dc.identifier.citedreference	Raychaudhuri S. 2011. Mapping rare and common causal alleles for complex human diseases. Cell 147(1): 57–69.	en_US
dc.identifier.citedreference	Romeo S, Yin W, Kozlitina J, Pennacchio LA, Boerwinkle E, Hobbs HH, Cohen JC. 2009. Rare loss‐of‐function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans. J Clin Invest 119(1): 70–79.	en_US
dc.identifier.citedreference	Schaid DJ, McDonnell SK, Hebbring SJ, Cunningham JM, Thibodeau SN. 2005. Nonparametric tests of association of multiple genes with human disease. Am J Hum Genet 76(5): 780–793.	en_US
dc.identifier.citedreference	Shieh GS. 1997. Weighted degenerate U‐ and V‐statistics with estimated parameters. Stat Sin 7(4): 1021–1038.	en_US
dc.identifier.citedreference	Tzeng J‐Y, Zhang D, Chang S‐M, Thomas DC, Davidian M. 2009. Gene‐trait similarity regression for multimarker‐based association analysis. Biometrics 65(3): 822–832.	en_US
dc.identifier.citedreference	Wei C, Anthony JC, Lu Q. 2012. Genome‐environmental risk assessment of cocaine dependence. Front Genet 3: 83.	en_US
dc.identifier.citedreference	Wei Z, Li M, Rebbeck T, Li H. 2008. U‐statistics‐based tests for multiple genes in genetic association studies. Ann Hum Genet 72(6): 821–833.	en_US
dc.identifier.citedreference	Wei C, Lu Q. 2011. Collapsing ROC approach for risk prediction research on both common and rare variants. BMC Proc 5(Suppl 9): S42.	en_US
dc.identifier.citedreference	Wei C, Schaid DJ, Lu Q. 2013. Trees assembling Mann‐Whitney approach for detecting genome‐wide joint association among low‐marginal‐effect loci. Genet Epidemiol 37(1): 84–91.	en_US
dc.identifier.citedreference	Wet TD, Venter JH. 1973. Asymptotic distributions for quadratic forms with applications to tests of fit. Ann Stat 1(2): 380–387.	en_US
dc.identifier.citedreference	Wu MC, Lee S, Cai TX, Li Y, Boehnke M, Lin XH. 2011. Rare‐variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89(1): 82–93.	en_US
dc.identifier.citedreference	Xu C, Tachmazidou I, Walter K, Ciampi A, Zeggini E, Greenwood CMT, the UKKC. 2014. Estimating genome‐wide significance for whole‐genome sequencing studies. Genet Epidemiol 38(4): 281–290.	en_US
dc.identifier.citedreference	Zhu X, Feng T, Li Y, Lu Q, Elston RC. 2010. Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol 34(2): 171–187.	en_US
dc.identifier.citedreference	Abecasis G, Altshuler D, Auton A, Brooks L, Durbin R, Gibbs R, Hurles M, McVean G. 2010. A map of human genome variation from population‐scale sequencing. Nature 467(7319): 1061–1073.	en_US
dc.identifier.citedreference	Ahituv N, Kavaslar N, Schackwitz W, Ustaszewska A, Martin J, Hebert S, Doelle H, Ersoy B, Kryukov G, Schmidt S and others. 2007. Medical sequencing at the extremes of human body mass. Am J Hum Genet 80(4): 779–791.	en_US
dc.identifier.citedreference	Barnett IJ, Lee S, Lin XH. 2013. Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 37(2): 142–151.	en_US
dc.identifier.citedreference	Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR and others. 2008. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4(5): e1000083.	en_US
dc.identifier.citedreference	Chen LS, Hsu L, Gamazon ER, Cox NJ, Nicolae DL. 2012. An exponential combination procedure for set‐based association tests in sequencing studies. Am J Hum Genet 91(6): 977–986.	en_US
dc.identifier.citedreference	Cohen JC, Kiss RS, Pertsemlidis A, Marcel YL, McPherson R, Hobbs HH. 2004. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science 305(5685): 869–872.	en_US
dc.identifier.citedreference	Davies RB. 1980. Algorithm AS 155: the distribution of a linear combination of χ² random variables. J R Stat Soc Ser C Appl Stat 29(3): 323–333.	en_US
dc.identifier.citedreference	Easton DF, Deffenbaugh AM, Pruss D, Frye C, Wenstrup RJ, Allen‐Brady K, Tavtigian SV, Monteiro ANA, Iversen ES, Couch FJ and others. 2007. A systematic genetic assessment of 1433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer‐predisposition genes. Am J Hum Genet 81(5): 873–883.	en_US
dc.identifier.citedreference	Fay JC, Wyckoff GJ, Wu CI. 2001. Positive and negative selection on the human genome. Genetics 158(3): 1227–1234.	en_US
dc.identifier.citedreference	Gregory GG. 1977. Large sample theory for U‐statistics and tests of fit. Ann Stat 5(1): 110–123.	en_US
dc.identifier.citedreference	Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. 2009. Potential etiologic and functional implications of genome‐wide association loci for human diseases and traits. Proc Natl Acad Sci 106(23): 9362–9367.	en_US
dc.identifier.citedreference	Ji WZ, Foo JN, O'Roak BJ, Zhao H, Larson MG, Simon DB, Newton‐Cheh C, State MW, Levy D, Lifton RP. 2008. Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat Genet 40(5): 592–599.	en_US
dc.owningcollname	Interdisciplinary and Peer-Reviewed
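The abstract describes WU‐SEQ only at a high level: a nonparametric weighted U‐statistic that makes no assumption about the phenotype distribution. As a rough illustration of how a statistic of this general kind can be assembled, the Python sketch below pairs a genetic‐similarity weight W_ij with a rank‐based phenotype score u_i and averages W_ij · u_i · u_j over all pairs i ≠ j. The specific choices are assumptions, not the published WU‐SEQ definitions: the centered‐rank score u_i = rank_i/(n+1) − 1/2, the identity‐by‐state (IBS) similarity, and the permutation p‐value. The cited references on weighted degenerate U‐statistics and Davies' algorithm suggest an asymptotic mixture‐of‐χ² approach instead, which this sketch does not attempt. All function names are hypothetical.

```python
import numpy as np


def average_ranks(y):
    """Ranks 1..n; tied values share the average of their ranks."""
    y = np.asarray(y, dtype=float)
    ranks = np.empty(len(y))
    ranks[np.argsort(y, kind="mergesort")] = np.arange(1, len(y) + 1)
    for value in np.unique(y):
        tied = y == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def centered_ranks(y):
    """Assumed phenotype score: u_i = rank_i / (n + 1) - 1/2 (centered empirical CDF)."""
    return average_ranks(y) / (len(y) + 1) - 0.5


def ibs_similarity(G, weights=None):
    """Weighted identity-by-state similarity in [0, 1]; G is n x p, coded 0/1/2."""
    G = np.asarray(G, dtype=float)
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    # diff[i, j, k] = |g_ik - g_jk|, so (2 - diff) counts alleles shared at variant k
    diff = np.abs(G[:, None, :] - G[None, :, :])
    return ((2.0 - diff) @ w) / (2.0 * w.sum())


def weighted_u(W, u):
    """U = (sum over i != j of W_ij * u_i * u_j) / (n (n - 1))."""
    n = len(u)
    quad = u @ W @ u                     # includes the i == j terms
    diag = np.sum(np.diag(W) * u ** 2)   # subtract them out
    return (quad - diag) / (n * (n - 1))


def permutation_test(G, y, n_perm=999, weights=None, seed=0):
    """Observed U and a one-sided permutation p-value (stand-in for asymptotics)."""
    rng = np.random.default_rng(seed)
    W = ibs_similarity(G, weights)
    u = centered_ranks(y)
    observed = weighted_u(W, u)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling phenotype scores breaks any genotype-phenotype link (the null)
        exceed += int(weighted_u(W, rng.permutation(u)) >= observed)
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.binomial(2, 0.05, size=(200, 10))                  # simulated rare variants
    y = 3.0 * G.sum(axis=1) + rng.standard_t(df=2, size=200)   # heavy-tailed trait
    stat, pval = permutation_test(G, y)
    print(f"U = {stat:.5f}, permutation p = {pval:.4f}")
```

Because the phenotype enters only through its ranks, the statistic is unchanged by any strictly increasing transformation of y, which is one plausible reason a rank‐based kernel tolerates heavy‐tailed phenotypes. Passing minor‐allele‐frequency‐based `weights` would up‐weight rarer variants in the spirit of SKAT's default weighting; whether WU‐SEQ weights variants this way is not stated in this record.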
