A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data
dc.contributor.author | Wei, Changshuai | en_US |
dc.contributor.author | Li, Ming | en_US |
dc.contributor.author | He, Zihuai | en_US |
dc.contributor.author | Vsevolozhskaya, Olga | en_US |
dc.contributor.author | Schaid, Daniel J. | en_US |
dc.contributor.author | Lu, Qing | en_US |
dc.date.accessioned | 2014-12-09T16:53:58Z | |
dc.date.available | WITHHELD_13_MONTHS | en_US |
dc.date.available | 2014-12-09T16:53:58Z | |
dc.date.issued | 2014-12 | en_US |
dc.identifier.citation | Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J.; Lu, Qing (2014). "A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data." Genetic Epidemiology 38(8): 699-708. | en_US |
dc.identifier.issn | 0741-0395 | en_US |
dc.identifier.issn | 1098-2272 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/109631 | |
dc.description.abstract | With advancements in next‐generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high‐dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU‐SEQ, for the high‐dimensional association analysis of sequencing data. Based on a nonparametric U‐statistic, WU‐SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU‐SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy‐tailed distribution). Even when the assumptions were satisfied, WU‐SEQ still attained comparable performance to SKAT. Finally, we applied WU‐SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol. | en_US |
dc.publisher | Wiley‐Interscience | en_US |
dc.subject.other | Weighted U‐Statistic | en_US |
dc.subject.other | Next‐Generation Sequencing | en_US |
dc.subject.other | Rare Variants | en_US |
dc.title | A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data | en_US |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | en_US |
dc.subject.hlbsecondlevel | Biological Chemistry | en_US |
dc.subject.hlbsecondlevel | Genetics | en_US |
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.subject.hlbtoplevel | Health Sciences | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/109631/1/gepi21864.pdf | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/109631/2/gepi21864-sup-0002-SuppMat.pdf | |
dc.identifier.doi | 10.1002/gepi.21864 | en_US |
dc.identifier.source | Genetic Epidemiology | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |