
A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data

dc.contributor.author | Wei, Changshuai | en_US
dc.contributor.author | Li, Ming | en_US
dc.contributor.author | He, Zihuai | en_US
dc.contributor.author | Vsevolozhskaya, Olga | en_US
dc.contributor.author | Schaid, Daniel J. | en_US
dc.contributor.author | Lu, Qing | en_US
dc.date.accessioned | 2014-12-09T16:53:58Z
dc.date.available | WITHHELD_13_MONTHS | en_US
dc.date.available | 2014-12-09T16:53:58Z
dc.date.issued | 2014-12 | en_US
dc.identifier.citation | Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J.; Lu, Qing (2014). "A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data." Genetic Epidemiology 38(8): 699-708. | en_US
dc.identifier.issn | 0741-0395 | en_US
dc.identifier.issn | 1098-2272 | en_US
dc.identifier.uri | https://hdl.handle.net/2027.42/109631
dc.description.abstract | With advancements in next‐generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high‐dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU‐SEQ, for the high‐dimensional association analysis of sequencing data. Based on a nonparametric U‐statistic, WU‐SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU‐SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy‐tailed distribution). Even when the assumptions were satisfied, WU‐SEQ still attained performance comparable to SKAT. Finally, we applied WU‐SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol. | en_US
dc.publisher | Wiley‐Interscience | en_US
dc.subject.other | Weighted U‐Statistic | en_US
dc.subject.other | Next‐Generation Sequencing | en_US
dc.subject.other | Rare Variants | en_US
dc.title | A Weighted U‐Statistic for Genetic Association Analyses of Sequencing Data | en_US
dc.type | Article | en_US
dc.rights.robots | IndexNoFollow | en_US
dc.subject.hlbsecondlevel | Biological Chemistry | en_US
dc.subject.hlbsecondlevel | Genetics | en_US
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | en_US
dc.subject.hlbtoplevel | Science | en_US
dc.subject.hlbtoplevel | Health Sciences | en_US
dc.description.peerreviewed | Peer Reviewed | en_US
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/109631/1/gepi21864.pdf
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/109631/2/gepi21864-sup-0002-SuppMat.pdf
dc.identifier.doi | 10.1002/gepi.21864 | en_US
dc.identifier.source | Genetic Epidemiology | en_US
dc.identifier.citedreference | Shieh GS, Johnson RA, Frees EW. 1994. Testing independence of bivariate circular data and weighted degenerate U‐statistics. Stat Sin 4(2):729–747. | en_US
dc.identifier.citedreference | Serfling R. 1981. Approximation Theorems of Mathematical Statistics (Wiley Series in Probability and Statistics). New York: Wiley‐Interscience. | en_US
dc.identifier.citedreference | Kryukov GV, Pennacchio LA, Sunyaev SR. 2007. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet 80(4):727–739. | en_US
dc.identifier.citedreference | Ladouceur M, Dastani Z, Aulchenko YS, Greenwood CMT, Richards JB. 2012. The empirical power of rare variant association methods: results from Sanger sequencing in 1998 individuals. PLoS Genet 8(2):e1002496. | en_US
dc.identifier.citedreference | Lee S, Wu MC, Lin XH. 2012. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4):762–775. | en_US
dc.identifier.citedreference | Li HZ. 2012. U‐statistics in genetic association studies. Hum Genet 131(9):1395–1401. | en_US
dc.identifier.citedreference | Li BS, Leal SM. 2008. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83(3):311–321. | en_US
dc.identifier.citedreference | Li M, Ye CY, Fu WJ, Elston RC, Lu Q. 2011. Detecting genetic interactions for quantitative traits with U‐statistics. Genet Epidemiol 35(6):457–468. | en_US
dc.identifier.citedreference | Lin DY, Tang ZZ. 2011. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet 89(3):354–367. | en_US
dc.identifier.citedreference | Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5(2):e1000384. | en_US
dc.identifier.citedreference | Morgenthaler S, Thilly WG. 2007. A strategy to discover genes that carry multi‐allelic or mono‐allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res Fundam Mol Mech Mutagen 615(1–2):28–56. | en_US
dc.identifier.citedreference | Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho‐Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ. 2011. Testing for an unusual distribution of rare variants. PLoS Genet 7(3):e1001322. | en_US
dc.identifier.citedreference | Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome‐wide association studies. Nat Genet 38(8):904–909. | en_US
dc.identifier.citedreference | Pritchard JK. 2001. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 69(1):124–137. | en_US
dc.identifier.citedreference | Raychaudhuri S. 2011. Mapping rare and common causal alleles for complex human diseases. Cell 147(1):57–69. | en_US
dc.identifier.citedreference | Romeo S, Yin W, Kozlitina J, Pennacchio LA, Boerwinkle E, Hobbs HH, Cohen JC. 2009. Rare loss‐of‐function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans. J Clin Invest 119(1):70–79. | en_US
dc.identifier.citedreference | Schaid DJ, McDonnell SK, Hebbring SJ, Cunningham JM, Thibodeau SN. 2005. Nonparametric tests of association of multiple genes with human disease. Am J Hum Genet 76(5):780–793. | en_US
dc.identifier.citedreference | Shieh GS. 1997. Weighted degenerate U‐ and V‐statistics with estimated parameters. Stat Sin 7(4):1021–1038. | en_US
dc.identifier.citedreference | Tzeng J‐Y, Zhang D, Chang S‐M, Thomas DC, Davidian M. 2009. Gene‐trait similarity regression for multimarker‐based association analysis. Biometrics 65(3):822–832. | en_US
dc.identifier.citedreference | Wei C, Anthony JC, Lu Q. 2012. Genome‐environmental risk assessment of cocaine dependence. Front Genet 3:83. | en_US
dc.identifier.citedreference | Wei Z, Li M, Rebbeck T, Li H. 2008. U‐statistics‐based tests for multiple genes in genetic association studies. Ann Hum Genet 72(6):821–833. | en_US
dc.identifier.citedreference | Wei C, Lu Q. 2011. Collapsing ROC approach for risk prediction research on both common and rare variants. BMC Proc 5(Suppl 9):S42. | en_US
dc.identifier.citedreference | Wei C, Schaid DJ, Lu Q. 2013. Trees assembling Mann‐Whitney approach for detecting genome‐wide joint association among low‐marginal‐effect loci. Genet Epidemiol 37(1):84–91. | en_US
dc.identifier.citedreference | de Wet T, Venter JH. 1973. Asymptotic distributions for quadratic forms with applications to tests of fit. Ann Stat 1(2):380–387. | en_US
dc.identifier.citedreference | Wu MC, Lee S, Cai TX, Li Y, Boehnke M, Lin XH. 2011. Rare‐variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89(1):82–93. | en_US
dc.identifier.citedreference | Xu C, Tachmazidou I, Walter K, Ciampi A, Zeggini E, Greenwood CMT, the UKKC. 2014. Estimating genome‐wide significance for whole‐genome sequencing studies. Genet Epidemiol 38(4):281–290. | en_US
dc.identifier.citedreference | Zhu X, Feng T, Li Y, Lu Q, Elston RC. 2010. Detecting rare variants for complex traits using family and unrelated data. Genet Epidemiol 34(2):171–187. | en_US
dc.identifier.citedreference | Abecasis G, Altshuler D, Auton A, Brooks L, Durbin R, Gibbs R, Hurles M, McVean G. 2010. A map of human genome variation from population‐scale sequencing. Nature 467(7319):1061–1073. | en_US
dc.identifier.citedreference | Ahituv N, Kavaslar N, Schackwitz W, Ustaszewska A, Martin J, Hebert S, Doelle H, Ersoy B, Kryukov G, Schmidt S and others. 2007. Medical sequencing at the extremes of human body mass. Am J Hum Genet 80(4):779–791. | en_US
dc.identifier.citedreference | Barnett IJ, Lee S, Lin XH. 2013. Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 37(2):142–151. | en_US
dc.identifier.citedreference | Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR and others. 2008. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4(5):e1000083. | en_US
dc.identifier.citedreference | Chen LS, Hsu L, Gamazon ER, Cox NJ, Nicolae DL. 2012. An exponential combination procedure for set‐based association tests in sequencing studies. Am J Hum Genet 91(6):977–986. | en_US
dc.identifier.citedreference | Cohen JC, Kiss RS, Pertsemlidis A, Marcel YL, McPherson R, Hobbs HH. 2004. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science 305(5685):869–872. | en_US
dc.identifier.citedreference | Davies RB. 1980. Algorithm AS 155: the distribution of a linear combination of χ² random variables. J R Stat Soc Ser C Appl Stat 29(3):323–333. | en_US
dc.identifier.citedreference | Easton DF, Deffenbaugh AM, Pruss D, Frye C, Wenstrup RJ, Allen‐Brady K, Tavtigian SV, Monteiro ANA, Iversen ES, Couch FJ and others. 2007. A systematic genetic assessment of 1433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer‐predisposition genes. Am J Hum Genet 81(5):873–883. | en_US
dc.identifier.citedreference | Fay JC, Wyckoff GJ, Wu CI. 2001. Positive and negative selection on the human genome. Genetics 158(3):1227–1234. | en_US
dc.identifier.citedreference | Gregory GG. 1977. Large sample theory for U‐statistics and tests of fit. Ann Stat 5(1):110–123. | en_US
dc.identifier.citedreference | Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. 2009. Potential etiologic and functional implications of genome‐wide association loci for human diseases and traits. Proc Natl Acad Sci 106(23):9362–9367. | en_US
dc.identifier.citedreference | Ji WZ, Foo JN, O'Roak BJ, Zhao H, Larson MG, Simon DB, Newton‐Cheh C, State MW, Levy D, Lifton RP. 2008. Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat Genet 40(5):592–599. | en_US
dc.owningcollname | Interdisciplinary and Peer-Reviewed
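
The abstract describes WU‐SEQ only as a test "based on a nonparametric U‐statistic" with weights. As a rough illustration of that general idea, and not the authors' implementation, the Python sketch below computes a statistic of the form U = Σ_{i≠j} w_ij h(y_i, y_j) / (n(n−1)). Everything concrete here is an assumption: the centered‑rank phenotype kernel h, the identity‑by‑state (IBS) genetic weights w_ij, and the permutation p‑value used in place of an asymptotic null distribution. The function names rank_kernel, ibs_weights, weighted_u and permutation_test are hypothetical.

```python
import numpy as np


def rank_kernel(y):
    """Phenotype similarity H[i, j] = u_i * u_j, where u_i is the rank of y_i
    scaled to (0, 1) and centered at 0. Because only ranks enter, H is unchanged
    by monotone transformations of y. Ties are broken by position (assumption)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ranks = np.argsort(np.argsort(y, kind="stable"), kind="stable") + 1  # 1..n
    u = ranks / (n + 1) - 0.5
    return np.outer(u, u)


def ibs_weights(G):
    """Hypothetical genetic-similarity weights W[i, j]: identity-by-state averaged
    over variants (genotypes coded 0/1/2, IBS = (2 - |g_i - g_j|) / 2 per variant),
    then shifted so the off-diagonal weights have mean zero."""
    G = np.asarray(G, dtype=float)
    diff = np.abs(G[:, None, :] - G[None, :, :])  # n x n x p
    W = ((2.0 - diff) / 2.0).mean(axis=2)
    off = ~np.eye(W.shape[0], dtype=bool)
    W[off] -= W[off].mean()
    return W


def weighted_u(W, H):
    """Weighted U-statistic: sum of W[i, j] * H[i, j] over pairs i != j, / n(n-1)."""
    n = W.shape[0]
    off = ~np.eye(n, dtype=bool)
    return float((W[off] * H[off]).sum() / (n * (n - 1)))


def permutation_test(y, G, n_perm=999, seed=0):
    """One-sided permutation p-value: large U means genetically similar pairs
    are also phenotypically similar. Permuting y is a simple stand-in for an
    asymptotic null distribution, not the paper's procedure."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    W = ibs_weights(G)  # weights do not change when y is permuted
    u_obs = weighted_u(W, rank_kernel(y))
    hits = sum(
        weighted_u(W, rank_kernel(rng.permutation(y))) >= u_obs
        for _ in range(n_perm)
    )
    return u_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = np.repeat([[0], [1]], 10, axis=0)  # one variant: 10 non-carriers, 10 carriers
    y = G[:, 0] + 0.01 * rng.standard_normal(20)  # carriers hold the 10 largest values
    u, p = permutation_test(y, G)
    print(f"U = {u:.4f}, permutation p = {p:.4f}")
```

Because h depends on y only through ranks, U is unchanged when the phenotype is log‑transformed or contains an extreme outlier, which is one way a rank‑based U‑statistic can stay robust under the heavy‑tailed phenotypes mentioned in the abstract.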

