Application of machine learning algorithms to predict coronary artery calcification with a sibship-based design

Sun, Yan V.; Bielak, Lawrence F.; Peyser, Patricia A.; Turner, Stephen T.; Sheedy, Patrick F.; Boerwinkle, Eric; Kardia, Sharon L. R.

dc.contributor.author	Sun, Yan V.	en_US
dc.contributor.author	Bielak, Lawrence F.	en_US
dc.contributor.author	Peyser, Patricia A.	en_US
dc.contributor.author	Turner, Stephen T.	en_US
dc.contributor.author	Sheedy, Patrick F.	en_US
dc.contributor.author	Boerwinkle, Eric	en_US
dc.contributor.author	Kardia, Sharon L. R.	en_US
dc.date.accessioned	2008-05-12T13:38:55Z
dc.date.available	2009-05-04T19:09:21Z	en_US
dc.date.issued	2008-05	en_US
dc.identifier.citation	Sun, Yan V.; Bielak, Lawrence F.; Peyser, Patricia A.; Turner, Stephen T.; Sheedy, Patrick F.; Boerwinkle, Eric; Kardia, Sharon L.R. (2008). "Application of machine learning algorithms to predict coronary artery calcification with a sibship-based design." Genetic Epidemiology 32(4): 350-360. <http://hdl.handle.net/2027.42/58570>	en_US
dc.identifier.issn	0741-0395	en_US
dc.identifier.issn	1098-2272	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/58570
dc.identifier.uri	http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&list_uids=18271057&dopt=citation
dc.description.abstract	As part of the Genetic Epidemiology Network of Arteriopathy study, hypertensive non-Hispanic White sibships were screened using 471 single nucleotide polymorphisms (SNPs) to identify genes influencing coronary artery calcification (CAC) measured by computed tomography. Individuals with detectable CAC and CAC quantity ≥70th age- and sex-specific percentile were classified as having a high CAC burden and compared to individuals with CAC quantity <70th percentile. Two sibs from each sibship were randomly chosen and divided into two data sets, each with 360 unrelated individuals. Within each data set, we applied two machine learning algorithms, Random Forests and RuleFit, to identify the best predictors of having high CAC burden among 17 risk factors and 471 SNPs. Using five-fold cross-validation, both methods had ∼70% sensitivity and ∼60% specificity. Prediction accuracies were significantly different from random predictions (P-value < 0.001) based on 1,000 permutation tests. Predictive performance using 287 tagSNPs was as good as that using all 471 SNPs. Among the top 50 predictors, the same eight tagSNPs and 15 risk factors were found in both data sets for Random Forests, while eight tagSNPs and 12 risk factors were found in both data sets for RuleFit. Replicable effects of two tagSNPs (in genes GPR35 and NOS3) and 12 risk factors (age, body mass index, sex, serum glucose, high-density lipoprotein cholesterol, systolic blood pressure, cholesterol, homocysteine, triglycerides, fibrinogen, Lp(a) and low-density lipoprotein particle size) were identified by both methods. This study illustrates how machine learning methods can be used in sibships to identify important, replicable predictors of subclinical coronary atherosclerosis. Genet. Epidemiol. 2008. © 2008 Wiley-Liss, Inc.	en_US
dc.format.extent	285597 bytes
dc.format.extent	3118 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	text/plain
dc.publisher	Wiley Subscription Services, Inc., A Wiley Company	en_US
dc.subject.other	Life and Medical Sciences	en_US
dc.subject.other	Genetics	en_US
dc.title	Application of machine learning algorithms to predict coronary artery calcification with a sibship-based design	en_US
dc.type	Article	en_US
dc.rights.robots	IndexNoFollow	en_US
dc.subject.hlbsecondlevel	Biological Chemistry	en_US
dc.subject.hlbsecondlevel	Genetics	en_US
dc.subject.hlbsecondlevel	Molecular, Cellular and Developmental Biology	en_US
dc.subject.hlbtoplevel	Health Sciences	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.description.peerreviewed	Peer Reviewed	en_US
dc.contributor.affiliationum	Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan; Department of Epidemiology, School of Public Health, University of Michigan, 109 Observatory, Ann Arbor, MI 48109	en_US
dc.contributor.affiliationum	Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan	en_US
dc.contributor.affiliationum	Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan	en_US
dc.contributor.affiliationum	Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan	en_US
dc.contributor.affiliationother	Division of Nephrology and Hypertension, Mayo Clinic, Rochester, Minnesota	en_US
dc.contributor.affiliationother	Division of Diagnostic Radiology, Mayo Clinic, Rochester, Minnesota	en_US
dc.contributor.affiliationother	Human Genetics Center, University of Texas Health Sciences Center, Houston, Texas	en_US
dc.identifier.pmid	18271057
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/58570/1/20309_ftp.pdf
dc.identifier.doi	http://dx.doi.org/10.1002/gepi.20309	en_US
dc.identifier.source	Genetic Epidemiology	en_US
dc.owningcollname	Interdisciplinary and Peer-Reviewed
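The abstract describes the sibship-based design and its evaluation only in outline. The Python sketch below illustrates three of those steps under stated assumptions: the high-CAC-burden labeling rule, the random draw of two sibs per sibship into two data sets of unrelated individuals, and sensitivity/specificity together with a label-permutation test. It does not implement Random Forests or RuleFit. The function names, the skipping of single-sib sibships, and the add-one p-value correction are illustrative assumptions, not the authors' code.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def high_cac_burden(cac: float, p70_age_sex: float) -> int:
    """Return 1 for high CAC burden: detectable CAC (> 0) at or above the
    70th age- and sex-specific percentile; otherwise 0."""
    return int(cac > 0 and cac >= p70_age_sex)


def split_sibships(sibships: Dict[str, List[str]],
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly draw two sibs per sibship and place one in each data set,
    so that each data set contains unrelated individuals."""
    rng = random.Random(seed)
    set_a: List[str] = []
    set_b: List[str] = []
    for fam in sorted(sibships):          # sorted keys for reproducibility
        sibs = sibships[fam]
        if len(sibs) < 2:                 # assumption: skip single-sib families
            continue
        first, second = rng.sample(sibs, 2)
        set_a.append(first)
        set_b.append(second)
    return set_a, set_b


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def permutation_p_value(score_fn: Callable[[Sequence[int]], float],
                        y: Sequence[int], n_perm: int = 1000,
                        seed: int = 0) -> Tuple[float, float]:
    """Compare the observed score with scores obtained after shuffling the
    outcome labels. score_fn (hypothetical) should rerun the whole
    cross-validated prediction pipeline on the labels it is given."""
    rng = random.Random(seed)
    observed = score_fn(list(y))
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)               # one random relabeling per iteration
        if score_fn(labels) >= observed:
            exceed += 1
    # add-one correction (an assumption here) keeps the p-value above 0
    return observed, (exceed + 1) / (n_perm + 1)
```

In practice `score_fn` would refit the chosen learner inside five-fold cross-validation for each shuffled label vector; with scikit-learn, for example, one could wrap `RandomForestClassifier` in `cross_val_predict(..., cv=5)` and return the resulting accuracy. Rerunning the whole pipeline on every permutation, rather than only re-scoring fixed predictions, is the usual way to build a null distribution of "random predictions"; the abstract does not state exactly how the authors' permutations were set up.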

Files in this item

Name: 20309_ftp.pdf
Size: 278.9KB
Format: PDF