Modeling time‐to‐event (survival) data using classification tree analysis

dc.contributor.author: Linden, Ariel
dc.contributor.author: Yarnold, Paul R.
dc.date.accessioned: 2018-02-05T16:43:23Z
dc.date.available: 2019-01-07T18:34:38Z [en]
dc.date.issued: 2017-12
dc.identifier.citation: Linden, Ariel; Yarnold, Paul R. (2017). "Modeling time‐to‐event (survival) data using classification tree analysis." Journal of Evaluation in Clinical Practice 23(6): 1299-1308.
dc.identifier.issn: 1356-1294
dc.identifier.issn: 1365-2753
dc.identifier.uri: https://hdl.handle.net/2027.42/141923
dc.description.abstract: Rationale, aims, and objectives: Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow‐up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a “decision‐tree”–like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross‐generalizability. Method: Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross‐generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. Results: The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Conclusions: Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA‐survival framework.
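The abstract notes that survival analysis accounts for the follow‐up time of censored individuals and that models are compared through estimated survival curves. As a rough illustration only, the sketch below computes a Kaplan‐Meier survival curve in plain Python. It is not the authors' CTA procedure or their software; the function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up time for each individual
    events : 1 if the event was observed, 0 if the individual was censored
    (Hypothetical helper for illustration; not taken from the article.)
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    n_at_risk = len(times)
    # Number of observed events at each time point.
    event_counts = Counter(t for t, e in zip(times, events) if e)
    # Number of individuals leaving the risk set (event or censoring) at each time.
    exit_counts = Counter(times)

    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Censored individuals still count in the denominator up to
            # the time they leave the study.
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= exit_counts[t]
    return curve


# Toy data (invented): five individuals, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])
# prints [(2, 0.8), (3, 0.6), (5, 0.3)]
```

The individual censored at time 3 contributes to the risk set for the events at times 2 and 3 but not afterwards, which is how censored follow‐up enters the estimate.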
dc.publisher: APA Books
dc.publisher: Wiley Periodicals, Inc.
dc.subject.other: machine learning
dc.subject.other: survival
dc.subject.other: classification tree analysis
dc.subject.other: censoring
dc.title: Modeling time‐to‐event (survival) data using classification tree analysis
dc.type: Article [en_US]
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Medicine (General)
dc.subject.hlbtoplevel: Health Sciences
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/141923/1/jep12779.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/141923/2/jep12779_am.pdf
dc.identifier.doi: 10.1111/jep.12779
dc.identifier.source: Journal of Evaluation in Clinical Practice
dc.owningcollname: Interdisciplinary and Peer-Reviewed

