Modeling time‐to‐event (survival) data using classification tree analysis
dc.contributor.author | Linden, Ariel | |
dc.contributor.author | Yarnold, Paul R. | |
dc.date.accessioned | 2018-02-05T16:43:23Z | |
dc.date.available | 2019-01-07T18:34:38Z | en |
dc.date.issued | 2017-12 | |
dc.identifier.citation | Linden, Ariel; Yarnold, Paul R. (2017). "Modeling time‐to‐event (survival) data using classification tree analysis." Journal of Evaluation in Clinical Practice 23(6): 1299-1308. | |
dc.identifier.issn | 1356-1294 | |
dc.identifier.issn | 1365-2753 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/141923 | |
dc.description.abstract | Rationale, aims, and objectives: Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow‐up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a “decision‐tree”–like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross‐generalizability. Method: Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross‐generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. Results: The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Conclusions: Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA‐survival framework. | |
dc.publisher | APA Books | |
dc.publisher | Wiley Periodicals, Inc. | |
dc.subject.other | machine learning | |
dc.subject.other | survival | |
dc.subject.other | classification tree analysis | |
dc.subject.other | censoring | |
dc.title | Modeling time‐to‐event (survival) data using classification tree analysis | |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | |
dc.subject.hlbsecondlevel | Medicine (General) | |
dc.subject.hlbtoplevel | Health Sciences | |
dc.description.peerreviewed | Peer Reviewed | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/141923/1/jep12779.pdf | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/141923/2/jep12779_am.pdf | |
dc.identifier.doi | 10.1111/jep.12779 | |
dc.identifier.source | Journal of Evaluation in Clinical Practice | |
dc.identifier.citedreference | Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine. 1999; 18: 2529 ‐ 2545. | |
dc.identifier.citedreference | Yarnold PR, Linden A. Novometric analysis with ordered class variables: The optimal alternative to linear regression analysis. Optimal Data Analysis. 2016; 22: 65 ‐ 73. | |
dc.identifier.citedreference | Linden A, Adams J, Roberts N. Strengthening the case for disease management effectiveness: unhiding the hidden bias. Journal of Evaluation in Clinical Practice. 2006; 12: 140 ‐ 147. | |
dc.identifier.citedreference | Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. Journal of the American Medical Association. 1982; 247: 2543 ‐ 2546. | |
dc.identifier.citedreference | Gönen M, Heller G. Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005; 92: 965 ‐ 970. | |
dc.identifier.citedreference | Grønnesby JK, Borgan Ø. A method for checking regression models in survival analysis based on the risk score. Lifetime Data Analysis. 1996; 2: 315 ‐ 328. | |
dc.identifier.citedreference | May S, Hosmer DW. A simplified method of calculating an overall goodness‐of‐fit test for the Cox proportional hazards model. Lifetime Data Analysis. 1998; 4: 109 ‐ 120. | |
dc.identifier.citedreference | Royston P, Sauerbrei W. A new measure of prognostic separation in survival data. Statistics in Medicine. 2004; 23: 723 ‐ 748. | |
dc.identifier.citedreference | Gerds TA, Scheike TH, Blanche P, Ozenne B. (2017). riskRegression: Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks. R package version 1.3.7. https://cran.r-project.org/web/packages/riskRegression/index.html [downloaded on April 3, 2017]. | |
dc.identifier.citedreference | Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958; 53: 457 ‐ 481. | |
dc.identifier.citedreference | Cleves M, Gould W, Marchenko Y. An Introduction to Survival Analysis Using Stata (revised 3rd edition). College Station, TX: Stata Press; 2016. | |
dc.identifier.citedreference | Iavindrasana J, Cohen G, Depeursinge A, Müller H, Meyer R, Geissbuhler A. Clinical data mining: a review. In IMIA Yearbook of Medical Informatics. (eds A. Geissbuhler, C. Kulikowski ). 2009; 48 ( Suppl 1 ): 121 – 133. | |
dc.identifier.citedreference | Linden A, Yarnold PR. Using classification tree analysis to generate propensity score weights. Journal of Evaluation in Clinical Practice. https://doi.org/10.1111/jep.12744 | |
dc.identifier.citedreference | Linden A. Estimating the effect of regression to the mean in health management programs. Disease Management and Health Outcomes. 2007; 15 ( 1 ): 7 ‐ 12. | |
dc.identifier.citedreference | Linden A, Adams JL. Using propensity score‐based weighting in the evaluation of health management programme effectiveness. Journal of Evaluation in Clinical Practice. 2010; 16: 175 ‐ 179. | |
dc.identifier.citedreference | Linden A, Adams J. Evaluating disease management program effectiveness: an introduction to instrumental variables. Journal of Evaluation in Clinical Practice. 2006; 12: 148 ‐ 154. | |
dc.identifier.citedreference | Linden A, Roberts N. Disease management interventions: What’s in the black box? Disease Management. 2004; 7: 275 ‐ 291. | |
dc.identifier.citedreference | Linden A, Butterworth S, Roberts N. Disease management interventions II: What else is in the black box? Disease Management. 2006; 9: 73 ‐ 85. | |
dc.identifier.citedreference | Altman DG, Bland M. Diagnostic tests 2: predictive values. British Medical Journal. 1994; 309: 102. | |
dc.identifier.citedreference | Yarnold PR, Linden A. Theoretical aspects of the D statistic. Optimal Data Analysis. 2016; 5: 171 ‐ 174. | |
dc.identifier.citedreference | Linden A, Schweitzer SO. Applying survival analysis to health risk assessment data to predict time to first hospitalization. AHSRHP Annual Meeting. 2001; 18: 26. | |
dc.identifier.citedreference | D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008; 117: 743 ‐ 753. | |
dc.identifier.citedreference | Biuso TJ, Butterworth S, Linden A. Targeting prediabetes with lifestyle, clinical and behavioral management interventions. Disease Management. 2007; 10 ( 1 ): 6 ‐ 15. | |
dc.identifier.citedreference | Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to survival analysis. Disease Management. 2004; 7: 180 ‐ 190. | |
dc.identifier.citedreference | Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996; 15: 361 ‐ 387. | |
dc.identifier.citedreference | Gordon L, Olshen R. Tree‐structured survival analysis. Cancer Treatment Reports. 1985; 69: 1065 ‐ 1068. | |
dc.identifier.citedreference | Brown SF, Branford AJ, Moran W. On the use of artificial neural networks for the analysis of survival data. IEEE Transactions on Neural Networks. 1997; 8: 1071 ‐ 1077. | |
dc.identifier.citedreference | Kattan MW, Hess KR, Beck JR. Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Computers and Biomedical Research. 1998; 31: 363 ‐ 373. | |
dc.identifier.citedreference | Evers L, Messow CM. Sparse Kernel Methods for High‐dimensional Survival Data. Bioinformatics. 2008; 24: 1632 ‐ 1638. | |
dc.identifier.citedreference | Khan FM, Zubek VB. Support vector regression for censored data (SVRc): a novel tool for survival analysis. Eighth International Conference on Data Mining. 2008; 863 ‐ 868. | |
dc.identifier.citedreference | Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Statistical Science. 2001; 16: 199 ‐ 231. | |
dc.identifier.citedreference | Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society: Series B. 1972; 34: 187 ‐ 220. | |
dc.identifier.citedreference | Yarnold PR, Soltysik RC. Theoretical distributions of optima for univariate discrimination of random data. Decision Sciences. 1991; 22: 739 ‐ 752. | |
dc.identifier.citedreference | Linden A, Yarnold PR. Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments. Journal of Evaluation in Clinical Practice. 2016a; 22: 875 ‐ 885. | |
dc.identifier.citedreference | Linden A, Adams J, Roberts N. The generalizability of disease management program results: getting from here to there. Managed Care Interface. 2004; 17 ( 7 ): 38 ‐ 45. | |
dc.identifier.citedreference | Linden A, Yarnold PR. Using data mining techniques to characterize participation in observational studies. Journal of Evaluation in Clinical Practice. 2016b; 22: 839 ‐ 847. | |
dc.identifier.citedreference | Linden A, Yarnold PR. Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice. 2016c; 22: 848 ‐ 854. | |
dc.identifier.citedreference | Yarnold PR, Soltysik RC. Optimal Data Analysis: Guidebook with Software for Windows. Washington, D.C.: APA Books; 2005. | |
dc.identifier.citedreference | Yarnold PR, Soltysik RC. Maximizing Predictive Accuracy. Chicago, IL: ODA Books, 2016. https://doi.org/10.13140/RG.2.1.1368.3286 | |
dc.identifier.citedreference | Yarnold PR. Discriminating geriatric and non‐geriatric patients using functional status information: An example of classification tree analysis via UniODA. Educational and Psychological Measurement. 1996; 56: 656 ‐ 667. | |
dc.identifier.citedreference | Yarnold PR, Soltysik RC, Bennett CL. Predicting in‐hospital mortality of patients with AIDS‐related Pneumocystis carinii pneumonia: An example of hierarchically optimal classification tree analysis. Statistics in Medicine. 1997; 16: 1451 ‐ 1463. | |
dc.identifier.citedreference | Soltysik RC, Yarnold PR. Automated CTA software: Fundamental concepts and control commands. Optimal Data Analysis. 2010; 1: 144 ‐ 160. | |
dc.identifier.citedreference | Mahmood SS, Levy D, Vasan RS, Wang TJ. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet. 2014; 383: 999 ‐ 1008. | |
dc.identifier.citedreference | Dupont WD. Statistical Modeling for Biomedical Researchers. Cambridge, U.K.: Cambridge University Press; 2009. | |
dc.identifier.citedreference | Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to the bootstrap technique. Disease Management and Health Outcomes. 2005; 13: 159 ‐ 167. | |
dc.identifier.citedreference | Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994; 81: 515 ‐ 526. | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |