Modeling time‐to‐event (survival) data using classification tree analysis

Linden, Ariel; Yarnold, Paul R.

dc.contributor.author	Linden, Ariel
dc.contributor.author	Yarnold, Paul R.
dc.date.accessioned	2018-02-05T16:43:23Z
dc.date.available	2019-01-07T18:34:38Z	en
dc.date.issued	2017-12
dc.identifier.citation	Linden, Ariel; Yarnold, Paul R. (2017). "Modeling time‐to‐event (survival) data using classification tree analysis." Journal of Evaluation in Clinical Practice 23(6): 1299-1308.
dc.identifier.issn	1356-1294
dc.identifier.issn	1365-2753
dc.identifier.uri	https://hdl.handle.net/2027.42/141923
dc.description.abstract	Rationale, aims, and objectives: Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow‐up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a “decision‐tree”–like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross‐generalizability. Method: Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross‐generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. Results: The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Conclusions: Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA‐survival framework.
dc.publisher	APA Books
dc.publisher	Wiley Periodicals, Inc.
dc.subject.other	machine learning
dc.subject.other	survival
dc.subject.other	classification tree analysis
dc.subject.other	censoring
dc.title	Modeling time‐to‐event (survival) data using classification tree analysis
dc.type	Article	en_US
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Medicine (General)
dc.subject.hlbtoplevel	Health Sciences
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/141923/1/jep12779.pdf
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/141923/2/jep12779_am.pdf
dc.identifier.doi	10.1111/jep.12779
dc.identifier.source	Journal of Evaluation in Clinical Practice
dc.identifier.citedreference	Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine. 1999; 18: 2529 ‐ 2545.
dc.identifier.citedreference	Yarnold PR, Linden A. Novometric analysis with ordered class variables: The optimal alternative to linear regression analysis. Optimal Data Analysis. 2016; 22: 65 ‐ 73.
dc.identifier.citedreference	Linden A, Adams J, Roberts N. Strengthening the case for disease management effectiveness: unhiding the hidden bias. Journal of Evaluation in Clinical Practice. 2006; 12: 140 ‐ 147.
dc.identifier.citedreference	Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. Journal of the American Medical Association. 1982; 247: 2543 ‐ 2546.
dc.identifier.citedreference	Gönen M, Heller G. Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005; 92: 965 ‐ 970.
dc.identifier.citedreference	Grønnesby JK, Borgan Ø. A method for checking regression models in survival analysis based on the risk score. Lifetime Data Analysis. 1996; 2: 315 ‐ 328.
dc.identifier.citedreference	May S, Hosmer DW. A simplified method of calculating an overall goodness‐of‐fit test for the Cox proportional hazards model. Lifetime Data Analysis. 1998; 4: 109 ‐ 120.
dc.identifier.citedreference	Royston P, Sauerbrei W. A new measure of prognostic separation in survival data. Statistics in Medicine. 2004; 23: 723 ‐ 748.
dc.identifier.citedreference	Gerds TA, Scheike TH, Blanche P, Ozenne B. ( 2017 ). riskRegression: Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks. R package version 1.3.7. https://cran.r-project.org/web/packages/riskRegression/index.html [downloaded on April 3, 2017].
dc.identifier.citedreference	Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958; 53: 457 ‐ 481.
dc.identifier.citedreference	Cleves M, Gould W, Marchenko Y. An Introduction to Survival Analysis Using Stata (revised 3rd edition). College Station, TX: Stata Press; 2016.
dc.identifier.citedreference	Iavindrasana J, Cohen G, Depeursinge A, Müller H, Meyer R, Geissbuhler A. Clinical data mining: a review. In IMIA Yearbook of Medical Informatics. (eds A. Geissbuhler, C. Kulikowski ). 2009; 48 ( Suppl 1 ): 121 – 133.
dc.identifier.citedreference	Linden A, Yarnold PR. Using classification tree analysis to generate propensity score weights. Journal of Evaluation in Clinical Practice. https://doi.org/10.1111/jep.12744
dc.identifier.citedreference	Linden A. Estimating the effect of regression to the mean in health management programs. Disease Management and Health Outcomes. 2007; 15 ( 1 ): 7 ‐ 12.
dc.identifier.citedreference	Linden A, Adams JL. Using propensity score‐based weighting in the evaluation of health management programme effectiveness. Journal of Evaluation in Clinical Practice. 2010; 16: 175 ‐ 179.
dc.identifier.citedreference	Linden A, Adams J. Evaluating disease management program effectiveness: an introduction to instrumental variables. Journal of Evaluation in Clinical Practice. 2006; 12: 148 ‐ 154.
dc.identifier.citedreference	Linden A, Roberts N. Disease management interventions: What’s in the black box? Disease Management. 2004; 7: 275 ‐ 291.
dc.identifier.citedreference	Linden A, Butterworth S, Roberts N. Disease management interventions II: What else is in the black box? Disease Management. 2006; 9: 73 ‐ 85.
dc.identifier.citedreference	Altman DG, Bland M. Diagnostic tests 2: predictive values. British Medical Journal. 1994; 309: 102.
dc.identifier.citedreference	Yarnold PR, Linden A. Theoretical aspects of the D statistic. Optimal Data Analysis. 2016; 5: 171 ‐ 174.
dc.identifier.citedreference	Linden A, Schweitzer SO. Applying survival analysis to health risk assessment data to predict time to first hospitalization. AHSRHP Annual Meeting. 2001; 18: 26.
dc.identifier.citedreference	D’Agostino RB, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008; 117: 743 ‐ 753.
dc.identifier.citedreference	Biuso TJ, Butterworth S, Linden A. Targeting prediabetes with lifestyle, clinical and behavioral management interventions. Disease Management. 2007; 10 ( 1 ): 6 ‐ 15.
dc.identifier.citedreference	Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to survival analysis. Disease Management. 2004; 7: 180 ‐ 190.
dc.identifier.citedreference	Harrell FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996; 15: 361 ‐ 387.
dc.identifier.citedreference	Gordon L, Olshen R. Tree‐structured survival analysis. Cancer Treatment Reports. 1985; 69: 1065 ‐ 1068.
dc.identifier.citedreference	Brown SF, Branford AJ, Moran W. On the use of artificial neural networks for the analysis of survival data. IEEE Transactions on Neural Networks. 1997; 8: 1071 ‐ 1077.
dc.identifier.citedreference	Kattan MW, Hess KR, Beck JR. Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Computers and Biomedical Research. 1998; 31: 363 ‐ 373.
dc.identifier.citedreference	Evers L, Messow CM. Sparse Kernel Methods for High‐dimensional Survival Data. Bioinformatics. 2008; 24: 1632 ‐ 1638.
dc.identifier.citedreference	Khan FM, Zubek VB. Support vector regression for censored data (SVRc): a novel tool for survival analysis. Eighth International Conference on Data Mining. 2008; 863 ‐ 868.
dc.identifier.citedreference	Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Statistical Science. 2001; 16: 199 ‐ 231.
dc.identifier.citedreference	Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society: Series B. 1972; 34: 187 ‐ 220.
dc.identifier.citedreference	Yarnold PR, Soltysik RC. Theoretical distributions of optima for univariate discrimination of random data. Decision Sciences. 1991; 22: 739 ‐ 752.
dc.identifier.citedreference	Linden A, Yarnold PR. Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments. Journal of Evaluation in Clinical Practice. 2016a; 22: 875 ‐ 885.
dc.identifier.citedreference	Linden A, Adams J, Roberts N. The generalizability of disease management program results: getting from here to there. Managed Care Interface. 2004; 17 ( 7 ): 38 ‐ 45.
dc.identifier.citedreference	Linden A, Yarnold PR. Using data mining techniques to characterize participation in observational studies. Journal of Evaluation in Clinical Practice. 2016b; 22: 839 ‐ 847.
dc.identifier.citedreference	Linden A, Yarnold PR. Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice. 2016c; 22: 848 ‐ 854.
dc.identifier.citedreference	Yarnold PR, Soltysik RC. Optimal Data Analysis: Guidebook with Software for Windows. Washington, D.C.: APA Books; 2005.
dc.identifier.citedreference	Yarnold PR, Soltysik RC. Maximizing Predictive Accuracy. Chicago, IL: ODA Books, 2016. https://doi.org/10.13140/RG.2.1.1368.3286
dc.identifier.citedreference	Yarnold PR. Discriminating geriatric and non‐geriatric patients using functional status information: An example of classification tree analysis via UniODA. Educational and Psychological Measurement. 1996; 56: 656 ‐ 667.
dc.identifier.citedreference	Yarnold PR, Soltysik RC, Bennett CL. Predicting in‐hospital mortality of patients with AIDS‐related Pneumocystis carinii pneumonia: An example of hierarchically optimal classification tree analysis. Statistics in Medicine. 1997; 16: 1451 ‐ 1463.
dc.identifier.citedreference	Soltysik RC, Yarnold PR. Automated CTA software: Fundamental concepts and control commands. Optimal Data Analysis. 2010; 1: 144 ‐ 160.
dc.identifier.citedreference	Mahmood SS, Levy D, Vasan RS, Wang TJ. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet. 2014; 383: 999 ‐ 1008.
dc.identifier.citedreference	Dupont WD. Statistical Modeling for Biomedical Researchers. Cambridge, U.K.: Cambridge University Press; 2009.
dc.identifier.citedreference	Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to the bootstrap technique. Disease Management and Health Outcomes. 2005; 13: 159 ‐ 167.
dc.identifier.citedreference	Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994; 81: 515 ‐ 526.
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name: jep12779.pdf
Size: 577.1KB
Format: PDF

Name: jep12779_am.pdf
Size: 749.9KB
Format: PDF
