Flexible Competing Risk Modeling for Big Data from Administrative Records and Disease Registries

dc.contributor.author: Wu, Wenbo
dc.date.accessioned: 2022-09-06T16:03:36Z
dc.date.available: 2022-09-06T16:03:36Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/174287
dc.description.abstract: Competing risks are omnipresent in administrative records and disease registries. The increasing availability of data facilitates a comprehensive investigation of competing risks in various contexts, potentially leading to significant improvements in health care quality and a deeper understanding of the etiology of deadly diseases. At the same time, the growing volume of data, the high-dimensional parameter space, and the complexity of modeling necessitate methodological advances beyond existing analytical frameworks. In this dissertation, we develop novel statistical and computational methods for profiling health care providers and for characterizing the variation of risk factor coefficients. These methods are specifically tailored to large-scale competing risks data.
The 30-day hospital readmission rate has been widely used in profiling hospitals and dialysis facilities, among other health care providers. Current analyses typically use logistic regression to model readmission as a binary outcome without explicitly considering competing risks (e.g., death). This oversight leads to less comprehensive modeling and distorted provider evaluation. To address these drawbacks, we propose a discrete-time competing risk model in which the cause-specific readmission hazard is used to assess provider-level effects. This readmission-focused assessment uses the standardized readmission ratio as the associated quality measure; this ratio is not systematically affected by the rate of competing risks. To facilitate the estimation and inference of thousands of provider effects, we develop an efficient Blockwise Inversion Newton algorithm and a stabilized robust score test that overcomes the conservative nature of the classical robust score test. An application to Medicare dialysis patients demonstrates improved profiling, model fitting, and outlier detection over existing methods.
Time-varying coefficient modeling has proven useful for competing risk analysis. When examining the cause-specific etiology of breast and prostate cancers using large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two challenges that existing time-varying coefficient models cannot tackle. First, these methods rely on expanding the original observations into repeated measurements, which incurs formidable time and memory costs as the sample size grows beyond one million. Second, when binary predictors with near-zero variance are present, existing methods suffer from numerical instability and inaccurate estimation due to ill-conditioned second-order information. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme. Applications to the SEER data demonstrate that the effects of tumor stage on cause-specific deaths vary substantially with the time since diagnosis.
Our investigation into the impact of COVID-19 on dialysis patients suggests that the effects of COVID-19 on post-discharge outcomes vary with both post-discharge time and calendar time. This evidence motivates us to develop a novel varying coefficient model in which each coefficient is a bivariate function of the event time and an external covariate. The model leverages tensor-product B-splines to account for coefficient variation in two dimensions. Difference-based anisotropic penalization is introduced to mitigate model overfitting and the wiggliness of the estimated trajectories; various cross-validation methods are considered for determining the optimal tuning parameters. Hypothesis testing procedures are designed to examine whether the COVID-19 effect varies significantly with post-discharge time and with the time since pandemic onset. Simulation experiments are conducted to evaluate the estimation accuracy, type I error rate, statistical power, and model selection procedures. Applications to Medicare dialysis patients demonstrate the real-world performance of the proposed methods.
Overall, the approaches presented here offer promising avenues for analyzing high-volume competing risks data with multilevel and multidimensional structure.
dc.language.iso: en_US
dc.subject: cause-specific hazards
dc.subject: competing risks
dc.subject: difference-based anisotropic penalization
dc.subject: parallel computing
dc.subject: tensor-product B-splines
dc.subject: varying coefficient modeling
dc.title: Flexible Competing Risk Modeling for Big Data from Administrative Records and Disease Registries
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Biostatistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: He, Zhi
dc.contributor.committeemember: Messana, Joseph M
dc.contributor.committeemember: Kalbfleisch, John D
dc.contributor.committeemember: Kang, Jian
dc.contributor.committeemember: Taylor, Jeremy Michael George
dc.subject.hlbsecondlevel: Public Health
dc.subject.hlbtoplevel: Health Sciences
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/174287/1/wenbowu_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/6018
dc.identifier.orcid: 0000-0002-7642-9773
dc.identifier.name-orcid: Wu, Wenbo; 0000-0002-7642-9773 [en_US]
dc.working.doi: 10.7302/6018 [en]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

