Statistical Methods for Complex Data: Hospital Evaluation and Causal Inference
Xu, Tongbo
2025
Abstract
Real-world data often have complex structures that exceed the scope of traditional cross-sectional study frameworks, which introduces challenges for statistical analysis. This dissertation addresses these challenges by developing novel statistical methodologies that accommodate various data structures, including complex relationships between outcome and covariates, clustered data (e.g., patients nested within hospitals), and data integration settings. Specifically, Chapter 2 introduces a novel random forest approach for hospital evaluation in clustered data. Chapters 3 and 4 focus on innovative causal inference frameworks and methods designed for clustered data and data integration scenarios.
In Chapter 2, we introduce Fixed Effect Clustered Random Forest (FCRF) to model the relationship between the outcome and the covariates in clustered data, with applications such as hospital evaluation in mind. This approach combines the hierarchical structure of fixed-effect clustered models with the flexibility of random forests through an iterative algorithm. To address potential overfitting and bias in random forests, we integrate bias correction techniques and introduce Fixed Effect Clustered Random Forest with Bias Correction (FCRFBC). Simulation studies confirm the effectiveness of these methods. We further illustrate their effectiveness by analyzing data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) to evaluate hospitals in managing patients' hemoglobin levels with standardized medications, demonstrating advantages over conventional approaches.
In Chapter 3, we develop a framework for making causal inferences on treatment effects in clustered data, which are frequently encountered in large observational clinical studies where patients are nested within hospitals. While causal inference methods are well established for cross-sectional data, the causal estimands and assumptions in clustered data have been less explored. Within a new potential outcome framework designed for clustered data, we define a series of causal estimands and the assumptions required for valid inference. We then propose new cluster-level weighted propensity score weighting methods that consistently estimate these treatment effects, as demonstrated both theoretically and through simulations. We also apply these methods to the BMC2 dataset for empirical illustration.
In Chapter 4, we focus on estimating causal effects for an internal study of interest when summary information from multiple external studies is available and can potentially improve estimation efficiency, a typical data integration scenario. We introduce Penalized Empirical Augmented Inverse Propensity Weighting (PEAIPW), a penalized empirical likelihood method that employs the group lasso technique to select and incorporate the external information that is useful for internal causal effect estimation, thereby improving its efficiency. Through both theoretical analysis and simulations, we investigate the properties and performance of this method, including its selection consistency for external information, its double-robustness property, and its potential for efficiency gains.
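As a rough illustration of the augmented inverse propensity weighting idea and the double-robustness property mentioned in the abstract, the sketch below implements the standard AIPW estimator of an average treatment effect. It is not the dissertation's PEAIPW method: it involves no empirical likelihood, no group lasso penalty, and no external summary information. The function name `aipw_ate`, its arguments, and the toy numbers are assumptions made purely for illustration.

```python
from statistics import fmean


def aipw_ate(y, a, e_hat, m1_hat, m0_hat):
    """Standard AIPW estimate of the average treatment effect.

    y      : observed outcomes
    a      : treatment indicators (1 = treated, 0 = control)
    e_hat  : estimated propensity scores P(A = 1 | X)
    m1_hat : outcome-model predictions under treatment
    m0_hat : outcome-model predictions under control
    """
    terms = []
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, m1_hat, m0_hat):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-propensity-weighted residual.
        treated = m1 + ai * (yi - m1) / ei
        control = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        terms.append(treated - control)
    return fmean(terms)


# Hypothetical toy data (not from the dissertation or the BMC2 registry).
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]

# Case 1: outcome model fits exactly, propensity scores are arbitrary.
exact_outcome = aipw_ate(y, a, [0.2, 0.7, 0.9, 0.4],
                         [3.0, 3.0, 4.0, 4.0], [1.0, 1.0, 2.0, 2.0])

# Case 2: outcome model is uninformative (all zeros), propensity is 0.5.
# The estimator then reduces to plain inverse propensity weighting.
ipw_only = aipw_ate(y, a, [0.5] * 4, [0.0] * 4, [0.0] * 4)

print(exact_outcome, ipw_only)
```

On this toy data both calls give 2.0. In the first case the outcome model fits the observed outcomes exactly, so the residual terms vanish and the propensity scores do not affect the estimate. In the second case the outcome model carries no information and the estimate relies entirely on the propensity weights.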
Subjects
causal inference; random forest; data integration; clustered data; hospital evaluation
Types
Thesis