Privacy-Enhanced Learning and Inference With Distributed Clinical Datasets
Hu, Mengtong
2024
Abstract
Integrating data collected from multiple clinical centers can enhance the statistical power of analyses and the generalizability of findings. However, merging subject-level data from individual centers for centralized analysis is often logistically non-trivial and may be restricted by data privacy concerns and legal protections. In practice, this data management task can be time-consuming and may therefore delay scientific discovery. The challenge is amplified when some centers contribute only small amounts of data, because local estimates cannot be reliably generated from such data sets, which in turn makes meta-analyses unreliable. To overcome this issue, we propose several new methods that enable efficient statistical analysis of multi-center data while protecting the privacy of patient-level information. Chapter II develops a collaborative average treatment effect inference framework for a multicenter clinical trial studying the effect of basal insulin on reducing post-transplantation diabetes mellitus. The proposed method relies on sequential processing of summary data rather than merging patient-level data. It delivers an efficient inverse propensity weighting (IPW) estimator of the marginal differential treatment effect between two treatment arms. Statistical efficiency is achieved because the proposed estimator attains a convergence rate on the order of the cumulative sample size of all centers involved in the trial. We show theoretically and numerically that this new distributed inference approach loses little statistical power compared to the centralized method based on the entire data set. Chapter III extends the distributed inference framework to estimate hazard ratios in the Cox proportional hazards model without centralized data access or risk-set construction, using maximum likelihood estimation instead of partial likelihood estimation.
The proposed method estimates the baseline hazard function nonparametrically and avoids aggregating individual-level data to form risk sets. Of note, risk-set construction carries a substantial risk of leaking individual patient information, which is unacceptable. The proposed distributed likelihood estimation approach shares only summary statistics and does not rely on risk sets. We establish large-sample properties of the proposed method and illustrate its performance through simulation experiments and a real-world data example on kidney transplantation from the Organ Procurement and Transplantation Network, which examines risk factors associated with 5-year death-censored graft failure among patients who underwent kidney transplants in the USA. Chapter IV concerns a collaborative framework for the Accelerated Failure Time (AFT) model, a popular alternative to the Cox model for the analysis of time-to-failure data. The AFT model directly accounts for the effects of covariates on times to failure rather than on hazard functions, so the proportional hazards assumption is not required. Consequently, it provides more flexibility in data aggregation than the Cox model. Our proposed distributed inference method focuses on a class of parametric AFT models with Weibull, log-normal, and log-logistic distributions for time-to-event outcomes, in which a distributed likelihood ratio test is established under the generalized gamma distribution to assess goodness-of-fit across the candidate parametric models. We present large-sample properties of the proposed method and illustrate its performance through simulation experiments and a real-world data example on kidney transplantation.
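As a rough illustration of how an IPW treatment effect can be computed from center-level summaries processed one center at a time, the sketch below uses a simplified, normalized (Hajek-type) IPW estimator. It is not the estimator developed in the thesis: it assumes each patient's propensity score is already known, and all names (`IPWSummary`, `local_summary`, `sequential_ate`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IPWSummary:
    """Aggregates a center shares; no patient-level rows leave the site."""
    treated_weighted_outcome: float = 0.0  # sum of A * Y / e
    treated_weight: float = 0.0            # sum of A / e
    control_weighted_outcome: float = 0.0  # sum of (1 - A) * Y / (1 - e)
    control_weight: float = 0.0            # sum of (1 - A) / (1 - e)

    def add(self, other):
        """Combine two summaries by adding their components."""
        return IPWSummary(
            self.treated_weighted_outcome + other.treated_weighted_outcome,
            self.treated_weight + other.treated_weight,
            self.control_weighted_outcome + other.control_weighted_outcome,
            self.control_weight + other.control_weight,
        )


def local_summary(records):
    """Compute one center's summary.

    records: iterable of (treatment, outcome, propensity) tuples, where
    treatment is 0 or 1 and propensity is an ASSUMED known P(A = 1 | X).
    """
    s = IPWSummary()
    for a, y, e in records:
        if a == 1:
            s.treated_weighted_outcome += y / e
            s.treated_weight += 1.0 / e
        else:
            s.control_weighted_outcome += y / (1.0 - e)
            s.control_weight += 1.0 / (1.0 - e)
    return s


def ate_from_summary(s):
    """Normalized IPW estimate of the average treatment effect."""
    treated_mean = s.treated_weighted_outcome / s.treated_weight
    control_mean = s.control_weighted_outcome / s.control_weight
    return treated_mean - control_mean


def sequential_ate(center_summaries):
    """Pass a running summary from center to center, then estimate the ATE."""
    running = IPWSummary()
    for s in center_summaries:
        running = running.add(s)
    return ate_from_summary(running)


# Toy data for two centers (made up purely for illustration).
center_a = [(1, 5.0, 0.5), (0, 3.0, 0.5), (1, 7.0, 0.25)]
center_b = [(0, 2.0, 0.75), (1, 6.0, 0.5), (0, 4.0, 0.5)]

distributed = sequential_ate([local_summary(center_a), local_summary(center_b)])
pooled = ate_from_summary(local_summary(center_a + center_b))
```

Because the four sums are additive across centers, the sequentially updated estimate matches the estimate computed on the pooled records in this simplified setting. When propensity scores must themselves be estimated across centers, additional steps are needed that this sketch does not cover.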
Subjects
Distributed Inference; Federated Learning; Data Privacy; Collaborative Inference; Survival Analysis; Causal Inference
Types
Thesis