Personalized and Distributed Data Analytics in Heterogeneous Environments
dc.contributor.author | Shi, Naichen | |
dc.contributor.advisor | Al Kontar, Raed | |
dc.date.accessioned | 2025-05-15T21:08:00Z | |
dc.date.issued | 2025-04-26 | |
dc.date.submitted | 2025-03-11 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/197384 | |
dc.description.abstract | It is common wisdom in statistics that more data leads to better models. However, as data are increasingly collected from distributed sources, such as different devices or users, their inherent statistical heterogeneity creates challenges for effective knowledge integration. Conventional population-based models typically rely on i.i.d. assumptions that neglect variations across data sources. When data distributions differ, understanding their structure and integrating information for predictive modeling becomes non-trivial. This dissertation tackles these challenges through personalized modeling. Instead of fitting a single model to data from all sources, personalized data analytics fits source-specific models while still encouraging knowledge transfer across sources. This dissertation proposes personalized descriptive and predictive analytics that attempt to answer three key questions: (Q1) How can we develop descriptive analytics to extract shared and unique patterns from heterogeneous data? (Q2) How can we design robust statistical methods that remain reliable in the presence of outliers? (Q3) How can we leverage insights from covariate and concept shifts to construct effective personalized predictive models? To answer these questions, the dissertation makes three methodological contributions. Chapter 2 proposes Personalized PCA (PerPCA), a novel approach that distinguishes shared and unique features across data sources using mutually orthogonal global and local principal components. Chapter 3 presents Triple Component Matrix Factorization (TCMF) to recover global, local, and noisy components in multi-source data corrupted by outlier noise. Both PerPCA and TCMF are equipped with theoretical guarantees on statistical errors. Chapter 4 develops a predictive modeling framework called Personalized Federated Learning via Domain Adaptation (PFL-DA) that addresses both covariate and concept shifts across distributed sources.
The proposed methods provide scalable and interpretable solutions for extracting insights, integrating knowledge, and improving predictive performance in distributed and heterogeneous environments. These findings have broad applications across various domains, including image and video processing, topic modeling, and manufacturing. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Personalization | en_US |
dc.subject | distributed optimization | en_US |
dc.subject | descriptive analytics | en_US |
dc.subject | predictive models | en_US |
dc.title | Personalized and Distributed Data Analytics in Heterogeneous Environments | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | College of Engineering & Computer Science | en_US |
dc.description.thesisdegreegrantor | University of Michigan-Dearborn | en_US |
dc.contributor.committeemember | Fattahi, Salar | |
dc.contributor.committeemember | Jin, Judy | |
dc.contributor.committeemember | Chao, Xiuli | |
dc.contributor.committeemember | Qu, Qing | |
dc.identifier.uniqname | naichens | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/197384/1/Dissertation_premium.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/25809 | |
dc.description.mapping | f8405f0d-6e0a-4b63-83ba-7887953c9151 | en_US |
dc.identifier.orcid | 0009-0003-1700-9159 | en_US |
dc.description.filedescription | Description of Dissertation_premium.pdf : Dissertation | |
dc.identifier.name-orcid | Shi, Naichen; 0009-0003-1700-9159 | en_US |
dc.working.doi | 10.7302/25809 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |