
Personalized and Distributed Data Analytics in Heterogeneous Environments

dc.contributor.author: Shi, Naichen
dc.contributor.advisor: Al Kontar, Raed
dc.date.accessioned: 2025-05-15T21:08:00Z
dc.date.issued: 2025-04-26
dc.date.submitted: 2025-03-11
dc.identifier.uri: https://hdl.handle.net/2027.42/197384
dc.description.abstract: It is common wisdom in statistics that more data leads to better models. However, as data are increasingly collected from distributed sources, such as different devices or users, their inherent statistical heterogeneity makes effective knowledge integration challenging. Conventional population-based models typically rely on i.i.d. assumptions that neglect variations across data sources. When data distributions differ, understanding their structure and integrating information for predictive modeling becomes non-trivial. This dissertation tackles these challenges through personalized modeling. Instead of fitting a single model to data from all sources, personalized data analytics fits source-specific models while still encouraging knowledge transfer across sources. This dissertation proposes personalized descriptive and predictive analytics that attempt to answer three key questions: (Q1) How can we develop descriptive analytics to extract shared and unique patterns from heterogeneous data? (Q2) How can we design robust statistical methods that remain reliable in the presence of outliers? (Q3) How can we leverage insights from covariate and concept shifts to construct effective personalized predictive models? To answer these questions, the dissertation makes three methodological contributions. Chapter 2 proposes Personalized PCA (PerPCA), a novel approach that distinguishes shared and unique features across data sources using mutually orthogonal global and local principal components. Chapter 3 presents Triple Component Matrix Factorization (TCMF) to recover global, local, and noise components from multi-source data corrupted by outlier noise. Both PerPCA and TCMF come with theoretical guarantees on their statistical errors. Chapter 4 develops a predictive modeling framework called Personalized Federated Learning via Domain Adaptation (PFL-DA) that addresses both covariate and concept shifts across distributed sources.

The proposed methods provide scalable and interpretable solutions for extracting insights, integrating knowledge, and improving predictive performance in distributed and heterogeneous environments. They have broad applications across domains including image and video processing, topic modeling, and manufacturing. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Personalization [en_US]
dc.subject: distributed optimization [en_US]
dc.subject: descriptive analytics [en_US]
dc.subject: predictive models [en_US]
dc.title: Personalized and Distributed Data Analytics in Heterogeneous Environments [en_US]
dc.type: Thesis [en_US]
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: College of Engineering & Computer Science [en_US]
dc.description.thesisdegreegrantor: University of Michigan-Dearborn [en_US]
dc.contributor.committeemember: Fattahi, Salar
dc.contributor.committeemember: Jin, Judy
dc.contributor.committeemember: Chao, Xiuli
dc.contributor.committeemember: Qu, Qing
dc.identifier.uniqname: naichens [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/197384/1/Dissertation_premium.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/25809
dc.description.mapping: f8405f0d-6e0a-4b63-83ba-7887953c9151 [en_US]
dc.identifier.orcid: 0009-0003-1700-9159 [en_US]
dc.description.filedescription: Description of Dissertation_premium.pdf : Dissertation
dc.identifier.name-orcid: Shi, Naichen; 0009-0003-1700-9159 [en_US]
dc.working.doi: 10.7302/25809 [en_US]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
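The abstract describes PerPCA only at a high level, as separating shared and unique features through mutually orthogonal global and local principal components. As a rough illustration of what that orthogonal split means, here is a minimal Python/NumPy sketch built on a naive two-stage heuristic. It is an assumption made for exposition and is not the dissertation's PerPCA algorithm. The heuristic takes the global components from the average of the per-source covariance matrices, then takes each source's local components from its covariance after projecting out the global subspace. The function name `split_global_local` and the synthetic data model (two shared directions, one unique direction per source) are hypothetical.

```python
import numpy as np


def split_global_local(datasets, r_global, r_local):
    """Hypothetical two-stage heuristic (NOT the PerPCA algorithm).

    1. Global components: top-r_global eigenvectors of the average of the
       per-source sample covariance matrices.
    2. Local components: for each source, top-r_local eigenvectors of its
       covariance after projecting out the global subspace, so each local
       component is orthogonal to every global component.
    """
    covs = []
    for Y in datasets:
        Yc = Y - Y.mean(axis=0)               # center each source separately
        covs.append(Yc.T @ Yc / Yc.shape[0])  # d x d sample covariance
    avg_cov = sum(covs) / len(covs)

    # np.linalg.eigh returns eigenvalues in ascending order: keep the last columns.
    _, vecs = np.linalg.eigh(avg_cov)
    U = vecs[:, -r_global:]

    d = U.shape[0]
    P_perp = np.eye(d) - U @ U.T  # projector onto the orthogonal complement of span(U)
    local = []
    for S in covs:
        _, vecs_i = np.linalg.eigh(P_perp @ S @ P_perp)
        local.append(vecs_i[:, -r_local:])
    return U, local


# Synthetic example (assumed data model): 3 sources share 2 high-variance
# directions, and each source has 1 unique direction of its own.
rng = np.random.default_rng(0)
d, n, n_sources = 10, 5000, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))            # random orthonormal basis
U_true = Q[:, :2]                                            # shared directions
V_true = [Q[:, 2 + i : 3 + i] for i in range(n_sources)]     # unique direction per source

datasets = []
for i in range(n_sources):
    scores_g = rng.standard_normal((n, 2)) * np.sqrt(10.0)   # shared-feature scores
    scores_l = rng.standard_normal((n, 1)) * np.sqrt(5.0)    # source-specific scores
    noise = 0.1 * rng.standard_normal((n, d))
    datasets.append(scores_g @ U_true.T + scores_l @ V_true[i].T + noise)

U_hat, V_hat = split_global_local(datasets, r_global=2, r_local=1)
print(np.abs(U_hat.T @ V_hat[0]).max())  # close to zero: local is orthogonal to global
```

The heuristic works in this toy setting because a shared direction carries its full variance in every source, whereas a unique direction is diluted when covariances are averaged. It needs all covariance matrices in one place and has no statistical guarantee, which is exactly the gap a distributed method such as PerPCA is meant to fill.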

