Unsupervised Learning Approaches for Large-scale Data
dc.contributor.author | Jin, Yanxin | |
dc.date.accessioned | 2025-01-06T18:17:21Z | |
dc.date.available | 2025-01-06T18:17:21Z | |
dc.date.issued | 2024 | |
dc.date.submitted | 2024 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/196050 | |
dc.description.abstract | Unsupervised learning approaches have gained significant prominence in the field of statistics due to their ability to analyze and interpret complex, unstructured data without the need for labeled datasets. This dissertation primarily focuses on the areas of graphical models and clustering, highlighting both their theoretical foundations and practical applications. For graphical models, inspired by functional magnetic resonance imaging (fMRI) studies, we introduce a novel method for constructing brain connectivity networks with correlated replicates and unmeasured confounders. In fMRI studies, participants are scanned over time, and their mental states, such as being awake or asleep, can vary and are often unmeasurable. To address this, we model the correlation among replicates with a one-lag vector autoregressive model and account for latent effects from unmeasured confounders using a piecewise constant approach. For clustering, traditional methods often fall short in accurately capturing complex relationships within data, particularly when variables can belong to multiple clusters and when clustering structures change over time. In fMRI datasets, regions of interest (ROIs) may contribute to multiple brain functions, and the functions performed by each ROI can vary over time. To address these challenges, we emphasize advanced techniques that account for overlapping clusters and explore the dynamics of these clusters. By applying a latent variable factor model, we estimate time-varying overlapping clusters that can automatically match across different time points. Furthermore, we extend the overlapping clustering method to non-Gaussian data by employing generalized factor models within the clustering structure. This extension broadens the applicability of the clustering methods to a wider range of data types, particularly in fields such as genomics and finance, where non-Gaussian distributions are prevalent. | |
dc.language.iso | en_US | |
dc.subject | Unsupervised Learning Approaches | |
dc.subject | graphical model | |
dc.subject | clustering | |
dc.title | Unsupervised Learning Approaches for Large-scale Data | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | |
dc.description.thesisdegreediscipline | Statistics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Tan, Kean Ming | |
dc.contributor.committeemember | Kang, Jian | |
dc.contributor.committeemember | Xu, Gongjun | |
dc.contributor.committeemember | Zhu, Ji | |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbtoplevel | Science | |
dc.contributor.affiliationumcampus | Ann Arbor | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/196050/1/yanxinj_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/24986 | |
dc.identifier.orcid | 0009-0006-3533-9567 | |
dc.identifier.name-orcid | Jin, Yanxin; 0009-0006-3533-9567 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |