Statistical and Computational Methods for Polygenic Score-Based Phenotype Prediction
dc.contributor.author | Xu, Chang | |
dc.date.accessioned | 2025-01-06T18:17:14Z | |
dc.date.available | 2025-01-06T18:17:14Z | |
dc.date.issued | 2024 | |
dc.date.submitted | 2024 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/196046 | |
dc.description.abstract | Recent advances in genome-wide association studies (GWASs) have generated a wealth of new information that facilitates scientific discovery about the biology underlying phenotypes and its translation toward new therapeutics. The increasing accessibility of these variant-trait association datasets provides abundant resources for the genetic prediction of complex traits and diseases using polygenic scores (PGSs). Meanwhile, larger sample sizes, extended phenotype variability, and increased population heterogeneity in biobank-scale datasets have raised substantial challenges for PGS construction and downstream analyses. In this dissertation, we propose several statistical and computational methods to address these challenges, facilitating accurate and scalable PGS applications in large-scale datasets. In Chapter 2, we develop a method, mtPGS (multi-trait assisted PGS), that constructs an accurate PGS for a target trait of interest by leveraging multiple traits relevant to the target trait. mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve effect size estimation for the target trait, thereby producing an accurate PGS. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We illustrate the benefits of mtPGS with comprehensive simulations. We also apply mtPGS to analyze 25 traits in the UK Biobank, where mtPGS achieves up to a 53% accuracy gain compared to state-of-the-art PGS methods. 
In Chapter 3, we develop a method, PredInterval (PGS-based phenotype prediction interval), to quantify phenotype prediction uncertainty through the construction of well-calibrated prediction intervals. PredInterval is non-parametric by nature and extracts information from quantiles of phenotypic residuals obtained through cross-validation, thus achieving well-calibrated coverage of true phenotypic values across a range of settings and traits with distinct genetic architectures. In addition, the PredInterval framework is general and can be paired with any PGS method or with pre-computed SNP effect size estimates from publicly available resources. We illustrate the benefits of PredInterval through comprehensive simulations and applications to 12 traits in the UK Biobank. Our results demonstrate that PredInterval achieves well-calibrated prediction coverage across settings and traits and offers a principled approach to identifying high-risk individuals using prediction intervals. In Chapter 4, we develop a method, CLAPS (conditional local ancestry based polygenic scores), to construct accurate PGSs for ancestrally admixed individuals. CLAPS incorporates local ancestry information to account for the complex mosaic structure of ancestral segments from different populations in admixed genomes, thus achieving accurate PGSs that are specifically tailored to the ancestral background of admixed individuals. We illustrate the benefits of CLAPS through comprehensive simulations. We also apply CLAPS to analyze seven complex traits, where CLAPS achieves up to a 147% accuracy gain compared to existing PGS methods. The accurate predictive performance achieved by CLAPS allows us to extend the utility of PGSs to admixed individuals and reduce disparities in PGS application. | |
dc.language.iso | en_US | |
dc.subject | Polygenic scores | |
dc.subject | Genome-wide association studies | |
dc.subject | Statistical methods | |
dc.title | Statistical and Computational Methods for Polygenic Score-Based Phenotype Prediction | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | |
dc.description.thesisdegreediscipline | Biostatistics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Ganesh, Santhi K | |
dc.contributor.committeemember | Zhou, Xiang | |
dc.contributor.committeemember | Tsoi, Lam | |
dc.contributor.committeemember | Fritsche, Lars | |
dc.contributor.committeemember | Wen, Xiaoquan William | |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbtoplevel | Science | |
dc.contributor.affiliationumcampus | Ann Arbor | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/196046/1/xuchang_1.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/24982 | |
dc.identifier.orcid | 0000-0003-2831-1604 | |
dc.identifier.name-orcid | Xu, Chang; 0000-0003-2831-1604 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |
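The abstract describes PredInterval as building prediction intervals from quantiles of cross-validated phenotypic residuals. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea, not the dissertation's method. It assumes a generic residual-quantile construction, uses ridge regression as a stand-in for any PGS method, and treats the phenotype as centered. The names `fit_effects` and `cv_residual_interval` and all parameter values are hypothetical.

```python
import numpy as np

def fit_effects(X, y, lam=1.0):
    # Stand-in for any PGS method (an assumption): ridge estimates of SNP effects.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_residual_interval(X, y, X_new, n_folds=5, alpha=0.05, seed=0):
    """Intervals from a quantile of cross-validated absolute residuals.

    X: (n, p) genotype matrix, y: (n,) centered phenotype,
    X_new: (m, p) genotypes of individuals to predict.
    Returns (lower, upper) arrays of length m.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    abs_resid = np.empty(n)
    for held_out in folds:
        # Fit on the other folds, then record out-of-fold residuals.
        train = np.setdiff1d(np.arange(n), held_out)
        beta = fit_effects(X[train], y[train])
        abs_resid[held_out] = np.abs(y[held_out] - X[held_out] @ beta)
    # Slightly inflated quantile level, a common finite-sample adjustment.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(abs_resid, level)
    # Center each interval on the score from effects fitted to all training data.
    center = X_new @ fit_effects(X, y)
    return center - q, center + q
```

Calling `cv_residual_interval(X_train, y_train, X_test, alpha=0.1)` returns lower and upper bounds intended to cover roughly 90% of new phenotypes. Because the interval width comes from empirical residual quantiles rather than a parametric error model, the sketch is non-parametric in the same broad sense the abstract uses.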