Statistical and Computational Methods for Polygenic Score-Based Phenotype Prediction

Xu, Chang

dc.contributor.author	Xu, Chang
dc.date.accessioned	2025-01-06T18:17:14Z
dc.date.available	2025-01-06T18:17:14Z
dc.date.issued	2024
dc.date.submitted	2024
dc.identifier.uri	https://hdl.handle.net/2027.42/196046
dc.description.abstract	Recent advances in genome-wide association studies (GWASs) have generated a wealth of new information that facilitates scientific discoveries about the underlying biology of phenotypes and translation toward new therapeutics. The increasing accessibility of these variant-trait association datasets provides abundant resources for the genetic prediction of complex traits and diseases using polygenic scores (PGSs). At the same time, larger sample sizes, extended phenotype variability, and increased population heterogeneity in biobank-scale datasets have raised substantial challenges for PGS construction and downstream analyses. In this dissertation, we propose several statistical and computational methods to address these challenges, facilitating accurate and scalable PGS applications in large-scale datasets.
In Chapter 2, we develop a method, mtPGS (multi-trait assisted PGS), that constructs an accurate PGS for a target trait of interest by leveraging multiple traits relevant to the target trait. mtPGS borrows information on SNP effect size similarity between the target trait and its relevant traits to improve effect size estimation for the target trait, thereby improving PGS accuracy. In the process, mtPGS flexibly models the shared genetic architecture between the target and relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We illustrate the benefits of mtPGS with comprehensive simulations. We also apply mtPGS to analyze 25 traits in the UK Biobank, where mtPGS achieves an accuracy gain of up to 53% compared to state-of-the-art PGS methods.
In Chapter 3, we develop a method, PredInterval (PGS-based phenotype prediction interval), to quantify phenotype prediction uncertainty through the construction of well-calibrated prediction intervals. PredInterval is non-parametric by nature and extracts information from quantiles of phenotypic residuals obtained through cross-validation, thus achieving well-calibrated coverage of true phenotypic values across a range of settings and traits with distinct genetic architectures. In addition, the PredInterval framework is general and can be paired with any PGS method or with pre-computed SNP effect size estimates from publicly available resources. We illustrate the benefits of PredInterval through comprehensive simulations and applications to 12 traits in the UK Biobank. Our results demonstrate that PredInterval achieves well-calibrated prediction coverage across settings and traits and offers a principled approach for identifying high-risk individuals using prediction intervals.
In Chapter 4, we develop a method, CLAPS (conditional local ancestry based polygenic scores), to construct accurate PGSs for ancestrally admixed individuals. CLAPS incorporates local ancestry information to account for the complex mosaic structure of ancestral segments from different populations in admixed genomes, thus producing accurate PGSs that are specifically tailored to the ancestral background of admixed individuals. We illustrate the benefits of CLAPS through comprehensive simulations. We also apply CLAPS to analyze seven complex traits, where CLAPS achieves an accuracy gain of up to 147% compared to existing PGS methods. The accurate predictive performance achieved by CLAPS allows us to extend the utility of PGSs to admixed individuals and reduce disparities in PGS application.
dc.language.iso	en_US
dc.subject	Polygenic scores
dc.subject	Genome-wide association studies
dc.subject	Statistical methods
dc.title	Statistical and Computational Methods for Polygenic Score-Based Phenotype Prediction
dc.type	Thesis
dc.description.thesisdegreename	PhD
dc.description.thesisdegreediscipline	Biostatistics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Ganesh, Santhi K
dc.contributor.committeemember	Zhou, Xiang
dc.contributor.committeemember	Tsoi, Lam
dc.contributor.committeemember	Fritsche, Lars
dc.contributor.committeemember	Wen, Xiaoquan William
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Science
dc.contributor.affiliationumcampus	Ann Arbor
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/196046/1/xuchang_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/24982
dc.identifier.orcid	0000-0003-2831-1604
dc.identifier.name-orcid	Xu, Chang; 0000-0003-2831-1604	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: xuchang_1.pdf
Size: 10.20 MB
Format: PDF
