
Transferability of Polygenic Risk Scores Across Ancestral Populations and Data Integration Methods for Improving Prediction on Small Sample Studies

dc.contributor.author: Orozco del Pino, Pedro
dc.date.accessioned: 2023-01-30T16:11:16Z
dc.date.available: 2023-01-30T16:11:16Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/175640
dc.description.abstract: Project 1 of this work quantifies the role of different LD structures on a PRS constructed using European genome-wide significant variants. Using extensive simulations, we estimate how the predictive ability of a European-derived tag variant changes across populations. We assume a genetic model in which the true underlying risk variant has the same effect size across the five populations of the 1000 Genomes Project. Under this scenario, if the most significant variant is not the risk variant, then its predictive ability depends on the LD between the index and true causal variants. As a transferability measure, we calculate the expected proportion of times this event happens across the genome and how much loss of predictive ability we expect in Admixed American, South Asian, East Asian, and African populations. In other words, if we are not finding the causal variant, how much predictive ability do we expect to lose in different populations? Chapter 2 estimates that the reduction in the predictive ability of the most significant variant is modest in Admixed American, South Asian, and East Asian ancestral populations. However, in African populations, the loss of predictive ability can be substantial, reaching up to a 50% reduction 22% of the time. This chapter presents evidence suggesting that LD score can be informative about the probability of tagging the true risk variant in a region. Generally, when an external population is considered as a valuable source of information, it is assumed that any inference relying on that population accepts some bias in exchange for a reduction in variance. In the context of the transferability of PRS in Project 1, we showed that differences in LD could cause significant bias in PRS prediction. Nevertheless, we also showed that this bias could be slight even in genetically distant populations.
Project 2 proposes a method that dynamically adapts to LD and effect size differences across populations to increase the predictive ability in one of them, called the target population. The method takes GWAS summary statistics from two populations and returns an estimate of the multivariate effect size of a region for the target population. We use Regression with Summary Statistics to infer the multivariate effect size in each population. Because gene-gene and gene-environment interactions are not adjusted for, we use a Power Prior to account for the resulting heterogeneity across populations. We simulated GWAS data from European and African populations and showed that our method improves prediction by several measures when the genetic correlation between populations is positive.
Project 3 extends Data Enriched Linear Regression (DELR) to generalized linear regression link functions. We show that the objective function of DELR is equivalent to the objective function of penalized regression. However, penalized regression does not differentiate between target and external data sources, and thus it requires a tailored approach to find the best penalty. We develop a Cross-Validation algorithm to find the penalty factor that optimizes prediction in the target population. Furthermore, we show through simulations that the extension, DEGLR, improves prediction when bias is small and converges to ignoring the external study as bias increases. We present a real data analysis showing that our method can increase predictive ability for a data set when using observational data.
dc.language.iso: en_US
dc.subject: Data integration
dc.subject: Polygenic risk scores
dc.subject: Statistical genetics
dc.title: Transferability of Polygenic Risk Scores Across Ancestral Populations and Data Integration Methods for Improving Prediction on Small Sample Studies
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Biostatistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Zoellner, Sebastian K
dc.contributor.committeemember: Smith, Jennifer Ann
dc.contributor.committeemember: Boonstra, Phil
dc.contributor.committeemember: Morrison, Jean
dc.contributor.committeemember: Mukherjee, Bhramar
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/175640/1/porozco_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/6854
dc.identifier.orcid: 0000-0002-7373-3702
dc.identifier.name-orcid: Orozco del Pino, Pedro; 0000-0002-7373-3702 (en_US)
dc.working.doi: 10.7302/6854 (en)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

