An Investigation of Gene Duplications in Canines
Nguyen, Anthony
2025
Abstract
Segmental duplications are genomic sequences, not derived from mobile elements, that occur more than once in a genome, are at least 1 kb long, and share at least 90% sequence identity. Copy number variants (CNVs) occur when segmental duplications differ in presence or copy number between individuals or populations. CNVs have influenced genome evolution but remain challenging to detect. The advent of long-read sequencing has allowed duplications to be resolved, but in canines, duplications and CNVs remain largely unexplored. This dissertation presents three analyses of duplications in modern canine genomic datasets. First, in a collection of 2,000 canines mapped to the German Shepherd reference genome, we explored paralog-specific CNVs, searching for duplications unique to a sample when compared to the reference genome. We characterized known variants, including the dog-specific amylase expansion that allows for starch digestion, and discovered new variants, such as a wolf-specific high-copy-number duplication on chromosome 26. We also applied our CNV analysis to pharmacogenetics, searching for high-priority drug targets with varying copy numbers to help set the stage for future functional prioritization. In the second chapter, we used two methods, read-depth analysis and genome assembly self-alignment, to study nine long-read canine genomes for the presence of duplications, and found that 17.27 to 109.78 Mb were duplicated by self-alignment and 41.17 to 166.47 Mb by read depth. While we found that most modern assemblies captured duplications relatively poorly, we confirmed that better sequencing technology led to greater concordance and greater duplication counts, as evidenced by the Greenland Wolf genome, which had over 50% concordance between methods, compared to the sub-40% concordance of the other genome assemblies.
We found short, high-recurrence duplications and determined them to be potential retrocopies. We applied a BLAT-based method to find retrocopies, and found 1,263 to 1,316 retrocopies across all assemblies, with over 94% displaying at least one signature hallmark of retrotransposition. About 92% of all retrocopies appear to be shared between the dog assemblies and the Greenland Wolf assembly. Using an estimated time to lineage divergence, we estimated that retrocopy insertions occur on average in 1 of 3,514 births. In the third chapter, because the Greenland Wolf genome was better sequenced and had better duplication concordance, and to explore the effect of the choice of reference genome, we remapped the Dog10K dataset to the Greenland Wolf genome using a novel GPU-based method to address concerns of reference bias. We observed a normalization of evolutionary distance between the Dog10K samples and the Greenland Wolf, with breed dogs in particular shifting from a range of 1.7 to 2.8 million differences from the German Shepherd genome to a consistent 3.8 million differences from the Greenland Wolf genome. Additionally, we demonstrated that the clustering of German Shepherd and related breeds separately from other modern breeds in the original Dog10K dataset was recapitulated when the data were aligned to the Greenland Wolf genome, indicating that this pattern is not a reference artifact. This dissertation illustrates the impact of new canine genome assemblies on duplication detection and addresses the lack of genome-wide analysis of segmental duplications, CNVs, and retrogenes since the release of these new genomes. It sets the stage for future analysis of the evolution and biological importance of gene duplications in canines.
Subjects
duplications retrocopies canine comparative genomics
Types
Thesis