
Development of Copy Number Variation Detection Algorithms and Their Application to Genome Diversity Studies

dc.contributor.author: Shen, Feichen
dc.date.accessioned: 2019-07-08T19:48:53Z
dc.date.available: 2019-07-08T19:48:53Z
dc.date.issued: 2019
dc.date.submitted: 2019
dc.identifier.uri: https://hdl.handle.net/2027.42/150064
dc.description.abstract: Copy number variation (CNV) is an important class of variation that contributes to genome evolution and disease. CNVs that become fixed in a species give rise to segmental duplications, and already duplicated sequence is prone to subsequent gain and loss, leading to additional copy-number variation. Multiple methods exist for identifying CNV from high-throughput sequencing data, including analysis of mapped read depth. However, accurately assessing CNV can be computationally costly, and approaches based on multi-mapping reads may not distinguish among paralogs or members of gene families. We present two rapid CNV estimation algorithms, QuicK-mer and fastCN, for second-generation short-read sequencing data. QuicK-mer is a paralog-sensitive CNV detector that relies on enumerating unique k-mers from a pre-tabulated reference genome. The latest version, QuicK-mer 2.0, uses a newly built k-mer counting core based on the DJB hash function and supports multithreaded counting of large input files. As a result, QuicK-mer 2.0 can produce a copy-number profile from a 10X-coverage mammalian genome in less than 5 minutes. The second CNV estimator, fastCN, is based on sequence mapping and tolerates mismatches. Its pipeline is built around the mrsFAST read mapper and requires no additional time beyond the mrsFAST mapping step. We validated the accuracy of both approaches against existing data on human paralogous regions from the 1000 Genomes Project. We also used QuicK-mer to assess copy-number variation on the chimpanzee and human Y chromosomes. CNV has also been associated with phenotypic changes, including those that occur during animal domestication. Large-scale CNVs were previously observed in the domestication of cattle, pigs, and chickens. We assessed the role of CNV in dog domestication through a comparison of semi-feral village dogs and a global collection of wolves.
Our CNV selection scan uncovered many previously confirmed duplications and deletions but did not identify fixed variants that may have contributed to the initial domestication process. During this selection study, we also uncovered apparent CNVs that reflect errors in the existing canine reference assembly. We attempted to complement the current CanFam3.1 reference with a de novo genome assembly of a Great Dane named Zoey, sequenced with PacBio long reads to 50x coverage with a median insert size of 7.8 kbp. The resulting assembly shows a substantial improvement, with a 20-fold increase in contiguity and a two-thirds reduction in unplaced contigs. The Zoey Great Dane assembly closes 80% of the gaps in CanFam3.1; high GC content was the major cause of these gaps in the original assembly. Using unique k-mers assigned to these closed gaps, QuicK-mer found that many of these regions are fixed across dogs, while a small proportion shows variability.
dc.language.iso: en_US
dc.subject: CNV
dc.subject: kmer
dc.subject: canine evolution
dc.subject: genome assembly
dc.title: Development of Copy Number Variation Detection Algorithms and Their Application to Genome Diversity Studies
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Human Genetics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Kidd, Jeffrey
dc.contributor.committeemember: Shedden, Kerby A
dc.contributor.committeemember: Burke, David T
dc.contributor.committeemember: Kitzman, Jacob
dc.contributor.committeemember: Mueller, Jacob L
dc.contributor.committeemember: Wilson, Thomas E
dc.subject.hlbsecondlevel: Genetics
dc.subject.hlbtoplevel: Health Sciences
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/150064/1/feichens_1.pdf [en]
dc.identifier.orcid: 0000-0001-9689-0375
dc.description.filedescription: Description of feichens_1.pdf: Restricted to UM users only.
dc.identifier.name-orcid: Shen, Feichen; 0000-0001-9689-0375 [en_US]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

