
Detection of Rare Events in Complex Sequencing Data

dc.contributor.author: Wang, Yifan
dc.date.accessioned: 2019-10-01T18:23:56Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2019-10-01T18:23:56Z
dc.date.issued: 2019
dc.date.submitted: 2019
dc.identifier.uri: https://hdl.handle.net/2027.42/151446
dc.description.abstract: The dramatic development of sequencing technologies over the past several decades has, for the first time, made it possible to study low-frequency events in human cells. These rare events can occur in the genome and the transcriptome. They include genetic variation within small populations of somatic cells as well as single-molecule-level RNA fusion events. Somatic variants are mutations that arise during cell division and are therefore present in only a portion of the cells within an individual or tissue. Somatic variation is an established causal feature of various types of cancer. However, somatic mutations in normal human cells and tissues remain poorly studied. We therefore developed our own analysis pipeline to discover somatic single-nucleotide variants (SNVs) and applied it to postmortem human brain tissue. We then performed amplicon-based validation to assess and validate the somatic SNVs identified by our pipeline. Drawing on our experience with somatic SNV detection in non-tumor tissues, we developed a best-practice guide to help other researchers. We conclude that identifying low-frequency somatic SNVs from bulk sequencing data remains difficult. Nevertheless, our approach identified a conservative but accurate set of somatic SNVs for future studies. We next shifted our focus to rare transcriptomic events. We sought to identify single-molecule-level RNA fusion events between U6, a critical component of the spliceosome, and other RNAs in five human cell lines. We developed a novel pipeline that targets these specific fusion events at the RNA level and distinguishes them from chimeras integrated into the genome. Using this pipeline, we identified 31 individual U6/L1 fusion events with strong support as RNA fusion candidates. Together with biochemical and genetic experiments, these results support a plausible mechanism for the formation of U6/L1 pseudogenes in the human genome.
Single-cell RNA sequencing further increases the sensitivity for detecting rare events at the RNA level. However, methods for isoform quantification from single-cell sequencing data remain underdeveloped. We therefore developed Seekmer to quantify RNA isoforms more accurately and more quickly from both bulk and single-cell RNA sequencing data. In both performance and run time, this approach fills the gap between alignment-based and alignment-free methods. Seekmer's imputation module borrows information from other single cells with similar expression profiles. Using this module, we significantly improved isoform quantification on both simulated data and spike-in data. Current sequencing technologies produce artifacts that computational methods cannot fully exclude. Nevertheless, we have demonstrated that current methods can be used to study the characteristics and possible functions of rare events in the human genome and transcriptome. This requires careful filtering and the incorporation of additional information from other methods or other cells.
dc.language.iso: en_US
dc.subject: somatic and low frequency event
dc.subject: genome and transcriptome
dc.subject: methods
dc.title: Detection of Rare Events in Complex Sequencing Data
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Human Genetics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Mills, Ryan Edward
dc.contributor.committeemember: Sartor, Maureen
dc.contributor.committeemember: Hammoud, Saher Sue
dc.contributor.committeemember: Li, Jun
dc.contributor.committeemember: Moran, John V
dc.subject.hlbsecondlevel: Genetics
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/151446/1/yifwang_1.pdf
dc.identifier.orcid: 0000-0001-8056-9755
dc.identifier.name-orcid: Wang, Yifan; 0000-0001-8056-9755 [en_US]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
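The abstract describes Seekmer's imputation module only at a high level: it borrows information from other single cells with similar expression profiles. The sketch below illustrates that general idea with a simple nearest-neighbour smoothing step. It is not Seekmer's actual algorithm. The function name `knn_impute`, the neighbour count `k`, the log1p transform and the Euclidean distance are all assumptions chosen for illustration.

```python
import numpy as np


def knn_impute(expr: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical neighbour-based smoothing of a cells x isoforms matrix.

    Each cell's profile is replaced by the mean of itself and its k most
    similar cells. Similarity is (by assumption) Euclidean distance on
    log1p-transformed abundances.
    """
    expr = np.asarray(expr, dtype=float)
    n_cells = expr.shape[0]
    if not 0 < k < n_cells:
        raise ValueError("k must be between 1 and n_cells - 1")
    logx = np.log1p(expr)
    # Pairwise squared Euclidean distances between all cells (n x n).
    diff = logx[:, None, :] - logx[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of k closest cells
    # Average the cell with its neighbours: expr[neighbours] is (n, k, isoforms).
    return (expr + expr[neighbours].sum(axis=1)) / (k + 1)


if __name__ == "__main__":
    cells = np.array([[10.0, 0.0], [12.0, 0.0], [0.0, 9.0], [0.0, 11.0]])
    print(knn_impute(cells, k=1))
```

In the toy example, the first two cells express only the first isoform and the last two express only the second. With `k=1`, each cell is averaged with its closest partner. Sparse per-cell estimates are pulled toward those of similar cells rather than toward a global mean.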


