Show simple item record

Outlier Detection for Mixed Model with Application to RNA-Seq Data

dc.contributor.authorLiu, Tzu-Ying
dc.date.accessioned2019-02-07T17:55:09Z
dc.date.availableNO_RESTRICTION
dc.date.available2019-02-07T17:55:09Z
dc.date.issued2018
dc.date.submitted2018
dc.identifier.urihttps://hdl.handle.net/2027.42/147613
dc.description.abstractExtracting messenger RNA (mRNA) molecules using oligo-dT probes targeting on the Poly(A) tail is common in RNA-sequencing (RNA-seq) experiments. This approach, however, is limited when the specimen is profoundly degraded or formalin-fixed such that either the majority of mRNAs have lost their Poly(A) tails or the oligo-dT probes do not anneal with the formalin-altered adenines. For this problem, a new protocol called capture RNA sequencing was developed using probes for target sequences, which gives unbiased estimates of RNA abundance even when the specimens are degraded. However, despite the effectiveness of capture sequencing, mRNA purification by the traditional Poly(A) protocol still underlies most reference libraries. A bridging mechanism that makes the two types of measurements comparable is needed for data integration and efficient use of information. In the first project, we developed an optimization algorithm that was later applied to outlier detection in a linear mixed model for data integration. In particular, we minimized the sum of truncated convex functions, which is often encountered in models with L0 penalty. The solution is exact in one-dimensional and two-dimensional spaces. For higher-dimensional problems, we applied the algorithm in a coordinate descent fashion. Although the global optimality is compromised, this approach generates local solutions with much higher efficiency. In the second project, we investigated the differences between Poly(A) libraries and capture sequencing libraries. We showed that without conversion, directly merging the two types of measurements lead to biases in subsequent analyses. A practical solution was to use a linear mixed model to predict one type of measurements based on the other. The predicted values based on this approach have high correlations, low errors and high efficiency compared with those based on the fixed model. Moreover, the procedure eliminates false positive findings and biases introduced by the technology differences between the two measurements. In the third project, we noted outlying observations and outlying random effects when fitting the mixed model. As they lead to the discovery of dysfunctional probes and batch effects, we developed an algorithm that screened for the outliers and provided a robust estimation. Specifically, we modified the mean-shift model with variable selection using L0 penalties, which was first introduced by Gannaz (2007), McCann and Welsch (2007) and She and Owen (2012). By incorporating the optimization method proposed in the first project, the algorithm became scalable and yielded exact solutions for low-dimensional problems. In particular, under the assumption of normality, there existed analytic expressions for the penalty parameters. In simulation studies, we showed that the proposed algorithm attained reliable outlier detection, delivered robust estimation and achieved efficient computation.
dc.language.isoen_US
dc.subjectOutlier detection
dc.subjectMixed model
dc.titleOutlier Detection for Mixed Model with Application to RNA-Seq Data
dc.typeThesisen_US
dc.description.thesisdegreenamePhDen_US
dc.description.thesisdegreedisciplineBiostatistics
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberJiang, Hui
dc.contributor.committeememberSen, Srijan
dc.contributor.committeememberKalbfleisch, John D
dc.contributor.committeememberSong, Peter Xuekun
dc.contributor.committeememberZhu, Ji
dc.subject.hlbsecondlevelStatistics and Numeric Data
dc.subject.hlbtoplevelScience
dc.description.bitstreamurlhttps://deepblue.lib.umich.edu/bitstream/2027.42/147613/1/ltzuying_1.pdf
dc.identifier.orcid0000-0001-7533-4134
dc.identifier.name-orcidLiu, Tzu-Ying; 0000-0001-7533-4134en_US
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


Files in this item

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.