Developing and Application of Statistical Algorithms for High-Dimensional Biological Data Analysis

Senbabaoglu, Yasin

dc.contributor.author	Senbabaoglu, Yasin	en_US
dc.date.accessioned	2012-10-12T15:25:40Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2012-10-12T15:25:40Z
dc.date.issued	2012	en_US
dc.date.submitted	2012	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/94038
dc.description.abstract	Various high-throughput technologies have fueled advances in biomedical research in the last decade. Two typical examples are gene expression and genomic hybridization microarrays, which quantify RNA and DNA levels, respectively. High-dimensional data sets generated by these technologies have presented novel opportunities to discover relationships not only among interrogating probes (i.e., genes) but also among interrogated specimens (i.e., samples). At the same time, however, the necessity to model the variability within and between different high-throughput platforms has created novel statistical challenges. In this thesis, I address these opportunities and challenges with three algorithms. First, I present DynBoost, a new method to infer gene-gene dependence relationships and nonlinear dynamics in gene regulatory networks. DynBoost is a flexible boosting algorithm that shares features with L2-boosting and randomization-based algorithms to perform the tasks of parameter learning and network inference. The performance of the proposed algorithm was evaluated on a number of benchmark data sets from the DREAM3 challenge, and the results strongly indicated that it outperformed existing approaches. Second, I revisit consensus clustering (CC) and some other clustering methods in the context of unsupervised sample subtype discovery. I show that many unsupervised partitioning methods are able to divide homogeneous data into pre-specified numbers of clusters, and that CC is able to show apparent stability of such chance partitioning of random data. I conclude that CC is a powerful tool for minimizing false negatives in the presence of genuine structure, but that it can lead to false positives in the exploratory phase of many studies if implementation and inference are not carried out cautiously, in line with prudent practices.
Lastly, I present MPCBS, a new method that integrates DNA copy number analysis across different platforms by pooling statistical evidence during segmentation. By comparing the integrated analysis of Affymetrix and Illumina SNP array data with Agilent and fosmid clone end-sequencing results on 8 HapMap samples, I show that MPCBS achieves improved spatial resolution and detection power, and provides a natural consensus across platforms.	en_US
dc.language.iso	en_US	en_US
dc.subject	Consensus Clustering	en_US
dc.subject	Unsupervised Class Discovery	en_US
dc.subject	Reverse-engineering Gene Regulatory Networks	en_US
dc.subject	DNA Copy Number Estimation	en_US
dc.subject	Operator-valued Kernels	en_US
dc.subject	TCGA Glioblastoma Multiforme	en_US
dc.title	Developing and Application of Statistical Algorithms for High-Dimensional Biological Data Analysis	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Bioinformatics	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Li, Jun	en_US
dc.contributor.committeemember	Michailidis, George	en_US
dc.contributor.committeemember	Burns Jr., Daniel M.	en_US
dc.contributor.committeemember	Sartor, Maureen A.	en_US
dc.contributor.committeemember	D'alche-Buc, Florence	en_US
dc.subject.hlbsecondlevel	Computer Science	en_US
dc.subject.hlbsecondlevel	Molecular, Cellular and Developmental Biology	en_US
dc.subject.hlbsecondlevel	Science (General)	en_US
dc.subject.hlbsecondlevel	Statistics and Numeric Data	en_US
dc.subject.hlbtoplevel	Engineering	en_US
dc.subject.hlbtoplevel	Health Sciences	en_US
dc.subject.hlbtoplevel	Science	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/94038/1/yasinsen_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: yasinsen_1.pdf
Size: 17.38MB
Format: PDF
