Developing and Application of Statistical Algorithms for High-Dimensional Biological Data Analysis
dc.contributor.author | Senbabaoglu, Yasin | en_US |
dc.date.accessioned | 2012-10-12T15:25:40Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2012-10-12T15:25:40Z | |
dc.date.issued | 2012 | en_US |
dc.date.submitted | 2012 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/94038 | |
dc.description.abstract | Various high-throughput technologies have fueled advances in biomedical research in the last decade. Two typical examples are gene expression and genomic hybridization microarrays, which quantify RNA and DNA levels, respectively. High-dimensional data sets generated by these technologies have presented novel opportunities to discover relationships not only among interrogating probes (i.e., genes) but also among interrogated specimens (i.e., samples). At the same time, however, the necessity to model the variability within and between different high-throughput platforms has created novel statistical challenges. In this thesis, I address the opportunities and challenges with three algorithms. First, I present DynBoost, a new method to infer gene-gene dependence relationships and nonlinear dynamics in gene regulatory networks. DynBoost is a flexible boosting algorithm that shares features with L2-boosting and randomization-based algorithms to perform the tasks of parameter learning and network inference. The performance of the proposed algorithm was evaluated on a number of benchmark data sets from the DREAM3 challenge, and the results strongly indicated that it outperformed existing approaches. Second, I revisit consensus clustering (CC) and some other clustering methods in the context of unsupervised sample subtype discovery. I show that many unsupervised partitioning methods are able to divide homogeneous data into pre-specified numbers of clusters, and that CC is able to show apparent stability of such chance partitioning of random data. I conclude that CC is a powerful tool for minimizing false negatives in the presence of genuine structure, but that it can lead to false positives in the exploratory phase of many studies if the implementation and inference are not carried out with caution, in line with prudent practices.
Lastly, I present MPCBS, a new method that integrates DNA copy number analysis across different platforms by pooling statistical evidence during segmentation. By comparing the integrated analysis of Affymetrix and Illumina SNP array data with Agilent and fosmid clone end-sequencing results on 8 HapMap samples, I show that MPCBS achieves improved spatial resolution and detection power and provides a natural consensus across platforms. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Consensus Clustering | en_US |
dc.subject | Unsupervised Class Discovery | en_US |
dc.subject | Reverse-engineering Gene Regulatory Networks | en_US |
dc.subject | DNA Copy Number Estimation | en_US |
dc.subject | Operator-valued Kernels | en_US |
dc.subject | TCGA Glioblastoma Multiforme | en_US |
dc.title | Developing and Application of Statistical Algorithms for High-Dimensional Biological Data Analysis | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Bioinformatics | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Li, Jun | en_US |
dc.contributor.committeemember | Michailidis, George | en_US |
dc.contributor.committeemember | Burns Jr., Daniel M. | en_US |
dc.contributor.committeemember | Sartor, Maureen A. | en_US |
dc.contributor.committeemember | D'alche-Buc, Florence | en_US |
dc.subject.hlbsecondlevel | Computer Science | en_US |
dc.subject.hlbsecondlevel | Molecular, Cellular and Developmental Biology | en_US |
dc.subject.hlbsecondlevel | Science (General) | en_US |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | en_US |
dc.subject.hlbtoplevel | Engineering | en_US |
dc.subject.hlbtoplevel | Health Sciences | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/94038/1/yasinsen_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |