Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features.
dc.contributor.author | Muthukrishnan, Pradeep | en_US |
dc.date.accessioned | 2012-01-26T20:07:35Z | |
dc.date.available | NO_RESTRICTION | en_US |
dc.date.available | 2012-01-26T20:07:35Z | |
dc.date.issued | 2011 | en_US |
dc.date.submitted | 2011 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/89824 | |
dc.description.abstract | Relational data refers to data that contains explicit relations among objects. Nowadays, relational data are universal and have a broad appeal in many different application domains. The problem of estimating similarity between objects is a core requirement for many standard Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval (IR) problems such as clustering, classification, word sense disambiguation, etc. Traditional machine learning approaches represent the data using simple, concise representations such as feature vectors. While this works very well for homogeneous data, i.e., data with a single feature type such as text, it does not fully exploit the availability of different feature types. For example, scientific publications have text, citations, authorship information, and venue information. Each of these features can be used for estimating similarity. Representing such objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis, we propose natural representations for relational data using multiple, connected layers of graphs, one for each feature type. We also propose novel algorithms for estimating similarity using multiple heterogeneous features, and we present novel algorithms for tasks like topic detection and music recommendation using the estimated similarity measure. We demonstrate superior performance of the proposed algorithms (root mean squared error of 24.81 on the Yahoo! KDD Music recommendation data set and classification accuracy of 88% on the ACL Anthology Network data set) over many state-of-the-art algorithms, such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL) and spectral clustering, as well as over baselines, on large, standard data sets. | en_US |
dc.language.iso | en_US | en_US |
dc.subject | Unsupervised Algorithms | en_US |
dc.subject | Graph-based Learning | en_US |
dc.subject | Similarity Learning | en_US |
dc.subject | Machine Learning | en_US |
dc.subject | Multiple Heterogeneous Features | en_US |
dc.title | Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features. | en_US |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | en_US |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | en_US |
dc.contributor.committeemember | Radev, Dragomir Radkov | en_US |
dc.contributor.committeemember | Abney, Steven P. | en_US |
dc.contributor.committeemember | Lee, Honglak | en_US |
dc.contributor.committeemember | Mei, Qiaozhu | en_US |
dc.contributor.committeemember | Syed, Zeeshan | en_US |
dc.subject.hlbsecondlevel | Computer Science | en_US |
dc.subject.hlbtoplevel | Engineering | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/89824/1/mpradeep_1.pdf | |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |