Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features.

Muthukrishnan, Pradeep

dc.contributor.author	Muthukrishnan, Pradeep	en_US
dc.date.accessioned	2012-01-26T20:07:35Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2012-01-26T20:07:35Z
dc.date.issued	2011	en_US
dc.date.submitted	2011	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/89824
dc.description.abstract	Relational data refers to data that contains explicit relations among objects. Nowadays, relational data are ubiquitous and have broad appeal in many different application domains. Estimating similarity between objects is a core requirement for many standard Machine Learning (ML), Natural Language Processing (NLP), and Information Retrieval (IR) tasks, such as clustering, classification, and word sense disambiguation. Traditional machine learning approaches represent the data using simple, concise representations such as feature vectors. While this works very well for homogeneous data, i.e., data with a single feature type such as text, it does not fully exploit the availability of different feature types. For example, scientific publications have text, citations, authorship information, and venue information, each of which can be used to estimate similarity. Representing such objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis, we propose natural representations for relational data using multiple, connected layers of graphs, one for each feature type. We also propose novel algorithms for estimating similarity using multiple heterogeneous features, and we present novel algorithms for tasks such as topic detection and music recommendation that use the estimated similarity measure. On large, standard data sets, the proposed algorithms outperform baselines and many state-of-the-art algorithms, such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL), and spectral clustering, achieving a root mean squared error of 24.81 on the Yahoo! KDD Music recommendation data set and a classification accuracy of 88% on the ACL Anthology Network data set.	en_US
dc.language.iso	en_US	en_US
dc.subject	Unsupervised Algorithms	en_US
dc.subject	Graph-based Learning	en_US
dc.subject	Similarity Learning	en_US
dc.subject	Machine Learning	en_US
dc.subject	Multiple Heterogeneous Features	en_US
dc.title	Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Computer Science & Engineering	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Radev, Dragomir Radkov	en_US
dc.contributor.committeemember	Abney, Steven P.	en_US
dc.contributor.committeemember	Lee, Honglak	en_US
dc.contributor.committeemember	Mei, Qiaozhu	en_US
dc.contributor.committeemember	Syed, Zeeshan	en_US
dc.subject.hlbsecondlevel	Computer Science	en_US
dc.subject.hlbtoplevel	Engineering	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/89824/1/mpradeep_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name:: mpradeep_1.pdf
Size:: 1.994MB
Format:: PDF
