Semantic Feature Extraction Using Multi-Sense Embeddings and Lexical Chains

dc.contributor.author: Ruas, Terry L.
dc.contributor.advisor: Grosky, William
dc.date.accessioned: 2019-06-26T14:13:22Z
dc.date.available: NO_RESTRICTION [en_US]
dc.date.available: 2019-06-26T14:13:22Z
dc.date.issued: 2019-08-23
dc.date.submitted: 2019-06-13
dc.identifier.uri: https://hdl.handle.net/2027.42/149647
dc.description.abstract: The relationships between words in a sentence often tell us more about the underlying semantic content of a document than the individual words do. In the last few years, natural language understanding has seen increasing effort in developing techniques that produce non-trivial features, especially after robust word embedding models became prominent by proving able to capture and represent semantic relationships from massive amounts of data. These new dense vector representations do raise the baseline in natural language processing, but they still fall short in dealing with intrinsic linguistic issues such as polysemy and homonymy. Systems that use natural language at their core can be affected by a weak semantic representation of human language, resulting in inaccurate outcomes based on poor decisions. In this context, word sense disambiguation and lexical chains have been explored as alternatives to alleviate several problems in linguistics, such as semantic representation, definitions, differentiation, polysemy, and homonymy. However, little effort has gone into combining recent advances in token embeddings (e.g., words, documents) with word sense disambiguation and lexical chains. To help bridge these areas, this work proposes, as its main contributions, a collection of algorithms to extract semantic features from large corpora, named MSSA, MSSA-D, MSSA-NR, FLLC II, and FXLC II. The MSSA techniques focus on disambiguating and annotating each word by its specific sense, considering the semantic effects of its context. The lexical chain techniques derive the semantic relations between consecutive words in a document in dynamic and pre-defined ways.
These original techniques aim to uncover the implicit semantic links between words using their lexical structure, incorporating multi-sense embeddings, word sense disambiguation, lexical chains, and lexical databases. A few natural language problems are selected to validate the contributions of this work, in which our techniques outperform state-of-the-art systems. All the proposed algorithms can be used separately as independent components or combined in a single system to improve the semantic representation of words, sentences, and documents. Additionally, they can work recurrently, further refining their results. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Synsets [en_US]
dc.subject: WordNet [en_US]
dc.subject: MSSA [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Semantics [en_US]
dc.subject: Lexical chains [en_US]
dc.subject.other: Computer and Information Science [en_US]
dc.title: Semantic Feature Extraction Using Multi-Sense Embeddings and Lexical Chains [en_US]
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: College of Engineering & Computer Science [en_US]
dc.description.thesisdegreegrantor: University of Michigan-Dearborn [en_US]
dc.contributor.committeemember: Abouelenien, Mohamed
dc.contributor.committeemember: Agrawal, Rajeev
dc.contributor.committeemember: Kessentini, Marouane
dc.contributor.committeemember: Ortiz, Luis
dc.contributor.committeemember: Zakarian, Armen
dc.identifier.uniqname: 7512 2669 [en_US]
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/149647/1/Terry Ruas Final Dissertation.pdf
dc.identifier.orcid: 0000-0002-9440-780X [en_US]
dc.description.filedescription: Description of Terry Ruas Final Dissertation.pdf : Dissertation
dc.identifier.name-orcid: Ruas, Terry; 0000-0002-9440-780X [en_US]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

