Doc2Vec on similar document suggestion for pharmaceutical collections

dc.contributor.author: Zhu, Hongting
dc.contributor.author: Pothukuchi, Ashwin
dc.contributor.author: Guo, Joel
dc.date.accessioned: 2021-04-29T19:13:43Z
dc.date.available: 2021-04-29T19:13:43Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/2027.42/167258
dc.identifier.uri: https://youtu.be/Q5ARFlxFZNI
dc.description.abstract: ProQuest Dialog is a powerful search engine for pharmaceutical and biomedical papers, but its document retrieval algorithm is becoming outdated. In this paper, we present a way to improve the similar-document suggestions in the Dialog interface. The NLP model Doc2Vec PV-DBOW embeds documents so that similar ones cluster together, and both evaluation methods score it higher than the baseline TF-IDF method: textual coherence is 36.6% higher on bigram count vectors and 8.3% higher on trigram count vectors, and grant-to-article linkage is 6.1% higher on the Herfindahl-Hirschman index.
dc.subject: Machine Learning
dc.subject: Natural Language Processing
dc.subject: Clinical NLP
dc.title: Doc2Vec on similar document suggestion for pharmaceutical collections
dc.type: Technical Report
dc.subject.hlbtoplevel: Engineering
dc.contributor.affiliationum: Electrical Engineering and Computer Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/167258/1/Capstone_Final_Report_Hongting_Zhu.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/167258/2/Capstone_Presentation_Hongting_Zhu.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/933
dc.working.doi: 10.7302/933
dc.owningcollname: Honors Program, The College of Engineering
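
The abstract names the Herfindahl-Hirschman index as the measure for grant-to-article linkage but does not say how it was applied. The following is a minimal sketch of the general index only (the sum of squared shares), not the report's evaluation code. The function name, the grant labels, and the idea of scoring the grants attached to one set of suggested articles are all assumptions made for illustration.

```python
from collections import Counter


def herfindahl_hirschman_index(labels):
    """Sum of squared shares of each distinct label.

    Ranges from 1/k (k labels, evenly spread) up to 1.0 (a single label).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical data: the grant linked to each of ten suggested articles.
concentrated = ["grant_A"] * 8 + ["grant_B"] * 2
dispersed = ["grant_A", "grant_B", "grant_C", "grant_D", "grant_E"] * 2

print(round(herfindahl_hirschman_index(concentrated), 2))
print(round(herfindahl_hirschman_index(dispersed), 2))
```

In this toy setup, suggestions that mostly share one grant score closer to 1, while suggestions spread evenly over many grants score closer to 1/k.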

