Grounding Language Learning in Vision for Artificial Intelligence and Brain Research

Zhang, Yizhen

dc.contributor.author	Zhang, Yizhen
dc.date.accessioned	2021-09-24T19:15:17Z
dc.date.available	2023-09-01
dc.date.available	2021-09-24T19:15:17Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/2027.42/169841
dc.description.abstract	Most natural language processing models learn words from text alone. Humans, however, learn language by referring to real-world experience and knowledge. My research aims to ground language learning in visual perception, taking one step closer to making machines learn language as humans do. To this end, I have designed a two-stream model built on deep neural networks. One stream extracts image features; the other extracts language features. The two streams merge to connect image and language features in a joint representation space. Using contrastive learning, I first trained the model to align images with their captions, then refined it to retrieve visual objects from language queries and infer their visual relations. After training, the model’s language stream is a stand-alone system capable of embedding words in a visually grounded semantic space. The principal dimensions of this space can be explained with human intuition and neurobiological knowledge. The visually grounded language model also enables compositional language understanding based on visual knowledge, as well as multimodal image search with queries that combine images and text. The model can also explain human brain activity observed with functional magnetic resonance imaging during natural language comprehension, shedding new light on how the brain stores concepts and organizes them by their semantic relations and attributes.
dc.language.iso	en_US
dc.subject	Machine learning and artificial intelligence
dc.subject	Language learning
dc.subject	Visual grounding
dc.subject	Multimodal learning
dc.subject	Neuroscience
dc.subject	Grounded cognition
dc.title	Grounding Language Learning in Vision for Artificial Intelligence and Brain Research
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Electrical and Computer Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Liu, Zhongming
dc.contributor.committeemember	Brang, David Joseph
dc.contributor.committeemember	Fessler, Jeffrey A
dc.contributor.committeemember	Owens, Andrew
dc.subject.hlbsecondlevel	Electrical Engineering
dc.subject.hlbtoplevel	Engineering
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/169841/1/zhyz_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/2886
dc.identifier.orcid	0000-0002-2836-2666
dc.identifier.name-orcid	Zhang, Yizhen; 0000-0002-2836-2666	en_US
dc.working.doi	10.7302/2886	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)
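
The abstract describes aligning images with their captions by contrastive learning in a joint representation space. This record does not give the loss function, so the following is only a minimal sketch of one common formulation (a symmetric, softmax-based contrastive loss over a batch of paired embeddings), not the thesis's implementation. The function names, the `temperature` value, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(image_vecs, text_vecs):
    # Entry [i][j] is the cosine similarity between image i and caption j.
    imgs = [l2_normalize(v) for v in image_vecs]
    txts = [l2_normalize(v) for v in text_vecs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_diagonal(logits):
    # Mean of -log softmax(row)[k] over rows k: the matching pair sits on the diagonal.
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)

def contrastive_loss(image_vecs, text_vecs, temperature=0.1):
    # Hypothetical temperature; averages image-to-text and text-to-image terms.
    sims = cosine_matrix(image_vecs, text_vecs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy embeddings (made up): each image vector pairs with the caption at the same index.
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[0.9, 0.1], [0.2, 0.8]]
shuffled_texts = [aligned_texts[1], aligned_texts[0]]

loss_aligned = contrastive_loss(aligned_images, aligned_texts)
loss_shuffled = contrastive_loss(aligned_images, shuffled_texts)
print(loss_aligned < loss_shuffled)
```

In this sketch the loss is lower when each image embedding lies closest to its own caption embedding, which is the sense in which training pulls the two streams into a shared space.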

Files in this item

Name: zhyz_1.pdf
Size: 38.75MB
Format: PDF

