Integrating Parsing and Word Alignment in Syntax-Based Machine Translation.

Fossum, Victoria L.


dc.contributor.author	Fossum, Victoria L.	en_US
dc.date.accessioned	2010-08-27T15:05:35Z
dc.date.available	NO_RESTRICTION	en_US
dc.date.available	2010-08-27T15:05:35Z
dc.date.issued	2010	en_US
dc.date.submitted	2010	en_US
dc.identifier.uri	https://hdl.handle.net/2027.42/77685
dc.description.abstract	Training a state-of-the-art syntax-based statistical machine translation (MT) system to translate from a source language into a target language requires a large parallel corpus of example sentences in the source language translated into the target language by a human; a word alignment (word-to-word correspondence between each source-target sentence pair); and a parse tree (syntactic representation) of each sentence in the source language, target language, or both. From these resources, the string-to-tree syntax-based MT system used in this thesis acquires rules governing the process of translating a source string into a target parse tree. After training, these rules are used to translate previously unseen source sentences into the target language. The parallel corpora used to train current state-of-the-art systems are too large for manual annotation; instead, word alignment and parsing must be performed automatically. There are two problems with current approaches to automatic word alignment and parsing. First, both processes introduce errors that propagate through the pipeline. Improving the accuracy of either process can therefore improve translation quality. Second, the two processes are typically performed independently. Since each process produces constraints that can be used to guide the other, we can improve the accuracy of both processes by integrating them more closely. Word alignment and parsing jointly determine the set of translation rules acquired by a system during training, so it is desirable to optimize them both in order to produce the best translation rules possible. In this thesis, we address these two problems as follows. First, we recombine the output of multiple parsers, improving parse and translation quality. Second, we use features of the word alignment to correct parse errors. Third, we use features of the parse trees to correct word alignment errors, improving alignment and translation quality.
Fourth, we integrate word alignment and parsing by producing n-best lists of candidates for each process, and discriminatively reranking (word alignment/parse tree) pairs to optimize the quality of the extracted translation rules. Our results demonstrate that integrating word alignment and parsing improves the accuracy of each process, and in some cases improves translation quality relative to a state-of-the-art syntax-based MT system.	en_US
dc.format.extent	692095 bytes
dc.format.extent	1373 bytes
dc.format.mimetype	application/pdf
dc.format.mimetype	text/plain
dc.language.iso	en_US	en_US
dc.subject	Syntax-based Statistical Machine Translation	en_US
dc.subject	Syntax-based Machine Translation	en_US
dc.subject	MT	en_US
dc.subject	Word Alignment	en_US
dc.subject	Parsing	en_US
dc.title	Integrating Parsing and Word Alignment in Syntax-Based Machine Translation.	en_US
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Computer Science & Engineering	en_US
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies	en_US
dc.contributor.committeemember	Abney, Steven P.	en_US
dc.contributor.committeemember	Knight, Kevin	en_US
dc.contributor.committeemember	Pollack, Martha E.	en_US
dc.contributor.committeemember	Radev, Dragomir Radkov	en_US
dc.subject.hlbsecondlevel	Computer Science	en_US
dc.subject.hlbtoplevel	Engineering	en_US
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/77685/1/vfossum_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: vfossum_1.pdf
Size: 675.8KB
Format: PDF