Show simple item record

Neural Language Models for Data-Driven Programming Support

dc.contributor.authorRong, Xin
dc.date.accessioned2017-10-05T20:26:39Z
dc.date.availableNO_RESTRICTION
dc.date.available2017-10-05T20:26:39Z
dc.date.issued2017
dc.date.submitted
dc.identifier.urihttps://hdl.handle.net/2027.42/138509
dc.description.abstractProgramming can be hard to learn and master. Search engines and social Q&A websites offer tremendous help to programmers, but great expertise (e.g., “Google-fu”) is required to efficiently use these resources and successfully solve complex problems. An integrated system that can recognize a programmer’s tasks and provide contextualized solutions is thus desirable, and ideally programmers can interact with the system using natural input channels, in a way similar to how they communicate with a human expert. To enable such an integrated system, neural language models constitute a promising solution. These models encode programming language in the same high-dimensional space with data of other modalities, and can be trained in an end-to-end fashion. By leveraging the massive data about programming knowledge that are available online, including social Q&A websites, tutorials, blogs, and open-source code repositories, we can train neural language models to support a variety of user intentions, including the long-tail ones. We propose three studies related to using neural language models to solve programming problems in practice. First, we introduce CodeMend, an intelligent programming assistant that supports interactive programming. The system employs a bimodal embedding model to encode programming language and natural language in the same vector space. We demonstrate that this model can effectively understand the code context and associate it with user input to suggest relevant code modifications. We also develop novel user interface to render search results in a way that makes the problem solving process more efficient. Second, we propose a deep learning pipeline that converts data visualization images to source code. The pipeline is built by using computer vision techniques and recurrent neural networks, and it supports the user to get source code generated based on visual examples. We develop novel techniques that augment existing a limited set of training samples via code parameterization and random variation. We also propose strategies that can adapt the general-purpose neural language model to fit the task of predicting source code. Third, we introduce LAMVI, a set of visualization tools for diagnosing issues with neural language models. It tracks the ranks of individual candidate outputs for user-selected queries, and supports the exploration of the corresponding hidden-layer activations. It also tracks influential training instances, and provides guidance for taking actions for tuning the model. The system is evaluated on simulated datasets facilitates the user to efficiently adapt mature neural language models to new datasets or new tasks. Collectively, these three components form an integral solution to computer-assisted problem solving for programmers driven by big data, and may have impact on various different domains, including natural language processing, machine learning, software engineering, and interactive data visualization.
dc.language.isoen_US
dc.subjectneural language models
dc.titleNeural Language Models for Data-Driven Programming Support
dc.typeThesisen_US
dc.description.thesisdegreenamePhDen_US
dc.description.thesisdegreedisciplineInformation
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberAdar, Eytan
dc.contributor.committeememberNarayanasamy, Satish
dc.contributor.committeememberLasecki, Walter
dc.contributor.committeememberOney, Steve
dc.contributor.committeememberRadev, Dragomir Radkov
dc.subject.hlbsecondlevelInformation and Library Science
dc.subject.hlbtoplevelSocial Sciences
dc.description.bitstreamurlhttps://deepblue.lib.umich.edu/bitstream/2027.42/138509/1/ronxin_1.pdf
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


Files in this item

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.