Learning Deep Controllable and Structured Representations for Image Synthesis, Structure Prediction and Beyond
dc.contributor.author | Yan, Xinchen | |
dc.date.accessioned | 2020-01-27T16:22:21Z | |
dc.date.available | NO_RESTRICTION | |
dc.date.available | 2020-01-27T16:22:21Z | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/153334 | |
dc.description.abstract | Generative modeling is a frontier research topic in machine learning and AI. Despite recent success in image synthesis, developing a general form of deep generative models on complex data (e.g., multi-modal and structured data) that is also directly applicable to real-world tasks remains a challenging open problem. The major challenges include the high-dimensional representation space, the many entangled factors of variation, and the lack of existing deep modules for this data generation process. In this thesis, I introduce the concept of controllable representations, a set of factors serving as intermediate representations, for deep generative modeling. In the context of attribute-to-image synthesis, we consider controllable units to be (1) semantic factors described by the visual attributes and (2) other related factors not included in the input attributes (e.g., pose and background color). To facilitate efficient learning of such controllable units for attribute-to-image synthesis, I explore novel deep structured modules that can be trained in an auto-encoding style to synthesize images from controllable units. In addition, I demonstrate the representation power of this design in conditional generation (e.g., controlling a partial set of units while keeping the rest unchanged) as well as in other related applications, including image completion via analysis-by-synthesis optimization. In the rest of the thesis, I investigate and propose several variations for learning controllable and structured representations in related problems, including (1) image manipulation with semantic structures (e.g., object bounding boxes), (2) human motion prediction with transformation-based representations, and (3) single-view 3D shape prediction with geometry-aware modules.
The case studies in the thesis demonstrate not only the representation power but also a common property: the representations can be learned in an unsupervised or weakly-supervised manner. In the end, I discuss several future directions in learning deep controllable and structured modules for other multi-modal and structured data, as well as applications to adversarial learning. | |
dc.language.iso | en_US | |
dc.subject | Deep Representation Learning | |
dc.subject | Conditional Generative Models | |
dc.subject | Image Synthesis | |
dc.subject | Structure Prediction | |
dc.title | Learning Deep Controllable and Structured Representations for Image Synthesis, Structure Prediction and Beyond | |
dc.type | Thesis | |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Computer Science & Engineering | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Lee, Honglak | |
dc.contributor.committeemember | Corso, Jason | |
dc.contributor.committeemember | Deng, Jia | |
dc.contributor.committeemember | Kuipers, Benjamin | |
dc.subject.hlbsecondlevel | Computer Science | |
dc.subject.hlbtoplevel | Engineering | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/153334/1/xcyan_1.pdf | |
dc.identifier.orcid | 0000-0003-1019-5537 | |
dc.identifier.name-orcid | Yan, Xinchen; 0000-0003-1019-5537 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |