Enforcing Realism and Temporal Consistency for Large-Scale Video Inpainting

dc.contributor.author: Szeto, Ryan
dc.date.accessioned: 2021-09-24T19:11:50Z
dc.date.available: 2021-09-24T19:11:50Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.uri: https://hdl.handle.net/2027.42/169785
dc.description.abstract: Today, people are consuming more videos than ever before. At the same time, video manipulation has rapidly gained traction due to the influence of viral videos and the convenience of editing software. Although video manipulation has legitimate entertainment purposes, it can also be incredibly destructive. To understand the positive and negative consequences of media manipulation, as well as to maintain the integrity of mass media, it is important to investigate the capabilities of video manipulation techniques. In this dissertation, we focus on the manipulation task of video inpainting, where the goal is to automatically fill in missing parts of a masked video with semantically relevant content. Inpainting results should possess high visual quality with respect to reconstruction performance, realism, and temporal consistency; that is, they should faithfully recreate missing content in a way that resembles the real world and exhibits minimal flickering artifacts. Two major challenges have impeded progress toward improving visual quality: semantic ambiguity and diagnostic evaluation. Semantic ambiguity exists for any masked video because several plausible explanations can account for the events in the observed scene; prior methods have struggled with this ambiguity due to their limited temporal contexts. As for diagnostic evaluation, prior work has overemphasized aggregate analysis on large datasets and underemphasized fine-grained analysis of modern inpainting failure modes; as a result, the expected behaviors of models under specific scenarios have remained poorly understood. Our work improves both models and evaluation techniques for video inpainting, thereby providing deeper insight into how an inpainting model's design impacts the visual quality of its outputs. To advance the state of the art in video inpainting, we propose two novel solutions that improve visual quality by expanding the available temporal context. Our first approach, bi-TAI, intelligently integrates information from multiple frames before and after the desired sequence; it produces more realistic results than prior work, which could only consume limited contextual information. Our second approach, HyperCon, suppresses flickering artifacts from frame-wise processing by identifying and propagating consistencies found in high frame-rate space; we successfully apply it to tasks as disparate as video inpainting and style transfer. Aside from methodological improvements, we also propose two novel evaluation tools to diagnose failure modes of modern video inpainting methods. Our first such contribution is the Moving Symbols dataset, which we use to characterize the sensitivity of a state-of-the-art video prediction model to controllable appearance and motion parameters. Our second contribution is the DEVIL benchmark, which provides a dataset and a comprehensive evaluation scheme to quantify how several semantic properties of the input video and mask affect video inpainting quality. Through models that exploit temporal context, as well as evaluation paradigms that reveal fine-grained failure modes of modern inpainting methods at scale, our contributions enforce better visual quality for video inpainting on a larger scale than prior work. We enable the production of more convincing manipulated videos for data processing and social media needs; we also establish replicable fine-grained analysis techniques to cultivate future progress in the field.
dc.language.iso: en_US
dc.subject: computer vision
dc.subject: video manipulation
dc.subject: generative visual modeling
dc.title: Enforcing Realism and Temporal Consistency for Large-Scale Video Inpainting
dc.type: Thesis
dc.description.thesisdegreename: PhD [en_US]
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Corso, Jason
dc.contributor.committeemember: Lee, Honglak
dc.contributor.committeemember: Owens, Andrew Hale
dc.contributor.committeemember: Johnson, Justin Christopher
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/169785/1/szetor_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/2830
dc.identifier.orcid: 0000-0002-2966-7138
dc.identifier.name-orcid: Szeto, Ryan; 0000-0002-2966-7138 [en_US]
dc.working.doi: 10.7302/2830 [en]
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
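
The abstract above defines video inpainting as filling the masked regions of a video with semantically plausible content while avoiding flickering artifacts. As a minimal illustrative sketch, and not the dissertation's method, the following Python shows the naive frame-wise baseline that temporal-context approaches such as bi-TAI and HyperCon improve upon. The function name and inputs are hypothetical; the sketch assumes OpenCV's classical cv2.inpaint routine is available.

    # Minimal sketch of a frame-wise video inpainting baseline, for
    # illustration only; this is not the method proposed in the dissertation.
    import cv2

    def inpaint_video_framewise(frames, masks, radius=3):
        """Fill the masked region of each frame independently.

        frames: list of HxWx3 uint8 BGR images.
        masks:  list of HxW uint8 masks; nonzero pixels mark missing content.
        Returns the list of inpainted frames.
        """
        outputs = []
        for frame, mask in zip(frames, masks):
            # Classical diffusion-based fill (Telea's method). No information
            # is shared across frames, so the filled content can change
            # abruptly between frames, producing exactly the flickering
            # artifacts that temporal-context methods aim to suppress.
            outputs.append(cv2.inpaint(frame, mask, radius, cv2.INPAINT_TELEA))
        return outputs

Because each frame is filled in isolation, any disagreement between consecutive fills appears as flicker; expanding the available temporal context, as the dissertation's contributions do, targets precisely this failure mode.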


