Enforcing Realism and Temporal Consistency for Large-Scale Video Inpainting

Szeto, Ryan

dc.contributor.author	Szeto, Ryan
dc.date.accessioned	2021-09-24T19:11:50Z
dc.date.available	2021-09-24T19:11:50Z
dc.date.issued	2021
dc.date.submitted	2021
dc.identifier.uri	https://hdl.handle.net/2027.42/169785
dc.description.abstract	Today, people are consuming more videos than ever before. At the same time, video manipulation has rapidly been gaining traction due to the influence of viral videos, as well as the convenience of editing software. Although video manipulation has legitimate entertainment purposes, it can also be incredibly destructive. In order to understand the positive and negative consequences of media manipulation, as well as to maintain the integrity of mass media, it is important to investigate the capabilities of video manipulation techniques. In this dissertation, we focus on the manipulation task of video inpainting, where the goal is to automatically fill in missing parts of a masked video with semantically relevant content. Inpainting results should possess high visual quality with respect to reconstruction performance, realism, and temporal consistency, i.e., they should faithfully recreate missing contents in a way that resembles the real world and exhibits minimal flickering artifacts. Two major challenges have impeded progress toward improving visual quality: semantic ambiguity and diagnostic evaluation. Semantic ambiguity exists for any masked video because several plausible explanations of the events in the observed scene are possible; however, prior methods have struggled with ambiguity due to their limited temporal contexts. As for diagnostic evaluation, prior work has overemphasized aggregate analysis on large datasets and underemphasized fine-grained analysis of modern inpainting failure modes; as a result, the expected behaviors of models under specific scenarios have remained poorly understood. Our work improves on both models and evaluation techniques for video inpainting, thereby providing deeper insight into how an inpainting model's design impacts the visual quality of its outputs. To advance the state of the art in video inpainting, we propose two novel solutions that improve visual quality by expanding the available temporal context.
Our first approach, bi-TAI, intelligently integrates information from multiple frames before and after the desired sequence. It produces more realistic results than prior work, which could only consume limited contextual information. Our second approach, HyperCon, suppresses flickering artifacts from frame-wise processing by identifying and propagating consistencies found in high frame-rate space; we successfully apply it to tasks as disparate as video inpainting and style transfer. Aside from methodological improvements, we also propose two novel evaluation tools to diagnose failure modes of modern video inpainting methods. Our first such contribution is the Moving Symbols dataset, which we use to characterize the sensitivity of a state-of-the-art video prediction model to controllable appearance and motion parameters. Our second contribution is the DEVIL benchmark, which provides a dataset and a comprehensive evaluation scheme to quantify how several semantic properties of the input video and mask affect video inpainting quality. Through models that exploit temporal context, as well as evaluation paradigms that reveal fine-grained failure modes of modern inpainting methods at scale, our contributions enforce better visual quality for video inpainting on a larger scale than prior work. We enable the production of more convincing manipulated videos for data processing and social media needs; we also establish replicable fine-grained analysis techniques to cultivate future progress in the field.
dc.language.iso	en_US
dc.subject	computer vision
dc.subject	video manipulation
dc.subject	generative visual modeling
dc.title	Enforcing Realism and Temporal Consistency for Large-Scale Video Inpainting
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Computer Science & Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Corso, Jason
dc.contributor.committeemember	Lee, Honglak
dc.contributor.committeemember	Owens, Andrew Hale
dc.contributor.committeemember	Johnson, Justin Christopher
dc.subject.hlbsecondlevel	Computer Science
dc.subject.hlbtoplevel	Engineering
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/169785/1/szetor_1.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/2830
dc.identifier.orcid	0000-0002-2966-7138
dc.identifier.name-orcid	Szeto, Ryan; 0000-0002-2966-7138	en_US
dc.working.doi	10.7302/2830	en
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: szetor_1.pdf
Size: 8.050MB
Format: PDF
