
Learning Single-Image 3D from the Internet

dc.contributor.author: Chen, Weifeng
dc.date.accessioned: 2020-10-04T23:37:20Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2020-10-04T23:37:20Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.identifier.uri: https://hdl.handle.net/2027.42/163248
dc.description.abstract: Single-image 3D refers to the task of recovering 3D properties such as depth and surface normals from an RGB image. It is one of the fundamental problems in computer vision, and progress on it has the potential to advance many other areas of vision. Although significant progress has been made in this field, the current best systems still struggle to perform well on arbitrary images "in the wild", i.e. images that depict all kinds of content and scenes. One major obstacle is the lack of diverse training data. This dissertation addresses the data issue by extracting 3D supervision from the Internet and by proposing novel algorithms that learn from this Internet 3D data, significantly advancing single-view 3D perception.

First, we have constructed "Depth in the Wild" (DIW), a depth dataset consisting of 0.5 million diverse images, each manually annotated with randomly sampled points and their relative depth. Benchmarking state-of-the-art single-view 3D systems on DIW, we found that even though current methods perform well on existing datasets, they perform poorly on images in the wild. We then propose a novel algorithm that learns to estimate depth from annotations of relative depth. Compared to the state of the art, our algorithm is simpler and performs better. Experiments show that our algorithm, combined with existing RGB-D data and our new relative depth annotations, significantly improves single-image depth perception in the wild.

Second, we have constructed "Surface Normals in the Wild" (SNOW), a dataset of 60K Internet images, each manually annotated with the surface normal at one randomly sampled point. We explore advancing depth perception in the wild using surface normals as supervision. To train networks with surface normal annotations, we propose two novel losses: one that emphasizes depth accuracy and another that emphasizes surface normal accuracy. Experiments show that our approach significantly improves the quality of depth estimation in the wild.

Third, we have constructed "Open Annotations of Single-Image Surfaces" (OASIS), a large-scale dataset for single-image 3D in the wild. It consists of pixel-wise reconstructions of 3D surfaces for 140K randomly sampled Internet images. Six types of 3D properties are manually annotated for each image: occlusion boundaries (depth discontinuities), fold boundaries (normal discontinuities), surface normals, relative depth, relative normals (orthogonal, parallel, or neither), and planarity (planar or not). The rich annotations of human 3D perception in OASIS open up new research opportunities across a spectrum of single-image 3D tasks: they provide in-the-wild ground truth either for the first time or at a much larger scale than prior work. Benchmarking leading deep learning models on a variety of these tasks, we observe substantial room for improvement, pointing to ample opportunities for new learning algorithms for single-image 3D.

Finally, we have constructed "YouTube3D", a large-scale dataset with relative depth annotations for 795K images spanning 121K videos. YouTube3D is collected fully automatically by a pipeline based on Structure-from-Motion (SfM); its key component is a novel Quality Assessment Network that identifies high-quality SfM reconstructions and discards erroneous ones to ensure data quality. Experiments demonstrate that YouTube3D is useful for advancing single-view depth estimation in the wild.
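The relative-depth annotations in DIW and YouTube3D supervise a dense depth network through ordinal constraints between pairs of points. As a concrete illustration, here is a minimal PyTorch sketch of a pairwise ranking loss for such annotations; the function name relative_depth_ranking_loss, the tensor layout, and the +1/-1/0 label encoding are illustrative assumptions, not the dissertation's exact implementation.

import torch
import torch.nn.functional as F

def relative_depth_ranking_loss(pred_depth, rows_a, cols_a, rows_b, cols_b, ordinal):
    # pred_depth: (N, H, W) predicted depth maps for a batch of N images.
    # rows_a/cols_a and rows_b/cols_b: (N,) long tensors giving the pixel
    #   coordinates of the two annotated points in each image.
    # ordinal: (N,) labels, +1 if point A is farther than point B,
    #   -1 if point A is closer, 0 if the two points are judged equally far.
    idx = torch.arange(pred_depth.shape[0])
    z_a = pred_depth[idx, rows_a, cols_a]  # predicted depth at point A, shape (N,)
    z_b = pred_depth[idx, rows_b, cols_b]  # predicted depth at point B, shape (N,)
    diff = z_a - z_b

    unequal = ordinal != 0
    # Ordered pairs: a logistic ranking term, log(1 + exp(-label * diff)),
    # computed stably via softplus, pushing the predicted ordering to agree
    # with the human label.
    loss_unequal = F.softplus(-ordinal[unequal].float() * diff[unequal])
    # "Equal" pairs: penalize large predicted depth differences.
    loss_equal = diff[~unequal] ** 2

    return torch.cat([loss_unequal, loss_equal]).mean()

# Hypothetical usage: pred = model(images) with shape (N, H, W), then
# loss = relative_depth_ranking_loss(pred, ra, ca, rb, cb, labels); loss.backward()

The loss constrains only the ordering (or approximate equality) of the predicted depths at annotated points, which is what makes sparse, crowd-sourced or SfM-derived relative-depth labels usable for training dense depth predictors.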
dc.language.iso: en_US
dc.subject: 3D Reconstruction
dc.title: Learning Single-Image 3D from the Internet
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Deng, Jia
dc.contributor.committeemember: Fouhey, David Ford
dc.contributor.committeemember: Mei, Qiaozhu
dc.contributor.committeemember: Johnson, Justin Christopher
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/163248/1/wfchen_1.pdf
dc.identifier.orcid: 0000-0003-3352-0064
dc.identifier.name-orcid: Chen, Weifeng; 0000-0003-3352-0064
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)


