Affordance-grounded Robot Perception and Manipulation in Adversarial, Translucent, and Cluttered Environments
Chen, Xiaotong
2023
Abstract
Robots of the future will need to work in natural scenarios and complete a variety of tasks without much human supervision. To achieve this goal, we want robots to perceive and act robustly and adaptively in unstructured environments. For example, robots are expected to perceive objects correctly in unseen conditions such as dark environments, heavy clutter, or transparent materials. In addition, they should learn skills that transfer across novel objects within a category rather than being fixed to known instances. In this dissertation, we focus on the problem of perceiving and manipulating various objects in complex adversarial environments. Specifically, we explore three aspects: robustness to adversarial environments, synergistic perception and action, and adaptable data-driven perception pipelines for customized settings.

First, we explore how to make object pose estimation algorithms robust against environmental changes such as object occlusions and lighting changes. We contribute a two-stage approach, GRIP, that combines the discriminative power of deep convolutional neural networks (CNNs) with the robustness of probabilistic generative inference. Our results indicate that GRIP achieves better accuracy than end-to-end pose estimation baselines and is effective in a grocery packing task in a dark scene.

Second, we focus on how to generalize object representations to the category level with grounded affordances for task execution. We propose the Affordance Coordinate Frame (ACF) representation, which enables a direct connection between perception and executable action. Along with it, we contribute a category-level scene perception pipeline for object parts that estimates ACFs of novel objects in cluttered environments. Our pipeline outperforms state-of-the-art methods for object detection, as well as for category-level pose estimation of object parts. We further demonstrate the applicability of ACF to robot manipulation tasks such as grasping, pouring, and stirring.

Third, we contribute an annotation pipeline that enables large-scale dataset creation and benchmarking on transparent objects. The proposed ProgressLabeller pipeline provides a multi-view annotation interface that allows fast and accurate pose annotation on RGB-D video streams. ProgressLabeller is shown to generate more accurate annotations in object pose estimation and grasping experiments. Using ProgressLabeller, we contribute ClearPose, the first large-scale RGB-D transparent object dataset with various adversarial conditions such as lighting changes and object clutter. ClearPose supports benchmarking of data-driven approaches to depth completion, object pose estimation, and robotic manipulation. We then build TransNet, an object pose estimation based manipulation framework for everyday transparent objects. The system aims to generalize pose estimation to unseen novel objects within several categories, such as wine cups and bottles. We finally demonstrate the efficacy of the system in robotic pick-and-place and pouring tasks, paving the way for more complex manipulations such as table setting and drink serving.
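To make the ACF idea above more concrete, the following is a minimal sketch, assuming an affordance coordinate frame can be summarized by a part keypoint and an action axis, of how such a frame could parameterize an executable grasp. The class name, fields, part labels, and standoff convention are illustrative assumptions, not the dissertation's actual interface.

# Illustrative sketch (assumed interface): an object part is summarized by a
# keypoint position and an action-oriented axis, from which an executable
# end-effector pose can be derived.

from dataclasses import dataclass
import numpy as np


@dataclass
class AffordanceCoordinateFrame:
    part_category: str        # e.g. "mug_handle", "bottle_body" (hypothetical labels)
    position: np.ndarray      # 3D keypoint of the part in the camera/world frame
    axis: np.ndarray          # unit vector along which the action is executed


def rotation_aligning_z_to(axis: np.ndarray) -> np.ndarray:
    """Return a 3x3 rotation whose z-column equals `axis` (roll is arbitrary)."""
    z = axis / np.linalg.norm(axis)
    # Pick a helper vector not parallel to z to build an orthonormal basis.
    helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.column_stack([x, y, z])


def grasp_pose_from_acf(acf: AffordanceCoordinateFrame,
                        standoff: float = 0.05) -> np.ndarray:
    """Map an ACF to a 4x4 gripper pose: approach along the ACF axis,
    stopping `standoff` meters before the part keypoint."""
    T = np.eye(4)
    T[:3, :3] = rotation_aligning_z_to(acf.axis)
    T[:3, 3] = acf.position - standoff * acf.axis
    return T


if __name__ == "__main__":
    # Hypothetical perception output for a mug handle detected in clutter.
    acf = AffordanceCoordinateFrame(
        part_category="mug_handle",
        position=np.array([0.42, -0.10, 0.15]),
        axis=np.array([0.0, 0.0, 1.0]),
    )
    print(grasp_pose_from_acf(acf))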
Subjects
visual perception; robot manipulation
Types
Thesis