Multi-Policy Decision Making for Reliable Navigation in Dynamic Uncertain Environments

Mehta, Dhanvin

dc.contributor.author	Mehta, Dhanvin
dc.date.accessioned	2019-07-08T19:46:32Z
dc.date.available	NO_RESTRICTION
dc.date.available	2019-07-08T19:46:32Z
dc.date.issued	2019
dc.date.submitted	2019
dc.identifier.uri	https://hdl.handle.net/2027.42/150017
dc.description.abstract	Navigating everyday social environments in the presence of pedestrians and other dynamic obstacles remains one of the key challenges preventing mobile robots from leaving carefully designed spaces and entering our daily lives. The complex and tightly coupled interactions between these agents make the environment dynamic and unpredictable, posing a formidable problem for robot motion planning. Trajectory planning methods, supported by models of typical human behavior and personal space, often produce reasonable behavior. However, they do not account for the future closed-loop interactions of other agents with the trajectory being constructed. As a consequence, the trajectories are unable to anticipate cooperative interactions (such as a human yielding) or adverse interactions (such as the robot blocking the way). Ideally, the robot must account for coupled agent-agent interactions while reasoning about possible future outcomes, and then take actions to advance towards its navigational goal without inconveniencing nearby pedestrians. Multi-Policy Decision Making (MPDM) is a novel framework for autonomous navigation in dynamic, uncertain environments in which the robot's trajectory is not explicitly planned; instead, the robot dynamically switches between a set of candidate closed-loop policies, allowing it to adapt to different situations encountered in such environments. The candidate policies are evaluated based on short-term (five-second) forward simulations of samples drawn from the estimated distribution of the agents' current states. These forward simulations, and thereby the cost function, capture agent-agent interactions as well as agent-robot interactions, which depend on the ego-policy being evaluated. In this thesis, we propose MPDM as a new method for navigation amongst pedestrians by dynamically switching among a library of closed-loop policies.
Due to real-time constraints, the robot's emergent behavior is directly affected by the quality of policy evaluation. Approximating how good a policy is based on only a few forward roll-outs is difficult, especially with the large space of possible pedestrian configurations and the sensitivity of the forward simulation to the sampled configurations. Traditional methods based on Monte-Carlo sampling often miss likely, high-cost outcomes, resulting in an over-optimistic evaluation of a policy and unreliable emergent behavior. By re-formulating policy evaluation as an optimization problem and enabling the quick discovery of potentially dangerous outcomes, we make MPDM more reliable and risk-aware. Even with the increased reliability, a major limitation is that MPDM requires the system designer to provide a set of carefully hand-crafted policies, because it can evaluate only a few policies reliably in real time. We radically enhance the expressivity of MPDM by allowing policies to have continuous-valued parameters, while simultaneously satisfying real-time constraints by quickly discovering promising policy parameters through a novel iterative gradient-based algorithm. Overall, we reformulate the traditional motion planning problem and cast it in a very different light: as a bilevel optimization problem in which the robot repeatedly discovers likely high-cost outcomes and adapts its policy parameters to avoid these outcomes. We demonstrate significant performance benefits through extensive experiments in simulation as well as on a physical robot platform operating in a semi-crowded environment.
dc.language.iso	en_US
dc.subject	autonomous navigation
dc.subject	motion planning
dc.subject	planning, prediction and control
dc.subject	mobile robotics
dc.subject	social navigation
dc.title	Multi-Policy Decision Making for Reliable Navigation in Dynamic Uncertain Environments
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Computer Science & Engineering
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Olson, Edwin
dc.contributor.committeemember	Berenson, Dmitry
dc.contributor.committeemember	Jenkins, Odest Chadwicke
dc.contributor.committeemember	Kuipers, Benjamin
dc.contributor.committeemember	Stone, Peter H.
dc.subject.hlbsecondlevel	Computer Science
dc.subject.hlbtoplevel	Engineering
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/150017/1/dhanvinm_1.pdf
dc.identifier.orcid	0000-0001-6797-0974
dc.identifier.name-orcid	Mehta, Dhanvin; 0000-0001-6797-0974	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: dhanvinm_1.pdf
Size: 32.91MB
Format: PDF