VALUE PROPAGATION NETWORKS

Anonymous authors
Paper under double-blind review
ABSTRACT
We present Value Propagation (VProp), a parameter-efficient differentiable planning
module built on Value Iteration. VProp can be trained with reinforcement learning
to solve unseen tasks, generalizes to larger map sizes, and can learn to navigate
in dynamic environments. We evaluate it on MazeBase grid-world configurations,
using randomly generated environments of several sizes. Furthermore, we show that
the module and its variants offer a simple way to learn to plan when adversarial
agents are present and the environment is stochastic, yielding a cost-efficient
learning system for building low-level, size-invariant planners for a variety of
interactive navigation problems.
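For readers less familiar with the Value Iteration baseline the abstract builds on, here is a minimal sketch of plain tabular value iteration on a grid-world with walls and a single goal. It is an illustration under stated assumptions, not VProp itself: it has no learned or differentiable parameters, and the function names (`grid_value_iteration`, `greedy_step`), the unit step cost, the 4-connected move set, and the absorbing goal are hypothetical choices made for this example. Because the update only looks at a cell's immediate neighbours, the same code runs unchanged on maps of any size.

```python
import math

# Hypothetical 4-connected move set: up, down, left, right.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def grid_value_iteration(passable, goal, step_cost=1.0, gamma=1.0, iters=None):
    """Plain deterministic value iteration on a grid (illustrative, not VProp).

    passable: list of lists of bools; False marks a wall cell.
    goal: (row, col) of an absorbing goal whose value is pinned at 0.
    gamma: discount factor, assumed to lie in (0, 1].
    Every move costs step_cost, so with gamma=1 a cell's value equals minus
    its shortest-path distance to the goal (-inf if the goal is unreachable).
    """
    h, w = len(passable), len(passable[0])
    if iters is None:
        iters = h * w  # enough sweeps for values to propagate across the map
    value = [[-math.inf] * w for _ in range(h)]
    value[goal[0]][goal[1]] = 0.0
    for _ in range(iters):
        new_value = [[-math.inf] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                if not passable[i][j]:
                    continue  # walls never receive value
                if (i, j) == tuple(goal):
                    new_value[i][j] = 0.0  # absorbing goal
                    continue
                for di, dj in MOVES:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w and passable[ni][nj]:
                        # Bellman backup: Q(s, a) = -step_cost + gamma * V(s')
                        q = -step_cost + gamma * value[ni][nj]
                        new_value[i][j] = max(new_value[i][j], q)
        value = new_value
    return value


def greedy_step(value, passable, pos):
    """Return the passable neighbour of pos with the highest value."""
    h, w = len(value), len(value[0])
    i, j = pos
    best = None
    for di, dj in MOVES:
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w and passable[ni][nj]:
            if best is None or value[ni][nj] > value[best[0]][best[1]]:
                best = (ni, nj)
    return best
```

Usage: on a 3x3 map whose middle row is walled off except for its right-hand cell, with the goal at the bottom-left corner, the top-left cell gets value -6.0 (six steps around the wall), and `greedy_step` from the top-right cell moves down toward the gap rather than left.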
