Inverse Reinforcement Learning (IRL)

Professor David Silver of UCL, a lead researcher at Google DeepMind and one of the creators of AlphaGo, describes IRL as follows:

Recently, a new set of approaches have been developed for learning from demonstration based on the concept of Inverse Optimal Control.

Rather than learn a mapping from perceptual features to actions, these approaches seek to learn a mapping from perceptual features to costs, such that a planner minimizing said costs will achieve the expert demonstrated behavior.

These methods take advantage of the fact that while it is difficult for an expert to define an ordering of preferences, it is easy for an expert to demonstrate the desired behavior.

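In other words, it is hard for a human expert to write down an explicit ordering of preferences (that is, a cost or reward function), but easy to demonstrate the desired behavior. This is the logic behind inverse reinforcement learning: instead of hand-designing the costs, recover them from the expert's demonstrations.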
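The idea in the quote (learn a mapping from features to costs so that a cost-minimizing planner reproduces the expert) can be made concrete with a small sketch. Everything below is a hypothetical setup chosen for illustration, not the method the quoted text refers to: a 3x5 grid with two terrain types, a hand-written expert path that stays on the road, a linear cost c(s) = w . f(s) over one-hot terrain features, Dijkstra as the planner, and a simple structured-perceptron-style subgradient update (in the spirit of maximum-margin planning, without the margin term). The core loop is: plan with the current costs, compare the planned path's feature counts with the expert's, and shift the weights until the planner's cheapest path matches the demonstration.

```python
import heapq

# Hypothetical toy map: 'R' = road, 'G' = grass.
TERRAIN = [
    "RGGGR",
    "RGGGR",
    "RRRRR",
]
START, GOAL = (0, 0), (0, 4)
# Hypothetical expert demonstration: a longer detour that stays on the road.
EXPERT_PATH = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (1, 4), (0, 4)]


def features(cell):
    """Perceptual features of a cell: one-hot [is_grass, is_road]."""
    r, c = cell
    kind = TERRAIN[r][c]
    return [1.0 if kind == "G" else 0.0, 1.0 if kind == "R" else 0.0]


def cell_cost(cell, w):
    """Cost of a cell, linear in its features: w . f(cell)."""
    return sum(wi * fi for wi, fi in zip(w, features(cell)))


def plan(w):
    """Dijkstra: the START-to-GOAL path with the lowest summed cell cost."""
    rows, cols = len(TERRAIN), len(TERRAIN[0])
    dist = {START: cell_cost(START, w)}
    parent = {START: None}
    heap = [(dist[START], START)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == GOAL:
            break  # costs are positive, so GOAL's distance is final here
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cell_cost(nxt, w)
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    parent[nxt] = cell
                    heapq.heappush(heap, (nd, nxt))
    path, cell = [], GOAL
    while cell is not None:  # walk parents back from GOAL to START
        path.append(cell)
        cell = parent[cell]
    return path[::-1]


def feature_counts(path):
    """Summed features along a path, so that cost(path) = w . F(path)."""
    totals = [0.0, 0.0]
    for cell in path:
        totals = [t + f for t, f in zip(totals, features(cell))]
    return totals


def learn_costs(expert_path, lr=0.1, iters=50, min_w=0.01):
    """Adjust w until the planner's cheapest path reproduces the demonstration."""
    w = [1.0, 1.0]  # start with both terrain types equally costly
    for _ in range(iters):
        planned = plan(w)
        if planned == expert_path:
            break  # the planner already imitates the expert
        f_plan, f_exp = feature_counts(planned), feature_counts(expert_path)
        # Subgradient step on cost(expert) - cost(planned): raise the cost of
        # features the planner used more than the expert, lower the others.
        # Clipping at min_w keeps every cell cost positive for Dijkstra.
        w = [max(min_w, wi + lr * (p - e)) for wi, p, e in zip(w, f_plan, f_exp)]
    return w


if __name__ == "__main__":
    w = learn_costs(EXPERT_PATH)
    print("learned weights [grass, road]:", w)
    print("planner reproduces expert:", plan(w) == EXPERT_PATH)
```

With equal initial weights the planner takes the short straight path across the grass; one update makes grass more expensive than road, after which the cheapest path is the expert's road detour. The expert never stated "roads are preferred", yet the learned costs encode exactly that preference.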

你可能感兴趣的:(模仿学习)