Trust Region Policy Optimization (TRPO)

1. Trust Region Methods

Schulman et al. [1] apply trust region methods to reinforcement learning and achieve very good results.

1.1 The Optimization Problem

[Figure 1: image not available]

1.2 The Trust Region

[Figure 2: image not available]

1.3 The Trust Region Procedure

[Figure 3: image not available]

References

[1] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897.

[2] GitHub - wangshusen/DeepLearning
