RL, policy-based: algorithms based on policy gradient (PG):
PG basics: REINFORCE
PG extensions: Actor-Critic, A3C, DPG, DDPG, TRPO, PPO
=============
REINFORCE Algorithms, Machine Learning, 1992
Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229-256, 1992
https://people.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf
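A minimal sketch of the REINFORCE update, for orientation only. It assumes a hypothetical two-armed Gaussian bandit (one-step episodes), a softmax policy over per-action preferences `theta`, and no baseline. The function names, step size, and reward noise are illustrative choices, not taken from the paper.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_means, steps=5000, alpha=0.1, seed=0):
    """Run REINFORCE (no baseline) on a hypothetical Gaussian bandit."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # One-step episode: the return is just the sampled reward.
        ret = rng.gauss(arm_means[action], 0.5)
        # For a softmax policy, d log pi(action) / d theta[k]
        # equals 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * ret * grad_log
    return softmax(theta)

final_probs = reinforce_bandit([0.0, 1.0])
```

Each step moves `theta` along the return times the gradient of the log-probability of the sampled action, so the policy should shift toward the arm with the higher mean reward.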
Actor-Critic Algorithms, NIPS 1999
https://papers.nips.cc/paper/1999/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf
https://www.mit.edu/~jnt/Papers/J094-03-kon-actors.pdf
Asynchronous Advantage Actor-Critic, A3C, ICML 2016
https://arxiv.org/abs/1602.01783
https://github.com/dennybritz/reinforcement-learning/tree/master/PolicyGradient/a3c
A2C, Advantage Actor-Critic: A2C is a synchronous, deterministic variant of
Asynchronous Advantage Actor-Critic (A3C)
https://github.com/openai/baselines/blob/master/baselines/a2c/a2c.py
https://openai.com/blog/baselines-acktr-a2c/
Deterministic Policy Gradient Algorithms, DPG, ICML 2014
https://hal.inria.fr/file/index/docid/938992/filename/dpg-icml2014.pdf
Continuous Control with Deep Reinforcement Learning, DDPG, ICLR 2016
https://arxiv.org/abs/1509.02971
https://github.com/openai/baselines/tree/master/baselines/ddpg
https://spinningup.openai.com/en/latest/algorithms/ddpg.html
Distributed Distributional Deterministic Policy Gradients, D4PG, ICLR 2018
https://arxiv.org/abs/1804.08617
Trust Region Policy Optimization, TRPO, ICML 2015
https://arxiv.org/abs/1502.05477
Proximal Policy Optimization Algorithms, PPO, 2017
https://arxiv.org/abs/1707.06347
https://github.com/openai/baselines/tree/master/baselines/ppo1
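A short sketch of the clipped surrogate objective that PPO is known for, written in plain Python over a batch of precomputed values. It assumes `ratios` holds the new-to-old policy probability ratios and `advantages` holds advantage estimates. The function name and the default `clip_eps=0.2` are illustrative assumptions, and the value-function and entropy terms of a full implementation are left out.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

With a positive advantage, the objective stops growing once the ratio exceeds `1 + clip_eps`. With a negative advantage, it stops improving once the ratio drops below `1 - clip_eps`.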
Policy Gradient Algorithms, Lilian Weng, OpenAI, 2018~
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
Summary of Actor-Critic related algorithms
https://zhuanlan.zhihu.com/p/29486661
https://blog.csdn.net/WASEFADG/article/details/81042818
A casual discussion of deep reinforcement learning (DRL): from AC (Actor-Critic) to A3C (Asynchronous Advantage Actor-Critic)
https://jinzhuojun.blog.csdn.net/article/details/72851548
Ref:
Reinforcement Learning: A Survey, 1996
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/rl-survey.html
REINFORCE Algorithms
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node37.html