RL Policy-Based: Actor-Critic, A3C, DPG, DDPG, TRPO, PPO
RL Policy-Based: algorithms based on policy gradient (PG):
PG basics: REINFORCE
PG extensions: Actor-Critic, A3C, DPG, DDPG, TRPO, PPO
=============
REINFORCE
Algorithms, Machine Learning, 1992
Ronald J. Williams. Simple statistical gradient-following algorithms for co