RL Value-Based: off-policy DQN (Deep Q-Learning) vs. on-policy methods

Value-based methods work with two kinds of values: the state value V and the action value Q.
The Q-value formulation is the practically useful one; from here on, "Value-Based" generally refers to Q-value methods.

Q-Learning is the representative of a large family of related algorithms.

Q-Learning -> Approximate Q-Learning -> Deep Q-Learning.
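Every step in that lineage keeps the same core update rule. As a minimal sketch (the tiny 2-state chain below is hypothetical), tabular Q-Learning moves one table entry toward the TD target per transition:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy Q-Learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical chain: taking action 1 in state 0 yields reward 1, lands in state 1.
Q = defaultdict(float)
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
# First update moves the entry by alpha * r = 0.1, since Q was all zeros.
```

Approximate and Deep Q-Learning replace the table lookup with a function approximator, but the target `r + gamma * max_a' Q(s', a')` stays the same.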

DQN (Deep Q-Learning):

Deep Q-Learning was introduced in 2013, and many improvements have been made since.
Below are four strategies that dramatically improve the training and results of DQN agents:

  • vanilla DQN with fixed Q-targets
  • Double DQN (often abbreviated DDQN)
  • Dueling DQN
  • Prioritized Experience Replay (PER)

DeepMind 2013: DQN & Fixed Q-Targets
Playing Atari with Deep Reinforcement Learning, NIPS 2013
https://arxiv.org/abs/1312.5602

Human Level Control Through Deep Reinforcement Learning, Nature 2015
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
Code: https://sites.google.com/a/deepmind.com/dqn/
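A key trick in the Nature 2015 version is the fixed target network: TD targets are computed from a frozen copy of the weights that is only synced to the online network every C steps. A minimal NumPy sketch (the linear "networks" below are placeholders for the Atari CNN):

```python
import numpy as np

def td_targets(q_target_fn, next_states, rewards, dones, gamma=0.99):
    """DQN targets y = r + gamma * max_a' Q_target(s', a'), zeroed at episode end."""
    max_next = q_target_fn(next_states).max(axis=1)
    return rewards + gamma * max_next * (1.0 - dones)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))      # online weights: 4-dim state, 2 actions
W_target = W.copy()              # frozen target copy, NOT updated by gradients

q_target_fn = lambda s: s @ W_target

next_states = rng.normal(size=(3, 4))
rewards = np.array([1.0, 0.0, -1.0])
dones = np.array([0.0, 0.0, 1.0])   # third transition ends the episode
y = td_targets(q_target_fn, next_states, rewards, dones)

# ... regress the online Q toward y, and every C steps: W_target = W.copy()
```

Because targets come from the frozen copy, they do not chase the online network's own updates, which stabilizes training.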

DeepMind 2015: Double DQN
Deep Reinforcement Learning with Double Q-learning, AAAI 2016
https://arxiv.org/abs/1509.06461
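Double DQN decouples action selection from action evaluation: the online network picks argmax a', the target network scores it, which reduces the overestimation bias of the plain max-based target. A toy sketch with made-up Q outputs standing in for the two networks:

```python
import numpy as np

def double_dqn_targets(q_online, q_target, next_states, rewards, dones, gamma=0.9):
    """y = r + gamma * Q_target(s', argmax_a' Q_online(s', a'))."""
    best_actions = q_online(next_states).argmax(axis=1)        # select with online net
    next_q = q_target(next_states)                             # evaluate with target net
    evaluated = next_q[np.arange(len(rewards)), best_actions]
    return rewards + gamma * evaluated * (1.0 - dones)

# Made-up Q outputs for a batch of two next-states, two actions each.
q_online = lambda s: np.array([[1.0, 2.0], [3.0, 0.0]])   # argmax -> actions [1, 0]
q_target = lambda s: np.array([[0.5, 1.5], [2.0, 4.0]])   # picked values -> [1.5, 2.0]

rewards = np.array([0.0, 1.0])
dones = np.array([0.0, 1.0])
y = double_dqn_targets(q_online, q_target, None, rewards, dones)
# y[0] = 0.9 * 1.5 = 1.35; y[1] = 1.0 (terminal transition).
```

Note that plain DQN would have used the target net's own max (1.5 and 4.0), so the two targets differ only when the networks disagree on the best action.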

DeepMind 2015: Prioritized Experience Replay (PER)
Prioritized Experience Replay, ICLR 2016
https://arxiv.org/abs/1511.05952
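PER samples transitions with probability proportional to their TD error, P(i) = p_i^alpha / sum_k p_k^alpha, and corrects the induced bias with importance-sampling weights. A minimal list-based sketch (real implementations use a sum-tree for O(log N) sampling):

```python
import numpy as np

class ProportionalReplay:
    """Minimal proportional PER: p_i = (|td_error| + eps)^alpha,
    with importance weights w_i = (N * P(i))^(-beta), normalized to max 1."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)
        if len(self.data) > self.capacity:      # drop the oldest transition
            self.data.pop(0)
            self.priorities.pop(0)

    def sample(self, batch_size, beta=0.4, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = rng.choice(len(self.data), size=batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()                # scale so the largest weight is 1
        return idx, [self.data[i] for i in idx], weights

buf = ProportionalReplay(capacity=100)
for i, err in enumerate([0.1, 5.0, 0.2]):       # the high-error transition dominates
    buf.add(("transition", i), td_error=err)
idx, batch, w = buf.sample(4, rng=np.random.default_rng(0))
```

After each learning step, the sampled transitions' priorities are updated with their new TD errors, so "surprising" transitions keep being replayed more often.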

DeepMind 2015: Dueling DQN
Dueling Network Architectures for Deep Reinforcement Learning, ICML 2016
https://arxiv.org/abs/1511.06581
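A dueling network splits the head into a scalar state value V(s) and per-action advantages A(s,a), then recombines them as Q = V + (A - mean_a A); subtracting the mean makes the V/A decomposition identifiable. The aggregation step alone, with made-up stream outputs:

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a))."""
    return value + (advantages - advantages.mean(axis=1, keepdims=True))

V = np.array([[2.0], [0.0]])             # value-stream output, batch of 2 states
A = np.array([[1.0, -1.0], [3.0, 1.0]])  # advantage-stream output, 2 actions
Q = dueling_q(V, A)
# Row 0: 2 + ([1, -1] - 0) = [3, 1]; per-row mean of Q equals V by construction.
```

This lets the network learn how good a state is without having to learn the effect of every action in it.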

=================

AlphaGo, 2016-2017
Mastering the game of Go with deep neural networks and tree search.   Nature 2016
https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

Mastering the game of Go without human knowledge.    Nature 2017
https://www.nature.com/articles/nature24270.epdf
http://augmentingcognition.com/assets/Silver2017a.pdf
https://cs.uwaterloo.ca/~ppoupart/teaching/cs885-spring18/slides/cs885-lecture14a.pdf
https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf

Robotics 2016:
Levine, S., Pastor, P., Krizhevsky, A., and Quillen, D. (2016).
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
https://arxiv.org/abs/1603.02199

Drone control 2017:
Kahn, G., Zhang, T., Levine, S., and Abbeel, P. (2017).
PLATO: policy learning using adaptive trajectory optimization, in Proceedings of IEEE International Conference on Robotics and Automation (Singapore), 3342–3349.
https://arxiv.org/abs/1603.00622

============================================

Ref:
https://zhuanlan.zhihu.com/p/107172115
https://theaisummer.com/Taking_Deep_Q_Networks_a_step_further/

https://www.freecodecamp.org/news/improvements-in-deep-q-learning-dueling-double-dqn-prioritized-experience-replay-and-fixed-58b130cc5682/

https://cugtyt.github.io/blog/rl-notes/201807201658.html

Deep Reinforcement Learning (DRL) Overview: From DQN to AlphaGo
https://blog.csdn.net/jinzhuojun/article/details/52752561

Three Key DQN Improvements in Reinforcement Learning (PyTorch)
https://blog.csdn.net/MR_kdcon/article/details/111245496
Three Major DQN Improvements
https://cloud.tencent.com/developer/article/1092132
https://cloud.tencent.com/developer/article/1092124
https://cloud.tencent.com/developer/article/1092121
A Hands-On Introduction to Deep Q-Learning with Python and OpenAI Gym
https://blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/90306841

What are the concrete differences between the two RL schools behind DeepMind and OpenAI?
https://www.zhihu.com/question/316626294
