Value-based methods estimate either the state value V or the action value Q. The Q-value formulation is the more useful one; when people say "value-based", they usually mean Q-value methods.
Q-Learning is the representative of a large family of related algorithms.
Value-based RL splits into off-policy methods (e.g. DQN, Deep Q-Learning) and on-policy methods.
The lineage: Q-Learning -> Approximate Q-Learning -> Deep Q-Learning.
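The starting point of that lineage, tabular Q-Learning, can be sketched in a few lines. This is a minimal illustration (the two-state chain, reward values, and step counts are made up for the example), not a full agent:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Toy usage: taking action 1 in state 0 always yields reward 1 and lands
# in terminal-like state 1 (whose Q-values stay at 0).
Q = defaultdict(float)
actions = [0, 1]
for _ in range(100):
    q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
```

After repeated updates, Q[(0, 1)] converges to the TD target 1.0; Approximate and Deep Q-Learning replace the table `Q` with a parameterized function.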
DQN (Deep Q-Learning):
Deep Q-Learning was introduced in 2013, and many improvements have been made since.
Below are four strategies that dramatically improve the training and results of DQN agents:
DeepMind 2013, DQN & Fixed Q-Targets
Playing Atari with Deep Reinforcement Learning, NIPS 2013
https://arxiv.org/abs/1312.5602
Human Level Control Through Deep Reinforcement Learning, Nature 2015
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
Code: https://sites.google.com/a/deepmind.com/dqn/
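The key trick in these two papers is Fixed Q-Targets: TD targets are computed from a separate, periodically synced target network rather than the online network, which stabilizes training. A minimal sketch of the target computation (the batch values below are hypothetical):

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """TD targets using the *target* network's Q-values (Fixed Q-Targets):
    y = r + gamma * max_a' Q_target(s', a'), with bootstrapping cut off
    at terminal states via the (1 - done) mask."""
    return rewards + gamma * next_q_target.max(axis=1) * (1.0 - dones)

# Hypothetical batch of 3 transitions with 2 actions each.
rewards = np.array([1.0, 0.0, -1.0])
next_q_target = np.array([[0.5, 1.0],   # Q_target(s', .) for each transition
                          [2.0, 0.0],
                          [0.0, 0.0]])
dones = np.array([0.0, 0.0, 1.0])       # last transition is terminal
y = dqn_targets(rewards, next_q_target, dones)
# In a full agent, the online network is regressed toward y, and the
# target network's weights are copied from the online network every C steps.
```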
DeepMind 2015, Double DQN
Deep Reinforcement Learning with Double Q-learning, AAAI 2016
https://arxiv.org/abs/1509.06461
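Double DQN decouples action *selection* (online network) from action *evaluation* (target network) to reduce the overestimation bias of the max operator. A sketch with hypothetical batch values:

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: a' = argmax_a Q_online(s', a) selects the action,
    Q_target(s', a') evaluates it."""
    a_star = next_q_online.argmax(axis=1)                    # selection
    q_eval = next_q_target[np.arange(len(a_star)), a_star]   # evaluation
    return rewards + gamma * q_eval * (1.0 - dones)

# Hypothetical batch of 2 transitions with 2 actions each.
rewards = np.array([0.0, 1.0])
next_q_online = np.array([[0.2, 0.9], [1.5, 0.1]])
next_q_target = np.array([[0.3, 0.4], [0.8, 2.0]])
dones = np.zeros(2)
y = double_dqn_targets(rewards, next_q_online, next_q_target, dones)
```

Note how the second transition uses Q_target = 0.8 (the online argmax's value), not the target network's own max of 2.0.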
DeepMind 2015, Prioritized Experience Replay (PER)
Prioritized Experience Replay, ICLR 2016
https://arxiv.org/abs/1511.05952
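PER samples transitions in proportion to their TD-error-based priority and corrects the resulting bias with importance-sampling weights. A sketch of the proportional variant (priorities and hyperparameters below are illustrative; the paper uses a sum-tree for efficiency, omitted here):

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritized replay: P(i) ~ p_i^alpha; importance-sampling
    weights w_i = (N * P(i))^(-beta), normalized by max(w) for stability."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()

# Hypothetical buffer of 5 transitions with |TD-error|-based priorities.
idx, w = per_sample([0.1, 0.5, 2.0, 0.05, 1.0], batch_size=3)
```

The weights `w` multiply each sampled transition's loss, so frequently sampled (high-priority) transitions contribute less per update.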
DeepMind 2015, Dueling DQN
Dueling Network Architectures for Deep Reinforcement Learning, ICML 2016
https://arxiv.org/abs/1511.06581
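The dueling architecture splits the network head into a state-value stream V(s) and an advantage stream A(s, a), then recombines them; subtracting the mean advantage makes the decomposition identifiable. A sketch of the aggregation step (the value/advantage numbers are made up; in practice both streams are network outputs):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

# Hypothetical single state with V(s) = 1.0 and two action advantages.
q = dueling_q(1.0, [[2.0, 0.0]])
```

By construction the mean of Q over actions equals V(s), so the value stream learns "how good is this state" independently of which action is taken.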
=================
AlphaGo, 2016-2017
Mastering the game of Go with deep neural networks and tree search. Nature 2016
https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf
Mastering the game of Go without human knowledge. Nature 2017
https://www.nature.com/articles/nature24270.epdf
http://augmentingcognition.com/assets/Silver2017a.pdf
https://cs.uwaterloo.ca/~ppoupart/teaching/cs885-spring18/slides/cs885-lecture14a.pdf
https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf
Robotics, 2016
Levine, S., Pastor, P., Krizhevsky, A., and Quillen, D. (2016).
Learning hand-eye coordination for robotic grasping with large-scale data collection
https://arxiv.org/abs/1603.02199
Drone control, 2017
Kahn, G., Zhang, T., Levine, S., and Abbeel, P. (2017).
PLATO: policy learning using adaptive trajectory optimization, in Proceedings of IEEE International Conference on Robotics and Automation (Singapore), 3342–3349.
https://arxiv.org/abs/1603.00622
============================================
Ref:
https://zhuanlan.zhihu.com/p/107172115
https://theaisummer.com/Taking_Deep_Q_Networks_a_step_further/
https://www.freecodecamp.org/news/improvements-in-deep-q-learning-dueling-double-dqn-prioritized-experience-replay-and-fixed-58b130cc5682/
https://cugtyt.github.io/blog/rl-notes/201807201658.html
An informal survey of deep reinforcement learning (DRL): from DQN to AlphaGo
https://blog.csdn.net/jinzhuojun/article/details/52752561
The three key DQN improvements in reinforcement learning (PyTorch)
https://blog.csdn.net/MR_kdcon/article/details/111245496
Three major DQN improvements
https://cloud.tencent.com/developer/article/1092132
https://cloud.tencent.com/developer/article/1092124
https://cloud.tencent.com/developer/article/1092121
A hands-on introduction to Deep Q-Learning with Python and OpenAI Gym
https://blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/90306841
What are the concrete differences between the two RL schools behind DeepMind and OpenAI?
https://www.zhihu.com/question/316626294