AlphaZero Paper Collection

Nature Papers

Mastering the game of Go without human knowledge

Nature 550(7676): 354-359 (2017). doi:10.1038/nature24270

Authors: David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis

URL: https://www.nature.com/nature/journal/v550/n7676/full/nature24270.html

Download the PDF to read the full paper.

Mastering the game of Go with deep neural networks and tree search

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis: Nature 529(7587): 484-489 (2016)

Papers

Mastering the Game of Go without Human Knowledge

https://deepmind.com/documents/119/agz_unformatted_nature.pdf

Human-level control through deep reinforcement learning

http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

Playing Atari with Deep Reinforcement Learning

https://www.cs.toronto.edu/%7Evmnih/docs/dqn.pdf

Prioritized experience replay

https://arxiv.org/pdf/1511.05952v2.pdf

Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)

https://arxiv.org/pdf/1511.06581v3.pdf

Deep Reinforcement Learning with Double Q-learning

https://arxiv.org/abs/1509.06461

Continuous Deep Q-Learning with Model-based Acceleration (NAF)

https://arxiv.org/pdf/1603.00748v1.pdf

Deterministic Policy Gradient Algorithms

http://jmlr.org/proceedings/papers/v32/silver14.pdf

Continuous control with deep reinforcement learning (DDPG)

https://arxiv.org/pdf/1509.02971v5.pdf

Asynchronous Methods for Deep Reinforcement Learning

https://arxiv.org/abs/1602.01783

Policy distillation

https://arxiv.org/abs/1511.06295

Control of Memory, Active Perception, and Action in Minecraft

https://arxiv.org/pdf/1605.09128v1.pdf

Unifying Count-Based Exploration and Intrinsic Motivation

https://arxiv.org/pdf/1606.01868v2.pdf

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

https://arxiv.org/pdf/1507.00814v3.pdf

Action-Conditional Video Prediction using Deep Networks in Atari Games

https://arxiv.org/pdf/1507.08750v2.pdf

Control of Memory, Active Perception, and Action in Minecraft

https://web.eecs.umich.edu/~baveja/Papers/ICML2016.pdf

PathNet

https://arxiv.org/pdf/1701.08734.pdf

Papers for NLP

Coarse-to-Fine Question Answering for Long Documents

https://homes.cs.washington.edu/~eunsol/papers/acl17eunsol.pdf

A Deep Reinforced Model for Abstractive Summarization

https://arxiv.org/pdf/1705.04304.pdf

Reinforcement Learning for Simultaneous Machine Translation

https://www.umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf

Dual Learning for Machine Translation

https://papers.nips.cc/paper/6469-dual-learning-for-machine-translation.pdf

Learning to Win by Reading Manuals in a Monte-Carlo Framework

http://people.csail.mit.edu/regina/my_papers/civ11.pdf

Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

http://people.csail.mit.edu/regina/my_papers/civ11.pdf

Deep Reinforcement Learning with a Natural Language Action Space

http://www.aclweb.org/anthology/P16-1153

Deep Reinforcement Learning for Dialogue Generation

https://arxiv.org/pdf/1606.01541.pdf

Reinforcement Learning for Mapping Instructions to Actions

http://people.csail.mit.edu/branavan/papers/acl2009.pdf

Language Understanding for Text-based Games using Deep Reinforcement Learning

https://arxiv.org/pdf/1506.08941.pdf

End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning

https://arxiv.org/pdf/1606.01269v1.pdf

End-to-End Reinforcement Learning of Dialogue Agents for Information Access

https://arxiv.org/pdf/1609.00777v1.pdf

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

https://arxiv.org/pdf/1702.03274.pdf

Deep Reinforcement Learning for Mention-Ranking Coreference Models

https://arxiv.org/abs/1609.08667

Selected Articles

Wikipedia: Reinforcement learning

https://en.wikipedia.org/wiki/Reinforcement_learning

Deep Reinforcement Learning: Pong from Pixels

http://karpathy.github.io/2016/05/31/rl/

CS294: Deep Reinforcement Learning

http://rll.berkeley.edu/deeprlcourse/

Reinforcement Learning Series, Part 1: Markov Decision Processes

http://www.algorithmdog.com/%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0-%E9%A9%AC%E5%B0%94%E7%A7%91%E5%A4%AB%E5%86%B3%E7%AD%96%E8%BF%87%E7%A8%8B

Reinforcement Learning Series, Part 9: Deep Q Network (DQN)

http://www.algorithmdog.com/drl

Reinforcement Learning Series, Part 3: Model-Free Policy Evaluation

http://www.algorithmdog.com/reinforcement-learning-model-free-evalution

[Notes] Reinforcement Learning and MDPs

http://www.cnblogs.com/mo-wang/p/4910855.html

Introduction to Reinforcement Learning, with Implementation Code

http://www.jianshu.com/p/165607eaa4f9

Deep Reinforcement Learning Series (2): Reinforcement Learning

http://blog.csdn.net/ikerpeng/article/details/53031551

Atari demo using a deep Q-network:

Nature paper on the deep Q-network (DQN)

http://www.nature.com/articles/nature14236

Lecture slides (PDF) used in David Silver's videos

https://pan.baidu.com/s/1nvqP7dB

What Is Reinforcement Learning?

http://www.cnblogs.com/geniferology/p/what_is_reinforcement_learning.html

David Silver's paper on deterministic policy gradient (DPG)

http://www.jmlr.org/proceedings/papers/v32/silver14.pdf

Nature paper on AlphaGo

http://www.nature.com/articles/nature16961

AlphaGo resources

http://deepmind.com/research/alphago/

What's the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?

https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

Deep Learning in a Nutshell: Reinforcement Learning

https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-reinforcement-learning/

Bellman equation

https://en.wikipedia.org/wiki/Bellman_equation

Reinforcement learning

https://en.wikipedia.org/wiki/Reinforcement_learning

Mastering the Game of Go without Human Knowledge

https://deepmind.com/documents/119/agz_unformatted_nature.pdf

Reinforcement Learning (RL) for Natural Language Processing (NLP)

https://github.com/adityathakker/awesome-rl-nlp

Video Tutorials

Reinforcement Learning Tutorial (Morvan, 莫烦)

https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/

Reinforcement Learning Course by David Silver

https://www.bilibili.com/video/av8912293/?from=search&seid=1166472326542614796

CS234: Reinforcement Learning

http://web.stanford.edu/class/cs234/index.html

What Is Reinforcement Learning?

https://www.youtube.com/watch?v=NVWBs7b3oGk

What Is Q Learning? (Reinforcement Learning)

https://www.youtube.com/watch?v=HTZ5xn12AL4

Reinforcement Learning, by Morvan (莫烦)

https://morvanzhou.github.io/tutorials/machine-learning/ML-intro/

David Silver's Deep Reinforcement Learning, Lecture 1: Introduction (Chinese subtitles)

https://www.bilibili.com/video/av9831889/

David Silver's open lecture series (YouTube)

https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT

David Silver's open lecture series (Bilibili)

http://www.bilibili.com/video/av9831889/?from=search&seid=17387316110198388304

Deep Reinforcement Learning

http://videolectures.net/rldm2015_silver_reinforcement_learning/

Tutorial

Reinforcement Learning for NLP

http://www.umiacs.umd.edu/~jbg/teaching/CSCI_7000/11a.pdf

ICML 2016, Deep Reinforcement Learning tutorial

http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf

DQN tutorial

https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-4-deep-q-networks-and-beyond-8438a3e2b8df#.28wv34w3a

Code

OpenAI Gym

https://github.com/openai/gym

Google DeepMind's deep Q-network (DQN) source code

http://sites.google.com/a/deepmind.com/dqn/

ReinforcementLearningCode

https://github.com/halleanwoo/ReinforcementLearningCode

reinforcement-learning

https://github.com/dennybritz/reinforcement-learning

DQN

https://github.com/devsisters/DQN-tensorflow

DDPG

https://github.com/stevenpjg/ddpg-aigym

A3C 01

https://github.com/miyosuda/async_deep_reinforce

A3C 02

https://github.com/openai/universe-starter-agent
