Notes on "Mastering the game of Go without human knowledge"

Mastering the game of Go without human knowledge

Authors: David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis

Abstract

Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules.

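A long-standing goal of artificial intelligence is an algorithm that learns superhuman proficiency in challenging domains tabula rasa, that is, starting from a blank slate. Recently, AlphaGo became the first program to defeat a world champion at Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These networks were trained by supervised learning on human expert moves and by reinforcement learning from self-play. Here the authors introduce an algorithm based solely on reinforcement learning, with no human data, guidance or domain knowledge beyond the rules of the game. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This network improves the strength of the tree search, which yields higher-quality move selection and stronger self-play in the next iteration. Starting tabula rasa, the new program AlphaGo Zero reached superhuman performance and won 100:0 against the previously published, champion-defeating AlphaGo.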
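
The "own teacher" idea above can be made concrete with the training objective. The sketch below is a minimal illustration, not the authors' code: it assumes the per-position loss given in the paper's Methods, a squared error between the game outcome z and the predicted value v plus a cross-entropy between the search-derived move probabilities pi and the network's move probabilities p. The weight-regularisation term is omitted, and the function name and the `eps` guard are hypothetical.

```python
import math

def alphago_zero_style_loss(p, v, pi, z, eps=1e-12):
    """Loss for a single training position (regularisation term omitted).

    p   : move probabilities predicted by the network
    v   : value predicted by the network, in [-1, 1]
    pi  : move probabilities derived from the tree search (policy target)
    z   : final game outcome from the current player's view, +1 or -1
    eps : hypothetical small constant that keeps log() away from zero
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    return value_loss + policy_loss

# The search prefers move 0 and the game was won, but the network is unsure.
loss_unsure = alphago_zero_style_loss(p=[0.5, 0.25, 0.25], v=0.0, pi=[1.0, 0.0, 0.0], z=1)

# A network that already agrees with the search and the outcome.
loss_matched = alphago_zero_style_loss(p=[1.0, 0.0, 0.0], v=1.0, pi=[1.0, 0.0, 0.0], z=1)
```

The loss shrinks as the network's move probabilities approach the search result and its value estimate approaches the actual winner, which is the sense in which the search acts as the teacher.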

Contribution:
- Much progress towards artificial intelligence has been made using supervised learning systems that are trained to replicate the decisions of human experts.
- By contrast, reinforcement learning systems are trained from their own experience, in principle allowing them to exceed human capabilities, and to operate in domains where human expertise is lacking.
- Recently, there has been rapid progress towards this goal, using deep neural networks trained by reinforcement learning.

-->AlphaGo_Fan
-->AlphaGo_Lee
-->AlphaGo_Zero

Our program, AlphaGo Zero, differs from AlphaGo Fan and AlphaGo Lee in several important aspects:
- First and foremost, it is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data.
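
As a rough picture of what "self-play starting from random play, without human data" produces, the sketch below generates training examples from a toy game. Everything in it is an assumption for illustration: Go is replaced by a tiny Nim variant, and the move selection that AlphaGo Zero performs with network-guided tree search is replaced by a uniformly random policy. The function name `self_play_game` is hypothetical.

```python
import random

def self_play_game(pile=7, rng=None):
    """Play one game of toy Nim against itself with a uniformly random policy.

    Players alternate removing 1 or 2 stones; whoever takes the last stone wins.
    Returns a list of (stones_left, player_to_move, z) examples, where z is +1
    if player_to_move went on to win that game and -1 otherwise.
    """
    rng = rng or random.Random()
    history = []
    player = 0
    winner = None
    while pile > 0:
        history.append((pile, player))
        legal_moves = [m for m in (1, 2) if m <= pile]
        pile -= rng.choice(legal_moves)
        winner = player          # the last player to move takes the final stone
        player = 1 - player
    # Label every visited position with the outcome of the game it came from.
    return [(stones, mover, 1 if mover == winner else -1) for stones, mover in history]

examples = self_play_game(pile=7, rng=random.Random(0))
```

No human game records are involved: the only supervision signal is the outcome z of games the program played against itself, which is then attached to every position from that game.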
