Reading Notes: Asynchronous Methods for Deep Reinforcement Learning

Tags (space-separated): reinforcement-learning-algorithms paper-notes


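The paper's contribution is an asynchronous training scheme, which it applies to Q-learning, advantage actor-critic (A2C), and related algorithms.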

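The authors apply asynchronous training (Asynchronous Methods) to four reinforcement learning algorithms: one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic (abbreviated A2C in these notes). Their experiments show that the asynchronous variant of advantage actor-critic performs best, which gives A3C (Asynchronous Advantage Actor-Critic).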

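The motivation for Asynchronous Methods is that the training data gathered during online learning is non-stationary, and consecutive samples are strongly correlated.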

The usual remedy is a replay memory. It stabilizes training and reduces the correlation between samples, but it also restricts the algorithm to off-policy methods.
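To make the replay memory idea concrete, here is a minimal sketch of such a buffer. The class name `ReplayMemory`, its methods, and the dummy transitions are assumptions made for illustration; they are not taken from the paper.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling mixes transitions from many different time steps,
        # which breaks up the correlation between consecutive samples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push((t, 0, 1.0, t + 1, False))  # dummy transitions for illustration
batch = memory.sample(2)
```

Because sampled transitions may have been generated by an older version of the policy, any algorithm that learns from this buffer has to be able to learn off-policy.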

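On-policy vs. off-policy:
on-policy: all training data is collected by the current (latest) policy, not by older ones;
off-policy: training data is collected by past policies (possibly including the latest one).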

In addition, reinforcement learning needs a large amount of experience, so using a replay memory further increases the training cost.

Therefore, to address both the strong correlation in the training data and the heavy resource use of replay memory, the authors propose Asynchronous Methods.

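The core idea of Asynchronous Methods is to run multiple actor-learners (in effect, multiple agents) in parallel, each playing its own instance of the same game. Because each instance starts from a random initial state, the agents are in different states at any given time, so the data they generate is less correlated and learning can remain on-policy.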
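The pattern described above can be sketched with plain Python threads. This is a toy illustration under stated assumptions, not the paper's implementation: `ToyEnv`, `actor_learner`, and the single shared parameter `theta` are hypothetical names, the "game" is a one-step environment with a noisy reward, and the lock-free update of the shared parameter is a simplification chosen for this sketch.

```python
import random
import threading


class ToyEnv:
    """Hypothetical stand-in for a game: one step, noisy reward around 1.0."""

    def __init__(self, seed):
        # Each environment instance has its own source of randomness.
        self.rng = random.Random(seed)

    def step(self):
        return 1.0 + self.rng.uniform(-0.5, 0.5)


def actor_learner(worker_id, shared, steps, lr):
    env = ToyEnv(seed=worker_id)         # private environment instance per worker
    for _ in range(steps):
        reward = env.step()              # collect one fresh sample of experience
        grad = shared["theta"] - reward  # gradient of 0.5 * (theta - reward) ** 2
        shared["theta"] -= lr * grad     # update the shared parameter without a lock


shared = {"theta": 0.0}                  # parameters shared by all workers
workers = [
    threading.Thread(target=actor_learner, args=(i, shared, 2000, 0.01))
    for i in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(shared["theta"])                   # ends up close to the mean reward of 1.0
```

Every update uses a sample that was just generated, so no replay buffer is needed, and the samples arriving at the shared parameter come from four independent environment instances rather than from one correlated trajectory.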

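Compared with replay memory, Asynchronous Methods have two advantages:
(1) They can be applied to on-policy algorithms.
(2) They need far less memory and can be trained on a multi-core CPU, which greatly reduces training cost.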

Quotes from the paper:

  1. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network
  2. Aggregating over memory in this way reduces non-stationarity and decorrelates updates, but at the same time limits the methods to off-policy reinforcement learning algorithms
  3. it uses more memory and computation per real interaction; and it requires off-policy learning algorithms that can update from data generated by an older policy.
  4. Instead of experience replay, we asynchronously execute multiple agents in parallel, on multiple instances of the environment.
  5. Keeping the learners on a single machine removes the communication costs of sending gradients and parameters and enables us to use.

The authors then apply Asynchronous Methods to one-step Q-learning, n-step Q-learning, and A2C.
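The n-step Q-learning and actor-critic variants both build their targets from n-step returns, accumulated backwards from a bootstrap value. The following is a minimal sketch of that computation; the function names `n_step_returns` and `advantages` and the example numbers are assumptions for illustration only.

```python
def n_step_returns(rewards, bootstrap_value, gamma):
    """Compute the discounted return for every step of a short rollout.

    bootstrap_value is 0.0 if the rollout ended in a terminal state,
    otherwise an estimate of the value of the last state reached.
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):  # walk the rollout from the last step to the first
        R = r + gamma * R
        returns.append(R)
    returns.reverse()            # restore chronological order
    return returns


def advantages(returns, values):
    """Advantage estimate: n-step return minus the critic's value estimate."""
    return [R - v for R, v in zip(returns, values)]


rets = n_step_returns([1.0, 1.0, 1.0], bootstrap_value=0.0, gamma=0.5)
advs = advantages(rets, values=[1.0, 1.0, 1.0])
print(rets)  # [1.75, 1.5, 1.0]
print(advs)  # [0.75, 0.5, 0.0]
```

In the Q-learning variant the returns serve as regression targets for Q(s, a); in the actor-critic variant the advantages weight the policy gradient while the returns are the targets for the value function.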
