After a lot of hard work, two of our papers were accepted to AAAI 2020.

First paper: NCC-MARL: Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning.

  1. NCC-MARL is a general RL framework to handle large-scale multi-agent cooperative problems.
  2. We observe that it is crucial for agents to maintain consistent cognitions about their environments in order to achieve effective system-level cooperation; conversely, it is hard to imagine that agents without a consensus on their situated environments could cooperate well.
  3. NCC-MARL decomposes all agents into much smaller neighborhoods. We assume that each neighborhood has a true hidden cognitive variable, and all neighboring agents learn, via variational inference, to align their learned neighborhood-specific cognitive representations with this variable. As a result, all neighboring agents eventually form consistent neighborhood cognitions and thus achieve effective cooperation.
  4. NCC-MARL achieves much better performance than many baselines, e.g., VDN, QMIX, MADDPG, and ATT-MADDPG.

This work introduces Neighborhood Cognitive Consistency from cognitive psychology into MARL; to our knowledge, it is the first to do so. The reviewers scored it very highly, and it was ultimately accepted for oral presentation (roughly the top 4% of submissions), which is an excellent outcome.

To realize Neighborhood Cognitive Consistency, the method uses techniques such as variational autoencoders (VAE) and graph neural networks (GNN).
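As a concrete illustration of the cognition-alignment idea, here is a minimal NumPy sketch in which each agent's cognitive representation is a diagonal Gaussian and the neighborhood's shared cognition is approximated by the average of the agents' Gaussian parameters. The paper learns the true hidden cognitive variable by variational inference, so this averaging step, the loss form, and all names below are illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def kl_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def cognition_consistency_loss(mus, logvars):
    """Encourage all agents in one neighborhood to agree on a shared cognition.

    The shared neighborhood cognition is approximated here by averaging the
    agents' Gaussian parameters (an assumption for illustration); each agent
    is then pulled toward it by a KL term, which is zero iff all agents'
    cognitive representations already coincide.
    """
    mu_bar = np.mean(mus, axis=0)          # neighborhood-level mean cognition
    logvar_bar = np.mean(logvars, axis=0)
    return sum(kl_gaussian(mu, lv, mu_bar, logvar_bar)
               for mu, lv in zip(mus, logvars)) / len(mus)

# Three neighboring agents with 4-dim cognitive representations
rng = np.random.default_rng(0)
mus = rng.normal(size=(3, 4))
logvars = np.zeros((3, 4))
loss = cognition_consistency_loss(mus, logvars)
```

In a full model this loss would be added to the RL objective, with each agent's `mu`/`logvar` produced by an encoder over its local observation and its neighbors' (e.g., via a GNN).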


Second paper: Gated-ACML: Learning Agent Communication under Limited Bandwidth by Message Pruning.

  1. Gated-ACML is an RL framework for learning beneficial communication messages among multiple distributed agents (e.g., routers) under a limited-bandwidth restriction.
  2. It introduces a gating mechanism that adaptively prunes unprofitable messages, keeping the message quantity around a desired threshold.
  3. The proposed gating mechanism can prune a large fraction of messages with little impact on performance. Moreover, it is not tailored to any specific DRL architecture; that is, it is applicable to several DRL methods. To the best of our knowledge, it is the first formal method to achieve this.

The main idea is to introduce a gating mechanism to prune messages, so that the limited communication bandwidth is used to transmit the messages most beneficial to agent cooperation.

The criterion for "most beneficial to agent cooperation" is whether the associated Q-value is large enough, i.e., whether it exceeds a threshold T.
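A minimal sketch of this pruning criterion, assuming the gate simply compares each message's associated Q-value against the threshold T (the function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def gate_messages(q_values, threshold):
    """Transmit only the messages whose associated Q-value exceeds T.

    q_values[i] is the estimated Q-value associated with sending message i;
    messages scoring at or below `threshold` are pruned to save bandwidth.
    """
    return np.asarray(q_values) > threshold  # True = transmit, False = prune

# Five candidate messages scored against T = 0.3
mask = gate_messages([1.2, 0.1, 0.7, 0.25, 0.9], threshold=0.3)
```

In the actual framework the gate is a trained network producing a keep/prune decision; the hard comparison above only mirrors the threshold criterion described in the text.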

Two methods are proposed for setting the threshold T: a moving-average scheme and a fixed value.
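The two threshold-setting schemes can be sketched as follows; the momentum hyperparameter and function name are assumptions for illustration, not taken from the paper:

```python
def update_threshold_ema(threshold, observed_q, momentum=0.9):
    """Moving-average scheme: T tracks an exponential moving average of the
    observed Q-values, so the fraction of transmitted messages settles around
    a desired level even as the Q-value distribution drifts during training."""
    return momentum * threshold + (1.0 - momentum) * observed_q

# Fixed-threshold scheme: simply choose a constant T up front.
T_fixed = 0.3

# Moving-average scheme: T adapts to the stream of observed Q-values.
T = 0.0
for q in [0.5, 0.4, 0.6, 0.45]:
    T = update_threshold_ema(T, q, momentum=0.9)
```

The fixed variant is simpler but requires tuning T per task; the moving-average variant lets T follow the Q-value scale automatically.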
