Conference dates: February 2-7
Venue: New Orleans, USA
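The American Association for Artificial Intelligence (since 2007 officially the Association for the Advancement of Artificial Intelligence) is one of the leading academic organizations in artificial intelligence. Its annual conference (AAAI, historically The National Conference on Artificial Intelligence) is one of the field's major academic conferences.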
This year's AAAI received 3,808 paper submissions, of which 938 were accepted. The number of submissions was up 47% over last year.
Best Paper
《Memory-Augmented Monte Carlo Tree Search》
Chenjun Xiao, Jincheng Mei and Martin Müller
【Abstract】This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online real-time search. The key idea of M-MCTS is to incorporate MCTS with a memory structure, where each entry contains information of a particular state. This memory is used to generate an approximate value estimation by combining the estimations of similar states. We show that the memory based value approximation is better than the vanilla Monte Carlo estimation with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS with the same number of simulations.
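【Summary】The paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), a new way to exploit generalization in online real-time search. The key idea is to pair MCTS with a memory structure in which each entry stores information about a particular state. The memory produces an approximate value estimate by combining the estimates of similar states. The authors show that, under mild conditions, this memory-based value approximation is better than the vanilla Monte Carlo estimate with high probability. M-MCTS is evaluated in the game of Go, where it outperforms the original MCTS given the same number of simulations.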
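The idea of estimating a state's value by combining the estimates of similar states can be illustrated with a small sketch. This is not the authors' implementation: the feature vectors, the squared-distance similarity, the softmax weighting, and the names `memory_value`, `k`, and `temperature` are all assumptions made for illustration.

```python
import math


def memory_value(query, memory, k=3, temperature=1.0):
    """Approximate the value of `query` from its k most similar memory entries.

    memory: non-empty list of (feature_vector, value_estimate) pairs.
    The similarity measure and weighting below are illustrative assumptions.
    """
    if not memory:
        raise ValueError("memory must contain at least one entry")

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Keep the k entries closest to the query state.
    nearest = sorted(memory, key=lambda entry: sq_dist(query, entry[0]))[:k]

    # Softmax over negative distance: closer states get larger weights.
    logits = [-sq_dist(query, feats) / temperature for feats, _ in nearest]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)

    # Weighted average of the stored value estimates.
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / total
```

In a search loop, a function like this would be called in place of (or blended with) the plain Monte Carlo average of a node. A lower `temperature` concentrates the weight on the closest stored states.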
Best Student Paper
《Counterfactual Multi-Agent Policy Gradients》
Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
【Abstract】Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents’ policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent’s action, while keeping the other agents’ actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
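【Summary】Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems, and they call for new reinforcement learning methods that can efficiently learn decentralised policies. The paper proposes a multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. To handle multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed. Its critic representation allows this baseline to be computed efficiently in a single forward pass. COMA is evaluated on StarCraft unit micromanagement, in a decentralised variant with significant partial observability. In this setting it significantly improves average performance over other multi-agent actor-critic methods, and its best agents are competitive with state-of-the-art centralised controllers that have access to the full state.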
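The counterfactual baseline described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the critic has already produced, for one agent, a Q-value for each of that agent's possible actions with the other agents' actions held fixed. The function name `counterfactual_advantage` and its parameters are hypothetical.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[u]: critic estimate of Q(s, joint action) when this agent plays u
                 and every other agent's action stays fixed (assumed given).
    policy[u]:   this agent's probability of choosing action u.
    taken_action: index of the action the agent actually took.
    """
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    # Advantage: how much better the taken action is than the agent's average.
    return q_values[taken_action] - baseline
```

Because the baseline is the policy-weighted average of the same Q-values, the expected advantage under the agent's own policy is zero. Producing all entries of `q_values` in one critic evaluation corresponds to the single forward pass mentioned in the abstract.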