Some peripheral papers in MARL (multi-agent reinforcement learning): imitation, transfer, security, etc.


Reference: https://github.com/LantaoYu/MARL-Papers

7.4.2、Inverse MARL

[1] Cooperative inverse reinforcement learning by Hadfield-Menell D, Russell S J, Abbeel P, et al. NIPS, 2016.

[2] Comparison of Multi-agent and Single-agent Inverse Learning on a Simulated Soccer Example by Lin X, Beling P A, Cogill R. arXiv, 2014.

[3] Multi-agent inverse reinforcement learning for zero-sum games by Lin X, Beling P A, Cogill R. arXiv, 2014.

[4] Multi-robot inverse reinforcement learning under occlusion with interactions by Bogert K, Doshi P. AAMAS, 2014.

[5] Multi-agent inverse reinforcement learning by Natarajan S, Kunapuli G, Judah K, et al. ICMLA, 2010.

7.4.3、Imitation MARL

[1] Coordinated Multi-Agent Imitation Learning by Le H M, Yue Y, Carr P. arXiv, 2017.

7.4.4、Multi-Task MARL

[1] Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning by Oh J, Singh S, Lee H, et al. arXiv, 2017. The basic idea is to force the embeddings of the different tasks to be as consistent as possible, which enables zero-shot generalization to unseen tasks; a rough sketch is given after this list.

[2] Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability by Omidshafiei S, Pazis J, Amato C, et al. arXiv, 2017.
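
The note on [1] above is terse; as a rough illustration only (this is not the paper's exact analogy-making objective, and names such as MultiTaskPolicy and similar_pairs are hypothetical), the PyTorch sketch below conditions a shared policy on a per-task embedding and adds a penalty that pulls the embeddings of tasks assumed to be similar toward each other:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskPolicy(nn.Module):
    def __init__(self, n_tasks, obs_dim, n_actions, emb_dim=32):
        super().__init__()
        # One learnable embedding per training task; a single policy network
        # is conditioned on it, so related tasks can share behaviour.
        self.task_emb = nn.Embedding(n_tasks, emb_dim)
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + emb_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, task_id):
        z = self.task_emb(task_id)                        # (B, emb_dim)
        return self.policy(torch.cat([obs, z], dim=-1))   # action logits

def embedding_consistency_loss(model, similar_pairs):
    # Auxiliary penalty pulling embeddings of tasks declared "similar" toward
    # each other (similar_pairs is a list of (task_i, task_j) id pairs), so an
    # unseen but related task lands in a familiar region of embedding space.
    loss = torch.zeros(())
    for i, j in similar_pairs:
        loss = loss + F.mse_loss(model.task_emb.weight[i], model.task_emb.weight[j])
    return loss / max(len(similar_pairs), 1)
```

This auxiliary term would be added to the usual RL objective with a small weight.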

7.4.5、Transfer MARL

As the name suggests, transfer learning is concerned with transferring abilities learned in one environment to another environment while keeping the cost of the transfer low. The core idea is to learn the common structures and sub-structures that may exist across different tasks, and then reuse this information in the new environment.

Schema Networks [1] considers transfer across multiple tasks that share repeatable structure and sub-structure, and can achieve zero-shot transfer. Schema Networks are implemented as probabilistic graphical models (PGMs), which provide practical inference and structure learning techniques. The input image is segmented into multiple entities, each entity is annotated with some attributes, and additional relational information is set up so that the PGM can be run; a toy sketch of the entity/attribute representation is given below. Frankly, the paper is written rather obscurely and I did not fully understand it.
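
Purely as a toy illustration of the "segment the frame into entities with attributes" preprocessing step described above (the schema learning and PGM inference that make up the actual method are not reproduced; Entity and parse_frame are hypothetical names):

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class Entity:
    # A parsed object with a position and a set of binary attributes,
    # e.g. {"is_ball": True, "moving_left": False}.
    x: int
    y: int
    attributes: Dict[str, bool] = field(default_factory=dict)

def parse_frame(frame: np.ndarray, object_value: int = 1) -> List[Entity]:
    """Toy parser: every non-background cell of a small grid becomes an entity."""
    entities = []
    for y, x in zip(*np.nonzero(frame == object_value)):
        entities.append(Entity(x=int(x), y=int(y), attributes={"present": True}))
    return entities

frame = np.zeros((4, 4), dtype=int)
frame[1, 2] = 1
print(parse_frame(frame))  # [Entity(x=2, y=1, attributes={'present': True})]
```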

Paper [2] considers the following transfer learning problem: the states S of the source and target domains can be quite different, while the action spaces A are shared and the transitions T and reward functions R have structural similarity, i.e. domain adaptation. The proposed solution: "We propose tackling both of these issues by focusing instead on learning representations which capture an underlying low-dimensional factorised representation of the world and are therefore not task or domain specific." To learn a latent representation from raw images in an unsupervised way, they adopt the β-VAE model; a minimal sketch of the β-VAE objective is given below.
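
As a minimal sketch of the representation-learning stage that DARLA relies on (the MLP encoder/decoder below are placeholders rather than the paper's convolutional architecture, and the perceptual-similarity reconstruction target is omitted), a β-VAE objective in PyTorch looks roughly like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    def __init__(self, obs_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, obs_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick
        return self.decoder(z), mu, logvar

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    # Reconstruction term plus a beta-weighted KL to the unit Gaussian prior;
    # beta > 1 pressures the latent code toward a factorised (disentangled) form.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

The RL policy is then trained on the learned latent code rather than on raw pixels, which is what makes its input not task or domain specific.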

[1] Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics by Kansky K, Silver T, Mély D A, et al. arXiv, 2017.

[2] DARLA: Improving Zero-Shot Transfer in Reinforcement Learning by Higgins I, Pal A, Rusu A A, et al. arXiv, 2017.

[3] Transfer Learning for Multiagent Reinforcement Learning Systems by da Silva, Felipe Leno, and Anna Helena Reali Costa. IJCAI, 2016.

[4] Accelerating multi-agent reinforcement learning with dynamic co-learning by Garant D, da Silva B C, Lesser V, et al. Technical report, 2015.

[5] Transfer learning in multi-agent systems through parallel transfer by Taylor, Adam, et al. ICML, 2013.

[6] Transfer learning in multi-agent reinforcement learning domains by Boutsioukis, Georgios, Ioannis Partalas, and Ioannis Vlahavas. European Workshop on Reinforcement Learning, 2011.

[7] Transfer Learning for Multi-agent Coordination by Vrancx, Peter, Yann-Michaël De Hauwere, and Ann Nowé. ICAART, 2011.

[8] Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer by Zhou L, Yang P, Chen C, et al. IEEE Transactions on Cybernetics, 2016.

7.4.6、Security

[1] Markov Security Games: Learning in Spatial Security Problems by Klima R, Tuyls K, Oliehoek F. The Learning, Inference and Control of Multi-Agent Systems workshop at NIPS, 2016.

[2] Cooperative Capture by Multi-Agent using Reinforcement Learning, Application for Security Patrol Systems by Yasuyuki S, Hirofumi O, Tadashi M, et al. Control Conference (ASCC), 2015.

[3] Improving learning and adaptation in security games by exploiting information asymmetry by He X, Dai H, Ning P. INFOCOM, 2015.

[4] Safe Reinforcement Learning via Shielding by Alshiekh M, et al. arXiv preprint arXiv:1708.08611, 2017.

