Multiagent cooperation and competition with deep reinforcement learning

Paper reproduction:

tensorflow_2player_pong

Paper details

Multiagent cooperation and competition with deep reinforcement learning

[Figure: Pong game with two agents]

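  • Base model: the Pong game, played by two agents
  • Algorithm: DQN (deep Q-network)
    • Reward: the player who misses the ball (concedes) gets -1; the player who scores gets p, where p lies between -1 and 1
      While both players keep hitting the ball, both get 0 and the game continues.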
[Figure: reward scheme]
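A minimal sketch of the reward scheme above, assuming each time step ends in one of three hypothetical events ("left_scores", "right_scores" or "none"). The function name and event labels are illustrative only; they are not taken from the paper or from the tensorflow_2player_pong code.

```python
from typing import Tuple

def pong_rewards(event: str, p: float) -> Tuple[float, float]:
    """Return (left_reward, right_reward) for one time step.

    event: "left_scores", "right_scores" or "none" (ball still in play).
    p: reward for the scoring player, expected to lie in [-1, 1].
    """
    if not -1.0 <= p <= 1.0:
        raise ValueError("p must lie in [-1, 1]")
    if event == "left_scores":
        return (p, -1.0)   # right player conceded
    if event == "right_scores":
        return (-1.0, p)   # left player conceded
    if event == "none":
        return (0.0, 0.0)  # rally continues, nobody is rewarded
    raise ValueError(f"unknown event: {event!r}")
```

With p = 1 every point is zero-sum (+1 for the scorer, -1 for the other player). With p = -1 every lost ball costs both agents -1, which is the incentive behind the cooperative behavior described in the results.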
  • Training parameters
    • 50 epochs of 250,000 time steps each.
    • Exploration rate: decreases from 1.0 to 0.05 over the first 1,000,000 time steps, then stays fixed at 0.05.
[Figure: training parameters]
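The exploration schedule can be sketched as below. The notes only give the endpoints (1.0 down to 0.05 within 1,000,000 steps); the linear shape is an assumption borrowed from the usual DQN setup, and `epsilon` / `select_action` are hypothetical helper names.

```python
import random
from typing import Sequence

EPOCHS = 50
STEPS_PER_EPOCH = 250_000
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH  # 12,500,000 time steps in total

def epsilon(step: int, start: float = 1.0, end: float = 0.05,
            anneal_steps: int = 1_000_000) -> float:
    """Exploration rate at a given time step (linear decay is an assumption)."""
    if step >= anneal_steps:
        return end  # stays fixed once the annealing period is over
    return start + (step / anneal_steps) * (end - start)

def select_action(q_values: Sequence[float], step: int, rng: random.Random) -> int:
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Since annealing ends after 1,000,000 steps, the agents spend the remaining 11,500,000 of the 12,500,000 steps at the fixed rate of 0.05.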
  • Result analysis
    • Convergence: monitor the average maximal Q-value over 500 randomly selected game situations that are set aside before training begins


      [Figure: Q values]
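    • Feedback on training results:

      • Average paddle-bounces per point: how many times the ball goes back and forth between the players before one side scores.
      • Average wall-bounces per paddle-bounce: how many times the ball hits a wall before reaching one of the players.
      • Average serving time per point: how long the players take to restart the game after the ball is lost. Under some reward schemes (e.g. p = -1) the players do not want to restart the game, so the serving time becomes very long.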
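The three statistics can be computed from a per-game event log. This sketch assumes a hypothetical log of `(time_step, kind)` pairs with kinds `"serve"`, `"paddle_bounce"`, `"wall_bounce"` and `"point"`; the format and the function name are illustrative, not the ones used in the paper or the reproduction.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (time_step, kind); hypothetical log format

def game_statistics(events: List[Event]) -> Dict[str, float]:
    """Compute the three behavioral statistics from an event log."""
    paddle = sum(1 for _, kind in events if kind == "paddle_bounce")
    wall = sum(1 for _, kind in events if kind == "wall_bounce")
    points = [t for t, kind in events if kind == "point"]
    serves = [t for t, kind in events if kind == "serve"]

    # Serving time: steps from a lost ball to the next serve. Points never
    # followed by a serve (the game was not restarted) are left out here.
    waits = []
    for t_point in points:
        later = [t for t in serves if t > t_point]
        if later:
            waits.append(min(later) - t_point)

    nan = float("nan")
    return {
        "paddle_bounces_per_point": paddle / len(points) if points else nan,
        "wall_bounces_per_paddle_bounce": wall / paddle if paddle else nan,
        "serving_time_per_point": sum(waits) / len(waits) if waits else nan,
    }
```

Dropping points that are never followed by a serve keeps the average finite, but it underestimates serving time exactly in the p = -1 case where agents avoid restarting; a real evaluation would probably cap or count those waits instead.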

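The convergence check from the result-analysis list can be sketched as follows: 500 states are sampled once before training and kept fixed, and after each epoch the network's maximal Q-value is averaged over them. `q_function` stands in for the trained network; it and the other names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence

State = Sequence[float]
QFunction = Callable[[State], Sequence[float]]  # state -> one Q-value per action

def sample_holdout_states(pool: List[State], n: int = 500, seed: int = 0) -> List[State]:
    """Pick n distinct states once, before training, and keep them fixed."""
    return random.Random(seed).sample(pool, n)

def average_max_q(q_function: QFunction, holdout: List[State]) -> float:
    """Mean over the held-out states of max_a Q(s, a)."""
    return sum(max(q_function(s)) for s in holdout) / len(holdout)
```

Calling `average_max_q` on the same held-out set after every epoch gives one point per epoch; a curve that levels off is the usual sign that training has converged.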
Results

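  • p = -1: the two agents cooperate (neither wants the ball to be lost).
    Eventually both paddles move to the top of the screen and the ball is passed back and forth horizontally.
    Video of the cooperative mode (YouTube)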
    [Figure 1]
  • p = 1: the agents compete (each wants to score as many points as possible).
    Video of the competitive mode (YouTube)
    [Figure 2]
  • Varying p from -1 to 1 (from fully cooperative to fully competitive)
[Figure 3]
  • Multiplayer DQN vs. single-player DQN
    (the score is the margin by which agent A beats agent B)


    [Figure 4]

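This article is licensed under Creative Commons Attribution-NonCommercial-ShareAlike (BY-NC-SA).
When reposting, please credit the author 空空格格; first published on Jianshu.com.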
