QMIX

Contents

  • Net_Structure
  • Tips
  • constraint

Net_Structure

[Image 1: QMIX]

Tips

From the reference paper:

  1. We can learn a fully centralised state-action value function Q_tot and then use it to guide the optimisation of decentralised policies in an actor-critic framework. The paper presents this as the alternative taken by COMA, not as QMIX's own approach.
  2. QMIX consists of agent networks representing each Q_a, and a mixing network that combines them into Q_tot, not as a simple sum as in VDN, but in a complex non-linear way that ensures consistency between the centralised and decentralised policies.
  3. The paper reports that non-linear mixing of agent Q-values is necessary in order to achieve consistent performance across tasks.
  4. QMIX is designed for the fully cooperative setting.
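
To make item 2 more concrete, here is a minimal sketch of a monotonic mixing network in plain Python. It is not the reference implementation: the class name `MonotonicMixer`, the helper `random_layer`, the layer sizes and the random parameters are all assumptions made for illustration, and the bias hypernetworks are simplified to single linear layers. What it does keep is the core idea: hypernetworks map the global state to the mixing weights, and taking the absolute value of those weights keeps Q_tot non-decreasing in every agent's Q_a.

```python
import math
import random


def elu(x):
    """ELU activation; it is increasing, so it does not break monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, bias, x):
    """Affine map; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Hypothetical helper: random parameters standing in for trained ones."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias


class MonotonicMixer:
    """Sketch of a QMIX-style mixer: Q_tot = f(Q_1, ..., Q_n; state)."""

    def __init__(self, n_agents, state_dim, embed_dim=4, seed=0):
        rng = random.Random(seed)
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: each maps the global state to one set of mixing parameters.
        self.hyper_w1 = random_layer(rng, n_agents * embed_dim, state_dim)
        self.hyper_b1 = random_layer(rng, embed_dim, state_dim)
        self.hyper_w2 = random_layer(rng, embed_dim, state_dim)
        self.hyper_b2 = random_layer(rng, 1, state_dim)

    def q_tot(self, agent_qs, state):
        # abs() forces the mixing weights to be non-negative; biases stay unconstrained.
        w1 = [abs(v) for v in linear(*self.hyper_w1, state)]
        b1 = linear(*self.hyper_b1, state)
        w2 = [abs(v) for v in linear(*self.hyper_w2, state)]
        b2 = linear(*self.hyper_b2, state)[0]

        hidden = []
        for j in range(self.embed_dim):
            pre = sum(agent_qs[a] * w1[a * self.embed_dim + j]
                      for a in range(self.n_agents)) + b1[j]
            hidden.append(elu(pre))
        return sum(h * w for h, w in zip(hidden, w2)) + b2


mixer = MonotonicMixer(n_agents=3, state_dim=5)
state = [0.3, -1.2, 0.7, 0.0, 2.1]           # made-up global state
low = mixer.q_tot([1.0, -0.5, 2.0], state)
high = mixer.q_tot([1.5, -0.5, 2.0], state)  # only agent 0's value is raised
print(high >= low)
```

Unlike the plain sum used by VDN, the mixing here is non-linear and depends on the state, yet raising any single Q_a can never lower Q_tot.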

constraint

[Image 2: QMIX]
The constraint shown above allows each agent to take part in decentralised execution simply by choosing greedy actions with respect to its own value function Q_a.
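
As a small check of this idea, the sketch below compares decentralised greedy action selection with a brute-force search over all joint actions. It assumes the constraint in question is the monotonicity condition from the QMIX paper (the partial derivative of Q_tot with respect to each Q_a is non-negative). The Q-value table, the toy `mix` function and its weights are made up for illustration and are not taken from the original post.

```python
import itertools
import math

# Hypothetical per-agent Q-values: q_values[a][u] is agent a's value for action u.
q_values = [
    [0.2, 1.5, -0.3],
    [2.0, 0.1, 0.4],
    [-1.0, -0.2, -0.7],
]


def elu(x):
    """Increasing non-linearity."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(agent_qs):
    """Toy monotonic mixer (an assumption, not a trained QMIX network).

    All weights on the agent values are positive and ELU is increasing,
    so Q_tot increases whenever any single Q_a increases.
    """
    q0, q1, q2 = agent_qs
    hidden1 = elu(0.5 * q0 + 2.0 * q1 + 1.0 * q2 - 1.0)
    hidden2 = elu(1.0 * q0 + 0.3 * q1 + 0.1 * q2 + 0.5)
    return 0.7 * hidden1 + 1.2 * hidden2


def greedy_decentralised(q_values):
    """Each agent picks its own best action, looking only at its own Q_a."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_values)


def greedy_centralised(q_values):
    """Brute-force argmax of Q_tot over every joint action."""
    joint_actions = itertools.product(*(range(len(q)) for q in q_values))
    return max(
        joint_actions,
        key=lambda joint: mix([q[u] for q, u in zip(q_values, joint)]),
    )


print(greedy_decentralised(q_values))  # (1, 0, 1)
print(greedy_centralised(q_values))    # (1, 0, 1)
```

The brute-force search grows exponentially with the number of agents, while the decentralised version needs only one argmax per agent. This is the practical payoff of the monotonicity constraint.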
