Reinforcement Learning 第十二周课程笔记

This week

  • watch Game Theory.
  • The readings are Littman (1994), Littman and Stone (2003), Greenwald and Hall (2003), *Munoz de Cote *and Littman (2008).
  • Assignment 10 is up.
Reinforcement Learning 第十二周课程笔记_第1张图片
Game Theory III
Reinforcement Learning 第十二周课程笔记_第2张图片
Definition of Game Theory
  • Game theory is mathematics of conflict of interests.
  • It generalizes the RL from single agent to multiple agents.
  • It is of the interest of economics, politics, sociology or even biology since these fields often deal with agents with many many agents with conflicts of interests.

Example

Reinforcement Learning 第十二周课程笔记_第3张图片
Quiz 1: simple game example
  • The example game is represented as a tree, nodes are states, edges are transitions and leafs are rewards.
  • the example game has 2 agent (a and b), all the rewards added up to be a constant ( zero-sum), no stochastic transitions. The leafs show the reward that agent a can get and b gets (-1 * reward).
  • Strategies: what action should the agent take at each state it could be in. (equivalent to policy in RL)
  • The agents are assumed to be rational.
Reinforcement Learning 第十二周课程笔记_第4张图片
Quiz 2: Represent the tree using a matix
  • this is a simple example of 2-agent zero-sum game
    • zero-sum means the reward of A and B will sum to 0 for any strategy.
  • A matrix is enough to represent a game.
  • once the matrix is here, the tree doesn't matter now.
  • need to learn to generate the matrix from the tree.

Minimax

Reinforcement Learning 第十二周课程笔记_第5张图片
MiniMax
  • A and B has the same goal, maximize their own reward ( and minimize others reward).
  • If A and B behave rationally, they will reach the same strategy.
Reinforcement Learning 第十二周课程笔记_第6张图片
Von Neumann theorem

This is important so I am writing it down:

  • In a 2-agent, zero-sum, deterministic game of perfect information, Minimax ≡ Maximin,
  • and there always exist an optimal pure strategy for each agent.
  • Based on the strategy matrix, one can always build at least one tree.

Now, to make the problem a bit more complex, we change the game to be non-deterministic:

Reinforcement Learning 第十二周课程笔记_第7张图片
quiz 3: strategy matrix for non-deterministic game
  • Introduce the chance box, so that transition is non-deterministic.
  • construct the strategy matrix based on tree ( but could not reconstruct tree based on matrix)
    • note: from the matrix, we don't know if the tree is deterministic or not.
  • the minimax theorem (Von Neumann theorem) still holds
    • Minimax ≡ Maximin
    • Optimal pure strategy exists.

Minipoker

Reinforcement Learning 第十二周课程笔记_第8张图片
Mini-Poker: description
Reinforcement Learning 第十二周课程笔记_第9张图片
Mini-Poker: Tree and Matrix
  • In the minipoker game, b will not know all the information, so it's a 2-agent, zero-sum, non-deterministic game of hidden information
  • In this case, Minimax ≠ Maximin and there will be no optimal pure strategy.
  • but there will be mixed Strategy, which is a distribution of strategies.

Mixed Strategy

Reinforcement Learning 第十二周课程笔记_第10张图片
Quiz 5: Given B's strategy, we can figure out A's expected profit
  • A's expected strategy are linear equation, which can be represented by lines.
Reinforcement Learning 第十二周课程笔记_第11张图片
Quiz 6: A's expected value is dependent on B
  • the mixed strategy should be at the intercept of the two lines in this case.
  • if the two lines both have positive slope, the mixed strategy should be at the far right; if negative slope, the strategy should be at the far left.
  • the expected value of the game is deterministic still.
  • although the strategy is mixed, there is still an expected value, in this case, the expected value is when p is 0.4, and value is 1.

Snitch

Reinforcement Learning 第十二周课程笔记_第12张图片
Prisonders' Dilemma
  • Now, we are making the game non-zero-sum.
  • The prisoner's dilemma is a 2-agent, non-zero-sum, non-deterministic game of hidden information
  • Assume the agents are rational, both of them should defect.

A Beautiful Equilibrium

Reinforcement Learning 第十二周课程笔记_第13张图片
Nash Equilibrium
  • in practice, if we are in a Nash Equilibrium, any agent will have no good reason to change strategy ( pure or mixed).
Reinforcement Learning 第十二周课程笔记_第14张图片
Theorem
  • in the n-player pure strategy game, if elimination of strictly dominated strategies eliminates all but one combination, that combination is the unique NE.
  • Any N.E. will survive elimination of strictly dominated strategies
  • if n is finite, for each set of finite strategies, then there will be at least one strategy is N.E.
Reinforcement Learning 第十二周课程笔记_第15张图片
Play the game multiple times: won't change NE
  • what if playing the game multiple times?
  • only the last game matters-> the last game will be N.E -> since the last game is known now, the last game moves to the game before it.-> the last game will be N.E. -> repeat.... ->all the game should will be N.E.

Recap

Reinforcement Learning 第十二周课程笔记_第16张图片
Recap

Andrew Moore's slides on Zero-Sum Games
Andrew Moore's slides on Non-Zero-Sum Games

2015-11-03 初稿
2016-04-26 复习并添加部分内容

你可能感兴趣的:(Reinforcement Learning 第十二周课程笔记)