AI 笔记 Week 01-02 Game Playing

  • Course schedule
  • syllabus

Week01学习内容:

  1. Game Playing through Depth-limited Search;
  2. review/learn Python; play each other in Isolation;
  3. Chapter 1-2; Chapter 5.0-5.2
  4. R&N slides on Game Playing

Week 01 任务清单:

  • Introduction on Piazza
  • Start-of-Course Survey

Book used in the class Artificial Intelligence: A Modern Approach (3rd edition) by Russell and Norvig (R&N).
下载链接一下载链接二(expired)


Lesson 1: Game Playing

AI 笔记 Week 01-02 Game Playing_第1张图片
Intro
AI 笔记 Week 01-02 Game Playing_第2张图片
Couse outline
AI 笔记 Week 01-02 Game Playing_第3张图片
Goal of the lesson
  • to build an AI to play the games such as "Isolation" - deterministic games
  • apply minimax algorithm to play games and use alpha beta pruning to speed up the minimax algorithm
  • with some modification, "expectimax" algorithm, we can also build AI's to play games "have an element of chance."

Challenging Quiz

AI 笔记 Week 01-02 Game Playing_第4张图片
Quiz 1: mark the branches of the tree that can be pruned out using probabilistic alpha-beta pruning, i.e., do not need to be explored any further.

The Isolation Game

  • The game has two players: x and o. The players take turns, with player x moving first at the beginning of each game.
  • Player x starts at position (1,1) while o starts at (8,8).
  • a player can move like a queen in chess (in any of the eight directions) as long as her path does not cross a square already filled in or occupied but the other.
  • After moving, the space vacated by the player is designated as filled and cannot be moved to again. Notice that only the space that was occupied is filled, not the entire path.
  • The game ends when one player can no longer move, leaving the other player as the winner.

Building a Game Tree

AI 笔记 Week 01-02 Game Playing_第5张图片
game tree
  • Game tree illustrates all the possible steps of the "isolation" game.
  • by examing the tree, we can identify branches that lead to a "lose" situation and warn the computer player.
  • the entire tree is here: Isolation game tree with leaf values

MiniMax

AI 笔记 Week 01-02 Game Playing_第6张图片
Open book
  • the win (+1) and lose (-1) leaves indicated the result of each branch of the tree
  • The question becomes how to figure out which first-move is the best automatically
  • Minimax algorithm: computer player (AI) wanted to maximize the score to win, and opponent wanted to minimize the score to win.
  • start at the maximizing level
  • to determine wins or losses of a game, we need to Propagating Values Up The Tree.
  • the win or lose is determined by who will make a choice at the level and how the win/loss situation is under the node.
  • at the max node, choose the max value and at min level, choose the min value.

quiz

AI 笔记 Week 01-02 Game Playing_第7张图片
quiz 2
  • from top-down, the second level (max level), the AI has the move so it can decide which branch to go down. Although there are two branches can lead to loss, the AI can avoid them to go to the right branch which guarantees a win.
  • here is the whole tree with every node marked, download to see

Optional Reading: Artificial Intelligence - A Modern Approach (AIMA): Chapter 5.1-5.2

AI 笔记 Week 01-02 Game Playing_第8张图片
Minimax
  • s is state or note of the game tree
  • the minimax value of the leaf nodes are known
  • when the node is max, the minimax value of the node is the max of all the values down below
  • when the node is min, the minimax value of the node is the smallest of all its subnodes.
  • the decision of actions at each stage (minimax decision) depends on the minimax algorithm

The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b^m). The space complexity is O(bm) for an algorithm that generates all actions at once or O(m) for an algorithm that generates actions one at a time.

Multiplayer, non-zero sum games are more complicated

for each state, its minimax value becomes a vector which contains the value for all the players
there will be dynamically forming alliances, for example, A and B play against C.
collaboration can also occur with just two players in a non-zero sum game.


Max Number of Nodes Visited

  • on a 5x5 isolation board, the number of possible nodes is bounded to 25!, but it's no big deal. see Quiz below: In the third move, move to which square will result in maximum options for next move?
AI 笔记 Week 01-02 Game Playing_第9张图片
Quiz 3
  • my answer is the middle square. it has eight directions to go in the next move.
  • the maximum possible number of moves from the state is 16.

the branching factor

  • the branching factor indicates how many options for each move in the isolation game.
AI 笔记 Week 01-02 Game Playing_第10张图片
Quiz 4
  • I guessed this one and got it right. My rationale:
    • branching factor (b) is roughly the number of possible move options in each level of moves. there are d levels, so the number of nodes that the minimax should visit is about b^d. b*d and d^2 are underestimated. d^b is over estimated.

an average branching factor can be calculated through game simulation.

AI 笔记 Week 01-02 Game Playing_第11张图片
average branching factor
  • the average b = 8. but still, there are 8^25 nodes to visit, which might take millions of year.
  • The goal is, to not let the human player wait for more than 2 seconds.
  • Say if the computer can visit 10^9 nodes in a second, so we need 2 seconds. based on this we can calculate the maximum depth of the game.
  • 2*10^9 = 8^x; x < 10.3; here, x is the depth of the search.
  • this is depth limited search: limit the depth of the search so it can finish within a deadline.

Now the question becomes "at level 9, how can we tell which node is good or bad" based on the level 10 nodes.

  • an evaluation function will be needed. one way the evaluation function can be constructed to count the possible moves left for the computer player(count-my-move).

Quize

AI 笔记 Week 01-02 Game Playing_第12张图片
Quize
  • Solution method: count the possible moves. Max level: select maximum of the lower level nodes; Min level: select the minimum of the lower level nodes
AI 笔记 Week 01-02 Game Playing_第13张图片
Quiz: testing the evalutation function with level 3 tree
  • Click to open: Isolation Game Tree - Level 3
  • do the calculation for the whole tree.
AI 笔记 Week 01-02 Game Playing_第14张图片
Quize: test evaluation question with level 2 of the tree

The results do not agree with the results of level 3 tree. Which means when to search for level 2 and level 3, the evaluation function will give a different answer.

  • It does not mean the evaluation function does not work
  • it might mean the search is not deep enough.
  • we can go deep to reach the point that when deeper in levels did not change the result.

A problem

  • we try to avoid going deeper because the number of nodes increases exponentially as we go deep.
  • Quiescence search is an algorithm typically used to evaluate minimax game trees in game-playing computer programs. It is a remedy for the horizon problem faced by AI engines for various games like chess and Go.
  • Trick: interactive deepening.

interactive deepening.

  • the problem: return the best answer before time run out (2 seconds).
  • Steps:
    1. search level 1, record the best answer, answer 1.
    2. if time is up, return answer 1, if not, search level 2 and get the best answer.
    3. check time, if time is up, return level 2 answer, if not, deepen the search to next level ...

As we deepen the search, the number of nodes on the tree and the interactive deepening nodes increase

AI 笔记 Week 01-02 Game Playing_第15张图片
Quize: Exponential B
  • when b = 2, n = 2 (d+1) - 1
  • when b = 3, n = (3 (d+1) - 1)/ 2
  • when b = k, n = (k (d+1) - 1)/ (k -1)

interactive deepening on Isolation game

  1. branching factor of the Isolation game varies
  2. at the beginning of the game, branching factor (B) is large, and searching needs more time.
  3. at the end of the game B is smaller, and searching is shallow.

Horizon Effect

I did not get it. But it seems leading to the discussion of evaluation functions.

AI 笔记 Week 01-02 Game Playing_第16张图片
Quiz: Good Evaluation Functions
  • the answer was chosen based on that it is more likely to be positively correlated with the chance of winning
  • the new evaluation question can lead to the correct move for this board. Evaluation function should represent a good winning strategy

Alpha-Beta Pruning

Alpha-Beta Pruning allows the AI to skip a section of the game tree but still get the same answer of the minimax algorithm. [More efficient than minimax]

AI 笔记 Week 01-02 Game Playing_第17张图片
minimax
AI 笔记 Week 01-02 Game Playing_第18张图片
image.png
AI 笔记 Week 01-02 Game Playing_第19张图片
Alpha-Beta Pruning Quiz 1
AI 笔记 Week 01-02 Game Playing_第20张图片
Alpha-Beta Pruning Quiz 2

Searching Complex Games

Reading AIMA: Chapter 5.3-5.4 [Done, did not track time]

Solving 5x5 Isolation

Some strategies for the 5x5 game: open game search table; check for reflections to minimize the open game; occupy center; reflex move; minimax and alpha-beta pruning.

3-Player Games

AI 笔记 Week 01-02 Game Playing_第21张图片
3-Player Games
  • minimax doesn't apply for a 3-player game, instead, we evaluate each node based on which players move at each move.
  • the end node will be a vector of utility for each player.
  • max-N algorithm
AI 笔记 Week 01-02 Game Playing_第22张图片
3-Player Games quize

3-Player Alpha-Beta Pruning

  • Question: can alpha-beta pruning work for 3-player games?
  • Answer: as long as the game utility has a lower-bound and individual utility has an upper-bound, then yes.

E.g., shallow pruning and Immediate pruning are possible, but not deep pruning (why?)

AI 笔记 Week 01-02 Game Playing_第23张图片
alpha-beta pruning
AI 笔记 Week 01-02 Game Playing_第24张图片
3 player pruning
  • from the most left nodes, we get (3,3,3) for move 2 based on player 2's preference
  • Thus, player one should have a value which is >= 3.
  • in the middle branch, after evaluating the most left node, player 2 will update its value when the value is >=7, and make value for player 1 to be <= 2, but player 1 already had a better solution, then there is no need to look for the other 2 nodes in the middle branch.
  • same logic will determine the last node can be pruned.

Probabilistic games

One type of probabilistic games is that each action might lead to two or more states with a certain probability. Sloppy isolation is an example:

Sloppy Isolation

AI 笔记 Week 01-02 Game Playing_第25张图片
Sloppy Isolation
AI 笔记 Week 01-02 Game Playing_第26张图片
Sloppy isolation with constrain

In this version of isolation, the player might overshoot or undershoot with a 10% chance for each error, and the player will have 80% chance meet the goal.

This can be complicated as the player's moves were limited by available of squares. E.g., sometimes there will be no chance of undershoot or overshoot (see pic above). In this case, the change of hitting the target will be 90%, and the chance of overshoot/undershoot will still be 10%.

expectedMax

AI 笔记 Week 01-02 Game Playing_第27张图片
expectedMax

When populating the game tree, we need to generate branch and leaves based on a probability of hit or miss the target.

Pruning might still work.

AI 笔记 Week 01-02 Game Playing_第28张图片
Expectimax Alpha-Beta Pruning

Further Reading

Peter Norvig and Sebastian Thrun’s lectures on Game Playing and Game Theory.


Games

Games type and techniques
  • St: stochastic games: MDP is for ST
  • PO: partially observable games: POMDP is for partially observable environment
  • U: Unknown games: RL deals with Unknown environment
  • CL: Computational limitations: A* search, h functions, and Monte Carlo focus on computational limitations.
  • A: Adversaries: there are more than one players play against each other.
Quize: match games with features

Single Player Game

Single player game
  • to solve a single player game, the player needs to explore the game tree until finding the goal state, and then backup to find the path to the goal state.
  • On the left of the screen, parameters to describe a game are listed. state, player, action, transition function, utility function, terminal state checker, etc.

Two Player Game

Chess as an example
  • Two player adversary games can be solved by MiniMax algorithm while player Max will maximize its utility, and player Min will minimize Max's utility.
  • The tree search will go all the way down to the terminal state. Utility function will define the utilities of terminal states. When the search reaches the terminal, the value of the path will back-up to the start state. the player will choose the path based on its utility in each state.
AI 笔记 Week 01-02 Game Playing_第29张图片
Function and pseudo code for two player games
AI 笔记 Week 01-02 Game Playing_第30张图片
Quiz: Time Complexity Question
  • b is branching factor, m is the depth of the game tree.
AI 笔记 Week 01-02 Game Playing_第31张图片
Space Complexity Question
  • the space complexity is better than the time complexity.
AI 笔记 Week 01-02 Game Playing_第32张图片
Complexity of the Chess game
  • given b=30, m= 40, and we can use all the computers in the world, search through the tree of a chess game will take forever.
AI 笔记 Week 01-02 Game Playing_第33张图片
Complexity reduction
  • to deal with complexity problem, we need to try to reduce b, m and the structure of the game representation.
AI 笔记 Week 01-02 Game Playing_第34张图片
Queston review of miniMax
  • fill in the utility for each node.

reduce B

AI 笔记 Week 01-02 Game Playing_第35张图片
Pruning
  • the first path in the middle node can define the value of the second node in the min level will be less than 2, which can never be large than 3, so no matter that value of the two terminal nodes will be, the results of the top node will not change. Thus, the player doesn't need to visit the two nodes. Pruning means to ignore some nodes to reduce the branching factor b.
AI 笔记 Week 01-02 Game Playing_第36张图片
Qize: which node can be pruned?
  • the 8 does not need to be visited since the upper node needs to be less than 1 to be updated, which is smaller than the current best answer for the top node.

Reduce M

AI 笔记 Week 01-02 Game Playing_第37张图片
Reduce M
  • Cut off the search at a certain depth. Since we did not reach terminal yet, we can not use utility function. Instead, we can use evaluation function.
  • Here, the evaluation function will be the keep to win the game. and this method is the Alpha-Beta pruning.
  • Alpha-Beta pruning can reduce the game from a O(bm) problem to a O(bm/2) problem if the tree nodes are ordered perfectly.
AI 笔记 Week 01-02 Game Playing_第38张图片
Updating the pseudo code for the value function
AI 笔记 Week 01-02 Game Playing_第39张图片
Updating the pseudo code for maxValue function
AI 笔记 Week 01-02 Game Playing_第40张图片
Horizon Effect
  • Cutting off depth of search can suffer from Horizon Effect
  • When in the certain cutting off the depth of search preventing the player to see the real winning path, it can give the wrong answer. Unless the searching depth gets deeper, the player will never reach the node which can decide the winning path.

Dealing with stochastics

AI 笔记 Week 01-02 Game Playing_第41张图片
Dealing with Stochastic Games
  • using the expected-value function for the chance node.
AI 笔记 Week 01-02 Game Playing_第42张图片
Game tree of example of a Stochastics game
AI 笔记 Week 01-02 Game Playing_第43张图片
Quiz: fill out the values for each node
AI 笔记 Week 01-02 Game Playing_第44张图片
possible end note of the game if players play rationally
  • The game could start at the left or the right branch. if starting at 5, player max will choose b, if start at 3, it will choose A...
AI 笔记 Week 01-02 Game Playing_第45张图片
Evaluation Function 1
AI 笔记 Week 01-02 Game Playing_第46张图片
Evaluation Function 2
  • The above two quiz showed how evaluation question can affect the player's choice.

Conclusion

AI 笔记 Week 01-02 Game Playing_第47张图片
components of solution of games
  • Minimax algorithm uses maxValue() and minValue() to find out the value of a state.
  • if the state is the terminal state, then its value is defined by the utility function.
  • to reduce the complexity of the problem, we can cut-off the search at a depth level and use Evaluation function to determine the value of the states.
  • When a state is stochastic, it's value is determined by a expValue() function.
  • alpha and beta will be used for pruning the tree.

Example codes for Games, or Adversarial Search. (Chapters 6)


2017-09-10 First draft, took 6 hours
To do: all the readings in the section. Notes of the reading should be done.

你可能感兴趣的:(AI 笔记 Week 01-02 Game Playing)