Understanding The Minimax Algorithm
One of the most interesting avenues of computer science is programming a computer to play a game against a human opponent. Examples abound, the most famous being programming a computer to play chess. But no matter what the game is, the programming tends to follow an algorithm called minimax, with various attendant sub-algorithms in tow.
First, a definition: a two-player zero-sum game is one played between two players where the players play alternately, the whole game is visible to both and there’s a winner and a loser (or there’s a draw). It’s zero-sum because if the game is played for money, the loser pays the winner and overall there’s no loss of money. (A bit like energy in a reaction: no money is created or destroyed.)
One of the simplest two-player zero-sum games is noughts and crosses, where the players alternately place Xs and Os in a 3 x 3 grid, with the winner being the first player to place three of their symbol in a row, column or diagonal line. Like me, you probably played this as a child and, as you played it, you learned how to force at least a draw every time. In fact, once both players have that insight, every game is guaranteed to end in a draw. The only way to win is to play against a novice.
The algorithm
Analysing noughts and crosses with the minimax algorithm is pretty standard in game theory, so I’ll discuss a different game called Nim to illustrate minimax and its variants. Nim is interesting because it’s easily understood, fairly unfamiliar and simply modelled. Plus, there are no draws in Nim, so the whole winner/loser thing is much simpler: someone always wins. But who?
In Nim, the players face three piles of stones with, say, five stones in each pile. The players take turns, and on each turn a player removes anything from one stone to the entire pile, all from a single pile. The loser is the one who is forced to remove the final stone from the final pile, leaving all three piles empty. (Another way of looking at it is that the winner is the first player to be faced with three empty piles. This last-stone-loses rule is known as misère play.)
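These rules translate directly into code. Here is a minimal Python sketch. It assumes a position is stored as a tuple of pile sizes and a move as a (pile index, stones removed) pair. The names `legal_moves`, `apply_move` and `is_game_over` are hypothetical, chosen for illustration only.

```python
from typing import Iterator, Tuple

Piles = Tuple[int, ...]   # stones left in each pile, e.g. (5, 5, 5)
Move = Tuple[int, int]    # (pile_index, stones_to_remove); hypothetical encoding

def legal_moves(piles: Piles) -> Iterator[Move]:
    """Yield every legal move: one stone up to the whole of a single non-empty pile."""
    for i, size in enumerate(piles):
        for take in range(1, size + 1):
            yield (i, take)

def apply_move(piles: Piles, move: Move) -> Piles:
    """Return the position after the move; the original tuple is left unchanged."""
    i, take = move
    if not 1 <= take <= piles[i]:
        raise ValueError(f"cannot remove {take} stones from pile {i}")
    return piles[:i] + (piles[i] - take,) + piles[i + 1:]

def is_game_over(piles: Piles) -> bool:
    """True when every pile is empty; the player now facing the empty piles has won."""
    return all(size == 0 for size in piles)

start = (5, 5, 5)
print(len(list(legal_moves(start))))   # 15
print(apply_move(start, (0, 5)))       # (0, 5, 5)
```

Because positions are immutable tuples, each one can safely serve as a separate node in a game tree.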
For example, suppose our two players are named Max and Minnie. Max starts (he always does, not being a gentleman) and decides to remove all the stones from pile one. Minnie then removes all but two stones from pile two. Max thinks for a while, then removes all but two stones from pile three, leaving piles of zero, two and two stones. Minnie resigns, because no matter what she does, Max will win. (If she removes one stone from a pile, Max removes both stones from the other, and she’s left with the final stone. If she removes both stones from a pile, Max removes one stone from the other, leaving her with the final stone.)
Traversing nodes
Games such as Nim are modelled as game trees. You start off with the initial state of the game as a node, the root of the tree. From this node, each possible move is modelled as a link to another node, which stands in for another state or position of the game.
Figure 1: The first few levels in the noughts and crosses game tree.
So, for example, in noughts and crosses, the root node is the empty grid. Traditionally X starts, and there are three essentially different moves: the centre, a corner and the middle cell along an edge (by symmetry, every cell is equivalent to one of those three). So, the initial root node has three links to other game states. Each of those new nodes has different possible moves for O, as shown in Figure 1 above. You can imagine going further and drawing more levels.
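You can check mechanically that every cell is equivalent to the centre, a corner or an edge middle. The idea is to apply the eight rotations and reflections of the grid to each cell and see how many distinct groups result. Below is a small Python sketch. It assumes cells are (row, column) pairs numbered 0 to 2, and all helper names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]  # (row, column), each 0..2

def quarter_turn(cell: Cell) -> Cell:
    """Rotate a cell 90 degrees clockwise about the centre of the 3 x 3 grid."""
    r, c = cell
    return (c, 2 - r)

def mirror(cell: Cell) -> Cell:
    """Reflect a cell left to right."""
    r, c = cell
    return (r, 2 - c)

def equivalent_cells(cell: Cell) -> FrozenSet[Cell]:
    """Every cell reachable from `cell` by the grid's 8 rotations and reflections."""
    found = set()
    for start in (cell, mirror(cell)):
        current = start
        for _ in range(4):
            found.add(current)
            current = quarter_turn(current)
    return frozenset(found)

classes = {equivalent_cells(cell) for cell in product(range(3), repeat=2)}
print(len(classes))  # 3: the centre, the four corners, the four edge middles
```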
Nim’s tree is more complex. The initial state has 15 possible links, corresponding to removing one, two, three, four or five stones from any one of the three piles. Each of these 15 possible states of the game then has up to 14 possible links to other states for the second player (14 if the first player took a single stone, down to 10 if they emptied a pile), and so on. You can imagine that the number of game states (that is, nodes in the game tree) explodes pretty quickly.
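To see how fast the tree grows, here is a rough Python sketch that expands the three-pile Nim tree level by level and counts the nodes. It assumes positions are tuples of pile sizes. As in any tree, the same position reached by different move orders counts as separate nodes. The names `children` and `nodes_per_level` are hypothetical.

```python
from typing import List, Tuple

Piles = Tuple[int, ...]

def children(piles: Piles) -> List[Piles]:
    """All positions reachable in one move: 1..size stones from a single pile."""
    return [piles[:i] + (size - take,) + piles[i + 1:]
            for i, size in enumerate(piles)
            for take in range(1, size + 1)]

def nodes_per_level(root: Piles, depth: int) -> List[int]:
    """Count game-tree nodes on levels 0..depth.

    Identical positions reached by different move orders are separate nodes in a
    tree, so they are counted separately here.
    """
    level = [root]
    counts = [1]
    for _ in range(depth):
        level = [child for node in level for child in children(node)]
        counts.append(len(level))
    return counts

print(nodes_per_level((5, 5, 5), 3))  # [1, 15, 180, 1680]
```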
If you happened to have a big enough piece of paper, it would be possible to map out the entire game tree for the version of Nim that I described. For the leaf nodes of the tree (that is, the nodes with no links coming out of them), you would be able to identify the loser of the game for the path taken through the tree to each particular leaf. Figure 2, below, shows a particularly daft path through the tree where the players take all the stones from each pile in turn (not exactly an insightful game, but nevertheless a possible one under the rules). The loser is Max, because he takes all the stones from pile three on the third move, removing the final stone and leaving Minnie facing three empty piles.
Figure 2: An allowable but idiotic game play for Nim, resulting in Max losing.
We can assign a value to each leaf node to indicate who wins (or loses). To make sure we don’t get completely confused, we assign a monetary value from the viewpoint of the first player, Max. Let’s say the winner of the path to the leaf receives £1 and the loser has to pay out that amount – so if the winner is Max, the value of the node is £1, while if the winner is Minnie, the value is -£1 (since Max has to pay that amount to her).
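Here is a minimal Python sketch of this payout convention, which replays the daft game from Figure 2. It assumes positions are tuples of pile sizes and moves are (pile index, stones removed) pairs. The helpers `leaf_value` and `play_out` are hypothetical.

```python
from typing import List, Tuple

Piles = Tuple[int, ...]
Move = Tuple[int, int]  # (pile_index, stones_to_remove)

def leaf_value(max_to_move: bool) -> int:
    """Payout in pounds to Max at a leaf: whoever faces the empty piles has won."""
    return 1 if max_to_move else -1

def play_out(piles: Piles, moves: List[Move]) -> int:
    """Play the moves alternately, Max first, and return Max's payout at the leaf."""
    max_to_move = True
    for i, take in moves:
        if not 1 <= take <= piles[i]:
            raise ValueError(f"illegal move {(i, take)} in position {piles}")
        piles = piles[:i] + (piles[i] - take,) + piles[i + 1:]
        max_to_move = not max_to_move
    if any(piles):
        raise ValueError("the moves did not reach a leaf")
    return leaf_value(max_to_move)

# The daft game: Max empties pile one, Minnie pile two, then Max pile three.
print(play_out((5, 5, 5), [(0, 5), (1, 5), (2, 5)]))  # -1, so Max pays Minnie £1
```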
Player one
Let’s imagine that we set up the entire game tree from the viewpoint of Max, the player who makes his move first. Each game position corresponds to a node in the tree and, since the players alternate, every node on a given level of the tree belongs to the same player. So, the root of the tree is what Max is faced with at the very start of the game: five stones in each of the three piles, and 15 possible game positions to leave for Minnie. What does Max choose to play in this situation? What he should do is analyse all possible moves from the bottom up and assign a value to each node as he works his way up the tree, according to the amount he could win from that node if both players played optimally from there on.
Figure 3: A simple choice in a game tree, to calculate the minimax value of the root node.
Let’s take a look at a made-up example, shown in Figure 3 above. Here, the root node shows a game position from which Max must play. There are two possibilities: playing the left-hand option goes to a game position that he has already worked out is worth £1 to him; playing the right-hand option goes to a game position where he loses £1. (Remember, all payouts are from Max’s viewpoint.) I don’t know about you, but I’d choose the first play. This means that the current game position also has a value of £1. For every game position where it’s his turn to play, Max would choose the option that maximises his winnings. Minnie, who is just as perceptive as Max, would, of course, choose the plays that give the best result for her and ignore all the others. So she would always choose a play that maximised her winnings, which, from Max’s perspective, means minimising his.
If you had the entire tree, you could work out a value for each node working from the bottom up. If it was a ‘Max node’ (that is, Max had to play from it), it would have a value that was the maximum of its child nodes’ values. If it was a ‘Minnie node’, it would have a value that was the smallest (the minimum) of its child nodes’ values. This, in essence, is the minimax algorithm: build the tree, work out the value of each node by alternately maximising and minimising, and the value of the root is the value of the entire game for player one (Max, as we called him).
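The bottom-up rule takes only a few lines of Python. For illustration, this sketch assumes the tree has already been built as nested lists whose leaves hold Max’s payout in pounds. `minimax` here is a hypothetical helper, not a library function.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf payout for Max, or a list of subtrees

def minimax(node: Tree, max_to_move: bool) -> int:
    """Bottom-up value for Max: maximum of the children at Max nodes, minimum at Minnie nodes."""
    if isinstance(node, int):
        return node  # leaf: its payout is already known
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

# Figure 3: Max chooses between a position worth +£1 and one worth -£1.
print(minimax([1, -1], max_to_move=True))              # 1

# A made-up two-level tree: Max picks a branch, then Minnie picks a leaf.
print(minimax([[1, -1], [1, 1]], max_to_move=True))    # 1 (Max goes right)
```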
The recursive method
Instead of building the entire tree and then analysing it, a better approach is to traverse the tree recursively and calculate what you need when you need it (and discard the stuff you don’t need when you’re done). This is, in fact, a postfix (post-order) traversal, since a node’s value is known only after all its children have been evaluated. In essence, since a tree is defined recursively, you calculate the minimax value of a tree by taking the maximum (or minimum) of the minimax values of all its child subtrees. Remember that the levels alternate between maximising and minimising (sometimes you look at it from Minnie’s viewpoint instead of Max’s).
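Here is a minimal recursive sketch for the three-pile Nim described earlier. It assumes positions are tuples of pile sizes and payouts of +1 or -1 from Max’s viewpoint. Only the current path lives on the call stack, so nothing beyond it is kept in memory. The function name is hypothetical, and because there is no caching this is only practical for small positions.

```python
from typing import Tuple

Piles = Tuple[int, ...]

def minimax(piles: Piles, max_to_move: bool) -> int:
    """Minimax value for Max (+1 win, -1 loss) of a Nim position."""
    if all(size == 0 for size in piles):
        # Leaf: the player facing empty piles has won.
        return 1 if max_to_move else -1
    best = -2 if max_to_move else 2  # worse than any real payout
    for i, size in enumerate(piles):
        for take in range(1, size + 1):
            child = piles[:i] + (size - take,) + piles[i + 1:]
            value = minimax(child, not max_to_move)  # post-order: children first
            best = max(best, value) if max_to_move else min(best, value)
    return best

# The position Minnie resigned in: piles (0, 2, 2) with her to move.
print(minimax((0, 2, 2), max_to_move=False))  # 1: Max wins
```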
Figure 4, below, shows a very simplified Nim game (one pile of five stones, from which you can remove one, two or three stones each play), fully expanded into a game tree. The number inside each node is the number of stones left in the pile after the move, and the letter alongside each node is the minimax value for Max (W = win, L = lose). Note that the value of the game is L – that is, Max will always lose against a Minnie who plays well. Whatever number of stones Max takes, Minnie takes enough to bring the total for the two moves to four, leaving Max with the final stone (if you like, with five stones this simplified Nim is always a win for the second player).
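The simplified game is small enough to check by brute force. This Python sketch follows the rules used for the figure: one pile, take one to three stones, and whoever takes the last stone loses. `value` is a hypothetical name.

```python
def value(stones: int, max_to_move: bool = True) -> int:
    """Minimax value for Max (+1 = W, -1 = L) of a single pile, taking 1 to 3 stones a move."""
    if stones == 0:
        return 1 if max_to_move else -1  # the player facing an empty pile has won
    values = [value(stones - take, not max_to_move)
              for take in (1, 2, 3) if take <= stones]
    return max(values) if max_to_move else min(values)

print("W" if value(5) == 1 else "L")  # L: Max loses from five stones
```

Running it for other pile sizes shows that the player to move loses exactly when the pile holds one more than a multiple of four.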
Figure 4: The complete game tree for a simplified Nim game.
Although the minimax algorithm is always guaranteed to find the best play for Max, there is a big problem. The game tree can be huge – mind-bogglingly huge. Consider chess, the classic archetype of a two-player zero-sum game. At each game position there could be something like 30 possible moves. Since each chess game is made up of about 80 plays (40 back-and-forths), the lowest level of the tree would have something like 30^80, or roughly 10^118, nodes. (Note that in tournaments it’s rare for a game to go to checkmate – the losing player is likely to resign well before then.) As a comparison, there are around 10^80 atoms in the observable universe, meaning that, in essence, there’s no possible way for a computer to map the entire chess game tree. So what can we do?
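The size estimate is easy to check with logarithms, since log10(30^80) = 80 x log10(30). Here is a quick Python check that takes the rough figures of 30 moves per position and 80 plays per game as given.

```python
import math

moves_per_position = 30   # rough branching factor quoted for chess
plies = 80                # about 40 moves by each player

exponent = plies * math.log10(moves_per_position)   # log10 of 30 ** 80
print(f"30^80 is about 10^{exponent:.0f}")           # 30^80 is about 10^118
print(len(str(moves_per_position ** plies)))         # 119 digits
print(moves_per_position ** plies > 10 ** 80)        # True: far more than ~10^80 atoms
```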
The first optimisation is to limit the depth to which we evaluate the game tree using the minimax algorithm. Since we may not actually reach a leaf node in doing this, we make use of an approximation function, a heuristic, to estimate the value of the node or game position. Of necessity, this value is not going to be exact, but it will enable us to apply the minimax algorithm without having to evaluate all the nodes down to the leaves. The better the heuristic, the more accurate our minimax values will be, and the better the chances of devising a winning game play.
Limiting depth
In our recursive algorithm for minimax, we’ll need to limit the depth of the recursion instead of allowing the recursion to reach the leaves. The simplest way to do this is to pass a depth parameter to the recursive minimax function and decrement its value at every recursive call. When the depth reaches zero, we stop recursing and use the heuristic function to calculate the minimax value of the current game position. (If we happen to reach a genuine leaf before then, we use its exact value instead.)
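Here is a sketch of depth-limited minimax for three-pile Nim in Python. The `heuristic` function is a hypothetical placeholder that calls every unfinished position even (0.0); a real program would substitute something smarter. Genuine leaves still get their exact payout.

```python
from typing import Tuple

Piles = Tuple[int, ...]

def heuristic(piles: Piles, max_to_move: bool) -> float:
    """Hypothetical placeholder: with no real insight, call every unresolved position even."""
    return 0.0

def minimax(piles: Piles, max_to_move: bool, depth: int) -> float:
    """Depth-limited minimax: exact at real leaves, heuristic when depth runs out."""
    if all(size == 0 for size in piles):
        return 1.0 if max_to_move else -1.0   # real leaf: exact payout
    if depth == 0:
        return heuristic(piles, max_to_move)  # cut-off: estimate instead of searching
    best = float("-inf") if max_to_move else float("inf")
    for i, size in enumerate(piles):
        for take in range(1, size + 1):
            child = piles[:i] + (size - take,) + piles[i + 1:]
            value = minimax(child, not max_to_move, depth - 1)  # depth shrinks each call
            best = max(best, value) if max_to_move else min(best, value)
    return best

print(minimax((5, 5, 5), True, depth=0))   # 0.0: nothing searched, pure estimate
print(minimax((0, 2, 2), False, depth=4))  # 1.0: deep enough to reach every leaf
print(minimax((0, 2, 2), False, depth=2))  # 0.0: the cut-off hides Max's win
```

The last call shows the cost of stopping too early: at depth 2 the search cannot see that (0, 2, 2) with Minnie to move is lost for her, so it reports an even position.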
Now, the resulting minimax value at the root of the game tree is only going to be an approximation. The deeper we allow the partial minimax algorithm to go, the more accurate its value will be (because we’re more likely to find leaf nodes in our traversal), but the longer the traversal will take. We have to strike a balance between accuracy and the time taken to calculate the minimax value (and hence the move to play).
Once it’s our turn again to make a move, we should recalculate the minimax value at our new game position, making it, in effect, the root of a fresh game tree. Every move is then made after a new minimax calculation based on the current game state.
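The "new root every turn" idea can be sketched as a `best_move` function that searches afresh from whatever position the player currently faces. This Python sketch assumes a depth-limited minimax with a hypothetical placeholder heuristic, and all names are illustrative.

```python
from typing import Optional, Tuple

Piles = Tuple[int, ...]
Move = Tuple[int, int]  # (pile_index, stones_to_remove)

def heuristic(piles: Piles, max_to_move: bool) -> float:
    """Hypothetical placeholder estimate: treat every unresolved position as even."""
    return 0.0

def minimax(piles: Piles, max_to_move: bool, depth: int) -> float:
    """Depth-limited minimax value for Max (+1 win, -1 loss, heuristic at the cut-off)."""
    if not any(piles):
        return 1.0 if max_to_move else -1.0
    if depth == 0:
        return heuristic(piles, max_to_move)
    values = [minimax(piles[:i] + (size - take,) + piles[i + 1:], not max_to_move, depth - 1)
              for i, size in enumerate(piles) for take in range(1, size + 1)]
    return max(values) if max_to_move else min(values)

def best_move(piles: Piles, max_to_move: bool, depth: int) -> Tuple[float, Optional[Move]]:
    """Search afresh from the current position (the new root) and return (value, move)."""
    best_value = float("-inf") if max_to_move else float("inf")
    choice: Optional[Move] = None
    for i, size in enumerate(piles):
        for take in range(1, size + 1):
            child = piles[:i] + (size - take,) + piles[i + 1:]
            value = minimax(child, not max_to_move, depth - 1)
            if (value > best_value) if max_to_move else (value < best_value):
                best_value, choice = value, (i, take)
    return best_value, choice

# Max faces (0, 2, 5) after emptying pile one and Minnie leaving two stones in pile two.
print(best_move((0, 2, 5), max_to_move=True, depth=7))  # (1.0, (2, 3)): leave (0, 2, 2)

# A whole game in which each turn re-runs a shallow search from the current position.
piles, max_to_move = (5, 5, 5), True
while any(piles):
    _, (i, take) = best_move(piles, max_to_move, depth=3)
    piles = piles[:i] + (piles[i] - take,) + piles[i + 1:]
    max_to_move = not max_to_move
print("Max wins" if max_to_move else "Minnie wins")
```

The first call recovers Max’s winning reply from the earlier example: from (0, 2, 5) he takes three stones from pile three.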
In many chess programs that run on standard PC hardware, the depth of the minimax search is limited to some six full-width levels – around a billion possible game positions (30^6 is about 730 million). Any more than that and the time taken to analyse the game positions would be far too long to be practical. For example, analysing positions at a rate of a million per second, six full-width levels would take about a quarter of an hour.
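Here is the back-of-the-envelope arithmetic, assuming the branching factor of about 30 quoted earlier and a search speed of a million positions per second.

```python
branching = 30              # rough number of legal moves in a chess position
levels = 6                  # full-width search depth
rate = 1_000_000            # positions analysed per second (assumed)

positions = branching ** levels
seconds = positions // rate
print(positions)            # 729000000, the best part of a billion
print(seconds)              # 729 seconds, a little over 12 minutes
print(10 ** 9 // rate)      # 1000 seconds (about 17 minutes) for a round billion
```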
Alpha-beta pruning
First proposed by John McCarthy at a conference in 1956 (although only named as such later on), alpha-beta pruning is a method for cutting off whole branches of the game tree so that they don’t have to be evaluated with minimax. In essence, the algorithm maintains two extra values during the minimax recursion: alpha and beta. Alpha is the minimum score that Max is already assured of along the current path, and beta is the maximum score that Minnie is already assured of holding him to. They start out as negative infinity for alpha and positive infinity for beta. As the minimax recursion proceeds, alpha is replaced whenever a larger value is found at a Max node (ditto for beta, whenever a smaller value is calculated at a Minnie node). If they cross at any time (that is, alpha becomes greater than or equal to beta), the branch of the tree currently being investigated will never be reached with best play, because one of the players already has a better option elsewhere. The rest of that branch can therefore be ignored, or pruned. It can be shown that this pruning never changes the minimax value computed at the root (it never discards a branch that would benefit either player), and so it’s widely used in minimax implementations.
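Here is a Python sketch of alpha-beta in its common textbook form, applied to three-pile Nim. Plain minimax runs alongside it so the two can be compared. The node counters and function names are hypothetical additions for illustration. Alpha is raised only at Max nodes and beta is lowered only at Minnie nodes.

```python
import math
from typing import List, Tuple

Piles = Tuple[int, ...]

def children(piles: Piles) -> List[Piles]:
    """All positions reachable in one move: 1..size stones from a single pile."""
    return [piles[:i] + (size - take,) + piles[i + 1:]
            for i, size in enumerate(piles) for take in range(1, size + 1)]

def minimax(piles: Piles, max_to_move: bool, visited: List[int]) -> int:
    """Plain minimax, counting every node it evaluates in visited[0]."""
    visited[0] += 1
    if not any(piles):
        return 1 if max_to_move else -1
    values = [minimax(c, not max_to_move, visited) for c in children(piles)]
    return max(values) if max_to_move else min(values)

def alphabeta(piles: Piles, max_to_move: bool, alpha: float, beta: float,
              visited: List[int]) -> float:
    """Minimax with alpha-beta pruning.

    alpha: the minimum score Max is already assured of on the current path.
    beta:  the maximum score Minnie is already assured of holding Max to.
    """
    visited[0] += 1
    if not any(piles):
        return 1 if max_to_move else -1
    if max_to_move:
        value = -math.inf
        for child in children(piles):
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Minnie already has a better option elsewhere: prune
        return value
    value = math.inf
    for child in children(piles):
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break      # Max already has a better option elsewhere: prune
    return value

for start in [(1, 2, 3), (2, 2, 2), (0, 2, 5)]:
    plain, pruned = [0], [0]
    v1 = minimax(start, True, plain)
    v2 = alphabeta(start, True, -math.inf, math.inf, pruned)
    print(start, v1, v2, plain[0], pruned[0])  # same value, no more nodes visited
```

For each start position, both searches should agree on the value, while the pruned search visits only a subset of the nodes that plain minimax does.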