Understanding the Minimax Algorithm
One of the most interesting avenues of computer science is that of programming a computer to play a game against a human opponent. Examples abound, the most famous being programming a computer to play chess. But no matter what the game is, the programming tends to follow an algorithm called minimax, with various attendant sub-algorithms in tow.
First, a definition: a two-player zero-sum game is one played between two players where the players play alternately, the whole game is visible to both and there’s a winner and a loser (or there’s a draw). It’s zero-sum because if the game is played for money, the loser pays the winner and overall there’s no loss of money. (A bit like energy in a reaction: no money is created or destroyed.)
One of the simplest two-player zero-sum games is noughts and crosses, where the players alternately place Xs and Os in a 3 x 3 grid, with the winner being the first player to place three of their symbol in a row, column or diagonal line. Like me, you probably played this as a child and, as you played it, you learned how to force a win or draw every time. In fact, once both players get that insight, every game is guaranteed to result in a draw. The only way to win is to play a novice player.
The algorithm
Analysing noughts and crosses with the minimax algorithm is pretty standard in game theory, so I’ll discuss a different game called Nim to illustrate minimax and its variants. Nim is interesting because it’s easily understood, fairly unfamiliar and simply modelled. Plus, there are no draws in Nim, so the whole winner/loser thing is much simpler: someone always wins. But who?
In Nim, the players face three piles of stones with, say, five stones in each pile. The players take turns, and on each turn a player removes from a single pile anything from one stone to the entire pile. The loser is the one who is forced to remove the final stone from the final pile, leaving all three piles empty. (Another way of looking at it is that the winner is the first player to be faced with three empty piles.)
For example, suppose our two players are named Max and Minnie. Max starts (he always does, not being a gentleman) and decides to remove all the stones from pile one. Minnie then removes all but two stones from pile two. Max thinks for a while, then removes all but two stones from pile three. Minnie resigns, because no matter what she does, Max will win. (If she removes one stone from a pile, Max removes both stones from the other, and she’s left with the final stone. If she removes both stones from a pile, Max removes one stone from the other, leaving her with the final stone.)
Traversing nodes
Games such as Nim are modelled as game trees. You start off with the initial state of the game as a node, the root of the tree. From this node, each possible move is modelled as a link to another node, which stands in for another state or position of the game.
Figure 1. The first few levels in the noughts and crosses game tree.
So, for example, in noughts and crosses, the root node is the empty grid. Traditionally X starts and there are three possible moves: the centre, a corner and the middle cell along an edge (by symmetry, every other cell is equivalent to one of those three). So, the initial root node has three links to other game states. Each of those new nodes has different possible moves for O, as shown in Figure 1 above. You can imagine going further and drawing more levels.
Nim’s tree is more complex. The initial state has 15 possible links, corresponding to removing one, two, three, four or five stones from each of the three piles. Each of these 15 possible states of the game then has up to 14 possible links to other states for the second player, and so on. You can imagine that the number of game states (that is, nodes in the game tree) explodes pretty quickly.
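To make the branching concrete, here is a minimal sketch of how the links out of a Nim position might be generated. The tuple representation and the name `legal_moves` are my own assumptions for illustration; they are not part of the original article.

```python
from typing import List, Tuple

State = Tuple[int, ...]  # assumed representation: stones left in each pile


def legal_moves(state: State) -> List[State]:
    """Return every position reachable by removing 1..n stones from one pile."""
    successors = []
    for pile, stones in enumerate(state):
        for take in range(1, stones + 1):
            successors.append(state[:pile] + (stones - take,) + state[pile + 1:])
    return successors


start = (5, 5, 5)
first_moves = legal_moves(start)
print(len(first_moves))                                # links out of the root: 15
print(max(len(legal_moves(s)) for s in first_moves))   # most replies for Minnie: 14
```

The second figure is a maximum: if Max empties a whole pile, Minnie only has ten replies.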
If you happened to have a big enough piece of paper, it would be possible to map out the entire game tree for the version of Nim that I described. For the leaf nodes of the tree (that is, the nodes with no links coming out of them), you would be able to identify the loser of the game for the path taken through the tree to each particular leaf. Figure 2, below, shows a particularly daft path through the tree where the players take all the stones from each pile in turn (not exactly an insightful game, but nevertheless a possible one under the rules). The loser is Max, because he takes all the stones from pile three in the third move.
Figure 2. An allowable but idiotic game play for Nim, resulting in Max losing.
We can assign a value to each leaf node to indicate who wins (or loses). To make sure we don’t get completely confused, we assign a monetary value from the viewpoint of the first player, Max. Let’s say the winner of the path to the leaf receives £1 and the loser has to pay out that amount. So if the winner is Max, the value of the node is £1, while if the winner is Minnie, the value is -£1 (since Max has to pay that amount to her).
Player one
Let’s imagine that we set up the entire game tree from the viewpoint of Max, the player who makes his move first. Each game position corresponds to a node in the tree, and if you think about it, a whole level of the tree will correspond to a given player. So, the root of the tree is what Max is faced with at the very start of the game: five stones in each of the three piles, and 15 possible game positions to leave for Minnie. What does Max choose to play in this situation? What he should do is analyse all possible moves from the bottom up and assign a value to each node as he works his way up the tree, according to the amount he could win on that node if he played optimally.
Figure 3. A simple choice in a game tree, to calculate the minimax value of the root node.
Let’s take a look at a made-up example, shown in Figure 3 above. Here, the root node shows a game position from which Max must play. There are two possibilities: playing the left-hand option goes to a game position that he’s already worked out means he wins £1; playing the right-hand option goes to a game position where he loses £1. (Remember, all payouts are from Max’s viewpoint.) I don’t know about you, but I’d choose the first play. This means that the current game position also has a value of £1. For every game position where it’s his turn to play, Max would choose the option that would maximise his winnings. Minnie, who is just as perceptive as Max, would, of course, choose plays that would result in the best result for her and ignore all the others. So she would always choose a play that maximised her winnings, which, from Max’s perspective, means minimising his.
If you had the entire tree, you could work out a value for each node working from the bottom up. If it was a ‘Max node’ (that is, Max had to play from it), it would have a value that was the maximum of the values of its child nodes. If it was a ‘Minnie node’ it would have a value that was the smallest (the minimum) of the values of its child nodes. This, in essence, is the minimax algorithm: build the tree, work out the value of each node by alternately maximising and minimising level by level, and the value of the root is the value of the entire game for player one (Max, as we called him).
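The bottom-up rule can be sketched on a tree that has already been built. In this illustration I assume, purely for convenience, that a tree is either a leaf payout (an integer, from Max’s viewpoint) or a list of child trees; the name `minimax_value` is likewise mine.

```python
from typing import List, Union

# Assumed representation: a leaf payout, or a list of child subtrees.
Tree = Union[int, List["Tree"]]


def minimax_value(node: Tree, max_to_move: bool) -> int:
    """Value of a node: max of the children on Max's levels, min on Minnie's."""
    if isinstance(node, int):
        return node  # leaf: payout from Max's viewpoint
    child_values = [minimax_value(child, not max_to_move) for child in node]
    return max(child_values) if max_to_move else min(child_values)


# The made-up choice from Figure 3: win £1 on the left, lose £1 on the right.
print(minimax_value([1, -1], True))  # 1

# One level deeper: Minnie picks the minimum of each pair, Max the best of those.
print(minimax_value([[1, -1], [1, 1]], True))  # 1
```

In the second tree the left-hand branch is worth -1 to Max, because Minnie would steer it to the losing leaf, so Max plays right.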
The recursive method
Instead of building the entire tree and then analysing it, the best approach is to traverse the tree recursively (a post-order traversal, in fact) and calculate what you need when you need it (and destroy the stuff you don’t need when you’re done). In essence, since a tree is defined recursively, you calculate the minimax value by calculating the maximum (or minimum) of the minimax values of all the child trees. Remember that the levels alternate between maximising and minimising (on every other level you are choosing on Minnie’s behalf instead of Max’s).
Figure 4, below, shows a very simplified Nim game (one pile of five stones, you can remove one, two or three stones each play), fully expanded into a game tree. The number inside each node is the number of stones left in the pile after the move, and the letter alongside each node is the minimax value for Max (W = win, L = lose). Note that the value of the game is L: that is, Max will always lose (if you like, this simplified Nim is always a win for the second player).
Figure 4. The complete game tree for a simplified Nim game.
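The simplified game is small enough to check with a few lines of recursion, without ever building the tree. This is only a sketch: the function name `nim_value` and the +1/-1 encoding of W and L are assumptions of mine.

```python
def nim_value(stones: int, max_to_move: bool) -> int:
    """Minimax value for Max (+1 = W, -1 = L) of the one-pile game described above."""
    if stones == 0:
        # The previous player removed the final stone and lost,
        # so whoever is to move now has won.
        return 1 if max_to_move else -1
    values = [nim_value(stones - take, not max_to_move)
              for take in (1, 2, 3) if take <= stones]
    return max(values) if max_to_move else min(values)


print(nim_value(5, True))  # -1: Max, moving first from five stones, loses
print(nim_value(4, True))  # 1: from four stones he would take three and win
```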
Although the minimax algorithm is always guaranteed to find the best play for Max, there is a big problem. The game tree can be huge, mind-bogglingly huge. Consider chess, the classic archetype of a two-player zero-sum game. At each game position there could be something like 30 possible moves. Since each chess game is made up of about 80 plays (40 back-and-forths), it would mean that the lowest level of the tree would have something like 10^118 nodes. (Note that in tournaments it’s rare for a game to go to checkmate; the losing player is likely to resign well before then.) As a comparison, there are around 10^80 atoms in the observable universe, meaning that, in essence, there’s no possible way for a computer to map the entire chess game tree. So what can we do?
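The size estimate is just 30 raised to the 80th power, which is quick to confirm. The branching factor and game length are the rough figures quoted above, not measured values.

```python
import math

branching, plies = 30, 80  # rough figures from the text
exponent = plies * math.log10(branching)
print(round(exponent))                 # 118, so about 10^118 nodes on the bottom level
print(len(str(branching ** plies)))    # 119 decimal digits in the exact count
```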
The first optimisation is to limit the depth to which we evaluate the game tree using the minimax algorithm. Since we may not actually reach a leaf node in doing this, we make use of an approximation function (a heuristic) to estimate the value of the node or game position. Of necessity, this value is not going to be exact, but it will enable us to apply the minimax algorithm without having to evaluate all the nodes down to the leaves. The better the heuristic, the more accurate our minimax values will be and the better the chances of devising a winning game play.
Limiting depth
In our recursive algorithm for minimax, we’ll need to limit the depth of the recursion instead of allowing the recursion to reach the leaves. The simplest way to do this is to pass a depth parameter to the recursive minimax function and decrement its value at every recursive call. At the lowest level of the recursion, we use the heuristic function to calculate the minimax value of the current game position.
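Here is one way the depth parameter might look for three-pile Nim. Everything named here (`legal_moves`, `limited_minimax`, `neutral_heuristic`) is hypothetical, and the heuristic is deliberately useless: it scores every unfinished position as 0, which makes it easy to see when the search has run out of depth.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # assumed representation: stones left in each pile


def legal_moves(state: State) -> List[State]:
    """Every position reachable by removing 1..n stones from a single pile."""
    return [state[:i] + (n - take,) + state[i + 1:]
            for i, n in enumerate(state) for take in range(1, n + 1)]


def limited_minimax(state: State, depth: int, max_to_move: bool,
                    heuristic: Callable[[State, bool], float]) -> float:
    moves = legal_moves(state)
    if not moves:
        # All piles empty: the previous player took the final stone and lost.
        return 1.0 if max_to_move else -1.0
    if depth == 0:
        return heuristic(state, max_to_move)  # out of depth: estimate instead
    values = [limited_minimax(child, depth - 1, not max_to_move, heuristic)
              for child in moves]
    return max(values) if max_to_move else min(values)


def neutral_heuristic(state: State, max_to_move: bool) -> float:
    """Placeholder estimate: claims to know nothing about any position."""
    return 0.0


# The position where Minnie resigned in the earlier example, Minnie to move.
print(limited_minimax((0, 2, 2), 2, False, neutral_heuristic))  # 0.0: too shallow
print(limited_minimax((0, 2, 2), 4, False, neutral_heuristic))  # 1.0: leaves reached
```

With four stones left no game can last more than four moves, so a depth of 4 reaches every leaf and returns the true value; a depth of 2 only sees the heuristic’s shrug.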
Now, the resulting minimax value at the root of the game tree is only going to be an approximation. The deeper we allow the partial minimax algorithm to go, the more accurate its value will be (because we’re more likely to find leaf nodes in our traversal), but the longer the traversal will take. We have to strike a balance between accuracy and the time taken to calculate the minimax value (and hence the move to play).
Once it’s our turn again to make a move, we should recalculate the minimax value at our new game position, making it, in effect, the root of the current state of the game. Every move would be made after a new minimax calculation based on the current game state.
In many chess programs that run on standard PC hardware, the depth of the minimax search is limited to some six full-width levels, or around a billion possible game positions. Any more than that and the time taken to analyse the game positions would be far too long to be practical. For example, analysing positions at a rate of a million per second, six full-width levels would take about a quarter of an hour.
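Those figures follow from the same rough branching factor of 30 used earlier; a quick check, assuming exactly 30 moves at every position:

```python
positions = 30 ** 6                    # bottom level of a six-level full-width search
seconds = positions / 1_000_000        # at a million positions per second
print(positions)                       # 729000000, the "around a billion"
print(f"{seconds / 60:.0f} minutes")   # 12 minutes, roughly a quarter of an hour
```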
Alpha-beta pruning
First proposed by John McCarthy at a conference in 1956 (although only named as such later on), alpha-beta pruning is a method for cutting off whole branches of the game tree so that they don’t have to be evaluated with minimax. In essence, the algorithm maintains two extra values during the minimax recursion: alpha and beta. Alpha is the minimum value Max is already assured of (the worst he can end up with, given what has been examined so far) and beta is the maximum value Minnie can hold him to (the best Max can hope for). They start out as negative infinity for alpha and positive infinity for beta. As the minimax recursion proceeds, the value for alpha is replaced when a larger minimax value is found at a Max node (ditto for beta, when a smaller value is calculated at a Minnie node). If they cross at any time (that is, alpha becomes greater than or equal to beta), the branch of the tree currently being investigated cannot affect the result, because one of the players already has a better alternative elsewhere, so the rest of it can be ignored, or pruned. It can be shown that this algorithm never prunes a branch that would change the minimax value of the root, and so it’s widely used in minimax implementations.
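A minimal sketch of the pruning rule, applied to three-pile Nim searched all the way to the leaves. The names `legal_moves` and `alphabeta` and the tuple representation are my own assumptions, and this is the textbook form of the technique rather than any particular program’s implementation.

```python
import math
from typing import List, Tuple

State = Tuple[int, ...]  # assumed representation: stones left in each pile


def legal_moves(state: State) -> List[State]:
    """Every position reachable by removing 1..n stones from a single pile."""
    return [state[:i] + (n - take,) + state[i + 1:]
            for i, n in enumerate(state) for take in range(1, n + 1)]


def alphabeta(state: State, max_to_move: bool,
              alpha: float = -math.inf, beta: float = math.inf) -> float:
    moves = legal_moves(state)
    if not moves:
        # All piles empty: the previous player took the final stone and lost.
        return 1.0 if max_to_move else -1.0
    if max_to_move:
        best = -math.inf
        for child in moves:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)   # Max is now assured of at least this much
            if alpha >= beta:
                break                  # crossed: Minnie would never allow this branch
        return best
    best = math.inf
    for child in moves:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)         # Minnie can now hold Max to at most this
        if alpha >= beta:
            break                      # crossed: Max already has something better
    return best


print(alphabeta((0, 2, 2), False))  # 1.0: Minnie to move, Max wins (she resigned)
print(alphabeta((2, 2, 2), True))   # 1.0: Max can empty one pile to reach that position
```

The values returned are the same as plain minimax would give; the only difference is that the remaining siblings are skipped as soon as alpha and beta cross.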
Reposted from: http://blog.csdn.net/cnlht/article/details/19233323