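So why on earth did I write this thing in Python? Now I'm stuck in the awkward spot where a three-ply search finishes with time to spare but a four-ply search never finishes.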
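The core is a plain minimax search, so after every candidate move we need to evaluate the resulting position. In gomoku that value depends on how many open twos, open threes and similar patterns are on the board. For the pattern along one direction at one spot I distinguish roughly ten cases, described by three parameters: number, potential, sides. They are the length of the maximal connected run, whether the run can still grow into five in a row, and the number of free ends (ends not blocked by the opponent or by the edge of the board). Each triple gets its own value, and the value of the whole position is roughly the sum of the pattern values over every position and every direction. On top of that, some combinations of local patterns create extra value, such as a double open three or a four plus an open three; scoring those makes the AI attack more sharply.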
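To make the (number, potential, sides) triple concrete, here is a small one-dimensional sketch. The function name `describe_run`, the string encoding of a line, and the "at least five cells of room" test for potential are assumptions made for this illustration; the actual program works on the 2D grid.

```python
def describe_run(line, pos, me="X", empty="."):
    """Describe the run of `me` stones through index `pos` of a 1D line.

    Returns (number, potential, sides):
      number    - length of the maximal connected run through pos
      potential - True if the run lies in a stretch of >= 5 cells
                  containing no opponent stone
      sides     - how many ends of the run are free (0, 1 or 2)
    """
    assert line[pos] == me
    last = len(line) - 1

    # Extend the run to the left and right over our own stones.
    left = pos
    while left > 0 and line[left - 1] == me:
        left -= 1
    right = pos
    while right < last and line[right + 1] == me:
        right += 1
    number = right - left + 1

    # An end is free if the next cell exists and is empty.
    sides = 0
    if left > 0 and line[left - 1] == empty:
        sides += 1
    if right < last and line[right + 1] == empty:
        sides += 1

    # Room: keep extending over cells that are empty or ours.
    lo, hi = left, right
    while lo > 0 and line[lo - 1] in (me, empty):
        lo -= 1
    while hi < last and line[hi + 1] in (me, empty):
        hi += 1
    potential = (hi - lo + 1) >= 5

    return number, potential, sides


print(describe_run("..XXX..", 3))  # an open three
print(describe_run("OXXX...", 2))  # a three blocked on one side
print(describe_run("OXXX.O.", 1))  # a three that can never become five
```

A table keyed by number and sides can then assign a score to each case, and runs without potential can simply be ignored.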
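But computing all of that for every position is far too slow. At any node of the search tree we only need the relative value of playing at different positions, so it is enough to compute the change in value caused by the move, which makes the program much faster.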
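The idea can be checked on a toy evaluation. In the sketch below the position value is the sum of a score over every maximal run of one color; the score table `SCORE`, the board size `N` and all function names are hypothetical and simpler than the scoring used in the program. Placing a stone only merges the runs on either side of it in each of the four directions, so the delta can be computed locally and must agree with a full rescan.

```python
DIRS = ((1, 0), (0, 1), (1, 1), (1, -1))
SCORE = [0, 1, 10, 100, 1000, 100000]  # hypothetical score by run length, capped at 5
N = 9                                  # small board for the demo


def run_len(grid, r, c, dr, dc, color):
    """Number of consecutive `color` stones from (r, c), walking by (dr, dc)."""
    n = 0
    while 0 <= r < N and 0 <= c < N and grid[r][c] == color:
        n += 1
        r += dr
        c += dc
    return n


def full_value(grid, color):
    """Sum SCORE over every maximal run by rescanning the whole board."""
    total = 0
    for r in range(N):
        for c in range(N):
            if grid[r][c] != color:
                continue
            for dr, dc in DIRS:
                pr, pc = r - dr, c - dc
                # Count each run once, from its first stone only.
                if 0 <= pr < N and 0 <= pc < N and grid[pr][pc] == color:
                    continue
                total += SCORE[min(run_len(grid, r, c, dr, dc, color), 5)]
    return total


def delta_value(grid, r, c, color):
    """Change in full_value if `color` plays on the empty cell (r, c)."""
    delta = 0
    for dr, dc in DIRS:
        a = run_len(grid, r + dr, c + dc, dr, dc, color)    # run ahead
        b = run_len(grid, r - dr, c - dc, -dr, -dc, color)  # run behind
        # The two old runs disappear and one merged run appears.
        delta += SCORE[min(a + b + 1, 5)] - SCORE[min(a, 5)] - SCORE[min(b, 5)]
    return delta


grid = [[0] * N for _ in range(N)]
for r, c in ((4, 3), (4, 5), (3, 4)):
    grid[r][c] = 1

before = full_value(grid, 1)
delta = delta_value(grid, 4, 4, 1)
grid[4][4] = 1
print(before + delta == full_value(grid, 1))
```

The local version looks at four short lines through one cell instead of the whole board, which is what matters when it runs at every node of the search tree.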
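As for alpha-beta pruning, there is plenty of material online. Roughly: two players A and B play against each other, A wants the total to be as large as possible and B wants it as small as possible. Then alpha means A can already guarantee a result of at least alpha, and beta means B can already guarantee a result of at most beta. So when the beta of a B decision node is less than or equal to the alpha of any ancestor node, return immediately; and when the alpha of an A decision node is greater than or equal to the beta of any ancestor node, return immediately.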
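As this shows, how well alpha-beta pruning works depends heavily on the search order: if the most promising positions are always searched first, the search gets much faster. My first estimate for each position is the gain from my own stone there plus the gain the opponent would get from a stone there, and I search in descending order of that sum.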
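The ordering step on its own can be sketched like this. The function `order_moves` and the two gain tables are hypothetical stand-ins for this example; in the program the combined estimate comes from a single evaluation call per cell.

```python
def order_moves(candidates, my_gain, opp_gain):
    """Sort candidate cells so the most promising ones are searched first.

    my_gain(cell)  - estimated gain if I play on the cell
    opp_gain(cell) - estimated gain the opponent would get by playing there
    """
    return sorted(candidates,
                  key=lambda cell: my_gain(cell) + opp_gain(cell),
                  reverse=True)


# Hypothetical estimates for three empty cells.
mine = {(7, 8): 200, (6, 6): 15, (8, 8): 2000}
theirs = {(7, 8): 2000, (6, 6): 10, (8, 8): 15}

print(order_moves(list(mine), mine.get, theirs.get))
```

A cell that is valuable to either side floats to the front, so both strong attacks and urgent blocks are tried before quiet moves, and the alpha-beta window tightens early.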
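At this point my AI could only search three plies, and it did not behave the way I expected: it just played its own game, extending in one direction right from the opening.
After a lot of thought I realized that searching three plies was the mistake. I tried two plies and it was fine. The reason is that within three plies the AI moves twice while the player moves only once, and since the values for different stone counts differ so much, the AI's best choice is always to keep attacking in order to collect a bigger gain on its second move.
My guess was that a four-ply search would not have this problem. I played a few moves with it, and that was indeed the case.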
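I saw two ways forward: optimize the program further until it can get through the fourth ply, or hand-tune things to get the best result possible.
After more than two hours of struggle I still could not get the fourth ply to run. Once there are many stones on the board, each move takes tens of seconds.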
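Then I thought: if the fourth ply is out of reach, can I get a good result without searching it? If the AI plays dumb because it wants a bigger gain on the third ply, what happens if I scale the third ply's value down? It turned out to work surprisingly well. Just changing that value turned an attack-only idiot into an AI that could hold its own against me for a long time. After dozens of games, though, I had collected a number of problems, including that it still sometimes fails to defend. These are probably all due to the fundamental lack of a fourth ply.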
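A bit of arithmetic shows both the bias and the fix. All numbers below are made up for illustration, and `line_score` and `preferred` are hypothetical helpers: a three-ply line is scored as the AI's first gain, minus the player's gain, plus the AI's second gain, with the last term optionally divided by a scale factor.

```python
def line_score(ai_first, player_reply, ai_second, last_ply_scale=1):
    """Score of a three-ply line from the AI's point of view."""
    return ai_first - player_reply + ai_second // last_ply_scale


# Hypothetical gains for two plans:
attack = (200, 2000, 2000)  # ignore the threat and keep extending
defend = (15, 200, 200)     # block first, build later


def preferred(scale):
    """Which plan scores higher when the third ply is divided by `scale`."""
    a = line_score(*attack, last_ply_scale=scale)
    d = line_score(*defend, last_ply_scale=scale)
    return ("attack" if a > d else "defend"), a, d


print(preferred(1))   # third ply at full weight
print(preferred(10))  # third ply scaled down by 10
```

At full weight the AI's big second move cancels out the player's big reply, so attacking looks better (200 against 15). With the third ply divided by 10 the player's reply dominates, and blocking wins (-165 against -1600).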
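Thinking further: can I get the effect of "as if" there were a fourth ply? Suppose the AI has made its third-ply move. In a normal search the player would then pick the highest-valued position on the fourth ply. So what if my AI simply blocks that spot on the third ply?
As it turned out, the program got better again with this in place.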
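The post does not spell out how this was wired into the search, so the following is only one possible reading, with hypothetical names (`pick_last_ply_move`, `my_gain`, `opp_gain`): on the last ply, compare the AI's best own gain with the gain the opponent would get from their best cell, and take the opponent's cell when denying it is worth more.

```python
def pick_last_ply_move(empty_cells, my_gain, opp_gain):
    """Choose the last-ply move, blocking when the opponent's best reply is bigger.

    This is an assumed interpretation of "block the spot the player would
    pick on the fourth ply", not the exact logic of the program.
    """
    best_own = max(empty_cells, key=my_gain)
    best_opp = max(empty_cells, key=opp_gain)
    if opp_gain(best_opp) > my_gain(best_own):
        return best_opp  # occupy the cell the opponent wants most
    return best_own      # otherwise play our own best move


# Hypothetical estimates for three empty cells.
mine = {(7, 8): 200, (6, 6): 15, (8, 8): 2000}
theirs = {(7, 8): 50000, (6, 6): 10, (8, 8): 15}

print(pick_last_ply_move(list(mine), mine.get, theirs.get))
```

With these numbers the opponent's best cell is worth far more than the AI's best extension, so the sketch blocks instead of extending.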
And so this project is finally finished.
import pygame as pg
import sys
size = width, height = 700, 700
lastUpperLeft = None
roundNumber = 1
mousex, mousey = 0, 0
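# Forward slashes work on Windows as well as on other platforms.
chessboard = pg.image.load('images/bg1.png')
cross = pg.image.load('images/cross1.png')
hover = pg.image.load('images/hover4.png')
# Index 1 is black and index -1 (the last element) is white, matching whichTurn.
stones = [None,
          pg.image.load('images/storn_black.png'),
          pg.image.load('images/storn_white.png')]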
whichTurn = 1 # 1: black; -1: white
gridLength = 45.55
searchLimit = 3
direction = ((1, 0), (0, 1), (1, 1), (1, -1))
stoneWidth, stoneHeight = stones[1].get_width(), stones[1].get_height()
hover = pg.transform.scale(hover, (stoneWidth, stoneHeight))
cross = pg.transform.scale(cross, (stoneWidth, stoneHeight))
grid = [[0 for col in range(15)] for row in range(15)]
gridCenters = [[(gridLength * (row + 1) - 14, gridLength * (col + 1) - 14)
                for col in range(15)] for row in range(15)]
gridUpperLeft = [[(gridLength * (row + 1) - 14 - stoneWidth / 2,
                   gridLength * (col + 1) - 14 - stoneHeight / 2)
                  for col in range(15)] for row in range(15)]

def drawMap():
    '''
    Redraw the board, all stones, the hover marker and the last-move marker
    '''
    screen.blit(chessboard, (0, 0))
    for row in range(15):
        for col in range(15):
            if grid[row][col]:
                screen.blit(stones[grid[row][col]], gridUpperLeft[row][col])
    if grid[mousex][mousey] == 0:
        screen.blit(hover, gridUpperLeft[mousex][mousey])
    if lastUpperLeft is not None:
        screen.blit(cross, lastUpperLeft)
    pg.display.flip()

def checkWin():
    '''
    Return whether the player who has just moved has won
    '''
    for k in range(4):
        for row in range(15):
            for col in range(15):
                if not grid[row][col]:
                    continue
                # Count same-colored stones in the 5-cell window starting here.
                i, j = row, col; count = 0
                for step in range(5):
                    if 0 <= i < 15 and 0 <= j < 15:
                        count += grid[i][j] == grid[row][col]
                    i += direction[k][0]
                    j += direction[k][1]
                if count == 5:
                    # Mark the five winning stones.
                    i, j = row, col
                    for step in range(5):
                        if 0 <= i < 15 and 0 <= j < 15:
                            screen.blit(cross, gridUpperLeft[i][j])
                        i += direction[k][0]
                        j += direction[k][1]
                    pg.display.flip()
                    return True
    return False

def getPosition(position):
    '''
    Find the grid point nearest to the mouse and store it in
    mousex, mousey; returns nothing
    '''
    global mousex, mousey
    for row in range(15):
        if gridCenters[row][0][0] + gridLength/2 > position[0]:
            mousex = row
            break
    for col in range(15):
        if gridCenters[0][col][1] + gridLength/2 > position[1]:
            mousey = col
            break

def mouseClick():
    '''
    The player places a stone
    '''
    global whichTurn, lastUpperLeft, roundNumber
    roundNumber += 1
    grid[mousex][mousey] = whichTurn
    lastUpperLeft = gridUpperLeft[mousex][mousey]
    whichTurn *= -1

def AImove():
    '''
    The AI places a stone
    '''
    global whichTurn, lastUpperLeft, roundNumber
    roundNumber += 1
    tmp = maxSearch(-1e18, 1e18, 1, whichTurn)
    x, y = tmp[0]
    grid[x][y] = whichTurn
    lastUpperLeft = gridUpperLeft[x][y]
    whichTurn *= -1
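
def countStones(row, col):
    '''
    Weighted count of the stones in the 5x5 neighborhood of (row, col);
    used to restrict the search to cells near existing stones
    '''
    count = 0
    for i in range(-2, 3):
        for j in range(-2, 3):
            if 0 <= row + i < 15 and 0 <= col + j < 15 and grid[row + i][col + j]:
                # Weight exactly as written in the original: 5 - |i| + |j|
                # (note that it is not symmetric in i and j).
                count += 5 - abs(i) + abs(j)
    return count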
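
def maxSearch(alpha, beta, depth, color):
    '''
    Say A wants the total to be as large as possible and B wants it as small as possible.
    Then alpha means A can already guarantee a result >= alpha,
    and beta means B can already guarantee a result <= beta.
    So when the beta of a B decision node is <= the alpha of any ancestor, return immediately;
    when the alpha of an A decision node is >= the beta of any ancestor, return immediately.
    Returns (best position, best value).
    '''
    ksj = -1e18
    bestPosition = None
    # Collect candidate cells near existing stones, most promising first.
    t = []
    for row in range(15):
        for col in range(15):
            count = countStones(row, col)
            if count > 2:
                t.append([row, col, evaluate(row, col, color, True)])
    t.sort(key = lambda s: -s[2])
    for v in t:
        row, col = v[0], v[1]
        if not grid[row][col]:
            if depth != searchLimit:
                grid[row][col] = color
                ddd = evaluate(row, col, color, True)
                # Values are deltas, so shift the window by this move's gain.
                tmp = ddd + minSearch(alpha - ddd, beta - ddd, depth + 1, -color)[1]
                grid[row][col] = 0
            else:
                # Last ply: scale the gain down so it does not dominate.
                tmp = evaluate(row, col, color) // 10
            if tmp > ksj:
                bestPosition = (row, col)
                ksj = tmp
            alpha = max(alpha, ksj)
            if alpha >= beta:
                return bestPosition, ksj
    return (bestPosition, ksj)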

def minSearch(alpha, beta, depth, color):
    '''
    Counterpart of maxSearch for the minimizing side.
    Returns (best position, best value).
    '''
    ksj = 1e18
    bestPosition = None
    # Collect candidate cells near existing stones, most promising first.
    t = []
    for row in range(15):
        for col in range(15):
            count = countStones(row, col)
            if count > 2:
                t.append([row, col, evaluate(row, col, color, True)])
    t.sort(key = lambda s: -s[2])
    for v in t:
        row, col = v[0], v[1]
        if not grid[row][col]:
            if depth != searchLimit:
                grid[row][col] = color
                ddd = -evaluate(row, col, color)
                if ddd <= -1000000000:
                    # The player has in fact already won here, so stop
                    # searching and treat tmp as negative infinity.
                    tmp = -1000000000
                else:
                    tmp = ddd + maxSearch(alpha - ddd, beta - ddd, depth + 1, -color)[1]
                grid[row][col] = 0
            else:
                tmp = -evaluate(row, col, color, True)
            if tmp < ksj:
                bestPosition = (row, col)
                ksj = tmp
            beta = min(beta, ksj)
            if alpha >= beta:
                return bestPosition, ksj
    return (bestPosition, ksj)

def getLine(row, col, k, color):
    '''
    For a stone placed at (row, col), return information about direction k:
    (has potential, number of connected stones, number of free ends,
     whether one more stone could make four)
    '''
    # Walk forward along direction k.
    i, j = row + direction[k][0], col + direction[k][1]
    count = 1; empty1, empty2 = 0, 0; sides = 0
    while 0 <= i < 15 and 0 <= j < 15 and grid[i][j] == color:
        count += 1
        i += direction[k][0]
        j += direction[k][1]
    if 0 <= i < 15 and 0 <= j < 15 and grid[i][j] == 0:
        i += direction[k][0]
        j += direction[k][1]
        sides += 1
    # Room beyond the free end: cells that are empty or our own.
    while 0 <= i < 15 and 0 <= j < 15 and (grid[i][j] == 0 or grid[i][j] == color):
        empty1 += 1
        i += direction[k][0]
        j += direction[k][1]
    # Walk backward along direction k.
    i, j = row - direction[k][0], col - direction[k][1]
    while 0 <= i < 15 and 0 <= j < 15 and grid[i][j] == color:
        count += 1
        i -= direction[k][0]
        j -= direction[k][1]
    if 0 <= i < 15 and 0 <= j < 15 and grid[i][j] == 0:
        i -= direction[k][0]
        j -= direction[k][1]
        sides += 1
    while 0 <= i < 15 and 0 <= j < 15 and (grid[i][j] == 0 or grid[i][j] == color):
        empty2 += 1
        i -= direction[k][0]
        j -= direction[k][1]
    nearFour = count + empty1 >= 3 or count + empty2 >= 3
    return (empty1 + empty2 + count + sides >= 5, count, sides, nearFour)
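
def evaluate(row, col, color, general = False):
    '''
    Compute the change in the value of the whole position when a stone
    is placed at (row, col).
    **Note that this is a delta.**
    A return value >= 1e9 means five in a row has been made, and the
    search should stop immediately.
    With general=True, the value the opponent would gain by playing on
    this cell is added as well.
    '''
    # patternCount[c] counts the directions whose pattern has class c
    # (the second element of each stateValue entry).
    patternCount = [0] * 6; nearFour = 0
    # Small bonus for cells closer to the center of the board.
    deltaValue = 14 - abs(row - 7) - abs(col - 7)
    for k in range(4):
        tmp = getLine(row, col, k, color)
        if tmp[0]:
            ksj = value(tmp[1], tmp[2])
            deltaValue += ksj[0]
            patternCount[ksj[1]] += 1
            nearFour += tmp[3]
    # Bonuses for combinations of patterns.
    if patternCount[3] and patternCount[4] or patternCount[3] and nearFour or nearFour > 1:
        deltaValue += 400000
    if patternCount[3] >= 2:
        deltaValue += 20000
    if general:
        # Same computation for the opponent's color, weighted slightly lower.
        patternCount = [0] * 6; nearFour = 0
        for k in range(4):
            tmp = getLine(row, col, k, -color)
            if tmp[0]:
                ksj = value(tmp[1], tmp[2])
                deltaValue += ksj[0] // 1.1
                patternCount[ksj[1]] += 1
                nearFour += tmp[3]
        if patternCount[3] and patternCount[4] or patternCount[3] and nearFour or nearFour > 1:
            deltaValue += 400000
        if patternCount[3] >= 2:
            deltaValue += 20000
    return deltaValue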
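
# stateValue[number - 1][sides - 1] = (value, pattern class)
stateValue = [
    [(2, 0), (15, 0)],
    [(10, 0), (200, 0)],
    [(100, 0), (2000, 3)],
    [(1600, 4), (50000, 4)],
    [(1000000000, 5), (1000000000, 5)]
]

def value(number, sides):
    '''
    Compute the value of a local pattern.
    Only called when the pattern has the potential to become five in a row.
    '''
    return stateValue[min(number-1, 4)][sides - 1]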

if __name__ == '__main__':
    pg.init()
    hasWin = False
    clock = pg.time.Clock()
    screen = pg.display.set_mode(size)
    # The AI plays the first move
    grid[7][7] = whichTurn
    lastUpperLeft = gridUpperLeft[7][7]
    whichTurn *= -1
    drawMap()
    while True:
        clock.tick(60)
        for event in pg.event.get():
            if event.type == pg.QUIT:
                # Shut pygame down before exiting; code after sys.exit() never runs.
                pg.quit()
                sys.exit()
            if not hasWin:
                if event.type == pg.MOUSEBUTTONDOWN:
                    getPosition(event.pos)
                    if not grid[mousex][mousey]:
                        mouseClick()
                        drawMap()
                        if checkWin():
                            print('YOU WIN')
                            hasWin = True
                            break
                        AImove()
                        drawMap()
                        if checkWin():
                            print('YOU LOSE')
                            hasWin = True
                if event.type == pg.MOUSEMOTION:
                    getPosition(event.pos)
                    drawMap()
            if hasWin and event.type == pg.KEYDOWN:
                pg.quit()
                sys.exit()