Learning and understanding the AlphaZero algorithm
Related study materials: https://github.com/chiefzzs/alphago_learnning/
Reference: https://github.com/junxiaosong/AlphaZero_Gomoku
No. | Step | Code |
---|---|---|
1 | Understand the game-playing process | Code reference |
2 | Understand the game-playing algorithm | |
No. | L1 step | L2 step | L3 step | L4 step |
---|---|---|---|---|
1 | Initialization | | | |
2 | Game play | | | |
2.1 | | Position analysis | | |
2.1.1 | | | Monte Carlo play | |
Define the board
Define the game
Define the players
Bind the players to the board
```python
import pickle

# modules from https://github.com/junxiaosong/AlphaZero_Gomoku
from game import Board, Game
from mcts_alphaZero import MCTSPlayer
from policy_value_net_numpy import PolicyValueNetNumpy
from human_play import Human

# Variable definitions
n = 5
width, height = 8, 8
model_file = 'best_policy_8_8_5.model'
# Initialize the board
board = Board(width=width, height=height, n_in_row=n)
game = Game(board)
# ############### human VS AI ###################
# load the trained policy_value_net in either Theano/Lasagne, PyTorch or TensorFlow
# best_policy = PolicyValueNet(width, height, model_file = model_file)
# mcts_player = MCTSPlayer(best_policy.policy_value_fn, c_puct=5, n_playout=400)
# load the provided model (trained in Theano/Lasagne) into a MCTS player written in pure numpy
# Load the policy: PolicyValueNetNumpy expects the unpickled parameters, not the file name
with open(model_file, 'rb') as f:
    policy_param = pickle.load(f, encoding='bytes')  # the model was pickled under Python 2
best_policy = PolicyValueNetNumpy(width, height, policy_param)
# Initialize player 1 with the AI policy
mcts_player1 = MCTSPlayer(best_policy.policy_value_fn,
                          c_puct=5,
                          n_playout=400)  # set larger n_playout for better performance
# Initialize player 2 with the same AI policy
mcts_player2 = MCTSPlayer(best_policy.policy_value_fn,
                          c_puct=5,
                          n_playout=400)
# uncomment the following line to play with pure MCTS (it's much weaker even with a larger n_playout)
# mcts_player = MCTS_Pure(c_puct=5, n_playout=1000)
# human player, input your move in the format: 2,3
human = Human()
# Bind the players to the board
player1 = mcts_player1
player2 = mcts_player2
start_player = 0
is_shown = 1
game.board.init_board(start_player)
p1, p2 = game.board.players
player1.set_player_ind(p1)
player2.set_player_ind(p2)
players = {p1: player1, p2: player2}
```
Board definition, excerpt from `game.py` in the reference repo:

```python
import numpy as np


class Board(object):
    """board for the game"""

    def __init__(self, **kwargs):
        self.width = int(kwargs.get('width', 8))
        self.height = int(kwargs.get('height', 8))
        # board states stored as a dict,
        # key: move as location on the board,
        # value: player as pieces type
        self.states = {}
        # need how many pieces in a row to win
        self.n_in_row = int(kwargs.get('n_in_row', 5))
        self.players = [1, 2]  # player1 and player2

    # init_board() (omitted) sets self.current_player and self.last_move

    def current_state(self):
        """return the board state from the perspective of the current player.
        state shape: 4*height*width
        """
        square_state = np.zeros((4, self.height, self.width))
        if self.states:
            moves, players = np.array(list(zip(*self.states.items())))
            move_curr = moves[players == self.current_player]
            move_oppo = moves[players != self.current_player]
            # a move index maps to (row, col) = (move // width, move % width)
            square_state[0][move_curr // self.width,
                            move_curr % self.width] = 1.0
            square_state[1][move_oppo // self.width,
                            move_oppo % self.width] = 1.0
            # indicate the last move location
            square_state[2][self.last_move // self.width,
                            self.last_move % self.width] = 1.0
        if len(self.states) % 2 == 0:
            square_state[3][:, :] = 1.0  # indicate the colour to play
        return square_state[:, ::-1, :]  # flip rows vertically
```
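To see what the four planes hold, here is a minimal standalone sketch that mirrors `current_state` on a plain dict instead of the `Board` class. The function name `encode_state` is hypothetical; it assumes moves are numbered row-major (`move = row * width + col`), as the repo's board does.

```python
import numpy as np


def encode_state(states, current_player, last_move, width, height):
    """Hypothetical helper mirroring Board.current_state.

    states: dict mapping move index -> player id (1 or 2).
    Returns an array of shape (4, height, width).
    """
    planes = np.zeros((4, height, width))
    for move, player in states.items():
        row, col = divmod(move, width)  # row-major move numbering
        if player == current_player:
            planes[0, row, col] = 1.0  # stones of the player to move
        else:
            planes[1, row, col] = 1.0  # opponent stones
    if states:
        row, col = divmod(last_move, width)
        planes[2, row, col] = 1.0  # the last move played
    if len(states) % 2 == 0:
        planes[3, :, :] = 1.0  # even stone count: the starting player is to move
    return planes[:, ::-1, :]  # flip rows, as current_state does


# Two stones on a 3x3 board: player 1 at move 4 (centre), player 2 at move 0.
state = encode_state({4: 1, 0: 2}, current_player=1, last_move=0, width=3, height=3)
print(state.shape)
```

Because of the final row flip, move 0 (top-left in move numbering) ends up in the last row of each plane.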
From the current position, get all candidate moves `acts` and their recommended probabilities `probs`:
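```python
import numpy as np

## Compute the move probabilities
current_player = game.board.get_current_player()
player_in_turn = players[current_player]
board = game.board
temp = 1e-3  # temperature: close to 0 means "pick the most-visited move"
return_prob = 0
sensible_moves = board.availables
move_probs = np.zeros(board.width * board.height)
## Run MCTS from the current position and get the probability of each move
acts, probs = player_in_turn.mcts.get_move_probs(board, temp)
print(acts)
print(probs)
```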
Output: the tuple of candidate moves, followed by the recommended probability of each position. With `temp=1e-3` the distribution is essentially one-hot: move 28 gets probability 1.0 and every other move is effectively 0.

```
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63)
[0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 5.91871068e-107
 1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 1.53059365e-139
 2.92327048e-039 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000]
```
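How `get_move_probs` computes these probabilities:
1. Run the Monte Carlo playouts.
2. Count how many times each action (each child of the root) was visited in the Monte Carlo tree.
3. Normalize the visit counts with a softmax of `log(visits) / temp` to get the probabilities.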
```python
import copy

import numpy as np
from mcts_alphaZero import softmax

state = board
temp = 1e-3
# Monte Carlo playouts, each one on a copy of the board
for n in range(player_in_turn.mcts._n_playout):
    state_copy = copy.deepcopy(state)
    player_in_turn.mcts._playout(state_copy)
# calc the move probabilities based on visit counts at the root node
act_visits = [(act, node._n_visits)
              for act, node in player_in_turn.mcts._root._children.items()]
acts, visits = zip(*act_visits)
# Normalize: softmax(log(N) / temp), i.e. probabilities proportional to N**(1/temp)
act_probs = softmax(1.0 / temp * np.log(np.array(visits) + 1e-10))
print("---------acts----------")
print(acts)
print(visits)
print(act_probs)
```
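Since softmax(log(N)/temp) equals N^(1/temp) / Σ N^(1/temp), the temperature controls how sharply the visit counts are turned into probabilities. A small standalone sketch: `softmax` is written to match the one in the repo's `mcts_alphaZero.py`, `visit_probs` is a hypothetical name, and the visit counts are those of the four most-visited root moves in the tree dump below (acts 27, 28, 35, 36).

```python
import numpy as np


def softmax(x):
    # Subtract the max before exp for numerical stability.
    probs = np.exp(x - np.max(x))
    probs /= np.sum(probs)
    return probs


def visit_probs(visits, temp):
    # Same formula as the code above: softmax(log(N) / temp).
    return softmax(1.0 / temp * np.log(np.array(visits) + 1e-10))


acts = (27, 28, 35, 36)
visits = (157, 214, 164, 206)

print(visit_probs(visits, temp=1.0))   # proportional to the visit counts
print(visit_probs(visits, temp=1e-3))  # almost one-hot on act 28
```

With `temp=1e-3` the result is effectively "take the most-visited move", which is why the output earlier puts probability 1.0 on a single position; a temperature of 1 keeps the probabilities proportional to the visit counts.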
Current position

Monte Carlo tree representation: using mermaid
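```python
def printNode(node, level=0, act=0):
    # Skip rarely visited nodes to keep the printout short
    if node._n_visits < 2:
        return
    indent = ' ' * level  # one space per tree level
    print("%s, act=%d,_n_visits=%d ,_Q=%f ,_u=%f ,_P=%f "
          % (indent, act, node._n_visits, node._Q, node._u, node._P))
    for act, child in node._children.items():
        printNode(child, level + 1, act)


prn_obj(game.board)  # prn_obj: the author's board-printing helper, not defined in this snippet
printNode(player_in_turn.mcts._root)
```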
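As a hypothetical sketch of the mermaid representation mentioned above, the same recursion as `printNode` can emit mermaid flowchart text instead of indented lines. `Node` is a minimal stand-in for `TreeNode` and `to_mermaid` is an illustrative name; the example uses two root children copied from the dump plus a made-up one-visit child to show the pruning.

```python
class Node:
    """Minimal stand-in for TreeNode, with only the fields the printer reads."""

    def __init__(self, n_visits, q, children=None):
        self._n_visits = n_visits
        self._Q = q
        self._children = children or {}


def to_mermaid(root, min_visits=2):
    """Render a search tree as mermaid flowchart text (hypothetical helper)."""
    lines = ["graph TD", '    n0["root N=%d"]' % root._n_visits]
    counter = [0]  # mutable counter so the nested function can create unique ids

    def walk(node, node_id):
        for act, child in node._children.items():
            if child._n_visits < min_visits:
                continue  # same pruning rule as printNode
            counter[0] += 1
            child_id = "n%d" % counter[0]
            lines.append('    %s -->|act=%d| %s["N=%d, Q=%.3f"]'
                         % (node_id, act, child_id, child._n_visits, child._Q))
            walk(child, child_id)

    walk(root, "n0")
    return "\n".join(lines)


root = Node(800, -0.455971, {27: Node(157, 0.476385),
                             28: Node(214, 0.512162),
                             5: Node(1, 0.0)})
print(to_mermaid(root))
```

The printed text can be pasted into any mermaid renderer; the one-visit child is skipped, exactly as `printNode` skips it.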
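Monte Carlo tree representation 1: indentation. `printNode` prints one node per line and skips nodes with fewer than 2 visits. In each line, `act` is the move (board index) leading to the node (the root shows the default `act=0`), `_n_visits` its visit count, `_Q` its mean value, `_u` its exploration bonus, and `_P` its prior probability. The root shows 800 visits, consistent with the 400 playouts of `get_move_probs` plus the 400 of the unrolled loop above, all run on the same tree.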
, act=0,_n_visits=800 ,_Q=-0.455971 ,_u=0.000000 ,_P=1.000000
, act=18,_n_visits=8 ,_Q=-0.185447 ,_u=0.781758 ,_P=0.049782
, act=27,_n_visits=3 ,_Q=-0.035337 ,_u=1.092187 ,_P=0.330246
, act=36,_n_visits=2 ,_Q=0.365520 ,_u=0.683974 ,_P=0.155111
, act=19,_n_visits=4 ,_Q=0.219712 ,_u=0.339472 ,_P=0.012010
, act=28,_n_visits=3 ,_Q=-0.214190 ,_u=1.373586 ,_P=0.475824
, act=27,_n_visits=2 ,_Q=0.232468 ,_u=1.274235 ,_P=0.360408
, act=20,_n_visits=2 ,_Q=0.080553 ,_u=0.501082 ,_P=0.010636
, act=21,_n_visits=12 ,_Q=0.058928 ,_u=0.458884 ,_P=0.042209
, act=27,_n_visits=2 ,_Q=-0.063028 ,_u=0.487490 ,_P=0.088190
, act=28,_n_visits=5 ,_Q=0.071194 ,_u=1.033736 ,_P=0.311683
, act=29,_n_visits=2 ,_Q=-0.497347 ,_u=1.085420 ,_P=0.217084
, act=35,_n_visits=2 ,_Q=-0.243935 ,_u=0.904106 ,_P=0.163559
, act=36,_n_visits=2 ,_Q=-0.152409 ,_u=0.580145 ,_P=0.104952
, act=26,_n_visits=2 ,_Q=-0.069493 ,_u=0.454502 ,_P=0.009647
, act=27,_n_visits=157 ,_Q=0.476385 ,_u=0.151632 ,_P=0.168441
, act=8,_n_visits=2 ,_Q=-0.329893 ,_u=0.081594 ,_P=0.003920
, act=9,_n_visits=2 ,_Q=-0.358136 ,_u=0.039543 ,_P=0.001900
, act=11,_n_visits=2 ,_Q=-0.359925 ,_u=0.206617 ,_P=0.009926
, act=18,_n_visits=6 ,_Q=-0.410891 ,_u=0.412434 ,_P=0.046230
, act=28,_n_visits=2 ,_Q=0.277199 ,_u=0.784231 ,_P=0.210431
, act=35,_n_visits=3 ,_Q=0.493989 ,_u=1.036746 ,_P=0.278188
, act=19,_n_visits=2 ,_Q=-0.307951 ,_u=1.972537 ,_P=0.557918
, act=20,_n_visits=18 ,_Q=-0.186914 ,_u=0.173711 ,_P=0.052850
, act=18,_n_visits=2 ,_Q=-0.205378 ,_u=0.827569 ,_P=0.120429
, act=19,_n_visits=2 ,_Q=0.284847 ,_u=0.473225 ,_P=0.068864
, act=28,_n_visits=8 ,_Q=0.161790 ,_u=0.918767 ,_P=0.401101
, act=29,_n_visits=6 ,_Q=0.063473 ,_u=1.567840 ,_P=0.711105
, act=26,_n_visits=2 ,_Q=-0.019465 ,_u=0.769820 ,_P=0.206564
, act=35,_n_visits=2 ,_Q=-0.016241 ,_u=0.788532 ,_P=0.141057
, act=36,_n_visits=4 ,_Q=0.281107 ,_u=1.072277 ,_P=0.208052
, act=18,_n_visits=2 ,_Q=0.176588 ,_u=1.850940 ,_P=0.427456
, act=21,_n_visits=2 ,_Q=-0.390710 ,_u=0.288127 ,_P=0.013841
, act=28,_n_visits=3 ,_Q=-0.636589 ,_u=0.534831 ,_P=0.034257
, act=36,_n_visits=2 ,_Q=0.785911 ,_u=1.563539 ,_P=0.442236
, act=29,_n_visits=4 ,_Q=-0.255149 ,_u=0.262146 ,_P=0.020988
, act=36,_n_visits=3 ,_Q=0.326441 ,_u=1.556929 ,_P=0.539336
, act=34,_n_visits=3 ,_Q=-0.541608 ,_u=0.435908 ,_P=0.027920
, act=35,_n_visits=2 ,_Q=0.636979 ,_u=1.690728 ,_P=0.478210
, act=35,_n_visits=3 ,_Q=-0.782964 ,_u=0.598944 ,_P=0.038363
, act=36,_n_visits=2 ,_Q=0.930136 ,_u=1.706458 ,_P=0.482659
, act=36,_n_visits=50 ,_Q=-0.550742 ,_u=0.570920 ,_P=0.457102
, act=20,_n_visits=2 ,_Q=-0.129428 ,_u=0.938021 ,_P=0.080402
, act=28,_n_visits=21 ,_Q=0.608117 ,_u=0.551983 ,_P=0.331190
, act=26,_n_visits=2 ,_Q=-0.587251 ,_u=0.623344 ,_P=0.083630
, act=29,_n_visits=16 ,_Q=-0.562059 ,_u=1.026563 ,_P=0.734549
, act=22,_n_visits=3 ,_Q=0.722939 ,_u=0.781887 ,_P=0.161506
, act=26,_n_visits=2 ,_Q=0.218605 ,_u=0.925751 ,_P=0.143417
, act=35,_n_visits=3 ,_Q=0.465863 ,_u=0.908840 ,_P=0.140797
, act=43,_n_visits=2 ,_Q=-0.373646 ,_u=2.861129 ,_P=0.809249
, act=43,_n_visits=7 ,_Q=0.671767 ,_u=0.728113 ,_P=0.300797
, act=35,_n_visits=6 ,_Q=-0.626132 ,_u=1.928775 ,_P=0.944903
, act=34,_n_visits=3 ,_Q=0.464121 ,_u=1.722878 ,_P=0.462297
, act=37,_n_visits=2 ,_Q=0.737424 ,_u=1.495213 ,_P=0.401208
, act=34,_n_visits=2 ,_Q=-0.332387 ,_u=0.668018 ,_P=0.057259
, act=35,_n_visits=24 ,_Q=0.645149 ,_u=0.532131 ,_P=0.380093
, act=19,_n_visits=2 ,_Q=-0.766992 ,_u=0.776434 ,_P=0.097139
, act=43,_n_visits=18 ,_Q=-0.593014 ,_u=0.916555 ,_P=0.688014
, act=19,_n_visits=2 ,_Q=0.045573 ,_u=1.172094 ,_P=0.170565
, act=28,_n_visits=2 ,_Q=0.529674 ,_u=0.760434 ,_P=0.110659
, act=29,_n_visits=10 ,_Q=0.654451 ,_u=0.819301 ,_P=0.397420
, act=28,_n_visits=9 ,_Q=-0.627666 ,_u=1.530749 ,_P=0.918450
, act=20,_n_visits=4 ,_Q=0.731742 ,_u=1.332107 ,_P=0.376777
, act=44,_n_visits=2 ,_Q=-0.581877 ,_u=1.257006 ,_P=0.435440
, act=44,_n_visits=4 ,_Q=0.441063 ,_u=1.321673 ,_P=0.467282
, act=50,_n_visits=3 ,_Q=0.718254 ,_u=0.629334 ,_P=0.122109
, act=43,_n_visits=5 ,_Q=-0.260930 ,_u=0.182409 ,_P=0.017525
, act=36,_n_visits=4 ,_Q=0.319099 ,_u=1.258602 ,_P=0.503441
, act=45,_n_visits=2 ,_Q=-0.302862 ,_u=1.650853 ,_P=0.381248
, act=45,_n_visits=3 ,_Q=-0.542640 ,_u=0.482945 ,_P=0.030933
, act=48,_n_visits=3 ,_Q=-0.135777 ,_u=0.084086 ,_P=0.005386
, act=36,_n_visits=2 ,_Q=0.275592 ,_u=1.335555 ,_P=0.377752
, act=50,_n_visits=2 ,_Q=-0.183745 ,_u=0.191733 ,_P=0.009211
, act=28,_n_visits=214 ,_Q=0.512162 ,_u=0.117028 ,_P=0.178026
, act=1,_n_visits=5 ,_Q=-0.287701 ,_u=0.037544 ,_P=0.003087
, act=35,_n_visits=2 ,_Q=0.163045 ,_u=1.257606 ,_P=0.377282
, act=3,_n_visits=3 ,_Q=-0.422566 ,_u=0.093003 ,_P=0.005098
, act=8,_n_visits=10 ,_Q=-0.202820 ,_u=0.048012 ,_P=0.006579
, act=27,_n_visits=2 ,_Q=0.606195 ,_u=0.639651 ,_P=0.127930
, act=35,_n_visits=5 ,_Q=0.127078 ,_u=1.208807 ,_P=0.402936
, act=21,_n_visits=2 ,_Q=0.063265 ,_u=1.296277 ,_P=0.388883
, act=42,_n_visits=2 ,_Q=-0.172860 ,_u=2.039363 ,_P=0.407873
, act=36,_n_visits=2 ,_Q=0.097307 ,_u=1.167656 ,_P=0.233531
, act=9,_n_visits=2 ,_Q=-0.346548 ,_u=0.037019 ,_P=0.001522
, act=10,_n_visits=2 ,_Q=-0.433851 ,_u=0.255563 ,_P=0.010507
, act=11,_n_visits=2 ,_Q=-0.374231 ,_u=0.124851 ,_P=0.005133
, act=18,_n_visits=2 ,_Q=-0.712887 ,_u=0.443869 ,_P=0.018248
, act=19,_n_visits=28 ,_Q=-0.300480 ,_u=0.143432 ,_P=0.057001
, act=27,_n_visits=14 ,_Q=0.413202 ,_u=0.638426 ,_P=0.368595
, act=26,_n_visits=12 ,_Q=-0.385210 ,_u=1.204515 ,_P=0.801773
, act=29,_n_visits=2 ,_Q=0.265790 ,_u=1.032788 ,_P=0.186838
, act=36,_n_visits=6 ,_Q=0.831043 ,_u=0.488590 ,_P=0.206242
, act=18,_n_visits=2 ,_Q=-0.890770 ,_u=1.737816 ,_P=0.310870
, act=35,_n_visits=7 ,_Q=0.180327 ,_u=0.846025 ,_P=0.260508
, act=21,_n_visits=5 ,_Q=-0.089641 ,_u=1.223282 ,_P=0.499403
, act=20,_n_visits=4 ,_Q=0.156589 ,_u=1.601920 ,_P=0.640768
, act=12,_n_visits=2 ,_Q=0.145149 ,_u=1.547143 ,_P=0.357297
, act=36,_n_visits=4 ,_Q=0.213771 ,_u=0.899944 ,_P=0.138555
, act=20,_n_visits=3 ,_Q=-0.082635 ,_u=2.230487 ,_P=0.772664
, act=21,_n_visits=2 ,_Q=0.478807 ,_u=2.768993 ,_P=0.783189
, act=20,_n_visits=2 ,_Q=-0.560294 ,_u=0.121234 ,_P=0.004984
, act=21,_n_visits=13 ,_Q=-0.450823 ,_u=0.283983 ,_P=0.054483
, act=27,_n_visits=5 ,_Q=0.489777 ,_u=0.725693 ,_P=0.251387
, act=29,_n_visits=2 ,_Q=-0.434157 ,_u=1.302774 ,_P=0.390832
, act=29,_n_visits=2 ,_Q=0.302444 ,_u=0.606600 ,_P=0.105066
, act=36,_n_visits=4 ,_Q=0.293393 ,_u=0.756540 ,_P=0.218394
, act=20,_n_visits=3 ,_Q=-0.155795 ,_u=1.584278 ,_P=0.548810
, act=19,_n_visits=2 ,_Q=0.140650 ,_u=2.828891 ,_P=0.800131
, act=25,_n_visits=2 ,_Q=-0.494314 ,_u=0.161321 ,_P=0.006632
, act=26,_n_visits=9 ,_Q=-0.425081 ,_u=0.208361 ,_P=0.028553
, act=35,_n_visits=6 ,_Q=0.408029 ,_u=1.119694 ,_P=0.475046
, act=42,_n_visits=4 ,_Q=-0.441664 ,_u=1.310199 ,_P=0.468751
, act=34,_n_visits=3 ,_Q=0.442974 ,_u=1.686404 ,_P=0.584187
, act=36,_n_visits=2 ,_Q=0.645280 ,_u=0.665816 ,_P=0.141241
, act=27,_n_visits=5 ,_Q=-0.697772 ,_u=0.488608 ,_P=0.040175
, act=35,_n_visits=4 ,_Q=0.717066 ,_u=1.542901 ,_P=0.617160
, act=30,_n_visits=2 ,_Q=-0.413724 ,_u=0.246762 ,_P=0.010145
, act=34,_n_visits=2 ,_Q=-0.452803 ,_u=0.202389 ,_P=0.008320
, act=35,_n_visits=48 ,_Q=-0.735712 ,_u=0.580984 ,_P=0.390122
, act=27,_n_visits=13 ,_Q=0.675958 ,_u=0.703514 ,_P=0.266807
, act=26,_n_visits=10 ,_Q=-0.641721 ,_u=1.374282 ,_P=0.793442
, act=36,_n_visits=3 ,_Q=0.730219 ,_u=0.695421 ,_P=0.185446
, act=44,_n_visits=2 ,_Q=-0.606733 ,_u=2.647418 ,_P=0.748803
, act=44,_n_visits=4 ,_Q=0.858133 ,_u=0.787659 ,_P=0.210042
, act=36,_n_visits=3 ,_Q=-0.866214 ,_u=2.786685 ,_P=0.965336
, act=37,_n_visits=2 ,_Q=0.903987 ,_u=1.975933 ,_P=0.558878
, act=36,_n_visits=32 ,_Q=0.826806 ,_u=0.532382 ,_P=0.512529
, act=20,_n_visits=2 ,_Q=-0.641988 ,_u=0.404906 ,_P=0.043634
, act=27,_n_visits=2 ,_Q=-0.975444 ,_u=0.969482 ,_P=0.104474
, act=44,_n_visits=20 ,_Q=-0.815586 ,_u=0.925425 ,_P=0.698087
, act=26,_n_visits=14 ,_Q=0.939248 ,_u=0.677325 ,_P=0.466167
, act=27,_n_visits=13 ,_Q=-0.934725 ,_u=1.299563 ,_P=0.937129
, act=19,_n_visits=4 ,_Q=0.865042 ,_u=1.078254 ,_P=0.311265
, act=43,_n_visits=8 ,_Q=0.966543 ,_u=1.230521 ,_P=0.568353
, act=19,_n_visits=3 ,_Q=-0.976368 ,_u=1.523390 ,_P=0.345472
, act=11,_n_visits=2 ,_Q=0.973991 ,_u=1.236814 ,_P=0.349824
, act=27,_n_visits=2 ,_Q=0.585449 ,_u=0.740996 ,_P=0.101998
, act=53,_n_visits=2 ,_Q=0.813205 ,_u=0.963240 ,_P=0.088393
, act=37,_n_visits=2 ,_Q=-0.046088 ,_u=0.743118 ,_P=0.065037
, act=36,_n_visits=4 ,_Q=-0.755416 ,_u=0.545887 ,_P=0.037404
, act=35,_n_visits=3 ,_Q=0.859492 ,_u=2.014102 ,_P=0.697705
, act=37,_n_visits=4 ,_Q=-0.692301 ,_u=0.508317 ,_P=0.034829
, act=36,_n_visits=2 ,_Q=0.968557 ,_u=1.086428 ,_P=0.376350
, act=42,_n_visits=4 ,_Q=-0.808254 ,_u=0.558774 ,_P=0.038287
, act=27,_n_visits=2 ,_Q=0.864962 ,_u=1.170260 ,_P=0.270260
, act=43,_n_visits=2 ,_Q=-0.519138 ,_u=0.217227 ,_P=0.008930
, act=44,_n_visits=4 ,_Q=-0.471863 ,_u=0.319050 ,_P=0.021861
, act=35,_n_visits=2 ,_Q=0.249568 ,_u=1.123565 ,_P=0.389214
, act=45,_n_visits=3 ,_Q=-0.568997 ,_u=0.347658 ,_P=0.019057
, act=35,_n_visits=2 ,_Q=0.515761 ,_u=1.418398 ,_P=0.401184
, act=48,_n_visits=3 ,_Q=-0.259834 ,_u=0.099201 ,_P=0.005438
, act=49,_n_visits=2 ,_Q=-0.330507 ,_u=0.054329 ,_P=0.002234
, act=50,_n_visits=2 ,_Q=-0.509679 ,_u=0.127728 ,_P=0.005251
, act=51,_n_visits=2 ,_Q=-0.628834 ,_u=0.259326 ,_P=0.010661
, act=57,_n_visits=10 ,_Q=-0.194946 ,_u=0.027902 ,_P=0.004206
, act=35,_n_visits=5 ,_Q=0.339731 ,_u=1.078280 ,_P=0.359427
, act=21,_n_visits=2 ,_Q=-0.286942 ,_u=1.430482 ,_P=0.429145
, act=42,_n_visits=2 ,_Q=-0.439994 ,_u=1.555351 ,_P=0.311070
, act=36,_n_visits=3 ,_Q=0.068489 ,_u=1.160878 ,_P=0.309567
, act=29,_n_visits=4 ,_Q=0.298291 ,_u=0.298917 ,_P=0.010575
, act=36,_n_visits=2 ,_Q=-0.540858 ,_u=1.094615 ,_P=0.379186
, act=34,_n_visits=3 ,_Q=0.237141 ,_u=0.261359 ,_P=0.007397
, act=35,_n_visits=164 ,_Q=0.471148 ,_u=0.157738 ,_P=0.184153
, act=6,_n_visits=3 ,_Q=-0.268087 ,_u=0.037671 ,_P=0.002361
, act=8,_n_visits=2 ,_Q=-0.168531 ,_u=0.111599 ,_P=0.005245
, act=18,_n_visits=2 ,_Q=-0.412313 ,_u=0.360184 ,_P=0.016927
, act=19,_n_visits=7 ,_Q=-0.269322 ,_u=0.156382 ,_P=0.019598
, act=28,_n_visits=3 ,_Q=-0.013959 ,_u=1.428502 ,_P=0.466547
, act=21,_n_visits=2 ,_Q=0.118545 ,_u=1.938271 ,_P=0.548226
, act=36,_n_visits=3 ,_Q=0.723727 ,_u=0.795652 ,_P=0.194894
, act=20,_n_visits=2 ,_Q=-0.453386 ,_u=0.192174 ,_P=0.009031
, act=21,_n_visits=9 ,_Q=-0.226283 ,_u=0.156564 ,_P=0.024526
, act=27,_n_visits=5 ,_Q=0.488635 ,_u=0.656950 ,_P=0.232267
, act=19,_n_visits=4 ,_Q=-0.544839 ,_u=1.526171 ,_P=0.610468
, act=20,_n_visits=3 ,_Q=0.707047 ,_u=2.385997 ,_P=0.826534
, act=36,_n_visits=3 ,_Q=-0.167797 ,_u=1.099131 ,_P=0.310881
, act=37,_n_visits=2 ,_Q=0.369261 ,_u=1.709329 ,_P=0.483471
, act=26,_n_visits=5 ,_Q=-0.488121 ,_u=0.416311 ,_P=0.039130
, act=27,_n_visits=3 ,_Q=0.620914 ,_u=1.154112 ,_P=0.346234
, act=19,_n_visits=2 ,_Q=-0.628788 ,_u=2.815251 ,_P=0.796273
, act=27,_n_visits=4 ,_Q=-0.551821 ,_u=0.368254 ,_P=0.028844
, act=28,_n_visits=3 ,_Q=0.679648 ,_u=1.764206 ,_P=0.611139
, act=28,_n_visits=51 ,_Q=-0.643413 ,_u=0.614193 ,_P=0.500316
, act=26,_n_visits=2 ,_Q=-0.046136 ,_u=0.528276 ,_P=0.044826
, act=27,_n_visits=22 ,_Q=0.642367 ,_u=0.573347 ,_P=0.372984
, act=19,_n_visits=19 ,_Q=-0.616728 ,_u=0.968660 ,_P=0.803240
, act=10,_n_visits=4 ,_Q=0.458661 ,_u=0.965780 ,_P=0.182109
, act=37,_n_visits=11 ,_Q=0.803694 ,_u=0.679286 ,_P=0.384262
, act=36,_n_visits=10 ,_Q=-0.808126 ,_u=1.456837 ,_P=0.921385
, act=20,_n_visits=6 ,_Q=0.840890 ,_u=1.425785 ,_P=0.570314
, act=44,_n_visits=3 ,_Q=0.693072 ,_u=1.224395 ,_P=0.326505
, act=43,_n_visits=2 ,_Q=0.216855 ,_u=1.060660 ,_P=0.150000
, act=36,_n_visits=26 ,_Q=0.701468 ,_u=0.566880 ,_P=0.416879
, act=37,_n_visits=21 ,_Q=-0.686978 ,_u=0.928380 ,_P=0.779840
, act=19,_n_visits=11 ,_Q=0.901339 ,_u=0.562010 ,_P=0.276472
, act=27,_n_visits=10 ,_Q=-0.903239 ,_u=1.488088 ,_P=0.941149
, act=26,_n_visits=3 ,_Q=0.975513 ,_u=1.247852 ,_P=0.249570
, act=29,_n_visits=6 ,_Q=0.854688 ,_u=1.336465 ,_P=0.623684
, act=22,_n_visits=2 ,_Q=-0.987045 ,_u=1.876242 ,_P=0.335632
, act=27,_n_visits=3 ,_Q=0.211159 ,_u=0.696040 ,_P=0.124511
, act=19,_n_visits=2 ,_Q=-0.074831 ,_u=3.154536 ,_P=0.892238
, act=46,_n_visits=5 ,_Q=0.633744 ,_u=0.788997 ,_P=0.211710
, act=19,_n_visits=2 ,_Q=-0.401619 ,_u=1.329921 ,_P=0.265984
, act=29,_n_visits=2 ,_Q=-0.353277 ,_u=0.215836 ,_P=0.010143
, act=32,_n_visits=2 ,_Q=-0.433429 ,_u=0.026845 ,_P=0.001262
, act=36,_n_visits=4 ,_Q=-0.656860 ,_u=0.481576 ,_P=0.037720
, act=28,_n_visits=3 ,_Q=0.805237 ,_u=1.726618 ,_P=0.598118
, act=37,_n_visits=13 ,_Q=-0.178011 ,_u=0.096716 ,_P=0.021211
, act=27,_n_visits=4 ,_Q=0.331033 ,_u=0.707519 ,_P=0.204243
, act=19,_n_visits=2 ,_Q=-0.308755 ,_u=1.175578 ,_P=0.271488
, act=28,_n_visits=8 ,_Q=0.138524 ,_u=1.078394 ,_P=0.498089
, act=21,_n_visits=6 ,_Q=0.006721 ,_u=1.015354 ,_P=0.460522
, act=29,_n_visits=4 ,_Q=0.243062 ,_u=1.602219 ,_P=0.573227
, act=30,_n_visits=3 ,_Q=-0.259366 ,_u=2.046454 ,_P=0.708912
, act=42,_n_visits=3 ,_Q=-0.716200 ,_u=0.542660 ,_P=0.034004
, act=27,_n_visits=2 ,_Q=0.824671 ,_u=1.270746 ,_P=0.359421
, act=44,_n_visits=3 ,_Q=-0.690611 ,_u=0.492385 ,_P=0.030853
, act=36,_n_visits=2 ,_Q=0.891452 ,_u=1.318017 ,_P=0.372791
, act=46,_n_visits=2 ,_Q=-0.265342 ,_u=0.152422 ,_P=0.007163
, act=48,_n_visits=3 ,_Q=-0.014280 ,_u=0.108676 ,_P=0.005107
, act=28,_n_visits=2 ,_Q=0.115462 ,_u=1.422586 ,_P=0.402368
, act=36,_n_visits=206 ,_Q=0.493156 ,_u=0.135389 ,_P=0.198295
, act=1,_n_visits=2 ,_Q=-0.327716 ,_u=0.061656 ,_P=0.002584
, act=2,_n_visits=2 ,_Q=-0.525546 ,_u=0.057245 ,_P=0.002399
, act=3,_n_visits=2 ,_Q=-0.422939 ,_u=0.050571 ,_P=0.002119
, act=4,_n_visits=2 ,_Q=-0.377521 ,_u=0.093878 ,_P=0.003934
, act=8,_n_visits=6 ,_Q=-0.197412 ,_u=0.047235 ,_P=0.004619
, act=27,_n_visits=2 ,_Q=0.172668 ,_u=1.422024 ,_P=0.381569
, act=28,_n_visits=2 ,_Q=0.114629 ,_u=0.983914 ,_P=0.264012
, act=9,_n_visits=2 ,_Q=-0.248283 ,_u=0.040006 ,_P=0.001676
, act=10,_n_visits=2 ,_Q=-0.553534 ,_u=0.123933 ,_P=0.005193
, act=18,_n_visits=9 ,_Q=-0.462452 ,_u=0.332581 ,_P=0.046457
, act=28,_n_visits=2 ,_Q=-0.292724 ,_u=1.154056 ,_P=0.244812
, act=35,_n_visits=6 ,_Q=0.792367 ,_u=0.660128 ,_P=0.280068
, act=34,_n_visits=4 ,_Q=-0.849978 ,_u=1.554532 ,_P=0.556166
, act=26,_n_visits=3 ,_Q=0.897936 ,_u=1.961762 ,_P=0.679574
, act=19,_n_visits=4 ,_Q=-0.306404 ,_u=0.140312 ,_P=0.009800
, act=27,_n_visits=3 ,_Q=0.508752 ,_u=1.169199 ,_P=0.405023
, act=18,_n_visits=2 ,_Q=-0.651676 ,_u=2.156640 ,_P=0.609990
, act=20,_n_visits=15 ,_Q=-0.084819 ,_u=0.116576 ,_P=0.024426
, act=27,_n_visits=8 ,_Q=-0.107960 ,_u=1.083542 ,_P=0.463342
, act=18,_n_visits=6 ,_Q=0.395494 ,_u=1.210155 ,_P=0.548875
, act=19,_n_visits=5 ,_Q=-0.302121 ,_u=1.689607 ,_P=0.755615
, act=11,_n_visits=3 ,_Q=0.648154 ,_u=1.694776 ,_P=0.508433
, act=28,_n_visits=2 ,_Q=0.303412 ,_u=0.385879 ,_P=0.061878
, act=35,_n_visits=4 ,_Q=0.443864 ,_u=0.497722 ,_P=0.133022
, act=21,_n_visits=5 ,_Q=-0.339598 ,_u=0.177279 ,_P=0.014858
, act=27,_n_visits=4 ,_Q=0.410865 ,_u=1.229464 ,_P=0.491786
, act=18,_n_visits=2 ,_Q=-0.164816 ,_u=1.406258 ,_P=0.487142
, act=27,_n_visits=49 ,_Q=-0.766117 ,_u=0.647778 ,_P=0.452427
, act=28,_n_visits=22 ,_Q=0.813776 ,_u=0.563106 ,_P=0.357621
, act=20,_n_visits=18 ,_Q=-0.797705 ,_u=1.048455 ,_P=0.823650
, act=34,_n_visits=11 ,_Q=0.917386 ,_u=0.764274 ,_P=0.407800
, act=35,_n_visits=10 ,_Q=-0.909520 ,_u=1.490832 ,_P=0.942885
, act=19,_n_visits=4 ,_Q=0.842634 ,_u=1.239081 ,_P=0.413027
, act=43,_n_visits=5 ,_Q=0.962893 ,_u=1.273770 ,_P=0.424590
, act=19,_n_visits=2 ,_Q=-0.967556 ,_u=1.495460 ,_P=0.448638
, act=35,_n_visits=3 ,_Q=0.800281 ,_u=0.719312 ,_P=0.139567
, act=34,_n_visits=2 ,_Q=-0.796763 ,_u=3.087315 ,_P=0.873225
, act=44,_n_visits=2 ,_Q=0.741146 ,_u=0.720837 ,_P=0.104897
, act=35,_n_visits=24 ,_Q=0.797960 ,_u=0.530278 ,_P=0.382695
, act=34,_n_visits=18 ,_Q=-0.756013 ,_u=0.985338 ,_P=0.739646
, act=20,_n_visits=6 ,_Q=0.759747 ,_u=0.863677 ,_P=0.293262
, act=28,_n_visits=5 ,_Q=-0.721861 ,_u=2.066503 ,_P=0.924168
, act=29,_n_visits=3 ,_Q=0.956090 ,_u=1.280097 ,_P=0.384029
, act=28,_n_visits=3 ,_Q=0.548980 ,_u=0.695815 ,_P=0.135008
, act=20,_n_visits=2 ,_Q=-0.389072 ,_u=2.936032 ,_P=0.830435
, act=37,_n_visits=5 ,_Q=0.794501 ,_u=0.745048 ,_P=0.180701
, act=38,_n_visits=3 ,_Q=-0.956281 ,_u=1.786278 ,_P=0.714511
, act=20,_n_visits=2 ,_Q=0.982556 ,_u=2.506504 ,_P=0.708946
, act=41,_n_visits=3 ,_Q=0.821011 ,_u=0.699014 ,_P=0.135629
, act=28,_n_visits=4 ,_Q=-0.611261 ,_u=0.475917 ,_P=0.033239
, act=27,_n_visits=3 ,_Q=0.775069 ,_u=1.532128 ,_P=0.530745
, act=29,_n_visits=10 ,_Q=-0.441284 ,_u=0.289547 ,_P=0.044490
, act=27,_n_visits=2 ,_Q=0.323227 ,_u=1.052465 ,_P=0.210493
, act=28,_n_visits=6 ,_Q=0.476465 ,_u=1.075635 ,_P=0.430254
, act=20,_n_visits=5 ,_Q=-0.466093 ,_u=1.835785 ,_P=0.820988
, act=38,_n_visits=2 ,_Q=0.702720 ,_u=0.608844 ,_P=0.182653
, act=32,_n_visits=2 ,_Q=-0.275379 ,_u=0.084119 ,_P=0.003525
, act=33,_n_visits=2 ,_Q=-0.501406 ,_u=0.103510 ,_P=0.004338
, act=34,_n_visits=10 ,_Q=-0.271272 ,_u=0.133494 ,_P=0.020512
, act=27,_n_visits=7 ,_Q=0.276063 ,_u=1.178679 ,_P=0.550050
, act=18,_n_visits=4 ,_Q=-0.058847 ,_u=1.219894 ,_P=0.498020
, act=26,_n_visits=3 ,_Q=0.272370 ,_u=1.924102 ,_P=0.666528
, act=25,_n_visits=2 ,_Q=0.075270 ,_u=1.911006 ,_P=0.540514
, act=45,_n_visits=2 ,_Q=-0.528664 ,_u=1.889566 ,_P=0.308565
, act=28,_n_visits=2 ,_Q=0.511357 ,_u=0.756065 ,_P=0.151213
, act=35,_n_visits=6 ,_Q=-0.657799 ,_u=0.430329 ,_P=0.042078
, act=27,_n_visits=3 ,_Q=0.812160 ,_u=1.472037 ,_P=0.394989
, act=28,_n_visits=2 ,_Q=0.689378 ,_u=1.313297 ,_P=0.352395
, act=42,_n_visits=2 ,_Q=-0.717491 ,_u=0.453573 ,_P=0.019007
, act=43,_n_visits=3 ,_Q=-0.706385 ,_u=0.537022 ,_P=0.030006
, act=35,_n_visits=2 ,_Q=0.902659 ,_u=1.390399 ,_P=0.393264
, act=45,_n_visits=4 ,_Q=-0.837022 ,_u=0.576621 ,_P=0.040273
, act=35,_n_visits=2 ,_Q=0.836585 ,_u=0.734688 ,_P=0.254503
, act=48,_n_visits=13 ,_Q=-0.183513 ,_u=0.029456 ,_P=0.005760
, act=27,_n_visits=6 ,_Q=0.139221 ,_u=1.015738 ,_P=0.410506
, act=18,_n_visits=3 ,_Q=-0.018522 ,_u=1.534746 ,_P=0.411816
, act=45,_n_visits=2 ,_Q=0.381959 ,_u=2.034072 ,_P=0.575322
, act=45,_n_visits=2 ,_Q=-0.320098 ,_u=1.566204 ,_P=0.420257
, act=28,_n_visits=3 ,_Q=0.252122 ,_u=1.295780 ,_P=0.224436
, act=20,_n_visits=2 ,_Q=-0.108003 ,_u=1.482892 ,_P=0.419425
, act=35,_n_visits=3 ,_Q=0.336723 ,_u=0.600453 ,_P=0.138669
, act=49,_n_visits=2 ,_Q=-0.293640 ,_u=0.043256 ,_P=0.001813
, act=50,_n_visits=4 ,_Q=-0.395144 ,_u=0.146799 ,_P=0.010253
, act=27,_n_visits=2 ,_Q=0.151526 ,_u=1.403314 ,_P=0.486122
, act=57,_n_visits=5 ,_Q=-0.282833 ,_u=0.051124 ,_P=0.004285
, act=35,_n_visits=2 ,_Q=0.775605 ,_u=1.292591 ,_P=0.258518
, act=37,_n_visits=4 ,_Q=0.259429 ,_u=0.302998 ,_P=0.010719
, act=28,_n_visits=2 ,_Q=-0.460422 ,_u=1.269522 ,_P=0.439775
, act=42,_n_visits=7 ,_Q=-0.250396 ,_u=0.828577 ,_P=0.046901
, act=35,_n_visits=4 ,_Q=0.330735 ,_u=0.872967 ,_P=0.356387
, act=43,_n_visits=3 ,_Q=0.215870 ,_u=0.311616 ,_P=0.008819
, act=36,_n_visits=2 ,_Q=-0.150397 ,_u=1.779955 ,_P=0.503447
, act=44,_n_visits=3 ,_Q=0.215788 ,_u=0.262477 ,_P=0.007429
, act=35,_n_visits=2 ,_Q=-0.165841 ,_u=1.632656 ,_P=0.461785
, act=45,_n_visits=6 ,_Q=-0.228874 ,_u=0.816619 ,_P=0.040446
, act=36,_n_visits=4 ,_Q=0.339856 ,_u=1.106024 ,_P=0.395703
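The `_u` column can be cross-checked against the formula in the TreeNode table below. Each time a node runs `select`, `get_value` recomputes `_u` for all of its children, so a printed `_u` is the value from the last selection at its parent. A small sketch, assuming the last root selection ran with the root at 799 visits (the 800th playout) and picked act 27, which therefore had 156 visits at that moment; `u_score` is a hypothetical helper and `c_puct=5` is the value configured above.

```python
import math

C_PUCT = 5  # value passed to MCTSPlayer in the initialization code


def u_score(prior, parent_visits, visits, c_puct=C_PUCT):
    # Exploration bonus from get_value: c_puct * P * sqrt(N_parent) / (1 + N)
    return c_puct * prior * math.sqrt(parent_visits) / (1 + visits)


# (act, _P, visits when last scored, printed _u) for rows that appear to be root children.
# Assumption: act 27 was chosen in the final selection, so it was scored with 156 visits.
rows = [
    (18, 0.049782, 8, 0.781758),
    (27, 0.168441, 156, 0.151632),
    (28, 0.178026, 214, 0.117028),
    (35, 0.184153, 164, 0.157738),
    (36, 0.198295, 206, 0.135389),
]
for act, prior, visits, printed_u in rows:
    print(act, round(u_score(prior, 799, visits), 6), printed_u)
```

The recomputed values agree with the printed ones to about 1e-6 (the printed `_P` is itself rounded), which supports reading these rows as the root's children.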
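Tree construction process:
1. The search tree starts with only the root node.
2. Search path: starting at the root, repeatedly move to the child picked by the selection function until a leaf is reached.
Selection function: the child with the largest score among all children.
Score (the $u$ term discourages children that have already been visited many times):
$$u = c_{puct} \cdot P \cdot \frac{\sqrt{N_{parent}}}{1 + N}$$
$$\text{score} = Q + u$$
where $P$ is the child's prior, $N$ its visit count and $N_{parent}$ the visit count of its parent.
3. Expand the leaf node with new child nodes.
3.1 The new children and their priors $P$ come from the policy function, which also returns a value estimate for the leaf (if the game is over, the actual result is used instead).
4. Update the values along the search path.
4.1 Update the parent nodes (the value is negated at each level, since the players alternate).
4.2 Update the current node:
visit count: +1
$Q \leftarrow Q + \frac{v - Q}{N}$, where $v$ is the leaf value backed up to this node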
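The four steps above can be sketched end to end. This is a minimal, self-contained illustration, not the repo's code: `TreeNode` follows the fields in the tables below, but the toy game (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins), the uniform `policy_value_fn`, and the names `playout` and `search` are assumptions made for the example. As in the repo's `update_recursive`, the sign of the value flips at every level.

```python
import math


class TreeNode:
    """Minimal MCTS node with the fields listed in the TreeNode table."""

    def __init__(self, parent, prior_p):
        self._parent = parent
        self._children = {}  # action -> TreeNode
        self._n_visits = 0
        self._Q = 0.0
        self._u = 0.0
        self._P = prior_p

    def is_leaf(self):
        return not self._children

    def expand(self, action_priors):
        # Step 3: add one child per legal action, with its prior probability.
        for action, prob in action_priors:
            if action not in self._children:
                self._children[action] = TreeNode(self, prob)

    def get_value(self, c_puct):
        # Score used by the selection function: Q + u.
        self._u = (c_puct * self._P * math.sqrt(self._parent._n_visits)
                   / (1 + self._n_visits))
        return self._Q + self._u

    def select(self, c_puct):
        # Step 2: pick the child with the largest Q + u.
        return max(self._children.items(),
                   key=lambda item: item[1].get_value(c_puct))

    def update(self, leaf_value):
        # Step 4.2: one more visit; Q becomes the running mean of leaf values.
        self._n_visits += 1
        self._Q += (leaf_value - self._Q) / self._n_visits

    def update_recursive(self, leaf_value):
        # Step 4.1: update the parent first, with the sign flipped for the other player.
        if self._parent:
            self._parent.update_recursive(-leaf_value)
        self.update(leaf_value)


def legal_moves(stones):
    # Toy game (assumption): remove 1 or 2 stones; taking the last stone wins.
    return [m for m in (1, 2) if m <= stones]


def policy_value_fn(stones):
    # Hypothetical stand-in for the network: uniform priors and a neutral value.
    moves = legal_moves(stones)
    return [(m, 1.0 / len(moves)) for m in moves], 0.0


def playout(root, stones, c_puct=5):
    node = root
    # Step 2: walk down from the root until a leaf is reached.
    while not node.is_leaf():
        move, node = node.select(c_puct)
        stones -= move
    if stones > 0:
        # Step 3: evaluate the leaf and expand it with the policy's priors.
        action_priors, leaf_value = policy_value_fn(stones)
        node.expand(action_priors)
    else:
        # Game over: the opponent took the last stone, so the player to move lost.
        leaf_value = -1.0
    # Step 4: leaf_value is from the view of the player to move at the leaf;
    # the node stores the view of the player who moved into it, hence the minus.
    node.update_recursive(-leaf_value)


def search(stones, n_playout=400, c_puct=5):
    root = TreeNode(None, 1.0)  # step 1: the tree starts as a single root
    for _ in range(n_playout):
        playout(root, stones, c_puct)
    return {move: child._n_visits for move, child in root._children.items()}


print(search(4))
```

With 4 stones the winning move is to take 1 (leaving the opponent 3), and the root's visit counts should concentrate on move 1. The first playout only expands the root, so the root's children share the remaining 399 visits.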
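MCTS

Attribute | Value | Method | Value change |
---|---|---|---|
`_root` | `TreeNode(None, 1.0)` | `_playout` | 1. select the search path (getPath) 2. evaluate with `_policy` 3. `expand` 4. `update` |
`_policy` | `best_policy.policy_value_fn` | | |
`_c_puct` | 5 | | |
`_n_playout` | 400 | | |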
TreeNode

Attribute | Value | Method | Value change |
---|---|---|---|
`_parent` | `parent` | `__init__` | `self._parent = parent` |
`_children` | `{}` | `expand` | built from `action_priors`: `_children[action] = TreeNode(self, prob)` |
`_n_visits` | 0 | `update` | `_n_visits += 1` |
`_Q` | 0 | `update` | `self._Q += 1.0*(leaf_value - self._Q) / self._n_visits` |
`_u` | 0 | `get_value` | `self._u = (c_puct * self._P * np.sqrt(self._parent._n_visits) / (1 + self._n_visits))` |
`_P` | `prior_p` | `__init__` | `self._P = prior_p` |
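The `_Q` update in the table is an incremental mean: after n updates, `_Q` equals the plain average of the n leaf values backed up through the node, without storing them. A quick check with arbitrary example values (not taken from the run above); `incremental_mean` is a hypothetical helper.

```python
def incremental_mean(values):
    q, n = 0.0, 0
    for v in values:
        n += 1
        q += 1.0 * (v - q) / n  # same rule as TreeNode.update
    return q


values = [1.0, -1.0, 0.5, 0.5]
print(incremental_mean(values), sum(values) / len(values))
```

This is why a node only needs `_n_visits` and `_Q` to keep its average value up to date.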
To be continued...