AlphaZero Gomoku: open-source code analysis

1 Purpose

Learn and understand the AlphaZero algorithm.

Related study materials: https://github.com/chiefzzs/alphago_learnning/

Reference: https://github.com/junxiaosong/AlphaZero_Gomoku

2 Study approach

No. | Step | Code
1 | Understand how a game is played | see the code below
2 | Understand the move-selection algorithm |

3 Game-play process

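No. | L1 step | L2 step | L3 step | L4 step
1 | Initialization | | |
2 | Playing a game | | |
2.1 | | Position analysis | |
2.1.1 | | | Monte Carlo playouts |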

3.1 Initialization

Define the board
Define the game
Define the players
Bind the players to the board

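# imports from the reference repository (junxiaosong/AlphaZero_Gomoku)
import numpy as np
from game import Board, Game
from mcts_alphaZero import MCTSPlayer
from policy_value_net_numpy import PolicyValueNetNumpy

# variable definitions
n = 5
width, height = 8, 8
model_file = 'best_policy_8_8_5.model'

# initialize the board and the game
board = Board(width=width, height=height, n_in_row=n)
game = Game(board)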

# ############### human VS AI ###################
# load the trained policy_value_net in either Theano/Lasagne, PyTorch or TensorFlow

# best_policy = PolicyValueNet(width, height, model_file = model_file)
# mcts_player = MCTSPlayer(best_policy.policy_value_fn, c_puct=5, n_playout=400)

# load the provided model (trained in Theano/Lasagne) into a MCTS player written in pure numpy

# load the policy
best_policy = PolicyValueNetNumpy(width, height, model_file)
# create AI player 1 from the policy
mcts_player1 = MCTSPlayer(best_policy.policy_value_fn,
                          c_puct=5,
                          n_playout=400)  # set larger n_playout for better performance
# create AI player 2 from the policy
mcts_player2 = MCTSPlayer(best_policy.policy_value_fn,
                          c_puct=5,
                          n_playout=400)

# uncomment the following line to play with pure MCTS (it's much weaker even with a larger n_playout)
# mcts_player = MCTS_Pure(c_puct=5, n_playout=1000)

# human player, input your move in the format: 2,3
human = Human()

# assign the players to the board
player1 = mcts_player1
player2 = mcts_player2
start_player = 0
is_shown = 1

game.board.init_board(start_player)
p1, p2 = game.board.players
player1.set_player_ind(p1)
player2.set_player_ind(p2)
players = {p1: player1, p2: player2}

3.1.1 Internal board state

[Diagram: states holds many (*) state entries; each entry pairs an act (a position on the board) with a player.]

        self.width = int(kwargs.get('width', 8))
        self.height = int(kwargs.get('height', 8))
        # board states stored as a dict,
        # key: move as location on the board,
        # value: player as pieces type
        self.states = {}
        # need how many pieces in a row to win
        self.n_in_row = int(kwargs.get('n_in_row', 5))
        self.players = [1, 2]  # player1 and player2
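
To make the states dict concrete, here is a minimal standalone sketch of how a flat move index relates to a board coordinate. It assumes the row-major convention move = h * width + w; the helper functions and the availables list below are written for illustration and are not copied from the repository's Board class.

```python
width, height = 8, 8

def move_to_location(move, width):
    """Convert a flat move index into (row, col), assuming row-major order."""
    return move // width, move % width

def location_to_move(h, w, width):
    """Convert (row, col) back into the flat move index."""
    return h * width + w

# states: key = move index, value = player id (1 or 2)
states = {}
states[location_to_move(3, 4, width)] = 1   # player 1 plays row 3, col 4
states[location_to_move(4, 4, width)] = 2   # player 2 plays row 4, col 4

# every square that is not a key of `states` is still playable
availables = [m for m in range(width * height) if m not in states]
```

Storing only the occupied squares keeps the dict small, and the set of legal moves falls out as the complement of its keys.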

3.1.2 External board state

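The state exposed to the policy-value network is a stack of four height*width planes:

layer0 | player1 moves (in the code: the stones of the player to move)
layer1 | player2 moves (in the code: the opponent's stones)
layer2 | last move
layer3 | current player (which colour is to play)
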
    def current_state(self):
        """return the board state from the perspective of the current player.
        state shape: 4*width*height
        """

        square_state = np.zeros((4, self.width, self.height))
        if self.states:
            moves, players = np.array(list(zip(*self.states.items())))
            move_curr = moves[players == self.current_player]
            move_oppo = moves[players != self.current_player]
            square_state[0][move_curr // self.width,
                            move_curr % self.height] = 1.0
            square_state[1][move_oppo // self.width,
                            move_oppo % self.height] = 1.0
            # indicate the last move location
            square_state[2][self.last_move // self.width,
                            self.last_move % self.height] = 1.0
        if len(self.states) % 2 == 0:
            square_state[3][:, :] = 1.0  # indicate the colour to play
        return square_state[:, ::-1, :]
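
The four planes can be reproduced with a small standalone function. This is a sketch under stated assumptions: it uses row-major indexing (row = move // width, col = move % width), takes the board data as plain arguments instead of reading a Board object, and leaves out the final vertical flip that current_state applies. The name encode_state is hypothetical.

```python
import numpy as np

def encode_state(states, current_player, last_move, width, height):
    """Build the 4 feature planes from a {move: player} dict (no vertical flip)."""
    planes = np.zeros((4, height, width))
    for move, player in states.items():
        h, w = divmod(move, width)
        # plane 0: stones of the player to move, plane 1: opponent's stones
        plane = 0 if player == current_player else 1
        planes[plane, h, w] = 1.0
    if last_move is not None:
        h, w = divmod(last_move, width)
        planes[2, h, w] = 1.0          # plane 2: location of the last move
    if len(states) % 2 == 0:
        planes[3, :, :] = 1.0          # plane 3: constant plane marking the colour to play
    return planes

# two stones on the board, player 1 to move, last move was square 36
planes = encode_state({28: 1, 36: 2}, current_player=1, last_move=36,
                      width=8, height=8)
```

Because planes 0 and 1 are always written from the point of view of the player to move, the network sees the same kind of input regardless of which side it is evaluating.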

3.2 Playing a game

Given the current position, obtain all candidate moves (acts) and their recommended probabilities (probs).

# compute the move probabilities
current_player = game.board.get_current_player()
player_in_turn = players[current_player]
board = game.board
temp = 1e-3
return_prob = 0

sensible_moves = board.availables
move_probs = np.zeros(board.width*board.height)

# from the current position, get the candidate moves and their probabilities
acts, probs = player_in_turn.mcts.get_move_probs(board, temp)
print(acts)
print(probs)

Output:
the recommended probability for each playable position

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63)
[0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 5.91871068e-107
1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 1.53059365e-139
2.92327048e-039 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000]

3.2.1 Position analysis: deriving move probabilities from the current position

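1. Run the Monte Carlo playouts.
2. Read each action's visit count from the Monte Carlo tree.
3. Normalize the visit counts into probabilities with a softmax over their temperature-scaled logarithms.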
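
The effect of the temperature in step 3 is easier to see on a small example. The following is a standalone sketch, not the repository's code; the function name visits_to_probs and the three visit counts are chosen for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    probs = np.exp(x - np.max(x))
    return probs / np.sum(probs)

def visits_to_probs(visits, temp):
    """Softmax over log visit counts scaled by 1/temp."""
    logits = (1.0 / temp) * np.log(np.array(visits, dtype=float) + 1e-10)
    return softmax(logits)

visits = [165, 221, 14]
explore = visits_to_probs(visits, temp=1.0)    # proportional to the visit counts
greedy = visits_to_probs(visits, temp=1e-3)    # nearly all mass on the most visited move
```

With temp=1.0 the result is simply visits / sum(visits). With temp=1e-3, as used in this walkthrough, the distribution collapses onto the most visited move, which is why the printed probs above contain a single 1.0 and otherwise values that are zero or vanishingly small.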

import copy

state = board
temp = 1e-3

# run the Monte Carlo playouts, each on its own copy of the board
for n in range(player_in_turn.mcts._n_playout):
    state_copy = copy.deepcopy(state)
    player_in_turn.mcts._playout(state_copy)

# calc the move probabilities based on visit counts at the root node
act_visits = [(act, node._n_visits)
              for act, node in player_in_turn.mcts._root._children.items()]
acts, visits = zip(*act_visits)

# normalize: softmax over the temperature-scaled log visit counts
# (softmax is the helper defined in the repository's mcts_alphaZero.py)
act_probs = softmax(1.0/temp * np.log(np.array(visits) + 1e-10))
print("---------acts----------")
print(acts)
print(visits)
print(act_probs)

3.2.1.1 Analyzing the Monte Carlo tree

Current position: [Figure 1 in the original post]

First-level results after the Monte Carlo playouts: [Figure 2 in the original post]

Monte Carlo tree, drawn with mermaid:

[Mermaid diagram: the Monte Carlo tree. Each edge is labeled with the action taken (actN) and each node with its visit count (visited:N).]

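def printNode(node, level=0, act=0):
    # skip rarely visited nodes to keep the dump short
    if node._n_visits < 2:
        return
    indent = '\t' * level
    print("%s, act=%d,_n_visits=%d ,_Q=%f ,_u=%f ,_P=%f " % (
        indent, act, node._n_visits, node._Q, node._u, node._P))

    # recurse into the children, one indentation level deeper
    for child_act, child in node._children.items():
        printNode(child, level + 1, child_act)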

prn_obj(game.board)          
printNode(player_in_turn.mcts._root)          

Monte Carlo tree, representation 1: indented text.

, act=0,_n_visits=800 ,_Q=-0.455971 ,_u=0.000000 ,_P=1.000000 
	, act=18,_n_visits=8 ,_Q=-0.185447 ,_u=0.781758 ,_P=0.049782 
		, act=27,_n_visits=3 ,_Q=-0.035337 ,_u=1.092187 ,_P=0.330246 
		, act=36,_n_visits=2 ,_Q=0.365520 ,_u=0.683974 ,_P=0.155111 
	, act=19,_n_visits=4 ,_Q=0.219712 ,_u=0.339472 ,_P=0.012010 
		, act=28,_n_visits=3 ,_Q=-0.214190 ,_u=1.373586 ,_P=0.475824 
			, act=27,_n_visits=2 ,_Q=0.232468 ,_u=1.274235 ,_P=0.360408 
	, act=20,_n_visits=2 ,_Q=0.080553 ,_u=0.501082 ,_P=0.010636 
	, act=21,_n_visits=12 ,_Q=0.058928 ,_u=0.458884 ,_P=0.042209 
		, act=27,_n_visits=2 ,_Q=-0.063028 ,_u=0.487490 ,_P=0.088190 
		, act=28,_n_visits=5 ,_Q=0.071194 ,_u=1.033736 ,_P=0.311683 
			, act=29,_n_visits=2 ,_Q=-0.497347 ,_u=1.085420 ,_P=0.217084 
		, act=35,_n_visits=2 ,_Q=-0.243935 ,_u=0.904106 ,_P=0.163559 
		, act=36,_n_visits=2 ,_Q=-0.152409 ,_u=0.580145 ,_P=0.104952 
	, act=26,_n_visits=2 ,_Q=-0.069493 ,_u=0.454502 ,_P=0.009647 
	, act=27,_n_visits=157 ,_Q=0.476385 ,_u=0.151632 ,_P=0.168441 
		, act=8,_n_visits=2 ,_Q=-0.329893 ,_u=0.081594 ,_P=0.003920 
		, act=9,_n_visits=2 ,_Q=-0.358136 ,_u=0.039543 ,_P=0.001900 
		, act=11,_n_visits=2 ,_Q=-0.359925 ,_u=0.206617 ,_P=0.009926 
		, act=18,_n_visits=6 ,_Q=-0.410891 ,_u=0.412434 ,_P=0.046230 
			, act=28,_n_visits=2 ,_Q=0.277199 ,_u=0.784231 ,_P=0.210431 
			, act=35,_n_visits=3 ,_Q=0.493989 ,_u=1.036746 ,_P=0.278188 
				, act=19,_n_visits=2 ,_Q=-0.307951 ,_u=1.972537 ,_P=0.557918 
		, act=20,_n_visits=18 ,_Q=-0.186914 ,_u=0.173711 ,_P=0.052850 
			, act=18,_n_visits=2 ,_Q=-0.205378 ,_u=0.827569 ,_P=0.120429 
			, act=19,_n_visits=2 ,_Q=0.284847 ,_u=0.473225 ,_P=0.068864 
			, act=28,_n_visits=8 ,_Q=0.161790 ,_u=0.918767 ,_P=0.401101 
				, act=29,_n_visits=6 ,_Q=0.063473 ,_u=1.567840 ,_P=0.711105 
					, act=26,_n_visits=2 ,_Q=-0.019465 ,_u=0.769820 ,_P=0.206564 
					, act=35,_n_visits=2 ,_Q=-0.016241 ,_u=0.788532 ,_P=0.141057 
			, act=36,_n_visits=4 ,_Q=0.281107 ,_u=1.072277 ,_P=0.208052 
				, act=18,_n_visits=2 ,_Q=0.176588 ,_u=1.850940 ,_P=0.427456 
		, act=21,_n_visits=2 ,_Q=-0.390710 ,_u=0.288127 ,_P=0.013841 
		, act=28,_n_visits=3 ,_Q=-0.636589 ,_u=0.534831 ,_P=0.034257 
			, act=36,_n_visits=2 ,_Q=0.785911 ,_u=1.563539 ,_P=0.442236 
		, act=29,_n_visits=4 ,_Q=-0.255149 ,_u=0.262146 ,_P=0.020988 
			, act=36,_n_visits=3 ,_Q=0.326441 ,_u=1.556929 ,_P=0.539336 
		, act=34,_n_visits=3 ,_Q=-0.541608 ,_u=0.435908 ,_P=0.027920 
			, act=35,_n_visits=2 ,_Q=0.636979 ,_u=1.690728 ,_P=0.478210 
		, act=35,_n_visits=3 ,_Q=-0.782964 ,_u=0.598944 ,_P=0.038363 
			, act=36,_n_visits=2 ,_Q=0.930136 ,_u=1.706458 ,_P=0.482659 
		, act=36,_n_visits=50 ,_Q=-0.550742 ,_u=0.570920 ,_P=0.457102 
			, act=20,_n_visits=2 ,_Q=-0.129428 ,_u=0.938021 ,_P=0.080402 
			, act=28,_n_visits=21 ,_Q=0.608117 ,_u=0.551983 ,_P=0.331190 
				, act=26,_n_visits=2 ,_Q=-0.587251 ,_u=0.623344 ,_P=0.083630 
				, act=29,_n_visits=16 ,_Q=-0.562059 ,_u=1.026563 ,_P=0.734549 
					, act=22,_n_visits=3 ,_Q=0.722939 ,_u=0.781887 ,_P=0.161506 
					, act=26,_n_visits=2 ,_Q=0.218605 ,_u=0.925751 ,_P=0.143417 
					, act=35,_n_visits=3 ,_Q=0.465863 ,_u=0.908840 ,_P=0.140797 
						, act=43,_n_visits=2 ,_Q=-0.373646 ,_u=2.861129 ,_P=0.809249 
					, act=43,_n_visits=7 ,_Q=0.671767 ,_u=0.728113 ,_P=0.300797 
						, act=35,_n_visits=6 ,_Q=-0.626132 ,_u=1.928775 ,_P=0.944903 
							, act=34,_n_visits=3 ,_Q=0.464121 ,_u=1.722878 ,_P=0.462297 
							, act=37,_n_visits=2 ,_Q=0.737424 ,_u=1.495213 ,_P=0.401208 
			, act=34,_n_visits=2 ,_Q=-0.332387 ,_u=0.668018 ,_P=0.057259 
			, act=35,_n_visits=24 ,_Q=0.645149 ,_u=0.532131 ,_P=0.380093 
				, act=19,_n_visits=2 ,_Q=-0.766992 ,_u=0.776434 ,_P=0.097139 
				, act=43,_n_visits=18 ,_Q=-0.593014 ,_u=0.916555 ,_P=0.688014 
					, act=19,_n_visits=2 ,_Q=0.045573 ,_u=1.172094 ,_P=0.170565 
					, act=28,_n_visits=2 ,_Q=0.529674 ,_u=0.760434 ,_P=0.110659 
					, act=29,_n_visits=10 ,_Q=0.654451 ,_u=0.819301 ,_P=0.397420 
						, act=28,_n_visits=9 ,_Q=-0.627666 ,_u=1.530749 ,_P=0.918450 
							, act=20,_n_visits=4 ,_Q=0.731742 ,_u=1.332107 ,_P=0.376777 
								, act=44,_n_visits=2 ,_Q=-0.581877 ,_u=1.257006 ,_P=0.435440 
							, act=44,_n_visits=4 ,_Q=0.441063 ,_u=1.321673 ,_P=0.467282 
					, act=50,_n_visits=3 ,_Q=0.718254 ,_u=0.629334 ,_P=0.122109 
		, act=43,_n_visits=5 ,_Q=-0.260930 ,_u=0.182409 ,_P=0.017525 
			, act=36,_n_visits=4 ,_Q=0.319099 ,_u=1.258602 ,_P=0.503441 
				, act=45,_n_visits=2 ,_Q=-0.302862 ,_u=1.650853 ,_P=0.381248 
		, act=45,_n_visits=3 ,_Q=-0.542640 ,_u=0.482945 ,_P=0.030933 
		, act=48,_n_visits=3 ,_Q=-0.135777 ,_u=0.084086 ,_P=0.005386 
			, act=36,_n_visits=2 ,_Q=0.275592 ,_u=1.335555 ,_P=0.377752 
		, act=50,_n_visits=2 ,_Q=-0.183745 ,_u=0.191733 ,_P=0.009211 
	, act=28,_n_visits=214 ,_Q=0.512162 ,_u=0.117028 ,_P=0.178026 
		, act=1,_n_visits=5 ,_Q=-0.287701 ,_u=0.037544 ,_P=0.003087 
			, act=35,_n_visits=2 ,_Q=0.163045 ,_u=1.257606 ,_P=0.377282 
		, act=3,_n_visits=3 ,_Q=-0.422566 ,_u=0.093003 ,_P=0.005098 
		, act=8,_n_visits=10 ,_Q=-0.202820 ,_u=0.048012 ,_P=0.006579 
			, act=27,_n_visits=2 ,_Q=0.606195 ,_u=0.639651 ,_P=0.127930 
			, act=35,_n_visits=5 ,_Q=0.127078 ,_u=1.208807 ,_P=0.402936 
				, act=21,_n_visits=2 ,_Q=0.063265 ,_u=1.296277 ,_P=0.388883 
				, act=42,_n_visits=2 ,_Q=-0.172860 ,_u=2.039363 ,_P=0.407873 
			, act=36,_n_visits=2 ,_Q=0.097307 ,_u=1.167656 ,_P=0.233531 
		, act=9,_n_visits=2 ,_Q=-0.346548 ,_u=0.037019 ,_P=0.001522 
		, act=10,_n_visits=2 ,_Q=-0.433851 ,_u=0.255563 ,_P=0.010507 
		, act=11,_n_visits=2 ,_Q=-0.374231 ,_u=0.124851 ,_P=0.005133 
		, act=18,_n_visits=2 ,_Q=-0.712887 ,_u=0.443869 ,_P=0.018248 
		, act=19,_n_visits=28 ,_Q=-0.300480 ,_u=0.143432 ,_P=0.057001 
			, act=27,_n_visits=14 ,_Q=0.413202 ,_u=0.638426 ,_P=0.368595 
				, act=26,_n_visits=12 ,_Q=-0.385210 ,_u=1.204515 ,_P=0.801773 
					, act=29,_n_visits=2 ,_Q=0.265790 ,_u=1.032788 ,_P=0.186838 
					, act=36,_n_visits=6 ,_Q=0.831043 ,_u=0.488590 ,_P=0.206242 
						, act=18,_n_visits=2 ,_Q=-0.890770 ,_u=1.737816 ,_P=0.310870 
			, act=35,_n_visits=7 ,_Q=0.180327 ,_u=0.846025 ,_P=0.260508 
				, act=21,_n_visits=5 ,_Q=-0.089641 ,_u=1.223282 ,_P=0.499403 
					, act=20,_n_visits=4 ,_Q=0.156589 ,_u=1.601920 ,_P=0.640768 
						, act=12,_n_visits=2 ,_Q=0.145149 ,_u=1.547143 ,_P=0.357297 
			, act=36,_n_visits=4 ,_Q=0.213771 ,_u=0.899944 ,_P=0.138555 
				, act=20,_n_visits=3 ,_Q=-0.082635 ,_u=2.230487 ,_P=0.772664 
					, act=21,_n_visits=2 ,_Q=0.478807 ,_u=2.768993 ,_P=0.783189 
		, act=20,_n_visits=2 ,_Q=-0.560294 ,_u=0.121234 ,_P=0.004984 
		, act=21,_n_visits=13 ,_Q=-0.450823 ,_u=0.283983 ,_P=0.054483 
			, act=27,_n_visits=5 ,_Q=0.489777 ,_u=0.725693 ,_P=0.251387 
				, act=29,_n_visits=2 ,_Q=-0.434157 ,_u=1.302774 ,_P=0.390832 
			, act=29,_n_visits=2 ,_Q=0.302444 ,_u=0.606600 ,_P=0.105066 
			, act=36,_n_visits=4 ,_Q=0.293393 ,_u=0.756540 ,_P=0.218394 
				, act=20,_n_visits=3 ,_Q=-0.155795 ,_u=1.584278 ,_P=0.548810 
					, act=19,_n_visits=2 ,_Q=0.140650 ,_u=2.828891 ,_P=0.800131 
		, act=25,_n_visits=2 ,_Q=-0.494314 ,_u=0.161321 ,_P=0.006632 
		, act=26,_n_visits=9 ,_Q=-0.425081 ,_u=0.208361 ,_P=0.028553 
			, act=35,_n_visits=6 ,_Q=0.408029 ,_u=1.119694 ,_P=0.475046 
				, act=42,_n_visits=4 ,_Q=-0.441664 ,_u=1.310199 ,_P=0.468751 
					, act=34,_n_visits=3 ,_Q=0.442974 ,_u=1.686404 ,_P=0.584187 
			, act=36,_n_visits=2 ,_Q=0.645280 ,_u=0.665816 ,_P=0.141241 
		, act=27,_n_visits=5 ,_Q=-0.697772 ,_u=0.488608 ,_P=0.040175 
			, act=35,_n_visits=4 ,_Q=0.717066 ,_u=1.542901 ,_P=0.617160 
		, act=30,_n_visits=2 ,_Q=-0.413724 ,_u=0.246762 ,_P=0.010145 
		, act=34,_n_visits=2 ,_Q=-0.452803 ,_u=0.202389 ,_P=0.008320 
		, act=35,_n_visits=48 ,_Q=-0.735712 ,_u=0.580984 ,_P=0.390122 
			, act=27,_n_visits=13 ,_Q=0.675958 ,_u=0.703514 ,_P=0.266807 
				, act=26,_n_visits=10 ,_Q=-0.641721 ,_u=1.374282 ,_P=0.793442 
					, act=36,_n_visits=3 ,_Q=0.730219 ,_u=0.695421 ,_P=0.185446 
						, act=44,_n_visits=2 ,_Q=-0.606733 ,_u=2.647418 ,_P=0.748803 
					, act=44,_n_visits=4 ,_Q=0.858133 ,_u=0.787659 ,_P=0.210042 
						, act=36,_n_visits=3 ,_Q=-0.866214 ,_u=2.786685 ,_P=0.965336 
							, act=37,_n_visits=2 ,_Q=0.903987 ,_u=1.975933 ,_P=0.558878 
			, act=36,_n_visits=32 ,_Q=0.826806 ,_u=0.532382 ,_P=0.512529 
				, act=20,_n_visits=2 ,_Q=-0.641988 ,_u=0.404906 ,_P=0.043634 
				, act=27,_n_visits=2 ,_Q=-0.975444 ,_u=0.969482 ,_P=0.104474 
				, act=44,_n_visits=20 ,_Q=-0.815586 ,_u=0.925425 ,_P=0.698087 
					, act=26,_n_visits=14 ,_Q=0.939248 ,_u=0.677325 ,_P=0.466167 
						, act=27,_n_visits=13 ,_Q=-0.934725 ,_u=1.299563 ,_P=0.937129 
							, act=19,_n_visits=4 ,_Q=0.865042 ,_u=1.078254 ,_P=0.311265 
							, act=43,_n_visits=8 ,_Q=0.966543 ,_u=1.230521 ,_P=0.568353 
								, act=19,_n_visits=3 ,_Q=-0.976368 ,_u=1.523390 ,_P=0.345472 
									, act=11,_n_visits=2 ,_Q=0.973991 ,_u=1.236814 ,_P=0.349824 
					, act=27,_n_visits=2 ,_Q=0.585449 ,_u=0.740996 ,_P=0.101998 
					, act=53,_n_visits=2 ,_Q=0.813205 ,_u=0.963240 ,_P=0.088393 
			, act=37,_n_visits=2 ,_Q=-0.046088 ,_u=0.743118 ,_P=0.065037 
		, act=36,_n_visits=4 ,_Q=-0.755416 ,_u=0.545887 ,_P=0.037404 
			, act=35,_n_visits=3 ,_Q=0.859492 ,_u=2.014102 ,_P=0.697705 
		, act=37,_n_visits=4 ,_Q=-0.692301 ,_u=0.508317 ,_P=0.034829 
			, act=36,_n_visits=2 ,_Q=0.968557 ,_u=1.086428 ,_P=0.376350 
		, act=42,_n_visits=4 ,_Q=-0.808254 ,_u=0.558774 ,_P=0.038287 
			, act=27,_n_visits=2 ,_Q=0.864962 ,_u=1.170260 ,_P=0.270260 
		, act=43,_n_visits=2 ,_Q=-0.519138 ,_u=0.217227 ,_P=0.008930 
		, act=44,_n_visits=4 ,_Q=-0.471863 ,_u=0.319050 ,_P=0.021861 
			, act=35,_n_visits=2 ,_Q=0.249568 ,_u=1.123565 ,_P=0.389214 
		, act=45,_n_visits=3 ,_Q=-0.568997 ,_u=0.347658 ,_P=0.019057 
			, act=35,_n_visits=2 ,_Q=0.515761 ,_u=1.418398 ,_P=0.401184 
		, act=48,_n_visits=3 ,_Q=-0.259834 ,_u=0.099201 ,_P=0.005438 
		, act=49,_n_visits=2 ,_Q=-0.330507 ,_u=0.054329 ,_P=0.002234 
		, act=50,_n_visits=2 ,_Q=-0.509679 ,_u=0.127728 ,_P=0.005251 
		, act=51,_n_visits=2 ,_Q=-0.628834 ,_u=0.259326 ,_P=0.010661 
		, act=57,_n_visits=10 ,_Q=-0.194946 ,_u=0.027902 ,_P=0.004206 
			, act=35,_n_visits=5 ,_Q=0.339731 ,_u=1.078280 ,_P=0.359427 
				, act=21,_n_visits=2 ,_Q=-0.286942 ,_u=1.430482 ,_P=0.429145 
				, act=42,_n_visits=2 ,_Q=-0.439994 ,_u=1.555351 ,_P=0.311070 
			, act=36,_n_visits=3 ,_Q=0.068489 ,_u=1.160878 ,_P=0.309567 
	, act=29,_n_visits=4 ,_Q=0.298291 ,_u=0.298917 ,_P=0.010575 
		, act=36,_n_visits=2 ,_Q=-0.540858 ,_u=1.094615 ,_P=0.379186 
	, act=34,_n_visits=3 ,_Q=0.237141 ,_u=0.261359 ,_P=0.007397 
	, act=35,_n_visits=164 ,_Q=0.471148 ,_u=0.157738 ,_P=0.184153 
		, act=6,_n_visits=3 ,_Q=-0.268087 ,_u=0.037671 ,_P=0.002361 
		, act=8,_n_visits=2 ,_Q=-0.168531 ,_u=0.111599 ,_P=0.005245 
		, act=18,_n_visits=2 ,_Q=-0.412313 ,_u=0.360184 ,_P=0.016927 
		, act=19,_n_visits=7 ,_Q=-0.269322 ,_u=0.156382 ,_P=0.019598 
			, act=28,_n_visits=3 ,_Q=-0.013959 ,_u=1.428502 ,_P=0.466547 
				, act=21,_n_visits=2 ,_Q=0.118545 ,_u=1.938271 ,_P=0.548226 
			, act=36,_n_visits=3 ,_Q=0.723727 ,_u=0.795652 ,_P=0.194894 
		, act=20,_n_visits=2 ,_Q=-0.453386 ,_u=0.192174 ,_P=0.009031 
		, act=21,_n_visits=9 ,_Q=-0.226283 ,_u=0.156564 ,_P=0.024526 
			, act=27,_n_visits=5 ,_Q=0.488635 ,_u=0.656950 ,_P=0.232267 
				, act=19,_n_visits=4 ,_Q=-0.544839 ,_u=1.526171 ,_P=0.610468 
					, act=20,_n_visits=3 ,_Q=0.707047 ,_u=2.385997 ,_P=0.826534 
			, act=36,_n_visits=3 ,_Q=-0.167797 ,_u=1.099131 ,_P=0.310881 
				, act=37,_n_visits=2 ,_Q=0.369261 ,_u=1.709329 ,_P=0.483471 
		, act=26,_n_visits=5 ,_Q=-0.488121 ,_u=0.416311 ,_P=0.039130 
			, act=27,_n_visits=3 ,_Q=0.620914 ,_u=1.154112 ,_P=0.346234 
				, act=19,_n_visits=2 ,_Q=-0.628788 ,_u=2.815251 ,_P=0.796273 
		, act=27,_n_visits=4 ,_Q=-0.551821 ,_u=0.368254 ,_P=0.028844 
			, act=28,_n_visits=3 ,_Q=0.679648 ,_u=1.764206 ,_P=0.611139 
		, act=28,_n_visits=51 ,_Q=-0.643413 ,_u=0.614193 ,_P=0.500316 
			, act=26,_n_visits=2 ,_Q=-0.046136 ,_u=0.528276 ,_P=0.044826 
			, act=27,_n_visits=22 ,_Q=0.642367 ,_u=0.573347 ,_P=0.372984 
				, act=19,_n_visits=19 ,_Q=-0.616728 ,_u=0.968660 ,_P=0.803240 
					, act=10,_n_visits=4 ,_Q=0.458661 ,_u=0.965780 ,_P=0.182109 
					, act=37,_n_visits=11 ,_Q=0.803694 ,_u=0.679286 ,_P=0.384262 
						, act=36,_n_visits=10 ,_Q=-0.808126 ,_u=1.456837 ,_P=0.921385 
							, act=20,_n_visits=6 ,_Q=0.840890 ,_u=1.425785 ,_P=0.570314 
							, act=44,_n_visits=3 ,_Q=0.693072 ,_u=1.224395 ,_P=0.326505 
					, act=43,_n_visits=2 ,_Q=0.216855 ,_u=1.060660 ,_P=0.150000 
			, act=36,_n_visits=26 ,_Q=0.701468 ,_u=0.566880 ,_P=0.416879 
				, act=37,_n_visits=21 ,_Q=-0.686978 ,_u=0.928380 ,_P=0.779840 
					, act=19,_n_visits=11 ,_Q=0.901339 ,_u=0.562010 ,_P=0.276472 
						, act=27,_n_visits=10 ,_Q=-0.903239 ,_u=1.488088 ,_P=0.941149 
							, act=26,_n_visits=3 ,_Q=0.975513 ,_u=1.247852 ,_P=0.249570 
							, act=29,_n_visits=6 ,_Q=0.854688 ,_u=1.336465 ,_P=0.623684 
								, act=22,_n_visits=2 ,_Q=-0.987045 ,_u=1.876242 ,_P=0.335632 
					, act=27,_n_visits=3 ,_Q=0.211159 ,_u=0.696040 ,_P=0.124511 
						, act=19,_n_visits=2 ,_Q=-0.074831 ,_u=3.154536 ,_P=0.892238 
					, act=46,_n_visits=5 ,_Q=0.633744 ,_u=0.788997 ,_P=0.211710 
						, act=19,_n_visits=2 ,_Q=-0.401619 ,_u=1.329921 ,_P=0.265984 
		, act=29,_n_visits=2 ,_Q=-0.353277 ,_u=0.215836 ,_P=0.010143 
		, act=32,_n_visits=2 ,_Q=-0.433429 ,_u=0.026845 ,_P=0.001262 
		, act=36,_n_visits=4 ,_Q=-0.656860 ,_u=0.481576 ,_P=0.037720 
			, act=28,_n_visits=3 ,_Q=0.805237 ,_u=1.726618 ,_P=0.598118 
		, act=37,_n_visits=13 ,_Q=-0.178011 ,_u=0.096716 ,_P=0.021211 
			, act=27,_n_visits=4 ,_Q=0.331033 ,_u=0.707519 ,_P=0.204243 
				, act=19,_n_visits=2 ,_Q=-0.308755 ,_u=1.175578 ,_P=0.271488 
			, act=28,_n_visits=8 ,_Q=0.138524 ,_u=1.078394 ,_P=0.498089 
				, act=21,_n_visits=6 ,_Q=0.006721 ,_u=1.015354 ,_P=0.460522 
					, act=29,_n_visits=4 ,_Q=0.243062 ,_u=1.602219 ,_P=0.573227 
						, act=30,_n_visits=3 ,_Q=-0.259366 ,_u=2.046454 ,_P=0.708912 
		, act=42,_n_visits=3 ,_Q=-0.716200 ,_u=0.542660 ,_P=0.034004 
			, act=27,_n_visits=2 ,_Q=0.824671 ,_u=1.270746 ,_P=0.359421 
		, act=44,_n_visits=3 ,_Q=-0.690611 ,_u=0.492385 ,_P=0.030853 
			, act=36,_n_visits=2 ,_Q=0.891452 ,_u=1.318017 ,_P=0.372791 
		, act=46,_n_visits=2 ,_Q=-0.265342 ,_u=0.152422 ,_P=0.007163 
		, act=48,_n_visits=3 ,_Q=-0.014280 ,_u=0.108676 ,_P=0.005107 
			, act=28,_n_visits=2 ,_Q=0.115462 ,_u=1.422586 ,_P=0.402368 
	, act=36,_n_visits=206 ,_Q=0.493156 ,_u=0.135389 ,_P=0.198295 
		, act=1,_n_visits=2 ,_Q=-0.327716 ,_u=0.061656 ,_P=0.002584 
		, act=2,_n_visits=2 ,_Q=-0.525546 ,_u=0.057245 ,_P=0.002399 
		, act=3,_n_visits=2 ,_Q=-0.422939 ,_u=0.050571 ,_P=0.002119 
		, act=4,_n_visits=2 ,_Q=-0.377521 ,_u=0.093878 ,_P=0.003934 
		, act=8,_n_visits=6 ,_Q=-0.197412 ,_u=0.047235 ,_P=0.004619 
			, act=27,_n_visits=2 ,_Q=0.172668 ,_u=1.422024 ,_P=0.381569 
			, act=28,_n_visits=2 ,_Q=0.114629 ,_u=0.983914 ,_P=0.264012 
		, act=9,_n_visits=2 ,_Q=-0.248283 ,_u=0.040006 ,_P=0.001676 
		, act=10,_n_visits=2 ,_Q=-0.553534 ,_u=0.123933 ,_P=0.005193 
		, act=18,_n_visits=9 ,_Q=-0.462452 ,_u=0.332581 ,_P=0.046457 
			, act=28,_n_visits=2 ,_Q=-0.292724 ,_u=1.154056 ,_P=0.244812 
			, act=35,_n_visits=6 ,_Q=0.792367 ,_u=0.660128 ,_P=0.280068 
				, act=34,_n_visits=4 ,_Q=-0.849978 ,_u=1.554532 ,_P=0.556166 
					, act=26,_n_visits=3 ,_Q=0.897936 ,_u=1.961762 ,_P=0.679574 
		, act=19,_n_visits=4 ,_Q=-0.306404 ,_u=0.140312 ,_P=0.009800 
			, act=27,_n_visits=3 ,_Q=0.508752 ,_u=1.169199 ,_P=0.405023 
				, act=18,_n_visits=2 ,_Q=-0.651676 ,_u=2.156640 ,_P=0.609990 
		, act=20,_n_visits=15 ,_Q=-0.084819 ,_u=0.116576 ,_P=0.024426 
			, act=27,_n_visits=8 ,_Q=-0.107960 ,_u=1.083542 ,_P=0.463342 
				, act=18,_n_visits=6 ,_Q=0.395494 ,_u=1.210155 ,_P=0.548875 
					, act=19,_n_visits=5 ,_Q=-0.302121 ,_u=1.689607 ,_P=0.755615 
						, act=11,_n_visits=3 ,_Q=0.648154 ,_u=1.694776 ,_P=0.508433 
			, act=28,_n_visits=2 ,_Q=0.303412 ,_u=0.385879 ,_P=0.061878 
			, act=35,_n_visits=4 ,_Q=0.443864 ,_u=0.497722 ,_P=0.133022 
		, act=21,_n_visits=5 ,_Q=-0.339598 ,_u=0.177279 ,_P=0.014858 
			, act=27,_n_visits=4 ,_Q=0.410865 ,_u=1.229464 ,_P=0.491786 
				, act=18,_n_visits=2 ,_Q=-0.164816 ,_u=1.406258 ,_P=0.487142 
		, act=27,_n_visits=49 ,_Q=-0.766117 ,_u=0.647778 ,_P=0.452427 
			, act=28,_n_visits=22 ,_Q=0.813776 ,_u=0.563106 ,_P=0.357621 
				, act=20,_n_visits=18 ,_Q=-0.797705 ,_u=1.048455 ,_P=0.823650 
					, act=34,_n_visits=11 ,_Q=0.917386 ,_u=0.764274 ,_P=0.407800 
						, act=35,_n_visits=10 ,_Q=-0.909520 ,_u=1.490832 ,_P=0.942885 
							, act=19,_n_visits=4 ,_Q=0.842634 ,_u=1.239081 ,_P=0.413027 
							, act=43,_n_visits=5 ,_Q=0.962893 ,_u=1.273770 ,_P=0.424590 
								, act=19,_n_visits=2 ,_Q=-0.967556 ,_u=1.495460 ,_P=0.448638 
					, act=35,_n_visits=3 ,_Q=0.800281 ,_u=0.719312 ,_P=0.139567 
						, act=34,_n_visits=2 ,_Q=-0.796763 ,_u=3.087315 ,_P=0.873225 
					, act=44,_n_visits=2 ,_Q=0.741146 ,_u=0.720837 ,_P=0.104897 
			, act=35,_n_visits=24 ,_Q=0.797960 ,_u=0.530278 ,_P=0.382695 
				, act=34,_n_visits=18 ,_Q=-0.756013 ,_u=0.985338 ,_P=0.739646 
					, act=20,_n_visits=6 ,_Q=0.759747 ,_u=0.863677 ,_P=0.293262 
						, act=28,_n_visits=5 ,_Q=-0.721861 ,_u=2.066503 ,_P=0.924168 
							, act=29,_n_visits=3 ,_Q=0.956090 ,_u=1.280097 ,_P=0.384029 
					, act=28,_n_visits=3 ,_Q=0.548980 ,_u=0.695815 ,_P=0.135008 
						, act=20,_n_visits=2 ,_Q=-0.389072 ,_u=2.936032 ,_P=0.830435 
					, act=37,_n_visits=5 ,_Q=0.794501 ,_u=0.745048 ,_P=0.180701 
						, act=38,_n_visits=3 ,_Q=-0.956281 ,_u=1.786278 ,_P=0.714511 
							, act=20,_n_visits=2 ,_Q=0.982556 ,_u=2.506504 ,_P=0.708946 
					, act=41,_n_visits=3 ,_Q=0.821011 ,_u=0.699014 ,_P=0.135629 
		, act=28,_n_visits=4 ,_Q=-0.611261 ,_u=0.475917 ,_P=0.033239 
			, act=27,_n_visits=3 ,_Q=0.775069 ,_u=1.532128 ,_P=0.530745 
		, act=29,_n_visits=10 ,_Q=-0.441284 ,_u=0.289547 ,_P=0.044490 
			, act=27,_n_visits=2 ,_Q=0.323227 ,_u=1.052465 ,_P=0.210493 
			, act=28,_n_visits=6 ,_Q=0.476465 ,_u=1.075635 ,_P=0.430254 
				, act=20,_n_visits=5 ,_Q=-0.466093 ,_u=1.835785 ,_P=0.820988 
					, act=38,_n_visits=2 ,_Q=0.702720 ,_u=0.608844 ,_P=0.182653 
		, act=32,_n_visits=2 ,_Q=-0.275379 ,_u=0.084119 ,_P=0.003525 
		, act=33,_n_visits=2 ,_Q=-0.501406 ,_u=0.103510 ,_P=0.004338 
		, act=34,_n_visits=10 ,_Q=-0.271272 ,_u=0.133494 ,_P=0.020512 
			, act=27,_n_visits=7 ,_Q=0.276063 ,_u=1.178679 ,_P=0.550050 
				, act=18,_n_visits=4 ,_Q=-0.058847 ,_u=1.219894 ,_P=0.498020 
					, act=26,_n_visits=3 ,_Q=0.272370 ,_u=1.924102 ,_P=0.666528 
						, act=25,_n_visits=2 ,_Q=0.075270 ,_u=1.911006 ,_P=0.540514 
				, act=45,_n_visits=2 ,_Q=-0.528664 ,_u=1.889566 ,_P=0.308565 
			, act=28,_n_visits=2 ,_Q=0.511357 ,_u=0.756065 ,_P=0.151213 
		, act=35,_n_visits=6 ,_Q=-0.657799 ,_u=0.430329 ,_P=0.042078 
			, act=27,_n_visits=3 ,_Q=0.812160 ,_u=1.472037 ,_P=0.394989 
			, act=28,_n_visits=2 ,_Q=0.689378 ,_u=1.313297 ,_P=0.352395 
		, act=42,_n_visits=2 ,_Q=-0.717491 ,_u=0.453573 ,_P=0.019007 
		, act=43,_n_visits=3 ,_Q=-0.706385 ,_u=0.537022 ,_P=0.030006 
			, act=35,_n_visits=2 ,_Q=0.902659 ,_u=1.390399 ,_P=0.393264 
		, act=45,_n_visits=4 ,_Q=-0.837022 ,_u=0.576621 ,_P=0.040273 
			, act=35,_n_visits=2 ,_Q=0.836585 ,_u=0.734688 ,_P=0.254503 
		, act=48,_n_visits=13 ,_Q=-0.183513 ,_u=0.029456 ,_P=0.005760 
			, act=27,_n_visits=6 ,_Q=0.139221 ,_u=1.015738 ,_P=0.410506 
				, act=18,_n_visits=3 ,_Q=-0.018522 ,_u=1.534746 ,_P=0.411816 
					, act=45,_n_visits=2 ,_Q=0.381959 ,_u=2.034072 ,_P=0.575322 
				, act=45,_n_visits=2 ,_Q=-0.320098 ,_u=1.566204 ,_P=0.420257 
			, act=28,_n_visits=3 ,_Q=0.252122 ,_u=1.295780 ,_P=0.224436 
				, act=20,_n_visits=2 ,_Q=-0.108003 ,_u=1.482892 ,_P=0.419425 
			, act=35,_n_visits=3 ,_Q=0.336723 ,_u=0.600453 ,_P=0.138669 
		, act=49,_n_visits=2 ,_Q=-0.293640 ,_u=0.043256 ,_P=0.001813 
		, act=50,_n_visits=4 ,_Q=-0.395144 ,_u=0.146799 ,_P=0.010253 
			, act=27,_n_visits=2 ,_Q=0.151526 ,_u=1.403314 ,_P=0.486122 
		, act=57,_n_visits=5 ,_Q=-0.282833 ,_u=0.051124 ,_P=0.004285 
			, act=35,_n_visits=2 ,_Q=0.775605 ,_u=1.292591 ,_P=0.258518 
	, act=37,_n_visits=4 ,_Q=0.259429 ,_u=0.302998 ,_P=0.010719 
		, act=28,_n_visits=2 ,_Q=-0.460422 ,_u=1.269522 ,_P=0.439775 
	, act=42,_n_visits=7 ,_Q=-0.250396 ,_u=0.828577 ,_P=0.046901 
		, act=35,_n_visits=4 ,_Q=0.330735 ,_u=0.872967 ,_P=0.356387 
	, act=43,_n_visits=3 ,_Q=0.215870 ,_u=0.311616 ,_P=0.008819 
		, act=36,_n_visits=2 ,_Q=-0.150397 ,_u=1.779955 ,_P=0.503447 
	, act=44,_n_visits=3 ,_Q=0.215788 ,_u=0.262477 ,_P=0.007429 
		, act=35,_n_visits=2 ,_Q=-0.165841 ,_u=1.632656 ,_P=0.461785 
	, act=45,_n_visits=6 ,_Q=-0.228874 ,_u=0.816619 ,_P=0.040446 
		, act=36,_n_visits=4 ,_Q=0.339856 ,_u=1.106024 ,_P=0.395703 

3.2.1.2 Building the Monte Carlo tree nodes

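Construction process

1. The search tree initially contains only the root node.
2. The search path is determined by the selection function.
   Selection function: among all children, pick the one with the largest weight Q + u.
   Weight (the u term damps children that have already been selected often):
   $u = c_{puct} \cdot P \cdot \frac{\sqrt{N_{parent}}}{1 + N}$
   where P is the prior (_P), N the child's visit count (_n_visits) and N_parent the parent's visit count.

3. At a leaf, the search tree is extended with new nodes.
   3.1 The new nodes are determined by the policy function.
4. The values along the search path are updated.
   4.1 The parent node is updated.
   4.2 The node itself is updated:
       visit count: _n_visits += 1
       Q += (leaf_value - Q) / _n_visits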
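
The select, expand and update steps above can be sketched as a small node class. This is a simplified illustration modeled on the attribute names listed later in this post (_parent, _children, _n_visits, _Q, _P); it is not a copy of the repository's TreeNode. Negating the value on the way up to the parent is an assumption here, chosen because the two players alternate and the _Q signs in the dump above alternate between levels.

```python
import math

class TreeNode:
    """Simplified search-tree node (sketch, not the repository's class)."""

    def __init__(self, parent, prior_p):
        self._parent = parent
        self._children = {}      # action -> TreeNode
        self._n_visits = 0
        self._Q = 0.0
        self._P = prior_p

    def get_value(self, c_puct):
        # Q + u, where u shrinks as this node gets visited more often
        u = c_puct * self._P * math.sqrt(self._parent._n_visits) / (1 + self._n_visits)
        return self._Q + u

    def select(self, c_puct):
        # step 2: pick the (action, child) pair with the largest Q + u
        return max(self._children.items(),
                   key=lambda item: item[1].get_value(c_puct))

    def expand(self, action_priors):
        # step 3: create one child per (action, prior probability) pair
        for action, prob in action_priors:
            if action not in self._children:
                self._children[action] = TreeNode(self, prob)

    def update_recursive(self, leaf_value):
        # step 4.1: update the parent first, from the other player's point of view
        if self._parent is not None:
            self._parent.update_recursive(-leaf_value)
        # step 4.2: update this node with a running mean
        self._n_visits += 1
        self._Q += (leaf_value - self._Q) / self._n_visits

root = TreeNode(None, 1.0)
root.expand([(27, 0.2), (28, 0.5), (36, 0.3)])   # made-up priors
root.update_recursive(0.0)                        # first playout ends at the root
action, child = root.select(c_puct=5)             # all Q are 0, so the largest prior wins
child.update_recursive(0.4)                       # back up a made-up leaf value
```

After these calls the selected child has one visit and Q = 0.4, while the root has two visits and Q = -0.2, because it saw the same value with the opposite sign.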

[Mermaid diagram: nine snapshots (s1 to s9) of the search tree as it grows over successive playouts. Edges are labeled with the actions act27, act28, act35 and act36, nodes with their visit counts; the largest visit count per snapshot rises from 2 to 10.]

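Sequence diagram (participants: player, MCTS, PolicyNet, Node). The calls, in order; the diagram marks a loop around the playout steps:

1. get_move_probs(state), returning act
2. copy.deepcopy(state), returning state_copy
3. _playout(state_copy)
4. getNext(node), returning node
5. policy(state_copy), returning action_probs, leaf_value
6. expand(action_probs)
7. update_recursive(leaf_value)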
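
The call sequence above can be exercised end to end on a toy problem. The sketch below is self-contained and deliberately simplified: ToyState and uniform_policy are hypothetical stand-ins for the board and the policy-value network, every finished toy game counts as a draw, and the function names only loosely follow the diagram.

```python
import copy
import math

class Node:
    def __init__(self, parent, prior):
        self.parent, self.prior = parent, prior
        self.children, self.n_visits, self.q = {}, 0, 0.0

    def score(self, c_puct):
        u = c_puct * self.prior * math.sqrt(self.parent.n_visits) / (1 + self.n_visits)
        return self.q + u

    def backup(self, value):
        if self.parent is not None:
            self.parent.backup(-value)     # the parent sees the opposite sign
        self.n_visits += 1
        self.q += (value - self.q) / self.n_visits

class ToyState:
    """Hypothetical stand-in for the board: players simply fill squares."""
    def __init__(self, n_squares):
        self.availables = list(range(n_squares))

    def do_move(self, move):
        self.availables.remove(move)

    def game_end(self):
        return not self.availables

def uniform_policy(state):
    """Hypothetical policy-value function: uniform priors, neutral value."""
    n = len(state.availables)
    return [(a, 1.0 / n) for a in state.availables], 0.0

def playout(root, state, policy, c_puct=5):
    node = root
    while node.children:                            # descend to a leaf
        action, node = max(node.children.items(),
                           key=lambda kv: kv[1].score(c_puct))
        state.do_move(action)
    if state.game_end():
        leaf_value = 0.0                            # toy game: always a draw
    else:
        action_probs, leaf_value = policy(state)    # evaluate the leaf
        for action, prob in action_probs:           # expand the leaf
            node.children[action] = Node(node, prob)
    node.backup(-leaf_value)                        # update the search path

def get_visit_counts(state, policy, n_playout):
    root = Node(None, 1.0)
    for _ in range(n_playout):
        playout(root, copy.deepcopy(state), policy)  # each playout gets its own copy
    return {a: child.n_visits for a, child in root.children.items()}

counts = get_visit_counts(ToyState(4), uniform_policy, n_playout=50)
```

The first playout only expands the root, so the children's visit counts add up to n_playout - 1. The deep copy matters because each playout mutates the state while it walks down the tree.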

Data changes

MCTS

Attribute | Initial value | Method | Change
_root | Node(1) | _playout | 1. getPath 2. _policy 3. expand 4. update
_policy | | |
_c_puct | | |
_n_playout | | |

Node

Attribute | Initial value | Method | Change
_parent | | init | self._parent = parent
_children | {} | expand | built from action_priors: _children[action] = TreeNode(prob)
_n_visits | 0 | update | _n_visits += 1
_Q | 0 | update | self._Q += 1.0*(leaf_value - self._Q) / self._n_visits
_u | 0 | get_value | self._u = (c_puct * self._P * np.sqrt(self._parent._n_visits) / (1 + self._n_visits))
_P | prior_p | init | self._P = prior_p
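
The _Q update is an incremental mean. As a short check, writing $v_i$ for the i-th leaf_value backed up into a node and $Q_n$ for the value of _Q after n updates (notation introduced here, with $Q_0 = 0$), the rule keeps _Q equal to the average of all values seen so far:

```latex
$$
Q_n \;=\; Q_{n-1} + \frac{v_n - Q_{n-1}}{n}
    \;=\; \frac{(n-1)\,Q_{n-1} + v_n}{n}
    \;=\; \frac{1}{n}\sum_{i=1}^{n} v_i
$$
```

The last equality follows by induction: if $Q_{n-1}$ is the mean of the first $n-1$ values, then $(n-1)\,Q_{n-1}$ is their sum.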

To be continued...
