【Source: Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484–489.】
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
All games of perfect information have an optimal value function $v^*(s)$, which determines the outcome of the game, from every board position or state $s$, under perfect play by all players. These games may be solved by recursively computing the optimal value function in a search tree containing approximately $b^d$ possible sequences of moves, where $b$ is the game's breadth (number of legal moves per position) and $d$ is its depth (game length). In large games, such as chess ($b \approx 35$, $d \approx 80$) and especially Go ($b \approx 250$, $d \approx 150$), exhaustive search is infeasible, but the effective search space can be reduced by two general principles. First, the depth of the search may be reduced by position evaluation: truncating the search tree at state $s$ and replacing the subtree below $s$ by an approximate value function $v(s) \approx v^*(s)$ that predicts the outcome from state $s$. This approach has led to superhuman performance in chess, checkers and othello, but it was believed to be intractable in Go due to the complexity of the game. Second, the breadth of the search may be reduced by sampling actions from a policy $p(a \mid s)$ that is a probability distribution over possible moves $a$ in position $s$. For example, Monte Carlo rollouts search to maximum depth without branching at all, by sampling long sequences of actions for both players from a policy $p$. Averaging over such rollouts can provide an effective position evaluation, achieving superhuman performance in backgammon and Scrabble, and weak amateur level play in Go.
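To get a feel for the $b^d$ figure quoted above, a two-line back-of-the-envelope check (illustrative only, using the approximate branching factors and game lengths from the text):

```python
import math

# Rough size of the naive game tree: about b ** d move sequences.
for name, b, d in [("chess", 35, 80), ("Go", 250, 150)]:
    print(f"{name}: {b}^{d} is about 10^{d * math.log10(b):.0f} move sequences")
# prints roughly 10^124 for chess and 10^360 for Go
```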
Rollout: in backgammon parlance, the expected value of a position is known as the "equity" of the position, and estimating the equity by Monte Carlo sampling is known as performing a "rollout".
Monte Carlo tree search (MCTS) uses Monte Carlo rollouts to estimate the value of each state in a search tree. As more simulations are executed, the search tree grows larger and the relevant values become more accurate. The policy used to select actions during search is also improved over time, by selecting children with higher values. Asymptotically, this policy converges to optimal play, and the evaluations converge to the optimal value function. The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves. These policies are used to narrow the search to a beam of high-probability actions, and to sample actions during rollouts. This approach has achieved strong amateur play. However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.
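For readers unfamiliar with the basic loop the paragraph above summarizes, here is a minimal, generic UCT-style MCTS sketch. It is not AlphaGo's APV-MCTS (that algorithm is described in the Methods); it assumes a caller-supplied `game` object providing `legal_moves(s)`, `play(s, m)`, `is_terminal(s)`, `winner(s)` and `opponent(player)`.

```python
import math
import random

class UctNode:
    """Plain UCT tree node; a generic sketch rather than AlphaGo's APV-MCTS."""
    def __init__(self, state, to_play, parent=None):
        self.state, self.to_play, self.parent = state, to_play, parent
        self.children = {}                 # move -> UctNode
        self.visits, self.wins = 0, 0.0    # wins counted for the player who moved INTO this node

    def best_child(self, c=1.4):
        return max(self.children.values(),
                   key=lambda n: n.wins / (n.visits + 1e-9)
                   + c * math.sqrt(math.log(self.visits + 1) / (n.visits + 1e-9)))

def uct_search(root_state, root_player, game, n_sim=1000):
    """Return the most-visited root move after n_sim simulations."""
    root = UctNode(root_state, root_player)
    for _ in range(n_sim):
        node = root
        # 1. Selection: descend while the node already has children
        while node.children and not game.is_terminal(node.state):
            node = node.best_child()
        # 2. Expansion: add one child per legal move the first time a leaf is reached
        if not game.is_terminal(node.state):
            for m in game.legal_moves(node.state):
                child_state = game.play(node.state, m)
                node.children[m] = UctNode(child_state, game.opponent(node.to_play), node)
            if node.children:
                node = random.choice(list(node.children.values()))
        # 3. Rollout: play uniformly random moves to the end of the game
        state = node.state
        while not game.is_terminal(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        winner = game.winner(state)
        # 4. Backup: credit each node from the perspective of the player who moved into it
        while node is not None:
            node.visits += 1
            mover = game.opponent(node.to_play)
            node.wins += 1.0 if winner == mover else 0.0
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)
```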
Recently, deep convolutional neural networks have achieved unprecedented performance in visual domains: for example, image classification, face recognition, and playing Atari games. They use many layers of neurons, each arranged in overlapping tiles, to construct increasingly abstract, localized representations of an image. We employ a similar architecture for the game of Go. We pass in the board position as a 19 × 19 image and use convolutional layers to construct a representation of the position. We use these neural networks to reduce the effective depth and breadth of the search tree: evaluating positions using a value network, and sampling actions using a policy network. We train the neural networks using a pipeline consisting of several stages of machine learning (Fig. 1). We begin by training a supervised learning (SL) policy network $p_\sigma$ directly from expert human moves. This provides fast, efficient learning updates with immediate feedback and high-quality gradients. Similar to prior work, we also train a fast policy $p_\pi$ that can rapidly sample actions during rollouts. Next, we train a reinforcement learning (RL) policy network $p_\rho$ that improves the SL policy network by optimizing the final outcome of games of self-play. This adjusts the policy towards the correct goal of winning games, rather than maximizing predictive accuracy. Finally, we train a value network $v_\theta$ that predicts the winner of games played by the RL policy network against itself. Our program AlphaGo efficiently combines the policy and value networks with MCTS.
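Concretely, the "board position as a 19 × 19 image" is a stack of feature planes. The paper uses 48 planes (Extended Data Table 2); the sketch below builds only a handful of obvious ones, as a hypothetical stand-in for the real feature set:

```python
import numpy as np

def board_to_planes(board, to_play):
    """Encode a 19 x 19 board as a small stack of binary feature planes.

    A minimal stand-in for the 48 input planes of Extended Data Table 2.
    board: 19 x 19 array with +1 for the current player's stones, -1 for the
    opponent's stones, 0 for empty points; to_play: +1 or -1."""
    board = np.asarray(board)
    planes = np.stack([
        (board == +1).astype(np.float32),                              # own stones
        (board == -1).astype(np.float32),                              # opponent stones
        (board == 0).astype(np.float32),                               # empty points
        np.full(board.shape, float(to_play == 1), dtype=np.float32),   # constant colour plane
    ])
    return planes    # shape (4, 19, 19)
```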
Figure 1 | Neural network training pipeline and architecture.
a, A fast rollout policy $p_\pi$ and supervised learning (SL) policy network $p_\sigma$ are trained to predict human expert moves in a data set of positions. A reinforcement learning (RL) policy network $p_\rho$ is initialized to the SL policy network, and is then improved by policy gradient learning to maximize the outcome (that is, winning more games) against previous versions of the policy network. A new data set is generated by playing games of self-play with the RL policy network. Finally, a value network $v_\theta$ is trained by regression to predict the expected outcome (that is, whether the current player wins) in positions from the self-play data set.
b, Schematic representation of the neural network architecture used in AlphaGo. The policy network takes a representation of the board position $s$ as its input, passes it through many convolutional layers with parameters $\sigma$ (SL policy network) or $\rho$ (RL policy network), and outputs a probability distribution $p_\sigma(a \mid s)$ or $p_\rho(a \mid s)$ over legal moves $a$, represented by a probability map over the board. The value network similarly uses many convolutional layers with parameters $\theta$, but outputs a scalar value $v_\theta(s')$ that predicts the expected outcome in position $s'$.
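A simplified sketch of what such a pair of networks can look like, in PyTorch. The layer count, filter sizes and 48-plane input are only rough approximations of the architecture described in the paper's Methods and Extended Data; the 256-unit fully connected layer in the value head is likewise an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD, IN_PLANES, FILTERS, LAYERS = 19, 48, 192, 12   # rough, paper-like sizes

class ConvTrunk(nn.Module):
    """Stack of convolutions over the 19 x 19 board representation."""
    def __init__(self):
        super().__init__()
        self.first = nn.Conv2d(IN_PLANES, FILTERS, kernel_size=5, padding=2)
        self.rest = nn.ModuleList(
            nn.Conv2d(FILTERS, FILTERS, kernel_size=3, padding=1)
            for _ in range(LAYERS - 1))

    def forward(self, x):                       # x: (batch, IN_PLANES, 19, 19)
        x = F.relu(self.first(x))
        for conv in self.rest:
            x = F.relu(conv(x))
        return x

class PolicyNet(nn.Module):
    """p(a|s): a probability distribution over the 361 intersections."""
    def __init__(self):
        super().__init__()
        self.trunk = ConvTrunk()
        self.head = nn.Conv2d(FILTERS, 1, kernel_size=1)    # one logit per point

    def forward(self, x):
        logits = self.head(self.trunk(x)).flatten(1)        # (batch, 361)
        return F.softmax(logits, dim=1)

class ValueNet(nn.Module):
    """v_theta(s): a single scalar in [-1, 1] predicting the game outcome."""
    def __init__(self):
        super().__init__()
        self.trunk = ConvTrunk()
        self.head = nn.Conv2d(FILTERS, 1, kernel_size=1)
        self.fc1 = nn.Linear(BOARD * BOARD, 256)
        self.fc2 = nn.Linear(256, 1)

    def forward(self, x):
        h = self.head(self.trunk(x)).flatten(1)             # (batch, 361)
        return torch.tanh(self.fc2(F.relu(self.fc1(h)))).squeeze(1)
```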
Supervised learning of policy networks

For the first stage of the training pipeline, we build on prior work on predicting expert moves in the game of Go using supervised learning. The SL policy network $p_\sigma(a \mid s)$ alternates between convolutional layers with weights $\sigma$, and rectifier nonlinearities. A final softmax layer outputs a probability distribution over all legal moves $a$. The input $s$ to the policy network is a simple representation of the board state (see Extended Data Table 2). The policy network is trained on randomly sampled state-action pairs $(s, a)$, using stochastic gradient ascent to maximize the likelihood of the human move $a$ selected in state $s$:

$$\Delta\sigma \propto \frac{\partial \log p_\sigma(a \mid s)}{\partial \sigma}.$$
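A minimal sketch of that gradient-ascent step, reusing the hypothetical `PolicyNet` from the architecture sketch above (the optimizer and learning rate are placeholders, not the paper's settings):

```python
import torch
import torch.nn.functional as F

policy = PolicyNet()                                     # from the architecture sketch above
opt = torch.optim.SGD(policy.parameters(), lr=0.003)     # placeholder hyperparameters

def sl_update(states, moves):
    """One step of stochastic gradient ascent on log p_sigma(a|s) over expert (s, a) pairs.

    states: (batch, IN_PLANES, 19, 19) float tensor; moves: (batch,) long tensor of indices."""
    probs = policy(states)                                          # p_sigma(.|s), (batch, 361)
    log_lik = torch.log(probs.gather(1, moves[:, None]) + 1e-12).mean()
    opt.zero_grad()
    (-log_lik).backward()                                           # ascent on the log-likelihood
    opt.step()
    return log_lik.item()
```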
Reinforcement learning of policy networks

We evaluated the performance of the RL policy network in game play, sampling each move $a_t \sim p_\rho(\cdot \mid s_t)$ from its output probability distribution over actions. When played head-to-head, the RL policy network won more than 80% of games against the SL policy network. We also tested against the strongest open-source Go program, Pachi, a sophisticated Monte Carlo search program, ranked at 2 amateur dan on KGS, that executes 100,000 simulations per move. Using no search at all, the RL policy network won 85% of games against Pachi. In comparison, the previous state-of-the-art, based only on supervised learning of convolutional networks, won 11% of games against Pachi and 12% against a slightly weaker program, Fuego.
Reinforcement learning of value networks

The final stage of the training pipeline focuses on position evaluation, estimating a value function $v^p(s)$ that predicts the outcome from position $s$ of games played by using policy $p$ for both players,

$$v^p(s) = \mathbb{E}\left[z_t \mid s_t = s,\; a_{t \ldots T} \sim p\right].$$

Ideally, we would like to know the optimal value function under perfect play, $v^*(s)$; in practice, we instead estimate the value function $v^{p_\rho}$ for our strongest policy, using the RL policy network $p_\rho$. We approximate the value function using a value network $v_\theta(s)$ with weights $\theta$, $v_\theta(s) \approx v^{p_\rho}(s) \approx v^*(s)$. This neural network has a similar architecture to the policy network, but outputs a single prediction instead of a probability distribution. We train the weights of the value network by regression on state-outcome pairs $(s, z)$, using stochastic gradient descent to minimize the mean squared error (MSE) between the predicted value $v_\theta(s)$ and the corresponding outcome $z$:

$$\Delta\theta \propto \frac{\partial v_\theta(s)}{\partial \theta}\,\big(z - v_\theta(s)\big).$$
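A matching sketch of the regression step, again reusing the hypothetical `ValueNet` defined earlier (hyperparameters are placeholders):

```python
import torch
import torch.nn.functional as F

value_net = ValueNet()                                     # from the architecture sketch above
vopt = torch.optim.SGD(value_net.parameters(), lr=0.003)   # placeholder hyperparameters

def value_update(states, outcomes):
    """One SGD step minimizing the MSE between v_theta(s) and the self-play outcome z.

    states: (batch, IN_PLANES, 19, 19); outcomes: (batch,) float tensor with values in {-1, +1}."""
    pred = value_net(states)                 # v_theta(s), shape (batch,)
    loss = F.mse_loss(pred, outcomes)        # mean of (z - v_theta(s))^2
    vopt.zero_grad()
    loss.backward()
    vopt.step()
    return loss.item()
```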
Figure 2 | Strength and accuracy of policy and value networks.
a, Plot showing the playing strength of policy networks as a function of their training accuracy. Policy networks with 128, 192, 256 and 384 convolutional filters per layer were evaluated periodically during training; the plot shows the winning rate of AlphaGo using that policy network against the match version of AlphaGo.
b, Comparison of evaluation accuracy between the value network and rollouts with different policies. Positions and outcomes were sampled from human expert games. Each position was evaluated by a single forward pass of the value network $v_\theta$, or by the mean outcome of 100 rollouts, played out using either uniform random rollouts, the fast rollout policy $p_\pi$, the SL policy network $p_\sigma$ or the RL policy network $p_\rho$. The mean squared error between the predicted value and the actual game outcome is plotted against the stage of the game (how many moves had been played in the given position).
Searching with policy and value networks

AlphaGo combines the policy and value networks in an MCTS algorithm (Fig. 3) that selects actions by lookahead search. Each edge $(s, a)$ of the search tree stores an action value $Q(s, a)$, visit count $N(s, a)$, and prior probability $P(s, a)$. The tree is traversed by simulation (that is, descending the tree in complete games without backup), starting from the root state. At each time step $t$ of each simulation, an action $a_t$ is selected from state $s_t$,

$$a_t = \operatorname*{argmax}_{a}\big(Q(s_t, a) + u(s_t, a)\big),$$

so as to maximize action value plus a bonus $u(s_t, a)$.
Figure 3 | Monte Carlo tree search in AlphaGo.
a, Each simulation traverses the tree by selecting the edge with maximum action value $Q$, plus a bonus $u(P)$ that depends on a stored prior probability $P$ for that edge.
b, The leaf node may be expanded; the new node is processed once by the policy network $p_\sigma$ and the output probabilities are stored as prior probabilities $P$ for each action.
c, At the end of a simulation, the leaf node is evaluated in two ways: using the value network $v_\theta$; and by running a rollout to the end of the game with the fast rollout policy $p_\pi$, then computing the winner with function $r$.
d, Action values $Q$ are updated to track the mean value of all evaluations $r(\cdot)$ and $v_\theta(\cdot)$ in the subtree below that action.
The bonus

$$u(s, a) \propto \frac{P(s, a)}{1 + N(s, a)}$$

is proportional to the prior probability but decays with repeated visits, to encourage exploration. When the traversal reaches a leaf node $s_L$ at step $L$, the leaf node may be expanded. The leaf position $s_L$ is processed just once by the SL policy network $p_\sigma$. The output probabilities are stored as prior probabilities $P$ for each legal action $a$, $P(s, a) = p_\sigma(a \mid s)$. The leaf node is evaluated in two very different ways: first, by the value network $v_\theta(s_L)$; and second, by the outcome $z_L$ of a random rollout played out until terminal step $T$ using the fast rollout policy $p_\pi$; these evaluations are combined, using a mixing parameter $\lambda$, into a leaf evaluation

$$V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L.$$
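In code, the per-edge bonus and the mixed leaf evaluation are one-liners; the exploration constant `c` below is a stand-in for whatever proportionality constant is used (the Methods give the exact PUCT form):

```python
def exploration_bonus(prior, visit_count, c=5.0):
    """u(s, a): proportional to the prior P(s, a), decaying as N(s, a) grows."""
    return c * prior / (1.0 + visit_count)

def leaf_evaluation(v_theta_leaf, rollout_outcome, lam=0.5):
    """V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1.0 - lam) * v_theta_leaf + lam * rollout_outcome
```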
At the end of simulation, the action values and visit counts of all traversed edges are updated. Each edge accumulates the visit count and mean evaluation of all simulations passing through that edge:

$$N(s, a) = \sum_{i=1}^{n} 1(s, a, i),$$

$$Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} 1(s, a, i)\, V\!\left(s_L^i\right),$$

where $s_L^i$ is the leaf node from the $i$th simulation, and $1(s, a, i)$ indicates whether an edge $(s, a)$ was traversed during the $i$th simulation. Once the search is complete, the algorithm chooses the most visited move from the root position. It is worth noting that the SL policy network $p_\sigma$ performed better in AlphaGo than the stronger RL policy network $p_\rho$, presumably because humans select a diverse beam of promising moves, whereas RL optimizes for the single best move. However, the value function derived from the stronger RL policy network performed better in AlphaGo than a value function derived from the SL policy network.
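An implementation would apply those two formulas incrementally rather than recomputing the sums; a tiny sketch, keyed on hashable (state, action) pairs:

```python
from collections import defaultdict

N = defaultdict(int)      # N(s, a): number of simulations that traversed the edge
W = defaultdict(float)    # running sum of leaf evaluations V(s_L^i)
Q = defaultdict(float)    # Q(s, a): mean leaf evaluation

def backup_edge_stats(path, leaf_value):
    """Apply the N(s, a) and Q(s, a) updates to every edge traversed in one simulation.

    path: a list of hashable (state_key, action) pairs."""
    for edge in path:
        N[edge] += 1
        W[edge] += leaf_value
        Q[edge] = W[edge] / N[edge]
```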
Evaluating policy and value networks requires several orders of magnitude more computation than traditional search heuristics. To efficiently combine MCTS with deep neural networks, AlphaGo uses an asynchronous multi-threaded search that executes simulations on CPUs, and computes policy and value networks in parallel on GPUs. The final version of AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs. We also implemented a distributed version of AlphaGo that exploited multiple machines, 40 search threads, 1,202 CPUs and 176 GPUs. The Methods section provides full details of asynchronous and distributed MCTS.
Evaluating the playing strength of AlphaGo

To evaluate AlphaGo, we ran an internal tournament among variants of AlphaGo and several other Go programs, including the strongest commercial programs Crazy Stone and Zen, and the strongest open source programs Pachi and Fuego. All of these programs are based on high-performance MCTS algorithms. In addition, we included the open source program GnuGo, a Go program using state-of-the-art search methods that preceded MCTS. All programs were allowed 5 s of computation time per move.
The results of the tournament (see Fig. 4a) suggest that single-machine AlphaGo is many dan ranks stronger than any previous Go program, winning 494 out of 495 games (99.8%) against other Go programs. To provide a greater challenge to AlphaGo, we also played games with four handicap stones (that is, free moves for the opponent); AlphaGo won 77%, 86%, and 99% of handicap games against Crazy Stone, Zen and Pachi, respectively. The distributed version of AlphaGo was significantly stronger, winning 77% of games against single-machine AlphaGo and 100% of its games against other programs.
We also assessed variants of AlphaGo that evaluated positions using just the value network (λ = 0) or just rollouts (λ = 1) (see Fig. 4b). Even without rollouts AlphaGo exceeded the performance of all other Go programs, demonstrating that value networks provide a viable alternative to Monte Carlo evaluation in Go. However, the mixed evaluation (λ = 0.5) performed best, winning ≥95% of games against other variants. This suggests that the two position-evaluation mechanisms are complementary: the value network approximates the outcome of games played by the strong but impractically slow $p_\rho$, while the rollouts can precisely score and evaluate the outcome of games played by the weaker but faster rollout policy $p_\pi$. Figure 5 visualizes the evaluation of a real game position by AlphaGo.
Figure 4 | Tournament evaluation of AlphaGo.
a, Results of a tournament between different Go programs (see Extended Data Tables 6–11). Each program used approximately 5 s computation time per move. To provide a greater challenge to AlphaGo, some programs (pale upper bars) were given four handicap stones (that is, free moves at the start of every game) against all opponents. Programs were evaluated on an Elo scale: a 230 point gap corresponds to a 79% probability of winning, which roughly corresponds to one amateur dan rank advantage on KGS; an approximate correspondence to human ranks is also shown, and horizontal lines show KGS ranks achieved online by that program. Games against the human European champion Fan Hui were also included; these games used longer time controls. 95% confidence intervals are shown.
b, Performance of AlphaGo, on a single machine, for different combinations of components. The version solely using the policy network does not perform any search.
c, Scalability study of MCTS in AlphaGo with search threads and GPUs, using asynchronous search (light blue) or distributed search (dark blue), for 2 s per move.
Figure 5 | How AlphaGo (black, to play) selected its move in an informal game against Fan Hui.
For each of the following statistics, the location of the maximum value is indicated by an orange circle.
a, Evaluation of all successors $s'$ of the root position $s$, using the value network $v_\theta(s')$; estimated winning percentages are shown for the top evaluations.
b, Action values $Q(s, a)$ for each edge $(s, a)$ in the tree from root position $s$; averaged over value network evaluations only (λ = 0).
c, Action values $Q(s, a)$, averaged over rollout evaluations only (λ = 1).
d, Move probabilities directly from the SL policy network, $p_\sigma(a \mid s)$; reported as a percentage (if above 0.1%).
e, Percentage frequency with which actions were selected from the root during simulations.
f, The principal variation (path with maximum visit count) from AlphaGo's search tree. The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Fan Hui responded with the move indicated by the white square; in his post-game commentary he preferred the move (labelled 1) predicted by AlphaGo.
Finally, we evaluated the distributed version of AlphaGo against Fan Hui, a professional 2 dan, and the winner of the 2013, 2014 and 2015 European Go championships. Over 5–9 October 2015 AlphaGo and Fan Hui competed in a formal five-game match. AlphaGo won the match 5 games to 0 (Fig. 6 and Extended Data Table 1). This is the first time that a computer Go program has defeated a human professional player, without handicap, in the full game of Go, a feat that was previously believed to be at least a decade away.
Figure 6 | Games from the match between AlphaGo and the European champion, Fan Hui.
Moves are shown in a numbered sequence corresponding to the order in which they were played. Repeated moves on the same intersection are shown in pairs below the board. The first move number in each pair indicates when the repeat move was played, at an intersection identified by the second move number (see Supplementary Information).
METHODS

Problem setting. Many games of perfect information, such as chess, checkers, othello, backgammon and Go, may be defined as alternating Markov games. In these games, there is a state space $\mathcal{S}$ (where state includes an indication of the current player to play); an action space $\mathcal{A}(s)$ defining the legal actions in any given state $s \in \mathcal{S}$; a state transition function $f(s, a, \xi)$ defining the successor state after selecting action $a$ in state $s$ with random input $\xi$ (for example, dice); and finally a reward function $r^i(s)$ describing the reward received by player $i$ in state $s$. We restrict our attention to two-player zero-sum games, $r^1(s) = -r^2(s) = r(s)$, with deterministic state transitions, $f(s, a, \xi) = f(s, a)$, and zero rewards except at a terminal time step $T$. The outcome of the game $z_t = \pm r(s_T)$ is the terminal reward at the end of the game from the perspective of the current player at time step $t$. A policy $p(a \mid s)$ is a probability distribution over legal actions $a \in \mathcal{A}(s)$. A value function is the expected outcome if all actions for both players are selected according to policy $p$, that is, $v^p(s) = \mathbb{E}\left[z_t \mid s_t = s,\; a_{t \ldots T} \sim p\right]$. Zero-sum games have a unique optimal value function $v^*(s)$ that determines the outcome from state $s$ following perfect play by both players:

$$v^*(s) = \begin{cases} z_T & \text{if } s = s_T, \\ \max_{a}\, -v^*\!\big(f(s, a)\big) & \text{otherwise.} \end{cases}$$
Reinforcement learning can learn to approximate the optimal value function directly from games of self-play. The majority of prior work has focused on a linear combination $\phi(s) \cdot \theta$ of features $\phi(s)$ with weights $\theta$. Weights were trained using temporal-difference learning in chess, checkers and Go; or using linear regression in othello and Scrabble. Temporal-difference learning has also been used to train a neural network to approximate the optimal value function, achieving superhuman performance in backgammon, and achieving weak kyu-level performance in small-board Go using convolutional networks.
An alternative approach to minimax search is Monte Carlo tree search (MCTS), which estimates the optimal value of interior nodes by a double approximation, $V^n(s) \approx v^{P^n}(s) \approx v^*(s)$. The first approximation, $V^n(s) \approx v^{P^n}(s)$, uses $n$ Monte Carlo simulations to estimate the value function of a simulation policy $P^n$. The second approximation, $v^{P^n}(s) \approx v^*(s)$, uses a simulation policy $P^n$ in place of minimax optimal actions. The simulation policy selects actions according to a search control function $\operatorname*{argmax}_a \big(Q(s, a) + u(s, a)\big)$, such as UCT, that selects children with higher action values, $Q(s, a)$, plus a bonus $u(s, a)$ that encourages exploration; or in the absence of a search tree at state $s$, it samples actions from a fast rollout policy $p_\pi(a \mid s)$. As more simulations are executed and the search tree grows deeper, the simulation policy becomes informed by increasingly accurate statistics. In the limit, both approximations become exact and MCTS (for example, with UCT) converges to the optimal value function, $\lim_{n \to \infty} V^n(s) = v^*(s)$. The strongest current Go programs are based on MCTS.
MCTS has previously been combined with a policy that is used to narrow the beam of the search tree to high-probability moves, or to bias the bonus term towards high-probability moves. MCTS has also been combined with a value function that is used to initialize action values in newly expanded nodes, or to mix Monte Carlo evaluation with minimax evaluation. By contrast, AlphaGo's use of value functions is based on truncated Monte Carlo search algorithms, which terminate rollouts before the end of the game and use a value function in place of the terminal reward. AlphaGo's position evaluation mixes full rollouts with truncated rollouts, resembling in some respects the well-known temporal-difference learning algorithm TD(λ). AlphaGo also differs from prior work by using slower but more powerful representations of the policy and value function; evaluating deep neural networks is several orders of magnitude slower than linear representations and must therefore occur asynchronously.
The performance of MCTS is to a large degree determined by the quality of the rollout policy. Prior work has focused on handcrafted patterns or on learning rollout policies by supervised learning, reinforcement learning, simulation balancing or online adaptation; however, it is known that rollout-based position evaluation is frequently inaccurate. AlphaGo uses relatively simple rollouts, and instead addresses the challenging problem of position evaluation more directly using value networks.
Search algorithm. To efficiently integrate large neural networks into AlphaGo, we implemented an asynchronous policy and value MCTS algorithm (APV-MCTS). Each node $s$ in the search tree contains edges $(s, a)$ for all legal actions $a \in \mathcal{A}(s)$. Each edge stores a set of statistics,

$$\{P(s, a),\; N_v(s, a),\; N_r(s, a),\; W_v(s, a),\; W_r(s, a),\; Q(s, a)\},$$

where $P(s, a)$ is the prior probability, $W_v(s, a)$ and $W_r(s, a)$ are Monte Carlo estimates of total action value, accumulated over $N_v(s, a)$ and $N_r(s, a)$ leaf evaluations and rollout rewards, respectively, and $Q(s, a)$ is the combined mean action value for that edge. Multiple simulations are executed in parallel on separate search threads. The APV-MCTS algorithm proceeds in the four stages outlined in Fig. 3.
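As a data structure, those per-edge statistics are just a small record; a sketch, with field names following the notation above (the `Node`/`EdgeStats` classes here are illustrative, not AlphaGo's actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class EdgeStats:
    """Statistics stored on one (s, a) edge."""
    P: float = 0.0     # prior probability P(s, a)
    Nv: int = 0        # N_v(s, a): number of value-network evaluations
    Nr: int = 0        # N_r(s, a): number of rollout evaluations
    Wv: float = 0.0    # W_v(s, a): total value-network evaluation
    Wr: float = 0.0    # W_r(s, a): total rollout reward
    Q: float = 0.0     # combined mean action value

@dataclass
class Node:
    """One search-tree node: an EdgeStats record per legal action."""
    edges: dict = field(default_factory=dict)    # action -> EdgeStats
```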
Selection (Fig. 3a). The first in-tree phase of each simulation begins at the root of the search tree and finishes when the simulation reaches a leaf node at time step $L$. At each of these time steps, $t < L$, an action is selected according to the statistics in the search tree, $a_t = \operatorname*{argmax}_a \big(Q(s_t, a) + u(s_t, a)\big)$, using a variant of the PUCT algorithm,

$$u(s, a) = c_{\text{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N_r(s, b)}}{1 + N_r(s, a)},$$

where $c_{\text{puct}}$ is a constant determining the level of exploration; this search control strategy initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value.
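A direct transcription of that selection rule over the `Node`/`EdgeStats` sketch above; the value of `c_puct` is only a placeholder, not the tuned constant from the paper:

```python
import math

def select_action(node, c_puct=5.0):
    """argmax_a Q(s, a) + u(s, a) with the PUCT-style bonus described above."""
    sqrt_total = math.sqrt(sum(e.Nr for e in node.edges.values()))
    def puct(edge):
        return edge.Q + c_puct * edge.P * sqrt_total / (1 + edge.Nr)
    return max(node.edges, key=lambda a: puct(node.edges[a]))
```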
Evaluation (Fig. 3c). The leaf position $s_L$ is added to a queue for evaluation $v_\theta(s_L)$ by the value network, unless it has previously been evaluated. The second, rollout phase of each simulation begins at leaf node $s_L$ and continues until the end of the game. At each of these time steps, $t \geq L$, actions are selected by both players according to the rollout policy, $a_t \sim p_\pi(\cdot \mid s_t)$. When the game reaches a terminal state, the outcome $z_t = \pm r(s_T)$ is computed from the final score.
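The rollout itself is a simple loop; the sketch below uses a uniformly random move as a stand-in for the learned fast policy $p_\pi$, against a caller-supplied game interface similar to the one assumed in the UCT sketch earlier:

```python
import random

def rollout(state, game, rollout_policy=None):
    """Play a position out to the end and return the final score game.score(state).

    If no rollout_policy(state, moves) is given, moves are drawn uniformly at random."""
    while not game.is_terminal(state):
        moves = game.legal_moves(state)
        move = random.choice(moves) if rollout_policy is None else rollout_policy(state, moves)
        state = game.play(state, move)
    return game.score(state)    # e.g. +1 / -1 from the perspective chosen by the caller
```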
Backup (Fig. 3d). At each in-tree step $t \leq L$ of the simulation, the rollout statistics are updated as if it has lost $n_{\text{vl}}$ games, $N_r(s_t, a_t) \leftarrow N_r(s_t, a_t) + n_{\text{vl}}$; $W_r(s_t, a_t) \leftarrow W_r(s_t, a_t) - n_{\text{vl}}$; this virtual loss discourages other threads from simultaneously exploring the identical variation. At the end of the simulation, the rollout statistics are updated in a backward pass through each step $t \leq L$, replacing the virtual losses by the outcome, $N_r(s_t, a_t) \leftarrow N_r(s_t, a_t) - n_{\text{vl}} + 1$; $W_r(s_t, a_t) \leftarrow W_r(s_t, a_t) + n_{\text{vl}} + z_t$. Asynchronously, a separate backward pass is initiated when the evaluation of the leaf position $s_L$ completes. The output of the value network $v_\theta(s_L)$ is used to update value statistics in a second backward pass through each step $t \leq L$, $N_v(s_t, a_t) \leftarrow N_v(s_t, a_t) + 1$, $W_v(s_t, a_t) \leftarrow W_v(s_t, a_t) + v_\theta(s_L)$. The overall evaluation of each state action is a weighted average of the Monte Carlo estimates,

$$Q(s, a) = (1 - \lambda)\,\frac{W_v(s, a)}{N_v(s, a)} + \lambda\,\frac{W_r(s, a)}{N_r(s, a)},$$

that mixes together the value network and rollout evaluations with weighting parameter $\lambda$. All updates are performed lock-free.
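The same updates, expressed over the `EdgeStats` sketch; the virtual-loss size `N_VL` below is a placeholder for the tuned constant $n_{\text{vl}}$, and `path` is the list of edges traversed in one simulation:

```python
N_VL = 3    # placeholder for the virtual-loss constant n_vl

def apply_virtual_loss(path):
    """Make the traversed edges look like losses so other threads avoid the same line."""
    for edge in path:
        edge.Nr += N_VL
        edge.Wr -= N_VL

def backup_rollout(path, z):
    """Replace the virtual losses with the actual rollout outcome z."""
    for edge in path:
        edge.Nr += 1 - N_VL
        edge.Wr += z + N_VL

def backup_value(path, v_leaf, lam=0.5):
    """Second, asynchronous backward pass once v_theta(s_L) arrives; recombine Q(s, a)."""
    for edge in path:
        edge.Nv += 1
        edge.Wv += v_leaf
        q_rollout = edge.Wr / edge.Nr if edge.Nr else 0.0
        edge.Q = (1 - lam) * edge.Wv / edge.Nv + lam * q_rollout
```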
Expansion (Fig. 3b). When the visit count exceeds a threshold, $N_r(s, a) > n_{\text{thr}}$, the successor state $s' = f(s, a)$ is added to the search tree. The new node is initialized to $\{N_v(s', a) = N_r(s', a) = 0,\; W_v(s', a) = W_r(s', a) = 0,\; P(s', a) = p_\tau(a \mid s')\}$, using a tree policy $p_\tau(a \mid s')$ (similar to the rollout policy but with more features, see Extended Data Table 4) to provide place-holder prior probabilities for action selection. The position $s'$ is also inserted into a queue for asynchronous GPU evaluation by the policy network. Prior probabilities are computed by the SL policy network $p_\sigma^{\beta}(\cdot \mid s')$ with a softmax temperature set to $\beta$; these replace the placeholder prior probabilities, $P(s', a) \leftarrow p_\sigma^{\beta}(a \mid s')$, using an atomic update. The threshold $n_{\text{thr}}$ is adjusted dynamically to ensure that the rate at which positions are added to the policy queue matches the rate at which the GPUs evaluate the policy network. Positions are evaluated by both the policy network and the value network using a mini-batch size of 1 to minimize end-to-end evaluation time.
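A sketch of that expansion step over the earlier `Node`/`EdgeStats` structures; the threshold value and the queue object are placeholders, and the later asynchronous overwrite of the priors by the SL policy network is only indicated by a comment:

```python
import queue

N_THR = 40                      # placeholder for the expansion threshold n_thr
policy_queue = queue.Queue()    # positions awaiting an asynchronous SL-policy-network pass

def maybe_expand(tree, parent_edge, successor_key, tree_policy_priors):
    """Add the successor node once its edge has been visited often enough.

    `tree` maps position keys to Node objects; `tree_policy_priors` maps each legal
    action to a placeholder prior p_tau(a|s')."""
    if parent_edge.Nr > N_THR and successor_key not in tree:
        tree[successor_key] = Node(edges={a: EdgeStats(P=p)
                                          for a, p in tree_policy_priors.items()})
        policy_queue.put(successor_key)   # the SL policy network later overwrites these priors
    return tree.get(successor_key)
```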
We also implemented a distributed APV-MCTS algorithm. This architecture consists of a single master machine that executes the main search, many remote worker CPUs that execute asynchronous rollouts, and many remote worker GPUs that execute asynchronous policy and value network evaluations. The entire search tree is stored on the master, which only executes the in-tree phase of each simulation. The leaf positions are communicated to the worker CPUs, which execute the rollout phase of simulation, and to the worker GPUs, which compute network features and evaluate the policy and value networks. The prior probabilities of the policy network are returned to the master, where they replace placeholder prior probabilities at the newly expanded node. The rewards from rollouts and the value network outputs are each returned to the master, and backed up the originating search path.
At the end of search AlphaGo selects the action with maximum visit count; this is less sensitive to outliers than maximizing action value. The search tree is reused at subsequent time steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded. The match version of AlphaGo continues searching during the opponent's move. It extends the search if the action maximizing visit count and the action maximizing action value disagree. Time controls were otherwise shaped to use most time in the middle-game. AlphaGo resigns when its overall evaluation drops below an estimated 10% probability of winning the game, that is, $\max_a Q(s, a) < -0.8$.
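A final sketch of the move decision over the same structures; counting a visit as $N_r + N_v$ is an assumption made here for brevity, and the resignation threshold follows the $-0.8$ figure quoted above:

```python
RESIGN_THRESHOLD = -0.8    # roughly a 10% estimated probability of winning

def choose_move(root):
    """Play the most-visited root action, or resign if even the best Q(s, a) looks hopeless."""
    if max(e.Q for e in root.edges.values()) < RESIGN_THRESHOLD:
        return None                                            # resign
    return max(root.edges, key=lambda a: root.edges[a].Nr + root.edges[a].Nv)
```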
AlphaGo does not employ the all-moves-as-first or rapid action value estimation heuristics used in the majority of Monte Carlo Go programs; when using policy networks as prior knowledge, these biased heuristics do not appear to give any additional benefit. In addition, AlphaGo does not use progressive widening, dynamic komi or an opening book. The parameters used by AlphaGo in the Fan Hui match are listed in Extended Data Table 5.