The previous article cleared away the scaffolding around the run() function; this one digs into its core, namely what happens inside the for mini_batch in corpus.load_train(): loop. Fair warning up front: this part involves some mathematics and is not trivial. I will try to keep the language plain and the explanations thorough so that it stays relatively easy to follow.
Back to the main thread. As the previous article explained, corpus.load_train() hands out the shuffled _train_data (the training data) one batch at a time (default batch size 100), with each question, the GO marker, and the answer already concatenated into a single sequence. mini_batch is therefore one such batch: a list whose items each pair a concatenated sequence with its label, of the form [[Q0, Q1, ..., Qmax, GO, U0, U1, ..., Umax], [1, 0]].
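To make that structure concrete, here is a minimal sketch (the token ids, the padding, and the tiny two-item batch below are invented for illustration only; in the real data each concatenated sequence should be 120 positions long, matching the 120 columns of the first weight matrix discussed below):

mini_batch = [
    [[3, 15, 7, 0, 0, 1, 42, 8, 0, 0], [1, 0]],   # question + GO + answer ids, correct answer
    [[3, 15, 7, 0, 0, 1, 99, 2, 0, 0], [0, 1]],   # same question, wrong answer
]
for x, y_ in mini_batch:
    print(len(x), y_)   # sequence length and its one-hot label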
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
These two lines initialize zero arrays matching the shape of every layer's weights and biases. As noted in Part 1 of this detailed walkthrough of DeepQA-1 (the open insurance-industry QA dataset for machine learning), weights is [array of shape (100, 120), array of shape (50, 100), array of shape (2, 50)] and biases is [array of shape (100, 1), array of shape (50, 1), array of shape (2, 1)]. Accordingly, nabla_w is [array of 100*120 zeros, array of 50*100 zeros, array of 2*50 zeros] and nabla_b is [array of 100*1 zeros, array of 50*1 zeros, array of 2*1 zeros].
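A quick way to double-check those shapes is to mimic the two lines with stand-in arrays (the random weights and biases below are placeholders with the same shapes, not the network's actual parameters):

import numpy as np

weights = [np.random.randn(100, 120), np.random.randn(50, 100), np.random.randn(2, 50)]
biases = [np.random.randn(100, 1), np.random.randn(50, 1), np.random.randn(2, 1)]
nabla_w = [np.zeros(w.shape) for w in weights]
nabla_b = [np.zeros(b.shape) for b in biases]
print([w.shape for w in nabla_w])   # [(100, 120), (50, 100), (2, 50)]
print([b.shape for b in nabla_b])   # [(100, 1), (50, 1), (2, 1)]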
for x, y_ in mini_batch:
Referring to the mini_batch item shown above, x is [Q0, Q1, ..., Qmax, GO, U0, U1, ..., Umax] and y_ is [1, 0] or [0, 1].
Now for the main event: self.back_propagation(). It sits in network.py alongside run(); its source is as follows:
def back_propagation(self, x, y_):
    '''
    back propagation algorithm to compute the error rates for every W and b
    '''
    cost = 0.0
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    zs = [] # z vectors
    activations = [x]
    activation = x
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    cost += self.loss_fn(activations[-1], y_)
    # backward
    delta = self.loss_fn_derivative(activations[-1], y_) * sigmoid_derivative(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    for l in range(2, self.layers_num):
        z = zs[-l]
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sigmoid_derivative(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w, cost)
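Before walking through it line by line, note that the backward half of this function is the textbook backpropagation rule; in conventional notation (my own summary, not part of the source):

$$\delta^{L} = \mathrm{loss}'(a^{L}, y) \odot \sigma'(z^{L})$$
$$\delta^{l} = \big((W^{l+1})^{T}\,\delta^{l+1}\big) \odot \sigma'(z^{l})$$
$$\frac{\partial C}{\partial b^{l}} = \delta^{l}, \qquad \frac{\partial C}{\partial W^{l}} = \delta^{l}\,(a^{l-1})^{T}$$

The first equation is the delta computed from loss_fn_derivative and sigmoid_derivative, the second is the delta updated inside the for l in range(2, self.layers_num): loop, and the last two are exactly what gets stored in nabla_b and nabla_w.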
cost = 0.0 initializes the loss to 0.
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
These two lines are exactly the same as the two in run() above: nabla_w is [array of 100*120 zeros, array of 50*100 zeros, array of 2*50 zeros] and nabla_b is [array of 100*1 zeros, array of 50*1 zeros, array of 2*1 zeros]. nabla_w[0].shape is (100, 120), nabla_w[1].shape is (50, 100), nabla_w[2].shape is (2, 50); nabla_b[0].shape is (100, 1), nabla_b[1].shape is (50, 1), nabla_b[2].shape is (2, 1).
zs = [] # z vectors
activations = [x]
activation = x
The next three lines initialize the zs list, the running activation value activation, and the activations list, which starts out containing just the input x.
for b, w in zip(self.biases, self.weights):
    z = np.dot(w, activation) + b
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)
This loop goes through the layers in order (input layer -> hidden layer 1, hidden layer 1 -> hidden layer 2, hidden layer 2 -> output layer), computing each layer's weighted sum and the value after the activation function, and storing them in zs and activations respectively. activations ends up being a list of arrays of shapes [(120, 1), (100, 1), (50, 1), (2, 1)]; zs is a list of arrays of shapes [(100, 1), (50, 1), (2, 1)].
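If you want to confirm that chain of shapes yourself, the following stand-alone sketch (with random placeholders for self.weights, self.biases, and the input x) reproduces the forward loop:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = [np.random.randn(100, 120), np.random.randn(50, 100), np.random.randn(2, 50)]
biases = [np.random.randn(100, 1), np.random.randn(50, 1), np.random.randn(2, 1)]
activation = np.random.randn(120, 1)   # stand-in for the input x
zs = []
activations = [activation]
for b, w in zip(biases, weights):
    z = np.dot(w, activation) + b      # weighted sum for this layer
    zs.append(z)
    activation = sigmoid(z)            # value after the activation function
    activations.append(activation)
print([a.shape for a in activations])  # [(120, 1), (100, 1), (50, 1), (2, 1)]
print([z.shape for z in zs])           # [(100, 1), (50, 1), (2, 1)]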