While hand-coding a neural network and training it, I found that the weights w were not being updated at all.
The main code is below (activations.activator returns a wrapper around the chosen activation function, exposing the function itself and its derivative; a sketch of what that module might look like follows the code):
import activations
import numpy as np
import pandas as pd

class Neuro():
    def __init__(self, cellnum, learning_rate=0.1, activators=1):
        self.cellnum = cellnum
        self.layer_num = len(cellnum) - 1
        self.learning_rate = learning_rate
        if type(activators) == int:
            self.activator = [activations.activator(activators)] * self.layer_num
        else:
            self.activator = [activations.activator(activator) for activator in activators]
        self.w, self.b = [], []  # weights and biases
        for height, width in zip(cellnum[:-1], cellnum[1:]):
            self.w.append(np.zeros((height, width)))
            self.b.append(np.zeros((1, width)))

    def fit(self, _x, _y, epochs, batch_size):
        credit_learning_rate = self.learning_rate / batch_size
        for epoch in range(epochs):
            index = np.random.permutation(_x.shape[0])
            # activated value, x[layer].shape = (batch_size, cellnum[layer])
            x = [np.empty((batch_size, width)) for width in self.cellnum]
            # pre-activated value, z[layer].shape = (batch_size, cellnum[layer+1])
            z = [np.empty((batch_size, width)) for width in self.cellnum[1:]]
            # error[layer].shape = (batch_size, cellnum[layer+1])
            error = [np.empty((batch_size, width)) for width in self.cellnum[1:]]
            for batch_num in range(_x.shape[0], batch_size - 1, -batch_size):
                x[0] = _x[index[batch_num - batch_size : batch_num], :]
                y = _y[index[batch_num - batch_size : batch_num], :]
                # forward propagation
                for layer in range(self.layer_num):
                    # b[layer] is broadcast by numpy to shape (batch_size, width)
                    z[layer] = np.dot(x[layer], self.w[layer]) + self.b[layer]
                    x[layer + 1] = self.activator[layer].function()(z[layer])
                error[-1] = (x[-1] - y) * self.activator[-1].derivative()(z[-1])  # Hadamard product
                print("loss = ", np.linalg.norm(x[-1] - y))
                # back propagation
                for layer in range(self.layer_num - 2, -1, -1):
                    error[layer] = self.activator[layer].derivative()(z[layer]) * np.dot(error[layer + 1], self.w[layer + 1].transpose())
                # update
                for layer in range(self.layer_num - 1, -1, -1):
                    self.w[layer] -= np.dot(x[layer].transpose(), error[layer]) * credit_learning_rate
                    self.b[layer] -= np.sum(error[layer], axis=0) * credit_learning_rate
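The activations module itself is not shown above. For reference, here is a minimal sketch of what activations.activator might look like, inferred only from how .function() and .derivative() are called in fit; the Sigmoid class and the meaning of the integer code are assumptions, not the original module:

import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Sigmoid:
    # wrapper exposing the activation and its derivative as callables
    def function(self):
        return _sigmoid

    def derivative(self):
        return lambda z: _sigmoid(z) * (1.0 - _sigmoid(z))

def activator(code):
    # assumed behavior: map an integer code to an activation object (here always sigmoid)
    return Sigmoid()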
Note the formula for the error term in the back-propagation part:
$$\delta^l = f'(z^l) \odot \left(\delta^{l+1}\,(w^{l+1})^T\right)$$
where $\delta^l$ and $\delta^{l+1}$ are the error terms of layers $l$ and $l+1$, $f$ is the activation function, and $\odot$ denotes the Hadamard product (element-wise multiplication).
Or see the back-propagation loop in the code above, repeated here:
for layer in range(self.layer_num - 2, -1, -1):
    error[layer] = self.activator[layer].derivative()(z[layer]) * np.dot(error[layer + 1], self.w[layer + 1].transpose())
If all the weights $w$ are initialized to 0 (zero matrices), then on the first backward pass the weight $w^{l+1}$ appearing here is also 0; after the matrix product and the element-wise multiplication, the error term $\delta^l$ = error[layer] is therefore a zero matrix as well.
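A quick NumPy check of that claim (the shapes are arbitrary, and the random matrix merely stands in for $f'(z^l)$):

import numpy as np

batch_size, this_width, next_width = 4, 3, 2
fprime_z = np.random.normal(0, 1, (batch_size, this_width))    # stands in for f'(z^l)
error_next = np.random.normal(0, 1, (batch_size, next_width))  # delta^{l+1}, generally non-zero
w_next = np.zeros((this_width, next_width))                    # zero-initialized w^{l+1}

error_this = fprime_z * np.dot(error_next, w_next.transpose())
print(np.all(error_this == 0))   # True: delta^l is a zero matrix no matter what f'(z^l) is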
Now look at the last few lines, where the error terms are used for the update:
$$w^l \leftarrow w^l - \eta\,(x^l)^T\delta^l$$
$$b^l \leftarrow b^l - \eta\,\delta^l$$
or the last few lines of the code:
for layer in range(self.layer_num - 1, -1, -1):
    self.w[layer] -= np.dot(x[layer].transpose(), error[layer]) * credit_learning_rate
    self.b[layer] -= np.sum(error[layer], axis=0) * credit_learning_rate
The updates of layer $l$'s weights and bias $w^l, b^l$ both involve the error term $\delta^l$, but as shown above that error term is a zero matrix, so the weights do not update at all! The weight matrix $w$ therefore stays 0, no matter how long you train.
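The same point in two lines of NumPy (shapes again arbitrary): with a zero error term the gradient, and hence the update, is a zero matrix:

delta = np.zeros((4, 3))                              # delta^l from the argument above
x_l = np.random.normal(0, 1, (4, 5))                  # activations feeding layer l
print(np.all(np.dot(x_l.transpose(), delta) == 0))    # True: w^l receives a zero update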
So is it enough to just avoid initializing all the weights to 0? What if, to save effort, we initialize every weight to the same value instead (say, 0.5); does that solve it?
The answer is no.
Suppose we have a three-layer network: an input layer, a hidden layer, and an output layer, and suppose the hidden layer has neurons $a_1, a_2, \ldots, a_n$.
Every step of training is a matrix operation that treats $a_1, a_2, \ldots, a_n$ identically, and because the weights are initialized to exactly the same value, $a_1, a_2, \ldots, a_n$ are all equivalent. No matter how long we train: the input layer's weights into each $a_i$ are the same, their weights into the output layer are the same, the output's partial derivative with respect to each $a_i$ is the same, the back-propagated error terms are the same, and the updates to each $a_i$'s associated weights are the same... In other words, the $a_i$ move in lockstep, which is equivalent to having only a single hidden neuron $a_1$!
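A small standalone experiment illustrates this lockstep behavior. It is not the Neuro class above, just a hand-written three-layer network with a sigmoid activation, every weight and bias initialized to the same constant 0.5, and the same error-term and update formulas:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # batch of 8 samples, 4 input features
y = rng.normal(size=(8, 1))          # 1 output

# constant initialization: every hidden unit starts out identical
w1 = np.full((4, 3), 0.5)
b1 = np.full((1, 3), 0.5)
w2 = np.full((3, 1), 0.5)
b2 = np.full((1, 1), 0.5)

for _ in range(100):
    # forward propagation
    z1 = X @ w1 + b1; a1 = sigmoid(z1)
    z2 = a1 @ w2 + b2; a2 = sigmoid(z2)
    # back propagation (same formulas as above)
    d2 = (a2 - y) * a2 * (1 - a2)
    d1 = (d2 @ w2.T) * a1 * (1 - a1)
    # update
    w2 -= 0.1 * a1.T @ d2; b2 -= 0.1 * d2.sum(axis=0)
    w1 -= 0.1 * X.T @ d1;  b1 -= 0.1 * d1.sum(axis=0)

print(np.allclose(w1[:, 0], w1[:, 1]), np.allclose(w1[:, 1], w1[:, 2]))
# True True: after 100 steps the three hidden units still have identical incoming weights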
Instead, initialize with random numbers, e.g. np.random.normal, which generates a matrix drawn from a normal distribution with a given mean and standard deviation:
self.w, self.b = [], []  # weights and biases
for height, width in zip(cellnum[:-1], cellnum[1:]):
    self.w.append(np.random.normal(0, 1, (height, width)))
    self.b.append(np.random.normal(0, 1, (1, width)))
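A quick standalone check of this initialization (cellnum = [4, 3, 1] is just an example):

import numpy as np

cellnum = [4, 3, 1]
w, b = [], []
for height, width in zip(cellnum[:-1], cellnum[1:]):
    w.append(np.random.normal(0, 1, (height, width)))
    b.append(np.random.normal(0, 1, (1, width)))

print([m.shape for m in w])                    # [(4, 3), (3, 1)]
print(np.allclose(w[0][:, 0], w[0][:, 1]))     # False: the hidden units no longer start out identical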