A neural network is built from individual neurons. Each neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function.
Chaining many such neurons together yields a neural network; with suitable weights $w$ and biases $b$ it can fit essentially any function, and training a network amounts to using gradient descent to find the best $w$ and $b$. Because the network is a deep composition of functions, the gradient of an individual parameter cannot be written down directly, so the standard approach is backpropagation ($BP$), which propagates gradients backward layer by layer; for details see 深度学习笔记(1)——神经网络详解及改进.
$$a=\sigma(x)=\frac{1}{1+e^{-x}}$$
Sigmoid is the simplest activation function: as $x$ goes to $-\infty$, $y$ approaches 0; as $x$ goes to $+\infty$, $y$ approaches 1. Its derivative has the convenient closed form $\sigma'(x)=\sigma(x)(1-\sigma(x))$, which the code below exploits.
import numpy as np

def sigmoid(z, driv=False):  # driv=True returns the derivative
    if driv:  # derivative: takes z, returns dσ(z)/dz = σ(z)(1-σ(z))
        return sigmoid(z) * (1 - sigmoid(z))
    return 1 / (1 + np.exp(-z))
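A quick sanity check of both branches, using the sigmoid just defined (the expected values are standard sigmoid facts, not from the original post):

```python
z = np.array([-np.inf, 0.0, np.inf])
print(sigmoid(z))                # -> [0.  0.5 1. ]  (limits at ±infinity, midpoint at 0)
print(sigmoid(0.0, driv=True))   # -> 0.25, the maximum of σ'(z) = σ(z)(1-σ(z))
```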
$$a_i=\frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$
Softmax divides each exponential by the sum of all exponentials. It is typically used as the output layer of a classification network, and its outputs can be read as the probabilities of the candidate classes.
For example, in a three-class problem with $z_1=3, z_2=1, z_3=-3$, softmax gives $a_1=0.88$, $a_2=0.12$, $a_3\approx 0$. Loosely speaking, softmax amplifies the largest output and normalizes all outputs into $[0,1]$ (summing to 1).
Differentiating softmax:
If $i=j$:
$$\frac{\partial a_i}{\partial z_i}=a_i(1-a_i)$$
If $i\neq j$:
$$\frac{\partial a_j}{\partial z_i}=-a_ia_j$$
Derivative of the loss $l$ with respect to $z$:
$$\frac{\partial l}{\partial z_i}=\sum_j\frac{\partial l}{\partial a_j}\frac{\partial a_j}{\partial z_i}=a_i-y_i$$
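To spell out that step (my addition, using the cross-entropy loss $l=-\sum_j y_j\ln a_j$ that the code below implements, with a one-hot label vector so $\sum_j y_j=1$), the chain rule combines the two softmax cases as

$$\frac{\partial l}{\partial z_i}=-\frac{y_i}{a_i}\,a_i(1-a_i)+\sum_{j\neq i}\frac{y_j}{a_j}\,a_ia_j=-y_i+a_i\sum_j y_j=a_i-y_i$$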
def softmax(z=None, a=None, y=None, driv=False):  # softmax output layer
    if driv:  # derivative of the cross-entropy loss w.r.t. z
        return a - y
    size = z.shape[0]
    exp_out = np.exp(z)                                 # e^z, element-wise
    exp_sum = np.sum(exp_out, axis=1).reshape(size, 1)  # row-wise sum of e^z
    return exp_out / exp_sum
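One caveat (my addition, not in the original post): `np.exp(z)` overflows for large $z$. A standard remedy is to subtract the row-wise maximum before exponentiating; softmax is shift-invariant, so the result is unchanged:

```python
def softmax_stable(z):
    shifted = z - z.max(axis=1, keepdims=True)  # shift so the largest entry is 0
    exp_out = np.exp(shifted)                   # e^z can no longer overflow
    return exp_out / exp_out.sum(axis=1, keepdims=True)

# reproduces the worked three-class example: roughly [0.88, 0.12, 0.00]
print(softmax_stable(np.array([[3.0, 1.0, -3.0]])).round(2))
```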
Training set $x$:
$x$ is a two-dimensional array of shape $size \times dimension$: $size$ is the number of training samples and $dimension$ is the input dimension.
$$x = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1d} \\ x_{21} & x_{22} & \dots & x_{2d} \\ \dots & \dots & & \dots \\ x_{s1} & x_{s2} & \dots & x_{sd} \end{bmatrix}_{size \times dimension}$$
where $x_{ij}$ is the $j$-th input value of the $i$-th sample.
Neuron values $a$:
$a$ is a three-dimensional array of shape $layers \times size \times units$: $layers$ is the number of network layers, $size$ the number of training samples, and $units$ the number of neurons in the given layer.
$$a[i] = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1u} \\ a_{21} & a_{22} & \dots & a_{2u} \\ \dots & \dots & & \dots \\ a_{s1} & a_{s2} & \dots & a_{su} \end{bmatrix}_{size \times units}$$
where $a[i][j][k]$ is the value of the $k$-th neuron for the $j$-th sample in layer $i$. For example $a[0]=x$: the first layer is just the input layer (the code indexes layers from 0).
Pre-activation neuron values $z$:
Same layout as $a$.
Derivative $dz$ of the loss $l$ with respect to $z$:
Same layout as $a$.
$$dz[i] = \begin{bmatrix} dz_{11} & dz_{12} & \dots & dz_{1u} \\ dz_{21} & dz_{22} & \dots & dz_{2u} \\ \dots & \dots & & \dots \\ dz_{s1} & dz_{s2} & \dots & dz_{su} \end{bmatrix}_{size \times units}$$
Bias $b$:
$b$ is a two-dimensional array of shape $(layers-1) \times units$; $b[i][j]$ is the bias on the $j$-th neuron of layer $i+1$ (the input layer has no bias).
Weights $w$:
$w$ is a three-dimensional array of shape $(layers-1) \times units1 \times units2$: $w[i]$ holds the weights between layer $i$ and layer $i+1$, where $units1$ is the number of neurons in layer $i$ and $units2$ the number in layer $i+1$.
$$w[i] = \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1u_2} \\ w_{21} & w_{22} & \dots & w_{2u_2} \\ \dots & \dots & & \dots \\ w_{u_11} & w_{u_12} & \dots & w_{u_1u_2} \end{bmatrix}_{units1 \times units2}$$
$w[i][j][k]$ is the weight between the $j$-th neuron of layer $i$ and the $k$-th neuron of layer $i+1$.
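To make the bookkeeping concrete, here is a quick shape check (my illustration, using the 2-input, two-hidden-layer-of-5, 3-output architecture trained at the end of this post):

```python
import numpy as np

size, units = 100, [2, 5, 5, 3]  # 100 samples; input, two hidden layers, output
w = {i: np.random.rand(units[i], units[i+1]) for i in range(len(units)-1)}  # w[i]: units1 x units2
b = {i: np.zeros((1, units[i+1])) for i in range(len(units)-1)}             # b[i]: 1 x units2
x = np.random.rand(size, units[0])                                          # x: size x dimension
print(w[0].shape, b[0].shape, (x @ w[0] + b[0]).shape)  # -> (2, 5) (1, 5) (100, 5)
```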
Forward propagation is straightforward: each layer's pre-activation equals the previous layer's activations matrix-multiplied by the weights, plus the bias:
$$z_{i+1}=a_i\,w_i+b_i,\qquad a_{i+1}=\sigma(z_{i+1})$$
def ForwardPropagation(x_train, w, b, num_layers):  # run the network forward
    '''
    @param num_layers: total number of layers in the network
    '''
    z = {}  # pre-activation values:  [layer][sample, neuron]
    a = {}  # post-activation values: [layer][sample, neuron]
    a[0] = x_train
    for i in range(num_layers - 2):  # hidden layers use sigmoid
        z[i+1] = a[i] @ w[i] + b[i]
        a[i+1] = sigmoid(z[i+1])
    # output layer uses softmax
    z[num_layers-1] = a[num_layers-2] @ w[num_layers-2] + b[num_layers-2]
    a[num_layers-1] = softmax(z[num_layers-1])
    return z, a
def BackPropagation(z, a, y_train, w, num_layers):  # BP: dLoss/dz for every layer
    dz = {}  # dl/dz: [layer][sample, neuron]
    # softmax output layer: dl/dz = (a - y) / batch size
    dz[num_layers-1] = softmax(a=a[num_layers-1], y=y_train, driv=True) / y_train.shape[0]
    # hidden layers: propagate backward through w and the sigmoid derivative
    for i in range(num_layers-2, 0, -1):
        dz[i] = dz[i+1] @ w[i].T * sigmoid(z[i], driv=True)
    return dz
def cal_driv(dz, a):  # compute dw and db from dz and the previous layer's a
    dw = 0
    db = np.mean(dz, axis=0)  # average the bias gradient over the batch
    for i in range(dz.shape[0]):  # accumulate the outer products a_i^T · dz_i
        dw = dw + a[i].reshape(len(a[i]), 1) @ dz[i].reshape(1, len(dz[i]))
    # dz already carries a 1/size factor from BackPropagation, so this extra
    # averaging scales the true gradient by 1/size; it just acts like a smaller learning rate
    dw = dw / a.shape[0]
    return dw, db
def gradient_decent(x_train, y_train, w, b, num_layers, learning_rate):
    z, a = ForwardPropagation(x_train, w, b, num_layers)  # one forward pass
    dz = BackPropagation(z, a, y_train, w, num_layers)
    # gradient-descent update for every weight matrix and bias vector
    for i in range(num_layers - 1):
        dw, db = cal_driv(dz[i+1], a[i])
        w[i] = w[i] - dw * learning_rate
        b[i] = b[i] - db * learning_rate
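Before assembling everything into a class, it is worth sanity-checking the analytic gradients against finite differences. The sketch below is my own addition built on the standalone functions above; `grad_check` and `loss_fn` are hypothetical helpers, and the `* x.shape[0]` compensates for the extra 1/size averaging noted in cal_driv:

```python
def grad_check(w, b, x, y, num_layers, eps=1e-5):
    def loss_fn():  # hypothetical helper: forward pass + cross-entropy
        _, a = ForwardPropagation(x, w, b, num_layers)
        return -np.sum(y * np.log(a[num_layers-1])) / x.shape[0]

    z, a = ForwardPropagation(x, w, b, num_layers)
    dz = BackPropagation(z, a, y, w, num_layers)
    dw, _ = cal_driv(dz[1], a[0])   # analytic gradient for w[0] (carries an extra 1/size)
    w[0][0, 0] += eps               # central finite difference on one weight
    plus = loss_fn()
    w[0][0, 0] -= 2 * eps
    minus = loss_fn()
    w[0][0, 0] += eps               # restore the original weight
    numeric = (plus - minus) / (2 * eps)
    print(abs(dw[0, 0] * x.shape[0] - numeric))  # should be tiny, ~1e-7 or smaller
```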
Putting everything together, the complete code:

import numpy as np

class BP_network():
    def __init__(self, input_dim, hidden_layer, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.w = {}  # weights, stored in a dict
        self.b = {}  # biases, stored in a dict
        self.num_layers = len(hidden_layer) + 2  # total number of layers
        hidden_layer = [input_dim] + hidden_layer + [output_dim]  # avoid mutating the caller's list
        for i in range(len(hidden_layer) - 1):
            self.w[i] = np.random.rand(hidden_layer[i], hidden_layer[i+1])
            self.b[i] = np.zeros((1, hidden_layer[i+1]))

    def sigmoid(self, z, driv=False):  # driv=True returns the derivative
        if driv:  # derivative: takes z, returns dσ(z)/dz
            return self.sigmoid(z) * (1 - self.sigmoid(z))
        return 1 / (1 + np.exp(-z))

    def softmax(self, z=None, a=None, y=None, driv=False):  # softmax output layer
        if driv:  # derivative of the cross-entropy loss w.r.t. z
            return a - y
        size = z.shape[0]
        exp_out = np.exp(z)                                 # e^z, element-wise
        exp_sum = np.sum(exp_out, axis=1).reshape(size, 1)  # row-wise sum of e^z
        return exp_out / exp_sum

    def fit(self, x_train, y_train, batch_size, learning_rate=0.01, epochs=20):
        def Loss(output, y_train):  # cross-entropy, averaged over the batch
            return -np.sum(y_train * np.log(output)) / output.shape[0]

        def ForwardPropagation(x_train):  # run the network forward
            z = {}  # pre-activation values:  [layer][sample, neuron]
            a = {}  # post-activation values: [layer][sample, neuron]
            a[0] = x_train
            for i in range(self.num_layers - 2):  # hidden layers use sigmoid
                z[i+1] = a[i] @ self.w[i] + self.b[i]
                a[i+1] = self.sigmoid(z[i+1])
            # output layer uses softmax
            z[self.num_layers-1] = a[self.num_layers-2] @ self.w[self.num_layers-2] + self.b[self.num_layers-2]
            a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
            return z, a

        def BackPropagation(z, a, y_train):  # BP: dLoss/dz for every layer
            dz = {}  # dl/dz: [layer][sample, neuron]
            # softmax output layer: dl/dz = (a - y) / batch size
            dz[self.num_layers-1] = self.softmax(a=a[self.num_layers-1], y=y_train, driv=True) / y_train.shape[0]
            # hidden layers: propagate backward through w and the sigmoid derivative
            for i in range(self.num_layers-2, 0, -1):
                dz[i] = dz[i+1] @ self.w[i].T * self.sigmoid(z[i], driv=True)
            return dz

        def cal_driv(dz, a):  # compute dw and db from dz and the previous layer's a
            dw = 0
            db = np.mean(dz, axis=0)  # average the bias gradient over the batch
            for i in range(dz.shape[0]):  # accumulate the outer products a_i^T · dz_i
                dw = dw + a[i].reshape(len(a[i]), 1) @ dz[i].reshape(1, len(dz[i]))
            dw = dw / a.shape[0]  # extra 1/size on top of the one in dz; acts like a smaller learning rate
            return dw, db

        def gradient_decent(x_train, y_train):
            z, a = ForwardPropagation(x_train)  # one forward pass
            dz = BackPropagation(z, a, y_train)
            # gradient-descent update for every w and b
            for i in range(self.num_layers - 1):
                dw, db = cal_driv(dz[i+1], a[i])
                self.w[i] = self.w[i] - dw * learning_rate
                self.b[i] = self.b[i] - db * learning_rate

        size = x_train.shape[0]  # number of training samples
        for i in range(epochs):
            for j in range(0, size, batch_size):  # mini-batch updates
                gradient_decent(x_train[j:j+batch_size], y_train[j:j+batch_size])
            z, a = ForwardPropagation(x_train)
            loss = Loss(a[self.num_layers-1], y_train)  # loss on the full training set
            accuracy = self.accuracy(self.predict(x_train), y_train)
            print('Epoch %d: Loss=%f, accuracy=%f' % (i+1, loss, accuracy))

    def predict(self, x_train):
        z = {}  # pre-activation values:  [layer][sample, neuron]
        a = {}  # post-activation values: [layer][sample, neuron]
        a[0] = x_train
        for i in range(self.num_layers - 2):
            z[i+1] = a[i] @ self.w[i] + self.b[i]
            a[i+1] = self.sigmoid(z[i+1])
        # softmax output layer
        z[self.num_layers-1] = a[self.num_layers-2] @ self.w[self.num_layers-2] + self.b[self.num_layers-2]
        a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
        # one-hot encode the argmax of each output row
        out = a[self.num_layers-1]
        y = np.zeros(out.shape)
        y[out - out.max(axis=1).reshape(out.shape[0], 1) >= 0] = 1
        return y

    def accuracy(self, y, y_true):
        # compare the positions of the 1s in the predicted and true one-hot rows
        return np.sum(np.where(y == 1)[1] == np.where(y_true == 1)[1]) / y.shape[0]
Prepare the dataset (a simple three-class dataset):
x_train = np.random.randint(0, 100, size=(100, 2))
temp = (2*x_train[:,0] + x_train[:,1])          # linear score used to assign classes
y_train = np.zeros((100, 3))
y_train[temp > 150, 0] = 1
y_train[temp < 50, 1] = 1
y_train[((temp <= 150) & (temp >= 50)), 2] = 1  # one-hot labels for the three classes
x_train = (x_train - x_train.min(axis=0)) / (x_train.max(axis=0) - x_train.min(axis=0))  # min-max normalize to [0,1]
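A quick sanity check (my addition) that the three conditions partition the data, i.e. every sample received exactly one label:

```python
assert (y_train.sum(axis=1) == 1).all()  # each row of y_train is one-hot
print(y_train.sum(axis=0))               # number of samples in each class
```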
Train the network:
bp = BP_network(input_dim=2, hidden_layer=[5,5], output_dim=3)
bp.fit(x_train,y_train,batch_size=10,learning_rate=0.1,epochs=2000)
Results:
Epoch 1: Loss=1.053458, accuracy=0.390000
Epoch 2: Loss=1.005622, accuracy=0.390000
Epoch 3: Loss=0.968565, accuracy=0.390000
Epoch 4: Loss=0.939970, accuracy=0.390000
Epoch 5: Loss=0.917880, accuracy=0.500000
Epoch 6: Loss=0.900729, accuracy=0.570000
Epoch 7: Loss=0.887299, accuracy=0.570000
Epoch 8: Loss=0.876668, accuracy=0.570000
Epoch 9: Loss=0.868147, accuracy=0.570000
Epoch 10: Loss=0.861226, accuracy=0.570000
...
Epoch 1997: Loss=0.561577, accuracy=0.830000
Epoch 1998: Loss=0.561012, accuracy=0.830000
Epoch 1999: Loss=0.560447, accuracy=0.830000
Epoch 2000: Loss=0.559881, accuracy=0.840000
The code runs as expected.
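As a final usage sketch (my addition, not from the original post): new points must be scaled the same way as the training data before calling predict. Dividing by 100 only approximates the min-max scaling above, and the trained network is about 84% accurate, so the expected one-hot rows are likely but not guaranteed:

```python
x_new = np.array([[90, 90], [5, 5], [50, 50]]) / 100.0  # roughly the training scale
print(bp.predict(x_new))  # by the labeling rule: classes 0, 1, 2 respectively
```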