Writing a BP Neural Network by Hand

Table of Contents

  • 1. The Math Behind Gradient Descent
    • 1. sigmoid
    • 2. softmax
  • 2. Writing the Code
    • 1. Data layout
    • 2. Forward propagation
    • 3. Backpropagation
    • 4. Computing dw and db
    • 5. Gradient descent
    • 6. Full code
    • 7. Testing the code


1. The Math Behind Gradient Descent

A neural network is built out of individual neurons; the basic structure of a single neuron is shown below:

(Figure 1: basic structure of a single neuron)

Countless neurons connected in a chain make up a neural network, and with suitable values of $w$ and $b$ it can fit virtually any mathematical model. Training a neural network is therefore essentially a matter of using gradient descent to find the best $w$ and $b$. Because the network is fairly complex, the gradient of a given parameter cannot be computed directly, so the mainstream approach is backpropagation (BP), which works the gradients backwards step by step; for details see the earlier post 深度学习笔记(1)——神经网络详解及改进.

1. sigmoid

$$a=\sigma(x)=\frac{1}{1+e^{-x}}$$
sigmoid is the simplest activation function: as $x$ goes to negative infinity the output approaches 0, and as $x$ goes to positive infinity the output approaches 1.

  • Derivative of sigmoid:
    $a'=\sigma(x)(1-\sigma(x))$
  • Derivative of the loss $l$ with respect to $z$ ($\partial l/\partial z$), where $z_i$ denotes the pre-activations of layer $i$:
    $\frac{\partial l}{\partial z_i}=\sigma'(z_i)\sum w_i\frac{\partial l}{\partial z_{i+1}}$
import numpy as np

def sigmoid(z, driv=False):    # driv=True returns the derivative instead of the activation
    if driv:                   # derivative (input z, return dσ/dz = σ(z)(1-σ(z)))
        return sigmoid(z)*(1-sigmoid(z))
    return 1/(1+np.exp(-z))
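As a quick sanity check (a small sketch using the function above; the inputs and the step size h are arbitrary choices), the analytic derivative can be compared against a central finite difference:

z = np.array([-2.0, 0.0, 3.0])
h = 1e-6                                            # finite-difference step
analytic = sigmoid(z, driv=True)                    # σ(z)(1-σ(z))
numeric = (sigmoid(z + h) - sigmoid(z - h))/(2*h)   # central difference
print(sigmoid(z))                                   # ≈ [0.1192 0.5    0.9526]
print(analytic)                                     # ≈ [0.1050 0.25   0.0452]
print(np.max(np.abs(analytic - numeric)))           # should be tiny, on the order of 1e-10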

2. softmax

$$a_i=\frac{e^{z_i}}{\sum_{j=1}^{n}e^{z_j}}$$
softmax divides each exponential by the sum of all the exponentials. It is normally used as the output layer of a classification task, and its outputs can be read as the probabilities of the individual classes.
For example, in a three-class problem with $z_1=3, z_2=1, z_3=-3$, softmax gives roughly $a_1=0.88, a_2=0.12, a_3=0$. Loosely speaking, softmax amplifies the largest output and normalizes everything into the range $[0,1]$ (a quick numerical check of this example follows the code below).

  • Derivative of softmax:
    If $i=j$:
    $\frac{\partial a_i}{\partial z_i}=a_i(1-a_i)$
    If $i\neq j$:
    $\frac{\partial a_j}{\partial z_i}=-a_ia_j$

  • Derivative of the loss $l$ with respect to $z$ ($\partial l/\partial z$), where $l$ is the cross-entropy loss $l=-\sum_j y_j\ln a_j$:
    $\frac{\partial l}{\partial z_i}=\sum_j\frac{\partial l}{\partial a_j}\frac{\partial a_j}{\partial z_i}=a_i-y_i$

def softmax(z=None, a=None, y=None, driv=False):   # softmax layer
    if driv:    # derivative of the loss with respect to z (needs a and y)
        return a - y
    size = z.shape[0]
    exp_out = np.exp(z)                                   # e^z
    exp_sum = np.sum(exp_out, axis=1).reshape(size, 1)    # row-wise sum of e^z
    return exp_out/exp_sum
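As a quick check (a small sketch built on the two functions above; the one-hot label y is a hypothetical choice made only for this check), the output reproduces the 0.88/0.12/0 example, and a finite-difference estimate of the cross-entropy gradient matches $a-y$:

z = np.array([[3.0, 1.0, -3.0]])
y = np.array([[1.0, 0.0, 0.0]])                 # hypothetical one-hot label
a = softmax(z=z)
print(a.round(2))                               # ≈ [[0.88 0.12 0.  ]]

def cross_entropy(z, y):                        # l = -sum(y * ln(softmax(z)))
    return -np.sum(y*np.log(softmax(z=z)))

h = 1e-6
numeric = np.zeros_like(z)
for k in range(z.shape[1]):                     # central difference in each component of z
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[0, k] += h
    z_minus[0, k] -= h
    numeric[0, k] = (cross_entropy(z_plus, y) - cross_entropy(z_minus, y))/(2*h)
print(softmax(a=a, y=y, driv=True))             # analytic gradient a - y
print(numeric)                                  # should agree to about 6 decimal places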

2. Writing the Code

1. Data Layout

  • Training set $x$:
    $x$ is a two-dimensional array of shape $size \times dimension$, where $size$ is the number of training samples and $dimension$ is the number of input features.
    $$x = \begin{bmatrix} x_{11} & x_{12} & ... & x_{1d} \\ x_{21} & x_{22} & ... & x_{2d} \\ ... & ... & & ... \\ x_{s1} & x_{s2} & ... & x_{sd} \end{bmatrix}_{size \times dimension}$$
    where $x_{ij}$ is the $j$-th input value of the $i$-th sample.

  • Neuron values $a$:
    $a$ is a three-dimensional array of shape $layers \times size \times units$ (a dict of 2-D arrays in the code), where $layers$ is the number of network layers, $size$ is the number of training samples, and $units$ is the number of neurons in the layer in question.
    $$a[i] = \begin{bmatrix} a_{11} & a_{12} & ... & a_{1u} \\ a_{21} & a_{22} & ... & a_{2u} \\ ... & ... & & ... \\ a_{s1} & a_{s2} & ... & a_{su} \end{bmatrix}_{size \times units}$$
    where $a[i][j][k]$ is the value of the $k$-th neuron of layer $i$ for the $j$-th sample. For example, $a[0]=x$: the first layer (index 0) is the input layer.

  • Pre-activation neuron values $z$ (before the activation function):
    Same layout as $a$.

  • Derivative $dz$ of the loss $l$ with respect to $z$:
    Same layout as $a$.
    $$dz[i] = \begin{bmatrix} dz_{11} & dz_{12} & ... & dz_{1u} \\ dz_{21} & dz_{22} & ... & dz_{2u} \\ ... & ... & & ... \\ dz_{s1} & dz_{s2} & ... & dz_{su} \end{bmatrix}_{size \times units}$$

  • Bias $b$:
    $b$ is a $(layers-1) \times units$ array (a dict of row vectors in the code); $b[i]$ holds the biases of the neurons in layer $i+1$, i.e. the layer that $w[i]$ feeds into.

  • Weights $w$:
    $w$ is a three-dimensional array of shape $(layers-1) \times units_1 \times units_2$, where $w[i]$ holds the weights between layer $i$ and layer $i+1$, $units_1$ is the number of neurons in layer $i$, and $units_2$ is the number of neurons in layer $i+1$ (see the shape-check sketch after this list).
    $$w[i] = \begin{bmatrix} w_{11} & w_{12} & ... & w_{1u_2} \\ w_{21} & w_{22} & ... & w_{2u_2} \\ ... & ... & & ... \\ w_{u_11} & w_{u_12} & ... & w_{u_1u_2} \end{bmatrix}_{units_1 \times units_2}$$
    $w[i][j][k]$ is the weight between the $j$-th neuron of layer $i$ and the $k$-th neuron of layer $i+1$.
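To make the shapes concrete, here is a small sketch (assuming a hypothetical 2-5-5-3 network and a batch of 4 samples; the numbers are arbitrary) that builds w and b the same way the class below does and prints every shape:

import numpy as np

layer_sizes = [2, 5, 5, 3]          # hypothetical network: 2 inputs, two hidden layers of 5, 3 outputs
size = 4                            # hypothetical batch of 4 samples

w, b = {}, {}
for i in range(len(layer_sizes)-1):                          # same initialization as BP_network.__init__ below
    w[i] = np.random.rand(layer_sizes[i], layer_sizes[i+1])
    b[i] = np.zeros((1, layer_sizes[i+1]))

x = np.random.rand(size, layer_sizes[0])
print(x.shape)                                               # (4, 2)   size x dimension
for i in range(len(layer_sizes)-1):
    print(i, w[i].shape, b[i].shape)
# 0 (2, 5) (1, 5)
# 1 (5, 5) (1, 5)
# 2 (5, 3) (1, 3)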

2. Forward Propagation

This step is straightforward: each layer's pre-activation is the previous layer's activations matrix-multiplied by the weights, plus the bias:
$$z_{i+1}=a_iw_i+b_i$$

def ForwardPropagation(x_train):   # run the network forward; nested inside fit(), so self is available
    z = {}      # pre-activation values   [layer][sample, neuron]
    a = {}      # activated values        [layer][sample, neuron]
    a[0] = x_train
    for i in range(self.num_layers-2):
        z[i+1] = a[i]@self.w[i]+self.b[i]
        a[i+1] = self.sigmoid(z[i+1])
    # softmax output layer
    z[self.num_layers-1] = a[self.num_layers-2]@self.w[self.num_layers-2]+self.b[self.num_layers-2]
    a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
    return z,a
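To see how the shapes flow through one pass, here is a self-contained sketch (assuming a hypothetical 2-5-3 network, a batch of 4 samples, and the standalone sigmoid and softmax functions defined earlier, outside the class):

import numpy as np

size, dims = 4, [2, 5, 3]                        # hypothetical batch of 4, network 2-5-3
x = np.random.rand(size, dims[0])
w0, b0 = np.random.rand(dims[0], dims[1]), np.zeros((1, dims[1]))
w1, b1 = np.random.rand(dims[1], dims[2]), np.zeros((1, dims[2]))

z1 = x@w0 + b0                                   # (4,2)@(2,5) + (1,5) -> (4,5); b0 broadcasts over rows
a1 = sigmoid(z1)
z2 = a1@w1 + b1                                  # (4,5)@(5,3) + (1,3) -> (4,3)
a2 = softmax(z=z2)                               # each row is a probability distribution
print(z1.shape, a1.shape, z2.shape, a2.shape)    # (4, 5) (4, 5) (4, 3) (4, 3)
print(a2.sum(axis=1))                            # ≈ [1. 1. 1. 1.]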

3. Backpropagation

def BackPropagation(z, a, y_train):   # backpropagation: dLoss/dz for every layer
    dz = {}     # dl/dz  [layer][sample, neuron]
    # softmax output layer: dl/dz = (a - y) / batch_size
    dz[self.num_layers-1] = self.softmax(a=a[self.num_layers-1], y=y_train, driv=True)/y_train.shape[0]
    # hidden layers: dl/dz_i = (dl/dz_{i+1} @ w_i^T) * sigmoid'(z_i)
    for i in range(self.num_layers-2, 0, -1):
        dz[i] = dz[i+1]@self.w[i].T*self.sigmoid(z[i], driv=True)
    return dz

4. Computing $dw$ and $db$

def cal_driv(dz, a):   # compute dw and db from this layer's dz and the previous layer's activations a
    dw = 0
    db = np.mean(dz, axis=0)                   # average bias gradient over the batch
    for i in range(dz.shape[0]):               # accumulate the outer product a_i^T dz_i for each sample
        dw = dw + a[i].reshape(len(a[i]), 1)@dz[i].reshape(1, len(dz[i]))
    dw = dw/a.shape[0]                         # average over the batch
    return dw, db
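The per-sample loop can also be written as a single matrix product. This sketch (assuming dz and a are plain NumPy arrays of shape size × units, as described above) produces the same dw and db as cal_driv:

import numpy as np

def cal_driv_vectorized(dz, a):        # same result as cal_driv, without the Python loop
    dw = a.T@dz/a.shape[0]             # sum of per-sample outer products, averaged over the batch
    db = np.mean(dz, axis=0)
    return dw, db

# quick equivalence check on random data
a = np.random.rand(10, 5)              # 10 samples, 5 neurons in the previous layer
dz = np.random.rand(10, 3)             # 10 samples, 3 neurons in the current layer
dw1, db1 = cal_driv(dz, a)
dw2, db2 = cal_driv_vectorized(dz, a)
print(np.allclose(dw1, dw2), np.allclose(db1, db2))   # True True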

5. Gradient Descent

def gradient_decent(x_train, y_train):      # one gradient-descent step on a mini-batch
    size = x_train.shape[0]
    z, a = ForwardPropagation(x_train)      # one forward pass
    dz = BackPropagation(z, a, y_train)     # backpropagate dl/dz
    # gradient-descent update of w and b (learning_rate comes from the enclosing fit())
    for i in range(self.num_layers-1):
        dw, db = cal_driv(dz[i+1], a[i])
        self.w[i] = self.w[i] - dw*learning_rate
        self.b[i] = self.b[i] - db*learning_rate

6. Full Code

import numpy as np

class BP_network():
    def __init__(self, input_dim, hidden_layer, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.w = {}                     # weights, stored in a dict
        self.b = {}                     # biases, stored in a dict
        self.num_layers = len(hidden_layer)+2   # number of layers (input + hidden + output)

        hidden_layer.insert(0, input_dim)       # hidden_layer now lists the size of every layer
        hidden_layer.append(output_dim)
        for i in range(len(hidden_layer)-1):
            self.w[i] = np.random.rand(hidden_layer[i], hidden_layer[i+1])
            self.b[i] = np.zeros((1, hidden_layer[i+1]))

    def sigmoid(self, z, driv=False):    # driv=True returns the derivative
        if driv:   # derivative (input z, return dσ/dz)
            return self.sigmoid(z)*(1-self.sigmoid(z))
        return 1/(1+np.exp(-z))
    
    def softmax(self, z=None, a=None, y=None, driv=False):  # softmax layer
        if driv:   # derivative of the loss with respect to z (needs a and y)
            return a-y
        size = z.shape[0]
        exp_out = np.exp(z)                                   # e^z
        exp_sum = np.sum(exp_out, axis=1).reshape(size, 1)    # row-wise sum of e^z
        return exp_out/exp_sum

    def fit(self, x_train, y_train, batch_size, learning_rate=0.01, epochs=20):
        def Loss(output, y_train): # cross-entropy loss, averaged over the batch
            return -np.sum(y_train*np.log(output))/output.shape[0]

        def ForwardPropagation(x_train): # run the network forward and return every layer's values
            z = {}      # pre-activation values   [layer][sample, neuron]
            a = {}      # activated values        [layer][sample, neuron]
            a[0] = x_train
            for i in range(self.num_layers-2):
                z[i+1] = a[i]@self.w[i]+self.b[i]
                a[i+1] = self.sigmoid(z[i+1])
            # softmax output layer
            z[self.num_layers-1] = a[self.num_layers-2]@self.w[self.num_layers-2]+self.b[self.num_layers-2]
            a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
            return z,a

        def BackPropagation(z, a, y_train):   # backpropagation: dLoss/dz for every layer
            dz = {}     # dl/dz  [layer][sample, neuron]
            # softmax output layer: dl/dz = (a - y) / batch_size
            dz[self.num_layers-1] = self.softmax(a=a[self.num_layers-1], y=y_train, driv=True)/y_train.shape[0]
            # hidden layers: dl/dz_i = (dl/dz_{i+1} @ w_i^T) * sigmoid'(z_i)
            for i in range(self.num_layers-2, 0, -1):
                dz[i] = dz[i+1]@self.w[i].T*self.sigmoid(z[i], driv=True)
            return dz

        def cal_driv(dz, a):   # compute dw and db from this layer's dz and the previous layer's activations a
            dw = 0
            db = np.mean(dz, axis=0)                  # average bias gradient over the batch
            for i in range(dz.shape[0]):              # accumulate the outer product a_i^T dz_i for each sample
                dw = dw + a[i].reshape(len(a[i]), 1)@dz[i].reshape(1, len(dz[i]))
            dw = dw/a.shape[0]                        # average over the batch
            return dw, db

        def gradient_decent(x_train, y_train):       # one gradient-descent step on a mini-batch
            size = x_train.shape[0]
            z, a = ForwardPropagation(x_train)        # one forward pass
            dz = BackPropagation(z, a, y_train)       # backpropagate dl/dz
            # gradient-descent update of w and b
            for i in range(self.num_layers-1):
                dw, db = cal_driv(dz[i+1], a[i])
                self.w[i] = self.w[i] - dw*learning_rate
                self.b[i] = self.b[i] - db*learning_rate

        size = x_train.shape[0]  # number of training samples
        for i in range(epochs):
            for j in range(0, size, batch_size):      # mini-batch updates
                gradient_decent(x_train[j:j+batch_size], y_train[j:j+batch_size])

            z, a = ForwardPropagation(x_train)
            loss = Loss(a[self.num_layers-1], y_train)   # loss on the whole training set
            accuracy = self.accuracy(self.predict(x_train), y_train)
            print('Epoch %d: Loss=%f, accuracy=%f' % (i+1, loss, accuracy))

    def predict(self, x_train):
        z = {}      # pre-activation values   [layer][sample, neuron]
        a = {}      # activated values        [layer][sample, neuron]
        a[0] = x_train
        for i in range(self.num_layers-2):
            z[i+1] = a[i]@self.w[i]+self.b[i]
            a[i+1] = self.sigmoid(z[i+1])
        # softmax output layer
        z[self.num_layers-1] = a[self.num_layers-2]@self.w[self.num_layers-2]+self.b[self.num_layers-2]
        a[self.num_layers-1] = self.softmax(z[self.num_layers-1])
        # one-hot encode the prediction: put 1 wherever an output equals its row maximum
        y = np.zeros(a[self.num_layers-1].shape)
        y[a[self.num_layers-1]-a[self.num_layers-1].max(axis=1).reshape(a[self.num_layers-1].shape[0],1)>=0]=1
        return y
        
    def accuracy(self,y,y_true):
        return np.sum(np.where(y==1)[1]==np.where(y_true==1)[1])/y.shape[0]

7. Testing the Code

Prepare a dataset (a simple three-class dataset):

x_train = np.random.randint(0,100,size=(100,2))    # 100 samples, 2 integer features in [0, 100)
temp = (2*x_train[:,0]+x_train[:,1])               # linear score used to assign the class labels
y_train = np.zeros((100,3))                        # one-hot labels
y_train[temp>150,0] = 1
y_train[temp<50,1] = 1
y_train[((temp<=150) & (temp>=50)),2] = 1
x_train = (x_train-x_train.min(axis=0))/(x_train.max(axis=0)-x_train.min(axis=0))   # min-max normalize to [0, 1]
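As a quick sanity check (a tiny sketch assuming the x_train and y_train arrays built above; the exact class counts depend on the random draw), you can confirm that all three classes occur and that the features were normalized:

print(y_train.sum(axis=0))                          # number of samples per class
print(x_train.min(axis=0), x_train.max(axis=0))     # should be [0. 0.] and [1. 1.]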

Train the network:

bp = BP_network(input_dim=2, hidden_layer=[5,5], output_dim=3)
bp.fit(x_train,y_train,batch_size=10,learning_rate=0.1,epochs=2000)

Results:

Epoch 1: Loss=1.053458, accuracy=0.390000
Epoch 2: Loss=1.005622, accuracy=0.390000
Epoch 3: Loss=0.968565, accuracy=0.390000
Epoch 4: Loss=0.939970, accuracy=0.390000
Epoch 5: Loss=0.917880, accuracy=0.500000
Epoch 6: Loss=0.900729, accuracy=0.570000
Epoch 7: Loss=0.887299, accuracy=0.570000
Epoch 8: Loss=0.876668, accuracy=0.570000
Epoch 9: Loss=0.868147, accuracy=0.570000
Epoch 10: Loss=0.861226, accuracy=0.570000
...
Epoch 1997: Loss=0.561577, accuracy=0.830000
Epoch 1998: Loss=0.561012, accuracy=0.830000
Epoch 1999: Loss=0.560447, accuracy=0.830000
Epoch 2000: Loss=0.559881, accuracy=0.840000

The code runs correctly.
