【DL】CNN Forward and Backward Propagation (manual implementation in Python)

Forward and Backward Propagation of a Convolutional Layer

Notes

This article implements the forward and backward pass of a single convolutional layer (with a ReLU activation), supporting multi-channel input and multi-channel output. My understanding of deep learning used to stop at PyTorch's ready-made APIs, and this is my first attempt at doing without `import torch`, so corrections are very welcome~
To represent tensors conveniently, the numpy package is used throughout.

Forward Propagation

Principle

$$a^{l}=\sigma\left(z^{l}\right)=\sigma\left(a^{l-1} * W^{l}+b^{l}\right)$$

where $\sigma$ is the activation function (ReLU in this article), $a^{l-1}$ is the input of the convolutional layer, $a^{l}$ is the output after the activation function, $z^{l}$ is the output before the activation function, $W^{l}$ is the convolution kernel, and $b^{l}$ is the bias of this layer.
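
As a concrete illustration of the formula (a minimal sketch with arbitrarily chosen numbers, computed directly with numpy rather than with the layer code below):

import numpy as np

# a^{l-1}: a single-channel 3x3 input (arbitrary values)
a_prev = np.array([[1., -2.,  0.],
                   [3.,  1., -1.],
                   [0.,  2.,  1.]])
# W^l: one 2x2 kernel, b^l: a scalar bias
W_l = np.array([[ 1., 0.],
                [-1., 2.]])
b_l = 0.5

# z^l = a^{l-1} * W^l + b^l  (stride 1, no padding -> 2x2 output)
z = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        z[i, j] = np.sum(a_prev[i:i+2, j:j+2] * W_l) + b_l
# a^l = ReLU(z^l)
a = np.maximum(z, 0)
print(z)   # [[ 0.5 -4.5] [ 7.5  1.5]]
print(a)   # [[ 0.5  0. ] [ 7.5  1.5]]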

Code

The forward pass of the ReLU activation keeps the input unchanged where it is greater than or equal to zero and sets it to zero where it is negative.

def relu(z):
    z[z<0]=0
    return z
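
A small usage check (note that this implementation modifies its argument in place):

import numpy as np

z = np.array([[-1.0, 2.0],
              [0.5, -3.0]])
print(relu(z))   # negatives are zeroed: [[0., 2.], [0.5, 0.]]
print(z)         # z itself has been modified in place as well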

The code for the forward pass of the convolutional layer (with ReLU activation) is shown below.
$X$, $W$, and $b$ correspond to $a^{l-1}$, $W^{l}$, and $b^{l}$, respectively; the output $Out$ corresponds to $a^{l}$.

def conv_forward(X, W, b, stride=(1,1), padding=(0,0)):
    # number of samples, number of channels, height, width  (input X)
    m,c,Ih,Iw=X.shape
    # number of filters (output channels), number of input channels, height, width (kernel W)
    f,_,Kh,Kw=W.shape
    #size of stride and padding
    Sw,Sh=stride
    Pw,Ph=padding
    # calculate the width and height of the output
    Oh = int( 1 + (Ih + 2 * Ph - Kh) / Sh )
    Ow = int( 1 + (Iw + 2 * Pw - Kw) / Sw )
    # pre-allocate output Out
    # number of sample, number of channel, height, width  (output Out)
    Out=np.zeros([m, f, Oh, Ow]) 
    X_pad = np.zeros((m, c, Ih +2 * Ph, Iw +2 * Pw))
    X_pad[:,:,Ph:Ph+Ih,Pw:Pw+Iw]= X
    
    # multi in (c channels), multi out (f channels)
    # dimension of filter, also the number of output channel
    for n in range(Out.shape[1]):
        # consider the multi-in-single-out situation
        for i in range(Out.shape[2]):
            for j in range(Out.shape[3]):
                # the m samples are processed in parallel:
                # window (m,c,Kh,Kw) * kernel (c,Kh,Kw) -> (m,c,Kh,Kw),
                # summed over axes (1,2,3) -> (m,)
                Out[:,n,i,j] = np.sum(X_pad[:, :, i*Sh : i*Sh+Kh, j*Sw : j*Sw+Kw] * W[n, :, :, :], axis=(1, 2, 3))
        # bias added to each output channel
        Out[:,n,:,:]+=b[n]
    # ReLU forward, applied once after all output channels are computed
    relu(Out)
    return Out
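
A minimal usage sketch of conv_forward (the shapes below are arbitrary demo choices; it assumes the relu and conv_forward definitions above):

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 3, 5, 5)   # 2 samples, 3 channels, 5x5 inputs
W = np.random.randn(4, 3, 3, 3)   # 4 filters of size 3x3
b = np.random.randn(4)

Out = conv_forward(X, W, b, stride=(1, 1), padding=(1, 1))
print(Out.shape)          # (2, 4, 5, 5): padding of 1 preserves the spatial size
print((Out >= 0).all())   # True, because ReLU has already been applied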

Backward Propagation

Principle

Through backpropagation we want to obtain the partial derivatives of the loss function $J(W,b)$ with respect to $z^{l}$, $W^{l}$, and $b^{l}$. Define

$$\delta^{l}=\frac{\partial J(W, b)}{\partial z^{l}}$$

$$\delta^{l-1}=\delta^{l} * \operatorname{rot180}\left(W^{l}\right) \odot \sigma^{\prime}\left(z^{l-1}\right)$$

where $\odot$ denotes the Hadamard (element-wise) product: for two vectors of the same dimension, $A=(a_{1},a_{2},\ldots,a_{n})^{T}$ and $B=(b_{1},b_{2},\ldots,b_{n})^{T}$, we have $A \odot B=(a_{1}b_{1},a_{2}b_{2},\ldots,a_{n}b_{n})^{T}$. $\sigma^{\prime}(z^{l-1})$ is the derivative of ReLU, which is 0 where its argument is negative and 1 otherwise. Why the kernel is rotated by 180 degrees is explained in the article in reference link 1; note that in the window-based implementation below this rotation is realized implicitly by scattering gradients back into the input windows, so the kernel is used unflipped in the code.
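
Both operations map directly onto numpy; a tiny illustration with arbitrary arrays:

import numpy as np

A = np.array([1., 2., 3.])
B = np.array([4., 5., 6.])
print(A * B)                    # Hadamard product: [ 4. 10. 18.]

K = np.arange(4.).reshape(2, 2)
print(np.flip(K, axis=(0, 1)))  # rot180 of the 2x2 kernel: [[3. 2.] [1. 0.]]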
$$\frac{\partial J(W, b)}{\partial W^{l}}=a^{l-1} * \delta^{l}$$

$$\frac{\partial J(W, b)}{\partial b^{l}}=\sum_{u, v}\left(\delta^{l}\right)_{u, v}$$
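
In the code below, this bias gradient is computed in a single line by summing $\delta^{l}$ over the sample and spatial axes while keeping the channel axis (a sketch with an arbitrary demo shape):

import numpy as np

dz = np.random.randn(2, 4, 5, 5)   # delta^l with shape (m, f, Oh, Ow)
db = np.sum(dz, axis=(0, 2, 3))    # one bias gradient per output channel
print(db.shape)                    # (4,)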

Code

The partial derivatives of the loss function $J(W,b)$ with respect to $z^{l}$, $a^{l-1}$, $W^{l}$, $b^{l}$, and $z^{l-1}$ are denoted dz, dx, dw, db, and dz0, respectively.

def conv_backward(dz, X, W, b, stride=(1,1), padding=(0,0)):  
    """
    dz: Gradient with respect to z
    dz0: Gradient with respect to z of the former convolutional layer
    dx: Gradient with respect to x
    dw: Gradient with respect to w
    db: Gradient with respect to b
    """
    m, f, _, _ = dz.shape
    m, c, Ih, Iw = X.shape
    _,_,Kh,Kw = W.shape
    Sw,Sh=stride
    Pw,Ph=padding
  
    dx, dw, db = np.zeros_like(X), np.zeros_like(W), np.zeros_like(b)
    X_pad = np.pad(X, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    dx_pad = np.pad(dx, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    db = np.sum(dz, axis=(0,2,3))
 
    for k in range(dz.shape[0]):
        for i in range(dz.shape[2]):
            for j in range(dz.shape[3]):
                # input window that produced z[k, :, i, j]
                x_window = X_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw]
                for n in range(f):
                    # dw has shape (f, c, Kh, Kw)
                    dw[n] += x_window * dz[k, n, i, j]
                    # the rot180 of the formula is implicit in this scatter-add,
                    # so the kernel is used unflipped here
                    dx_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw] += W[n] * dz[k, n, i, j]
 
    dx = dx_pad[:, :, Ph:Ph+Ih, Pw:Pw+Iw]
    # ReLU backward of the previous layer: since X = relu(z^{l-1}),
    # sigma'(z^{l-1}) is 1 where X > 0 and 0 elsewhere
    dz0 = dx * (X > 0)
 
    return dx, dw, db, dz0

Complete Code

import numpy as np

def relu(z):
    z[z<0]=0
    return z

def conv_forward(X, W, b, stride=(1,1), padding=(0,0)):
    # number of samples, number of channels, height, width  (input X)
    m,c,Ih,Iw=X.shape
    # number of filters (output channels), number of input channels, height, width (kernel W)
    f,_,Kh,Kw=W.shape
    #size of stride and padding
    Sw,Sh=stride
    Pw,Ph=padding
    # calculate the width and height of the output
    Oh = int( 1 + (Ih + 2 * Ph - Kh) / Sh )
    Ow = int( 1 + (Iw + 2 * Pw - Kw) / Sw )
    # pre-allocate output Out
    # number of sample, number of channel, height, width  (output Out)
    Out=np.zeros([m, f, Oh, Ow]) 
    X_pad = np.zeros((m, c, Ih +2 * Ph, Iw +2 * Pw))
    X_pad[:,:,Ph:Ph+Ih,Pw:Pw+Iw]= X
    
    # multi in (c channels), multi out (f channels)
    # dimension of filter, also the number of output channel
    for n in range(Out.shape[1]):
        # consider the multi-in-single-out situation
        for i in range(Out.shape[2]):
            for j in range(Out.shape[3]):
                # the m samples are processed in parallel:
                # window (m,c,Kh,Kw) * kernel (c,Kh,Kw) -> (m,c,Kh,Kw),
                # summed over axes (1,2,3) -> (m,)
                Out[:,n,i,j] = np.sum(X_pad[:, :, i*Sh : i*Sh+Kh, j*Sw : j*Sw+Kw] * W[n, :, :, :], axis=(1, 2, 3))
        # bias added to each output channel
        Out[:,n,:,:]+=b[n]
    # ReLU forward, applied once after all output channels are computed
    relu(Out)
    return Out

def conv_backward(dz, X, W, b, stride=(1,1), padding=(0,0)):  
    """
    dz: Gradient with respect to z
    dz0: Gradient with respect to z of the former convolutional layer
    dx: Gradient with respect to x
    dw: Gradient with respect to w
    db: Gradient with respect to b
    """
    m, f, _, _ = dz.shape
    m, c, Ih, Iw = X.shape
    _,_,Kh,Kw = W.shape
    Sw,Sh=stride
    Pw,Ph=padding
  
    dx, dw, db = np.zeros_like(X), np.zeros_like(W), np.zeros_like(b)
    X_pad = np.pad(X, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    dx_pad = np.pad(dx, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    db = np.sum(dz, axis=(0,2,3))
 
    for k in range(dz.shape[0]):
        for i in range(dz.shape[2]):
            for j in range(dz.shape[3]):
                # input window that produced z[k, :, i, j]
                x_window = X_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw]
                for n in range(f):
                    # dw has shape (f, c, Kh, Kw)
                    dw[n] += x_window * dz[k, n, i, j]
                    # the rot180 of the formula is implicit in this scatter-add,
                    # so the kernel is used unflipped here
                    dx_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw] += W[n] * dz[k, n, i, j]
 
    dx = dx_pad[:, :, Ph:Ph+Ih, Pw:Pw+Iw]
    # ReLU backward of the previous layer: since X = relu(z^{l-1}),
    # sigma'(z^{l-1}) is 1 where X > 0 and 0 elsewhere
    dz0 = dx * (X > 0)
 
    return dx, dw, db, dz0
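
As a quick sanity check of the complete code (a minimal sketch: the shapes, the random seed, and the toy loss J = Out.sum() are all demo assumptions), the backward pass can be compared against a numerical gradient on a single weight entry:

np.random.seed(1)
X = np.random.randn(2, 3, 5, 5)
W = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
stride, padding = (1, 1), (1, 1)

Out = conv_forward(X, W, b, stride, padding)
dz = (Out > 0).astype(float)              # dJ/dz for the toy loss J = Out.sum()
dx, dw, db, dz0 = conv_backward(dz, X, W, b, stride, padding)
print(dx.shape, dw.shape, db.shape, dz0.shape)

# numerical spot check of one weight gradient
eps, idx = 1e-5, (0, 0, 1, 1)
Wp, Wm = W.copy(), W.copy()
Wp[idx] += eps
Wm[idx] -= eps
num = (conv_forward(X, Wp, b, stride, padding).sum()
       - conv_forward(X, Wm, b, stride, padding).sum()) / (2 * eps)
print(num, dw[idx])                       # the two values should match closely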

References

  1. https://www.cnblogs.com/pinard/p/6494810.html
  2. https://blog.csdn.net/qq_38585359/article/details/102658211?spm=1001.2101.3001.6650.1&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&utm_relevant_index=2
