Deep Learning Notes: Mathematical Derivations in CNNs

References:
矩阵求导术
刘建平pinard

Forward Propagation

Input tensors generally come in one of two dimension orders:
channels_last: (batch_size, height, width, channels)
channels_first: (batch_size, channels, height, width)

Convolutional Layer

Let $s$ denote the convolution stride, $p$ the padding size, $n$ the height or width of the previous layer, $f$ the filter length along that dimension, and $n_c$ the number of filters.
The "convolution" described here is actually cross-correlation, which differs from the mathematical convolution (the latter requires rotating the kernel by 180°). Convolutional layers skip the rotation because training can learn equivalent weights either way, and omitting the flip is more efficient.
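A quick numerical check of this point (a sketch, assuming SciPy is available): cross-correlating with the 180°-rotated kernel gives exactly the true mathematical convolution.

import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.arange(16, dtype=float).reshape(4, 4)   # toy input
k = np.array([[1., 0.], [2., -1.]])            # toy kernel

conv = convolve2d(x, k, mode='valid')               # true convolution (flips the kernel)
corr = correlate2d(x, np.rot90(k, 2), mode='valid') # cross-correlation with rot180(k)
print(np.allclose(conv, corr))                      # True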

Values of p

Padding types:
valid: no padding
same: $p = \frac{f-1}{2}$

Output tensor size after convolution

The height or width dimension becomes:
$n_{new} = \lfloor \frac{n+2p-f}{s} + 1 \rfloor$
The number of channels becomes:
$n_c$
That is, a tensor of shape
$(m, n_{h_{prev}}, n_{w_{prev}}, n_{c_{prev}})$
before convolution becomes
$(m, n_{h_{new}}, n_{w_{new}}, n_c)$
after convolution.
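As a small sanity check of the size formula, here is a sketch; conv_output_size is a hypothetical helper written for illustration, not part of any library.

import math

def conv_output_size(n, f, p, s):
    """Output height/width for input size n, filter f, padding p, stride s."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(28, 5, 2, 1))  # 28: 'same' convolution with a 5x5 kernel
print(conv_output_size(28, 2, 0, 2))  # 14: 2x2 pooling with stride 2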

Single-layer structure

[Figure 1]
$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
$A^{[l]} = \sigma(Z^{[l]})$
where the bias term $b^{[l]}$ has shape $(1, 1, 1, n_c)$

Pooling Layer

Max pooling or average pooling is generally used.
The output height and width are computed with the same formula as above. Since a pooling kernel acts on one channel at a time, pooling leaves the number of channels unchanged. In other words, convolutional layers increase the channel count (extracting more features), while pooling layers shrink the spatial dimensions.

Fully Connected Layer (Dense Layer)

From pooling layer to fully connected layer

After the final pooling layer, the $m$ tensors of shape $(n_h, n_w, n_c)$ are flattened into one-dimensional vectors, yielding a matrix of shape $(m, n_h \cdot n_w \cdot n_c)$.
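A minimal reshape sketch with made-up sizes (m=4, n_h=n_w=7, n_c=16), assuming the channels_last layout:

import numpy as np

pooled = np.random.rand(4, 7, 7, 16)        # (m, n_h, n_w, n_c)
flat = pooled.reshape(pooled.shape[0], -1)  # (m, n_h*n_w*n_c)
print(flat.shape)                           # (4, 784)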

Propagation between fully connected layers

$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
$A^{[l]} = \sigma(Z^{[l]})$
Tensor shapes between layers:
$A^{[l-1]}: (m, p)$
$Z^{[l]}: (m, q)$
$W^{[l]}: (p, q)$
$b^{[l]}: (1, q)$
(With these batch-major shapes, the product is computed as $A^{[l-1]}W^{[l]}$, with $b^{[l]}$ broadcast over the rows.)
It is worth noting that in Andrew Ng's machine learning course the bias term is not a separate $b$; instead an all-ones vector is appended to the tensor, as sketched below:
$A^{[l]}_{bias} = [\mathbf{1}^T; A^{[l]}]$
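The all-ones trick can be sketched in NumPy as follows; for the batch-major $(m, p)$ layout used here the ones form a column, and the corresponding row of the enlarged weight matrix plays the role of $b$.

import numpy as np

m, p = 5, 3
A = np.random.rand(m, p)
A_bias = np.hstack([np.ones((m, 1)), A])  # (m, p+1); the first column is all ones
print(A_bias.shape)                       # (5, 4)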
Choice of activation function:
Hidden layers generally use ReLU or sigmoid; the output layer uses sigmoid (binary classification) or softmax (multi-class classification).

Backward Propagation

Matrix differentiation rules used below

1. $d(X \pm Y) = dX \pm dY$; $\quad d(XY) = (dX)Y + X\,dY$; $\quad d(X^T) = (dX)^T$; $\quad d\,tr(X) = tr(dX)$
2. $dX^{-1} = -X^{-1}\,dX\,X^{-1}$
3. $d|X| = tr(X^* dX)$, where $X^*$ is the adjugate of $X$; when $X$ is invertible, $d|X| = |X|\,tr(X^{-1}dX)$
4. $d(X \odot Y) = dX \odot Y + X \odot dY$
5. $d\sigma(X) = \sigma'(X) \odot dX$, where $\sigma(X) = [\sigma(X_{ij})]$ applies elementwise

Trace tricks:
1. A scalar equals its own trace: $a = tr(a)$
2. $tr(A^T) = tr(A)$
3. $tr(A \pm B) = tr(A) \pm tr(B)$
4. $tr(AB) = tr(BA)$, where $A$ and $B^T$ have the same shape
5. $tr(A^T(B \odot C)) = tr((A \odot B)^T C)$, where $A$, $B$, $C$ have the same shape
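Tricks 4 and 5 are easy to verify numerically; a minimal NumPy check:

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 3)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # trick 4: tr(AB) = tr(BA)

A = np.random.rand(3, 4)
B = np.random.rand(3, 4)
C = np.random.rand(3, 4)
lhs = np.trace(A.T @ (B * C))  # tr(A^T (B ⊙ C))
rhs = np.trace((A * B).T @ C)  # tr((A ⊙ B)^T C)
print(np.isclose(lhs, rhs))    # trick 5: True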

Before differentiating, we define each layer's error as the partial derivative of the loss with respect to that layer's pre-activation units: $\delta^l = \frac{\partial L}{\partial z^l}$

Fully Connected Layer

To backpropagate and differentiate with respect to the weights, we first define the cost function.
Since the output layer is softmax, $\sigma(\mathbf{z}) = \frac{\exp(\mathbf{z})}{\mathbf{1}^T \exp(\mathbf{z})}$
Define the loss as:
$L = -\mathbf{y}^T \log \sigma(\mathbf{z})$
We first derive the output-layer error.
Expanding the loss (using $\mathbf{y}^T\mathbf{1} = 1$, since $\mathbf{y}$ is one-hot):
$L = -\mathbf{y}^T(\log(\exp(\mathbf{z})) - \mathbf{1}\log(\mathbf{1}^T\exp(\mathbf{z}))) = -\mathbf{y}^T\mathbf{z} + \log(\mathbf{1}^T\exp(\mathbf{z}))$
Taking the differential of both sides:
$dL = -\mathbf{y}^T d\mathbf{z} + \frac{\mathbf{1}^T(\exp(\mathbf{z}) \odot d\mathbf{z})}{\mathbf{1}^T\exp(\mathbf{z})} = -\mathbf{y}^T d\mathbf{z} + \frac{\exp(\mathbf{z})^T d\mathbf{z}}{\mathbf{1}^T\exp(\mathbf{z})}$
Wrapping both sides in a trace:
$dL = tr\left(-\mathbf{y}^T d\mathbf{z} + \frac{\exp(\mathbf{z})^T d\mathbf{z}}{\mathbf{1}^T\exp(\mathbf{z})}\right) = tr((\sigma(\mathbf{z})^T - \mathbf{y}^T) d\mathbf{z})$
which gives:
$\frac{\partial L}{\partial \mathbf{z}} = (\sigma(\mathbf{z})^T - \mathbf{y}^T)^T = \sigma(\mathbf{z}) - \mathbf{y}$
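This result can be confirmed with a finite-difference check; a minimal NumPy sketch:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

z = np.random.randn(5)
y = np.zeros(5)
y[2] = 1.0                   # one-hot label

loss = lambda v: -y @ np.log(softmax(v))
grad = softmax(z) - y        # analytic gradient derived above

eps = 1e-6
num = np.array([(loss(z + eps * np.eye(5)[i]) - loss(z - eps * np.eye(5)[i])) / (2 * eps)
                for i in range(5)])
print(np.allclose(grad, num))  # True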
The chain rule then gives the errors of the earlier layers:
$\frac{\partial L}{\partial \mathbf{z}^l} = \left(\frac{\partial L^T}{\partial \mathbf{z}^l}\right)^T = \left(\frac{\partial L^T}{\partial \mathbf{z}^L}\frac{\partial \mathbf{z}^L}{\partial \mathbf{z}^{L-1}} \cdots \frac{\partial \mathbf{z}^{l+1}}{\partial \mathbf{z}^l}\right)^T = \left(\frac{\partial \mathbf{z}^L}{\partial \mathbf{z}^{L-1}} \cdots \frac{\partial \mathbf{z}^{l+1}}{\partial \mathbf{z}^l}\right)^T \frac{\partial L}{\partial \mathbf{z}^L}$

$\delta^l = \left(\frac{\partial z^{l+1}}{\partial z^l}\right)^T \delta^{l+1} = (W^{l+1})^T \delta^{l+1} \odot \sigma'(z^l)$
Since
$z^l = W^l a^{l-1} + b^l$
the partial derivatives of the cost with respect to the weight matrix and the bias term are:
$\frac{\partial L}{\partial W^l} = \frac{\partial L}{\partial z^l}\frac{\partial z^l}{\partial W^l} = \delta^l (a^{l-1})^T$
$\frac{\partial L}{\partial b^l} = \frac{\partial L}{\partial z^l}\frac{\partial z^l}{\partial b^l} = \delta^l$
With these gradients the parameters can be optimized by gradient descent; a minimal sketch follows.
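A NumPy sketch of the three formulas above for one dense layer, using the column-vector convention of the derivation (all shapes and values are made up for illustration):

import numpy as np

relu = lambda x: np.maximum(x, 0)
relu_grad = lambda x: (x > 0).astype(float)

p, q = 4, 3
W = np.random.randn(q, p)
b = np.random.randn(q, 1)
z_prev = np.random.randn(p, 1)  # cached pre-activation of layer l-1
a_prev = relu(z_prev)
z = W @ a_prev + b              # forward step

delta = np.random.randn(q, 1)   # δ^l, assumed given from the layer above
dW = delta @ a_prev.T           # ∂L/∂W^l = δ^l (a^{l-1})^T
db = delta                      # ∂L/∂b^l = δ^l
delta_prev = (W.T @ delta) * relu_grad(z_prev)  # δ^{l-1} = (W^l)^T δ^l ⊙ σ'(z^{l-1})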

Pooling Layer

The pooling layer's $\delta^l$ comes either from reshaping the error tensor of the first fully connected layer or from the backward pass of a convolutional layer. Once we have the pooling layer's $\delta^l$, we need the error $\delta^{l-1}$ of the layer before it.
Since pooling compresses the tensor fed into it, we first restore the tensor to its pre-pooling shape, mapping each downsampled value back to its position in the input tensor.
[Figure 2]
Denote the restored matrix by $upsample(\delta^l)$ (the explanation here covers a single channel).
It follows that
$\frac{\partial L}{\partial a^{l-1}} = upsample(\delta^l)$
and therefore
$\delta^{l-1} = \left(\frac{\partial a^{l-1}}{\partial z^{l-1}}\right)^T \frac{\partial L}{\partial a^{l-1}} = upsample(\delta^l) \odot \sigma'(z^{l-1})$
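For max pooling, upsample routes each pooled gradient back to the position of the window's maximum, with zeros elsewhere; a 2x2 sketch (the same mask idea appears in createMask in the code below):

import numpy as np

a = np.array([[1., 3.],
              [4., 2.]])   # one pooling window
delta = 5.0                # gradient at the pooled output
mask = (a == a.max())      # marks the position of the max
print(mask * delta)
# [[0. 0.]
#  [5. 0.]]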

Convolutional Layer

Given the convolutional layer's $\delta^l$, we derive the previous layer's $\delta^{l-1}$.
Recall the fully connected backward pass:
$\delta^l = \left(\frac{\partial z^{l+1}}{\partial z^l}\right)^T \delta^{l+1} = (W^{l+1})^T \delta^{l+1} \odot \sigma'(z^l)$
For a convolutional layer the backward pass is written as
$\delta^{l-1} = \delta^l * rot180(W^l) \odot \sigma'(z^{l-1})$
Why does the expression differ? The backward pass can be understood as mapping the next layer's error back to the previous layer through the kernel (the weight matrix), so we need that reverse mapping. In a fully connected layer the forward mapping between layers is a plain matrix product, so the same matrix maps the error straight back. In a convolutional layer the forward mapping is produced by sliding the kernel, so to find the reverse mapping we write out the elementwise relations among $z^l$, $W^l$ and $a^{l-1}$, which yields the expression above.
[Figures 3-5: elementwise relations among $z^l$, $W^l$ and $a^{l-1}$]
With the errors of the convolutional layer and its predecessor, we can compute the kernel gradients:
$\frac{\partial L}{\partial W^l} = a^{l-1} * \delta^l$
$\frac{\partial L}{\partial b^l} = \sum_{u,v} (\delta^l)_{u,v}$
Here the bias gradient sums all elements of each channel of the error tensor.
The kernel gradient is likewise found from the elementwise mapping between the previous layer's activations and the next layer's error.
[Figure 6]
From this we obtain
[Figure 7]
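Both formulas can be checked numerically for a single-channel, stride-1, unpadded layer (a sketch, assuming SciPy is available): $\partial L / \partial a^{l-1}$ is the full convolution of $\delta^l$ with $rot180(W^l)$, and $\partial L / \partial W^l$ is the valid cross-correlation of $a^{l-1}$ with $\delta^l$.

import numpy as np
from scipy.signal import convolve2d, correlate2d

a_prev = np.random.rand(5, 5)
W = np.random.rand(3, 3)
delta = np.random.rand(3, 3)   # δ^l, same size as z^l

dA_prev = convolve2d(delta, W, mode='full')    # δ^l * rot180(W^l); convolve2d already flips W
dW = correlate2d(a_prev, delta, mode='valid')  # a^{l-1} ⋆ δ^l
db = delta.sum()                               # sum over all positions of δ^l

print(dA_prev.shape, dW.shape)                 # (5, 5) (3, 3)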

Implementing a CNN by Hand

To deepen my understanding of the internal structure of a CNN, I decided to implement one by hand. Parameter passing through the fully connected layers is straightforward, but through the convolutional and pooling layers it is more involved, so I drew the parameter-flow relations.
[Figure 8: parameter flow through the network]

Python code

import numpy as np
import scipy.io as sio

class CNN():
    """
    一个CNN训练的类

    alpha -- 学习率
    lamda -- 正则化参数
    X -- 训练集图片 shape(m(24*100), 1, 20, 20)
    y -- 训练集标签 shape(24*100, 24)
    maxItera -- 最大迭代次数
        
    WSet -- 卷积核以及全连接层的权重 以字典键值对进行索引 
    WSet = {'convW_1':convW_1, 'convW_2':convW_2, 'FCW_1':FCW_1, 'FCW_2':FCW_2, 'FCW_3':FCW_3}

    Bias -- 卷积层输出以及全连接层输出每一项的偏差
    Bias = {'convB_1':convB_1, 'convB_2':convB_2, 'FCB_1':FCB_1, 'FCB_2':FCB_2, 'FCB_3':FCB_3}

    """

    def __init__(self, alpha, lamda, maxItera):
        self.alpha = alpha
        self.lamda = lamda
        self.maxItera = maxItera
        self.X_train, self.y_train, self.X_test, self.y_test = self.readData()
        self.m = np.shape(self.X_train)[0]
        self.WSet, self.Bias = self.initializeWB()
        
    def readData(self):
        """
        读入训练数据

        Arguments:
        None

        Returns:
        X_train -- 一组数量为m的图片组  shape(m, 1, ?, ?)
        y_train -- 训练集标签 shape(m, 1)
        X_test
        y_test
        """
        data = sio.loadmat('MNIST_sort')
        label_t10k = data['label_t10k']
        label_train = data['label_train']
        img_t10k = data['img_t10k'] / 255
        img_train = data['img_train'] / 255

        # starting row index of each digit class inside the sorted data
        idx_train = [0]
        for i in range(9):
            idx = idx_train[i] + np.sum(label_train == i)
            idx_train.append(idx)

        idx_test = [0]
        for i in range(9):
            idx = idx_test[i] + np.sum(label_t10k == i)
            idx_test.append(idx)

        X_train = np.zeros((400, 1, 28, 28))
        X_test = np.zeros((100, 1, 28, 28))

        # take 40 training and 10 test images per digit
        for i in range(10):
            for j in range(40):
                x = img_train[idx_train[i]+j, :]
                x = x.reshape(1, 28, 28)
                X_train[i*40 + j] = x

        for i in range(10):
            for j in range(10):
                x = img_t10k[idx_test[i]+j, :]
                x = x.reshape(1, 28, 28)
                X_test[i*10 + j] = x

        y_train = np.empty((0, 1))
        y_test = np.empty((0, 1))

        for i in range(10):
            y_train_slice = np.ones((40, 1)) * i
            y_test_slice = np.ones((10, 1)) * i
            y_train = np.r_[y_train, y_train_slice]
            y_test = np.r_[y_test, y_test_slice]             

        return X_train, y_train, X_test, y_test

    def zeroPad(self, X, pad):
        """
        扩展矩阵用于same卷积

        Argument:
        X -- ndarray of shape(m, n_C, n_H, n_W) 一批数量为m的图片
        pad -- 图片要扩充的尺寸

        Returns:
        X_pad -- 扩展后的矩阵 shape(m, n_C, n_H + 2*pad, n_W + 2*pad)
        """
        X_pad = np.pad(X, ((0, 0), (0, 0), (pad, pad), (pad, pad)))

        return X_pad

    def singleConv(self, a_slice_prev, W, b):
        """
        卷积中其中一部分切片的求和

        Argument:
        a_slice_prev --  要卷积数据的切片 shape(n_C_prev, f, f)
        W -- 卷积核 shape(n_C_prev, f, f)

        Returns:
        Z -- 局部求和后的结果
        """
        s = a_slice_prev * W + b
        Z = np.sum(s)

        return Z

    def forwardConv(self, A_prev, W, b, hPara):
        """
        前向传播 卷积层

        Argument:
        A_prev -- 前一层的输出 shape(m, n_C_prev, n_H_prev, n_W_prev)
        W -- 卷积核组     shape(n_C, n_C_prev, f, f)
        b -- bias项 shape(n_C, 1, 1, 1)
        hPara -- 超参数 字典类型 卷积步长与填充量 {'stride': 'pad':}

        Returns:
        Z -- 卷积后的输出 shape(m, n_C, n_H, n_W)
        cache -- 缓存数据 用于反向传播
        """
        (m, n_C_prev, n_H_prev, n_W_prev) = A_prev.shape
        (n_C, n_C_prev, f, f) = W.shape
        stride = hPara['stride']
        pad = hPara['pad']

        # size of the convolution output
        n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
        n_W = int((n_W_prev - f + 2 * pad) / stride + 1)

        # initialize the conv layer output
        Z = np.zeros((m, n_C, n_H, n_W))

        # pad the previous layer's output
        A_prev_pad = self.zeroPad(A_prev, pad)

        # compute the convolution
        for i in range(m):
            a_prev_pad = A_prev_pad[i]
            for c in range(n_C):
                for h in range(n_H):
                    for w in range(n_W):
                        # slice start/end indices
                        h_start = h * stride
                        h_end = h_start + f
                        w_start = w * stride
                        w_end = w_start + f
                        
                        a_slice_prev = a_prev_pad[:, h_start:h_end, w_start:w_end]

                        Z[i, c, h, w] = self.singleConv(a_slice_prev, W[c, ...], b[c, ...])

        cache = (A_prev, W, b, hPara)

        return Z, cache

    def forwardPool(self, A_prev, hPara, mode = 'max'):
        """
        前向传播 池化层

        Arguments:
        A_prev -- 前一层的输出 shape(m, n_C_prev, n_H_prev, n_W_prev)
        hPara -- 超参数 字典形式 池化核的大小核卷积步长 {'f': , 'stride'}
        mode -- 池化模式:最大值池化'max' 均值池化'mean' 默认为'max'

        Returns:
        A -- 池化后的输出 shape(m, n_C, n_H, n_W)
        cache -- 缓存数据 用于反向传播
        """
        (m, n_C_prev, n_H_prev, n_W_prev) = A_prev.shape
        f = hPara['f']
        stride = hPara['stride']

        # size of the pooled output
        n_H = int((n_H_prev - f) / stride + 1)
        n_W = int((n_W_prev - f) / stride + 1)   
        n_C = n_C_prev

        # initialize the output
        A = np.zeros((m, n_C, n_H, n_W))

        for i in range(m):
            for c in range(n_C):
                for h in range(n_H):
                    for w in range(n_W):
                        h_start = h * stride
                        h_end = h_start + f
                        w_start = w * stride
                        w_end = w_start + f

                        a_slice_prev = A_prev[i, c, h_start:h_end, w_start:w_end]

                        if mode == 'max':
                            A[i, c, h, w] = np.max(a_slice_prev)
                        elif mode == 'mean':
                            A[i, c, h, w] = np.mean(a_slice_prev)

        cache = (A_prev, hPara)

        return A, cache

    def backwardConv(self, dZ, cache):
        """
        卷积层的误差反向传播

        Arguments:
        dZ -- 卷积层输出Z的梯度 shape(m, n_C, n_H, n_W)
        cache -- forwardConv()的输出 反向传播需要的数据

        Returns:
        dA_prev -- 卷积层输入的梯度 shape(m, n_C_prev, n_H_prev, n_W_prev)
        dW -- 卷积核的梯度 shape(n_C, n_C_prev, f, f)
        db -- bias项的梯度 shape(n_C, 1, 1, 1)
        """

        (A_prev, W, b, hPara) = cache

        (m, n_C_prev, n_H_prev, n_W_prev) = A_prev.shape
        (n_C, n_C_prev, f, f) = W.shape

        stride = hPara['stride']
        pad = hPara['pad']

        (m, n_C, n_H, n_W) = dZ.shape

        # initialize dA_prev, dW, db
        dA_prev = np.zeros((m, n_C_prev, n_H_prev, n_W_prev))
        dW = np.zeros((n_C, n_C_prev, f, f))
        db = np.zeros((n_C, 1, 1, 1))

        # pad the tensors
        A_prev_pad = self.zeroPad(A_prev, pad)
        dA_prev_pad = self.zeroPad(dA_prev, pad)

        for i in range(m):

            a_prev_pad = A_prev_pad[i]
            da_prev_pad = dA_prev_pad[i]

            for c in range(n_C):
                for h in range(n_H):
                    for w in range(n_W):
                        h_start = h * stride
                        h_end = h_start + f
                        w_start = w * stride
                        w_end = w_start + f

                        a_slice = a_prev_pad[:, h_start:h_end, w_start:w_end]
                        
                        da_prev_pad[:, h_start:h_end, w_start:w_end] += W[c, :, :, :] * dZ[i, c, h, w]
                        dW[c, :, :, :] += a_slice * dZ[i, c, h, w]
                        db[c, :, :, :] += dZ[i, c, h, w]

            dA_prev[i, :, :, :] = da_prev_pad[:, pad:-pad, pad:-pad]

        return dA_prev, dW, db

    def createMask(self, x):
        """
        池化层反向传播时 为生成矩阵残剩一个掩膜 用于 max pooling

        Arguments:
        x -- shape(f, f)

        Returns:
        mask -- shape(f, f) 显示x最大值的逻辑矩阵
        """
        mask = (x == np.max(x))

        return mask

    def distributeValue(self, dz, shape):
        """
        池化层反向传播时  分散矩阵 用于 均值池化

        Arguments:
        dz -- 输入的梯度
        shape -- 输入的dz分散后的形状 (n_H, n_W)

        Returns:
        a -- 分散后的矩阵 shape(n_H, n_W)
        """
        (n_H, n_W) = shape
    
        mean = dz / (n_H * n_W)
        a = np.ones(shape) * mean
                    
        return a

    def backwardPool(self, dA, cache, mode):
        """
        池化层反向传播

        Arguments:
        dA -- 池化层输出的梯度
        cache -- 缓存数据 由之前前向传播提供
        mode -- 池化层类型 'max' 'mean'
        
        Returns:
        dA_prev -- 池化层输入的梯度
        """
        (A_prev, hPara) = cache
        stride = hPara['stride']
        f = hPara['f']

        (m, n_C_prev, n_H_prev, n_W_prev) = A_prev.shape
        (m, n_C, n_H, n_W) = dA.shape

        # initialize dA_prev
        dA_prev = np.zeros(A_prev.shape)

        for i in range(m):
            a_prev = A_prev[i]
            for c in range(n_C):
                for h in range(n_H):
                    for w in range(n_W):
                        h_start = h * stride
                        h_end = h_start + f
                        w_start = w * stride
                        w_end = w_start + f

                        if mode == 'max':
                            a_prev_slice = a_prev[c, h_start:h_end, w_start:w_end]
                            mask = self.createMask(a_prev_slice)
                            dA_prev[i, c, h_start:h_end, w_start:w_end] += mask * dA[i, c, h, w]

                        elif mode == 'mean':
                            da = dA[i, c, h, w]
                            shape = (f, f)
                            dA_prev[i, c, h_start:h_end, w_start:w_end] += self.distributeValue(da, shape)

        return dA_prev

    def initializeWB(self):
        """
        初始化卷积核以及全连接层的权重矩阵

        Arguments:
        None

        Returns:
        WSet -- 各个初始化后的卷积核 以及全连接层的权重矩阵 以字典进行索引
        Bias -- 卷积层以及全连接层的偏差项 全部初始化为0
        """
        eps = 1

        # weights uniform in [-eps, eps]; biases start at zero
        convW_1 = 2 * eps * np.random.rand(6, 1, 5, 5) - eps
        convB_1 = np.zeros((6, 1, 1, 1))

        convW_2 = 2 * eps * np.random.rand(16, 6, 5, 5) - eps
        convB_2 = np.zeros((16, 1, 1, 1))

        FCW_1 = 2 * eps * np.random.rand(256, 784) - eps
        FCB_1 = np.zeros((1, 256))

        FCW_2 = 2 * eps * np.random.rand(128, 256) - eps
        FCB_2 = np.zeros((1, 128))

        FCW_3 = 2 * eps * np.random.rand(10, 128) - eps
        FCB_3 = np.zeros((1, 10))

        WSet = {'convW_1':convW_1, 'convW_2':convW_2, 'FCW_1':FCW_1, 'FCW_2':FCW_2, 'FCW_3':FCW_3}
        Bias = {'convB_1':convB_1, 'convB_2':convB_2, 'FCB_1':FCB_1, 'FCB_2':FCB_2, 'FCB_3':FCB_3}

        return WSet, Bias

    def ReLU(self, x_in):
        """
        ReLU激活函数

        Arguments:
        x_in -- 输入张量

        Returns:
        x_out -- 输出张量
        """
        mask = x_in > 0
        x_out = mask * x_in

        return x_out

    def computeCost(self, out, WSet):
        """
        计算代价函数

        Arguments:
        x -- 全连接层输出
        WSet -- 权重集合,用于正则化

        Returns:
        J -- 代价函数值
        dout -- 输出层误差
        """
        dout = -(np.log(out) * self.y_train)
        reg = self.lamda / 2 * (np.sum(WSet['convW_1']**2) + np.sum(WSet['convW_2']**2) + np.sum(WSet['FCW_1']**2) + np.sum(WSet['FCW_2']**2) + np.sum(WSet['FCW_3']**2))
        J = (np.sum(dout) + reg) / self.m

        return J, dout

    def epochRun(self):
        """
        运行一次完整的前向传播以及反向传播 更新卷积核以及全连接层权重

        Arguments:
        None

        Yield:
        J -- 前向传播输出后的代价函数
        """
        while True:
            ##---------- forward propagation ----------##
            # input -> conv layer I
            convZ_1, conv_cache_1 = self.forwardConv(self.X_train, self.WSet['convW_1'], self.Bias['convB_1'], {'stride':1, 'pad':2})
            # ReLU
            convA_1 = self.ReLU(convZ_1)

            # conv layer I -> pooling layer I
            poolA_1, pool_cache_1 = self.forwardPool(convA_1, {'f':2, 'stride':2})

            # pooling layer I -> conv layer II
            convZ_2, conv_cache_2 = self.forwardConv(poolA_1, self.WSet['convW_2'], self.Bias['convB_2'], {'stride':1, 'pad':2})
            # ReLU
            convA_2 = self.ReLU(convZ_2)

            # conv layer II -> pooling layer II
            poolA_2, pool_cache_2 = self.forwardPool(convA_2, {'f':2, 'stride':2})

            # pooling layer II -> FC layer 0 (flatten the pooled tensor)
            FClayer_0 = poolA_2.reshape((self.m, 784))

            # FC layer 0 -> FC layer 1; the ReLU mask also serves as its derivative
            FClayer_1_z = np.dot(FClayer_0, self.WSet['FCW_1'].T) + self.Bias['FCB_1']
            # ReLU
            FClayer_1 = self.ReLU(FClayer_1_z)

            # FC layer 1 -> FC layer 2
            FClayer_2_z = np.dot(FClayer_1, self.WSet['FCW_2'].T) + self.Bias['FCB_2']
            # ReLU
            FClayer_2 = self.ReLU(FClayer_2_z)

            # FC layer 2 -> output
            FClayer_3_z = np.dot(FClayer_2, self.WSet['FCW_3'].T) + self.Bias['FCB_3']
            # softmax
            out = np.exp(FClayer_3_z)
            out = out / np.sum(out, axis=1, keepdims=True)

            ##---------- compute the loss (cost J) and the output-layer error for backprop ----------##
            J, dFC_3 = self.computeCost(out, self.WSet)
            yield J

            ##---------- backward propagation ----------##
            # output -> FC layer 2
            dFC_2 = np.dot(dFC_3, self.WSet['FCW_3']) * (FClayer_2 > 0)
            dFCW_3 = np.dot(dFC_3.T, FClayer_2)

            # FC layer 2 -> FC layer 1
            dFC_1 = np.dot(dFC_2, self.WSet['FCW_2']) * (FClayer_1 > 0)
            dFCW_2 = np.dot(dFC_2.T, FClayer_1)

            # FC layer 1 -> FC layer 0
            dFC_0 = np.dot(dFC_1, self.WSet['FCW_1']) 
            dFCW_1 = np.dot(dFC_1.T, FClayer_0)

            # reshape FC layer 0 back to the shape of pooling layer II
            dpoolA_2 = dFC_0.reshape((self.m, 16, 7, 7))

            # pooling layer II -> conv layer II
            dconvA_2 = self.backwardPool(dpoolA_2, pool_cache_2, 'max')
            dconvA_2_z = dconvA_2 * (pool_cache_2[0] > 0)

            # conv layer II -> pooling layer I
            dpoolA_1, dconvW_2, dconvB_2 = self.backwardConv(dconvA_2_z, conv_cache_2)

            # pooling layer I -> conv layer I
            dconvA_1 = self.backwardPool(dpoolA_1, pool_cache_1, 'max')
            dconvA_1_z = dconvA_1 * (pool_cache_1[0] > 0)

            # error at conv layer I
            din, dconvW_1, dconvB_1 = self.backwardConv(dconvA_1_z, conv_cache_1)

            ##---------- gradient descent parameter update ----------##
            self.WSet['FCW_3'] -= self.alpha * (dFCW_3 + self.lamda * self.WSet['FCW_3']) / self.m
            self.WSet['FCW_2'] -= self.alpha * (dFCW_2 + self.lamda * self.WSet['FCW_2']) / self.m
            self.WSet['FCW_1'] -= self.alpha * (dFCW_1 + self.lamda * self.WSet['FCW_1']) / self.m
            self.Bias['FCB_3'] -= self.alpha * np.sum(dFC_3, 0) / self.m
            self.Bias['FCB_2'] -= self.alpha * np.sum(dFC_2, 0) / self.m
            self.Bias['FCB_1'] -= self.alpha * np.sum(dFC_1, 0) / self.m

            self.WSet['convW_2'] -= self.alpha * (dconvW_2 + self.lamda * self.WSet['convW_2']) / self.m
            self.WSet['convW_1'] -= self.alpha * (dconvW_1 + self.lamda * self.WSet['convW_1']) / self.m
            self.Bias['convB_2'] -= self.alpha * dconvB_2 / self.m
            self.Bias['convB_1'] -= self.alpha * dconvB_1 / self.m

    def run(self):
        """
        训练CNN 不断更新WSet 和 Bias

        Argument:
        None

        Returns:
        None
        """
        epoch = self.epochRun()
        J = next(epoch)
        tempJ = 0
        iteraTime = 1

        while True:
            tempJ = next(epoch)
            iteraTime += 1
            print('cost=' + str(tempJ), end=' ')
            print('eps=' + str(abs(tempJ - J)), end=' ')
            print('iter_time=' + str(iteraTime))
            # stop on convergence or when the max iteration count is reached
            if (abs(tempJ - J) < 0.001) | (iteraTime > self.maxItera):
                break
            else:
                J = tempJ

        # save the learned parameters
        sio.savemat('letterCNN.mat', {'WSet':self.WSet, 'Bias':self.Bias})

    def test(self):
        """
        测试
        """
        convZ_1, conv_cache_1 = self.forwardConv(self.X_test, self.WSet['convW_1'], self.Bias['convB_1'], {'stride':1, 'pad':2})
        convA_1 = self.ReLU(convZ_1)        
        poolA_1, pool_cache_1 = self.forwardPool(convA_1, {'f':2, 'stride':2})        
        convZ_2, conv_cache_2 = self.forwardConv(poolA_1, self.WSet['convW_2'], self.Bias['convB_2'], {'stride':1, 'pad':2})        
        convA_2 = self.ReLU(convZ_2)
        poolA_2, pool_cache_2 = self.forwardPool(convA_2, {'f':2, 'stride':2})    
        FClayer_0 = poolA_2.reshape((self.X_test.shape[0], 784))
        FClayer_1_z = np.dot(FClayer_0, self.WSet['FCW_1'].T) + self.Bias['FCB_1']       
        FClayer_1 = self.ReLU(FClayer_1_z)        
        FClayer_2_z = np.dot(FClayer_1, self.WSet['FCW_2'].T) + self.Bias['FCB_2']       
        FClayer_2 = self.ReLU(FClayer_2_z)      
        FClayer_3_z = np.dot(FClayer_2, self.WSet['FCW_3'].T) + self.Bias['FCB_3']     
        out = np.exp(FClayer_3_z)
        out = out / np.sum(out, axis=1, keepdims=True)

        y_predict = np.argmax(out, axis=1).reshape(self.y_test.shape)

        result = np.sum(y_predict == self.y_test)
        corr_rate = result / self.y_test.shape[0]

        print("%.2f" % (corr_rate * 100))

if __name__ == "__main__":
    MNIST_CNN = CNN(0.01, 1.0, 100)
    MNIST_CNN.run()
    MNIST_CNN.test()

