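This article defines the MaxPool function and derives its gradient for backpropagation.
For the companion code, see the article:
"Implementing the MaxPool pooling layer and its backpropagation: Python compared with PyTorch" (Python和PyTorch对比实现池化层MaxPool函数及反向传播)
Series index: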
https://blog.csdn.net/oBrightLamp/article/details/85067981
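Pooling is a size-reduction operation: it shrinks a large image into a smaller one while condensing its features.
In this article, matrix element indices start from 0.
When X is an m x n matrix, the kernel (pooling window) size is 2 x 2, and the stride is 1: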
$$
\begin{gathered}
y_{ij} = \max(x_{i,j},\; x_{i,j+1},\; x_{i+1,j},\; x_{i+1,j+1}) \\
i \leqslant m-2 \\
j \leqslant n-2
\end{gathered}
$$
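As a quick check of the 2 x 2, stride-1 definition above, here is a minimal NumPy sketch. The function name `max_pool_2x2_stride1` and the sample matrix are illustrative assumptions, not taken from the companion code.

```python
import numpy as np

def max_pool_2x2_stride1(x):
    """Max pooling with a 2 x 2 window and stride 1 on an m x n matrix."""
    m, n = x.shape
    y = np.empty((m - 1, n - 1), dtype=x.dtype)
    for i in range(m - 1):        # i = 0, ..., m - 2
        for j in range(n - 1):    # j = 0, ..., n - 2
            y[i, j] = max(x[i, j], x[i, j + 1], x[i + 1, j], x[i + 1, j + 1])
    return y

# Hypothetical 3 x 3 input, so the output is 2 x 2.
x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
y = max_pool_2x2_stride1(x)
print(y)  # y == [[5, 6], [8, 9]]
```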
When X is an m x n matrix, the kernel size is p x q, and the stride is 1:
$$
\begin{gathered}
r = 0,1,2,3,\cdots,p-1 \\
s = 0,1,2,3,\cdots,q-1 \\
y_{ij} = \max(x_{i+r,j+s}) \\
i \leqslant m-p \\
j \leqslant n-q
\end{gathered}
$$
When the kernel W is a p x q matrix, the stride is t, and X, padded so that the division is exact, is an m x n matrix:
$$
\begin{gathered}
r = 0,1,2,3,\cdots,p-1 \\
s = 0,1,2,3,\cdots,q-1 \\
y_{ij} = \max(x_{i\cdot t+r,j\cdot t+s}) \\
i \leqslant (m-p)/t \\
j \leqslant (n-q)/t
\end{gathered}
$$
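The strided definition above can be sketched directly with loops. This is a minimal illustration, assuming X has already been padded so that (m - p) and (n - q) are divisible by t; the function name `max_pool_forward` is hypothetical. The output size (m - p)/t + 1 by (n - q)/t + 1 follows from the 0-based bounds on i and j.

```python
import numpy as np

def max_pool_forward(x, p, q, t):
    """Max pooling of an m x n matrix with a p x q window and stride t."""
    m, n = x.shape
    # Assumption: x is already padded so the windows tile it exactly.
    assert (m - p) % t == 0 and (n - q) % t == 0
    g = (m - p) // t + 1          # i = 0, ..., (m - p) / t
    h = (n - q) // t + 1          # j = 0, ..., (n - q) / t
    y = np.empty((g, h), dtype=x.dtype)
    for i in range(g):
        for j in range(h):
            # Window covers rows i*t .. i*t+p-1 and columns j*t .. j*t+q-1.
            y[i, j] = x[i * t:i * t + p, j * t:j * t + q].max()
    return y

# Hypothetical 4 x 4 input holding the values 0..15 row by row.
x = np.arange(16).reshape(4, 4)
y = max_pool_forward(x, p=2, q=2, t=2)
print(y)  # y == [[5, 7], [13, 15]]
```

With `t=1` the same function reproduces the stride-1 case, producing an (m - p + 1) x (n - q + 1) output.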
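Let the kernel size be p x q and the stride be t, and let X, padded so that the division is exact, be an m x n matrix. MaxPooling produces a g x h matrix Y, where g = (m - p)/t + 1 and h = (n - q)/t + 1, and the forward pass continues from Y to the error value (a scalar e). The upstream error gradient $\nabla e_{(Y)}$ has already been obtained during backpropagation; find the gradient of e with respect to X.
Given: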
$$
\begin{gathered}
r = 0,1,2,3,\cdots,p-1 \\
s = 0,1,2,3,\cdots,q-1 \\
y_{ij} = \max(x_{i\cdot t+r,j\cdot t+s}) \\
i \leqslant (m-p)/t \\
j \leqslant (n-q)/t
\end{gathered}
$$
$$
\begin{gathered}
e = \mathrm{forward}(Y) \\
\nabla e_{(Y)}=\frac{de}{dY}=\begin{pmatrix}
\partial e/ \partial y_{0,0}&\partial e/ \partial y_{0,1}&\partial e/ \partial y_{0,2}&\cdots& \partial e/ \partial y_{0,h-1}\\
\partial e/ \partial y_{1,0}&\partial e/ \partial y_{1,1}&\partial e/ \partial y_{1,2}&\cdots& \partial e/ \partial y_{1,h-1}\\
\partial e/ \partial y_{2,0}&\partial e/ \partial y_{2,1}&\partial e/ \partial y_{2,2}&\cdots& \partial e/ \partial y_{2,h-1}\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\partial e/ \partial y_{g-1,0}&\partial e/ \partial y_{g-1,1}&\partial e/ \partial y_{g-1,2}&\cdots& \partial e/ \partial y_{g-1,h-1}
\end{pmatrix}
\end{gathered}
$$
Derivation:
$$
\frac{\partial y_{ij}}{\partial x_{uv}}=
\begin{cases}
1\;, & x_{uv}=\max(x_{i\cdot t+r,j\cdot t+s})\\
0\;, & \text{otherwise}
\end{cases}
$$
$$
\frac{\partial e}{\partial x_{uv}} = \sum_{i=0}^{g-1}\sum_{j=0}^{h-1}\frac{\partial e}{\partial y_{ij}}\frac{\partial y_{ij}}{\partial x_{uv}}=
\begin{cases}
{\partial e}/{\partial y_{ij}}\;, & x_{uv}=\max(x_{i\cdot t+r,j\cdot t+s})\\
0\;, & \text{otherwise}
\end{cases}
$$
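Here, $\partial e/\partial y_{ij}$ is computed upstream. When windows overlap (t < p or t < q), $x_{uv}$ can be the maximum of several windows; the double sum then adds the contribution $\partial e/\partial y_{ij}$ from each of them.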
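The gradient rule can be sketched as a loop that routes each upstream gradient to the position of its window's maximum. This is an illustrative sketch, not the companion implementation: the name `max_pool_backward` is hypothetical, and ties are broken by taking the first maximum, which is what `np.argmax` returns. The `+=` implements the double sum, so overlapping windows accumulate.

```python
import numpy as np

def max_pool_backward(x, grad_y, p, q, t):
    """Gradient of e w.r.t. x, given grad_y = de/dY for p x q windows, stride t."""
    g, h = grad_y.shape
    grad_x = np.zeros(x.shape, dtype=grad_y.dtype)
    for i in range(g):
        for j in range(h):
            window = x[i * t:i * t + p, j * t:j * t + q]
            # (r, s) is the offset of the maximum inside window (i, j).
            # Assumption: on ties, the first maximum receives the gradient.
            r, s = np.unravel_index(np.argmax(window), window.shape)
            grad_x[i * t + r, j * t + s] += grad_y[i, j]
    return grad_x

# Hypothetical input: the centre element 9 is the maximum of all four
# overlapping 2 x 2 windows (stride 1).
x = np.array([[1., 2., 3.],
              [4., 9., 6.],
              [7., 8., 5.]])
grad_y = np.array([[1., 2.],
                   [3., 4.]])
grad_x = max_pool_backward(x, grad_y, p=2, q=2, t=1)
print(grad_x)  # only grad_x[1, 1] is nonzero: 1 + 2 + 3 + 4 = 10
```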
NumPy's max, argmax, and reshape functions are very handy when implementing MaxPool.
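One way these functions fit together is sketched below for the special case of non-overlapping square windows (p = q = t = k), assuming both dimensions of X are divisible by k. The function names are hypothetical, and `transpose` is used in addition to the three functions named above to bring each window's elements together.

```python
import numpy as np

def max_pool_reshape(x, k):
    """Non-overlapping k x k max pooling (p = q = t = k) without Python loops."""
    m, n = x.shape
    g, h = m // k, n // k
    # (g, k, h, k) -> (g, h, k, k) -> (g, h, k*k): one row of k*k values per window.
    windows = x.reshape(g, k, h, k).transpose(0, 2, 1, 3).reshape(g, h, k * k)
    y = windows.max(axis=2)
    idx = windows.argmax(axis=2)   # flat position r*k + s of each window's maximum
    return y, idx

def max_pool_reshape_backward(grad_y, idx, k):
    """Scatter grad_y back to the argmax positions recorded in the forward pass."""
    g, h = grad_y.shape
    grad_windows = np.zeros((g, h, k * k), dtype=grad_y.dtype)
    ii, jj = np.indices((g, h))
    grad_windows[ii, jj, idx] = grad_y
    # Undo the forward rearrangement: (g, h, k, k) -> (g, k, h, k) -> (g*k, h*k).
    return grad_windows.reshape(g, h, k, k).transpose(0, 2, 1, 3).reshape(g * k, h * k)

# Hypothetical 4 x 4 input holding the values 0..15 row by row.
x = np.arange(16.0).reshape(4, 4)
y, idx = max_pool_reshape(x, k=2)
grad_x = max_pool_reshape_backward(np.array([[1., 2.], [3., 4.]]), idx, k=2)
print(y)       # y == [[5, 7], [13, 15]]
print(grad_x)  # nonzero only at (1, 1), (1, 3), (3, 1), (3, 3)
```

Because the windows do not overlap here, each input element receives at most one upstream gradient, so plain assignment is enough; overlapping windows would need accumulation instead.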