Computing the Gradients in Conv2d Backpropagation

We illustrate the computation with an example:


$$
x * w = y
$$

where $*$ denotes the sliding-window operation that Conv2d performs (here with stride 1 and no padding), and

$$
x=\begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{bmatrix}, \quad w=\begin{bmatrix} w_{11} & w_{12}\\ w_{21} & w_{22} \end{bmatrix}, \quad y=\begin{bmatrix} y_{11} & y_{12}\\ y_{21} & y_{22} \end{bmatrix}
$$


$$
\left\{\begin{aligned}
&y_{11} = w_{11}x_{11} + w_{12}x_{12} + w_{21}x_{21} + w_{22}x_{22}\\
&y_{12} = w_{11}x_{12} + w_{12}x_{13} + w_{21}x_{22} + w_{22}x_{23}\\
&y_{21} = w_{11}x_{21} + w_{12}x_{22} + w_{21}x_{31} + w_{22}x_{32}\\
&y_{22} = w_{11}x_{22} + w_{12}x_{23} + w_{21}x_{32} + w_{22}x_{33}
\end{aligned}\right.
$$
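To make the four sums above concrete, here is a minimal pure-Python sketch of the same sliding-window computation (stride 1, no padding). The helper name `conv2d_valid` and the numeric values chosen for `x` and `w` are illustrative assumptions, not part of the original example.

```python
def conv2d_valid(x, w):
    """Slide w over x (stride 1, no padding) and sum the element-wise products."""
    k_h, k_w = len(w), len(w[0])
    out_h = len(x) - k_h + 1
    out_w = len(x[0]) - k_w + 1
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(k_h) for b in range(k_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical values for the 3x3 input and the 2x2 kernel.
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 2],
     [3, 4]]

y = conv2d_valid(x, w)
print(y)  # [[37, 47], [67, 77]]
```

For instance, `y[0][0]` is `1*1 + 2*2 + 3*4 + 4*5 = 37`, which is exactly the expression for $y_{11}$.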

Gradient with respect to $x$, writing $\delta_{ij} = \partial L / \partial y_{ij}$ for the gradient arriving from the layer above:

$$
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial x}=\begin{bmatrix}
\delta_{11}w_{11} & \delta_{11}w_{12} + \delta_{12}w_{11} & \delta_{12}w_{12}\\
\delta_{11}w_{21} + \delta_{21}w_{11} & \delta_{11}w_{22} + \delta_{12}w_{21} + \delta_{21}w_{12} + \delta_{22}w_{11} & \delta_{12}w_{22} + \delta_{22}w_{12}\\
\delta_{21}w_{21} & \delta_{21}w_{22} + \delta_{22}w_{21} & \delta_{22}w_{22}
\end{bmatrix}
$$
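The figure below illustrates the computation (the values at corresponding positions of the differently colored blocks are added together):
[Figure: computing the gradients in Conv2d backpropagation]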
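The accumulation can also be sketched in code: every output gradient $\delta_{ij}$ is multiplied by each kernel weight and added into the input position that weight touched in the forward pass. The sketch below is pure Python; the names `grad_input` and `loss`, the loss $L = \sum_{ij} \delta_{ij} y_{ij}$, and all numeric values are assumptions made for illustration. Because this loss is linear in `x`, bumping one input entry by 1 changes it by exactly the corresponding gradient entry, which gives an exact check.

```python
def conv2d_valid(x, w):
    """Forward pass: stride 1, no padding."""
    k_h, k_w = len(w), len(w[0])
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(k_h) for b in range(k_w))
            for j in range(len(x[0]) - k_w + 1)
        ]
        for i in range(len(x) - k_h + 1)
    ]


def grad_input(delta, w, in_h, in_w):
    """Add delta[i][j] * w[a][b] into the input position x[i+a][j+b]."""
    dx = [[0] * in_w for _ in range(in_h)]
    for i in range(len(delta)):
        for j in range(len(delta[0])):
            for a in range(len(w)):
                for b in range(len(w[0])):
                    dx[i + a][j + b] += delta[i][j] * w[a][b]
    return dx


# Hypothetical kernel and upstream gradient dL/dy.
w = [[1, 2], [3, 4]]
delta = [[1, 10], [100, 1000]]
dx = grad_input(delta, w, 3, 3)
print(dx)  # [[1, 12, 20], [103, 1234, 2040], [300, 3400, 4000]]


def loss(x):
    """Assumed loss L = sum(delta * y), so that dL/dy equals delta."""
    y = conv2d_valid(x, w)
    return sum(delta[i][j] * y[i][j] for i in range(2) for j in range(2))


# Exact check: L is linear in x, so a unit bump changes L by dx[r][c].
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for r in range(3):
    for c in range(3):
        bumped = [row[:] for row in x]
        bumped[r][c] += 1
        assert loss(bumped) - loss(x) == dx[r][c]
```

The centre entry `1234` is `1*4 + 10*3 + 100*2 + 1000*1`, matching the four-term expression in the middle of the gradient matrix.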
Alternatively, the gradient with respect to $x$ can be computed in the form of a convolution:

$$
\frac{\partial L}{\partial x} = \begin{pmatrix} 0&0&0&0 \\ 0&\delta_{11}&\delta_{12}&0 \\ 0&\delta_{21}&\delta_{22}&0 \\ 0&0&0&0 \end{pmatrix} * \begin{pmatrix} w_{22}&w_{21}\\ w_{12}&w_{11} \end{pmatrix}
$$
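In this form the kernel $w$ must be rotated by 180 degrees, that is, flipped once vertically and then once horizontally, and the input matrix (here the gradient $\delta$) must be zero-padded.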
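A small pure-Python sketch of this convolution form follows: zero-pad $\delta$ by one cell on each side, rotate the kernel by 180 degrees, and run the same stride-1 sliding window as in the forward pass. The helper names `rotate_180` and `zero_pad` and the numeric values are illustrative assumptions.

```python
def conv2d_valid(x, w):
    """Sliding-window sum of products, stride 1, no padding."""
    k_h, k_w = len(w), len(w[0])
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(k_h) for b in range(k_w))
            for j in range(len(x[0]) - k_w + 1)
        ]
        for i in range(len(x) - k_h + 1)
    ]


def rotate_180(w):
    """Flip the kernel vertically, then horizontally."""
    return [row[::-1] for row in w[::-1]]


def zero_pad(m, p):
    """Surround matrix m with p rings of zeros."""
    width = len(m[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in m]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom


# Hypothetical kernel and upstream gradient dL/dy.
w = [[1, 2], [3, 4]]
delta = [[1, 10], [100, 1000]]

dx = conv2d_valid(zero_pad(delta, 1), rotate_180(w))
print(dx)  # [[1, 12, 20], [103, 1234, 2040], [300, 3400, 4000]]
```

With these values the top-left entry is `delta[0][0] * w[0][0] = 1` and the centre entry is `1*4 + 10*3 + 100*2 + 1000*1 = 1234`, which agrees term by term with the gradient matrix given earlier.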

Let $p$ be the padding of the forward convolution and $p'$ the padding of the backward convolution. The padding that must be added is then:

$$
p' = k - p - 1
$$

where $k$ is the kernel size, and the forward convolution is assumed to have stride 1.
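As a quick sanity check of the padding rule, the sketch below uses the standard stride-1 output-size formula $n + 2p - k + 1$ and confirms that a backward convolution with padding $k - p - 1$ returns a gradient of the original input size $n$. The function names and the sample `(n, k, p)` triples are assumptions chosen for illustration.

```python
def out_size(n, k, p):
    """Output length of a stride-1 convolution: n + 2p - k + 1."""
    return n + 2 * p - k + 1


def backward_padding(k, p):
    """Padding for the backward convolution, p' = k - p - 1."""
    return k - p - 1


# (input size n, kernel size k, forward padding p); the first triple is
# the 3x3 input / 2x2 kernel / no-padding example used in this article.
for n, k, p in [(3, 2, 0), (5, 3, 1), (7, 3, 0), (8, 5, 2)]:
    m = out_size(n, k, p)             # size of y, and therefore of delta
    p_back = backward_padding(k, p)
    assert out_size(m, k, p_back) == n
    print(n, k, p, "->", "delta size", m, "backward padding", p_back)
```

For the example in this article ($n = 3$, $k = 2$, $p = 0$) this gives $p' = 1$, which is the single ring of zeros around $\delta$ in the convolution form.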
