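Note: the "convolution" function used here is, strictly speaking, the mathematical cross-correlation, not convolution in the mathematical sense (the kernel is not flipped).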
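The difference can be seen in a 1-D sketch with NumPy (not part of the original notes; the arrays `a` and `v` are arbitrary example values). `np.correlate` slides the kernel without flipping it, which is what the formulas below compute. `np.convolve` flips the kernel first, so correlating with `v` matches convolving with the reversed kernel `v[::-1]`.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
v = np.array([1, 0, -1])

corr = np.correlate(a, v, mode="valid")               # sum_n a[n + k] * v[n], kernel NOT flipped
conv = np.convolve(a, v, mode="valid")                # true convolution: kernel flipped
conv_flipped = np.convolve(a, v[::-1], mode="valid")  # convolution with a pre-reversed kernel

print(corr.tolist())          # [-2, -2]
print(conv.tolist())          # [2, 2]
print(conv_flipped.tolist())  # [-2, -2]  (same as the cross-correlation)
```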
$\pmb{K}$: 4-D kernel tensor.
$K_{i,j,k,l}$: an element of $\pmb{K}$. Meaning of the indices:
$i$: the $i$-th channel of the output
$j$: the $j$-th channel of the input
$k$, $l$: row $k$, column $l$ of the kernel
$\pmb{V}$: 3-D tensor of observed data.
$V_{i,j,k}$: an element of $\pmb{V}$, the value in channel $i$ at row $j$, column $k$.
$\pmb{Z}$: 3-D tensor, the output of the convolution function.
$Z_{i,j,k}$: an element of $\pmb{Z}$, the value in channel $i$ at row $j$, column $k$.
$\pmb{G}$: 3-D tensor obtained during back-propagation.
$G_{i,j,k}$: the value in channel $i$ at row $j$, column $k$, computed as $G_{i,j,k}=\frac{\partial}{\partial Z_{i,j,k}}J(\pmb{V},\pmb{K})$.
$s$: the stride of a downsampled (strided) convolution.
$t$: the number of kernels in a tiled convolution: along each spatial direction, the output positions cycle through a set of $t$ different kernels. When $t$ equals the width of the output, tiled convolution becomes a locally connected convolution.
$J(\pmb{V},\pmb{K})$: the loss function.
$c(\pmb{K},\pmb{V},s)$: a convolution function representing a single network layer.
Strided convolution:
$$
\begin{aligned}
Z_{i,j,k}&=c(\pmb{K},\pmb{V},s)_{i,j,k}\\
&=\sum_{l,m,n}\left[V_{l,(j-1)\times s+m,(k-1)\times s+n}\,K_{i,l,m,n}\right]
\end{aligned}
$$
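The strided convolution $c(\pmb{K},\pmb{V},s)$ can be sketched in NumPy as below. This is an illustrative implementation under a few assumptions: 0-based array indices (so the 1-based $(j-1)\times s+m$ becomes `j*s + m`), no zero padding, and `K` stored with shape `(C_out, C_in, kh, kw)`. The function name `conv_strided` is hypothetical.

```python
import numpy as np

def conv_strided(K, V, s):
    """Z[i, j, k] = sum_{l,m,n} V[l, j*s + m, k*s + n] * K[i, l, m, n]  (0-based c(K, V, s)).

    K: (C_out, C_in, kh, kw) kernel tensor
    V: (C_in, H, W) input tensor
    s: stride
    """
    C_out, C_in, kh, kw = K.shape
    _, H, W = V.shape
    H_out = (H - kh) // s + 1   # assumption: "valid" convolution, no zero padding
    W_out = (W - kw) // s + 1
    Z = np.zeros((C_out, H_out, W_out))
    for j in range(H_out):
        for k in range(W_out):
            patch = V[:, j * s : j * s + kh, k * s : k * s + kw]  # (C_in, kh, kw)
            Z[:, j, k] = np.einsum("lmn,ilmn->i", patch, K)      # sum over l, m, n
    return Z


V = np.arange(16, dtype=float).reshape(1, 4, 4)
K = np.ones((1, 1, 2, 2))
print(conv_strided(K, V, s=2)[0].tolist())  # [[10.0, 18.0], [42.0, 50.0]]
```

With `s=2` the 2×2 windows do not overlap, so each output is the sum of one disjoint block of `V`.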
Locally connected (unshared) convolution, where $\pmb{K}$ is now a 6-D tensor holding a separate kernel for every output position $(j,k)$:
$$
Z_{i,j,k}=\sum_{l,m,n}\left[V_{l,j-1+m,k-1+n}\,K_{i,j,k,l,m,n}\right]
$$
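A minimal NumPy sketch of the locally connected layer, assuming stride 1, no padding, 0-based indices, and `K` laid out as `(C_out, H_out, W_out, C_in, kh, kw)`; the name `locally_connected` is hypothetical. The only change from ordinary convolution is that the kernel is looked up per output position.

```python
import numpy as np

def locally_connected(K, V):
    """Z[i, j, k] = sum_{l,m,n} V[l, j + m, k + n] * K[i, j, k, l, m, n]  (0-based, stride 1).

    K: (C_out, H_out, W_out, C_in, kh, kw), one separate kernel per output position (j, k)
    V: (C_in, H, W) with H == H_out + kh - 1 and W == W_out + kw - 1
    """
    C_out, H_out, W_out, C_in, kh, kw = K.shape
    assert V.shape == (C_in, H_out + kh - 1, W_out + kw - 1), "V does not match K"
    Z = np.zeros((C_out, H_out, W_out))
    for j in range(H_out):
        for k in range(W_out):
            patch = V[:, j : j + kh, k : k + kw]                      # (C_in, kh, kw)
            Z[:, j, k] = np.einsum("lmn,ilmn->i", patch, K[:, j, k])  # kernel of (j, k) only
    return Z


# Example: output position (j, k) gets its own constant kernel with value 2*j + k.
V = np.ones((1, 3, 3))
K = np.zeros((1, 2, 2, 1, 2, 2))
for j in range(2):
    for k in range(2):
        K[0, j, k] = 2 * j + k
print(locally_connected(K, V)[0].tolist())  # [[0.0, 4.0], [8.0, 12.0]]
```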
Tiled convolution, where $\pmb{K}$ is a 6-D tensor holding $t\times t$ kernels that are cycled through across output positions ($\%$ denotes the modulo operation):
$$
Z_{i,j,k}=\sum_{l,m,n}\left[V_{l,j-1+m,k-1+n}\,K_{i,l,m,n,j\%t+1,k\%t+1}\right]
$$
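A NumPy sketch of tiled convolution, again assuming stride 1, no padding, and 0-based indices; the name `tiled_conv` and the layout `(C_out, C_in, kh, kw, t, t)` are hypothetical choices. Translating the 1-based slot $j\%t+1$ to 0-based indexing gives slot `(j + 1) % t` for the 0-based position `j`.

```python
import numpy as np

def tiled_conv(K, V):
    """Z[i, j, k] = sum_{l,m,n} V[l, j + m, k + n] * K[i, l, m, n, (j + 1) % t, (k + 1) % t].

    0-based translation of the 1-based rule K_{i,l,m,n, j%t+1, k%t+1}: with the
    1-based position j_book = j + 1, slot j_book % t + 1 becomes 0-based slot (j + 1) % t.
    K: (C_out, C_in, kh, kw, t, t): t x t kernels, cycled over output positions
    V: (C_in, H, W); stride 1, no padding
    """
    C_out, C_in, kh, kw, t, _ = K.shape
    _, H, W = V.shape
    H_out, W_out = H - kh + 1, W - kw + 1
    Z = np.zeros((C_out, H_out, W_out))
    for j in range(H_out):
        for k in range(W_out):
            patch = V[:, j : j + kh, k : k + kw]
            kernel = K[:, :, :, :, (j + 1) % t, (k + 1) % t]  # (C_out, C_in, kh, kw)
            Z[:, j, k] = np.einsum("lmn,ilmn->i", patch, kernel)
    return Z


# Example with t = 2: kernel slot (a, b) is a constant kernel with value 2*a + b + 1.
V = np.ones((1, 3, 3))
K = np.zeros((1, 1, 2, 2, 2, 2))
for a in range(2):
    for b in range(2):
        K[0, 0, :, :, a, b] = 2 * a + b + 1
print(tiled_conv(K, V)[0].tolist())  # [[16.0, 12.0], [8.0, 4.0]]
```

Here $t=2$ equals the output width, so every output position uses a different kernel, which is the locally connected case. With `t = 1` every position shares one kernel and the function reduces to an ordinary stride-1 convolution.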
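Derivation of Equations (9.11) and (9.13) in Section 9.5, Chapter 9 of "Deep Learning" (the "flower book").
First, the two equations themselves, each derived step by step: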
$$
\begin{aligned}
g(\pmb{G},\pmb{V},s)_{i,j,k,l}&=\frac{\partial}{\partial K_{i,j,k,l}}J(\pmb{V},\pmb{K})\\
&=\sum_{\alpha,\beta,\gamma}\frac{\partial Z_{\alpha,\beta,\gamma}}{\partial K_{i,j,k,l}}\frac{\partial}{\partial Z_{\alpha,\beta,\gamma}}J(\pmb{V},\pmb{K})\\
&=\sum_{\alpha,\beta,\gamma}\frac{\partial Z_{\alpha,\beta,\gamma}}{\partial K_{i,j,k,l}}G_{\alpha,\beta,\gamma}\\
&=\sum_{\alpha,\beta,\gamma}G_{\alpha,\beta,\gamma}\frac{\partial Z_{\alpha,\beta,\gamma}}{\partial K_{i,j,k,l}}\\
&=\sum_{\alpha,\beta,\gamma}G_{\alpha,\beta,\gamma}\frac{\partial}{\partial K_{i,j,k,l}}c(\pmb{K},\pmb{V},s)_{\alpha,\beta,\gamma}\\
&=\sum_{\alpha,\beta,\gamma}G_{\alpha,\beta,\gamma}\frac{\partial}{\partial K_{i,j,k,l}}\sum_{\eta,\xi,\varphi}\left[V_{\eta,(\beta-1)\times s+\xi,(\gamma-1)\times s+\varphi}K_{\alpha,\eta,\xi,\varphi}\right]\\
&=\sum_{\alpha,\beta,\gamma}G_{\alpha,\beta,\gamma}\sum_{\eta,\xi,\varphi}\left[V_{\eta,(\beta-1)\times s+\xi,(\gamma-1)\times s+\varphi}\frac{\partial}{\partial K_{i,j,k,l}}K_{\alpha,\eta,\xi,\varphi}\right]\\
&=\sum_{\beta,\gamma}G_{i,\beta,\gamma}V_{j,(\beta-1)\times s+k,(\gamma-1)\times s+l}\qquad\Leftarrow\begin{cases}\alpha=i\\ \eta=j\\ \xi=k\\ \varphi=l\end{cases}\\
&=\sum_{m,n}G_{i,m,n}V_{j,(m-1)\times s+k,(n-1)\times s+l}\qquad\Leftarrow\begin{cases}m\gets\beta\\ n\gets\gamma\end{cases}
\end{aligned}
$$
The step marked with $\alpha=i,\ \eta=j,\ \xi=k,\ \varphi=l$ uses the fact that $\frac{\partial K_{\alpha,\eta,\xi,\varphi}}{\partial K_{i,j,k,l}}$ is 1 when $(\alpha,\eta,\xi,\varphi)=(i,j,k,l)$ and 0 otherwise, so the sums over $\alpha,\eta,\xi,\varphi$ collapse to that single term and only the sum over $\beta,\gamma$ remains.
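The final line of this derivation can be checked numerically with a NumPy sketch. To get a loss whose $\partial J/\partial \pmb{Z}$ is exactly $\pmb{G}$, it assumes the hypothetical linear loss $J=\sum_{i,j,k}G_{i,j,k}Z_{i,j,k}$ with a fixed random $\pmb{G}$; any differentiable loss would feed into $g$ the same way through the chain rule. Indices are 0-based, and the names `conv_strided`, `grad_kernel`, and `loss` are hypothetical.

```python
import numpy as np

def conv_strided(K, V, s):
    """Forward pass c(K, V, s), 0-based: Z[i,j,k] = sum_{l,m,n} V[l, j*s+m, k*s+n] * K[i,l,m,n]."""
    C_out, _, kh, kw = K.shape
    H_out = (V.shape[1] - kh) // s + 1
    W_out = (V.shape[2] - kw) // s + 1
    Z = np.zeros((C_out, H_out, W_out))
    for j in range(H_out):
        for k in range(W_out):
            patch = V[:, j * s : j * s + kh, k * s : k * s + kw]
            Z[:, j, k] = np.einsum("lmn,ilmn->i", patch, K)
    return Z


def grad_kernel(G, V, s, kh, kw):
    """g(G, V, s)[i,j,k,l] = sum_{m,n} G[i,m,n] * V[j, m*s+k, n*s+l]  (0-based Eq. 9.11)."""
    C_out, H_out, W_out = G.shape
    g = np.zeros((C_out, V.shape[0], kh, kw))
    for k in range(kh):
        for l in range(kw):
            # Strided slice picks V[:, m*s + k, n*s + l] for m < H_out, n < W_out.
            V_sub = V[:, k : k + s * (H_out - 1) + 1 : s, l : l + s * (W_out - 1) + 1 : s]
            g[:, :, k, l] = np.einsum("imn,jmn->ij", G, V_sub)
    return g


def loss(K, V, G, s):
    # Hypothetical linear loss J = sum(G * Z), chosen so that dJ/dZ == G exactly.
    return np.sum(G * conv_strided(K, V, s))


rng = np.random.default_rng(0)
s, kh, kw = 2, 3, 3
V = rng.standard_normal((2, 7, 7))
K = rng.standard_normal((4, 2, kh, kw))
G = rng.standard_normal(conv_strided(K, V, s).shape)

# Central differences; J is linear in K, so they are exact up to rounding error.
eps = 1e-3
num = np.zeros_like(K)
for idx in np.ndindex(K.shape):
    E = np.zeros_like(K)
    E[idx] = eps
    num[idx] = (loss(K + E, V, G, s) - loss(K - E, V, G, s)) / (2 * eps)

print(np.allclose(grad_kernel(G, V, s, kh, kw), num))  # True
```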
$$
\begin{aligned}
h(\pmb{K},\pmb{G},s)_{i,j,k}&=\frac{\partial}{\partial V_{i,j,k}}J(\pmb{V},\pmb{K})\\
&=\sum_{\alpha,\beta,\gamma}\frac{\partial Z_{\alpha,\beta,\gamma}}{\partial V_{i,j,k}}\frac{\partial}{\partial Z_{\alpha,\beta,\gamma}}J(\pmb{V},\pmb{K})\\
&=\sum_{\alpha,\beta,\gamma}\frac{\partial Z_{\alpha,\beta,\gamma}}{\partial V_{i,j,k}}G_{\alpha,\beta,\gamma}\\
&=\sum_{\alpha,\beta,\gamma}\frac{\partial c(\pmb{K},\pmb{V},s)_{\alpha,\beta,\gamma}}{\partial V_{i,j,k}}G_{\alpha,\beta,\gamma}\\
&=\sum_{\alpha,\beta,\gamma}\left\{\frac{\partial}{\partial V_{i,j,k}}\sum_{\eta,\xi,\varphi}\left[V_{\eta,(\beta-1)\times s+\xi,(\gamma-1)\times s+\varphi}K_{\alpha,\eta,\xi,\varphi}\right]\right\}G_{\alpha,\beta,\gamma}\\
&=\sum_{\alpha,\beta,\gamma}\left\{\sum_{\eta,\xi,\varphi}\left[\frac{\partial V_{\eta,(\beta-1)\times s+\xi,(\gamma-1)\times s+\varphi}}{\partial V_{i,j,k}}K_{\alpha,\eta,\xi,\varphi}\right]\right\}G_{\alpha,\beta,\gamma}\\
&=\sum_{\alpha,\beta,\gamma}K_{\alpha,\eta,\xi,\varphi}G_{\alpha,\beta,\gamma}\qquad\text{where }\begin{cases}\eta=i\\ (\beta-1)\times s+\xi=j\\ (\gamma-1)\times s+\varphi=k\end{cases}\\
&=\sum_{\alpha,\beta,\gamma}K_{\alpha,i,\xi,\varphi}G_{\alpha,\beta,\gamma}\qquad\text{where }\begin{cases}(\beta-1)\times s+\xi=j\\ (\gamma-1)\times s+\varphi=k\end{cases}\\
&=\sum_{\substack{\beta,\xi\\ \text{s.t. }(\beta-1)\times s+\xi=j}}\ \sum_{\substack{\gamma,\varphi\\ \text{s.t. }(\gamma-1)\times s+\varphi=k}}\ \sum_{\alpha}K_{\alpha,i,\xi,\varphi}G_{\alpha,\beta,\gamma}\\
&=\sum_{\substack{l,m\\ \text{s.t. }(l-1)\times s+m=j}}\ \sum_{\substack{n,p\\ \text{s.t. }(n-1)\times s+p=k}}\ \sum_{q}K_{q,i,m,p}G_{q,l,n}\qquad\Leftarrow\begin{cases}q\gets\alpha\\ l\gets\beta\\ n\gets\gamma\\ m\gets\xi\\ p\gets\varphi\end{cases}
\end{aligned}
$$
The derivative $\frac{\partial V_{\eta,(\beta-1)\times s+\xi,(\gamma-1)\times s+\varphi}}{\partial V_{i,j,k}}$ is 1 exactly when the three conditions listed after "where" hold, and 0 otherwise. For a given $(\beta,\gamma)$ there may be no admissible $(\xi,\varphi)$ within the kernel, in which case that output position contributes nothing to $h_{i,j,k}$.