Matrix Derivative Derivation

1. Define the product of two matrices: $A \cdot B = C$.
2. Consider the loss function (with a scalar target $p$): $L = \sum_{i}^{m}\sum_{j}^{n}(C_{ij}-p)^2$
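The loss in step 2 can be sketched numerically (the matrix values and target $p$ below are made-up examples):

```python
import numpy as np

# Made-up example: a 2x2 matrix C and a scalar target p
C = np.array([[1.0, 2.0],
              [3.0, 4.0]])
p = 2.5

# Loss = sum over all entries of (C_ij - p)^2
loss = np.sum((C - p) ** 2)
print(loss)  # 5.0
```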
3. Consider the derivative of the loss with respect to each entry of $C$: $\triangledown C_{ij} = \frac{\partial L}{\partial C_{ij}}$
4. Let $A$, $B$, and $C$ all be $2 \times 2$ matrices, and let $G$ be the derivative of the loss with respect to $C$:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad B = \begin{bmatrix} e & f \\ g & h \end{bmatrix} \quad C = \begin{bmatrix} i & j \\ k & l \end{bmatrix} \quad G = \begin{bmatrix} \frac{\partial L}{\partial i} & \frac{\partial L}{\partial j} \\ \frac{\partial L}{\partial k} & \frac{\partial L}{\partial l} \end{bmatrix} = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$$
5. Expanding $A \cdot B = C$ entrywise gives:

$$C = \begin{bmatrix} i = ae + bg & j = af + bh \\ k = ce + dg & l = cf + dh \end{bmatrix}$$
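The entrywise expansion in step 5 can be checked against NumPy's matrix product (the sample values are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # [[a, b], [c, d]]
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])   # [[e, f], [g, h]]
(a, b), (c, d) = A
(e, f), (g, h) = B

# Entrywise expansion from step 5
C_manual = np.array([[a*e + b*g, a*f + b*h],
                     [c*e + d*g, c*f + d*h]])

# Must agree with the built-in matrix product
assert np.allclose(C_manual, A @ B)
```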
6. The derivative of the loss with respect to each entry of $A$ is

$$\triangledown A_{ij} = \frac{\partial L}{\partial A_{ij}}$$

By the chain rule, each entry of $A$ only affects the entries of $C$ in the same row, so:

$$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial i} \cdot \frac{\partial i}{\partial a} + \frac{\partial L}{\partial j} \cdot \frac{\partial j}{\partial a}$$

$$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial i} \cdot \frac{\partial i}{\partial b} + \frac{\partial L}{\partial j} \cdot \frac{\partial j}{\partial b}$$

$$\frac{\partial L}{\partial c} = \frac{\partial L}{\partial k} \cdot \frac{\partial k}{\partial c} + \frac{\partial L}{\partial l} \cdot \frac{\partial l}{\partial c}$$

$$\frac{\partial L}{\partial d} = \frac{\partial L}{\partial k} \cdot \frac{\partial k}{\partial d} + \frac{\partial L}{\partial l} \cdot \frac{\partial l}{\partial d}$$
That is:

$$\frac{\partial L}{\partial a} = we + xf \qquad \frac{\partial L}{\partial b} = wg + xh \qquad \frac{\partial L}{\partial c} = ye + zf \qquad \frac{\partial L}{\partial d} = yg + zh$$
7. So the derivative of the loss with respect to $A$ is:

$$\triangledown A = \begin{bmatrix} we + xf & wg + xh \\ ye + zf & yg + zh \end{bmatrix} = \begin{bmatrix} w & x \\ y & z \end{bmatrix} \begin{bmatrix} e & g \\ f & h \end{bmatrix}$$

$$\triangledown A = G \cdot B^T$$
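The rule $\triangledown A = G \cdot B^T$ can be sanity-checked numerically, assuming the squared-error loss from step 2 (so $G = 2(C - p)$ entrywise); the sample values are arbitrary:

```python
import numpy as np

def loss(C, p):
    # Squared-error loss from step 2
    return np.sum((C - p) ** 2)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
p = 2.5

# For this loss, G = dL/dC = 2 * (C - p), taken entrywise
C = A @ B
G = 2.0 * (C - p)

# Analytic gradient from the derivation: grad_A = G @ B^T
grad_A = G @ B.T

# Numerical check by central finite differences
eps = 1e-6
grad_num = np.zeros_like(A)
for i in range(2):
    for j in range(2):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        grad_num[i, j] = (loss(Ap @ B, p) - loss(Am @ B, p)) / (2 * eps)

assert np.allclose(grad_A, grad_num, atol=1e-4)
```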
8. Similarly, the derivative of the loss with respect to $B$ is:

$$\frac{\partial L}{\partial e} = wa + yc \qquad \frac{\partial L}{\partial f} = xa + zc \qquad \frac{\partial L}{\partial g} = wb + yd \qquad \frac{\partial L}{\partial h} = xb + zd$$

$$\triangledown B = \begin{bmatrix} wa + yc & xa + zc \\ wb + yd & xb + zd \end{bmatrix} = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} w & x \\ y & z \end{bmatrix}$$

That is: $\triangledown B = A^T \cdot G$
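The same finite-difference check works for $\triangledown B = A^T \cdot G$, again assuming the squared-error loss from step 2 and arbitrary sample values:

```python
import numpy as np

def loss(C, p):
    # Squared-error loss from step 2
    return np.sum((C - p) ** 2)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
p = 2.5

G = 2.0 * (A @ B - p)        # dL/dC for this loss
grad_B = A.T @ G             # analytic gradient from the derivation

# Numerical check by central finite differences
eps = 1e-6
grad_num = np.zeros_like(B)
for i in range(2):
    for j in range(2):
        Bp, Bm = B.copy(), B.copy()
        Bp[i, j] += eps
        Bm[i, j] -= eps
        grad_num[i, j] = (loss(A @ Bp, p) - loss(A @ Bm, p)) / (2 * eps)

assert np.allclose(grad_B, grad_num, atol=1e-4)
```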
Note: these two rules ($\triangledown A = G \cdot B^T$ and $\triangledown B = A^T \cdot G$) are exactly what the backward pass of a BP (backpropagation) neural network uses to propagate gradients through a matrix multiplication.
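As a sketch of how this applies in practice, consider a hypothetical linear layer $Y = XW$ with an upstream gradient $G = \partial L / \partial Y$; the two rules give its backward pass directly (the names and shapes below are purely illustrative):

```python
import numpy as np

# Hypothetical linear layer Y = X @ W. Given the upstream gradient
# G = dL/dY, the derivation above yields:
#   dL/dX = G @ W^T    and    dL/dW = X^T @ G
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # batch of 4 inputs, 3 features
W = rng.standard_normal((3, 2))   # weight matrix
G = rng.standard_normal((4, 2))   # upstream gradient dL/dY

grad_X = G @ W.T    # shape (4, 3), same as X
grad_W = X.T @ G    # shape (3, 2), same as W
```

Note how each gradient automatically has the same shape as the matrix it corresponds to, which is a quick way to remember which side gets transposed.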
