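Matrix/vector differentiation is just a convenient notation: one symbol stores a whole collection of partial derivatives. Concretely, with $A(t)$ a matrix and $t$ a scalar: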
$$A(t)=\begin{bmatrix} a_{11}(t) & a_{12}(t) & a_{13}(t) \\ a_{21}(t) & a_{22}(t) & a_{23}(t) \\ a_{31}(t) & a_{32}(t) & a_{33}(t) \end{bmatrix}$$
$$\frac{\partial A(t)}{\partial t}=\begin{bmatrix} a'_{11}(t) & a'_{12}(t) & a'_{13}(t) \\ a'_{21}(t) & a'_{22}(t) & a'_{23}(t) \\ a'_{31}(t) & a'_{32}(t) & a'_{33}(t) \end{bmatrix} \tag{1}$$
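Equation (1) differentiates every entry independently. The sketch below checks this numerically. Its assumptions: a hypothetical 2x2 matrix $A(t)$ (not from the post), entry derivatives worked out by hand, and a central finite difference with step `h=1e-6` standing in for the exact derivative. It uses only the Python standard library.

```python
import math


def A(t):
    # Hypothetical 2x2 matrix whose entries a_ij(t) are ordinary scalar functions of t.
    return [[t ** 2, math.sin(t)],
            [math.exp(t), 3.0 * t]]


def dA_exact(t):
    # Hand-derived entry-wise derivatives a'_ij(t), arranged as in equation (1).
    return [[2.0 * t, math.cos(t)],
            [math.exp(t), 3.0]]


def d_dt(F, t, h=1e-6):
    """Central difference applied to every entry of the matrix F(t) separately."""
    Fp, Fm = F(t + h), F(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(Fp, Fm)]


t0 = 0.7
numeric = d_dt(A, t0)
exact = dA_exact(t0)
max_err = max(abs(n - e)
              for row_n, row_e in zip(numeric, exact)
              for n, e in zip(row_n, row_e))
print(max_err)  # should be close to zero
```

The result keeps the 2x2 shape of $A(t)$. This is the point of (1): the notation only collects the element-wise derivatives.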
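Intuitively, matrix/vector differentiation just gathers many separate derivatives and writes them all with one symbol. There is nothing complicated about it, so I filed it under calculus. This post introduces this way of taking derivatives. Note that the matrix derivative (1) is not the same thing as the differential equation (2) below ($A$ is a matrix, $u$ is a vector):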
$$\frac{\partial u}{\partial t}=Au \tag{2}$$
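In matrix theory, (2) is used to solve differential equations. For a constant matrix $A$, the solution has the form $u=e^{At}u(0)$. The two are easy to tell apart:

- (1) is a derivative computation, and $A$ itself is what gets differentiated.
- (2) is a differential equation, and $A$ is only a coefficient matrix.

I will cover (2) in a separate post in a series on matrices.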
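To make the contrast with (1) concrete, here is a minimal sketch that checks numerically that $u(t)=e^{At}u(0)$ satisfies $\frac{\partial u}{\partial t}=Au$. It makes three assumptions:

- The coefficient matrix `A` and initial value `u0` are hypothetical.
- $e^{At}$ is approximated by a truncated Taylor series, which is only adequate for small, well-scaled matrices. This is not how production libraries compute the matrix exponential.
- The time derivative is a central finite difference.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]


def expm_series(M, terms=40):
    """Truncated Taylor series sum_k M^k / k!; only adequate for small, well-scaled M."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term M^k / k!, starting from the identity
    for k in range(1, terms):
        term = matmul(term, M)
        term = [[x / k for x in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical constant coefficient matrix and initial value u(0).
A = [[0.0, 1.0],
     [-2.0, -0.3]]
u0 = [1.0, 0.0]


def u(t):
    At = [[a * t for a in row] for row in A]
    return matvec(expm_series(At), u0)


t0, h = 0.5, 1e-5
du_dt = [(p - m) / (2 * h) for p, m in zip(u(t0 + h), u(t0 - h))]
Au = matvec(A, u(t0))
residual = max(abs(x - y) for x, y in zip(du_dt, Au))
print(residual)  # should be close to zero if u(t) solves (2)
```

In this setting $A$ never gets differentiated. It only multiplies $u$, which is exactly the distinction drawn above.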
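We classify matrix/vector derivatives by their independent variable and dependent variable. Each can be a scalar, a vector, or a matrix, which gives 9 cases in total (see the table below). Three of these cases are more complicated and are not discussed here, because their results must be represented as tensors (higher-dimensional arrays):

- the two cases that pair a vector with a matrix;
- the matrix-by-matrix case.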
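Independent \ Dependent | Scalar | Vector | Matrix |
---|---|---|---|
Scalar | Simple | Like (1) | Like (1) |
Vector | Like (1) | Main focus | Complex (tensor) |
Matrix | Like (1) | Complex (tensor) | Complex (tensor) |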
Vector with respect to a scalar, for a row vector:
$$u(t)=[u_{1}(t),\ u_{2}(t),\ u_{3}(t)]$$
$$\frac{\partial u(t)}{\partial t}=[u'_{1}(t),\ u'_{2}(t),\ u'_{3}(t)]$$
or, for a column vector:
$$u(t)=[u_{1}(t),\ u_{2}(t),\ u_{3}(t)]^{T}$$
$$\frac{\partial u(t)}{\partial t}=[u'_{1}(t),\ u'_{2}(t),\ u'_{3}(t)]^{T}$$
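A small sketch of the point above: differentiating entry by entry keeps the shape. A row vector gives a row vector and a column vector gives a column vector. The vector $u(t)=[t,\ t^2,\ \sin t]$ is hypothetical, vectors are stored as nested Python lists (1x3 or 3x1), and the derivative is a central finite difference.

```python
import math


def u_row(t):
    # Hypothetical row vector u(t) = [t, t^2, sin t], stored as a 1x3 nested list.
    return [[t, t ** 2, math.sin(t)]]


def u_col(t):
    # The same entries as a 3x1 column, i.e. u(t)^T.
    return [[x] for x in u_row(t)[0]]


def d_dt(F, t, h=1e-6):
    """Entry-wise central difference; the result has the same shape as F(t)."""
    Fp, Fm = F(t + h), F(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(Fp, Fm)]


t0 = 1.2
print(d_dt(u_row, t0))  # 1x3, approximately [[1, 2*t0, cos(t0)]]
print(d_dt(u_col, t0))  # 3x1, the same numbers arranged as a column
```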
Matrix with respect to a scalar (the same as (1)):
$$A(t)=\begin{bmatrix} a_{11}(t) & a_{12}(t) & a_{13}(t) \\ a_{21}(t) & a_{22}(t) & a_{23}(t) \\ a_{31}(t) & a_{32}(t) & a_{33}(t) \end{bmatrix}$$
$$\frac{\partial A(t)}{\partial t}=\begin{bmatrix} a'_{11}(t) & a'_{12}(t) & a'_{13}(t) \\ a'_{21}(t) & a'_{22}(t) & a'_{23}(t) \\ a'_{31}(t) & a'_{32}(t) & a'_{33}(t) \end{bmatrix}$$
Scalar with respect to a vector, for a row vector:
$$u(t)=[u_{1}(t),\ u_{2}(t),\ u_{3}(t)]$$
$$\frac{\partial t}{\partial u(t)}=\left[\frac{\partial t}{\partial u_{1}(t)},\ \frac{\partial t}{\partial u_{2}(t)},\ \frac{\partial t}{\partial u_{3}(t)}\right]$$
or, for a column vector:
$$u(t)=[u_{1}(t),\ u_{2}(t),\ u_{3}(t)]^{T}$$
$$\frac{\partial t}{\partial u(t)}=\left[\frac{\partial t}{\partial u_{1}(t)},\ \frac{\partial t}{\partial u_{2}(t)},\ \frac{\partial t}{\partial u_{3}(t)}\right]^{T}$$
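The scalar-by-vector derivative lists one partial derivative per component, in the same order as the vector. As a hedged illustration, the sketch below uses a hypothetical scalar function $f(u)=u_1^2+3u_2+u_1u_3$ in place of the scalar. It compares a central-difference gradient against the hand-derived $[2u_1+u_3,\ 3,\ u_1]$.

```python
def f(u):
    # Hypothetical scalar function of a 3-vector.
    return u[0] ** 2 + 3.0 * u[1] + u[0] * u[2]


def grad_exact(u):
    # Hand-derived partials [df/du1, df/du2, df/du3].
    return [2.0 * u[0] + u[2], 3.0, u[0]]


def grad_numeric(f, u, h=1e-6):
    """Partial derivative w.r.t. each component, collected in the same order as u."""
    g = []
    for i in range(len(u)):
        up, um = list(u), list(u)
        up[i] += h
        um[i] -= h
        g.append((f(up) - f(um)) / (2 * h))
    return g


u0 = [1.0, -2.0, 0.5]
print(grad_numeric(f, u0))
print(grad_exact(u0))  # [2.5, 3.0, 1.0]
```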
Scalar with respect to a matrix:
$$A(t)=\begin{bmatrix} a_{11}(t) & a_{12}(t) & a_{13}(t) \\ a_{21}(t) & a_{22}(t) & a_{23}(t) \\ a_{31}(t) & a_{32}(t) & a_{33}(t) \end{bmatrix}$$
$$\frac{\partial t}{\partial A(t)}=\begin{bmatrix} \frac{\partial t}{\partial a_{11}(t)} & \frac{\partial t}{\partial a_{12}(t)} & \frac{\partial t}{\partial a_{13}(t)} \\ \frac{\partial t}{\partial a_{21}(t)} & \frac{\partial t}{\partial a_{22}(t)} & \frac{\partial t}{\partial a_{23}(t)} \\ \frac{\partial t}{\partial a_{31}(t)} & \frac{\partial t}{\partial a_{32}(t)} & \frac{\partial t}{\partial a_{33}(t)} \end{bmatrix}$$
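The scalar-by-matrix case works the same way. Entry $(i, j)$ of the result is the partial derivative with respect to $a_{ij}$, so the result has the shape of the matrix. The sketch below assumes a hypothetical scalar function: the sum of squared entries of $A$, whose gradient should be $2A$. It uses central differences, which are exact up to rounding for a quadratic.

```python
def f(A):
    # Hypothetical scalar function: sum of squared entries (squared Frobenius norm).
    return sum(x * x for row in A for x in row)


def grad_matrix(f, A, h=1e-6):
    """Entry (i, j) of the result is the partial derivative of f w.r.t. a_ij."""
    G = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            Ap = [r[:] for r in A]
            Am = [r[:] for r in A]
            Ap[i][j] += h
            Am[i][j] -= h
            row.append((f(Ap) - f(Am)) / (2 * h))
        G.append(row)
    return G


A0 = [[1.0, 2.0, 0.0],
      [-1.0, 0.5, 3.0],
      [0.0, 4.0, -2.0]]
G = grad_matrix(f, A0)
print(G)  # for this f the exact gradient is 2 * A0, with the same 3x3 shape as A0
```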
For the vector-by-vector case, we first introduce numerator layout and denominator layout.
Numerator layout: $\frac{\partial y}{\partial x}=\frac{\text{column vector}}{\text{row vector}}$. The entry in row $i$, column $j$ is $\frac{\partial y_{i}}{\partial x_{j}}$ (this is the Jacobian matrix).
$$y=[y_{1},\ y_{2},\ y_{3}]^{T}$$
$$x=[x_{1},\ x_{2},\ x_{3}]$$
$$\frac{\partial y}{\partial x}=\begin{bmatrix} \frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{1}}{\partial x_{3}} \\ \frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{3}} \\ \frac{\partial y_{3}}{\partial x_{1}} & \frac{\partial y_{3}}{\partial x_{2}} & \frac{\partial y_{3}}{\partial x_{3}} \end{bmatrix}$$
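To see the numerator layout concretely, the sketch below builds the matrix with one row per component of $y$ and one column per component of $x$. It assumes a hypothetical function $y(x)=[x_1x_2,\ x_2+x_3,\ x_1^2]$ and central finite differences. The exact matrix at $x=[2,3,-1]$ is worked out by hand in the final comment.

```python
def y(x):
    # Hypothetical vector function R^3 -> R^3: [x1*x2, x2 + x3, x1^2].
    return [x[0] * x[1], x[1] + x[2], x[0] ** 2]


def jacobian_numerator(f, x, h=1e-6):
    """Numerator layout: J[i][j] = d y_i / d x_j (one row per component of y)."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J


x0 = [2.0, 3.0, -1.0]
J = jacobian_numerator(y, x0)
# Exact numerator-layout matrix at x0:
# [[x2, x1, 0], [0, 1, 1], [2*x1, 0, 0]] = [[3, 2, 0], [0, 1, 1], [4, 0, 0]]
```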
Denominator layout: $\frac{\partial y}{\partial x}=\frac{\text{row vector}}{\text{column vector}}$. The entry in row $i$, column $j$ is $\frac{\partial y_{j}}{\partial x_{i}}$, so the result is the transpose of the numerator-layout matrix.
$$y=[y_{1},\ y_{2},\ y_{3}]$$
$$x=[x_{1},\ x_{2},\ x_{3}]^{T}$$
$$\frac{\partial y}{\partial x}=\begin{bmatrix} \frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{3}}{\partial x_{1}} \\ \frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{2}} & \frac{\partial y_{3}}{\partial x_{2}} \\ \frac{\partial y_{1}}{\partial x_{3}} & \frac{\partial y_{2}}{\partial x_{3}} & \frac{\partial y_{3}}{\partial x_{3}} \end{bmatrix}$$
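A minimal sketch of denominator layout, with one row per component of $x$ and one column per component of $y$. It assumes a hypothetical function $y(x)=[x_1x_2,\ x_2+x_3,\ x_1^2]$, whose numerator-layout matrix at $x=[2,3,-1]$ is $[[3,2,0],[0,1,1],[4,0,0]]$ by hand calculation. The denominator-layout result should be its transpose.

```python
def y(x):
    # Hypothetical vector function R^3 -> R^3: [x1*x2, x2 + x3, x1^2].
    return [x[0] * x[1], x[1] + x[2], x[0] ** 2]


def jacobian_denominator(f, x, h=1e-6):
    """Denominator layout: D[i][j] = d y_j / d x_i (one row per component of x)."""
    D = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        yp, ym = f(xp), f(xm)
        D.append([(a - b) / (2 * h) for a, b in zip(yp, ym)])
    return D


def transpose(M):
    return [list(col) for col in zip(*M)]


x0 = [2.0, 3.0, -1.0]
D = jacobian_denominator(y, x0)
numerator = [[3.0, 2.0, 0.0], [0.0, 1.0, 1.0], [4.0, 0.0, 0.0]]  # hand-computed
print(D)  # should match transpose(numerator) = [[3, 0, 4], [2, 1, 0], [0, 1, 0]]
```

Only the arrangement differs between the two layouts. The partial derivatives themselves are identical.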
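Even within a single paper or book the two layouts are sometimes mixed, so check the context. Zhou Zhihua's 《机器学习》 (*Machine Learning*) mostly uses denominator layout.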
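Some authors argue that matrix-by-scalar and vector-by-scalar derivatives also need a numerator or denominator layout. To me it feels odd that, under numerator layout, differentiating a row vector with respect to a scalar would give a column vector. So I apply layouts only to vector-by-vector derivatives, and I will revise this if I run into a problem with it.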
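Some common identities, in denominator layout. Here $a$ and $x$ are vectors, $a$ does not depend on $x$, and $A$, $B$ are matrices: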
$$\frac{\partial x^{T}x}{\partial x}=2x$$
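A quick numerical sanity check of $\frac{\partial x^{T}x}{\partial x}=2x$. It is a sketch using central differences at a hypothetical point `x0`. For a scalar function, the denominator-layout result has the same (column) shape as $x$, stored here as a flat list.

```python
def xTx(x):
    # x^T x: the sum of squared components.
    return sum(v * v for v in x)


def grad(f, x, h=1e-6):
    """Central-difference gradient; entry i is df/dx_i (same shape as x)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


x0 = [1.5, -2.0, 0.25, 4.0]
g = grad(xTx, x0)
print(g)  # should match 2 * x0 = [3.0, -4.0, 0.5, 8.0]
```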
$$\frac{\partial x^{T}a}{\partial x}=\frac{\partial a^{T}x}{\partial x}=a$$
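The identity holds because $x^{T}a$ and $a^{T}x$ are the same scalar $\sum_i a_i x_i$, and its partial derivative with respect to $x_i$ is $a_i$. The sketch below checks both forms numerically with a hypothetical constant vector `a` and point `x0`.

```python
a = [2.0, -1.0, 0.5]  # hypothetical constant vector, independent of x


def xTa(x):
    return sum(xi * ai for xi, ai in zip(x, a))


def aTx(x):
    return sum(ai * xi for ai, xi in zip(a, x))


def grad(f, x, h=1e-6):
    """Central-difference gradient; entry i is df/dx_i (same shape as x)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


x0 = [0.3, 7.0, -1.2]
g1 = grad(xTa, x0)
g2 = grad(aTx, x0)
print(g1, g2)  # both should match a
```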
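$$\frac{\partial (AB)}{\partial x}=\frac{\partial A}{\partial x}B+A\frac{\partial B}{\partial x}$$
Read this product rule with $x$ a scalar, differentiating entry-wise as in (1). If $x$ is a vector, the derivative of a matrix is a tensor, which is outside the scope of this post. The order of the factors matters, because matrix multiplication is not commutative.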
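A hedged numerical check of the product rule for a scalar variable $x$. It makes three assumptions:

- The 2x2 matrices $A(x)$ and $B(x)$ are hypothetical, with their derivatives worked out by hand.
- The left side, $\frac{\partial (AB)}{\partial x}$, is computed by central differences.
- The right side is formed as $A'B + AB'$, keeping the factor order.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


# Hypothetical matrices depending on a scalar x, with hand-derived entry-wise derivatives.
def A(x):
    return [[x, x ** 2], [1.0, math.sin(x)]]


def dA(x):
    return [[1.0, 2.0 * x], [0.0, math.cos(x)]]


def B(x):
    return [[math.exp(x), 0.0], [x, 3.0]]


def dB(x):
    return [[math.exp(x), 0.0], [1.0, 0.0]]


def d_dx(F, x, h=1e-6):
    """Entry-wise central difference of a matrix-valued function F."""
    Fp, Fm = F(x + h), F(x - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(Fp, Fm)]


x0 = 0.4
lhs = d_dx(lambda x: matmul(A(x), B(x)), x0)                # d(AB)/dx, numerically
rhs = madd(matmul(dA(x0), B(x0)), matmul(A(x0), dB(x0)))    # A'B + AB'
max_err = max(abs(l - r) for rl, rr in zip(lhs, rhs) for l, r in zip(rl, rr))
print(max_err)  # should be close to zero
```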
References:
Zhou Zhihua, 《机器学习》 (*Machine Learning*)
https://blog.csdn.net/uncle_gy/article/details/78879131
https://en.wikipedia.org/wiki/Matrix_calculus