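Matrix differentiation confused me for a long time. After reading several blog posts on the topic I finally reached some understanding, which I try to lay out here.
Let's start from the very basics.
In secondary school, the functions we met most often were functions of a single variable, such as: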
$$f_1(x) = x \\ f_2(x) = a x^2 + b$$
Differentiating them is straightforward:
$$\frac{\text{d}f_1(x)}{\text{d}x} = 1 \\ \frac{\text{d}f_2(x)}{\text{d}x} = 2ax$$
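These two results can be checked numerically. The snippet below is a minimal sketch using a central finite difference; the helper name `derivative` and the constants `a = 3`, `b = 5`, `x0 = 2` are arbitrary choices made for illustration.

```python
def derivative(f, x, h=1e-6):
    """Central-difference approximation of df/dx at the point x."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, b = 3.0, 5.0  # illustrative constants, not taken from the article


def f1(x):
    return x


def f2(x):
    return a * x ** 2 + b


x0 = 2.0
d_f1 = derivative(f1, x0)  # analytic value: 1
d_f2 = derivative(f2, x0)  # analytic value: 2 * a * x0
print(d_f1, d_f2)
```

Up to floating-point error, the two printed values should agree with $1$ and $2ax_0$.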
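Once we have learned linear algebra, however, the same two functions can be written in matrix form, treating each scalar as a $1\times 1$ matrix: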
$$f_1(x) = \begin{bmatrix} x \end{bmatrix} = Ax = \begin{bmatrix} 1 \end{bmatrix} x$$
$$f_2(x) = \begin{bmatrix} a x^2 + b \end{bmatrix} = x^T A x + b = x^T \begin{bmatrix} a \end{bmatrix} x + \begin{bmatrix} b \end{bmatrix}$$
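So how do we differentiate expressions written this way?
From linear algebra we know that $x$ and $f(x)$ can each be one of several kinds of object:
$x$: a scalar, a vector, or a matrix
$f(x)$: a scalar, a vector, or a matrix
In the two functions at the top of the article, both $x$ and $f(x)$ are scalars, so the differentiation rules from secondary school apply directly.
When $x$ or $f(x)$ is a vector or a matrix, however, those rules alone no longer tell us how to differentiate $f(x)$.
A quick search turns up some commonly used matrix derivative formulas (stated in the denominator layout, which is explained further down):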
$$\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}} = \mathbf{A^T} \\ \frac{\partial \mathbf{x^T A x}}{\partial \mathbf{x}} = \mathbf{(A + A^T)x} \\ \frac{\partial \mathbf{x^T x}}{\partial \mathbf{x}} = 2\mathbf{x}$$
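The principle behind them is actually simple:
Suppose $y$ is a scalar, a vector, or a matrix, and $x$ is likewise a scalar, a vector, or a matrix. Differentiating $y$ with respect to $x$ means differentiating every element of $y$ with respect to every element of $x$.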
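As a small illustration of this rule (the function is a made-up example, not one from the formulas above), take $\mathbf x = (x_1, x_2)$ and $\mathbf y = (y_1, y_2) = (x_1 x_2,\; x_1 + x_2)$. Differentiating every element of $\mathbf y$ with respect to every element of $\mathbf x$ gives four partial derivatives:

$$\frac{\partial y_1}{\partial x_1} = x_2, \quad \frac{\partial y_1}{\partial x_2} = x_1, \quad \frac{\partial y_2}{\partial x_1} = 1, \quad \frac{\partial y_2}{\partial x_2} = 1$$

The rule fixes which entries exist; how they are arranged into a matrix is a separate matter of convention.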
To illustrate this rule, consider the common types of matrix derivative (each row is a type of $x$, each column a type of $y$):
Type of $x$ (rows) / type of $y$ (columns) | Scalar | Vector | Matrix |
---|---|---|---|
Scalar | $\frac{\partial y}{\partial x}$ | $\frac{\partial \mathbf{y}}{\partial x}$ | $\frac{\partial \mathbf{Y}}{\partial x}$ |
Vector | $\frac{\partial y}{\partial \mathbf{x}}$ | $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | |
Matrix | $\frac{\partial y}{\partial \mathbf{X}}$ | | |
The two examples at the start of the article belong to the scalar-by-scalar case:
First example:
$$f_1(x) = \begin{bmatrix} x \end{bmatrix}$$
$f_1(x)$ is a scalar whose matrix form has the single element $x$. Differentiating $f_1(x)$ with respect to the scalar $x$ gives:
$$\frac{\partial f_1(x)}{\partial x} = \begin{bmatrix} \frac{\partial x}{\partial x} \end{bmatrix} = \begin{bmatrix} 1 \end{bmatrix}$$
Second example:
$$f_2(x) = \begin{bmatrix} ax^2 + b \end{bmatrix}$$
$f_2(x)$ is a scalar whose matrix form has the single element $ax^2 + b$. Differentiating $f_2(x)$ with respect to the scalar $x$ gives:
$$\frac{\partial f_2(x)}{\partial x} = \begin{bmatrix} \frac{\partial (ax^2 + b)}{\partial x} \end{bmatrix} = \begin{bmatrix} 2ax \end{bmatrix}$$
If $\mathbf{y}$ and $\mathbf{x}$ are not scalars, there are two conventions for arranging the result: the numerator layout and the denominator layout. The cases below are written in the numerator layout.
Let the vectors be $\mathbf x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ and $\mathbf y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$, and let the matrix be $\mathbf X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix}$.
Scalar numerator, vector denominator. The numerator $y$ is a scalar and the denominator $\mathbf x$ is a vector with $n$ elements. Differentiating every element of $y$ with respect to every element of $\mathbf x$ gives $1\times n$ entries, arranged as a row vector:
$$\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \dots & \frac{\partial y}{\partial x_n} \end{bmatrix}$$
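For instance, with $n = 2$ and the illustrative function $y = x_1^2 + 3 x_1 x_2$ (chosen here only as an example), the row vector is:

$$\frac{\partial y}{\partial \mathbf x} = \begin{bmatrix} 2x_1 + 3x_2 & 3x_1 \end{bmatrix}$$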
Vector numerator, scalar denominator. The numerator $\mathbf y$ is a vector with $m$ elements and the denominator $x$ is a scalar. Differentiating every element of $\mathbf y$ with respect to $x$ gives $m\times 1$ entries, arranged as a column vector:
$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}$$
Vector numerator, vector denominator. The numerator $\mathbf y$ is a vector with $m$ elements and the denominator $\mathbf x$ is a vector with $n$ elements. Differentiating every element of $\mathbf y$ with respect to every element of $\mathbf x$ gives $m\times n$ entries:
$$\frac{\partial \mathbf{y}}{\partial \mathbf x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \dots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \dots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \dots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$
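The $m\times n$ arrangement can be reproduced numerically. The following is a minimal sketch that fills a numerator-layout matrix entry by entry with central finite differences; the helper `jacobian` and the map `f` from $\mathbb{R}^2$ to $\mathbb{R}^3$ are hypothetical names introduced only for this illustration.

```python
def jacobian(f, x, h=1e-6):
    """Numerator layout: entry [i][j] approximates d y_i / d x_j."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)
    return J


def f(x):
    # Illustrative map with n = 2 inputs and m = 3 outputs
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


J = jacobian(f, [2.0, 3.0])
for row in J:
    print(row)
```

Each row of `J` belongs to one $y_i$ and each column to one $x_j$, so the result has 3 rows and 2 columns; analytically its rows are $(x_2, x_1)$, $(1, 1)$ and $(2x_1, 0)$.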
Scalar numerator, matrix denominator. The numerator $y$ is a scalar and the denominator $\mathbf X$ is an $m\times n$ matrix. Differentiating $y$ with respect to every element of $\mathbf X$ gives $1\times m\times n$ entries. In the numerator layout they are arranged as an $n\times m$ matrix, the same shape as $\mathbf X^T$:
$$\frac{\partial y}{\partial \mathbf X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \dots & \frac{\partial y}{\partial x_{m1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \dots & \frac{\partial y}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1n}} & \frac{\partial y}{\partial x_{2n}} & \dots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix}$$
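To make the shapes concrete, here is a rough sketch that differentiates a scalar with respect to each entry of a $2\times 3$ matrix. The function `y` (sum of squared entries) and the helper `scalar_by_matrix` are assumptions made for illustration. The helper returns the derivative in the shape of $\mathbf X$ (denominator layout), and transposing it gives the numerator-layout arrangement.

```python
def scalar_by_matrix(f, X, h=1e-6):
    """Return G with G[i][j] approximating d y / d x_ij (same shape as X)."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus = [row[:] for row in X]
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


def y(X):
    # Illustrative scalar function: the sum of squared entries of X
    return sum(v * v for row in X for v in row)


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]                        # m = 2, n = 3
G_den = scalar_by_matrix(y, X)               # denominator layout, 2 x 3
G_num = [list(col) for col in zip(*G_den)]   # numerator layout, 3 x 2
print(G_den)
print(G_num)
```

For this choice of `y`, each entry of `G_den` should be close to $2x_{ij}$.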
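Comparing the two conventions shows that the transpose of a numerator-layout derivative is exactly the denominator-layout derivative.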
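A two-dimensional example makes the relationship visible. Taking the illustrative function $\mathbf y = (x_1 x_2,\; x_1 + x_2)$ with $\mathbf x = (x_1, x_2)$ (an assumed example, not part of the formulas above), the two layouts are:

$$\text{numerator layout: } \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix}, \qquad \text{denominator layout: } \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} x_2 & 1 \\ x_1 & 1 \end{bmatrix}$$

Both matrices contain the same four partial derivatives; one is the transpose of the other.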
This part derives the commonly used matrix derivative formulas in the denominator layout:
Derivation of $\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}} = \mathbf{A^T}$
Here $\mathbf A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}$ and $\mathbf x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.
$$\mathbf{Ax} = \begin{bmatrix} a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1n} x_{n} \\ a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2n} x_{n} \\ \vdots \\ a_{m1} x_{1} + a_{m2} x_{2} + \dots + a_{mn} x_{n} \end{bmatrix}_{m\times 1}$$
$\mathbf{Ax}$ is a vector, so we expand it using the denominator-layout rule for a vector numerator and a vector denominator:
$$
\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial (a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1n} x_{n})}{\partial x_1} & \frac{\partial (a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2n} x_{n})}{\partial x_1} & \dots & \frac{\partial (a_{m1} x_{1} + a_{m2} x_{2} + \dots + a_{mn} x_{n})}{\partial x_1} \\
\frac{\partial (a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1n} x_{n})}{\partial x_2} & \frac{\partial (a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2n} x_{n})}{\partial x_2} & \dots & \frac{\partial (a_{m1} x_{1} + a_{m2} x_{2} + \dots + a_{mn} x_{n})}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial (a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1n} x_{n})}{\partial x_n} & \frac{\partial (a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2n} x_{n})}{\partial x_n} & \dots & \frac{\partial (a_{m1} x_{1} + a_{m2} x_{2} + \dots + a_{mn} x_{n})}{\partial x_n}
\end{bmatrix} \\
= \begin{bmatrix} a_{11} & a_{21} & \dots & a_{m1} \\ a_{12} & a_{22} & \dots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \dots & a_{mn} \end{bmatrix} = \mathbf{A^T}
$$
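A numerical check of this result might look like the sketch below. The helper names `matvec` and `denominator_layout_derivative` and the sample values of `A` and `x` are assumptions chosen for illustration. Rows of the computed matrix follow the elements of $\mathbf x$ and columns follow the elements of $\mathbf{Ax}$, matching the denominator layout.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def denominator_layout_derivative(f, x, h=1e-6):
    """Entry [j][i] approximates d y_i / d x_j (rows follow x, columns follow y)."""
    D = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        y_plus, y_minus = f(x_plus), f(x_minus)
        D.append([(p - q) / (2 * h) for p, q in zip(y_plus, y_minus)])
    return D


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]        # m = 2, n = 3 (sample values)
x = [0.5, -1.0, 2.0]

D = denominator_layout_derivative(lambda v: matvec(A, v), x)
A_T = [list(col) for col in zip(*A)]
max_err = max(abs(D[j][i] - A_T[j][i]) for j in range(3) for i in range(2))
print(max_err)
```

The printed error should be tiny, because `D` is an $n\times m$ matrix whose entries match $\mathbf{A^T}$.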
Derivation of $\frac{\partial \mathbf{x^T A x}}{\partial \mathbf{x}} = \mathbf{(A + A^T)x}$
Here $\mathbf A$ is square: $\mathbf A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}$ and $\mathbf x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.
We evaluate $\mathbf{x^TAx}$ from right to left:
$$\mathbf{Ax} = \begin{bmatrix} a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1n} x_{n} \\ a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2n} x_{n} \\ \vdots \\ a_{n1} x_{1} + a_{n2} x_{2} + \dots + a_{nn} x_{n} \end{bmatrix}_{n\times 1}$$
$$
\mathbf{x^TAx} = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}_{1\times n} \begin{bmatrix} a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1n} x_{n} \\ a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2n} x_{n} \\ \vdots \\ a_{n1} x_{1} + a_{n2} x_{2} + \dots + a_{nn} x_{n} \end{bmatrix}_{n\times 1} \\
= \begin{bmatrix} (a_{11} x_{1}^2 + a_{12} x_1 x_{2} + \dots + a_{1n} x_1 x_{n}) + (a_{21} x_{1} x_2 + a_{22} x_{2}^2 + \dots + a_{2n} x_2 x_{n}) + \dots + (a_{n1} x_{1} x_n + a_{n2} x_{2} x_n + \dots + a_{nn} x_{n}^2) \end{bmatrix}_{1\times 1}
$$
$\mathbf{x^TAx}$ is a scalar, so we expand it using the denominator-layout rule for a scalar numerator and a vector denominator:
$$
\frac{\partial \mathbf{x^TAx}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial \mathbf{x^TAx}}{\partial x_1} \\ \frac{\partial \mathbf{x^TAx}}{\partial x_2} \\ \vdots \\ \frac{\partial \mathbf{x^TAx}}{\partial x_n} \end{bmatrix}
= \begin{bmatrix} (a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n) + (a_{11} x_1 + a_{21} x_2 + \dots + a_{n1} x_n) \\ (a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n) + (a_{12} x_1 + a_{22} x_2 + \dots + a_{n2} x_n) \\ \vdots \\ (a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n) + (a_{1n} x_1 + a_{2n} x_2 + \dots + a_{nn} x_n) \end{bmatrix} \\
= \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \mathbf x + \begin{bmatrix} a_{11} & a_{21} & \dots & a_{n1} \\ a_{12} & a_{22} & \dots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \dots & a_{nn} \end{bmatrix} \mathbf x = \mathbf{(A + A^T) x}
$$
Here $\mathbf{x^TAx}$ in each numerator stands for the expanded sum $\sum_{i}\sum_{j} a_{ij} x_i x_j$. Differentiating it with respect to $x_k$ leaves $\sum_j a_{kj} x_j$ (the terms where $x_k$ is the left factor) plus $\sum_i a_{ik} x_i$ (the terms where $x_k$ is the right factor), which are the $k$-th entries of $\mathbf{Ax}$ and $\mathbf{A^Tx}$ respectively.
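The formula can be sanity-checked with a deliberately non-symmetric matrix, so that $\mathbf{A}$ and $\mathbf{A^T}$ contribute differently. This is a minimal sketch; the helpers `quad_form` and `gradient` and the sample values are assumptions made for illustration.

```python
def quad_form(A, x):
    """Evaluate the scalar x^T A x for a list-of-rows matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def gradient(f, x, h=1e-6):
    """Central-difference gradient; entry k approximates d f / d x_k."""
    g = []
    for k in range(len(x)):
        x_plus = list(x)
        x_plus[k] += h
        x_minus = list(x)
        x_minus[k] -= h
        g.append((f(x_plus) - f(x_minus)) / (2 * h))
    return g


A = [[1.0, 2.0],
     [0.0, 3.0]]             # non-symmetric on purpose (sample values)
x = [1.0, 2.0]
n = len(x)

numeric = gradient(lambda v: quad_form(A, v), x)
# (A + A^T) x, written out entry by entry
analytic = [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]
print(numeric, analytic)
```

With these values $\mathbf{x^TAx} = x_1^2 + 2x_1x_2 + 3x_2^2$, so both vectors should be close to $(2x_1 + 2x_2,\; 2x_1 + 6x_2)$ evaluated at $(1, 2)$.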
Derivation of $\frac{\partial \mathbf{x^T x}}{\partial \mathbf{x}} = 2\mathbf{x}$
As before, $\mathbf x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$.
Then:
$$\mathbf{x^T x} = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}_{1\times n} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}_{n\times 1} = \begin{bmatrix} x_1 x_1 + x_2 x_2 + \dots + x_n x_n \end{bmatrix}_{1\times 1}$$
$\mathbf{x^Tx}$ is a scalar, so we expand it using the denominator-layout rule for a scalar numerator and a vector denominator:
$$\frac{\partial \mathbf{x^T x}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial (x_1 x_1 + x_2 x_2 + \dots + x_n x_n)}{\partial x_1} \\ \frac{\partial (x_1 x_1 + x_2 x_2 + \dots + x_n x_n)}{\partial x_2} \\ \vdots \\ \frac{\partial (x_1 x_1 + x_2 x_2 + \dots + x_n x_n)}{\partial x_n} \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix} = 2\mathbf x$$
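As a consistency check, this result is also the special case $\mathbf A = \mathbf I$ (the identity matrix) of the quadratic-form formula $\frac{\partial \mathbf{x^T A x}}{\partial \mathbf{x}} = \mathbf{(A + A^T)x}$ listed earlier:

$$\frac{\partial \mathbf{x^T x}}{\partial \mathbf{x}} = \frac{\partial \mathbf{x^T I x}}{\partial \mathbf{x}} = \mathbf{(I + I^T)x} = 2\mathbf{I}\mathbf{x} = 2\mathbf x$$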