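In mathematics, matrix calculus is a specialized notation for multivariable calculus over spaces of matrices. It covers the partial derivatives of a real-valued function with respect to many variables, and of a multi-valued function with respect to a single variable, with vectors and matrices treated as single entities.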
Put simply, matrix calculus studies the derivatives of scalars, vectors, and matrices with respect to scalars, vectors, and matrices. For a function $f(x_1,x_2,x_3)$, a simple example is:
$$\nabla f=\dfrac{\partial f}{\partial \mathbf{x}}=\begin{bmatrix}\dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3}\end{bmatrix}^T$$
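This is the derivative of a scalar with respect to a vector. Along the same lines, we can form nine kinds of derivative: scalar by scalar, scalar by vector, scalar by matrix, vector by scalar, and so on. Not all of them can be written with at most two dimensions, however. For example, the derivative of the vector function $\mathbf{f}=\mathbf{A}\mathbf{x}$ with respect to the matrix $\mathbf{A}$ is a third-order tensor. The table below lists the derivatives that can be written in two dimensions or fewer. Each row gives the type of the variable being differentiated with respect to, and each column gives the type of the function being differentiated.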
Differentiate with respect to | Scalar function | Vector function | Matrix function |
---|---|---|---|
Scalar | $\dfrac{\partial y}{\partial x}$ | $\dfrac{\partial \mathbf{y}}{\partial x}$ | $\dfrac{\partial \mathbf{Y}}{\partial x}$ |
Vector | $\dfrac{\partial y}{\partial \mathbf{x}}$ | $\dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | |
Matrix | $\dfrac{\partial y}{\partial \mathbf{X}}$ | | |
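Two of the statements above can be checked numerically: a scalar-by-vector derivative fits in a one-dimensional array, while the vector-by-matrix derivative of $\mathbf{A}\mathbf{x}$ with respect to $\mathbf{A}$ needs three indices. The following is a minimal sketch using central finite differences. The function `f`, the sizes `m` and `n`, and the random test data are assumptions made for illustration and do not come from the original text.

```python
import numpy as np

eps = 1e-6  # finite-difference step (an arbitrary small value)

def f(x):
    # Hypothetical scalar function f(x1, x2, x3), chosen only for illustration.
    return x[0] ** 2 + 3.0 * x[1] * x[2]

# Scalar by vector: one partial derivative per component of x.
x0 = np.array([1.0, 2.0, 3.0])
grad_numeric = np.zeros(3)
for i in range(3):
    step = np.zeros(3)
    step[i] = eps
    grad_numeric[i] = (f(x0 + step) - f(x0 - step)) / (2.0 * eps)
grad_exact = np.array([2.0 * x0[0], 3.0 * x0[2], 3.0 * x0[1]])

# Vector by matrix: T[i, j, k] = d(Ax)_i / dA_jk, which needs three indices.
m, n = 2, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
T_numeric = np.zeros((m, m, n))
for j in range(m):
    for k in range(n):
        dA = np.zeros((m, n))
        dA[j, k] = eps
        T_numeric[:, j, k] = ((A + dA) @ x - (A - dA) @ x) / (2.0 * eps)

# Analytically, d(Ax)_i / dA_jk = delta_ij * x_k.
T_exact = np.einsum("ij,k->ijk", np.eye(m), x)
```

Here `grad_numeric` has shape `(3,)`, whereas `T_numeric` has shape `(2, 2, 3)`, which is why this case is absent from the table.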
Matrix derivatives can be written in two layouts: numerator layout and denominator layout. The two differ only by a transpose.
In numerator layout, the derivatives are arranged as follows:
$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}.$$
$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_m}{\partial x}\\ \end{bmatrix}.$$
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}\\ \end{bmatrix}.$$
$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}}\\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}\\ \end{bmatrix}.$$
The following two definitions exist only in numerator layout:
$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}\\ \end{bmatrix}.$$
$$d\mathbf{X} = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n}\\ dx_{21} & dx_{22} & \cdots & dx_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ dx_{m1} & dx_{m2} & \cdots & dx_{mn}\\ \end{bmatrix}.$$
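As a small illustration of the matrix-by-scalar derivative, each entry of the matrix is differentiated separately, so the result has the same shape as the matrix itself. The sketch below assumes a 2-D rotation matrix as the matrix-valued function; that choice, and the names `Y`, `dY_exact`, and `theta0`, are hypothetical and not part of the original text.

```python
import numpy as np

def Y(theta):
    # Hypothetical matrix-valued function of a scalar: a 2-D rotation matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def dY_exact(theta):
    # Entry-by-entry derivative of Y with respect to theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[-s, -c], [c, -s]])

theta0 = 0.3
eps = 1e-6

# A central difference applied to the whole matrix differentiates every entry at once.
dY_numeric = (Y(theta0 + eps) - Y(theta0 - eps)) / (2.0 * eps)
```

The array `dY_numeric` has shape `(2, 2)`, matching `Y(theta0)`, and agrees with `dY_exact(theta0)` up to finite-difference error.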
In denominator layout, the corresponding definitions are:
$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_n}\\ \end{bmatrix}.$$
$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}.$$
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}\\ \end{bmatrix}.$$
$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}}\\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}}\\ \end{bmatrix}.$$
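When the numerator and denominator are vectors or scalars, the two layouts differ only in where $\dfrac{\partial y_i}{\partial x_j}$ is placed. Writing the matrix of partial derivatives as $C=[c_{ij}]$, we have
$$c_{ij}=\dfrac{\partial y_i}{\partial x_j}\ \text{(numerator layout)},\qquad c_{ij}=\dfrac{\partial y_j}{\partial x_i}\ \text{(denominator layout)}.$$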
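The relationship between the two layouts can be seen on a concrete vector function. The sketch below builds the numerator-layout matrix by central finite differences and obtains the denominator-layout matrix by transposing it. The function `y`, the helper `jacobian_numerator`, and the test point `x0` are assumptions introduced for illustration.

```python
import numpy as np

def y(x):
    # Hypothetical vector function from R^3 to R^2, chosen only for illustration.
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

def jacobian_numerator(func, x, eps=1e-6):
    """Return C with C[i, j] = d y_i / d x_j (numerator layout, shape m x n)."""
    x = np.asarray(x, dtype=float)
    m = func(x).size
    C = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Perturbing x_j fills column j with the partials of every y_i.
        C[:, j] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return C

x0 = np.array([1.0, 2.0, 0.5])
J_num = jacobian_numerator(y, x0)  # shape (2, 3): rows follow y, columns follow x
J_den = J_num.T                    # shape (3, 2): rows follow x, columns follow y
```

Both arrays hold the same six partial derivatives. Only the placement of each entry changes.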
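In machine learning, we usually differentiate a scalar with respect to a weight vector or weight matrix. With vectors taken to be column vectors, denominator layout makes each element of the derivative line up with the corresponding element of the original vector, so gradient-based optimization methods can be applied directly. For this reason, we focus on denominator layout here.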
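To illustrate why this matters in practice, here is a minimal gradient descent sketch on a least-squares loss $L(\mathbf{w})=\|\mathbf{X}\mathbf{w}-\mathbf{t}\|^2$, whose denominator-layout gradient $2\mathbf{X}^{\top}(\mathbf{X}\mathbf{w}-\mathbf{t})$ has the same shape as $\mathbf{w}$ and can be subtracted from it directly. The loss, the synthetic data (`X`, `t`, `w_true`), and the step-size rule are all assumptions chosen for the example, not something taken from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))      # hypothetical design matrix
w_true = np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth weights
t = X @ w_true                        # noiseless targets, so w_true is the minimizer

def loss(w):
    r = X @ w - t
    return r @ r

def grad(w):
    # Denominator layout: the gradient has the same shape as w.
    return 2.0 * X.T @ (X @ w - t)

# Step size 1 / (largest eigenvalue of the Hessian 2 X^T X), which keeps the
# iteration stable for this quadratic loss.
lr = 0.5 / np.linalg.norm(X, 2) ** 2

w = np.zeros(3)
for _ in range(500):
    w = w - lr * grad(w)  # element-by-element update, no transpose needed
```

Because `grad(w)` and `w` share a shape, the update is a plain array subtraction. In numerator layout the same derivative would be a row vector and would need a transpose first.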
Some commonly used formulas (denominator layout):
$$\dfrac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}^{\top}$$
$$\dfrac{\partial \mathbf{x}^{\top}\mathbf{A}}{\partial \mathbf{x}}=\mathbf{A}$$
$$\dfrac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}^{\top}$$
$$\dfrac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\dfrac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$$
$$\dfrac{\partial \mathbf{u}^{\top}\mathbf{v}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{v}+\dfrac{\partial \mathbf{v}}{\partial \mathbf{x}}\mathbf{u}$$
$$\dfrac{\partial \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=(\mathbf{A}+\mathbf{A}^{\top})\mathbf{x}$$
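As a worked step, the quadratic-form rule can be recovered from the product rule for $\mathbf{u}^{\top}\mathbf{v}$ together with $\partial \mathbf{A}\mathbf{x}/\partial \mathbf{x}=\mathbf{A}^{\top}$. Taking $\mathbf{u}=\mathbf{x}$ and $\mathbf{v}=\mathbf{A}\mathbf{x}$ (a substitution chosen here for illustration) and using $\partial \mathbf{x}/\partial \mathbf{x}=\mathbf{I}$:

$$\dfrac{\partial \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{x}}{\partial \mathbf{x}}\mathbf{A}\mathbf{x}+\dfrac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}}\mathbf{x}=\mathbf{I}\mathbf{A}\mathbf{x}+\mathbf{A}^{\top}\mathbf{x}=(\mathbf{A}+\mathbf{A}^{\top})\mathbf{x}$$

When $\mathbf{A}$ is symmetric, this reduces to $2\mathbf{A}\mathbf{x}$.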
$$\dfrac{\partial \mathbf{a}^{\top}\mathbf{u}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{a}$$
$$\dfrac{\partial \| \mathbf{x}-\mathbf{a} \|}{\partial \mathbf{x}}=\dfrac{\mathbf{x}-\mathbf{a}}{\| \mathbf{x}-\mathbf{a}\|}$$
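Several of the vector formulas above can be spot-checked with finite differences. The sketch below does this for $\partial \mathbf{A}\mathbf{x}/\partial \mathbf{x}$, $\partial \mathbf{x}^{\top}\mathbf{A}\mathbf{x}/\partial \mathbf{x}$, and $\partial\|\mathbf{x}-\mathbf{a}\|/\partial \mathbf{x}$. The helper `jacobian_denominator` and the random test data are assumptions made for illustration, and a numerical check of this kind is evidence of consistency rather than a proof.

```python
import numpy as np

def jacobian_denominator(func, x, eps=1e-6):
    """Finite-difference derivative in denominator layout.

    Returns D with D[i, j] = d func(x)[j] / d x[i]. A scalar-valued func
    yields a single column, which the caller can flatten to a vector.
    """
    x = np.asarray(x, dtype=float)
    n_out = np.atleast_1d(func(x)).size
    D = np.zeros((x.size, n_out))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        forward = np.atleast_1d(func(x + step))
        backward = np.atleast_1d(func(x - step))
        D[i, :] = (forward - backward) / (2.0 * eps)
    return D

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))   # rectangular, so A and A.T have different shapes
S = rng.standard_normal((3, 3))   # square but not symmetric
a = rng.standard_normal(3)
x0 = rng.standard_normal(3)

# d(Ax)/dx should equal A^T, of shape (3, 2).
D_linear = jacobian_denominator(lambda x: A @ x, x0)

# d(x^T S x)/dx should equal (S + S^T) x.
g_quad = jacobian_denominator(lambda x: x @ S @ x, x0).ravel()

# d||x - a||/dx should equal the unit vector pointing from a to x.
g_norm = jacobian_denominator(lambda x: np.linalg.norm(x - a), x0).ravel()
```

Comparing `D_linear` with `A.T`, `g_quad` with `(S + S.T) @ x0`, and `g_norm` with `(x0 - a) / np.linalg.norm(x0 - a)` using `np.allclose` confirms the three identities at this test point.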
$$\dfrac{\partial \mathbf{a}^{\top}\mathbf{X}\mathbf{b}}{\partial \mathbf{X}}=\mathbf{a}\mathbf{b}^{\top}$$
$$\dfrac{\partial \mathbf{a}^{\top}\mathbf{X}^{\top}\mathbf{b}}{\partial \mathbf{X}}=\mathbf{b}\mathbf{a}^{\top}$$
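The two scalar-by-matrix formulas can be checked in the same way, by perturbing one entry of $\mathbf{X}$ at a time. This is a minimal sketch: the helper `matrix_gradient`, the sizes `p` and `q`, and the random vectors are hypothetical names and values introduced for the example.

```python
import numpy as np

def matrix_gradient(func, X, eps=1e-6):
    """Return G with G[i, j] = d func(X) / d X[i, j] (denominator layout).

    G has the same shape as X.
    """
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            step = np.zeros_like(X)
            step[i, j] = eps
            G[i, j] = (func(X + step) - func(X - step)) / (2.0 * eps)
    return G

rng = np.random.default_rng(2)
p, q = 2, 3
X0 = rng.standard_normal((p, q))

# First formula: a has length p and b has length q, so a^T X b is a scalar.
a = rng.standard_normal(p)
b = rng.standard_normal(q)
G1 = matrix_gradient(lambda X: a @ X @ b, X0)        # expected a b^T

# Second formula: here the left vector has length q and the right one length p.
a2 = rng.standard_normal(q)
b2 = rng.standard_normal(p)
G2 = matrix_gradient(lambda X: a2 @ X.T @ b2, X0)    # expected b2 a2^T
```

Both `G1` and `G2` have the shape of `X0`, and they match `np.outer(a, b)` and `np.outer(b2, a2)` respectively up to finite-difference error.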
This article is translated from Wikipedia.