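Matrix Differentiation

Matrix calculus

In mathematics, matrix calculus is a specialized notation for multivariable calculus over spaces of matrices. It collects the partial derivatives of a real-valued function with respect to many variables, and of a multi-valued function with respect to a single variable, into vectors and matrices that are treated as single entities.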

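Overview

Put simply, matrix calculus studies derivatives of scalars, vectors, and matrices with respect to scalars, vectors, and matrices. For a function $f(x_1,x_2,x_3)$ with $\mathbf{x}=[x_1\ x_2\ x_3]^T$, a simple example is

$$\nabla f=\dfrac{\partial f}{\partial \mathbf{x}}=\begin{bmatrix}\dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3}\end{bmatrix}^T$$

This is the derivative of a scalar with respect to a vector. In the same way we can differentiate a scalar with respect to a scalar, a vector, or a matrix, a vector with respect to a scalar, and so on, for 9 types of derivative in total. Not all of them can be written with at most two dimensions, however. For example, the derivative of the vector function $\mathbf{f}=\mathbf{A}\mathbf{x}$ with respect to the matrix $\mathbf{A}$ is a third-order tensor. The table below lists the derivative forms that can be written with at most two dimensions.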

| Variable \ Function | Scalar | Vector | Matrix |
|---|---|---|---|
| Scalar | $\dfrac{\partial y}{\partial x}$ | $\dfrac{\partial \mathbf{y}}{\partial x}$ | $\dfrac{\partial \mathbf{Y}}{\partial x}$ |
| Vector | $\dfrac{\partial y}{\partial \mathbf{x}}$ | $\dfrac{\partial \mathbf{y}}{\partial \mathbf{x}}$ | |
| Matrix | $\dfrac{\partial y}{\partial \mathbf{X}}$ | | |
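The scalar-by-vector case can be checked numerically. The following is a minimal sketch, not a definitive implementation: it approximates the gradient with central differences, and both the helper `numerical_gradient` and the example function `f` are hypothetical choices made only for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        # Central difference in the i-th coordinate.
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


# Hypothetical example: f(x1, x2, x3) = x1^2 + 2*x2 + x1*x3
def f(x):
    x1, x2, x3 = x
    return x1 ** 2 + 2 * x2 + x1 * x3


x0 = [1.0, 2.0, 3.0]
grad = numerical_gradient(f, x0)

# Analytic gradient for comparison: [2*x1 + x3, 2, x1]
expected = [2 * x0[0] + x0[2], 2.0, x0[0]]
```

The result has one entry per component of the variable, which is exactly the vector of partial derivatives in the gradient example.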

Layouts

Matrix derivatives can be written in two layouts, numerator layout and denominator layout. The two differ by a transpose.

Numerator layout

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}}\\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$

The following two definitions are given only in numerator layout:

$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$

$$d\mathbf{X} = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n}\\ dx_{21} & dx_{22} & \cdots & dx_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.$$

Denominator layout

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}}\\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$

When both the numerator and the denominator are vectors, the two layouts differ in where the entry $\dfrac{\partial y_i}{\partial x_j}$ is placed. Writing the derivative matrix as $C=[c_{ij}]$, we have

  • In numerator layout, $c_{ij}=\dfrac{\partial y_i}{\partial x_j}$
  • In denominator layout, $c_{ij}=\dfrac{\partial y_j}{\partial x_i}$
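The two index conventions above can be made concrete with a small numerical Jacobian. This is only a sketch under stated assumptions: the helper names `jacobian_numerator` and `jacobian_denominator` and the example map `y` are hypothetical, and the derivatives are approximated by central differences.

```python
def jacobian_numerator(f, x, h=1e-6):
    """m x n matrix with entry (i, j) = d y_i / d x_j (numerator layout)."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J


def jacobian_denominator(f, x, h=1e-6):
    """n x m matrix with entry (i, j) = d y_j / d x_i (denominator layout)."""
    # The denominator-layout matrix is the transpose of the numerator-layout one.
    return [list(col) for col in zip(*jacobian_numerator(f, x, h))]


# Hypothetical map y: R^3 -> R^2, y = [x1*x2, x2 + x3^2]
def y(x):
    return [x[0] * x[1], x[1] + x[2] ** 2]


x0 = [1.0, 2.0, 3.0]
J_num = jacobian_numerator(y, x0)    # 2 rows, 3 columns
J_den = jacobian_denominator(y, x0)  # 3 rows, 2 columns
```

With $m=2$ outputs and $n=3$ inputs, the numerator-layout result is $2\times 3$ and the denominator-layout result is $3\times 2$, and `J_num[i][j]` equals `J_den[j][i]`.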

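In machine learning, we most often differentiate a scalar with respect to a weight vector or a weight matrix. With vectors taken to be column vectors, denominator layout makes each entry of the derivative line up with the corresponding entry of the original vector, so the result can be used directly in gradient-based optimization methods. For this reason, we mainly work with denominator layout here.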
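To illustrate why matching shapes is convenient, here is a minimal gradient-descent sketch. The loss $L(\mathbf{w})=\|\mathbf{w}-\mathbf{t}\|^2$, the target `target`, and the step size are all assumptions chosen for illustration. Its gradient in denominator layout is $2(\mathbf{w}-\mathbf{t})$, which has the same shape as $\mathbf{w}$.

```python
def loss_gradient(w, target):
    """Gradient of L(w) = ||w - target||^2 in denominator layout: 2 * (w - target)."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


target = [1.0, -2.0, 0.5]  # hypothetical minimizer of the loss
w = [0.0, 0.0, 0.0]        # initial weights
learning_rate = 0.1        # assumed step size

for _ in range(100):
    g = loss_gradient(w, target)
    # The gradient has one entry per weight, so the update is element-wise.
    w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
```

Because `g` and `w` have the same length and ordering, the update needs no transpose or reshaping.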

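Common formulas

Some commonly used formulas are collected here (denominator layout). In all of them, $\mathbf{A}$, $\mathbf{a}$, and $\mathbf{b}$ do not depend on the variable of differentiation, while $\mathbf{u}=\mathbf{u}(\mathbf{x})$ and $\mathbf{v}=\mathbf{v}(\mathbf{x})$ are functions of $\mathbf{x}$.

Derivatives of a vector with respect to a vector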

$$\dfrac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}^{\top}$$

$$\dfrac{\partial \mathbf{x}^{\top}\mathbf{A}}{\partial \mathbf{x}}=\mathbf{A}$$

$$\dfrac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}^{\top}$$

$$\dfrac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\dfrac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$$
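The first identity, that the derivative of $\mathbf{A}\mathbf{x}$ with respect to $\mathbf{x}$ is $\mathbf{A}^{\top}$ in denominator layout, can be checked numerically. The sketch below uses central differences; the helpers `matvec` and `jacobian_denominator` and the test values are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def jacobian_denominator(f, x, h=1e-6):
    """n x m matrix with entry (i, j) = d y_j / d x_i (denominator layout)."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        yp, ym = f(xp), f(xm)
        rows.append([(p - q) / (2 * h) for p, q in zip(yp, ym)])
    return rows


# Hypothetical 2 x 3 matrix and evaluation point.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x0 = [0.5, -1.0, 2.0]

J = jacobian_denominator(lambda x: matvec(A, x), x0)  # 3 x 2
A_T = [list(col) for col in zip(*A)]                  # transpose of A, 3 x 2

max_err = max(abs(J[i][j] - A_T[i][j]) for i in range(3) for j in range(2))
```

Here `max_err` is limited only by floating-point rounding, because $\mathbf{A}\mathbf{x}$ is linear in $\mathbf{x}$.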

Derivatives of a scalar with respect to a vector

$$\dfrac{\partial \mathbf{u}^{\top}\mathbf{v}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{v}+\dfrac{\partial \mathbf{v}}{\partial \mathbf{x}}\mathbf{u}$$

$$\dfrac{\partial \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=(\mathbf{A}+\mathbf{A}^{\top})\mathbf{x}$$

$$\dfrac{\partial \mathbf{a}^{\top}\mathbf{u}}{\partial \mathbf{x}}=\dfrac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{a}$$

$$\dfrac{\partial \| \mathbf{x}-\mathbf{a} \|}{\partial \mathbf{x}}=\dfrac{\mathbf{x}-\mathbf{a}}{\| \mathbf{x}-\mathbf{a}\|}$$
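Two of these scalar-by-vector formulas, the quadratic form and the Euclidean norm, can be sanity-checked against finite differences. This is a sketch only: the helper `numerical_gradient` and the values of `A`, `a`, and `x0` are hypothetical, and `A` is deliberately non-symmetric so that $(\mathbf{A}+\mathbf{A}^{\top})\mathbf{x}$ differs from $2\mathbf{A}\mathbf{x}$.

```python
import math


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


A = [[1.0, 2.0],
     [3.0, 4.0]]   # hypothetical non-symmetric matrix
a = [4.0, 3.0]     # hypothetical constant vector
x0 = [1.0, -1.0]
n = len(x0)


def quadratic_form(x):
    """x^T A x"""
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def distance(x):
    """||x - a||"""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))


# Analytic results from the formulas above.
quad_analytic = [sum((A[i][j] + A[j][i]) * x0[j] for j in range(n)) for i in range(n)]
dist_analytic = [(xi - ai) / distance(x0) for xi, ai in zip(x0, a)]

quad_numeric = numerical_gradient(quadratic_form, x0)
dist_numeric = numerical_gradient(distance, x0)
```

Note that the norm formula is undefined at $\mathbf{x}=\mathbf{a}$, so the evaluation point is chosen away from `a`.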

Derivatives of a scalar with respect to a matrix

$$\dfrac{\partial \mathbf{a}^{\top}\mathbf{X}\mathbf{b}}{\partial \mathbf{X}}=\mathbf{a}\mathbf{b}^{\top}$$

$$\dfrac{\partial \mathbf{a}^{\top}\mathbf{X}^{\top}\mathbf{b}}{\partial \mathbf{X}}=\mathbf{b}\mathbf{a}^{\top}$$
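The first scalar-by-matrix formula can be checked the same way. In denominator layout the derivative has the same shape as $\mathbf{X}$, with entry $(i,j)$ equal to $\partial y/\partial x_{ij}$. The sketch below is illustrative only; `numerical_matrix_gradient` and the values of `a`, `b`, and `X0` are hypothetical.

```python
def numerical_matrix_gradient(f, X, h=1e-6):
    """p x q matrix with entry (i, j) = d f / d x_ij (denominator layout)."""
    p, q = len(X), len(X[0])
    G = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G


a = [1.0, 2.0]          # hypothetical vector of length p = 2
b = [3.0, -1.0, 0.5]    # hypothetical vector of length q = 3
X0 = [[0.1, 0.2, 0.3],
      [0.4, 0.5, 0.6]]  # hypothetical 2 x 3 evaluation point


def bilinear(X):
    """a^T X b"""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


G = numerical_matrix_gradient(bilinear, X0)
outer = [[ai * bj for bj in b] for ai in a]  # a b^T, same shape as X

max_err = max(abs(G[i][j] - outer[i][j]) for i in range(2) for j in range(3))
```

Since $\mathbf{a}^{\top}\mathbf{X}\mathbf{b}$ is linear in $\mathbf{X}$, the result does not depend on the chosen point `X0`.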

This article is translated from Wikipedia.
