李沐 PyTorch Learning Notes: Matrix Differentiation

1. Basic Definitions

For the derivative \frac{\partial y}{\partial x}, the shape of the result is determined by the shapes of y and x, as summarized in the figure below.

[Figure 1. Dimensions of the results of matrix differentiation]
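Since the figure itself is an image, here is a brief text summary of what it shows, based on my understanding of the numerator-layout convention used in the course:

If y is a scalar and x is a scalar, \frac{\partial y}{\partial x} is a scalar.
If y is a scalar and \mathbf{x}\in \mathbb{R}^{n} is a column vector, \frac{\partial y}{\partial \mathbf{x}} is a 1\times n row vector.
If \mathbf{y}\in \mathbb{R}^{m} is a column vector and x is a scalar, \frac{\partial \mathbf{y}}{\partial x} is an m\times 1 column vector.
If \mathbf{y}\in \mathbb{R}^{m} and \mathbf{x}\in \mathbb{R}^{n} are both column vectors, \frac{\partial \mathbf{y}}{\partial \mathbf{x}} is an m\times n matrix.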

Example:

For the function:

 y=f\left ( \begin{bmatrix} x_0\\ x_1 \end{bmatrix} \right )=x_0^2+2x_1^2

Here y is a scalar, and the independent variable is the column vector

\mathbf{X}=\begin{bmatrix} x_0\\ x_1 \end{bmatrix}

Taking the derivative with respect to the independent variable:

\frac{\partial y}{\partial \mathbf{X}}=\frac{\partial y}{\partial \begin{bmatrix} x_0\\ x_1 \end{bmatrix}}=\left [ \frac{\partial y}{\partial x_0},\frac{\partial y}{\partial x_1} \right ]=\left [ 2x_0,4x_1 \right ]

Note that when y is a scalar and x is a column vector, the resulting derivative is transposed (a row vector); in the opposite case (y a vector, x a scalar), no transpose is needed.
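The result above can be verified numerically with PyTorch autograd (a minimal sketch; the concrete input values are my own choice). Note that autograd stores the gradient with the same shape as x rather than as a transposed row vector:

```python
import torch

# x = [x0, x1]; requires_grad tells autograd to track operations on x
x = torch.tensor([1.0, 2.0], requires_grad=True)

# y = x0^2 + 2*x1^2, a scalar function of the vector x
y = x[0] ** 2 + 2 * x[1] ** 2

# Backpropagation fills x.grad with [dy/dx0, dy/dx1] = [2*x0, 4*x1]
y.backward()
print(x.grad)  # tensor([2., 8.])
```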

Further examples of derivatives are shown in Figure 2 below:

[Figure 2. Examples of derivatives]

2. Derivations of the Results

The following derivations of some of the examples in Figure 2 are based on my own understanding:

2.1 Proof of the third result in the first row

y=sum(\mathbf{X})=sum\left ( \begin{bmatrix} x_0\\ x_1\\ ...\\ x_{n-1} \end{bmatrix} \right )=\sum_{i=0}^{n-1}x_i\\ \frac{\partial y}{\partial \mathbf{X}}=\begin{bmatrix} \frac{\partial \sum_{i=0}^{n-1}x_i}{\partial x_0} & \frac{\partial \sum_{i=0}^{n-1}x_i}{\partial x_1} & ... & \frac{\partial \sum_{i=0}^{n-1}x_i}{\partial x_{n-1}} \end{bmatrix}=\begin{bmatrix} 1,1,...,1 \end{bmatrix}=\mathbf{1}^{T}
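A quick autograd check of this result (a sketch with an arbitrarily chosen x):

```python
import torch

# Arbitrary 4-element input vector
x = torch.arange(4, dtype=torch.float32, requires_grad=True)

# y = sum_i x_i
y = x.sum()
y.backward()

# Each component of the gradient is 1, matching the all-ones vector above
print(x.grad)  # tensor([1., 1., 1., 1.])
```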

2.2 Proof of the fourth result in the first row

y=\left \| \mathbf{X} \right \|^{2}=\sum_{i=0}^{n-1}x_i^2

\frac{\partial y}{\partial \mathbf{X}}=\begin{bmatrix} 2x_0,2x_1,...,2x_{n-1} \end{bmatrix}=2\mathbf{X}^{T}
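Again this can be checked with autograd (a sketch; torch.dot(x, x) is used to compute the squared norm):

```python
import torch

x = torch.arange(4, dtype=torch.float32, requires_grad=True)

# y = ||x||^2 = sum_i x_i^2, written as the inner product of x with itself
y = torch.dot(x, x)
y.backward()

print(x.grad)          # tensor([0., 2., 4., 6.])
print(2 * x.detach())  # same values: the gradient equals 2x
```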

2.3 Proof of the third result in the second row

Here bold type is used to strictly distinguish tensors (vectors) from scalars.

y=\left \langle \mathbf{u} ,\mathbf{v}\right \rangle=\left \langle \begin{bmatrix} u_0\left ( \mathbf{X} \right )\\ u_1\left ( \mathbf{X} \right )\\ ...\\ u_{n-1}\left ( \mathbf{X} \right ) \end{bmatrix},\begin{bmatrix} v_0\left ( \mathbf{X} \right )\\ v_1\left ( \mathbf{X} \right )\\ ...\\ v_{n-1}\left ( \mathbf{X} \right ) \end{bmatrix} \right \rangle =\sum_{i=0}^{n-1}u_i\left ( \mathbf{X} \right )v_i\left ( \mathbf{X} \right )

\frac{\partial y}{\partial \mathbf{X}}=\sum_{i=0}^{n-1}\frac{\partial u_i\left ( \mathbf{X} \right )v_i\left ( \mathbf{X} \right )}{\partial \mathbf{X}}\\\because\frac{\partial uv}{\partial \mathbf{X}}=v\frac{\partial u}{\partial \mathbf{X}}+u\frac{\partial v}{\partial \mathbf{X}}\\\therefore \frac{\partial y}{\partial \mathbf{X}}\\=\sum_{i=0}^{n-1}\left [ v_i\frac{\partial u_i}{\partial \mathbf{X}}+u_i\frac{\partial v_i}{\partial \mathbf{X}} \right ]\\=\sum_{i=0}^{n-1}\left \{ v_i\left [\frac{\partial u_i}{\partial x_0},\frac{\partial u_i}{\partial x_1},...,\frac{\partial u_i}{\partial x_{n-1}} \right ]+u_i\left [\frac{\partial v_i}{\partial x_0},\frac{\partial v_i}{\partial x_1},...,\frac{\partial v_i}{\partial x_{n-1}} \right ] \right \}\\=\sum_{i=0}^{n-1}v_i\left [\frac{\partial u_i}{\partial x_0},\frac{\partial u_i}{\partial x_1},...,\frac{\partial u_i}{\partial x_{n-1}} \right ]+\sum_{i=0}^{n-1}u_i\left [\frac{\partial v_i}{\partial x_0},\frac{\partial v_i}{\partial x_1},...,\frac{\partial v_i}{\partial x_{n-1}} \right ]\\=\left [ v_0,v_1,...,v_{n-1} \right ]\begin{bmatrix} \frac{\partial u_0}{\partial x_0},\frac{\partial u_0}{\partial x_1},...,\frac{\partial u_0}{\partial x_{n-1}}\\ \frac{\partial u_1}{\partial x_0},\frac{\partial u_1}{\partial x_1},...,\frac{\partial u_1}{\partial x_{n-1}}\\ ...\\ \frac{\partial u_{n-1}}{\partial x_0},\frac{\partial u_{n-1}}{\partial x_1},...,\frac{\partial u_{n-1}}{\partial x_{n-1}} \end{bmatrix}+\\ \left [ u_0,u_1,...,u_{n-1} \right ]\begin{bmatrix} \frac{\partial v_0}{\partial x_0},\frac{\partial v_0}{\partial x_1},...,\frac{\partial v_0}{\partial x_{n-1}}\\ \frac{\partial v_1}{\partial x_0},\frac{\partial v_1}{\partial x_1},...,\frac{\partial v_1}{\partial x_{n-1}}\\ ...\\ \frac{\partial v_{n-1}}{\partial x_0},\frac{\partial v_{n-1}}{\partial x_1},...,\frac{\partial v_{n-1}}{\partial x_{n-1}} \end{bmatrix}\\=\mathbf{u}^{T}\frac{\partial \mathbf{v}}{\partial \mathbf{X}}+\mathbf{v}^{T}\frac{\partial \mathbf{u}}{\partial \mathbf{X}}
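A numerical check of this identity (a sketch; the particular choices u(\mathbf{X})=\sin(\mathbf{X}) and v(\mathbf{X})=\mathbf{X}^{2}, applied element-wise, are mine, picked because their Jacobians are diagonal and easy to write down):

```python
import torch

x = torch.tensor([0.5, 1.0, 1.5], requires_grad=True)

# Element-wise vector-valued functions of x
u = torch.sin(x)   # Jacobian du/dx = diag(cos(x))
v = x ** 2         # Jacobian dv/dx = diag(2x)

# y = <u, v>
y = torch.dot(u, v)
y.backward()

# Closed form from the derivation: u^T (dv/dx) + v^T (du/dx).
# With diagonal Jacobians this reduces to element-wise products.
xd = x.detach()
expected = u.detach() * (2 * xd) + v.detach() * torch.cos(xd)
print(torch.allclose(x.grad, expected))  # True
```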
