[Mu Li Deep Learning Notes] Matrix Calculus (2)

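Course Link and Notes

Linear Algebra Implementation (线性代数实现), p4
This series collects my study notes on Mu Li's (李沐) deep learning course, and may add material that he did not cover in the lectures.
This is the second post in the series.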

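Matrix Calculus

Derivatives of Matrices

This part draws on the video 矩阵的导数运算 (Matrix Derivative Operations).
To make vectors easy to tell apart from scalars, I do not set them in bold as in print. Instead I write them the handwritten way, with an arrow over the letter.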

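Derivative of a Scalar Function with Respect to a Vector

For a function of one variable, we usually find extreme points by setting the derivative to 0 (the tangent at that point has slope 0), solving for the stationary points, and then using the definition of an extremum or one of its corollaries to decide whether each stationary point really is an extremum. The procedure is:
[Figure 1: procedure for finding the extrema of a single-variable function]
Extrema of a multivariable function are found as follows:
[Figure 2: procedure for finding the extrema of a multivariable function]
(The figure names the independent variables $y$; writing them as $x$ reads more naturally.)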

  • Suppose the multivariable function has $m$ variables, i.e. $f(x_{1},x_{2},...,x_{m})$. The system of partial-derivative equations for its extrema then contains $m$ equations, which is tedious to write out, so we use a more compact notation. We collect all $m$ variables into a column vector, $\overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{m} \end{bmatrix}_{m\times 1}$, and rewrite the multivariable function $f(x_{1},x_{2},...,x_{m})$ as a function of a single vector argument, $f(\overrightarrow x)$.
    [Note] Here $\overrightarrow x$ is an $m$-dimensional column vector ($m\times 1$) that collects the independent variables, while $f(\overrightarrow x)$ is the function value, a scalar. Taking its partial derivatives is therefore differentiating a scalar with respect to a vector.

  • We can now define the partial derivative of a scalar function with respect to a vector. There are two conventions:
    (1) Denominator layout:
    $$\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial {f(\overrightarrow x)}}{\partial{x_{1}}}\\ \frac{\partial {f(\overrightarrow x)}}{\partial{x_{2}}}\\ \vdots \\ \frac{\partial {f(\overrightarrow x)}}{\partial{x_{m}}} \end{bmatrix}_{m\times 1}$$
    Here $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}$ is an $m\times 1$ column vector.
    (2) Numerator layout:
    $$\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial {f(\overrightarrow x)}}{\partial{x_{1}}},& \frac{\partial {f(\overrightarrow x)}}{\partial{x_{2}}},& \dots, & \frac{\partial {f(\overrightarrow x)}}{\partial{x_{m}}} \end{bmatrix}_{1\times m}$$
    Here $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}$ is a $1\times m$ row vector.
    Different references use different layouts, and the two are transposes of each other. Mu Li's course uses the numerator layout for the derivative of a scalar with respect to a vector, but to make some results easier to derive, these notes use the denominator layout. Keep in mind that every result in one layout is the transpose of the corresponding result in the other.
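
As a quick sanity check of the two layouts, consider a linear function. This is only an illustrative sketch: the constant vector $\overrightarrow a$ is introduced here for the example and does not appear in the original notes.

```latex
% A linear scalar function of the vector x (a is a hypothetical constant m x 1 vector)
f(\overrightarrow x)=\overrightarrow a^{T}\overrightarrow x=\sum_{i=1}^{m}a_{i}x_{i}
\quad\Longrightarrow\quad
\frac{\partial f(\overrightarrow x)}{\partial x_{i}}=a_{i}

% Denominator layout: stack the partials into an m x 1 column vector
\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}=\overrightarrow a

% Numerator layout: arrange the same partials as a 1 x m row vector
\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}=\overrightarrow a^{T}
```

The entries are identical in both cases; only the shape of the result changes.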

  • [Example] Given $f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}$ with $\overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}$, find $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x}$.
    [Answer] In the denominator layout, $\frac{\partial {f(\overrightarrow x)}}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial {f(\overrightarrow x)}}{\partial{x_{1}}}\\ \frac{\partial {f(\overrightarrow x)}}{\partial{x_{2}}} \end{bmatrix}=\begin{bmatrix} 2x_{1}\\ 2x_{2} \end{bmatrix}$
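
The result above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper `numerical_gradient`, the step size `h`, and the test point are assumptions made for illustration, not part of the course material. It approximates each partial derivative with a central difference and compares it with the analytic answer $[2x_1, 2x_2]$.

```python
def f(x):
    """Scalar function f(x1, x2) = x1^2 + x2^2 from the example."""
    return x[0] ** 2 + x[1] ** 2


def numerical_gradient(func, x, h=1e-6):
    """Approximate every partial derivative of func at x by central differences.

    Returns a flat list [df/dx1, ..., df/dxm]. Read it as a column for the
    denominator layout or as a row for the numerator layout.
    (Hypothetical helper written for this note.)
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # nudge only the i-th variable upward
        x_minus[i] -= h  # and downward
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


point = [1.5, -2.0]                    # arbitrary test point
approx = numerical_gradient(f, point)  # finite-difference estimate
exact = [2 * point[0], 2 * point[1]]   # analytic gradient [2*x1, 2*x2]
print("numerical:", approx)
print("analytic: ", exact)
```

The two lists should agree up to floating-point rounding, since a central difference has no truncation error on a quadratic function.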

Derivative of a Vector Function with Respect to a Vector

Consider a function whose value is itself a vector and whose argument is also a vector (a vector made up of several independent variables):
$$\overrightarrow{f}(\overrightarrow x)=\begin{bmatrix} f_{1}(\overrightarrow x)\\ f_{2}(\overrightarrow x)\\ \vdots \\f_{n}(\overrightarrow x) \end{bmatrix}_{n\times 1},\quad\overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2} \\ \vdots \\ x_{m} \end{bmatrix}$$
Here $\overrightarrow{f}(\overrightarrow x)$ is an $n\times 1$ column vector and $\overrightarrow x$ is an $m\times 1$ column vector.
We define its partial derivative as follows:

  • (1) Denominator layout
    $$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}_{n\times 1}}{\partial\overrightarrow x_{m\times 1}} =\begin{bmatrix} \frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial {x_{1}}}\\ \frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial {x_{2}}}\\ \vdots \\ \frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial {x_{m}}} \end{bmatrix}=\begin{bmatrix} \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{1}}} & \dots &\frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {x_{1}}} \\ \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{2}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{2}}} & \dots &\frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {x_{2}}} \\ \vdots & \vdots & \ddots &\vdots \\ \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{m}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{m}}} & \dots &\frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {x_{m}}} \end{bmatrix}_{m\times n}$$
    Each row $\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{j}}$ is a $1\times n$ row vector, so the result is an $m\times n$ matrix.
    (2) Numerator layout
    $$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}_{n\times 1}}{\partial\overrightarrow x_{m\times 1}} =\begin{bmatrix} \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {\overrightarrow x}}\\ \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {\overrightarrow x}}\\ \vdots \\ \frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {\overrightarrow x}} \end{bmatrix}=\begin{bmatrix} \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{2}}} & \dots &\frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{m}}} \\ \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{2}}} & \dots &\frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{m}}} \\ \vdots & \vdots & \ddots &\vdots \\ \frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {x_{2}}} & \dots &\frac{\partial {{f_{n}}(\overrightarrow x)}}{\partial {x_{m}}} \end{bmatrix}_{n\times m}$$
    Each row $\frac{\partial f_{i}(\overrightarrow x)}{\partial \overrightarrow x}$ is a $1\times m$ row vector, so the result is an $n\times m$ matrix, the transpose of the denominator-layout result.
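
A linear map gives a compact illustration of both definitions. This is only a sketch: the constant matrix $A$ with entries $a_{ij}$ is a symbol introduced here for the example and is not part of the original notes.

```latex
% A linear vector function (A is a hypothetical constant n x m matrix)
\overrightarrow f(\overrightarrow x)=A\overrightarrow x,
\qquad
f_{i}(\overrightarrow x)=\sum_{j=1}^{m}a_{ij}x_{j}
\quad\Longrightarrow\quad
\frac{\partial f_{i}(\overrightarrow x)}{\partial x_{j}}=a_{ij}

% Numerator layout: entry (i, j) is \partial f_i / \partial x_j, an n x m matrix
\frac{\partial\overrightarrow f(\overrightarrow x)}{\partial\overrightarrow x}=A

% Denominator layout: entry (j, i) is \partial f_i / \partial x_j, an m x n matrix
\frac{\partial\overrightarrow f(\overrightarrow x)}{\partial\overrightarrow x}=A^{T}
```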

  • [Example] Given $\overrightarrow{f}(\overrightarrow x)=\begin{bmatrix} f_{1}( \overrightarrow {x})\\ f_{2}( \overrightarrow {x}) \end{bmatrix}=\begin{bmatrix} x_{1}^{2}+x_{2}^{2}+x_{3} \\ x_{3}^{2}+2x_{1} \end{bmatrix}_{2\times 1}$ and $\overrightarrow {x}=\begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix}$, find $\frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial\overrightarrow x}$.
    [Answer] In the denominator layout:
    $$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial\overrightarrow x}=\begin{bmatrix} \frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial {x_{1}}}\\ \frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial {x_{2}}}\\ \frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial {x_{3}}} \end{bmatrix}=\begin{bmatrix} \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{1}}} \\ \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{2}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{2}}} \\ \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{3}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{3}}} \end{bmatrix}=\begin{bmatrix} 2x_{1} &2 \\ 2x_{2} & 0\\ 1 &2x_{3} \end{bmatrix}$$
    In the numerator layout:
    $$\frac{\partial {\overrightarrow{f}(\overrightarrow x)}}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {\overrightarrow x}}\\ \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {\overrightarrow x}} \end{bmatrix}=\begin{bmatrix} \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{2}}}& \frac{\partial {{f_{1}}(\overrightarrow x)}}{\partial {x_{3}}}\\ \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{1}}}& \frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{2}}}&\frac{\partial {{f_{2}}(\overrightarrow x)}}{\partial {x_{3}}} \end{bmatrix}=\begin{bmatrix} 2x_{1} & 2x_{2} & 1\\ 2 & 0 &2x_{3} \end{bmatrix}$$
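
Both matrices in this example can be reproduced numerically. The sketch below uses only the Python standard library; the helpers `jacobian_numerator` and `transpose`, the step size `h`, and the test point are hypothetical names and values chosen for illustration. It builds the numerator-layout matrix column by column with central differences, then transposes it to obtain the denominator-layout matrix.

```python
def f_vec(x):
    """Vector function from the example: [x1^2 + x2^2 + x3, x3^2 + 2*x1]."""
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 + x3, x3 ** 2 + 2 * x1]


def jacobian_numerator(func, x, h=1e-6):
    """Numerator-layout derivative (n x m): row i is f_i, column j is x_j.

    Hypothetical helper for this note; uses central differences.
    """
    n, m = len(func(x)), len(x)
    jac = [[0.0] * m for _ in range(n)]
    for j in range(m):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*mat)]


point = [1.0, 2.0, 3.0]                        # arbitrary test point (x1, x2, x3)
num_layout = jacobian_numerator(f_vec, point)  # 2 x 3
den_layout = transpose(num_layout)             # 3 x 2

# Analytic numerator-layout result [[2*x1, 2*x2, 1], [2, 0, 2*x3]] at the test point
expected_num = [[2 * point[0], 2 * point[1], 1.0],
                [2.0, 0.0, 2 * point[2]]]

print("numerator layout:  ", num_layout)
print("denominator layout:", den_layout)
```

Computing one layout and transposing it mirrors the statement that the two conventions differ only by a transpose.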
