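Linear Algebra Implementation p4
This series contains my study notes on Li Mu's deep learning course, and in places I add material that the lectures do not cover.
This section is the second part.
This part draws on the video "Matrix Derivative Operations" (矩阵的导数运算).
To make the distinction easier to see, I do not set vectors in bold as in print. Instead, following the handwritten convention, I put an arrow over the letter, e.g. $\overrightarrow x$.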
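For a function of one variable, the usual way to find its extrema is a three-step process. First set the derivative to zero (the tangent line has slope 0 at such a point) and solve for the stationary points. Then use the definition of an extremum, or one of its corollaries such as the second-derivative test, to decide whether each stationary point really is an extremum.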
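Here is a small worked instance of this procedure. The function below is chosen purely for illustration and does not come from the original notes:

$$f(x)=x^{2}-2x,\qquad f'(x)=2x-2=0\ \Rightarrow\ x_{0}=1,\qquad f''(x_{0})=2>0$$

The second derivative is positive at the stationary point, so $x_{0}=1$ is a local minimum point, with minimum value $f(1)=-1$.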
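For a function of several variables, the method is analogous. Set every first-order partial derivative to zero, solve the resulting system of equations for the stationary points, and then check each of them:
$$\frac{\partial f}{\partial x_{1}}=0,\quad \frac{\partial f}{\partial x_{2}}=0,\quad \dots,\quad \frac{\partial f}{\partial x_{m}}=0$$
(The original figure writes the independent variables as $y$; writing them as $x$ reads more naturally.)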
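Suppose the function has $m$ variables, i.e. $f(x_{1},x_{2},\dots,x_{m})$. The system of partial-derivative equations for its extrema then contains $m$ equations, which is cumbersome to write out. To express it more compactly, we collect all $m$ variables into a single column vector
$$\overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{m} \end{bmatrix}_{m\times 1}$$
and regard the multivariable function $f(x_{1},x_{2},\dots,x_{m})$ as a function of one vector argument, $f(\overrightarrow x)$. Once the derivative with respect to a vector is defined, the $m$ equations can be written together as the single vector equation $\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}=\overrightarrow 0$.
[Note] Here $\overrightarrow x$ is an $m$-dimensional column vector ($m\times 1$) built by collecting the independent variables. $f(\overrightarrow x)$ is the function value, which is a scalar. Taking its partial derivative is therefore a case of differentiating a scalar with respect to a vector.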
We can then define the derivative of a scalar-valued function with respect to a vector. There are two conventions:
(1) Denominator layout:
$$\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial f(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial f(\overrightarrow x)}{\partial x_{2}}\\ \vdots \\ \frac{\partial f(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}_{m\times 1}$$
where $\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}$ is an $m\times 1$ column vector.
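The sketch below makes the denominator layout concrete. It is plain Python with no third-party libraries. It estimates each partial derivative with a central difference and stacks the results into an $m\times 1$ column, stored as a list of one-element lists. The helper name `numerical_gradient_denominator`, the step size `h`, and the test function are illustrative assumptions, not anything from the course.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_gradient_denominator(
    f: Callable[[Vector], float], x: Vector, h: float = 1e-6
) -> Matrix:
    """Estimate df/dx in the denominator layout: an m x 1 column vector."""
    column: Matrix = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Central difference: (f(x + h e_i) - f(x - h e_i)) / (2h) approximates df/dx_i
        column.append([(f(x_plus) - f(x_minus)) / (2 * h)])
    return column


if __name__ == "__main__":
    # Hypothetical test function f(x1, x2, x3) = x1 * x2 + x3**2 (scalar output)
    def f(v: Vector) -> float:
        return v[0] * v[1] + v[2] ** 2

    grad = numerical_gradient_denominator(f, [1.0, 2.0, 3.0])
    print(len(grad), len(grad[0]))  # 3 1  (an m x 1 column)
    # The analytic gradient is [x2, x1, 2*x3]^T = [2, 1, 6]^T at this point
    print([round(g[0], 6) for g in grad])  # [2.0, 1.0, 6.0]
```

Central differences are only an approximation in general. For this bilinear-plus-quadratic test function the truncation error vanishes, so only floating-point rounding remains.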
(2) Numerator layout:
$$\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial f(\overrightarrow x)}{\partial x_{1}}, & \frac{\partial f(\overrightarrow x)}{\partial x_{2}}, & \dots, & \frac{\partial f(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}_{1\times m}$$
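where $\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}$ is a $1\times m$ row vector.
Different references use different layouts, and the two layouts are transposes of each other. Li Mu's course uses the numerator layout for the derivative of a scalar with respect to a vector. Here we use the denominator layout instead, because it makes some results easier to derive. Keep in mind that a result stated in one layout becomes its transpose in the other.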
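The transpose relationship can be checked numerically. The sketch below again uses plain Python with nested lists standing in for matrices. It computes the same central-difference partial derivatives in both layouts and confirms that transposing the $m\times 1$ column yields the $1\times m$ row. All helper names and the test function are illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def partials(f: Callable[[Vector], float], x: Vector, h: float = 1e-6) -> Vector:
    """Central-difference estimates of df/dx_1, ..., df/dx_m."""
    out: Vector = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        out.append((f(x_plus) - f(x_minus)) / (2 * h))
    return out


def denominator_layout(f: Callable[[Vector], float], x: Vector) -> Matrix:
    # m x 1 column vector: one single-element row per variable
    return [[p] for p in partials(f, x)]


def numerator_layout(f: Callable[[Vector], float], x: Vector) -> Matrix:
    # 1 x m row vector: a single row holding every partial derivative
    return [partials(f, x)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


if __name__ == "__main__":
    # Hypothetical test function f(x1, x2, x3) = x1**2 + 3*x2 - x3
    def f(v: Vector) -> float:
        return v[0] ** 2 + 3 * v[1] - v[2]

    x = [1.0, 2.0, 3.0]
    col = denominator_layout(f, x)
    row = numerator_layout(f, x)
    print(transpose(col) == row)  # True
```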
[Example] Given $f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}$ with $\overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}$, find $\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x}$.
[Answer] In the denominator layout:
$$\frac{\partial f(\overrightarrow x)}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial f(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial f(\overrightarrow x)}{\partial x_{2}} \end{bmatrix}=\begin{bmatrix} 2x_{1}\\ 2x_{2} \end{bmatrix}$$
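Here is a quick numerical sanity check of this answer. The sketch compares the hand-derived column $[2x_1,\ 2x_2]^{T}$ with central-difference estimates at the point $(3, -1.5)$. The test point, the step size, and the function names are assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def f(x: Vector) -> float:
    # The example function f(x1, x2) = x1^2 + x2^2
    return x[0] ** 2 + x[1] ** 2


def analytic_gradient(x: Vector) -> Matrix:
    # The answer derived above, as a 2 x 1 column: [2*x1, 2*x2]^T
    return [[2 * x[0]], [2 * x[1]]]


def numeric_gradient(x: Vector, h: float = 1e-6) -> Matrix:
    # Central-difference estimate of each partial derivative, stacked as a column
    column: Matrix = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        column.append([(f(x_plus) - f(x_minus)) / (2 * h)])
    return column


if __name__ == "__main__":
    x = [3.0, -1.5]  # arbitrary test point
    print(analytic_gradient(x))  # [[6.0], [-3.0]]
    agree = all(
        abs(a[0] - b[0]) < 1e-5
        for a, b in zip(analytic_gradient(x), numeric_gradient(x))
    )
    print(agree)  # True
```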
Now consider a function whose value is itself a vector and whose argument is also a vector, made up of several independent variables:
$$\overrightarrow{f}(\overrightarrow x)=\begin{bmatrix} f_{1}(\overrightarrow x)\\ f_{2}(\overrightarrow x)\\ \vdots \\ f_{n}(\overrightarrow x) \end{bmatrix}_{n\times 1},\qquad \overrightarrow x=\begin{bmatrix} x_{1}\\ x_{2} \\ \vdots \\ x_{m} \end{bmatrix}_{m\times 1}$$
Here $\overrightarrow{f}(\overrightarrow x)$ is an $n\times 1$ column vector and $\overrightarrow x$ is an $m\times 1$ column vector.
We define its derivative as follows:
(1) Denominator layout:
$$\frac{\partial \overrightarrow{f}(\overrightarrow x)_{n\times 1}}{\partial\overrightarrow x_{m\times 1}} =\begin{bmatrix} \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{2}}\\ \vdots \\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}=\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{1}} \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{m}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{m}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}_{m\times n}$$
Each entry $\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{i}}$ of the middle column is itself the $1\times n$ row vector $\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{i}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{i}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{i}} \end{bmatrix}$. Stacking these $m$ rows gives an $m\times n$ matrix.
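The denominator-layout matrix can be sketched numerically in the same spirit. Perturb one input $x_i$ at a time and record how every output $f_1,\dots,f_n$ changes; that fills row $i$ of an $m\times n$ matrix. The helper `jacobian_denominator`, the step `h`, and the test function $\overrightarrow f(x_1,x_2)=[x_1x_2,\ x_1+x_2,\ x_2^{2}]^{T}$ are hypothetical choices for illustration.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def jacobian_denominator(
    f: Callable[[Vector], Vector], x: Vector, h: float = 1e-6
) -> Matrix:
    """Estimate d f / d x in the denominator layout: an m x n matrix."""
    rows: Matrix = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        # Row i holds [df_1/dx_i, df_2/dx_i, ..., df_n/dx_i]
        rows.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    return rows


if __name__ == "__main__":
    # Hypothetical f(x1, x2) = [x1 * x2, x1 + x2, x2**2]: m = 2 inputs, n = 3 outputs
    def f(v: Vector) -> Vector:
        return [v[0] * v[1], v[0] + v[1], v[1] ** 2]

    jac = jacobian_denominator(f, [2.0, 5.0])
    print(len(jac), len(jac[0]))  # 2 3  (an m x n matrix)
```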
(2) Numerator layout:
$$\frac{\partial \overrightarrow{f}(\overrightarrow x)_{n\times 1}}{\partial\overrightarrow x_{m\times 1}} =\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial \overrightarrow x}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial \overrightarrow x}\\ \vdots \\ \frac{\partial f_{n}(\overrightarrow x)}{\partial \overrightarrow x} \end{bmatrix}=\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{m}} \\ \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{m}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{2}} & \dots & \frac{\partial f_{n}(\overrightarrow x)}{\partial x_{m}} \end{bmatrix}_{n\times m}$$
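A numerator-layout sketch stores $\partial f_i/\partial x_j$ at row $i$, column $j$, which gives an $n\times m$ matrix. Transposing it recovers the $m\times n$ denominator-layout matrix. The helper names, the step size, and the test function $\overrightarrow f(x_1,x_2)=[x_1x_2,\ x_1+x_2,\ x_2^{2}]^{T}$ are illustrative assumptions.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def jacobian_numerator(
    f: Callable[[Vector], Vector], x: Vector, h: float = 1e-6
) -> Matrix:
    """Estimate d f / d x in the numerator layout: n x m, entry (i, j) = df_i/dx_j."""
    n, m = len(f(x)), len(x)
    jac: Matrix = [[0.0] * m for _ in range(n)]
    for j in range(m):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            # Column j collects how each output f_i responds to x_j
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


if __name__ == "__main__":
    # Hypothetical f(x1, x2) = [x1 * x2, x1 + x2, x2**2]: m = 2 inputs, n = 3 outputs
    def f(v: Vector) -> Vector:
        return [v[0] * v[1], v[0] + v[1], v[1] ** 2]

    jac = jacobian_numerator(f, [2.0, 5.0])
    print(len(jac), len(jac[0]))  # 3 2  (n x m)
    den = transpose(jac)  # the m x n denominator-layout matrix
    print(len(den), len(den[0]))  # 2 3
```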
[Example] Given
$$\overrightarrow{f}(\overrightarrow x)=\begin{bmatrix} f_{1}(\overrightarrow x)\\ f_{2}(\overrightarrow x) \end{bmatrix}=\begin{bmatrix} x_{1}^{2}+x_{2}^{2}+x_{3} \\ x_{3}^{2}+2x_{1} \end{bmatrix}_{2\times 1},\qquad \overrightarrow x=\begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix},$$
find $\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial\overrightarrow x}$.
[Answer] In the denominator layout:
$$\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial\overrightarrow x}=\begin{bmatrix} \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{1}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{2}}\\ \frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial x_{3}} \end{bmatrix}=\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} \\ \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{3}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{3}} \end{bmatrix}=\begin{bmatrix} 2x_{1} & 2 \\ 2x_{2} & 0\\ 1 & 2x_{3} \end{bmatrix}$$
In the numerator layout:
$$\frac{\partial \overrightarrow{f}(\overrightarrow x)}{\partial\overrightarrow x} =\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial \overrightarrow x}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial \overrightarrow x} \end{bmatrix}=\begin{bmatrix} \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{1}(\overrightarrow x)}{\partial x_{3}}\\ \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow x)}{\partial x_{3}} \end{bmatrix}=\begin{bmatrix} 2x_{1} & 2x_{2} & 1\\ 2 & 0 & 2x_{3} \end{bmatrix}$$
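Both answers can be spot-checked numerically. The sketch below evaluates the hand-derived matrices at the point $\overrightarrow x=(1,2,3)$. There the numerator-layout answer becomes $\begin{bmatrix}2&4&1\\2&0&6\end{bmatrix}$ and the denominator-layout answer is its transpose. The sketch compares both with central-difference estimates. The test point and helper names are assumptions for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def f(x: Vector) -> Vector:
    # The example: f1 = x1^2 + x2^2 + x3, f2 = x3^2 + 2*x1
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 + x3, x3 ** 2 + 2 * x1]


def numerator_jacobian(x: Vector, h: float = 1e-6) -> Matrix:
    # n x m matrix with entry (i, j) = df_i/dx_j, via central differences
    n, m = len(f(x)), len(x)
    jac = [[0.0] * m for _ in range(n)]
    for j in range(m):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def all_close(a: Matrix, b: Matrix, tol: float = 1e-5) -> bool:
    return all(abs(p - q) < tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


if __name__ == "__main__":
    x1, x2, x3 = 1.0, 2.0, 3.0  # arbitrary test point
    num = numerator_jacobian([x1, x2, x3])
    den = transpose(num)
    # The hand-derived answers from the example, evaluated at the test point
    expected_num = [[2 * x1, 2 * x2, 1.0], [2.0, 0.0, 2 * x3]]
    expected_den = [[2 * x1, 2.0], [2 * x2, 0.0], [1.0, 2 * x3]]
    print(all_close(num, expected_num), all_close(den, expected_den))  # True True
```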