What good is seeing through life if you can't see through mathematics!

Unless stated otherwise, all vectors in this course are column vectors, and all vectors and matrices are real-valued.
For vectors $x, y \in R^n$, the inner product is

$$\langle x, y\rangle = x^T y = \sum_{i=1}^n x_i y_i$$
The inner product is symmetric: $x^T y = y^T x$.

For a square matrix $A\in R^{n\times n}$, the trace is the sum of the diagonal entries:

$$tr(A) = \sum_{i=1}^n A_{ii}$$
The trace satisfies:

$$tr(A^T) = tr(A) \qquad tr(A+B) = tr(A) + tr(B)$$

$$tr(AB) = tr(BA) \qquad tr(ABC) = tr(BCA) = tr(CAB)$$
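A quick numeric sanity check of the cyclic property (illustrative values, not from the notes):

```python
import numpy as np

# Illustrative check: the trace is cyclic, tr(AB) = tr(BA),
# even though AB != BA in general.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
print(np.allclose(np.trace(A @ B), np.trace(B @ A)))  # True
```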
The inner product of two matrices of the same size is defined as the sum of their element-wise products:

$$\langle A, B\rangle = \sum_{i,j} A_{ij} B_{ij} = tr(A^T B)$$
A function $f: R^n \rightarrow R$ with ${\rm dom}\, f = R^n$ is called a norm if it satisfies the following conditions:
- Non-negativity: for every $x\in R^n$, $f(x)\ge 0$, and $f(x)=0$ only if $x=0$;
- Homogeneity: for every $x\in R^n$ and $t\in R$, $f(tx)=|t|\,f(x)$;
- Triangle inequality: for every $x,y\in R^n$, $f(x+y)\le f(x) + f(y)$.
We write $f(x) = \Vert x\Vert$ for a norm. Some common examples:
$$\Vert x\Vert_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$$

$$\Vert x\Vert_1 = |x_1| + \cdots + |x_n|$$

$$\Vert x\Vert_2 = (|x_1|^2 + \cdots + |x_n|^2)^{1/2}$$

$$\Vert x\Vert_\infty = \max\{|x_1|, \cdots, |x_n|\}$$
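These formulas can be checked against `np.linalg.norm` via its `ord` parameter (illustrative vector, not from the notes):

```python
import numpy as np

# Check the norm formulas above against np.linalg.norm.
x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, ord=1))       # 7.0 = |3| + |-4| + |0|
print(np.linalg.norm(x, ord=2))       # 5.0 = sqrt(9 + 16)
print(np.linalg.norm(x, ord=np.inf))  # 4.0 = max(|3|, |-4|, |0|)
```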
The Frobenius norm of a matrix $A\in R^{m\times n}$ is defined as

$$\Vert A\Vert_F = \left(tr(A^T A)\right)^{1/2} = \left(\sum_{i=1}^m \sum_{j=1}^n A_{ij}^2\right)^{1/2}$$
Derivative of a scalar with respect to a vector:

$$\left( \frac{\partial a}{\partial \boldsymbol x} \right)_i = \frac{\partial a}{\partial x_i}$$
Derivative of a scalar with respect to a matrix:

$$\left( \frac{\partial a}{\partial \boldsymbol X} \right)_{ij} = \frac{\partial a}{\partial X_{ij}}$$
First-order derivative (gradient):

$$(\nabla f(\boldsymbol x))_i = \frac{\partial f(\boldsymbol x)}{\partial x_i}$$

Second-order derivative (Hessian):

$$(\nabla^2 f(\boldsymbol x))_{ij} = \frac{\partial^2 f(\boldsymbol x)}{\partial x_i\,\partial x_j}$$
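A minimal numeric sketch (test function assumed, not from the notes): for $f(x) = x^T A x$ with symmetric $A$, the gradient is $2Ax$ and the Hessian is the constant matrix $2A$; here we verify the gradient against central finite differences.

```python
import numpy as np

# For f(x) = x^T A x with symmetric A: gradient 2 A x, Hessian 2 A.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                      # symmetrize
x = rng.standard_normal(3)

f = lambda v: v @ A @ v
grad_analytic = 2 * A @ x

eps = 1e-6                             # central finite differences
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # True
```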
Derivative of a vector with respect to a vector:

We use the numerator layout: the numerator is a column vector and the denominator is a row vector. Differentiating each element of the numerator gives a row, and stacking these rows gives a matrix. For example, with $f(x)\in R^2$ and $x\in R^3$, both column vectors:

$$\frac{\partial f}{\partial x} = \frac{\partial \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}}{\partial \begin{bmatrix} x_1, x_2, x_3 \end{bmatrix}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} \end{bmatrix}$$
When reading such expressions, it helps to treat the denominator as a single (row) object.
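A small numeric sketch (the function $f$ below is an assumed example): the numerator-layout Jacobian of $f: R^3\rightarrow R^2$ has shape $2\times 3$, one row per component of $f$.

```python
import numpy as np

# Numerator-layout Jacobian of f(x) = (x1 + x2, x2 * x3),
# approximated by central finite differences.
def f(x):
    return np.array([x[0] + x[1], x[1] * x[2]])

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6
J = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(J.shape)  # (2, 3): one row per component of f, one column per x_j
print(J)        # [[1. 1. 0.], [0. 3. 2.]]
```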
Chain rule:

If $x\rightarrow y \rightarrow f$, where $x\in R^n$, $y\in R$, $f\in R$, then

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial x}$$
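For instance (a small illustrative example, not in the original notes): with $y = a^T x$ and $f = y^2$,

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial x} = 2y \cdot a = 2(a^T x)\,a$$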
Matrix differential rules:

$${\rm d}(ABC) = {\rm d}A\cdot BC + A\cdot{\rm d}B\cdot C + AB\cdot {\rm d}C$$

$${\rm d}\,tr(X) = tr({\rm d}X)$$
Differential of an element-wise function:

$$d\sigma(x) = \sigma'(x)\odot dx \qquad d\sigma(X) = \sigma'(X)\odot dX$$

where $\sigma$ is an element-wise function, $\sigma'$ its element-wise derivative, and $\odot$ denotes element-wise multiplication.
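A sketch using the logistic sigmoid as the element-wise function (an assumed example; $\sigma'(x) = \sigma(x)(1-\sigma(x))$ is a standard identity):

```python
import numpy as np

# d(sigma(x)) ≈ sigma'(x) ⊙ dx for a small perturbation dx,
# where sigma'(x) = sigma(x) * (1 - sigma(x)) for the logistic sigmoid.
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

x = np.array([-1.0, 0.0, 2.0])
dx = 1e-6 * np.array([1.0, -2.0, 0.5])    # small element-wise perturbation
lhs = sigmoid(x + dx) - sigmoid(x)        # d sigma(x)
rhs = sigmoid(x) * (1 - sigmoid(x)) * dx  # sigma'(x) ⊙ dx
print(np.allclose(lhs, rhs, atol=1e-10))  # True (to first order)
```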
Computing derivatives using differentials and the trace.
Review (scalar with respect to scalar): for $x\in R$ and $f(x)\in R$, if $df = a\,dx$, then $\frac{df}{dx} = a$.
Review (generalization to several variables): for $f(x,y) = x^2 + xy + y^2$, the total differential is $df = 2x\,dx + y\,dx + x\,dy + 2y\,dy = (2x + y)dx + (x + 2y)dy$, so $\frac{\partial f}{\partial x} = 2x + y$ and $\frac{\partial f}{\partial y} = x + 2y$.
Scalar with respect to a vector: for $x\in R^n$ and $f(x)\in R$, if $df = a^T dx$, then $\frac{df}{dx} = a$.
Generalization: for $x, y\in R^n$ and $f(x,y)\in R$, if $df = a^T dx + b^T dy$, then $\frac{\partial f}{\partial x} = a$ and $\frac{\partial f}{\partial y} = b$.
Scalar with respect to a matrix: for $X\in R^{m\times n}$ and $f(X)\in R$, if $df = tr(A^T dX)$, then $\frac{\partial f}{\partial X} = A$.

Generalization: for $X, Y\in R^{m\times n}$ and $f(X,Y)\in R$, if $df = tr(A^T dX) + tr(B^T dY)$, then $\frac{\partial f}{\partial X} = A$ and $\frac{\partial f}{\partial Y} = B$.
Example: for $f(x) = \Vert Ax - b\Vert_2^2 = (Ax-b)^T(Ax-b)$,

$$df(x) = (A\,dx)^T(Ax - b) + (Ax-b)^T A\,dx = tr\left((2A^T (Ax - b))^T dx\right)$$

so

$$\frac{df(x)}{dx} = 2A^T (Ax - b)$$
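A numeric check of this result (randomly generated $A$, $b$, $x$, purely illustrative):

```python
import numpy as np

# Verify df/dx = 2 A^T (A x - b) for f(x) = ||A x - b||_2^2
# against central finite differences.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

f = lambda v: np.sum((A @ v - b) ** 2)
grad_analytic = 2 * A.T @ (A @ x - b)

eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```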
Example: for $f(A) = tr(A^T A)$,

$$\begin{aligned} df(A) &= d\,tr(A^T A)\\ &= tr(dA^T\, A) + tr(A^T dA) \\ &= tr(A^T dA) + tr(A^T dA) \\ &= tr(2A^T dA) \end{aligned}$$

so $\frac{df(A)}{dA} = 2A$.
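A corresponding numeric check (randomly generated $A$, purely illustrative), perturbing one entry at a time:

```python
import numpy as np

# Verify df/dA = 2A for f(A) = tr(A^T A) = sum of squared entries.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
f = lambda M: np.trace(M.T @ M)

eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad_fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
print(np.allclose(grad_fd, 2 * A, atol=1e-6))  # True
```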
```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 1, 1])
A = np.random.randint(0, 9, (3, 3))
B = np.random.randint(0, 9, (3, 3))

print(x @ y)                          # inner product <x, y> = x^T y
print(np.trace(A))                    # trace of A
print(np.trace(A.T @ B))              # matrix inner product <A, B>
print(np.linalg.norm(x, ord=np.inf))  # infinity norm of x
print(np.linalg.norm(A, ord='fro'))   # Frobenius norm of A
```
```python
import numpy as np
import matplotlib.pyplot as plt

# Plot the unit "ball" of the p = 1/2 quasi-norm (not a true norm: it
# violates the triangle inequality). Curve: |x|^(1/2) + |y|^(1/2) = 1.
x = np.linspace(-1, 1, 199)  # odd number of points so the cusp at x = 0 is sampled
y05 = (1 - abs(x)**(1/2))**2
plt.figure(figsize=(4, 4))
plt.plot(x, y05, 'r', x, -y05, 'r')
plt.xlim([-1.2, 1.2]); plt.ylim([-1.2, 1.2])
plt.axis('square')
plt.axis('off')
plt.show()
```
Strictly speaking, machine learning spans many categories; one major framework is the following.

As an example, consider Boston house-price prediction with a linear regression model.

Depending on the data or the underlying physical relationships, a wide variety of mathematical models can be built.
Some problems have no constraints and are called unconstrained problems. For example (a closed-form sketch for the first two follows this list):

Linear regression

$$\mathop{\rm minimize}\limits_w \quad \Vert y - Xw\Vert_2^2$$
Ridge regression

$$\mathop{\rm minimize}\limits_w \quad \Vert y - Xw\Vert_2^2 + \lambda \Vert w\Vert_2^2$$
LASSO

$$\mathop{\rm minimize}\limits_w \quad \Vert y - Xw\Vert_2^2 + \lambda \Vert w\Vert_1$$
Logistic regression

Feedforward (BP) neural networks
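A minimal sketch on synthetic data (the data here is assumed, not the course's dataset): linear regression and ridge regression have closed-form solutions $w_{\rm ols} = (X^T X)^{-1} X^T y$ and $w_{\rm ridge} = (X^T X + \lambda I)^{-1} X^T y$.

```python
import numpy as np

# Closed-form solutions of the first two unconstrained problems above.
rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(50)

w_ols = np.linalg.solve(X.T @ X, X.T @ y)
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
print(w_ols)    # close to w_true
print(w_ridge)  # slightly shrunk toward zero
```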
Some problems have equality or inequality constraints and are called constrained problems. For example (a PCA sketch follows this list):

Linear discriminant analysis (LDA)

$$\begin{aligned} \mathop{\rm minimize}\limits_w &\quad -w^T S_b w \\ {\rm subject\ to} &\quad w^T S_w w = c \end{aligned}$$
Support vector machine (SVM)

$$\begin{aligned} \mathop{\rm minimize}\limits_w &\quad \frac{1}{2}\Vert w\Vert^2 \\ {\rm subject\ to} &\quad y_i(w^T x_i + b) \ge 1, \quad i=1,\dots,m \end{aligned}$$
Principal component analysis (PCA)

$$\begin{aligned} \mathop{\rm minimize}\limits_P &\quad \sum_{i=1}^m \Vert PP^T x_i - x_i\Vert_2^2 \\ {\rm subject\ to} &\quad P^T P = I \end{aligned}$$
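A minimal sketch on synthetic data (assumed, purely illustrative): this constrained problem is solved by taking the columns of $P$ to be the top eigenvectors of the data's covariance matrix (here we center the data first, as is standard).

```python
import numpy as np

# PCA via eigendecomposition of the sample covariance matrix.
rng = np.random.default_rng(4)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))
Xc = X - X.mean(axis=0)                 # center the data

C = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
P = eigvecs[:, -2:]                     # top-2 principal directions
print(np.allclose(P.T @ P, np.eye(2)))  # True: constraint P^T P = I holds
```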
Tasks of machine learning:

Methods for solving for the parameters: