【Machine Learning Notes 05】Jacobian Matrix & Hessian Matrix

Jacobian Matrix

The Jacobian matrix is the derivative of a vector-valued function with respect to a vector: its entries are the first-order partial derivatives. Suppose $F: R^n \to R^m$, i.e., $F$ is a mapping from $n$-dimensional Euclidean space to $m$-dimensional Euclidean space.
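
Written out in full, for $F = (f_1, f_2, \cdots, f_m)$ in variables $x_1, x_2, \cdots, x_n$, the entry in row $i$ and column $j$ of the Jacobian is the partial derivative of the $i$-th component function with respect to the $j$-th variable:

$$J_F=\begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix} \in R^{m \times n}$$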

Examples:

1. Transformation from spherical coordinates to Cartesian coordinates. The mapping can be written as:

$$F: R^{+} \times [0, \pi] \times [0, 2\pi] \to R^3$$

The Cartesian coordinates are:
$$x_1 = r\sin\theta\cos\phi$$
$$x_2 = r\sin\theta\sin\phi$$
$$x_3 = r\cos\theta$$
Its Jacobian matrix is:
$$J_F(r, \theta, \phi)=\begin{bmatrix} \dfrac{\partial x_1}{\partial r} & \dfrac{\partial x_1}{\partial \theta} & \dfrac{\partial x_1}{\partial \phi} \\ \dfrac{\partial x_2}{\partial r} & \dfrac{\partial x_2}{\partial \theta} & \dfrac{\partial x_2}{\partial \phi} \\ \dfrac{\partial x_3}{\partial r} & \dfrac{\partial x_3}{\partial \theta} & \dfrac{\partial x_3}{\partial \phi} \end{bmatrix}=\begin{bmatrix} \sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\ \sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\ \cos\theta & -r\sin\theta & 0 \end{bmatrix}$$
(The Jacobians of both examples are verified symbolically in the sketch after example 2.)
2. Consider a function $F: R^3 \to R^4$ defined as follows (a Jacobian matrix is not necessarily square):
$$y_1 = x_1$$
$$y_2 = 5x_3$$
$$y_3 = 4x_2^2 - 2x_3$$
$$y_4 = x_3\sin x_1$$

$$J_F(x_1, x_2, x_3)=\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_1}{\partial x_3} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \dfrac{\partial y_2}{\partial x_3} \\ \dfrac{\partial y_3}{\partial x_1} & \dfrac{\partial y_3}{\partial x_2} & \dfrac{\partial y_3}{\partial x_3} \\ \dfrac{\partial y_4}{\partial x_1} & \dfrac{\partial y_4}{\partial x_2} & \dfrac{\partial y_4}{\partial x_3} \end{bmatrix}=\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 5 \\ 0 & 8x_2 & -2 \\ x_3\cos x_1 & 0 & \sin x_1 \end{bmatrix}$$
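
Matrices like these are easy to double-check programmatically. Below is a minimal SymPy sketch (assuming SymPy is available; the symbol names simply mirror the formulas above) that reproduces both Jacobians.

```python
import sympy as sp

# Example 1: spherical -> Cartesian coordinates
r, theta, phi = sp.symbols('r theta phi', positive=True)
F1 = sp.Matrix([
    r * sp.sin(theta) * sp.cos(phi),   # x1
    r * sp.sin(theta) * sp.sin(phi),   # x2
    r * sp.cos(theta),                 # x3
])
J1 = F1.jacobian([r, theta, phi])
sp.pprint(sp.simplify(J1))
# The determinant recovers the familiar spherical volume element r^2 * sin(theta)
print(sp.simplify(J1.det()))

# Example 2: F: R^3 -> R^4, giving a non-square 4x3 Jacobian
x1, x2, x3 = sp.symbols('x1 x2 x3')
F2 = sp.Matrix([
    x1,                  # y1
    5 * x3,              # y2
    4 * x2**2 - 2 * x3,  # y3
    x3 * sp.sin(x1),     # y4
])
J2 = F2.jacobian([x1, x2, x3])
sp.pprint(J2)
```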

Hessian Matrix

The Hessian matrix is the $n \times n$ symmetric matrix formed by the second-order partial derivatives of the objective function $f$ at a point $X$ (symmetric provided the second partial derivatives are continuous).

The gradient of $f$ at $X^{(0)}$ is:
$$\nabla f(X^{(0)})=\left[\dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \cdots, \dfrac{\partial f}{\partial x_n}\right]^T$$

The Hessian matrix of $f$ at $X^{(0)}$ is:

$$G(X^{(0)})=\begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
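
As a concrete illustration, the gradient and Hessian of a small two-variable function can be computed symbolically. The function below is made up purely for illustration; `sympy.hessian` builds exactly the matrix of second-order partial derivatives defined above.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# An arbitrary smooth objective function (illustrative only)
f = x1**3 + 2 * x1 * x2 + x2**2

# Gradient: vector of first-order partial derivatives
grad = [sp.diff(f, v) for v in (x1, x2)]

# Hessian: symmetric matrix of second-order partial derivatives
G = sp.hessian(f, (x1, x2))

print(grad)  # [3*x1**2 + 2*x2, 2*x1 + 2*x2]
print(G)     # Matrix([[6*x1, 2], [2, 2]])
```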

At a stationary point $X^{(0)}$ (i.e., where $\nabla f(X^{(0)}) = 0$), the following conclusions hold (see the sketch after this list):

When $G$ is positive definite, $f$ has a local minimum at $X^{(0)}$.
When $G$ is negative definite, $f$ has a local maximum at $X^{(0)}$.
When $G$ is indefinite, $X^{(0)}$ is not an extremum (it is a saddle point).
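
In practice, definiteness is usually checked through the eigenvalues of $G$: all positive means positive definite, all negative means negative definite, and mixed signs means indefinite. A minimal NumPy sketch (the sample Hessian values are made up for illustration):

```python
import numpy as np

def classify_critical_point(G, tol=1e-10):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eigvals = np.linalg.eigvalsh(G)        # eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "local minimum (G positive definite)"
    if np.all(eigvals < -tol):
        return "local maximum (G negative definite)"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point (G indefinite)"
    return "inconclusive (G is semi-definite)"

# Example Hessians (illustrative values only)
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 3.0]])))   # local minimum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -3.0]])))  # saddle point
```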

The Hessian matrix appears in the second-order multivariate Taylor expansion of $f$, which is why it can be used in Newton's method for finding extrema.
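
Concretely, the second-order Taylor expansion of $f$ around the current iterate $X^{(k)}$ is

$$f(X) \approx f(X^{(k)}) + \nabla f(X^{(k)})^T (X - X^{(k)}) + \dfrac{1}{2}(X - X^{(k)})^T G(X^{(k)}) (X - X^{(k)})$$

and setting the gradient of this quadratic model to zero gives the Newton update

$$X^{(k+1)} = X^{(k)} - G(X^{(k)})^{-1}\nabla f(X^{(k)})$$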

Remark:

Positive definite matrix: an $n \times n$ square matrix $A$ is positive definite if $x^TAx>0$ for every nonzero vector $x$. If $A$ is positive definite, all of its eigenvalues are positive and all of its leading principal minors are positive.
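
Numerically, a common way to test positive definiteness is to attempt a Cholesky factorization (it succeeds exactly when the matrix is numerically positive definite) or to check that all eigenvalues are positive. A small sketch with made-up matrices:

```python
import numpy as np

def is_positive_definite(A):
    """Return True if the symmetric matrix A is positive definite."""
    try:
        np.linalg.cholesky(A)   # succeeds only for (numerically) positive definite A
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3 -> positive definite
B = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3 -> not positive definite
print(is_positive_definite(A), is_positive_definite(B))
```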
