Machine Learning - Whiteboard Derivation P2_2

  • Multivariate Gaussian distribution

Multivariate Gaussian distribution

$$p(x) = \frac{1}{(2\pi)^{\frac{p}{2}} \, |\Sigma|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)$$
$x \in \mathbb{R}^p$ is a random variable (a random vector).
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}_{p \times p}$$
In general $\Sigma$ is positive semi-definite (and it must be positive definite for $\Sigma^{-1}$, and hence the density above, to exist).
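As a quick numerical sanity check of the density formula (a sketch that is not part of the original notes; NumPy/SciPy and all concrete values are my own illustrative choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

p = 3
rng = np.random.default_rng(0)
mu = rng.normal(size=p)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)   # symmetric positive definite by construction
x = rng.normal(size=p)

# Density written exactly as in the formula above.
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff              # (x-mu)^T Sigma^{-1} (x-mu)
norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
pdf_manual = np.exp(-0.5 * quad) / norm_const

# Compare against scipy's reference implementation.
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(np.isclose(pdf_manual, pdf_scipy))               # True
```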
$(x-\mu)^T \Sigma^{-1} (x-\mu)$: the Mahalanobis distance (strictly, its square) between $x$ and $\mu$.
If $\Sigma = I$, the Mahalanobis distance reduces to the Euclidean distance.
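This special case can be checked directly, assuming NumPy is available (the helper name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x, mu = rng.normal(size=4), rng.normal(size=4)

def mahalanobis(x, mu, Sigma):
    """sqrt((x-mu)^T Sigma^{-1} (x-mu))"""
    d = x - mu
    return np.sqrt(d @ np.linalg.solve(Sigma, d))  # solve avoids an explicit inverse

print(np.isclose(mahalanobis(x, mu, np.eye(4)),    # Sigma = I ...
                 np.linalg.norm(x - mu)))          # ... equals Euclidean distance: True
```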

Eigendecomposition of $\Sigma$
Since $\Sigma$ is real symmetric, it can be diagonalized by an orthogonal matrix:
$$\begin{aligned} &\Sigma = U \Lambda U^T \\ &U U^T = U^T U = I \\ &\Lambda = \mathrm{diag}(\lambda_i), \quad i = 1, \dots, p \\ &U = (u_1, u_2, \dots, u_p)_{p \times p} \end{aligned}$$
$$\begin{aligned} \Sigma &= \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_p^T \end{bmatrix} \\ &= \sum_{i=1}^p u_i \lambda_i u_i^T \end{aligned}$$
$$\begin{aligned} \Sigma^{-1} &= (U \Lambda U^T)^{-1} = (U^T)^{-1} \Lambda^{-1} U^{-1} = U \Lambda^{-1} U^T = \sum_{i=1}^p u_i \frac{1}{\lambda_i} u_i^T \\ \Lambda^{-1} &= \mathrm{diag}\!\left(\frac{1}{\lambda_i}\right), \quad i = 1, \dots, p \end{aligned}$$
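These identities can be verified numerically with `np.linalg.eigh` (a sketch with illustrative values, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)        # positive definite test matrix

lam, U = np.linalg.eigh(Sigma)          # eigh: for symmetric Sigma, U is orthogonal
print(np.allclose(U @ U.T, np.eye(4)))                        # U U^T = I
print(np.allclose(U @ np.diag(lam) @ U.T, Sigma))             # Sigma = U Lambda U^T
print(np.allclose(sum(l * np.outer(u, u)                      # sum_i lambda_i u_i u_i^T
                      for l, u in zip(lam, U.T)), Sigma))
print(np.allclose(U @ np.diag(1 / lam) @ U.T,                 # Sigma^{-1} = U Lambda^{-1} U^T
                  np.linalg.inv(Sigma)))
```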

Define $y_i = (x-\mu)^T u_i$ and collect these projections into the vector
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix}$$

$$\begin{aligned} \Delta &= (x-\mu)^T \Sigma^{-1} (x-\mu) = (x-\mu)^T \left( \sum_{i=1}^p u_i \frac{1}{\lambda_i} u_i^T \right) (x-\mu) \\ &= \sum_{i=1}^p (x-\mu)^T u_i \, \frac{1}{\lambda_i} \, u_i^T (x-\mu) \\ &= \sum_{i=1}^p y_i \frac{1}{\lambda_i} y_i \\ &= \sum_{i=1}^p \frac{y_i^2}{\lambda_i} \end{aligned}$$
(using that each $y_i = (x-\mu)^T u_i = u_i^T (x-\mu)$ is a scalar).
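A short numerical check of this identity (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 5
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)                  # positive definite
x, mu = rng.normal(size=p), rng.normal(size=p)

lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)                               # y_i = u_i^T (x - mu), the projections
delta_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
delta_eig = np.sum(y**2 / lam)                   # sum_i y_i^2 / lambda_i
print(np.isclose(delta_direct, delta_eig))       # True
```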

For $p = 2$: $\Delta = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2}$
If $\Delta = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} = 1$, the level set is an ellipse in the $(y_1, y_2)$ coordinates, with semi-axes $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$ along $u_1$ and $u_2$.
Physical interpretation of $y_i = (x-\mu)^T u_i$: it is the projection of $x - \mu$ onto the direction $u_i$.
(Figure: the $\Delta = 1$ contour ellipse of a 2-D Gaussian.)
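A minimal matplotlib sketch that reproduces such a figure, assuming matplotlib is available; `mu` and `Sigma` are arbitrary illustrative values:

```python
import numpy as np
import matplotlib.pyplot as plt

mu = np.array([1.0, 2.0])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
lam, U = np.linalg.eigh(Sigma)

# Points with Delta = 1: set y_1 = sqrt(lambda_1) cos t, y_2 = sqrt(lambda_2) sin t,
# then map back via x = mu + U y.
t = np.linspace(0, 2 * np.pi, 200)
ellipse = mu[:, None] + U @ (np.sqrt(lam)[:, None] * np.vstack([np.cos(t), np.sin(t)]))

plt.plot(ellipse[0], ellipse[1])
for l, u in zip(lam, U.T):              # draw the principal axes, length sqrt(lambda_i)
    plt.annotate("", xytext=mu, xy=mu + np.sqrt(l) * u,
                 arrowprops=dict(arrowstyle="->"))
plt.gca().set_aspect("equal")
plt.title(r"$\Delta = 1$ contour of a 2-D Gaussian")
plt.show()
```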
