Derivations of Common Machine Learning Algorithms

Machine Learning

Linear Regression

Prediction with an error term:

$$y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)} \quad (1)$$

Since the error follows a Gaussian distribution:

$$p(\varepsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\varepsilon^{(i)})^2}{2\sigma^2}\right) \quad (2)$$

Substituting (1) into (2) gives:

$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$

Likelihood function:

$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$
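As a quick numerical illustration, the likelihood is just a product of per-sample Gaussian densities. This is a minimal sketch on synthetic data; the array sizes, seed, and the name `likelihood` are assumptions for illustration, not from the original post:

```python
import numpy as np

# Toy synthetic data (sizes, seed, and parameter values are illustrative).
rng = np.random.default_rng(0)
m = 5
X = rng.normal(size=(m, 2))
theta_true = np.array([1.5, -2.0])
sigma = 0.5
y = X @ theta_true + rng.normal(scale=sigma, size=m)

def likelihood(theta, X, y, sigma):
    """L(theta): product of the per-sample Gaussian densities p(y_i | x_i; theta)."""
    residuals = y - X @ theta
    densities = np.exp(-residuals**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return densities.prod()

print(likelihood(theta_true, X, y, sigma))
print(likelihood(np.zeros(2), X, y, sigma))
```

On data generated from `theta_true`, the likelihood evaluated at the true parameters comes out (with overwhelming probability) larger than at an arbitrary guess such as the zero vector, which is exactly the intuition behind maximizing it.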

Interpretation: the likelihood measures how probable the observed data are under a given choice of parameters; maximum likelihood picks the parameters under which the observed data are most plausible.

Log-likelihood: taking the logarithm of both sides turns the product into a sum:

$$\log L(\theta) = \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$

Expanding:

$$\log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$$
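The expansion is easy to check numerically: the log of the product of densities should match the closed form above. A small sketch on synthetic data (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, sigma = 8, 0.7
X = rng.normal(size=(m, 3))
theta = rng.normal(size=3)
y = X @ theta + rng.normal(scale=sigma, size=m)

residuals = y - X @ theta

# Left-hand side: log of the product of the m Gaussian densities.
log_prod = np.log(np.prod(
    np.exp(-residuals**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
))

# Right-hand side: m*log(1/(sqrt(2*pi)*sigma)) - (1/sigma^2)*(1/2)*sum of squared residuals.
expanded = m * np.log(1 / (np.sqrt(2 * np.pi) * sigma)) \
    - (residuals**2).sum() / (2 * sigma**2)

print(np.allclose(log_prod, expanded))  # → True
```

(In practice one always works with the sum of logs directly, since the raw product underflows for large m.)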

Maximum likelihood means making the likelihood as large as possible. Dropping the terms above that do not depend on $\theta$, we are left with maximizing:

$$-\frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$$

Equivalently, it suffices to minimize:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$$

In matrix form:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2 = \frac{1}{2} (X\theta - y)^T (X\theta - y)$$
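The equivalence of the per-sample sum and the matrix form can be verified directly. A minimal sketch with arbitrary synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
theta = rng.normal(size=2)

# Per-sample sum form of J(theta).
J_sum = 0.5 * sum((y[i] - theta @ X[i])**2 for i in range(len(y)))

# Matrix form: (X theta - y)^T (X theta - y) / 2.
r = X @ theta - y
J_matrix = 0.5 * (r @ r)

print(np.allclose(J_sum, J_matrix))  # → True
```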

Taking the gradient with respect to $\theta$:

$$\nabla_\theta J(\theta) = \nabla_\theta \left(\frac{1}{2} (X\theta - y)^T (X\theta - y)\right) = \nabla_\theta \left(\frac{1}{2} (\theta^T X^T - y^T)(X\theta - y)\right)$$

$$= \nabla_\theta \left(\frac{1}{2} (\theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y)\right)$$

$$= \frac{1}{2} (2 X^T X \theta - X^T y - (y^T X)^T) = X^T X \theta - X^T y$$
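The analytic gradient $X^T X \theta - X^T y$ can be sanity-checked against central finite differences of $J$. A sketch on synthetic data (sizes and tolerances are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
theta = rng.normal(size=3)

def J(t):
    r = X @ t - y
    return 0.5 * (r @ r)

# Analytic gradient from the derivation above.
analytic = X.T @ X @ theta - X.T @ y

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (J(theta + eps * np.eye(3)[k]) - J(theta - eps * np.eye(3)[k])) / (2 * eps)
    for k in range(3)
])

print(np.allclose(analytic, numeric, atol=1e-4))  # → True
```

Since $J$ is quadratic, central differences agree with the analytic gradient up to floating-point rounding.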

Setting the gradient to zero (assuming $X^T X$ is invertible) yields the normal equation solution:

$$\theta = (X^T X)^{-1} X^T y$$
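The closed form can be sketched in a few lines and cross-checked against NumPy's least-squares solver; the synthetic data below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# theta = (X^T X)^{-1} X^T y, solved as a linear system rather than
# forming the explicit inverse (better numerical behavior).
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_normal, theta_lstsq))  # → True
```

`np.linalg.lstsq` uses an SVD internally, so it also handles the rank-deficient case where $X^T X$ is not invertible and the closed form above breaks down.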

