Predicted value and error:
$$y^{(i)}=\theta^{T}x^{(i)}+\varepsilon^{(i)} \quad (1)$$
Since the error follows a Gaussian distribution with mean zero and variance $\sigma^2$:
$$\rho(\varepsilon^{(i)})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(\varepsilon^{(i)})^2}{2\sigma^2}\right) \quad (2)$$
Substituting $\varepsilon^{(i)}=y^{(i)}-\theta^{T}x^{(i)}$ from (1) into (2) gives the density of $y^{(i)}$ given $x^{(i)}$:
$$p(y^{(i)}\mid x^{(i)};\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^2}{2\sigma^2}\right)$$
Likelihood function:
$$L(\theta)=\prod_{i=1}^{m}p(y^{(i)}\mid x^{(i)};\theta)=\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^2}{2\sigma^2}\right)$$
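Interpretation: the likelihood function asks which parameters, combined with the observed data, make the predictions most likely to match the true values.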
Log-likelihood: take the log of both sides, which turns the product into a sum:
$$\log L(\theta)=\log\prod_{i=1}^{m}p(y^{(i)}\mid x^{(i)};\theta)=\log\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^2}{2\sigma^2}\right)$$
Expanded form:
$$\log\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{T}x^{(i)})^2}{2\sigma^2}\right)=m\log\frac{1}{\sqrt{2\pi}\,\sigma}-\frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^2$$
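The expansion can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the function names, the random seed, and the values of `theta` and `sigma` are illustrative choices, not part of the original derivation. It computes the log-likelihood once directly from the Gaussian densities and once from the expanded form.

```python
import numpy as np

def log_likelihood_direct(theta, X, y, sigma):
    """Sum of log Gaussian densities of the residuals y - X @ theta."""
    resid = y - X @ theta
    dens = np.exp(-resid**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return np.sum(np.log(dens))

def log_likelihood_expanded(theta, X, y, sigma):
    """Expanded form: m*log(1/(sqrt(2*pi)*sigma)) - (1/sigma^2)*(1/2)*sum(resid^2)."""
    m = len(y)
    resid = y - X @ theta
    constant = m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma))
    return constant - (1.0 / sigma**2) * 0.5 * np.sum(resid**2)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta = np.array([1.0, -2.0, 0.5])
sigma = 0.7
y = X @ theta + rng.normal(scale=sigma, size=50)

direct = log_likelihood_direct(theta, X, y, sigma)
expanded = log_likelihood_expanded(theta, X, y, sigma)
print(np.isclose(direct, expanded))
```

Only the second term of the expanded form depends on `theta`, which is why the constant term can be ignored when maximizing.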
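Maximum likelihood therefore means making the likelihood as large as possible. In the expansion above, the first term does not depend on $\theta$ and $\frac{1}{\sigma^2}$ is a positive constant, so after dropping them the quantity to maximize is:
$$-\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^2$$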
So it suffices to minimize:
$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^2$$
In matrix form, where $X$ is the matrix whose $i$-th row is $(x^{(i)})^{T}$ and $y$ is the vector of the $y^{(i)}$:
$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^{T}x^{(i)})^2=\frac{1}{2}(X\theta-y)^{T}(X\theta-y)$$
Taking the gradient with respect to $\theta$:
$$\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta\left(\frac{1}{2}(X\theta-y)^{T}(X\theta-y)\right)=\nabla_\theta\left(\frac{1}{2}(\theta^{T}X^{T}-y^{T})(X\theta-y)\right)\\
&= \nabla_\theta\left(\frac{1}{2}\left(\theta^{T}X^{T}X\theta-\theta^{T}X^{T}y-y^{T}X\theta+y^{T}y\right)\right)\\
&= \frac{1}{2}\left(2X^{T}X\theta-X^{T}y-(y^{T}X)^{T}\right)=X^{T}X\theta-X^{T}y
\end{aligned}$$
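A quick way to sanity-check the gradient $X^{T}X\theta-X^{T}y$ is to compare it with a central finite-difference approximation of $J(\theta)$. This is a sketch under assumed synthetic data; `cost`, `grad`, and `numerical_grad` are hypothetical helper names introduced here for illustration.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/2 * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return 0.5 * (r @ r)

def grad(theta, X, y):
    """Analytic gradient from the derivation: X^T X theta - X^T y."""
    return X.T @ X @ theta - X.T @ y

def numerical_grad(theta, X, y, h=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * h)
    return g

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
theta = rng.normal(size=4)

analytic = grad(theta, X, y)
numeric = numerical_grad(theta, X, y)
print(np.allclose(analytic, numeric, atol=1e-4))
```

Because $J$ is quadratic in $\theta$, the central difference should agree with the analytic gradient up to floating-point rounding.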
Setting the gradient to zero gives $X^{T}X\theta=X^{T}y$, so, assuming $X^{T}X$ is invertible:
$$\theta=(X^{T}X)^{-1}X^{T}y$$
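The closed-form solution can be tried on synthetic data. The sketch below assumes NumPy; the function name `fit_normal_equation`, the seed, and the "true" parameters are hypothetical choices made for the example. It solves the linear system $X^{T}X\theta=X^{T}y$ with `np.linalg.solve` rather than forming the inverse explicitly, which is generally the more numerically stable route, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Solve (X^T X) theta = X^T y; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (assumed): a column of ones for the intercept plus two features.
rng = np.random.default_rng(42)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=m)

theta_hat = fit_normal_equation(X, y)

# Reference solution from NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_hat)
print(np.allclose(theta_hat, theta_lstsq))
```

If $X^{T}X$ is singular or badly conditioned (for example with duplicated features), `np.linalg.solve` may fail or give unstable results; `np.linalg.lstsq` is a common fallback in that case.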