Machine Learning Fundamentals: Maximum Likelihood Estimation of Gaussian Parameters

    • Data
    • Estimating the mean
    • Estimating the variance

Data

Let the dataset be $X=\{(x_1,y_1),(x_2,y_2),(x_3,y_3),\dots,(x_n,y_n)\}$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. That is, $X=(x_1,x_2,x_3,\dots,x_n)^T$, where each element $x_i$ is a $p$-dimensional column vector. Written out in full:

$$X=\left[ \begin{matrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \dots & x_{np} \end{matrix} \right]\tag{1}$$

$$Y=\left[ \begin{matrix} y_{1} \\ y_{2} \\ \vdots \\ y_n \end{matrix} \right]\tag{2}$$

The samples $x_i\in \mathbb{R}^p$ are independent and identically distributed. To keep the calculation simple, we set $p=1$ throughout this article. Let $\theta=(\mu,\sigma^2)$. The pdf (probability density function) of the one-dimensional Gaussian distribution is:

$$P(x\mid\theta) = \frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
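As a quick sanity check on this density, a minimal sketch (assuming nothing beyond the formula above): evaluating the pdf and verifying numerically that it integrates to 1.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, exactly as in the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# A density must integrate to 1; a crude Riemann sum over [-10, 10]
# with step 0.01 should come out very close to 1 for N(0, 1).
total = sum(gaussian_pdf(-10 + 0.01 * i, 0.0, 1.0) * 0.01 for i in range(2000))
```

At $x=\mu$ the density equals $\frac{1}{\sigma\sqrt{2\pi}}\approx 0.3989$ for $\sigma=1$, which the function reproduces.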

Estimating the Mean

$$\theta_{MLE}=\arg\max_\theta \ln P(X\mid\theta)$$
$$=\arg\max_\theta \ln\prod_{i=1}^N P(x_i\mid\theta)$$
$$=\arg\max_\theta \sum_{i=1}^N \ln P(x_i\mid\theta)$$
$$=\arg\max_\theta \sum_{i=1}^N \ln\left(\frac{1}{\sigma \sqrt{2\pi}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right)$$
$$=\arg\max_\theta \sum_{i=1}^N \left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

(Independence turns the log of the product into a sum of logs.)
This is our objective function $L(\theta)$.
We now estimate $\mu$ and $\sigma$ separately.
$$\mu_{MLE}=\arg\max_\mu \sum_{i=1}^N\left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

The first two terms do not depend on $\mu$, so they vanish when we take the partial derivative:

$$\mu_{MLE}=\arg\max_\mu \sum_{i=1}^N -\frac{(x_i-\mu)^2}{2\sigma^2}=\arg\min_\mu \sum_{i=1}^N (x_i-\mu)^2$$

Setting the derivative with respect to $\mu$ to zero:

$$\frac{\partial}{\partial \mu}\sum_{i=1}^N (x_i^2-2x_i\mu+\mu^2)=\sum_{i=1}^N(-2x_i+2\mu)=0$$
$$\sum_{i=1}^N x_i = N\mu$$
$$\mu_{MLE}=\frac{1}{N}\sum_{i=1}^N x_i$$
Since
$$E[\mu_{MLE}]=\frac{1}{N}\sum_{i=1}^N E[x_i]=\frac{1}{N}\sum_{i=1}^N\mu=\mu,$$
this estimator is unbiased.
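The unbiasedness can be checked empirically: averaging $\mu_{MLE}$ over many simulated datasets should recover the true mean. A minimal sketch (the true parameters $\mu=3$, $\sigma=2$ and the sample sizes are illustrative choices, not from the text):

```python
import random

random.seed(0)

def mu_mle(xs):
    """Maximum-likelihood estimate of the mean: the sample average."""
    return sum(xs) / len(xs)

# Draw many datasets from N(mu=3, sigma=2) and average the estimates;
# unbiasedness means this average should sit close to the true mu = 3.
true_mu, true_sigma, n, trials = 3.0, 2.0, 50, 2000
avg = sum(mu_mle([random.gauss(true_mu, true_sigma) for _ in range(n)])
          for _ in range(trials)) / trials
```

With $2000 \times 50$ total draws, the standard error of `avg` is about $2/\sqrt{100000}\approx 0.006$, so it lands very near 3.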

Estimating the Variance

We now substitute the objective function $L(\theta)$ derived above.

$$\sigma^2_{MLE}=\arg\max_\sigma\sum_{i=1}^N\left(\ln\frac{1}{\sqrt{2\pi}}-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

Setting the derivative with respect to $\sigma$ to zero (the constant term drops out):

$$\frac{\partial}{\partial \sigma}\sum_{i=1}^N\left(-\ln\sigma-\frac{(x_i-\mu)^2}{2\sigma^2}\right)=\sum_{i=1}^N\left(-\frac{1}{\sigma}+\frac{(x_i-\mu)^2}{\sigma^3}\right)=0$$

Multiplying both sides by $\sigma^3$:

$$\sum_{i=1}^N\left(-\sigma^2+(x_i-\mu)^2\right)=0$$

$$\sigma^2_{MLE}=\frac{1}{N}\sum_{i=1}^N(x_i-\mu_{MLE})^2$$
Since
$$E[\sigma_{MLE}^2]=\frac{N-1}{N}\sigma^2 \neq \sigma^2,$$
this estimator is biased: plugging in the estimated mean $\mu_{MLE}$ instead of the true $\mu$ makes the squared deviations slightly smaller on average, and dividing by $N-1$ instead of $N$ corrects for this.
