【UCAS Pattern Recognition】Homework 2 (abridged version)

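【Problem 1】Maximum likelihood estimation can also be used to estimate prior probabilities. Suppose samples are drawn successively and independently from the states of nature $\omega_i$, each of which occurs with probability $P\left(\omega_i\right)$. If the state of nature of the $k$-th sample is $\omega_i$, let $z_{ik}=1$; otherwise let $z_{ik}=0$.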

  1. Show that
    $$P\left(z_{i1}, \cdots, z_{in} \mid P\left(\omega_i\right)\right)=\prod_{k=1}^n P\left(\omega_i\right)^{z_{ik}}\left(1-P\left(\omega_i\right)\right)^{1-z_{ik}}$$

【Solution】Given that class $i$ has probability $P(\omega_i)$, the event $z_{i1}=1$ (the first sample belongs to class $i$) has probability $P(\omega_i)$, and the event $z_{i1}=0$ (the first sample does not belong to class $i$) has probability $1-P(\omega_i)$. Both cases can be written as a single expression:
$$P\left(z_{i1} \mid P\left(\omega_i\right)\right)=P\left(\omega_i\right)^{z_{i1}}\left(1-P\left(\omega_i\right)\right)^{1-z_{i1}}$$
Because the samples are drawn independently, the joint probability factorizes:
$$\begin{aligned} P\left(z_{i1}, \cdots, z_{in} \mid P\left(\omega_i\right)\right) & =P\left(z_{i1} \mid P\left(\omega_i\right)\right) P\left(z_{i2} \mid P\left(\omega_i\right)\right) \cdots P\left(z_{in} \mid P\left(\omega_i\right)\right) \\ &=\prod_{k=1}^n P\left(\omega_i\right)^{z_{ik}}\left(1-P\left(\omega_i\right)\right)^{1-z_{ik}} \end{aligned}$$

  2. Show that the maximum likelihood estimate of $P\left(\omega_i\right)$ is
    $$\hat{P}\left(\omega_i\right)=\frac{1}{n} \sum_{k=1}^n z_{ik}$$
    and briefly interpret this result.

【Solution】From part (1), the log-likelihood is
$$\ln P\left(z_{i1}, \cdots, z_{in} \mid P\left(\omega_i\right)\right)=\sum_{k=1}^n z_{ik} \ln P\left(\omega_i\right)+\sum_{k=1}^n\left(1-z_{ik}\right) \ln \left(1-P\left(\omega_i\right)\right)$$
Setting its derivative with respect to $P\left(\omega_i\right)$ to zero:
$$\frac{\partial \ln P}{\partial P\left(\omega_i\right)}=\sum_{k=1}^n z_{ik} \frac{1}{P\left(\omega_i\right)}-\sum_{k=1}^n\left(1-z_{ik}\right) \frac{1}{1-P\left(\omega_i\right)}=0$$
Multiplying both sides by $P\left(\omega_i\right)\left(1-P\left(\omega_i\right)\right)$ gives
$$\sum_{k=1}^n z_{ik}\left(1-P\left(\omega_i\right)\right)-\sum_{k=1}^n\left(1-z_{ik}\right) P\left(\omega_i\right)=0$$
Simplifying, the maximum likelihood estimate is
$$\hat{P}\left(\omega_i\right)=\frac{1}{n} \sum_{k=1}^n z_{ik}$$
In other words, the maximum likelihood estimate of a class prior is the fraction of the samples that belong to that class.
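
The result is easy to check numerically. The following is a minimal sketch rather than part of the assignment: the indicator values in `z` are made up, and `log_likelihood` and `ml_prior` are hypothetical helper names. It evaluates the log-likelihood from part (1) on a fine grid and compares the best grid point with the sample fraction.

```python
import math

def log_likelihood(p, z):
    """Bernoulli log-likelihood of the 0/1 indicators z for a prior p in (0, 1)."""
    ones = sum(z)
    zeros = len(z) - ones
    return ones * math.log(p) + zeros * math.log(1.0 - p)

def ml_prior(z):
    """Closed-form ML estimate: fraction of samples that belong to the class."""
    return sum(z) / len(z)

# Hypothetical indicators: 3 of 10 samples belong to class i.
z = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
p_hat = ml_prior(z)

# Compare against every point of a fine grid strictly inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: log_likelihood(p, z))

print(p_hat, best_on_grid)
```

The grid search is only a sanity check; the closed-form estimate is what the derivation above establishes.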

【Problem 2】Let $x$ have a uniform probability density:
$$p(x \mid \theta) \sim U(0, \theta)=\begin{cases} 1 / \theta, & 0 \leq x \leq \theta \\ 0, & \text{otherwise} \end{cases}$$

  1. Suppose the $n$ samples $\mathcal{D}=\left\{x_1, \cdots, x_n\right\}$ are drawn independently according to $p(x \mid \theta)$. Show that the maximum likelihood estimate of $\theta$ is $\max [\mathcal{D}]$, the largest value in $\mathcal{D}$.

【Solution】The $n$ samples are independent and identically distributed, so
$$\begin{aligned} P(\mathcal{D} \mid \theta) & =\prod_{k=1}^n p\left(x_k \mid \theta\right) \\ & = \begin{cases}\dfrac{1}{\theta^n}, & 0 \leq x_1, x_2, \ldots, x_n \leq \theta \\ 0, & \text{otherwise}\end{cases} \end{aligned}$$
The log-likelihood is
$$L(\mathcal{D} \mid \theta)=\ln P(\mathcal{D} \mid \theta)=\begin{cases} -n \ln \theta, & 0 \leq x_1, x_2, \ldots, x_n \leq \theta \\ -\infty, & \text{otherwise} \end{cases}$$
Because $-n \ln \theta$ is decreasing in $\theta$, the likelihood grows as $\theta$ shrinks. However, $\theta$ must satisfy the constraint $0 \leq x_1, x_2, \ldots, x_n \leq \theta$, that is, $\theta \geq \max [\mathcal{D}]$. The maximum likelihood estimate is therefore the smallest admissible value, $\hat{\theta}=\max [\mathcal{D}]$.
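
As an illustration of this argument, the sketch below evaluates the log-likelihood just below, at, and above the sample maximum. The sample values and the function name `uniform_log_likelihood` are assumptions made for the example, not data from the assignment.

```python
import math

def uniform_log_likelihood(theta, data):
    """Log-likelihood of U(0, theta) for the samples in data."""
    if theta <= 0 or min(data) < 0 or max(data) > theta:
        return -math.inf          # at least one sample lies outside [0, theta]
    return -len(data) * math.log(theta)

# Hypothetical samples; any non-negative values would do.
data = [0.21, 0.47, 0.05, 0.33, 0.12]
theta_hat = max(data)

# One candidate below max(data), the maximum itself, and two above it.
for theta in (0.46, theta_hat, 0.48, 1.0):
    print(theta, uniform_log_likelihood(theta, data))
```

Any candidate below the largest sample is ruled out with log-likelihood $-\infty$, and every candidate above it scores lower than the maximum itself.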

  2. Suppose $n=5$ samples are drawn from this distribution and $\max_k x_k=0.6$. Plot the likelihood $p(\mathcal{D} \mid \theta)$ over the interval $0 \leq \theta \leq 1$, and explain why the values of the other four points are not needed.

【Solution】From part (1), the likelihood is
$$P(\mathcal{D} \mid \theta)=\begin{cases} \dfrac{1}{\theta^5}, & 0 \leq x_1, x_2, \ldots, x_5 \leq \theta \\ 0, & \text{otherwise} \end{cases}$$
The curve of $p(\mathcal{D} \mid \theta)$ on the interval $[0,1]$ is shown in Figure 1. The constraint $0 \leq x_1, \ldots, x_5 \leq \theta$ is equivalent to $\theta \geq \max [\mathcal{D}]$, so the likelihood can be written down without knowing the other four values. (Without loss of generality let $x_1=0.6$. For $\theta<0.6$ we have $p\left(x_1 \mid \theta\right)=0$ and hence $p(\mathcal{D} \mid \theta)=0$; for $\theta \geq 0.6$ we have $p(\mathcal{D} \mid \theta)=\left(\frac{1}{\theta}\right)^5$.) (I drew the plot in MATLAB.)
[Figure 1: the likelihood $p(\mathcal{D} \mid \theta)$ on $0 \leq \theta \leq 1$ for $n=5$ and $\max_k x_k=0.6$]
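
The shape of the curve can also be tabulated directly. The sketch below is written in Python rather than the MATLAB used for the original plot, and the helper name `likelihood` is an assumption. It takes only $n$ and the sample maximum as inputs, which is exactly why the other four values are not needed.

```python
def likelihood(theta, n, x_max):
    """p(D | theta) for n samples from U(0, theta) whose largest value is x_max."""
    if theta <= 0 or theta < x_max:
        return 0.0                 # some sample would lie outside [0, theta]
    return theta ** (-n)

n, x_max = 5, 0.6
thetas = [k / 20 for k in range(0, 21)]        # 0.00, 0.05, ..., 1.00
curve = [likelihood(t, n, x_max) for t in thetas]

for t, value in zip(thetas, curve):
    print(f"theta={t:.2f}  p(D|theta)={value:.3f}")
```

The table is zero for $\theta<0.6$, jumps to its peak at $\theta=0.6$, and then decays as $\theta^{-5}$.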
【Problem 3】One way to measure the distance between two different distributions over the same space is the Kullback-Leibler divergence (KL divergence for short):
$$D_{KL}\left(p_2(\mathbf{x}) \| p_1(\mathbf{x})\right)=\int p_2(\mathbf{x}) \ln \frac{p_2(\mathbf{x})}{p_1(\mathbf{x})} \, d\mathbf{x}$$
This measure does not satisfy the symmetry and triangle-inequality properties required of a metric in the strict sense. Suppose we use a normal distribution $p_1(\mathbf{x}) \sim N(\boldsymbol{\mu}, \Sigma)$ to approximate an arbitrary distribution $p_2(\mathbf{x})$. Show that the parameters that yield the smallest KL divergence are the obvious ones:
$$\begin{aligned} \boldsymbol{\mu} & =\varepsilon_2[\mathbf{x}] \\ \Sigma & =\varepsilon_2\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right] \end{aligned}$$
where the expectation is taken with respect to the density $p_2(\mathbf{x})$.
【Solution】Substituting the density of $p_1(\mathbf{x})$, where $d$ is the dimension of $\mathbf{x}$, gives
$$\begin{aligned} D_{KL}\left(p_2(\mathbf{x}) \| p_1(\mathbf{x})\right)=\int \Big[ & p_2(\mathbf{x}) \ln p_2(\mathbf{x})+\frac{1}{2} p_2(\mathbf{x})\left(d \ln 2 \pi+\ln |\Sigma|\right) \\ & +\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^t \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \, p_2(\mathbf{x}) \Big] \, d\mathbf{x} \end{aligned}$$
Dropping the terms that do not depend on $\boldsymbol{\mu}$ or $\Sigma$, together with the common factor $\frac{1}{2}$, define
$$f(\boldsymbol{\mu}, \Sigma)=\int p_2(\mathbf{x})\left(\ln |\Sigma|+(\mathbf{x}-\boldsymbol{\mu})^t \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right) d\mathbf{x}$$
Taking partial derivatives with respect to $\boldsymbol{\mu}$ and $\Sigma$:
$$\frac{\partial f(\boldsymbol{\mu}, \Sigma)}{\partial \boldsymbol{\mu}}=\left(\Sigma^{-1}+\Sigma^{-t}\right)\left(\boldsymbol{\mu}-\int \mathbf{x} \, p_2(\mathbf{x}) \, d\mathbf{x}\right)=\left(\Sigma^{-1}+\Sigma^{-t}\right)\left(\boldsymbol{\mu}-\varepsilon_2[\mathbf{x}]\right)$$
$$\begin{aligned} \frac{\partial f(\boldsymbol{\mu}, \Sigma)}{\partial \Sigma} & =\int \left\{ p_2(\mathbf{x}) \Sigma^{-t}+p_2(\mathbf{x})\left[-\Sigma^{-t}(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t \Sigma^{-t}\right] \right\} d\mathbf{x} \\ & =\Sigma^{-t}\left(\int p_2(\mathbf{x})\left[\Sigma^t-(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right] d\mathbf{x}\right) \Sigma^{-t} \\ & =\Sigma^{-t}\left(\Sigma-\varepsilon_2\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right]\right) \Sigma^{-t} \end{aligned}$$
where the last step uses the symmetry $\Sigma^t=\Sigma$. Setting both partial derivatives to zero gives
$$\begin{aligned} \boldsymbol{\mu} & =\varepsilon_2[\mathbf{x}] \\ \Sigma & =\varepsilon_2\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\right] \end{aligned}$$
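
A one-dimensional sanity check of this moment-matching result is sketched below. It rests on assumptions that are not in the assignment: $p_2$ is replaced by a handful of equally weighted, made-up points, the expectation $\varepsilon_2$ becomes an average over them, and `f` is the scalar version of $f(\boldsymbol{\mu}, \Sigma)$ with `var` playing the role of $\Sigma$.

```python
import math

def f(mu, var, xs):
    """Scalar f(mu, Sigma): average over xs of ln(var) + (x - mu)^2 / var."""
    return sum(math.log(var) + (x - mu) ** 2 / var for x in xs) / len(xs)

# Hypothetical, clearly non-Gaussian stand-in for p_2: equally weighted points.
xs = [-2.0, -1.5, -1.0, 0.5, 3.0, 3.5, 4.0]

# Moment-matched parameters: mean and (biased) variance under the stand-in p_2.
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

# Perturb both parameters and record the smallest change in f.
f_star = f(mean, var, xs)
worst_gap = min(
    f(mean + dm, var * scale, xs) - f_star
    for dm in (-0.5, -0.1, 0.0, 0.1, 0.5)
    for scale in (0.5, 0.9, 1.0, 1.1, 2.0)
)
print(mean, var, f_star, worst_gap)
```

None of the perturbed parameter pairs gives a smaller value of `f` than the matched mean and variance, which is what the derivation predicts.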

【Problem 4】The samples in the data set $\mathcal{D}=\left\{\left(\begin{array}{l}1 \\ 1\end{array}\right),\left(\begin{array}{l}3 \\ 3\end{array}\right),\left(\begin{array}{l}2 \\ *\end{array}\right)\right\}$ are drawn independently from the two-dimensional distribution $p\left(x_1, x_2\right)=p\left(x_1\right) p\left(x_2\right)$, where $*$ denotes a missing value and
$$p\left(x_1\right)=\begin{cases} \dfrac{1}{\theta_1} e^{-x_1 / \theta_1}, & x_1 \geq 0 \\ 0, & \text{otherwise} \end{cases}$$
$$p\left(x_2\right) \sim U\left(0, \theta_2\right)=\begin{cases} \dfrac{1}{\theta_2}, & 0 \leq x_2 \leq \theta_2 \\ 0, & \text{otherwise} \end{cases}$$

  1. Starting from the initial estimate $\boldsymbol{\theta}^0=\left(\begin{array}{c}2 \\ 4\end{array}\right)$, compute $Q\left(\boldsymbol{\theta} ; \boldsymbol{\theta}^0\right)$ (the E step of the EM algorithm). Be sure to normalize the distribution.

【Solution】For the E step:
$$\begin{aligned} Q\left(\boldsymbol{\theta} ; \boldsymbol{\theta}^0\right) & =\mathcal{E}_{x_{32}}\left[\ln p\left(\mathbf{x}_g, \mathbf{x}_b ; \boldsymbol{\theta}\right) \mid \boldsymbol{\theta}^0, \mathcal{D}_g\right] \\ & =\int_{-\infty}^{\infty}\left[\ln p\left(\mathbf{x}_1 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_3 \mid \boldsymbol{\theta}\right)\right] p\left(x_{32} \mid \boldsymbol{\theta}^0 ; x_{31}=2\right) \mathrm{d} x_{32} \\ & =\ln p\left(\mathbf{x}_1 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2 \mid \boldsymbol{\theta}\right)+\int_{-\infty}^{\infty} \ln p\left(\mathbf{x}_3 \mid \boldsymbol{\theta}\right) \cdot p\left(x_{32} \mid \boldsymbol{\theta}^0 ; x_{31}=2\right) \mathrm{d} x_{32} \\ & =\ln p\left(\mathbf{x}_1 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2 \mid \boldsymbol{\theta}\right)+\int_{-\infty}^{\infty} \ln p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}\right) \cdot \frac{p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}^0\right)}{\underbrace{\int_{-\infty}^{\infty} p\left(\left(\begin{array}{c} 2 \\ x_{32}^{\prime} \end{array}\right) \mid \boldsymbol{\theta}^0\right) \mathrm{d} x_{32}^{\prime}}_{1 /(2 e)}} \mathrm{d} x_{32} \\ & =\ln p\left(\mathbf{x}_1 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2 \mid \boldsymbol{\theta}\right)+2 e \int_{-\infty}^{\infty} \ln p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}\right) \cdot p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}^0\right) \mathrm{d} x_{32} \\ & =\ln p\left(\mathbf{x}_1 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2 \mid \boldsymbol{\theta}\right)+C \end{aligned}\tag{1}$$
The normalization term in (1) is computed as
$$\begin{aligned} \int_{-\infty}^{\infty} p\left(\left(\begin{array}{c} 2 \\ x_{32}^{\prime} \end{array}\right) \mid \boldsymbol{\theta}^0\right) \mathrm{d} x_{32}^{\prime} & =\int_{-\infty}^{\infty} p\left(x_{31}=2 \mid \theta_1^0=2\right) \cdot p\left(x_{32}^{\prime} \mid \theta_2^0=4\right) \mathrm{d} x_{32}^{\prime} \\ & =\int_0^4 \frac{1}{2} e^{-2 / 2} \cdot \frac{1}{4} \, \mathrm{d} x_{32}^{\prime} \\ & =\frac{1}{2 e} \end{aligned}\tag{2}$$
The value of $C$ in (1) depends on $\theta_2$. Because the largest observed second coordinate is $\max x_2=x_{22}=3$, we must have $\theta_2 \geq 3$.
The two cases are as follows:

  • $3 \leq \theta_2 \leq 4$
    $$\begin{aligned} C & =2 e \int_0^{\theta_2} \ln p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}\right) \cdot p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}^0\right) \mathrm{d} x_{32} \\ & =2 e \int_0^{\theta_2} \ln \left(\frac{1}{\theta_1} e^{-2 / \theta_1} \frac{1}{\theta_2}\right) \cdot \frac{1}{2} e^{-2 / 2} \cdot \frac{1}{4} \, \mathrm{d} x_{32} \\ & =\frac{1}{4} \theta_2 \ln \left(\frac{1}{\theta_1} e^{-2 / \theta_1} \frac{1}{\theta_2}\right) \end{aligned}\tag{3}$$
  • $\theta_2 \geq 4$
    $$\begin{aligned} C & =2 e \int_0^4 \ln p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}\right) \cdot p\left(\left(\begin{array}{c} 2 \\ x_{32} \end{array}\right) \mid \boldsymbol{\theta}^0\right) \mathrm{d} x_{32} \\ & =2 e \int_0^4 \ln \left(\frac{1}{\theta_1} e^{-2 / \theta_1} \frac{1}{\theta_2}\right) \cdot \frac{1}{2} e^{-2 / 2} \cdot \frac{1}{4} \, \mathrm{d} x_{32} \\ & =\ln \left(\frac{1}{\theta_1} e^{-2 / \theta_1} \frac{1}{\theta_2}\right) \end{aligned}\tag{4}$$

Substituting these values of $C$ into (1) gives
$$\begin{aligned} Q\left(\boldsymbol{\theta} ; \boldsymbol{\theta}^0\right) & =\ln p\left(\mathbf{x}_1 \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_2 \mid \boldsymbol{\theta}\right)+C \\ & =\ln \left(\frac{1}{\theta_1} e^{-x_{11} / \theta_1} \frac{1}{\theta_2}\right)+\ln \left(\frac{1}{\theta_1} e^{-x_{21} / \theta_1} \frac{1}{\theta_2}\right)+C \\ & =\ln \left(\frac{1}{\theta_1} e^{-1 / \theta_1} \frac{1}{\theta_2}\right)+\ln \left(\frac{1}{\theta_1} e^{-3 / \theta_1} \frac{1}{\theta_2}\right)+C \\ & =-\frac{4}{\theta_1}-2 \ln \left(\theta_1 \theta_2\right)+C \end{aligned}\tag{5}$$
where $C$ is given by (3) or (4), depending on the case. Simplifying,
$$Q\left(\boldsymbol{\theta} ; \boldsymbol{\theta}^0\right)=\begin{cases} -3 \ln \left(\theta_1 \theta_2\right)-\dfrac{6}{\theta_1}, & \theta_2 \geq 4 \\ -\left(2+\dfrac{\theta_2}{4}\right) \ln \left(\theta_1 \theta_2\right)-\left(4+\dfrac{\theta_2}{2}\right) / \theta_1, & 3 \leq \theta_2 \leq 4 \end{cases}$$
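
The algebra in this E step can be cross-checked with a short script. The sketch below is an assumption-laden illustration, not the author's code: `q_from_definition` rebuilds $Q$ from equation (1) using the same integration limits as the two cases above, and `q_closed` is the simplified piecewise result. All function and constant names are hypothetical.

```python
import math

THETA0 = (2.0, 4.0)                    # initial estimate (theta1, theta2)
COMPLETE = [(1.0, 1.0), (3.0, 3.0)]    # fully observed samples
X31 = 2.0                              # observed coordinate of the third sample

# Joint density of (2, x32) under theta0 and its integral over x32 in [0, 4].
JOINT0 = math.exp(-X31 / THETA0[0]) / (THETA0[0] * THETA0[1])
NORM = JOINT0 * THETA0[1]              # works out to 1/(2e)

def log_p(x1, theta1, theta2):
    """ln p(x1, x2 | theta) for x1 >= 0 and 0 <= x2 <= theta2 (x2 drops out)."""
    return -math.log(theta1) - x1 / theta1 - math.log(theta2)

def q_from_definition(theta1, theta2):
    """Equation (1): observed terms plus the integral C over the missing x32.
    The integrand is constant in x32, so C is its value times the length of
    the integration range [0, min(theta2, 4)] used in the case analysis."""
    upper = min(theta2, THETA0[1])
    c = log_p(X31, theta1, theta2) * (JOINT0 / NORM) * upper
    return sum(log_p(x1, theta1, theta2) for x1, _ in COMPLETE) + c

def q_closed(theta1, theta2):
    """Simplified piecewise form of Q, valid for theta2 >= 3."""
    w = min(theta2, 4.0) / 4.0         # theta2/4 below 4, otherwise 1
    return -(2 + w) * math.log(theta1 * theta2) - (4 + 2 * w) / theta1

points = [(2.0, 3.0), (2.0, 3.5), (2.0, 4.0), (1.5, 6.0)]
for theta in points:
    print(theta, q_from_definition(*theta), q_closed(*theta))
```

The two columns agree at every test point, including $\theta_2=4$ where the two branches of the piecewise formula meet.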
  2. Find the $\boldsymbol{\theta}$ that maximizes $Q\left(\boldsymbol{\theta} ; \boldsymbol{\theta}^0\right)$ (the M step of the EM algorithm).

【Solution】For the M step, the estimate is
$$\hat{\boldsymbol{\theta}}=\arg \max _{\boldsymbol{\theta}} Q\left(\boldsymbol{\theta} ; \boldsymbol{\theta}^0\right)$$

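  • $3 \leq \theta_2 \leq 4$:
    Setting $\partial Q / \partial \theta_1=0$ gives $\theta_1=\left(4+\frac{\theta_2}{2}\right) /\left(2+\frac{\theta_2}{4}\right)=2$, and $Q$ is decreasing in $\theta_2$ on this range. The optimum is therefore $\boldsymbol{\theta}=\left(\begin{array}{l}2 \\ 3\end{array}\right)$, where $Q=-\frac{11}{4}(\ln 6+1) \approx-7.68$.
  • $\theta_2 \geq 4$:
    Setting $\partial Q / \partial \theta_1=0$ again gives $\theta_1=2$, and $Q$ is decreasing in $\theta_2$. The optimum is therefore $\boldsymbol{\theta}=\left(\begin{array}{l}2 \\ 4\end{array}\right)$, where $Q=-3 \ln 8-3 \approx-9.24$.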

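Comparing the two cases, $Q$ is largest at $\boldsymbol{\theta}=\left(\begin{array}{l}2 \\ 3\end{array}\right)$.
(Two more problems follow, but they are beyond what I know; I'm doomed if they show up on the exam.)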
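
The M step can be double-checked with a brute-force search over the piecewise $Q$ from the E step. This is only a sketch: the grid ranges and step size are arbitrary choices for illustration, and `q` is a hypothetical helper that encodes the closed form derived above.

```python
import math

def q(theta1, theta2):
    """Closed-form Q(theta; theta0) from the E step, valid for theta2 >= 3."""
    w = min(theta2, 4.0) / 4.0         # theta2/4 below 4, otherwise 1
    return -(2 + w) * math.log(theta1 * theta2) - (4 + 2 * w) / theta1

# Grid: theta1 in [0.5, 5.0] and theta2 in [3.0, 6.0], both with step 0.01.
candidates = [
    (0.5 + 0.01 * i, 3.0 + 0.01 * j)
    for i in range(451)
    for j in range(301)
]
best = max(candidates, key=lambda t: q(*t))

print("grid maximizer:", best)
print("Q at (2, 3):", q(2.0, 3.0))
print("Q at (2, 4):", q(2.0, 4.0))
```

On this grid the maximizer sits at the lower boundary $\theta_2=3$ with $\theta_1$ next to 2, in line with the case analysis.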
