Let $X$ be Bernoulli with $\Pr(X = 1) = 1 - \Pr(X = 0) = \pi$, and suppose that, given $X = j$ ($j = 0, 1$), $Y$ is normally distributed with mean $\mu_j$ and variance $\sigma^2$.
For a complete random sample $(x_i, y_i)$, $i = 1, \dots, n$, find the maximum likelihood estimates of $(\pi, \mu_0, \mu_1, \sigma^2)$ and of the marginal mean and variance of $Y$.
Now suppose that $X$ is completely observed but $n - r$ values of $Y$ are missing. Use the methods of Chapter 7 to estimate the marginal mean and variance of $Y$.
With the prior $p(\pi, \mu_0, \mu_1, \log\sigma^2) \propto \pi^{1/2}(1-\pi)^{1/2}$, describe how to draw the parameters $(\pi, \mu_0, \mu_1, \sigma^2)$ from their posterior distribution.
Write down the joint density, starting with the density of a single observation. In the normalizing constant $\sqrt{2\pi\sigma^2}$ the symbol $\pi$ is the usual constant $3.14159\ldots$, not the Bernoulli parameter.
$$
\begin{aligned}
f(x_i, y_i \mid \mu_0, \mu_1, \sigma^2, \pi)
&= f(x_i \mid \pi) \cdot f(y_i \mid x_i, \mu_0, \mu_1, \sigma^2, \pi) \\
&= \pi^{x_i} (1-\pi)^{1-x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_1)^2}{2\sigma^2} \right\} \right)^{x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_0)^2}{2\sigma^2} \right\} \right)^{1-x_i}
\end{aligned}
$$
The joint density of the $n$ observations is:
$$
\begin{aligned}
f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)
&= \prod_{i=1}^n f(x_i \mid \pi) \cdot f(y_i \mid x_i, \mu_0, \mu_1, \sigma^2, \pi) \\
&= \prod_{i=1}^n \pi^{x_i} (1-\pi)^{1-x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_1)^2}{2\sigma^2} \right\} \right)^{x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_0)^2}{2\sigma^2} \right\} \right)^{1-x_i}
\end{aligned}
$$
The log-likelihood is:
$$
\begin{aligned}
\ln f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)
&= \ln \prod_{i=1}^n f(x_i \mid \pi) \cdot f(y_i \mid x_i, \mu_0, \mu_1, \sigma^2, \pi) \\
&= \ln \prod_{i=1}^n \pi^{x_i} (1-\pi)^{1-x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_1)^2}{2\sigma^2} \right\} \right)^{x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_0)^2}{2\sigma^2} \right\} \right)^{1-x_i} \\
&= \sum_{i=1}^n \left\{ x_i \ln\pi + (1-x_i)\ln(1-\pi) - \frac{x_i}{2}\ln(2\pi\sigma^2) - \frac{x_i (y_i-\mu_1)^2}{2\sigma^2} - \frac{1-x_i}{2}\ln(2\pi\sigma^2) - \frac{(1-x_i)(y_i-\mu_0)^2}{2\sigma^2} \right\} \\
&= \sum_{i=1}^n x_i \ln\pi + \left( n - \sum_{i=1}^n x_i \right) \ln(1-\pi) - \frac{n}{2}\ln(2\pi\sigma^2) - \sum_{i=1}^n \frac{x_i (y_i-\mu_1)^2}{2\sigma^2} - \sum_{i=1}^n \frac{(1-x_i)(y_i-\mu_0)^2}{2\sigma^2}
\end{aligned}
$$
Setting each partial derivative of the log-likelihood to $0$ gives the maximum likelihood estimates:
$$
\frac{\partial \ln f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)}{\partial \mu_1}
= \frac{\partial \ln f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)}{\partial \mu_0}
= \frac{\partial \ln f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)}{\partial \sigma^2}
= \frac{\partial \ln f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)}{\partial \pi}
= 0
$$
Solving these equations yields:
$$
\begin{aligned}
\hat{\pi} &= \frac{\sum_{i=1}^n x_i}{n} \\
\hat{\mu}_0 &= \frac{\sum_{i=1}^n (1-x_i) y_i}{\sum_{i=1}^n (1-x_i)} \\
\hat{\mu}_1 &= \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i} \\
\hat{\sigma}^2 &= \frac{\sum_{i=1}^n y_i^2}{n} - \frac{\left[ \sum_{i=1}^n (1-x_i) y_i \right]^2}{n \sum_{i=1}^n (1-x_i)} - \frac{\left( \sum_{i=1}^n x_i y_i \right)^2}{n \sum_{i=1}^n x_i}
\end{aligned}
$$
Because $\hat{\sigma}^2$ takes a little more work, its derivation is written out in full:
$$
\begin{aligned}
\frac{\partial \ln f(X, Y \mid \mu_0, \mu_1, \sigma^2, \pi)}{\partial \sigma^2} &= 0 \\
-\frac{n}{2\hat{\sigma}^2} + \frac{\sum_{i=1}^n x_i (y_i-\hat{\mu}_1)^2}{2(\hat{\sigma}^2)^2} + \frac{\sum_{i=1}^n (1-x_i)(y_i-\hat{\mu}_0)^2}{2(\hat{\sigma}^2)^2} &= 0 \\
\sum_{i=1}^n x_i \left( y_i - \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i} \right)^2 + \sum_{i=1}^n (1-x_i) \left( y_i - \frac{\sum_{i=1}^n (1-x_i) y_i}{\sum_{i=1}^n (1-x_i)} \right)^2 &= n\hat{\sigma}^2 \\
\frac{\sum_{i=1}^n y_i^2}{n} - \frac{\left[ \sum_{i=1}^n (1-x_i) y_i \right]^2}{n \sum_{i=1}^n (1-x_i)} - \frac{\left( \sum_{i=1}^n x_i y_i \right)^2}{n \sum_{i=1}^n x_i} &= \hat{\sigma}^2
\end{aligned}
$$
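The closed-form estimates are easy to check numerically. The following is a minimal sketch in plain Python; the function name `complete_data_mle` and the toy data are illustrative assumptions, not part of the exercise. It computes $\hat{\sigma}^2$ as the pooled within-group sum of squares divided by $n$, which is algebraically the same as the last line of the derivation.

```python
def complete_data_mle(xs, ys):
    """Closed-form MLEs (pi, mu0, mu1, sigma2) for the two-group normal model.

    xs: list of 0/1 group indicators; ys: list of floats, same length.
    (Hypothetical helper written for illustration.)
    """
    n = len(xs)
    n1 = sum(xs)          # number of observations with x_i = 1
    n0 = n - n1           # number of observations with x_i = 0
    if n0 == 0 or n1 == 0:
        raise ValueError("both groups must contain at least one observation")
    pi_hat = n1 / n
    mu1_hat = sum(x * y for x, y in zip(xs, ys)) / n1
    mu0_hat = sum((1 - x) * y for x, y in zip(xs, ys)) / n0
    # Pooled within-group sum of squares, divided by n (the MLE, not n - 2).
    rss = sum((y - (mu1_hat if x == 1 else mu0_hat)) ** 2
              for x, y in zip(xs, ys))
    sigma2_hat = rss / n
    return pi_hat, mu0_hat, mu1_hat, sigma2_hat


# Toy data, made up for this example.
xs = [0, 0, 1, 1, 1]
ys = [1.0, 3.0, 4.0, 6.0, 8.0]
print(complete_data_mle(xs, ys))  # (0.6, 2.0, 6.0, 2.0)
```

For the toy data, the sum-of-squares form gives $126/5 - 16/10 - 324/15 = 2$, matching the value returned by the function.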
Marginal mean and variance of $Y$:
Summing out the random variable $X$ gives the marginal distribution of $Y$, a two-component normal mixture, which can be written as:
$$
Y = (1-X)\, Y_0 + X\, Y_1
$$
where $Y_0$ and $Y_1$ are independent of $X$ and:
$$
\begin{aligned}
Y_0 &\sim N(\mu_0, \sigma^2) \\
Y_1 &\sim N(\mu_1, \sigma^2)
\end{aligned}
$$
Taking the expectation and the variance, using the laws of total expectation and total variance:
$$
E(Y) = E[E(Y \mid X)] = (1-\pi)\mu_0 + \pi\mu_1
$$
$$
\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}[E(Y \mid X)] = \sigma^2 + \pi(1-\pi)(\mu_1-\mu_0)^2
$$
By the invariance of maximum likelihood estimates, the estimate of the marginal mean of $Y$ is:
$$
(1-\hat{\pi})\hat{\mu}_0 + \hat{\pi}\hat{\mu}_1
$$
and the estimate of the marginal variance is:
$$
\hat{\sigma}^2 + \hat{\pi}(1-\hat{\pi})(\hat{\mu}_1-\hat{\mu}_0)^2
$$
Substituting the parameter estimates obtained above gives the result.
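As a sanity check on the total-variance formula $\operatorname{Var}(Y) = \sigma^2 + \pi(1-\pi)(\mu_1-\mu_0)^2$, here is a small sketch (the helper name `marginal_moments` and the numbers are assumptions made for illustration). With complete data, plugging in the MLEs should reproduce the ordinary sample mean and the sample variance with divisor $n$, because the total sum of squares splits into within-group and between-group parts.

```python
def marginal_moments(pi, mu0, mu1, sigma2):
    """Mean and variance of Y for a two-component normal mixture
    with common variance (hypothetical helper for illustration)."""
    mean = (1 - pi) * mu0 + pi * mu1
    # Law of total variance: within-group part + between-group part.
    var = sigma2 + pi * (1 - pi) * (mu1 - mu0) ** 2
    return mean, var


# MLEs for the toy sample x = [0, 0, 1, 1, 1], y = [1, 3, 4, 6, 8]:
# pi = 0.6, mu0 = 2, mu1 = 6, sigma2 = 2.
mean, var = marginal_moments(0.6, 2.0, 6.0, 2.0)

# Compare with the plain sample mean and the divisor-n sample variance.
ys = [1.0, 3.0, 4.0, 6.0, 8.0]
ybar = sum(ys) / len(ys)
s2_n = sum((y - ybar) ** 2 for y in ys) / len(ys)
print(abs(mean - ybar) < 1e-9, abs(var - s2_n) < 1e-9)
```

Both comparisons should hold up to floating-point rounding; the agreement is a property of complete data and does not carry over once some $y_i$ are missing.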
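Joint density of the observed data, ordering the sample so that $y_1, \dots, y_r$ are observed and $y_{r+1}, \dots, y_n$ are missing, and assuming the missing-data mechanism is ignorable: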
$$
\begin{aligned}
f(X, Y_{obs} \mid \mu_0, \mu_1, \sigma^2, \pi)
&= \prod_{i=1}^r f(x_i, y_i \mid \mu_0, \mu_1, \sigma^2, \pi) \cdot \prod_{i=r+1}^n f(x_i \mid \pi) \\
&= \prod_{i=1}^r f(x_i \mid \pi)\, f(y_i \mid x_i, \mu_0, \mu_1, \sigma^2, \pi) \cdot \prod_{i=r+1}^n f(x_i \mid \pi) \\
&= \prod_{i=1}^n f(x_i \mid \pi) \cdot \prod_{i=1}^r f(y_i \mid x_i, \mu_0, \mu_1, \sigma^2, \pi) \\
&= \prod_{i=1}^n \pi^{x_i} (1-\pi)^{1-x_i} \cdot \prod_{i=1}^r \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_1)^2}{2\sigma^2} \right\} \right)^{x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_0)^2}{2\sigma^2} \right\} \right)^{1-x_i}
\end{aligned}
$$
Taking the logarithm and setting the partial derivatives to $0$, as before, gives:
$$
\begin{aligned}
\hat{\pi} &= \frac{\sum_{i=1}^n x_i}{n} \\
\hat{\mu}_1 &= \frac{\sum_{i=1}^r x_i y_i}{\sum_{i=1}^r x_i} \\
\hat{\mu}_0 &= \frac{\sum_{i=1}^r (1-x_i) y_i}{\sum_{i=1}^r (1-x_i)} \\
\hat{\sigma}^2 &= \frac{\sum_{i=1}^r y_i^2}{r} - \frac{\left[ \sum_{i=1}^r (1-x_i) y_i \right]^2}{r \sum_{i=1}^r (1-x_i)} - \frac{\left( \sum_{i=1}^r x_i y_i \right)^2}{r \sum_{i=1}^r x_i}
\end{aligned}
$$
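As in the complete-data case, the marginal mean $E(Y) = (1-\pi)\mu_0 + \pi\mu_1$ is estimated by:
$$
(1-\hat{\pi})\hat{\mu}_0 + \hat{\pi}\hat{\mu}_1
$$
The marginal variance, which by the law of total variance is $\operatorname{Var}(Y) = \sigma^2 + \pi(1-\pi)(\mu_1-\mu_0)^2$, is estimated by:
$$
\hat{\sigma}^2 + \hat{\pi}(1-\hat{\pi})(\hat{\mu}_1-\hat{\mu}_0)^2
$$
Here the parameters are replaced by the maximum likelihood estimates computed from the incomplete data above.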
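The incomplete-data estimates can be sketched in the same way. In this illustration a missing $y_i$ is encoded as `None`; that encoding, the function name `incomplete_data_mle`, and the toy data are all assumptions. The point to notice is that $\hat{\pi}$ uses all $n$ values of $x$, while $\hat{\mu}_0$, $\hat{\mu}_1$ and $\hat{\sigma}^2$ use only the $r$ cases with $y$ observed.

```python
def incomplete_data_mle(xs, ys):
    """ML estimates when x is fully observed and some y are missing (None).

    Returns (pi, mu0, mu1, sigma2, marginal_mean, marginal_var).
    Hypothetical helper; assumes an ignorable missing-data mechanism.
    """
    n = len(xs)
    pi_hat = sum(xs) / n                      # all n units contribute to pi
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    r = len(obs)                              # number of complete cases
    y1 = [y for x, y in obs if x == 1]
    y0 = [y for x, y in obs if x == 0]
    if not y0 or not y1:
        raise ValueError("need an observed y in both groups")
    mu1_hat = sum(y1) / len(y1)
    mu0_hat = sum(y0) / len(y0)
    rss = (sum((y - mu1_hat) ** 2 for y in y1)
           + sum((y - mu0_hat) ** 2 for y in y0))
    sigma2_hat = rss / r                      # divisor r, not n
    mean = (1 - pi_hat) * mu0_hat + pi_hat * mu1_hat
    var = sigma2_hat + pi_hat * (1 - pi_hat) * (mu1_hat - mu0_hat) ** 2
    return pi_hat, mu0_hat, mu1_hat, sigma2_hat, mean, var


# Toy data: 8 units, the last three y-values are missing.
xs = [0, 0, 1, 1, 1, 1, 0, 1]
ys = [1.0, 3.0, 4.0, 6.0, 8.0, None, None, None]
print(incomplete_data_mle(xs, ys))  # (0.625, 2.0, 6.0, 2.0, 4.5, 5.75)
```

In this example the estimated marginal mean (4.5) differs from the complete-case average of $y$ (4.4), because the group proportions among all $n$ units differ from those among the $r$ complete cases.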
Posterior distribution (the prior factor enters once, not once per observation, and the density is taken with respect to $\log\sigma^2$):
$$
\begin{aligned}
p(\pi, \mu_0, \mu_1, \log\sigma^2 \mid X, Y_{obs})
&\propto f(X, Y_{obs} \mid \mu_0, \mu_1, \sigma^2, \pi) \cdot p(\pi, \mu_0, \mu_1, \log\sigma^2) \\
&\propto \pi^{\frac{1}{2} + \sum_{i=1}^n x_i} (1-\pi)^{\frac{1}{2} + n - \sum_{i=1}^n x_i} \cdot \prod_{i=1}^r \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_1)^2}{2\sigma^2} \right\} \right)^{x_i} \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i-\mu_0)^2}{2\sigma^2} \right\} \right)^{1-x_i}
\end{aligned}
$$
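With the prior $\pi^{1/2}(1-\pi)^{1/2}$ applied once and a flat prior on $(\mu_0, \mu_1, \log\sigma^2)$, the posterior factorizes into pieces that can be drawn directly: $\pi \sim \text{Beta}\left(\sum_{i=1}^n x_i + \tfrac{3}{2},\ n - \sum_{i=1}^n x_i + \tfrac{3}{2}\right)$, $\sigma^2 = SS / \chi^2_{r-2}$ where $SS$ is the pooled within-group sum of squares of the $r$ complete cases, and $\mu_j \mid \sigma^2 \sim N(\bar{y}_j, \sigma^2 / r_j)$, with $\bar{y}_j$ and $r_j$ the mean and count of the observed $y$ in group $j$. The sketch below follows that factorization; it is one way to carry out the draws, not necessarily the procedure the book uses, and the function name `draw_posterior` and the `None` encoding for missing values are assumptions.

```python
import random


def draw_posterior(xs, ys, n_draws, rng=None):
    """Draw (pi, mu0, mu1, sigma2) from the factored posterior.

    xs: 0/1 indicators (fully observed); ys: floats or None (missing).
    Hypothetical helper; needs r > 2 complete cases and both groups observed.
    """
    rng = rng or random.Random()
    n, n1 = len(xs), sum(xs)
    y1 = [y for x, y in zip(xs, ys) if y is not None and x == 1]
    y0 = [y for x, y in zip(xs, ys) if y is not None and x == 0]
    r1, r0 = len(y1), len(y0)
    r = r1 + r0
    if r0 == 0 or r1 == 0 or r <= 2:
        raise ValueError("need r > 2 and an observed y in both groups")
    ybar1, ybar0 = sum(y1) / r1, sum(y0) / r0
    ss = (sum((y - ybar1) ** 2 for y in y1)
          + sum((y - ybar0) ** 2 for y in y0))
    draws = []
    for _ in range(n_draws):
        # pi: Beta posterior from the Bernoulli likelihood times the prior.
        pi = rng.betavariate(n1 + 1.5, n - n1 + 1.5)
        # chi-square with r - 2 df is Gamma(shape=(r - 2)/2, scale=2).
        chi2 = rng.gammavariate((r - 2) / 2, 2.0)
        sigma2 = ss / chi2
        # Group means given sigma2: independent normals.
        mu0 = rng.gauss(ybar0, (sigma2 / r0) ** 0.5)
        mu1 = rng.gauss(ybar1, (sigma2 / r1) ** 0.5)
        draws.append((pi, mu0, mu1, sigma2))
    return draws


xs = [0, 0, 1, 1, 1, 1, 0, 1]
ys = [1.0, 3.0, 4.0, 6.0, 8.0, None, None, None]
sample = draw_posterior(xs, ys, 1000, random.Random(0))
print(len(sample))
```

Each tuple in `sample` is one joint draw of $(\pi, \mu_0, \mu_1, \sigma^2)$. Because $\pi$ is independent of the other parameters a posteriori under this prior, it is drawn separately from them.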
Draws of an arbitrary function $g_d$ of the parameters can then be generated in the same way as on page 141 of the book: