L ( μ 1 , μ 2 , Σ , ϕ ) = ∑ i = 1 N [ log N ( μ 1 , Σ ) y i ⏟ ( 1 ) + log N ( μ 2 , Σ ) 1 − y i ⏟ ( 2 ) + log ϕ y i ( 1 − ϕ ) 1 − y i ⏟ ( 3 ) ] L(\mu_{1},\mu_{2},\Sigma,\phi)=\sum\limits_{i=1}^{N}[\underbrace{\log N(\mu_{1},\Sigma)^{y_{i}}}_{(1)}+\underbrace{\log N(\mu_{2},\Sigma)^{1-y_{i}}}_{(2)}+\underbrace{\log \phi^{y_{i}}(1-\phi)^{1-y_{i}}}_{(3)}] L(μ1,μ2,Σ,ϕ)=i=1∑N[(1) logN(μ1,Σ)yi+(2) logN(μ2,Σ)1−yi+(3) logϕyi(1−ϕ)1−yi]
求 ϕ \phi ϕ,显然只有 ( 3 ) (3) (3)与 ϕ \phi ϕ相关
( 3 ) = ∑ i = 1 N log ϕ y i ( 1 − ϕ ) 1 − y i = ∑ i = 1 N [ y i log ϕ + ( 1 − y i ) log ( 1 − ϕ ) ] ∂ ( 3 ) ∂ ϕ = ∑ i = 1 N [ y i ⋅ 1 ϕ − ( 1 − y i ) 1 1 − ϕ ] = 0 0 = ∑ i = 1 N [ y i ⋅ ( 1 − ϕ ) − ( 1 − y i ) ϕ ] 0 = ∑ i = 1 N ( y i − y i ϕ − ϕ + y i ϕ ) 0 = ∑ i = 1 N ( y i − ϕ ) 0 = ∑ i = 1 N y i + N ϕ ϕ ^ = ∑ i = 1 N y i N \begin{aligned} (3)&=\sum\limits_{i=1}^{N}\log \phi^{y_{i}}(1-\phi)^{1-y_{i}}\\ &=\sum\limits_{i=1}^{N}[y_{i} \log \phi+(1-y_{i})\log(1-\phi)]\\ \frac{\partial (3)}{\partial \phi}&=\sum\limits_{i=1}^{N}\left[y_{i}\cdot \frac{1}{\phi}-\left(1-y_{i}\right) \frac{1}{1-\phi}\right]=0\\ 0&=\sum\limits_{i=1}^{N}[y_{i}\cdot (1-\phi)-(1-y_{i})\phi]\\ 0&=\sum\limits_{i=1}^{N}(y_{i}-y_{i}\phi-\phi+y_{i}\phi)\\ 0&=\sum\limits_{i=1}^{N}(y_{i}-\phi)\\ 0&=\sum\limits_{i=1}^{N}y_{i}+N \phi\\ \hat{\phi}&= \frac{\sum\limits_{i=1}^{N}y_{i}}{N} \end{aligned} (3)∂ϕ∂(3)0000ϕ^=i=1∑Nlogϕyi(1−ϕ)1−yi=i=1∑N[yilogϕ+(1−yi)log(1−ϕ)]=i=1∑N[yi⋅ϕ1−(1−yi)1−ϕ1]=0=i=1∑N[yi⋅(1−ϕ)−(1−yi)ϕ]=i=1∑N(yi−yiϕ−ϕ+yiϕ)=i=1∑N(yi−ϕ)=i=1∑Nyi+Nϕ=Ni=1∑Nyi
求 μ 1 \mu_{1} μ1,显然只有 ( 1 ) (1) (1)与 μ 1 \mu_{1} μ1相关。对于 μ 2 \mu_{2} μ2类似于 μ 1 \mu_{1} μ1,只需要 1 − y i 1-y_{i} 1−yi替换 y i y_{i} yi即可
( 1 ) = ∑ i = 1 N log N ( μ 1 , Σ ) y i = ∑ i = 1 N y i log 1 ( 2 π ) p 2 ∣ Σ ∣ 1 2 exp [ − 1 2 ( x i − μ 1 ) T Σ − 1 ( x i − μ 1 ) ] μ 1 = a r g m a x μ 1 ( 1 ) = a r g m a x μ 1 ∑ i = 1 N y i [ − 1 2 ( x i − μ 1 ) T Σ − 1 ( x i − μ 1 ) ] = a r g m a x μ 1 − 1 2 ∑ i = 1 N y i ( x i T Σ − 1 − μ 1 T Σ − 1 ) ( x i − μ 1 ) = a r g m a x μ 1 − 1 2 ∑ i = 1 N y i ( x i T Σ − 1 x i ⏟ ∈ R − x i T Σ − 1 μ 1 ⏟ 1 × 1 − μ 1 T Σ − 1 x i ⏟ 1 × 1 + μ 1 T Σ − 1 μ 1 ) = a r g m a x μ 1 − 1 2 ∑ i = 1 N y i ( x i T Σ − 1 x i − 2 μ 1 T Σ − 1 x i + μ 1 T Σ − 1 μ 1 ) ⏟ Δ ∂ Δ ∂ μ 1 = − 1 2 ∑ i = 1 N y i ( − 2 Σ − 1 x i + 2 Σ − 1 μ 1 ) = 0 0 = ∑ i = 1 N y i ( Σ − 1 μ 1 − Σ − 1 x i ) 0 = ∑ i = 1 N y i ( μ 1 − x i ) ∑ i = 1 N y i μ 1 = ∑ i = 1 N y i x i μ 1 ^ = ∑ i = 1 N y i x i ∑ i = 1 N y i \begin{aligned} (1)&=\sum\limits_{i=1}^{N}\log N(\mu_{1},\Sigma)^{y_{i}}\\ &=\sum\limits_{i=1}^{N}y_{i}\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\text{exp}\left[ - \frac{1}{2}(x_{i}-\mu_{1})^{T}\Sigma^{-1}(x_{i}-\mu_{1})\right]\\ \mu_{1}&=\mathop{argmax\space}\limits_{\mu_{1}}(1)\\ &=\mathop{argmax\space}\limits_{\mu_{1}}\sum\limits_{i=1}^{N}y_{i}\left[ - \frac{1}{2}(x_{i}-\mu_{1})^{T}\Sigma^{-1}(x_{i}-\mu_{1})\right]\\ &=\mathop{argmax\space}\limits_{\mu_{1}}- \frac{1}{2}\sum\limits_{i=1}^{N}y_{i}(x_{i}^{T}\Sigma^{-1}-\mu_{1}^{T}\Sigma^{-1})(x_{i}-\mu_{1})\\ &=\mathop{argmax\space}\limits_{\mu_{1}}- \frac{1}{2}\sum\limits_{i=1}^{N}y_{i}(\underbrace{x_{i}^{T}\Sigma^{-1}x_{i}}_{\in \mathbb{R}}-\underbrace{x_{i}^{T}\Sigma^{-1}\mu_{1}}_{1 \times 1}-\underbrace{\mu_{1}^{T}\Sigma^{-1}x_{i}}_{1 \times 1}+\mu_{1}^{T}\Sigma^{-1}\mu_{1})\\ &=\mathop{argmax\space}\limits_{\mu_{1}}\underbrace{- \frac{1}{2}\sum\limits_{i=1}^{N}y_{i}(x_{i}^{T}\Sigma^{-1}x_{i}-2\mu_{1}^{T}\Sigma^{-1}x_{i}+\mu_{1}^{T}\Sigma^{-1}\mu_{1})}_{\Delta }\\ \frac{\partial \Delta }{\partial \mu_{1}}&=- \frac{1}{2}\sum\limits_{i=1}^{N}y_{i}(-2\Sigma^{-1}x_{i}+2\Sigma^{-1}\mu_{1})=0\\ 0&=\sum\limits_{i=1}^{N}y_{i}(\Sigma^{-1}\mu_{1}-\Sigma^{-1}x_{i})\\ 0&=\sum\limits_{i=1}^{N}y_{i}(\mu_{1}-x_{i})\\ \sum\limits_{i=1}^{N}y_{i}\mu_{1}&=\sum\limits_{i=1}^{N}y_{i}x_{i}\\ \hat{\mu_{1}}&=\frac{\sum\limits_{i=1}^{N}y_{i}x_{i}}{\sum\limits_{i=1}^{N}y_{i}} \end{aligned} (1)μ1∂μ1∂Δ00i=1∑Nyiμ1μ1^=i=1∑NlogN(μ1,Σ)yi=i=1∑Nyilog(2π)2p∣Σ∣211exp[−21(xi−μ1)TΣ−1(xi−μ1)]=μ1argmax (1)=μ1argmax i=1∑Nyi[−21(xi−μ1)TΣ−1(xi−μ1)]=μ1argmax −21i=1∑Nyi(xiTΣ−1−μ1TΣ−1)(xi−μ1)=μ1argmax −21i=1∑Nyi(∈R xiTΣ−1xi−1×1 xiTΣ−1μ1−1×1 μ1TΣ−1xi+μ1TΣ−1μ1)=μ1argmax Δ −21i=1∑Nyi(xiTΣ−1xi−2μ1TΣ−1xi+μ1TΣ−1μ1)=−21i=1∑Nyi(−2Σ−1xi+2Σ−1μ1)=0=i=1∑Nyi(Σ−1μ1−Σ−1xi)=i=1∑Nyi(μ1−xi)=i=1∑Nyixi=i=1∑Nyii=1∑Nyixi
这里我们设
C 1 = { x i ∣ y i = 1 , i = 1 , 2 , ⋯ , N } , ∣ C 1 ∣ = N 1 C 0 = { x i ∣ y i = 0 , i = 1 , 2 , ⋯ , N } , ∣ C 0 ∣ = N 0 N = N 1 + N 0 \begin{aligned} C_{1}&=\left\{x_{i}|y_{i}=1,i=1,2,\cdots,N\right\},|C_{1}|=N_{1}\\ C_{0}&=\left\{x_{i}|y_{i}=0,i=1,2,\cdots,N\right\},|C_{0}|=N_{0}\\ N&=N_{1}+N_{0} \end{aligned} C1C0N={xi∣yi=1,i=1,2,⋯,N},∣C1∣=N1={xi∣yi=0,i=1,2,⋯,N},∣C0∣=N0=N1+N0
因此
μ 1 ^ = ∑ i = 1 N y i x i N 1 \hat{\mu_{1}}=\frac{\sum\limits_{i=1}^{N}y_{i}x_{i}}{N_{1}} μ1^=N1i=1∑Nyixi
再用 1 − y i 1-y_{i} 1−yi替换 y i y_{i} yi得 μ 2 ^ \hat{\mu_{2}} μ2^
μ 2 ^ = ∑ i = 1 N ( 1 − y i ) x i ∑ i = 1 N ( 1 − y i ) = ∑ i = 1 N ( 1 − y i ) x i N − N 1 = ∑ i = 1 N ( 1 − y i ) x i N 0 \hat{\mu_{2}}=\frac{\sum\limits_{i=1}^{N}(1-y_{i})x_{i}}{\sum\limits_{i=1}^{N}(1-y_{i})}=\frac{\sum\limits_{i=1}^{N}(1-y_{i})x_{i}}{N-N_{1}}=\frac{\sum\limits_{i=1}^{N}(1-y_{i})x_{i}}{N_{0}} μ2^=i=1∑N(1−yi)i=1∑N(1−yi)xi=N−N1i=1∑N(1−yi)xi=N0i=1∑N(1−yi)xi
求 Σ \Sigma Σ,显然只有 ( 1 ) , ( 2 ) (1),(2) (1),(2)与 Σ \Sigma Σ相关
( 1 ) + ( 2 ) = ∑ i = 1 N y i log N ( μ 1 , Σ ) + ∑ i = 1 N ( 1 − y i ) log N ( μ 2 , Σ ) = ∑ x i ∈ C 1 log ( μ 1 , Σ ) + ∑ x i ∈ C 2 log N ( μ 2 , Σ ) ∑ i = 1 N log N ( μ , Σ ) = ∑ i = 1 N 1 ( 2 π ) p 2 ∣ Σ ∣ 1 2 exp [ − 1 2 ( x i − μ ) T Σ − 1 ( x i − μ ) ] = ∑ i = 1 N [ log 1 ( 2 π ) p 2 + log ∣ Σ ∣ 1 2 + ( − 1 2 ( x i − μ ) T Σ − 1 ( x i − μ ) ) ] = ∑ i = 1 N [ C − 1 2 log ∣ Σ ∣ − 1 2 ( x i − μ ) T Σ − 1 ( x i − μ ) ] = C − 1 2 N log ∣ Σ ∣ − 1 2 ∑ i = 1 N ( x i − μ ) T Σ − 1 ( x i − μ ) ⏟ ∈ R ∑ i = 1 N ( x i − μ ) T Σ − 1 ( x i − μ ) = ∑ i = 1 N tr [ ( x i − μ ) T Σ − 1 ( x i − μ ) ] = ∑ i = 1 N tr [ ( x i − μ ) ( x i − μ ) T Σ − 1 ] = tr [ ∑ i = 1 N ( x i − μ ) ( x i − μ ) T ⏟ x i 的方差 S Σ − 1 ] 设 S = 1 N ∑ i = 1 N ( x i − μ ) ( x i − μ ) T = N ⋅ tr ( S Σ − 1 ) 带回 ∑ i = 1 N log N ( μ , Σ ) ∑ i = 1 N log N ( μ , Σ ) = C − 1 2 N log ∣ Σ ∣ − 1 2 ∑ i = 1 N ( x i − μ ) T Σ − 1 ( x i − μ ) = − 1 2 N log ∣ Σ ∣ − 1 2 N ⋅ tr ( S ⋅ Σ − 1 ) + C 带回 ( 1 ) + ( 2 ) ( 1 ) + ( 2 ) = − 1 2 N 1 log ∣ Σ ∣ − 1 2 N ⋅ tr ( S ⋅ Σ − 1 ) − 1 2 N 2 log ∣ Σ ∣ − 1 2 N ⋅ tr ( S 2 Σ − 1 ) + C = − 1 2 N log ∣ Σ ∣ − 1 2 N ⋅ tr ( S 2 Σ − 1 ) − 1 2 N ⋅ tr ( S ⋅ Σ − 1 ) + C = − 1 2 [ N log ∣ Σ ∣ + N 1 tr ( S 1 Σ − 1 ) + N 2 tr ( S 2 Σ − 1 ) ] + C ∂ ( 1 ) + ( 2 ) ∂ Σ = − 1 2 ( N ⋅ 1 ∣ Σ ∣ ∣ Σ ∣ Σ − 1 − N 1 S 1 Σ − 1 Σ − 1 − N 2 S 2 Σ − 1 Σ − 1 ) = 0 N Σ − N 1 S 1 − N 2 S 2 = 0 Σ ^ = 1 N ( N 1 S 1 + N 2 S 2 ) \begin{aligned} (1)+(2)&=\sum\limits_{i=1}^{N}y_{i}\log N(\mu_{1},\Sigma)+\sum\limits_{i=1}^{N}(1-y_{i})\log N(\mu_{2},\Sigma)\\ &=\sum\limits_{x_{i}\in C_{1}}^{}\log(\mu_{1},\Sigma)+\sum\limits_{x_{i}\in C_{2}}^{}\log N(\mu_{2},\Sigma)\\ \sum\limits_{i=1}^{N}\log N(\mu,\Sigma)&=\sum\limits_{i=1}^{N} \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\text{exp}\left[- \frac{1}{2}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\right]\\ &=\sum\limits_{i=1}^{N}\left[\log \frac{1}{\left(2\pi\right)^{\frac{p}{2}}}+ \log |\Sigma|^{\frac{1}{2}}+\left(- \frac{1}{2}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}- \mu)\right)\right]\\ &=\sum\limits_{i=1}^{N}\left[C - \frac{1}{2}\log|\Sigma|- \frac{1}{2}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\right]\\ &=C- \frac{1}{2}N \log|\Sigma|- \frac{1}{2}\underbrace{\sum\limits_{i=1}^{N}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)}_{\in \mathbb{R}}\\ \sum\limits_{i=1}^{N}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)&=\sum\limits_{i=1}^{N}\text{tr }[(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)]\\ &=\sum\limits_{i=1}^{N}\text{tr }[(x_{i}-\mu)(x_{i}-\mu)^{T}\Sigma^{-1}]\\ &=\text{tr }\left[\underbrace{\sum\limits_{i=1}^{N}(x_{i}-\mu)(x_{i}-\mu)^{T}}_{x_{i}的方差S}\Sigma^{-1}\right]\\ &设S= \frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu)(x_{i}-\mu)^{T}\\ &=N \cdot \text{tr }(S \Sigma^{-1})\\ &带回\sum\limits_{i=1}^{N}\log N(\mu,\Sigma)\\ \sum\limits_{i=1}^{N}\log N(\mu,\Sigma)&=C- \frac{1}{2}N \log|\Sigma|- \frac{1}{2}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\\ &=- \frac{1}{2}N \log|\Sigma|- \frac{1}{2}N \cdot \text{tr }(S \cdot \Sigma^{-1})+C\\ &带回(1)+(2)\\ (1)+(2)&=- \frac{1}{2}N_{1}\log|\Sigma|- \frac{1}{2}N \cdot \text{tr }(S \cdot \Sigma^{-1})- \frac{1}{2}N_{2}\log|\Sigma|- \frac{1}{2}N \cdot \text{tr }(S_{2}\Sigma^{-1})+C\\ &=- \frac{1}{2}N \log|\Sigma|- \frac{1}{2}N \cdot \text{tr }(S_{2}\Sigma^{-1})- \frac{1}{2}N \cdot \text{tr }(S \cdot \Sigma^{-1})+C \\ &=- \frac{1}{2}[N \log|\Sigma|+ N_{1}\text{tr }(S_{1}\Sigma^{-1})+N_{2}\text{tr }(S_{2}\Sigma^{-1})]+C\\ \frac{\partial (1)+(2)}{\partial \Sigma}&=- \frac{1}{2}(N \cdot \frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}-N_{1}S_{1}\Sigma^{-1}\Sigma^{-1}-N_{2}S_{2}\Sigma^{-1}\Sigma^{-1})=0\\ N \Sigma-N_{1}S_{1}-N_{2}S_{2}&=0\\ \hat{\Sigma}&=\frac{1}{N}(N_{1}S_{1}+N_{2}S_{2}) \end{aligned} (1)+(2)i=1∑NlogN(μ,Σ)i=1∑N(xi−μ)TΣ−1(xi−μ)i=1∑NlogN(μ,Σ)(1)+(2)∂Σ∂(1)+(2)NΣ−N1S1−N2S2Σ^=i=1∑NyilogN(μ1,Σ)+i=1∑N(1−yi)logN(μ2,Σ)=xi∈C1∑log(μ1,Σ)+xi∈C2∑logN(μ2,Σ)=i=1∑N(2π)2p∣Σ∣211exp[−21(xi−μ)TΣ−1(xi−μ)]=i=1∑N[log(2π)2p1+log∣Σ∣21+(−21(xi−μ)TΣ−1(xi−μ))]=i=1∑N[C−21log∣Σ∣−21(xi−μ)TΣ−1(xi−μ)]=C−21Nlog∣Σ∣−21∈R i=1∑N(xi−μ)TΣ−1(xi−μ)=i=1∑Ntr [(xi−μ)TΣ−1(xi−μ)]=i=1∑Ntr [(xi−μ)(xi−μ)TΣ−1]=tr ⎣ ⎡xi的方差S i=1∑N(xi−μ)(xi−μ)TΣ−1⎦ ⎤设S=N1i=1∑N(xi−μ)(xi−μ)T=N⋅tr (SΣ−1)带回i=1∑NlogN(μ,Σ)=C−21Nlog∣Σ∣−21i=1∑N(xi−μ)TΣ−1(xi−μ)=−21Nlog∣Σ∣−21N⋅tr (S⋅Σ−1)+C带回(1)+(2)=−21N1log∣Σ∣−21N⋅tr (S⋅Σ−1)−21N2log∣Σ∣−21N⋅tr (S2Σ−1)+C=−21Nlog∣Σ∣−21N⋅tr (S2Σ−1)−21N⋅tr (S⋅Σ−1)+C=−21[Nlog∣Σ∣+N1tr (S1Σ−1)+N2tr (S2Σ−1)]+C=−21(N⋅∣Σ∣1∣Σ∣Σ−1−N1S1Σ−1Σ−1−N2S2Σ−1Σ−1)=0=0=N1(N1S1+N2S2)
迹的性质
tr ( A B ) = tr ( B A ) tr ( A B C ) = tr ( C A B ) = tr ( B C A ) \begin{aligned} \text{tr }(AB)&=\text{tr }(BA)\\\text{tr }(ABC)&=\text{tr }(CAB)=\text{tr }(BCA)\end{aligned} tr (AB)tr (ABC)=tr (BA)=tr (CAB)=tr (BCA)
矩阵求导
∂ tr ( A B ) ∂ A = B − 1 ∂ ∣ A ∣ ∂ A = ∣ A ∣ ⋅ A T \begin{aligned} \frac{\partial \text{tr }(AB)}{\partial A}&=B^{-1}\\\frac{\partial |A|}{\partial A}&=|A|\cdot A^{T}\end{aligned} ∂A∂tr (AB)∂A∂∣A∣=B−1=∣A∣⋅AT
图中圆圈代表正样本,叉号代表负样本,直线p(y = 1|x) = 0.5代表分类边界(decision boundary)。因为Σ相同所以两个形状相同,但是具有不同的μ 。
作者:张文翔
链接:Andrew Ng Stanford机器学习公开课 总结(5) - 张文翔的博客 | BY ZhangWenxiang (demmon-tju.github.io)