[Whiteboard Derivation Series Notes] Linear Classification - Gaussian Discriminant Analysis - Solving the Model (Means) & Solving the Model (Covariance)

$$
L(\mu_{1},\mu_{2},\Sigma,\phi)=\sum_{i=1}^{N}\Big[\underbrace{\log N(\mu_{1},\Sigma)^{y_{i}}}_{(1)}+\underbrace{\log N(\mu_{2},\Sigma)^{1-y_{i}}}_{(2)}+\underbrace{\log \phi^{y_{i}}(1-\phi)^{1-y_{i}}}_{(3)}\Big]
$$
For $\phi$: clearly only term $(3)$ depends on $\phi$.

$$
\begin{aligned}
(3)&=\sum_{i=1}^{N}\log \phi^{y_{i}}(1-\phi)^{1-y_{i}}\\
&=\sum_{i=1}^{N}\left[y_{i}\log \phi+(1-y_{i})\log(1-\phi)\right]\\
\frac{\partial (3)}{\partial \phi}&=\sum_{i=1}^{N}\left[y_{i}\cdot \frac{1}{\phi}-(1-y_{i})\frac{1}{1-\phi}\right]=0\\
0&=\sum_{i=1}^{N}\left[y_{i}(1-\phi)-(1-y_{i})\phi\right]\\
0&=\sum_{i=1}^{N}(y_{i}-y_{i}\phi-\phi+y_{i}\phi)\\
0&=\sum_{i=1}^{N}(y_{i}-\phi)\\
0&=\sum_{i=1}^{N}y_{i}-N\phi\\
\hat{\phi}&=\frac{\sum_{i=1}^{N}y_{i}}{N}
\end{aligned}
$$
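The closed form for $\hat{\phi}$ can be checked numerically. The following is a minimal sketch in plain Python; the label list `y` and the helper names `bernoulli_loglik` and `bernoulli_score` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy labels (1 = positive class, 0 = negative class).
y = [1, 0, 1, 1, 0]
N = len(y)


def bernoulli_loglik(phi, labels):
    """Term (3): sum_i [ y_i log(phi) + (1 - y_i) log(1 - phi) ]."""
    return sum(yi * math.log(phi) + (1 - yi) * math.log(1 - phi) for yi in labels)


def bernoulli_score(phi, labels):
    """Derivative of term (3) with respect to phi."""
    return sum(yi / phi - (1 - yi) / (1 - phi) for yi in labels)


# Closed-form maximizer from the derivation: the fraction of positive labels.
phi_hat = sum(y) / N

# The derivative should vanish at phi_hat (up to floating-point rounding),
# and nearby values of phi should give a lower log-likelihood.
score_at_hat = bernoulli_score(phi_hat, y)
is_local_max = all(
    bernoulli_loglik(phi_hat, y) > bernoulli_loglik(phi_hat + delta, y)
    for delta in (-0.05, 0.05)
)
```

Here `phi_hat` is simply the share of samples with $y_{i}=1$, which is what the last line of the derivation states.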
For $\mu_{1}$: clearly only term $(1)$ depends on $\mu_{1}$. The case of $\mu_{2}$ is analogous: simply replace $y_{i}$ with $1-y_{i}$.

$$
\begin{aligned}
(1)&=\sum_{i=1}^{N}\log N(\mu_{1},\Sigma)^{y_{i}}\\
&=\sum_{i=1}^{N}y_{i}\log \left\{\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left[-\frac{1}{2}(x_{i}-\mu_{1})^{T}\Sigma^{-1}(x_{i}-\mu_{1})\right]\right\}\\
\hat{\mu}_{1}&=\underset{\mu_{1}}{\arg\max}\ (1)\\
&=\underset{\mu_{1}}{\arg\max}\sum_{i=1}^{N}y_{i}\left[-\frac{1}{2}(x_{i}-\mu_{1})^{T}\Sigma^{-1}(x_{i}-\mu_{1})\right]\\
&=\underset{\mu_{1}}{\arg\max}\ -\frac{1}{2}\sum_{i=1}^{N}y_{i}(x_{i}^{T}\Sigma^{-1}-\mu_{1}^{T}\Sigma^{-1})(x_{i}-\mu_{1})\\
&=\underset{\mu_{1}}{\arg\max}\ -\frac{1}{2}\sum_{i=1}^{N}y_{i}\Big(\underbrace{x_{i}^{T}\Sigma^{-1}x_{i}}_{\in \mathbb{R}}-\underbrace{x_{i}^{T}\Sigma^{-1}\mu_{1}}_{1 \times 1}-\underbrace{\mu_{1}^{T}\Sigma^{-1}x_{i}}_{1 \times 1}+\mu_{1}^{T}\Sigma^{-1}\mu_{1}\Big)\\
&=\underset{\mu_{1}}{\arg\max}\ \underbrace{-\frac{1}{2}\sum_{i=1}^{N}y_{i}(x_{i}^{T}\Sigma^{-1}x_{i}-2\mu_{1}^{T}\Sigma^{-1}x_{i}+\mu_{1}^{T}\Sigma^{-1}\mu_{1})}_{\Delta}\\
\frac{\partial \Delta}{\partial \mu_{1}}&=-\frac{1}{2}\sum_{i=1}^{N}y_{i}(-2\Sigma^{-1}x_{i}+2\Sigma^{-1}\mu_{1})=0\\
0&=\sum_{i=1}^{N}y_{i}(\Sigma^{-1}\mu_{1}-\Sigma^{-1}x_{i})\\
0&=\sum_{i=1}^{N}y_{i}(\mu_{1}-x_{i})\\
\sum_{i=1}^{N}y_{i}\mu_{1}&=\sum_{i=1}^{N}y_{i}x_{i}\\
\hat{\mu}_{1}&=\frac{\sum_{i=1}^{N}y_{i}x_{i}}{\sum_{i=1}^{N}y_{i}}
\end{aligned}
$$

The two cross terms are scalars, and they are equal because $\Sigma^{-1}$ is symmetric, which is why they merge into $-2\mu_{1}^{T}\Sigma^{-1}x_{i}$.
Here we define

$$
\begin{aligned}
C_{1}&=\{x_{i}\mid y_{i}=1,\ i=1,2,\cdots,N\},\quad |C_{1}|=N_{1}\\
C_{0}&=\{x_{i}\mid y_{i}=0,\ i=1,2,\cdots,N\},\quad |C_{0}|=N_{0}\\
N&=N_{1}+N_{0}
\end{aligned}
$$

Since $\sum_{i=1}^{N}y_{i}=N_{1}$, it follows that

$$
\hat{\mu}_{1}=\frac{\sum_{i=1}^{N}y_{i}x_{i}}{N_{1}}
$$

Replacing $y_{i}$ with $1-y_{i}$ then gives $\hat{\mu}_{2}$:

$$
\hat{\mu}_{2}=\frac{\sum_{i=1}^{N}(1-y_{i})x_{i}}{\sum_{i=1}^{N}(1-y_{i})}=\frac{\sum_{i=1}^{N}(1-y_{i})x_{i}}{N-N_{1}}=\frac{\sum_{i=1}^{N}(1-y_{i})x_{i}}{N_{0}}
$$
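The two mean estimates are weighted averages in which the labels act as 0/1 weights. A minimal sketch in plain Python follows; the toy data `X`, `y` and the helper `weighted_mean` are hypothetical and serve only to illustrate the formulas.

```python
# Hypothetical 2-D toy data; labels follow the text (1 -> C_1, 0 -> C_0).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
y = [1, 1, 1, 0, 0]


def weighted_mean(points, weights):
    """Return sum_i w_i x_i / sum_i w_i, computed coordinate by coordinate."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * x[k] for x, w in zip(points, weights)) / total for k in range(dim)]


N = len(y)
N1 = sum(y)   # |C_1|
N0 = N - N1   # |C_0|

mu1_hat = weighted_mean(X, y)                     # weights y_i select C_1
mu2_hat = weighted_mean(X, [1 - yi for yi in y])  # weights 1 - y_i select C_0
```

With 0/1 labels, `mu1_hat` and `mu2_hat` are just the per-class sample means, so in practice one could equally filter the samples by class and average each group.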
For $\Sigma$: clearly only terms $(1)$ and $(2)$ depend on $\Sigma$. Below, $C_{2}$ and $N_{2}$ denote the class with $y_{i}=0$ and its size (written $C_{0}$ and $N_{0}$ above), matching the index of $\mu_{2}$.

$$
\begin{aligned}
(1)+(2)&=\sum_{i=1}^{N}y_{i}\log N(\mu_{1},\Sigma)+\sum_{i=1}^{N}(1-y_{i})\log N(\mu_{2},\Sigma)\\
&=\sum_{x_{i}\in C_{1}}\log N(\mu_{1},\Sigma)+\sum_{x_{i}\in C_{2}}\log N(\mu_{2},\Sigma)
\end{aligned}
$$

First consider a single Gaussian over $N$ samples, where $C$ collects all terms that do not depend on $\Sigma$:

$$
\begin{aligned}
\sum_{i=1}^{N}\log N(\mu,\Sigma)&=\sum_{i=1}^{N}\log\left\{\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left[-\frac{1}{2}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\right]\right\}\\
&=\sum_{i=1}^{N}\left[\log \frac{1}{(2\pi)^{\frac{p}{2}}}+\log |\Sigma|^{-\frac{1}{2}}+\left(-\frac{1}{2}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\right)\right]\\
&=\sum_{i=1}^{N}\left[C-\frac{1}{2}\log|\Sigma|-\frac{1}{2}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\right]\\
&=C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}\underbrace{\sum_{i=1}^{N}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)}_{\in \mathbb{R}}
\end{aligned}
$$

The quadratic term is a scalar, so it equals its own trace. Let $S=\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu)(x_{i}-\mu)^{T}$ be the sample covariance of the $x_{i}$:

$$
\begin{aligned}
\sum_{i=1}^{N}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)&=\sum_{i=1}^{N}\text{tr}\left[(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\right]\\
&=\sum_{i=1}^{N}\text{tr}\left[(x_{i}-\mu)(x_{i}-\mu)^{T}\Sigma^{-1}\right]\\
&=\text{tr}\Big[\underbrace{\sum_{i=1}^{N}(x_{i}-\mu)(x_{i}-\mu)^{T}}_{N S}\,\Sigma^{-1}\Big]\\
&=N\cdot \text{tr}(S\Sigma^{-1})
\end{aligned}
$$

Substituting back into $\sum_{i=1}^{N}\log N(\mu,\Sigma)$:

$$
\begin{aligned}
\sum_{i=1}^{N}\log N(\mu,\Sigma)&=C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}\sum_{i=1}^{N}(x_{i}-\mu)^{T}\Sigma^{-1}(x_{i}-\mu)\\
&=-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N\cdot \text{tr}(S\Sigma^{-1})+C
\end{aligned}
$$

Applying this to each class separately, with per-class sample covariances

$$
S_{1}=\frac{1}{N_{1}}\sum_{x_{i}\in C_{1}}(x_{i}-\mu_{1})(x_{i}-\mu_{1})^{T},\qquad S_{2}=\frac{1}{N_{2}}\sum_{x_{i}\in C_{2}}(x_{i}-\mu_{2})(x_{i}-\mu_{2})^{T},
$$

and substituting back into $(1)+(2)$:

$$
\begin{aligned}
(1)+(2)&=-\frac{1}{2}N_{1}\log|\Sigma|-\frac{1}{2}N_{1}\,\text{tr}(S_{1}\Sigma^{-1})-\frac{1}{2}N_{2}\log|\Sigma|-\frac{1}{2}N_{2}\,\text{tr}(S_{2}\Sigma^{-1})+C\\
&=-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N_{1}\,\text{tr}(S_{1}\Sigma^{-1})-\frac{1}{2}N_{2}\,\text{tr}(S_{2}\Sigma^{-1})+C\\
&=-\frac{1}{2}\left[N\log|\Sigma|+N_{1}\,\text{tr}(S_{1}\Sigma^{-1})+N_{2}\,\text{tr}(S_{2}\Sigma^{-1})\right]+C\\
\frac{\partial [(1)+(2)]}{\partial \Sigma}&=-\frac{1}{2}\left(N\cdot \frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}-N_{1}\Sigma^{-1}S_{1}\Sigma^{-1}-N_{2}\Sigma^{-1}S_{2}\Sigma^{-1}\right)=0\\
N\Sigma-N_{1}S_{1}-N_{2}S_{2}&=0\\
\hat{\Sigma}&=\frac{1}{N}(N_{1}S_{1}+N_{2}S_{2})
\end{aligned}
$$

The derivative uses the symmetry of $\Sigma$, $S_{1}$ and $S_{2}$; the second-to-last line follows by multiplying the stationarity condition by $\Sigma$ on the left and on the right.
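The estimate $\hat{\Sigma}=\frac{1}{N}(N_{1}S_{1}+N_{2}S_{2})$ is a pooled covariance: each class is centered at its own mean, and the two scatter matrices are added and divided by the total sample count. A minimal sketch in plain Python follows; the toy data and the helpers `class_mean` and `scatter` are hypothetical.

```python
# Hypothetical 2-D toy data; y = 1 marks class C_1, y = 0 marks class C_2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
y = [1, 1, 1, 0, 0]


def class_mean(points):
    """Coordinate-wise sample mean of a list of points."""
    n, dim = len(points), len(points[0])
    return [sum(x[k] for x in points) / n for k in range(dim)]


def scatter(points, mu):
    """Return sum_i (x_i - mu)(x_i - mu)^T as a nested list (this equals N_k * S_k)."""
    dim = len(mu)
    out = [[0.0] * dim for _ in range(dim)]
    for x in points:
        d = [x[k] - mu[k] for k in range(dim)]
        for r in range(dim):
            for c in range(dim):
                out[r][c] += d[r] * d[c]
    return out


C1 = [x for x, yi in zip(X, y) if yi == 1]
C2 = [x for x, yi in zip(X, y) if yi == 0]
N = len(X)

N1S1 = scatter(C1, class_mean(C1))  # N_1 * S_1
N2S2 = scatter(C2, class_mean(C2))  # N_2 * S_2

# Sigma_hat = (N_1 S_1 + N_2 S_2) / N
sigma_hat = [[(a + b) / N for a, b in zip(r1, r2)] for r1, r2 in zip(N1S1, N2S2)]
```

Working with the unnormalized scatter matrices $N_{k}S_{k}$ avoids dividing by $N_{k}$ and multiplying by it again.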

Properties of the trace

$$
\begin{aligned}
\text{tr}(AB)&=\text{tr}(BA)\\
\text{tr}(ABC)&=\text{tr}(CAB)=\text{tr}(BCA)
\end{aligned}
$$

Matrix derivatives

$$
\begin{aligned}
\frac{\partial\, \text{tr}(AB)}{\partial A}&=B^{T}\\
\frac{\partial |A|}{\partial A}&=|A|\cdot (A^{-1})^{T}\\
\frac{\partial\, \text{tr}(BA^{-1})}{\partial A}&=-(A^{-1}BA^{-1})^{T}
\end{aligned}
$$
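Matrix-derivative identities are easy to misremember, so a finite-difference check can help. The sketch below, in plain Python and restricted to $2\times 2$ matrices, numerically checks three identities: $\partial\,\text{tr}(AB)/\partial A=B^{T}$, $\partial|A|/\partial A=|A|(A^{-1})^{T}$, and $\partial\,\text{tr}(BA^{-1})/\partial A=-(A^{-1}BA^{-1})^{T}$. The matrices `A`, `B` and all helper names are hypothetical.

```python
def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def trace(P):
    return P[0][0] + P[1][1]


def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]


def inverse(P):
    d = det(P)
    return [[P[1][1] / d, -P[0][1] / d], [-P[1][0] / d, P[0][0] / d]]


def transpose(P):
    return [[P[0][0], P[1][0]], [P[0][1], P[1][1]]]


def numerical_gradient(f, P, eps=1e-6):
    """Central finite differences of a scalar function f with respect to each entry of P."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in P]
            minus = [row[:] for row in P]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)
    return grad


def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))


# Arbitrary invertible test matrices (deliberately non-symmetric).
A = [[2.0, 0.5], [-1.0, 3.0]]
B = [[1.0, 4.0], [2.0, -1.0]]

# d tr(AB) / dA = B^T
err_trace = max_abs_diff(numerical_gradient(lambda M: trace(matmul(M, B)), A), transpose(B))

# d |A| / dA = |A| (A^{-1})^T
expected_det = [[det(A) * v for v in row] for row in transpose(inverse(A))]
err_det = max_abs_diff(numerical_gradient(det, A), expected_det)

# d tr(B A^{-1}) / dA = -(A^{-1} B A^{-1})^T
inner = matmul(matmul(inverse(A), B), inverse(A))
expected_inv = [[-v for v in row] for row in transpose(inner)]
err_inv = max_abs_diff(
    numerical_gradient(lambda M: trace(matmul(B, inverse(M))), A), expected_inv
)
```

All three error values should be close to zero. Using non-symmetric test matrices makes the transposes in the identities visible; for symmetric $\Sigma$ and $S$, as in the covariance derivation, the transposes drop out.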

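In the figure, circles denote positive samples, crosses denote negative samples, and the straight line $p(y=1\mid x)=0.5$ is the decision boundary. Because the two classes share the same $\Sigma$, the two Gaussians have the same shape but different means $\mu$.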
[Figure 1: positive samples (circles), negative samples (crosses), and the decision boundary $p(y=1\mid x)=0.5$]
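To see why the boundary in the figure is a straight line, one can write out the log-odds under the shared-covariance model. This is a sketch that assumes, as in the log-likelihood, that class $y=1$ has density $N(\mu_{1},\Sigma)$ with prior $\phi$ and class $y=0$ has density $N(\mu_{2},\Sigma)$ with prior $1-\phi$; the symbols $w$ and $b$ are introduced here only for illustration.

$$
\begin{aligned}
\log\frac{p(y=1\mid x)}{p(y=0\mid x)}&=\log\frac{\phi\, N(x;\mu_{1},\Sigma)}{(1-\phi)\, N(x;\mu_{2},\Sigma)}\\
&=-\frac{1}{2}(x-\mu_{1})^{T}\Sigma^{-1}(x-\mu_{1})+\frac{1}{2}(x-\mu_{2})^{T}\Sigma^{-1}(x-\mu_{2})+\log\frac{\phi}{1-\phi}\\
&=\underbrace{(\mu_{1}-\mu_{2})^{T}\Sigma^{-1}}_{w^{T}}x+\underbrace{\frac{1}{2}\mu_{2}^{T}\Sigma^{-1}\mu_{2}-\frac{1}{2}\mu_{1}^{T}\Sigma^{-1}\mu_{1}+\log\frac{\phi}{1-\phi}}_{b}
\end{aligned}
$$

The quadratic term $x^{T}\Sigma^{-1}x$ cancels because both classes share $\Sigma$, so the set $p(y=1\mid x)=0.5$ (log-odds equal to zero) is the hyperplane $w^{T}x+b=0$.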

Author: 张文翔 (Zhang Wenxiang)
Link: Andrew Ng Stanford Machine Learning Open Course, Summary (5) - 张文翔's blog | BY ZhangWenxiang (demmon-tju.github.io)
