[Machine Learning Algorithms] Gaussian Discriminant Analysis (GDA)

Gaussian Discriminant Analysis

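  Gaussian discriminant analysis (GDA) is a fairly intuitive model. It is a generative model and takes a soft-classification approach: the class of a sample is decided through a probabilistic model rather than by a function that maps the sample directly to one class. A generative model obtains $P(y|x)$ by modelling the joint probability $P(x,y)=P(y)P(x|y)$. GDA assumes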
$$
\begin{aligned}
y &\sim \mathrm{Bernoulli}(\phi) \\
x|y=1 &\sim N(\mu_1,\Sigma) \\
x|y=0 &\sim N(\mu_2,\Sigma)
\end{aligned}
$$
  It follows that
$$
\begin{aligned}
&P(y)=\phi^y(1-\phi)^{1-y} \\
&P(x|y)=N(\mu_1,\Sigma)^y \cdot N(\mu_2,\Sigma)^{1-y}
\end{aligned}
$$
  The parameters of the model are
$$
\theta=(\mu_1,\mu_2,\Sigma,\phi)
$$
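  To make the generative assumptions concrete, here is a minimal NumPy sketch that draws labelled samples from the model above. The function name `sample_gda` and all parameter values are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def sample_gda(n, phi, mu1, mu2, sigma, seed=0):
    """Draw n labelled points from the GDA generative model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # y ~ Bernoulli(phi)
    y = (rng.random(n) < phi).astype(int)
    # x | y=1 ~ N(mu1, sigma),  x | y=0 ~ N(mu2, sigma): same covariance, different mean
    means = np.where(y[:, None] == 1, mu1, mu2)
    noise = rng.multivariate_normal(np.zeros(len(mu1)), sigma, size=n)
    return means + noise, y

# Illustrative parameter values (assumptions for this sketch only)
phi = 0.3
mu1 = np.array([2.0, 2.0])
mu2 = np.array([-1.0, 0.0])
sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

x, y = sample_gda(1000, phi, mu1, mu2, sigma)
print(x.shape, y.mean())  # the fraction of y=1 should be close to phi
```

  Note that both classes share the single covariance matrix `sigma`; only the mean differs between them.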
  For a generative model, the prediction we want to compute is
$$
\hat y=\arg \max_{y \in \{0,1\}}p(y|x)=\arg \max_{y}p(y)p(x|y)
$$
  The second equality holds because $p(y|x)=p(y)p(x|y)/p(x)$ and $p(x)$ does not depend on $y$.
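  The decision rule can be sketched in a few lines of NumPy. This is only an illustration under the assumption that the parameters are already known; the helper names `log_gaussian` and `predict` are hypothetical. Working in log space avoids underflow, and the posterior $P(y=1|x)$ is returned as well to show the "soft" side of the classifier.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log density of N(mu, sigma) evaluated at each row of x."""
    p = mu.shape[0]
    diff = x - mu
    # Mahalanobis term; solve a linear system instead of forming the inverse
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def predict(x, phi, mu1, mu2, sigma):
    """Return argmax_y p(y) p(x|y) and the posterior P(y=1|x) for each row of x."""
    s1 = np.log(phi) + log_gaussian(x, mu1, sigma)      # log p(y=1) p(x|y=1)
    s0 = np.log(1 - phi) + log_gaussian(x, mu2, sigma)  # log p(y=0) p(x|y=0)
    posterior1 = np.exp(s1 - np.logaddexp(s0, s1))      # normalise by p(x)
    return (s1 > s0).astype(int), posterior1

# Illustrative parameters (assumed values, not from the article)
mu1, mu2 = np.array([2.0, 2.0]), np.array([-1.0, 0.0])
x_new = np.array([[2.0, 2.0], [-1.0, 0.0]])
labels, post = predict(x_new, 0.5, mu1, mu2, np.eye(2))
print(labels)  # [1 0]: each point sits exactly on one class mean
print(post)
```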
  Defining the log-likelihood $l(\theta)$ over the $N$ training samples, the maximum-likelihood estimate is
$$
\begin{aligned}
\hat \theta &=\arg \max_\theta l(\theta) \\
&=\arg \max_\theta \log \prod_{i=1}^Np(x_i,y_i) \\
&=\arg \max_\theta \log \prod_{i=1}^Np(y_i)p(x_i|y_i) \\
&=\arg \max_\theta \sum_{i=1}^N\Big(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i}+\log \phi^{y_i}(1-\phi)^{1-y_i}\Big)
\end{aligned}
$$
  Each parameter is obtained by setting the corresponding partial derivative of $l(\theta)$ to zero:
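  As a sanity check on the objective, the log-likelihood can be evaluated directly. The sketch below is one possible way to compute it, with a hypothetical function name `log_likelihood` and a tiny hand-made data set; parameters that match the data should score higher than parameters that do not.

```python
import numpy as np

def log_likelihood(x, y, phi, mu1, mu2, sigma):
    """l(theta) = sum_i [log p(y_i) + log p(x_i | y_i)] for the GDA model."""
    n, p = x.shape
    # Centre each sample on the mean of its own class
    diff = x - np.where(y[:, None] == 1, mu1, mu2)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    _, logdet = np.linalg.slogdet(sigma)
    log_px_given_y = -0.5 * (p * np.log(2 * np.pi) + logdet + maha)
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return float(np.sum(log_py + log_px_given_y))

# Tiny hand-made data set (illustrative values only)
x = np.array([[2.1, 1.9], [1.8, 2.2], [-1.0, 0.1], [-0.9, -0.2], [-1.2, 0.0]])
y = np.array([1, 1, 0, 0, 0])
eye = np.eye(2)

# Means near the data versus means far from it, everything else equal
good = log_likelihood(x, y, 0.4, np.array([2.0, 2.0]), np.array([-1.0, 0.0]), eye)
bad = log_likelihood(x, y, 0.4, np.array([0.0, 0.0]), np.array([0.0, 0.0]), eye)
print(good > bad)  # True
```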

  • $\phi$
      Only the term $\log \phi^{y_i}(1-\phi)^{1-y_i}$ depends on $\phi$. Writing $N_1=\sum_{i=1}^N y_i$ for the number of samples with $y_i=1$:
    $$
    \begin{aligned}
    &\frac{\partial l(\theta)}{\partial \phi}=\sum_{i=1}^N\left(y_i\frac{1}{\phi}-(1-y_i)\frac{1}{1-\phi}\right) = 0 \\
    &\iff \sum_{i=1}^N\big(y_i(1-\phi)-(1-y_i)\phi\big)=0 \\
    &\iff \sum_{i=1}^N(y_i-\phi)=0 \\
    &\iff \sum_{i=1}^Ny_i-N\phi=0 \\
    &\iff \hat \phi =\frac{1}{N}\sum_{i=1}^Ny_i =\frac{N_1}{N}
    \end{aligned}
    $$
  • $\mu_1,\mu_2$
      The two derivations are identical, so we solve for $\mu_1$ only. Keeping just the terms that contain $\mu_1$, the log-likelihood reduces to
    $$
    \begin{aligned}
    &\sum_{i=1}^Ny_i\log \left(\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right)\right) \\
    &=\sum_{i=1}^Ny_i\log \left(\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i^T\Sigma^{-1}-\mu_1^T\Sigma^{-1})(x_i-\mu_1)\right)\right)\\
    &=\sum_{i=1}^Ny_i\log \left(\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i^T\Sigma^{-1}x_i-2\mu_1^T\Sigma^{-1}x_i+\mu_1^T\Sigma^{-1}\mu_1)\right)\right)
    \end{aligned}
    $$
      The last step uses the symmetry of $\Sigma^{-1}$, so that $x_i^T\Sigma^{-1}\mu_1=\mu_1^T\Sigma^{-1}x_i$.
      Differentiating this expression with respect to $\mu_1$ and setting the derivative to zero gives
    $$
    \begin{aligned}
    &-\frac{1}{2}\sum_{i=1}^Ny_i(-2\Sigma^{-1}x_i+2\Sigma^{-1}\mu_1)=0 \\
    &\iff \sum_{i=1}^Ny_i(\Sigma^{-1}\mu_1-\Sigma^{-1}x_i)=0 \\
    &\iff \sum_{i=1}^Ny_i(\mu_1-x_i)=0 \\
    &\iff \sum_{i=1}^Ny_i\mu_1=\sum_{i=1}^Ny_ix_i \\
    &\iff \hat \mu_1=\frac{\sum\limits_{i=1}^Ny_ix_i}{\sum\limits_{i=1}^Ny_i}=\frac{\sum\limits_{i=1}^Ny_ix_i}{N_1}
    \end{aligned}
    $$
      That is, $\hat\mu_1$ is the mean of the samples with $y_i=1$.
      Similarly, with $N_2=N-N_1$ the number of samples with $y_i=0$,
    $$
    \hat \mu_2=\frac{\sum\limits_{i=1}^N(1-y_i)x_i}{\sum\limits_{i=1}^N(1-y_i)}=\frac{\sum\limits_{i=1}^N(1-y_i)x_i}{N_2}
    $$
  • $\Sigma$
      First simplify the generic term $\log N(\mu,\Sigma)$ summed over $N$ samples. Here $C$ stands for any constant that does not depend on $\Sigma$ (its value may change from line to line), and $S=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T$ is the sample covariance:
    $$
    \begin{aligned}
    \sum_{i=1}^N\log N(\mu,\Sigma)
    &=\sum_{i=1}^N \log \left(\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp \left(-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right)\right) \\
    &=\sum_{i=1}^N\left(\log \frac{1}{(2\pi)^{\frac{p}{2}}}+\log|\Sigma|^{-\frac{1}{2}}-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right) \\
    &=\sum_{i=1}^N\left(C-\frac{1}{2}\log|\Sigma|-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right)\\
    &=C-\frac{1}{2}N\log |\Sigma|-\frac{1}{2}tr\left(\sum_{i=1}^N(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right)\\
    &=C-\frac{1}{2}N\log |\Sigma|-\frac{1}{2}tr\left(\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T\Sigma^{-1}\right)\\
    &=-\frac{1}{2}N\log |\Sigma|-\frac{1}{2}N\,tr(S\Sigma^{-1})+C
    \end{aligned}
    $$
      The fourth line uses the fact that a scalar equals its own trace, and the fifth uses the cyclic property $tr(AB)=tr(BA)$.
      Since we are solving for $\Sigma$ only, the log-likelihood reduces to the expression below, where $c_1$ and $c_2$ are the sets of samples with $y_i=1$ and $y_i=0$, and $S_1=\frac{1}{N_1}\sum_{x_i\in c_1}(x_i-\mu_1)(x_i-\mu_1)^T$, $S_2=\frac{1}{N_2}\sum_{x_i\in c_2}(x_i-\mu_2)(x_i-\mu_2)^T$ are the per-class sample covariances:
    $$
    \begin{aligned}
    &\sum_{i=1}^N\big(y_i\log N(\mu_1,\Sigma) +(1-y_i)\log N(\mu_2,\Sigma) \big) \\
    &=\sum_{x_i \in c_1}\log N(\mu_1,\Sigma)+\sum_{x_i \in c_2}\log N(\mu_2,\Sigma) \\
    &=-\frac{1}{2}N_1\log |\Sigma|-\frac{1}{2}N_1tr(S_1\Sigma^{-1})-\frac{1}{2}N_2\log |\Sigma|-\frac{1}{2}N_2tr(S_2\Sigma^{-1})+C \\
    &=-\frac{1}{2}\big(N_1\log |\Sigma|+N_1tr(S_1\Sigma^{-1})+N_2\log |\Sigma|+N_2tr(S_2\Sigma^{-1})\big)+C
    \end{aligned}
    $$
      We need the following trace and determinant identities (the second and third are stated for symmetric, invertible $A$ and symmetric $B$):
    $$
    \begin{aligned}
    &\frac{\partial tr(AB)}{\partial A}=B^{T}\\
    &\frac{\partial |A|}{\partial A}=|A|\cdot A^{-1} \\
    &\frac{\partial tr(A^{-1}B)}{\partial A}=-A^{-1}BA^{-1} \\
    &tr(AB)=tr(BA)
    \end{aligned}
    $$
      Differentiating the simplified log-likelihood with respect to $\Sigma$ and setting the derivative to zero, with $N=N_1+N_2$, gives
    $$
    \begin{aligned}
    &-\frac{1}{2}\left(N\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}-N_1\Sigma^{-1}S_1\Sigma^{-1}-N_2\Sigma^{-1}S_2\Sigma^{-1}\right)=0 \\
    &\iff N\Sigma^{-1}-\Sigma^{-1}(N_1S_1+N_2S_2)\Sigma^{-1}=0\\
    &\iff N\Sigma-N_1S_1-N_2S_2=0 \\
    &\iff \hat \Sigma =\frac{1}{N}(N_1S_1+N_2S_2)
    \end{aligned}
    $$
      The third line follows from multiplying by $\Sigma$ on both the left and the right. The estimate is the average of the two per-class covariances, weighted by class size.
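  The four closed-form estimates derived above can be put together in a short NumPy sketch. This is one possible implementation, not a reference one: the function name `fit_gda` and the synthetic data (including the "true" parameter values) are assumptions made for illustration.

```python
import numpy as np

def fit_gda(x, y):
    """Closed-form maximum-likelihood estimates for the shared-covariance model."""
    n = len(y)
    n1 = int(y.sum())             # N_1: samples with y = 1
    n2 = n - n1                   # N_2: samples with y = 0
    phi = n1 / n
    mu1 = (y[:, None] * x).sum(axis=0) / n1
    mu2 = ((1 - y)[:, None] * x).sum(axis=0) / n2
    d1 = x[y == 1] - mu1
    d2 = x[y == 0] - mu2
    s1 = d1.T @ d1 / n1           # per-class covariance S_1 (divides by N_1)
    s2 = d2.T @ d2 / n2           # per-class covariance S_2 (divides by N_2)
    sigma = (n1 * s1 + n2 * s2) / n
    return phi, mu1, mu2, sigma

# Synthetic data drawn from the model; every value below is illustrative
rng = np.random.default_rng(0)
true_mu1, true_mu2 = np.array([2.0, 2.0]), np.array([-1.0, 0.0])
true_sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
y = (rng.random(5000) < 0.3).astype(int)
x = np.where(y[:, None] == 1, true_mu1, true_mu2) + rng.multivariate_normal(
    np.zeros(2), true_sigma, size=5000
)

phi, mu1, mu2, sigma = fit_gda(x, y)
print(phi, mu1, mu2, sigma, sep="\n")  # should land near the values used above
```

  Because every estimate is a simple count or average, fitting needs no iterative optimisation; the cost is a single pass over the data.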
