To overcome the unimodality of a single Gaussian model, a weighted average of several Gaussians is introduced to fit multimodal data:
$$
p(x)=\sum\limits_{k=1}^K\alpha_k\mathcal{N}(\mu_k,\Sigma_k)
$$
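As a quick numerical sketch of this weighted density (the helper name `gmm_pdf` and the toy parameters below are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, alphas, mus, sigmas):
    """Evaluate p(x) = sum_k alpha_k * N(x | mu_k, Sigma_k)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=s)
               for a, m, s in zip(alphas, mus, sigmas))

# Two 1-D components; the weights alpha_k must sum to 1.
alphas = [0.3, 0.7]
mus    = [np.array([0.0]), np.array([4.0])]
sigmas = [np.array([[1.0]]), np.array([[2.0]])]
p = gmm_pdf(np.array([1.0]), alphas, mus, sigmas)
```

With one standard-normal component the function reduces to the ordinary Gaussian density.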
Introduce a latent variable $z$ indicating which Gaussian component a sample $x$ belongs to; it is a discrete random variable:
| $Z$ | $C_1$ | $C_2$ | $\cdots$ | $C_K$ |
|---|---|---|---|---|
| $P$ | $p_1$ | $p_2$ | $\cdots$ | $p_K$ |
$$
p(z=i)=p_i,\quad\sum\limits_{i=1}^Kp(z=i)=1
$$
The Gaussian mixture model is a generative model: samples are generated through the distribution of the latent variable $z$ (as a probabilistic graphical model, $z\rightarrow x$).
$x$ is the Gaussian sample generated from $z$, and the pairs $(x_1,z_1),(x_2,z_2),\cdots,(x_N,z_N)$ are mutually independent. For $p(X)$:
$$
p(X)=\sum\limits_zp(X,z)=\sum\limits_{k=1}^Kp(X,z=k)=\sum\limits_{k=1}^Kp(z=k)p(X|z=k)
$$
Therefore:
$$
p(X)=\sum\limits_{k=1}^Kp_k\mathcal{N}(X|\mu_k,\Sigma_k)
$$
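The generative story (first draw $z$ from the categorical distribution, then draw $x$ from the chosen Gaussian) can be sketched via ancestral sampling; the function name and toy parameters are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(n, ps, mus, sigmas):
    """Ancestral sampling: z ~ Categorical(p), then x ~ N(mu_z, Sigma_z)."""
    zs = rng.choice(len(ps), size=n, p=ps)
    xs = np.array([rng.multivariate_normal(mus[z], sigmas[z]) for z in zs])
    return xs, zs

# Two well-separated 2-D components.
ps     = [0.4, 0.6]
mus    = [np.zeros(2), np.array([5.0, 5.0])]
sigmas = [np.eye(2), np.eye(2)]
X, Z = sample_gmm(500, ps, mus, sigmas)
```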
Observed samples: $X=(x_1,x_2,\cdots,x_N)$

Model parameters: $\theta=\{p_1,p_2,\cdots,p_K,\mu_1,\mu_2,\cdots,\mu_K,\Sigma_1,\Sigma_2,\cdots,\Sigma_K\}$

The complete data is $(X,Z)$. Estimate $\theta$ by maximum likelihood:
$$
\begin{aligned}
\theta_{MLE}&=\mathop{argmax}\limits_{\theta}\log {\color{blue}p(X)}=\mathop{argmax}\limits_{\theta}\log \underbrace{{\color{blue}\prod\limits_{i=1}^Np(x_i)}}_{\color{blue}\text{samples are i.i.d.}}\\
&=\mathop{argmax}\limits_{\theta}\sum\limits_{i=1}^N\log p(x_i)\\
&=\mathop{argmax}\limits_\theta\sum\limits_{i=1}^N\log \sum\limits_{k=1}^Kp_k\mathcal{N}(x_i|\mu_k,\Sigma_k)
\end{aligned}
$$
The sum inside the logarithm prevents a closed-form solution by direct differentiation, so the EM algorithm is used to solve iteratively.
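Although the objective has no closed-form maximizer, it is easy to evaluate; a stable way is to work in log space with log-sum-exp (function name is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, ps, mus, sigmas):
    """sum_i log sum_k p_k N(x_i | mu_k, Sigma_k), computed in log space."""
    # log_comp[i, k] = log p_k + log N(x_i | mu_k, Sigma_k)
    log_comp = np.stack([np.log(p) + multivariate_normal.logpdf(X, mean=m, cov=s)
                         for p, m, s in zip(ps, mus, sigmas)], axis=1)
    return logsumexp(log_comp, axis=1).sum()

# Toy evaluation on two 2-D points.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
ll = log_likelihood(X, [0.5, 0.5], [np.zeros(2), np.ones(2)],
                    [np.eye(2), np.eye(2)])
```

Log-sum-exp avoids the underflow that multiplying many small densities would cause.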
The EM iteration is (note the expectation is taken of the *log* joint density):
$$
\theta^{t+1}=\mathop{argmax}\limits_{\theta}\underbrace{\mathbb{E}_{z|x,\theta^t}[\log p(x,z|\theta)]}_{\color{blue}Q(\theta,\theta^t)}
$$
Substituting the GMM expressions gives:
$$
\begin{aligned}
Q(\theta,\theta^t)&=\sum\limits_z\Big[\log\prod\limits_{i=1}^Np(x_i,z_i|\theta)\Big]\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)\\
&=\sum\limits_z\Big[\sum\limits_{i=1}^N\log p(x_i,z_i|\theta)\Big]\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)\\
&=\sum\limits_z\underbrace{[{\color{blue}\log p(x_1,z_1|\theta)}+\cdots+\log p(x_N,z_N|\theta)]}_{\color{blue}\sum\limits_{i=1}^N\log p(x_i,z_i|\theta)}\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)
\end{aligned}
$$
Expand the bracketed sum and examine its first term (here $\sum_z$ runs over all configurations $z_1,\cdots,z_N$):
$$
\begin{aligned}
&\sum\limits_z\log p(x_1,z_1|\theta)\prod\limits_{i=1}^Np(z_i|x_i,\theta^t)\\
&=\sum\limits_z\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)\prod\limits_{i=2}^Np(z_i|x_i,\theta^t)\\
&=\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)\sum\limits_{z_2,\cdots,z_N}\prod\limits_{i=2}^Np(z_i|x_i,\theta^t)\\
&=\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)\underbrace{\sum\limits_{z_2}p(z_2|x_2,\theta^t)}_{\color{blue}1}\cdots\underbrace{\sum\limits_{z_N}p(z_N|x_N,\theta^t)}_{\color{blue}1}\\
&=\sum\limits_{z_1}\log p(x_1,z_1|\theta)\,p(z_1|x_1,\theta^t)
\end{aligned}
$$
Thus $Q$ can be written as:
$$
{\color{blue}Q(\theta,\theta^t)}=\sum\limits_{i=1}^N\sum\limits_{\color{blue}z_i}\log p(x_i,z_i|\theta)\,p(z_i|x_i,\theta^t)
$$
where $p(x,z|\theta)$ is:
$$
p(x,z|\theta)=p(z|\theta)p(x|z,\theta)=p_z\mathcal{N}(x|\mu_z,\Sigma_z)
$$
and $p(z|x,\theta^t)$ is:
$$
p(z|x,\theta^t)=\frac{p(x,z|\theta^t)}{p(x|\theta^t)}=\frac{p_z^t\mathcal{N}(x|\mu_z^t,\Sigma_z^t)}{\sum\limits_kp_k^t\mathcal{N}(x|\mu_k^t,\Sigma_k^t)}
$$
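This posterior over components (the "responsibilities") is exactly what the E-step computes; a sketch, again computed in log space for stability (function name is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def responsibilities(X, ps, mus, sigmas):
    """p(z_i = C_k | x_i, theta^t): one row per sample, rows sum to 1."""
    # log numerator: log p_k + log N(x_i | mu_k, Sigma_k)
    log_comp = np.stack([np.log(p) + multivariate_normal.logpdf(X, mean=m, cov=s)
                         for p, m, s in zip(ps, mus, sigmas)], axis=1)
    # normalise: subtract the log of the denominator sum_k p_k N(x_i | ...)
    return np.exp(log_comp - logsumexp(log_comp, axis=1, keepdims=True))
```

For well-separated components, each sample's responsibility concentrates almost entirely on its nearest component.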
$\color{blue}E\text{-}step$: compute the expectation $\mathbb{E}_{z|x,\theta^t}$, where $p(z|x,\theta^t)$ comes from the previous iteration. Substituting into $Q$ gives:
$$
Q=\sum\limits_{i=1}^N\sum\limits_{z_i}\log p_{z_i}\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\underbrace{\frac{p_{z_i}^t\mathcal{N}(x_i|\mu_{z_i}^t,\Sigma_{z_i}^t)}{\sum\limits_kp_k^t\mathcal{N}(x_i|\mu_k^t,\Sigma_k^t)}}_{\color{blue}\text{constant determined at step }t}
$$
$\color{blue}M\text{-}step$: maximize $Q$ to obtain the new parameters $\color{blue}\mu^{t+1}_k,\ \Sigma^{t+1}_k,\ p_k^{t+1}$:
$$
\begin{aligned}
Q&=\sum\limits_{i=1}^N\sum\limits_{\color{blue}z_i}\log p_{z_i}\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\,p(z_i|x_i,\theta^t)\\
&=\sum\limits_{\color{blue}z_i}\sum\limits_{i=1}^N\log p_{z_i}\mathcal{N}(x_i|\mu_{z_i},\Sigma_{z_i})\,p(z_i|x_i,\theta^t)\\
&=\sum\limits_{\color{blue}k=1}^{\color{blue}K}\sum\limits_{i=1}^N[\log p_k+\log \mathcal{N}(x_i|\mu_k,\Sigma_k)]\,p(z_i=C_k|x_i,\theta^t)
\end{aligned}
$$
First solve for $\color{blue}\boxed{p_k^{t+1}}$ by keeping only the terms involving $p_k$:
$$
\left\{\begin{aligned}
&p_k^{t+1}=\mathop{argmax}\limits_{p_k}\sum\limits_{k=1}^K\sum\limits_{i=1}^N\log p_k\,p(z_i=C_k|x_i,\theta^t)\\
&s.t.\ \sum\limits_{k=1}^Kp_k=1
\end{aligned}\right.
$$
Introduce a Lagrange multiplier:
$$
L(p_k,\lambda)=\sum\limits_{k=1}^K\sum\limits_{i=1}^N\log p_k\,p(z_i=C_k|x_i,\theta^t)-\lambda(1-\sum\limits_{k=1}^Kp_k)
$$
Setting the partial derivative to zero, then multiplying by $p_k$ and summing over $k$ (using $\sum_k p(z_i=C_k|x_i,\theta^t)=1$ and $\sum_k p_k=1$):
$$
\begin{aligned}
&\frac{\partial L}{\partial p_k}=\sum\limits_{i=1}^N\frac{1}{p_k}p(z_i=C_k|x_i,\theta^t)+\lambda=0\\
&\Longrightarrow \sum\limits_{k=1}^K\sum\limits_{i=1}^Np(z_i=C_k|x_i,\theta^t)+\lambda\sum\limits_{k=1}^Kp_k=0\\
&\Longrightarrow\lambda=-N
\end{aligned}
$$
Therefore:
$$
{\color{blue}p_k^{t+1}}=\frac{1}{N}\sum\limits_{i=1}^Np(z_i=C_k|x_i,\theta^t)
$$
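Given the responsibility matrix from the E-step, this weight update is a one-line column average (the function name and the `gamma` array layout are my own convention):

```python
import numpy as np

def update_weights(gamma):
    """p_k^{t+1} = (1/N) * sum_i p(z_i = C_k | x_i, theta^t).

    gamma has shape (N, K): row i holds the responsibilities of sample i,
    so each row sums to 1 and the column means give the new mixing weights.
    """
    return gamma.mean(axis=0)
```

Because each row of `gamma` sums to 1, the updated weights automatically satisfy the constraint $\sum_k p_k=1$.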
The parameters $\color{blue}\mu^{t+1}_k,\ \Sigma^{t+1}_k$ are unconstrained, so they follow by setting the corresponding partial derivatives of $Q$ to zero directly.
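For completeness, those stationary points give the standard closed-form updates (stated here without the intermediate algebra, writing $\gamma_{ik}$ as shorthand for $p(z_i=C_k|x_i,\theta^t)$):
$$
\mu_k^{t+1}=\frac{\sum\limits_{i=1}^N\gamma_{ik}\,x_i}{\sum\limits_{i=1}^N\gamma_{ik}},\qquad
\Sigma_k^{t+1}=\frac{\sum\limits_{i=1}^N\gamma_{ik}\,(x_i-\mu_k^{t+1})(x_i-\mu_k^{t+1})^T}{\sum\limits_{i=1}^N\gamma_{ik}}
$$
Each component's mean and covariance are thus responsibility-weighted sample statistics.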