Gaussian Mixture Model (GMM) Implementation



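A Gaussian mixture model (mixture of Gaussians) describes the data with several multivariate Gaussian distributions and uses the Expectation-Maximization (EM) algorithm for density estimation.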

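The idea behind EM, briefly:

The first step (the E-step) guesses the latent class variable of each point; the second step (the M-step) updates the parameters to maximize the likelihood given those guesses. The two steps are repeated until the likelihood stops improving.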
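As a compact summary of the two steps, the standard EM updates for a Gaussian mixture can be written as below. The symbols are chosen here to mirror the variable names in the code that follows (\gamma for pGamma, N_k for Nk, \pi_k for pPi, \mu_k for pMiu, \Sigma_k for pSigma); this is a sketch of the textbook form, and the full derivation is left to the references.

```latex
% E-step: responsibility of component k for point x_i
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

% M-step: re-estimate the parameters from the responsibilities
N_k = \sum_{i=1}^{N} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}
```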

For the detailed ideas behind the algorithm and its derivation, see the references at the end of this post.

 

The code is as follows:

function varargout = gmm(X, K_or_centroids)
%============================================================
%Expectation-Maximization iteration implementation of
% Gaussian Mixture Model.
%
% PX = GMM(X,K_OR_CENTROIDS)
% [PX MODEL] = GMM(X,K_OR_CENTROIDS)
%
% - X: N-by-D data matrix.
% - K_OR_CENTROIDS: either K indicating the number of
%      components or a K-by-D matrix indicating the
%      choosing of the initial K centroids.
%
% - PX: N-by-K matrix indicating the probability of each
%      component generating each point.
% - MODEL: a structure containing the parameters for a GMM:
%      MODEL.Miu: a K-by-D matrix.
%      MODEL.Sigma: a D-by-D-by-K matrix.
%      MODEL.Pi: a 1-by-K vector.
%============================================================

   threshold = 1e-15;
   [N, D] = size(X);
 
   if isscalar(K_or_centroids)
       K = K_or_centroids;
       % randomly pick centroids
       rndp = randperm(N);
       centroids = X(rndp(1:K), :);
   else
       K = size(K_or_centroids, 1);
       centroids = K_or_centroids;
   end

   %initial values
   [pMiu pPi pSigma] = init_params();
 
   Lprev = -inf;
   while true
       % E-step: density of every point under every component
       Px = calc_prob();

       % check for convergence on the log-likelihood of the current
       % parameters, so that Px and pPi belong to the same iteration
       L = sum(log(Px*pPi'));
       if L-Lprev < threshold
           break;
       end
       Lprev = L;

       % new value for pGamma (posterior responsibilities)
       pGamma = Px .* repmat(pPi, N, 1);
       pGamma = pGamma ./ repmat(sum(pGamma,2), 1, K);

       % M-step: new value for parameters of each component
       Nk = sum(pGamma, 1);
       pMiu = diag(1./Nk) * pGamma' * X;
       pPi = Nk/N;
       for kk = 1:K
           Xshift = X-repmat(pMiu(kk, :), N,1);
           pSigma(:, :, kk) = (Xshift' *...
               (diag(pGamma(:, kk)) * Xshift))/ Nk(kk);
       end
   end
   if nargout == 1
       varargout = {Px};
   else
       model = [];
       model.Miu = pMiu;
       model.Sigma = pSigma;
       model.Pi = pPi;
       varargout = {Px, model};
   end

   function [pMiu pPi pSigma] = init_params()
       pMiu = centroids;
       pPi = zeros(1, K);
       pSigma = zeros(D, D, K);
 
       % hard-assign each point to its nearest centroid
       % (squared Euclidean distance, expanded as |x|^2 + |m|^2 - 2*x*m')
       distmat = repmat(sum(X.*X, 2), 1, K) +...
           repmat(sum(pMiu.*pMiu, 2)', N, 1) -...
           2*X*pMiu';
       [~, labels] = min(distmat, [], 2);
 
       for k=1:K
           Xk = X(labels == k, :);
           pPi(k) = size(Xk, 1)/N;
           pSigma(:, :, k) = cov(Xk);
       end
   end

 

   function Px = calc_prob()
       Px = zeros(N, K);
       for k = 1:K
           Xshift = X-repmat(pMiu(k, :), N,1);
           inv_pSigma = inv(pSigma(:, :, k));
           tmp = sum((Xshift*inv_pSigma) .*Xshift, 2);
           coef = (2*pi)^(-D/2) *sqrt(det(inv_pSigma));
           Px(:, k) = coef * exp(-0.5*tmp);
       end
   end
end
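
For reference, the quantities that calc_prob and the convergence check compute can be written out as follows. This is a restatement of the standard multivariate Gaussian density and the mixture log-likelihood, with symbols matched to the code (D is the data dimension, \pi_k corresponds to pPi); note that the code obtains the normalizing factor from det(inv_pSigma), which equals 1/det(\Sigma_k).

```latex
% Density evaluated in calc_prob for component k
\mathcal{N}(x \mid \mu_k, \Sigma_k)
  = (2\pi)^{-D/2} \, \lvert \Sigma_k^{-1} \rvert^{1/2}
    \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right)

% Log-likelihood used as the stopping criterion, L = sum(log(Px*pPi'))
\ell = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)
```

The loop stops once the increase in \ell between successive iterations falls below threshold.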

      

The driver script is as follows:

% MATLAB's built-in Gaussian mixture model function, for comparison:
% gm = gmdistribution.fit(X,2,'Options',options);
mu1 = [1 2];
sigma1 = [3 .2; .2 2];
mu2 = [-1 -2];
sigma2 = [2 0; 0 1];

X =[mvnrnd(mu1,sigma1,200);mvnrnd(mu2,sigma2,100)];

scatter(X(:,1),X(:,2),10,'ko');

PX = gmm(X,2);

[~,idx]=max(PX,[],2);

cluster1 = (idx == 1);
cluster2 = (idx == 2);

% open a new figure so the raw-data plot above is not overwritten
figure;
scatter(X(cluster1,1),X(cluster1,2),10,'r+');

hold on
scatter(X(cluster2,1),X(cluster2,2),10,'bo');
legend('Cluster 1','Cluster 2','Location','NW')

 

The test result is shown below:

[Figure 1: clustering result of the GMM implementation]

 

References:

http://cs229.stanford.edu/

http://www.cnblogs.com/jerrylead/archive/2012/05/08/2489725.html

http://www.cnblogs.com/CBDoctor/archive/2011/11/06/2236286.html

 
