MATLAB Code for DBSCAN Density-Based Clustering

Note: This post provides reference material and MATLAB code for DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a density-based clustering algorithm.

1. References

1. Zhou Zhihua, Machine Learning (《机器学习》), p. 211
2. Download page for the original author's code: DBSCAN Clustering Algorithm
3. Bilibili: https://www.bilibili.com/video/BV1fi4y1K7Ke?p=3

2. MATLAB code

% Copyright (c) 2015, Yarpiz (www.yarpiz.com)
% All rights reserved. Please read the "license.txt" for license terms.
%
% Project Code: YPML110
% Project Title: Implementation of DBSCAN Clustering in MATLAB
% Publisher: Yarpiz (www.yarpiz.com)
% Developer: S. Mostapha Kalami Heris (Member of Yarpiz Team)
% Contact Info: sm.kalami@gmail.com, info@yarpiz.com
%------------------------------------------------------------------------
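% DBSCAN ("Density-Based Spatial Clustering of Applications with Noise")
% is a density-based clustering algorithm.
% [IDX,isnoise]=DBSCAN(data,epsilon,MinPts,distmethon)
% Input arguments:
%   data       : m-by-n matrix of m n-dimensional data points.
%   epsilon    : Neighborhood radius; points within this distance of a point are its neighbors.
%   MinPts     : Minimum number of points (the point itself included) that the
%                epsilon-neighborhood must contain for a point to be a core point.
%   distmethon : Distance metric passed to pdist2 (optional; default 'euclidean').
% Output arguments:
%   IDX     : m-by-1 vector of cluster labels; 0 means no cluster was assigned.
%   isnoise : m-by-1 logical vector; isnoise(i) is true if the i-th sample point is noise.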
%------------------------------------------------------------------------------
% Revised and notes added by Qinming Zhang on 21 April 2021;
% ChangChun University of Science and Technology,ChangChun,130022,China
% If you have any questions, contact me at 907353999@qq.com.

function [IDX,isnoise]=DBSCAN(data,epsilon,MinPts,distmethon)

    if nargin==3
        distmethon='euclidean';     %  'euclidean'--- Euclidean distance (default)
    elseif nargin<3
        error('Insufficient Input Parameters!!!') 
    elseif nargin>4
        error('Too many input parameters!!!') 
    end

    C=0;
    n=size(data,1);
    IDX=zeros(n,1);
    % Pairwise distance matrix (pdist2 requires the Statistics and Machine Learning Toolbox).
    D=pdist2(data,data,distmethon);
    % visited(i) is true once the i-th sample point has been visited
    visited=false(n,1);
    isnoise=false(n,1);
    
    for i=1:n
        if ~visited(i)
            visited(i)=true;
            
            Neighbors=RegionQuery(i);   % indices of the sample points within epsilon of the i-th point (i itself included)
            if numel(Neighbors)<MinPts  % numel(A) returns the number of elements in array A
                % data(i,:) is not a core point, so mark it as noise
                isnoise(i)=true;
            else
                C=C+1;
                ExpandCluster(i,Neighbors,C);
            end
            
        end
    
    end
    
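    function ExpandCluster(i,Neighbors,C)
        % Grow cluster C from core point i by scanning a growing list of neighbors
        IDX(i)=C;
        
        k = 1;
        while true
            j = Neighbors(k);
            
            if ~visited(j)
                visited(j)=true;
                Neighbors2=RegionQuery(j);
                if numel(Neighbors2)>=MinPts
                    % j is itself a core point, so its neighbors are reachable too
                    Neighbors=[Neighbors Neighbors2];   %#ok
                end
            end
            
            % Assign every reachable, still unlabelled point to cluster C. Doing this
            % outside the "visited" check also recovers border points that were
            % visited earlier and marked as noise.
            if IDX(j)==0
                IDX(j)=C;
                isnoise(j)=false;
            end
            
            k = k + 1;
            if k > numel(Neighbors)
                break;
            end
        end
    end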
    
    function Neighbors=RegionQuery(i)
        Neighbors=find(D(i,:)<=epsilon);
    end

end
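
As a quick check of the function above, the following sketch runs it on a tiny hand-made data set. It assumes the listing is saved as DBSCAN.m on the MATLAB path and that pdist2 (Statistics and Machine Learning Toolbox) is available. The data set, epsilon = 1.5 and MinPts = 3 are illustrative values chosen for this example; they do not come from the original code.

```matlab
% Illustrative data: two well-separated unit squares plus one isolated point.
% Assumes DBSCAN.m from this post is on the MATLAB path.
sq   = [0 0; 0 1; 1 0; 1 1];   % corners of a unit square
data = [sq; sq+10; 5 5];       % second square shifted by 10, then a lone point

epsilon = 1.5;   % above the square's diagonal (sqrt(2)), well below the gap between groups
MinPts  = 3;     % neighborhood size (point itself included) needed for a core point

[IDX, isnoise] = DBSCAN(data, epsilon, MinPts);   % Euclidean distance by default

disp(IDX');       % 1 1 1 1 2 2 2 2 0
disp(isnoise');   % 0 0 0 0 0 0 0 0 1
```

Every corner of a square has all four corners within epsilon, so each square forms one cluster. The point at (5,5) has only itself in its neighborhood, so it keeps the label 0 in IDX and is flagged in isnoise.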
