Cluster Analysis

Environment: MATLAB R2016a and MATLAB R2010a

First, an introduction quoted from Wikipedia:

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics and data compression.

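Cluster analysis is a form of unsupervised learning. Unlike regression analysis and similar methods, which need both inputs and targets, clustering groups the observations using nothing but the data set itself. Below is the MATLAB program I wrote while learning cluster analysis: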

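% Five Gaussian clouds of 500 points each (standard deviation 5)
x1=5*[randn(500,1)+5,randn(500,1)+5];
x2=5*[randn(500,1)+5,randn(500,1)-5];
x3=5*[randn(500,1)-5,randn(500,1)+5];
x4=5*[randn(500,1)-5,randn(500,1)-5];
x5=5*[randn(500,1),randn(500,1)];
data=[x1;x2;x3;x4;x5];  % 2500-by-2 matrix; named "data" because "all" shadows a built-in function
plot(x1(:,1),x1(:,2),'r.');hold on
plot(x2(:,1),x2(:,2),'g.');
plot(x3(:,1),x3(:,2),'k.');
plot(x4(:,1),x4(:,2),'y.');
plot(x5(:,1),x5(:,2),'b.');
IDX=kmeans(data,5); % k-means clustering with k=5
for k=1:2500
    text(data(k,1),data(k,2),num2str(IDX(k)));  % label each point with its k-means cluster index
end
% Hierarchical clustering of the same data
y=pdist(data);              % pairwise Euclidean distances
z=linkage(y);               % single-linkage tree (the default method)
t=cluster(z,'cutoff',1.2);  % cut the tree; by default the cutoff is an inconsistency-coefficient threshold
figure;plot(data(:,1),data(:,2),'c.');hold on  % separate figure for the hierarchical labels
for k=1:2500
    text(data(k,1),data(k,2),num2str(t(k)));  % label each point with its hierarchical cluster index
end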
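
The 'cutoff' value in the program above is compared with the inconsistency coefficient by default, so the number of clusters it produces is hard to predict. The following is a minimal sketch of asking for a fixed number of clusters instead. It assumes Ward linkage is an acceptable choice for this data (the program above uses the default single linkage), and the variable names and the smaller sample size are only illustrative:

```matlab
% Same layout as the program above, with 100 points per cloud to keep pdist fast
centers = [5 5; 5 -5; -5 5; -5 -5; 0 0];
X = [];
for c = 1:5
    X = [X; 5*(randn(100,2) + repmat(centers(c,:),100,1))];
end

Y = pdist(X);                 % pairwise Euclidean distances
Z = linkage(Y,'ward');        % Ward linkage (an assumption, not the original choice)
T = cluster(Z,'maxclust',5);  % cut the tree so that at most 5 clusters remain

disp(numel(unique(T)));       % number of clusters actually produced
```

With 'maxclust' the tree is cut by cluster count rather than by a threshold, which makes the result directly comparable with kmeans(data,5).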

About the kmeans function

idx = kmeans(X,k) performs k-means clustering to partition the observations of the n-by-p data matrix X into k clusters, and returns an n-by-1 vector (idx) containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables.
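
To make the shapes in that description concrete, here is a small sketch on made-up data (the six points are hypothetical, not from this post). It also requests the second output, the centroid matrix, and uses the 'Replicates' option to rerun the algorithm from several random starts:

```matlab
% Hypothetical data: two obvious groups of 2-D points (n = 6, p = 2)
X = [1 1; 1.2 0.9; 0.8 1.1; 8 8; 8.1 7.9; 7.9 8.2];

% idx: 6-by-1 vector with one cluster index per row of X
% C:   2-by-2 matrix with one centroid per row
% 'Replicates' repeats k-means from several random starts and keeps the best run
[idx, C] = kmeans(X, 2, 'Replicates', 5);

disp(size(idx));   % one label per observation
disp(C);           % one centroid per cluster
```

The numeric labels in idx are arbitrary: the first group may come out as cluster 1 in one run and cluster 2 in the next.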

The result of running the program:
[Figure 1: Cluster Analysis]

The result shows that the input data has been divided into 5 groups, that is, 5 clusters.
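
Printing a text label on every one of 2500 points is slow and hard to read. As an alternative, the sketch below counts how many points fall in each cluster and colors the scatter plot by cluster index with gscatter from the Statistics Toolbox. The sample size and variable names are assumptions chosen for illustration:

```matlab
% Same layout as the program above, with 200 points per cloud
centers = [5 5; 5 -5; -5 5; -5 -5; 0 0];
X = [];
for c = 1:5
    X = [X; 5*(randn(200,2) + repmat(centers(c,:),200,1))];
end

idx = kmeans(X, 5, 'Replicates', 5);   % cluster index (1..5) for every row of X

counts = accumarray(idx, 1);           % number of points assigned to each cluster
disp(counts');

gscatter(X(:,1), X(:,2), idx);         % scatter plot colored by cluster index
```

If the clustering matches the generated clouds, each count should be close to 200, although points near the boundaries between clouds can end up in a neighboring cluster.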

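The same idea can be used to write a program that groups students by their grades. Suppose five students, A, B, C, D and E, have the following grades: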

Subjects        A   B   C   D   E
MATH           78  85  97  90  78
ENGLISH        85  79  91  91  81
C PROGRAMMING  89  88  89  94  80
HISTORY        74  71  96  89  83
CIRCUITS       78  80  86  94  76
PHYSICS        84  83  90  90  78
MATLAB         83  77  85  86  88
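% One row per student, one column per subject (same order as the table)
A=[78    85    89    74    78    84    83];
B=[85    79    88    71    80    83    77];
C=[97    91    89    96    86    90    85];
D=[90    91    94    89    94    90    86];
E=[78    81    80    83    76    78    88];

scores=[A;B;C;D;E];   % 5-by-7 matrix; named "scores" because "all" shadows a built-in function
IDX=kmeans(scores,2)  % split into two clusters; no semicolon, so the result is printed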

The result is:
IDX =
     2
     2
     1
     1
     2

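The students have been split into two groups: those with the better grades (C and D) have IDX 1, and the slightly weaker ones (A, B and E) have IDX 2. The cluster numbers themselves are arbitrary, so another run may swap 1 and 2 while keeping the same grouping.

This is about the simplest possible application of clustering.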
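
Because k-means assigns cluster numbers arbitrarily, it is safer to work out which cluster is the stronger one than to rely on it being numbered 1. Here is a minimal sketch using the grades above; the names avgScore, clusterAvg, best and isTop are introduced only for this illustration:

```matlab
% Grades from the table above: one row per student, one column per subject
scores = [78 85 89 74 78 84 83;   % A
          85 79 88 71 80 83 77;   % B
          97 91 89 96 86 90 85;   % C
          90 91 94 89 94 90 86;   % D
          78 81 80 83 76 78 88];  % E

idx = kmeans(scores, 2, 'Replicates', 5);   % several random starts, best run kept

avgScore   = mean(scores, 2);                        % average grade of each student
clusterAvg = accumarray(idx, avgScore, [], @mean);   % average grade of each cluster
[~, best]  = max(clusterAvg);                        % label of the higher-scoring cluster
isTop      = (idx == best);                          % true for students in that cluster

disp(isTop');
```

isTop marks the students in whichever cluster has the higher average, regardless of whether that cluster happens to be numbered 1 or 2 in a given run.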
