K-Means clustering partitions data into groups of similar points, and it usually converges quickly, so results are available after only a few iterations.
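The K-Means algorithm:
(1) Initialization: given the number of clusters k, randomly generate k center coordinates {c1(m), c2(m), c3(m), ..., ck(m)}.
(2) Compute the distance from every training sample to each center and perform the first assignment. Distances are Euclidean, and each sample is assigned to the center nearest to it. For example, if the first training sample is closest to center 3, then the first training sample belongs to cluster 3, and so on for the rest.
(3) Recompute the center coordinates. For example, suppose cluster 3 contains three samples, a1, a2, and a3. The x-coordinate of the new center is the sum of the x-coordinates of a1, a2, and a3 divided by 3 (the number of samples), and the y-coordinate is the sum of their y-coordinates divided by 3, which gives the new center.
(4) Test for convergence: if the centers no longer change, stop; otherwise repeat from step (2).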
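The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `assign`, `update`, and `kmeans` are made up for this sketch, the initial centers are drawn from the data points when none are supplied, and an empty cluster simply keeps its old center. Both of those choices are assumptions that the description above does not specify.

```python
import math
import random

def assign(points, centers):
    """Step (2): index of the nearest center (Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """Step (3): each new center is the coordinate-wise mean of its members."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centers

def kmeans(points, k, centers=None, max_iter=100, seed=None):
    # Step (1). Assumption: initial centers are k randomly chosen data points.
    if centers is None:
        centers = random.Random(seed).sample(points, k)
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:   # Step (4): centers unchanged, so stop
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical toy data with fixed initial centers, so the run is deterministic.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, labels = kmeans(points, k=2, centers=[(0, 0), (1, 0)])
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Passing explicit `centers` makes a run reproducible; with random initialization, different seeds can lead to different final clusters.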
The training data are as follows:
Pattern[1]=(0,0) Pattern[2]=(1,0) Pattern[3]=(0,1)
Pattern[4]=(2,1) Pattern[5]=(1,2) Pattern[6]=(2,2)
Pattern[7]=(2,0) Pattern[8]=(0,2) Pattern[9]=(7,6)
Pattern[10]=(7,7) Pattern[11]=(7,8) Pattern[12]=(8,6)
Pattern[13]=(8,7) Pattern[14]=(8,8) Pattern[15]=(8,9)
Pattern[16]=(9,7) Pattern[17]=(9,8) Pattern[18]=(9,9)
Step 1. Randomly generate three cluster centers: ClusterCenter[1]=(0,0) ClusterCenter[2]=(1,0) ClusterCenter[3]=(2,0)
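Step 2. Compute the distance from each training sample to each cluster center
| Training sample | Cluster[1] | Cluster[2] | Cluster[3] | Assigned cluster |
|---|---|---|---|---|
| Pattern[1] | 0.0 | 1.0 | 2.0 | 1 |
| Pattern[2] | 1.0 | 0.0 | 1.0 | 2 |
| Pattern[3] | 1.0 | 1.4 | 2.2 | 1 |
| Pattern[4] | 2.2 | 1.4 | 1.0 | 3 |
| Pattern[5] | 2.2 | 2.0 | 2.2 | 2 |
| Pattern[6] | 2.8 | 2.2 | 2.0 | 3 |
| Pattern[7] | 2.0 | 1.0 | 0.0 | 3 |
| Pattern[8] | 2.0 | 2.2 | 2.8 | 1 |
| Pattern[9] | 9.2 | 8.5 | 7.8 | 3 |
| Pattern[10] | 9.9 | 9.2 | 8.6 | 3 |
| Pattern[11] | 10.6 | 10.0 | 9.4 | 3 |
| Pattern[12] | 10.0 | 9.2 | 8.5 | 3 |
| Pattern[13] | 10.6 | 9.9 | 9.2 | 3 |
| Pattern[14] | 11.3 | 10.6 | 10.0 | 3 |
| Pattern[15] | 12.0 | 10.4 | 10.8 | 2 |
| Pattern[16] | 11.4 | 10.6 | 9.9 | 3 |
| Pattern[17] | 12.0 | 11.3 | 10.6 | 3 |
| Pattern[18] | 12.7 | 12.0 | 11.4 | 3 |
Note: the exact distance from Pattern[15]=(8,9) to ClusterCenter[2]=(1,0) is sqrt(49+81) ≈ 11.4, which would make Cluster[3] (10.8) the nearest. The table lists 10.4, and the centers recomputed below follow the table's assignment of Pattern[15] to cluster 2.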
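Each row of the distance table is the Euclidean distance from one pattern to the three centers of Step 1, rounded to one decimal, followed by the index of the smallest distance. The sketch below recomputes two rows; the helper name `distance_row` is introduced here purely for illustration.

```python
import math

centers = [(0, 0), (1, 0), (2, 0)]   # ClusterCenter[1..3] from Step 1

def distance_row(pattern, centers):
    """Distances from one pattern to every center, plus the 1-based index of the nearest."""
    dists = [math.dist(pattern, c) for c in centers]
    nearest = dists.index(min(dists)) + 1
    return [round(d, 1) for d in dists], nearest

print(distance_row((1, 2), centers))   # Pattern[5] -> ([2.2, 2.0, 2.2], 2)
print(distance_row((7, 6), centers))   # Pattern[9] -> ([9.2, 8.5, 7.8], 3)
```

The nearest center is picked from the unrounded distances, so rounding for display cannot change an assignment.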
Recompute the cluster centers:
ClusterCenter[1]=(0.0,1.0);
ClusterCenter[2]=(3.33,3.67);
ClusterCenter[3]=(6.5,5.75);
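These three centers are the coordinate-wise means of the patterns in each cluster. As a quick check, the sketch below averages the training data using the "Assigned cluster" column of the first table exactly as listed; the names `assigned` and `centroid` are illustrative only.

```python
patterns = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2), (2, 0), (0, 2), (7, 6),
            (7, 7), (7, 8), (8, 6), (8, 7), (8, 8), (8, 9), (9, 7), (9, 8), (9, 9)]

# "Assigned cluster" column of the first table, Pattern[1]..Pattern[18]
assigned = [1, 2, 1, 3, 2, 3, 3, 1, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3]

def centroid(members):
    """Mean of each coordinate, rounded to two decimals for display."""
    n = len(members)
    return tuple(round(sum(axis) / n, 2) for axis in zip(*members))

new_centers = {}
for k in (1, 2, 3):
    members = [p for p, a in zip(patterns, assigned) if a == k]
    new_centers[k] = centroid(members)

print(new_centers)
# {1: (0.0, 1.0), 2: (3.33, 3.67), 3: (6.5, 5.75)}
```

Cluster 1 holds Pattern[1], Pattern[3], and Pattern[8]; cluster 2 holds Pattern[2], Pattern[5], and Pattern[15]; the remaining twelve patterns form cluster 3.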
Compute the distance from each training sample to each cluster center again
| Training sample | Cluster[1] | Cluster[2] | Cluster[3] | Assigned cluster |
|---|---|---|---|---|
| Pattern[1] | 1.0 | 5.0 | 8.7 | 1 |
| Pattern[2] | 1.4 | 4.3 | 8.0 | 1 |
| Pattern[3] | 0.0 | 4.3 | 8.1 | 1 |
| Pattern[4] | 2.0 | 3.0 | 6.5 | 1 |
| Pattern[5] | 1.4 | 2.9 | 6.7 | 1 |
| Pattern[6] | 2.2 | 2.1 | 5.9 | 2 |
| Pattern[7] | 2.2 | 3.9 | 7.3 | 1 |
| Pattern[8] | 1.0 | 3.7 | 7.5 | 1 |
| Pattern[9] | 8.6 | 4.3 | 0.6 | 3 |
| Pattern[10] | 9.2 | 5.0 | 1.3 | 3 |
| Pattern[11] | 9.9 | 5.7 | 2.3 | 3 |
| Pattern[12] | 9.4 | 5.2 | 1.5 | 3 |
| Pattern[13] | 10.0 | 5.7 | 2.0 | 3 |
| Pattern[14] | 10.6 | 6.4 | 2.7 | 3 |
| Pattern[15] | 11.3 | 7.1 | 3.6 | 3 |
| Pattern[16] | 10.8 | 6.6 | 2.8 | 3 |
| Pattern[17] | 11.4 | 7.1 | 3.4 | 3 |
| Pattern[18] | 12.0 | 7.8 | 4.1 | 3 |
Recompute the cluster centers:
ClusterCenter[1]=(0.86,0.86);
ClusterCenter[2]=(2.0,2.0);
ClusterCenter[3]=(8.0,7.5);
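This update can be reproduced in a single pass: assign every pattern to the nearest of the previous centers (0.0,1.0), (3.33,3.67), and (6.5,5.75), then average each group. The function name `step` is an illustrative choice, and the sketch assumes no group ends up empty, which holds for this data.

```python
import math

patterns = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2), (2, 0), (0, 2), (7, 6),
            (7, 7), (7, 8), (8, 6), (8, 7), (8, 8), (8, 9), (9, 7), (9, 8), (9, 9)]
prev_centers = [(0.0, 1.0), (3.33, 3.67), (6.5, 5.75)]

def step(points, centers):
    """One K-Means iteration: nearest-center assignment followed by averaging."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[nearest].append(p)
    # Assumes every group is non-empty (true for this data set).
    return [tuple(sum(axis) / len(g) for axis in zip(*g)) for g in groups]

next_centers = step(patterns, prev_centers)
print([tuple(round(v, 2) for v in c) for c in next_centers])
# [(0.86, 0.86), (2.0, 2.0), (8.0, 7.5)]
```

Only Pattern[6] is closer to the second center than to the first, which is why ClusterCenter[2] lands exactly on (2.0,2.0).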
Compute the distance from each training sample to each cluster center again
| Training sample | Cluster[1] | Cluster[2] | Cluster[3] | Assigned cluster |
|---|---|---|---|---|
| Pattern[1] | 1.2 | 2.8 | 11.0 | 1 |
| Pattern[2] | 0.9 | 2.2 | 10.3 | 1 |
| Pattern[3] | 0.9 | 2.2 | 10.3 | 1 |
| Pattern[4] | 1.1 | 1.0 | 8.8 | 2 |
| Pattern[5] | 1.1 | 1.0 | 8.9 | 2 |
| Pattern[6] | 1.6 | 0.0 | 8.1 | 2 |
| Pattern[7] | 1.4 | 2.0 | 9.6 | 1 |
| Pattern[8] | 1.4 | 2.0 | 9.7 | 1 |
| Pattern[9] | 8.0 | 6.4 | 1.8 | 3 |
| Pattern[10] | 8.7 | 7.1 | 1.1 | 3 |
| Pattern[11] | 9.4 | 7.8 | 1.1 | 3 |
| Pattern[12] | 8.8 | 7.2 | 1.5 | 3 |
| Pattern[13] | 9.4 | 7.8 | 0.5 | 3 |
| Pattern[14] | 10.1 | 8.5 | 0.5 | 3 |
| Pattern[15] | 10.8 | 9.2 | 1.5 | 3 |
| Pattern[16] | 10.2 | 8.6 | 1.1 | 3 |
| Pattern[17] | 10.8 | 9.2 | 1.1 | 3 |
| Pattern[18] | 11.5 | 9.9 | 1.8 | 3 |
Recompute the cluster centers:
ClusterCenter[1]=(0.6,0.6);
ClusterCenter[2]=(1.67,1.67);
ClusterCenter[3]=(8.0,7.5);
Compute the distance from each training sample to each cluster center again
| Training sample | Cluster[1] | Cluster[2] | Cluster[3] | Assigned cluster |
|---|---|---|---|---|
| Pattern[1] | 0.8 | 2.4 | 11.0 | 1 |
| Pattern[2] | 0.7 | 1.8 | 10.3 | 1 |
| Pattern[3] | 0.7 | 1.8 | 10.3 | 1 |
| Pattern[4] | 1.5 | 0.7 | 8.8 | 2 |
| Pattern[5] | 1.5 | 0.7 | 8.9 | 2 |
| Pattern[6] | 2.0 | 0.5 | 8.1 | 2 |
| Pattern[7] | 1.5 | 1.7 | 9.6 | 1 |
| Pattern[8] | 1.5 | 1.7 | 9.7 | 1 |
| Pattern[9] | 8.4 | 6.9 | 1.8 | 3 |
| Pattern[10] | 9.1 | 7.5 | 1.1 | 3 |
| Pattern[11] | 9.8 | 8.3 | 1.1 | 3 |
| Pattern[12] | 9.2 | 7.7 | 1.5 | 3 |
| Pattern[13] | 9.8 | 8.3 | 0.5 | 3 |
| Pattern[14] | 10.5 | 9.0 | 0.5 | 3 |
| Pattern[15] | 11.2 | 9.7 | 1.5 | 3 |
| Pattern[16] | 10.6 | 9.1 | 1.1 | 3 |
| Pattern[17] | 11.2 | 9.7 | 1.1 | 3 |
| Pattern[18] | 11.9 | 10.4 | 1.8 | 3 |
Recompute the cluster centers:
ClusterCenter[1]=(0.6,0.6);
ClusterCenter[2]=(1.67,1.67);
ClusterCenter[3]=(8.0,7.5);
The cluster centers no longer change, so the algorithm has converged. The final cluster centers are therefore:
ClusterCenter[1]=(0.6,0.6);
ClusterCenter[2]=(1.67,1.67);
ClusterCenter[3]=(8.0,7.5);
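Another way to see the convergence is through the assignments: if the nearest-center labels are the same under two consecutive sets of centers, the averages cannot change either. The sketch below compares the labels under the centers (0.86,0.86), (2.0,2.0), (8.0,7.5) with those under the final centers; the helper name `labels` is illustrative, and the centers are used at the two-decimal precision printed above.

```python
import math

patterns = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2), (2, 0), (0, 2), (7, 6),
            (7, 7), (7, 8), (8, 6), (8, 7), (8, 8), (8, 9), (9, 7), (9, 8), (9, 9)]

def labels(points, centers):
    """1-based index of the nearest center for each point."""
    return [1 + min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

before = labels(patterns, [(0.86, 0.86), (2.0, 2.0), (8.0, 7.5)])
after = labels(patterns, [(0.6, 0.6), (1.67, 1.67), (8.0, 7.5)])

print(after)
# [1, 1, 1, 2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
print("converged:", before == after)   # converged: True
```

Comparing labels avoids any floating-point tolerance question that can come up when comparing center coordinates directly.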