Weka Data Mining: Clustering

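"If you long for something, set it free. If it comes back to you, it is yours; if it does not, you never had it." (Alexandre Dumas, The Count of Monte Cristo)

"Life is a mirror, and the first thing we strive for is to recognize ourselves in it." (Nietzsche)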

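Contents

      • Contents
      • Clustering concepts
      • Introduction to clustering algorithms
        • 2-1 KMeans (K-means)
        • 2-2 EM (Expectation Maximization)
        • 2-3 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
      • Weka clustering examples
        • 3-1 The SimpleKMeans algorithm
        • 3-2 The EM algorithm
        • 3-3 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)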

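1 Clustering concepts

Clustering is the process of grouping a collection of physical or abstract objects. The resulting groups are called clusters, and each cluster is a set of data objects. Two objects inside the same cluster should be highly similar, while two objects from different clusters should be highly dissimilar. Dissimilarity is usually computed from the attribute values that describe the two objects, and the most common measure is the distance between them.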

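2 Introduction to clustering algorithms

2-1 KMeans (K-means)

K-means is a prototype-based, partitional clustering technique that tries to split the data into a user-specified number k of clusters.
The basic idea of K-means is to cluster around k points in space that act as centers, assigning each object to the center closest to it. The centers are updated iteratively until the assignment stops changing, which yields a locally optimal clustering.

Algorithm:

select k points as the initial centroids
repeat
    assign each point to its nearest centroid, forming k clusters
    recompute the centroid of each cluster
until the centroids no longer change

Similarity can be computed with either Euclidean distance or Manhattan distance.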
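
As a small illustration of the two distance measures just mentioned, here is a minimal sketch in plain Python. The function names `euclidean` and `manhattan` and the two sample points are assumptions made for this example; they are not Weka's API.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences per attribute."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two illustrative points with two numeric attributes each.
p = (75.0, 80.0)
q = (64.0, 65.0)
print("Euclidean:", euclidean(p, q))
print("Manhattan:", manhattan(p, q))
```

Manhattan distance is never smaller than Euclidean distance for the same pair of points, so the choice of measure can change which centroid counts as "nearest".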

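For data whose proximity measure is Euclidean distance, the sum of squared errors (SSE) is commonly used as the objective function that measures clustering quality. SSE is defined as follows: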

$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(c_i, x)^2$$
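
The K-means loop and the SSE objective can be sketched together as follows, with c_i taken as the centroid of cluster C_i. This is a minimal plain-Python sketch, not Weka's SimpleKMeans: the helper names, the toy data, and the default seed and iteration limit are all assumptions chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def assign(points, centroids):
    """Assign every point to its nearest centroid; returns one list per cluster."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, seed=10, max_iter=500):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # k points as the initial centroids
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:       # centroids no longer change
            break
        centroids = new_centroids
    clusters = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[i]) for i, c in enumerate(clusters) for p in c)
    return centroids, clusters, sse

# Toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters, sse = kmeans(data, k=2)
print(centroids)
print("SSE:", sse)
```

Because the initial centroids are drawn at random, different seeds can lead to different local optima on less cleanly separated data; comparing the SSE of several runs is one common way to pick among them.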

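2-2 EM (Expectation Maximization)

EM (Expectation Maximization) can be viewed as a generalization of the K-means approach: instead of assigning each object to one definite cluster, it assigns objects according to the probability that they belong to each cluster. EM is also a well-regarded algorithm for problems with missing data.
EM alternates between two steps:
The first is the expectation step (E), which uses the current parameter estimates to compute the expected values of the hidden variables, here the probability that each object belongs to each cluster;
The second is the maximization step (M), which re-estimates the parameters by maximizing the expected likelihood obtained in the E step.
The parameter estimates found in the M step are then used in the next E step, and the two steps keep alternating.
Reference link: 从最大似然到EM算法浅解 (From Maximum Likelihood to the EM Algorithm: A Brief Explanation)
The underlying probability theory is fairly involved... I have not fully understood it yet.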
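
To make the E/M alternation more concrete, here is a minimal sketch for a one-dimensional mixture of Gaussians. It is an illustration under stated assumptions, not Weka's EM implementation: the function names, the toy data, the fixed iteration count, and the lower bound on the standard deviation are all chosen for this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_gmm_1d(xs, mus, sigmas, weights, n_iter=50, min_sigma=1e-3):
    mus, sigmas, weights = list(mus), list(sigmas), list(weights)
    k = len(mus)
    resp = []
    for _ in range(n_iter):
        # E step: membership probability of every point in every cluster.
        resp = []
        for x in xs:
            joint = [weights[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M step: re-estimate priors, means and std. devs. from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)                    # fractional cluster size
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)      # avoid a collapsed cluster
    return mus, sigmas, weights, resp

# Toy data: two groups, around 1 and around 5.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
mus, sigmas, weights, resp = em_gmm_1d(
    data, mus=[0.0, 6.0], sigmas=[1.0, 1.0], weights=[0.5, 0.5]
)
print("means:", mus)
print("priors:", weights)
```

Each row of `resp` sums to 1, which is the "soft" assignment: a point contributes a fraction of itself to every cluster rather than belonging entirely to one.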

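2-3 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm in which the number of clusters is determined automatically. Points in low-density regions are treated as noise and ignored, so DBSCAN does not produce a complete clustering.
Definitions of common terms:

  1. Radius (Eps): a distance specified by the user.
  2. Core point: a point in the interior of a density-based cluster. A point's neighborhood is determined by the distance function together with the user-specified distance Eps. A point is a core point if its neighborhood contains at least MinPts points, where the threshold MinPts is also specified by the user.
  3. Border point: a point that is not a core point but falls within the neighborhood of a core point.
  4. Noise point: a point that is neither a core point nor a border point.

Description of the DBSCAN algorithm:
Input: a database containing n objects, the radius Eps, and the minimum number of points MinPts;
Output: all generated clusters that satisfy the density requirement.
(1) REPEAT
(2) Take an unprocessed point from the database;
(3) IF the point is a core point THEN find all objects that are density-reachable from it and form a cluster;
(4) ELSE the point is a border or noise point (a non-core object); skip it and move on to the next point;
(5) UNTIL all points have been processed.

DBSCAN is very sensitive to its user-defined parameters: small differences can lead to very different results, and there is no general rule for choosing them, so in practice they are set from experience.

Its pseudocode is as follows: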

// Input: data set D, radius Eps, density threshold MinPts
// Output: clustering C

DBSCAN(D, Eps, MinPts){
 for each unvisited point p in D{
    mark p as visited; // mark p as visited

    N = getNeighbours(p, Eps); // candidate set: the Eps-neighbourhood of p

    if sizeOf(N) < MinPts then
        mark p as Noise; // too few neighbours: p is noise for now (it may later become a border point)
    else {
        C = next cluster; // start a new cluster C
        ExpandCluster(p, N, C, Eps, MinPts);
    }
 }
}
// ExpandCluster is defined as follows:
ExpandCluster(p, N, C, Eps, MinPts){
    add p to cluster C; // first add the core point to C
    for each point p' in N{ // N may grow while it is being iterated
        if p' is not visited {
            mark p' as visited; // mark as visited
            N' = getNeighbours(p', Eps); // check the Eps-neighbourhood of every point in N
            if sizeOf(N') >= MinPts then
                N = N + N'; // p' is a core point, so extend the candidate set
        }
        // if p' does not belong to any cluster yet, add it to the current cluster
        if p' is not member of any cluster
            add p' to cluster C; // add p' to cluster C
    }
}
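
The pseudocode above can be turned into a small runnable sketch. The version below is plain Python written for illustration, not Weka's DBSCAN class: the function names, the `NOISE` marker, the convention that a point counts as its own neighbour, and the toy data are assumptions of this example.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)        # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE            # may be relabelled as a border point later
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)         # candidate set to expand
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned, do not expand again
            labels[j] = cluster
            nb = region_query(points, j, eps)
            if len(nb) >= min_pts:       # j is itself a core point
                queue.extend(nb)
    return labels

# Toy data: two dense groups plus one isolated point.
pts = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
       (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
       (9.0, 1.0)]
labels = dbscan(pts, eps=0.5, min_pts=3)
print(labels)
```

The isolated point ends up with the `NOISE` label and belongs to no cluster, which is what "DBSCAN does not produce a complete clustering" means in practice.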

Reference: Baidu Baike, DBSCAN

3 Weka clustering examples

3-1 The SimpleKMeans algorithm

weka.clusterers.SimpleKMeans
Running it on the data in weather.numeric.arff gives the following result:

=== Run information ===

Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     weather
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    evaluate on training data


=== Clustering model (full training set) ===


kMeans
======

Number of iterations: 3
Within cluster sum of squared errors: 16.237456311387238

Initial starting points (random):

Cluster 0: rainy,75,80,FALSE,yes
Cluster 1: overcast,64,65,TRUE,yes

Missing values globally replaced with mean/mode

Final cluster centroids:
                           Cluster#
Attribute      Full Data          0          1
                  (14.0)      (9.0)      (5.0)
==============================================
outlook            sunny      sunny   overcast
temperature      73.5714    75.8889       69.4
humidity         81.6429    84.1111       77.2
windy              FALSE      FALSE       TRUE
play                 yes        yes        yes




Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       9 ( 64%)
1       5 ( 36%)

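The clustering result is displayed as a table (the "Final cluster centroids" block in the output above): rows correspond to attribute names and columns to cluster centroids. For a numeric attribute the table shows the mean; for a nominal attribute it shows the most frequent label in that column's cluster.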

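
As a rough check of how such a centroid row is formed, the sketch below computes the mean of each numeric column and the most frequent label of each nominal column. It assumes the 14 rows of the standard weather.numeric.arff file that ships with Weka, typed in by hand here; the `centroid` helper and its tie-breaking (the label seen first wins) are assumptions of this example and may not match Weka in every case.

```python
from collections import Counter

# Assumed contents of weather.numeric.arff: outlook, temperature, humidity, windy, play.
rows = [
    ("sunny", 85, 85, "FALSE", "no"),     ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"), ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"),    ("rainy", 65, 70, "TRUE", "no"),
    ("overcast", 64, 65, "TRUE", "yes"),  ("sunny", 72, 95, "FALSE", "no"),
    ("sunny", 69, 70, "FALSE", "yes"),    ("rainy", 75, 80, "FALSE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"),     ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"), ("rainy", 71, 91, "TRUE", "no"),
]

def centroid(rows):
    """Mean for numeric columns, most frequent label for nominal columns."""
    result = []
    for col in zip(*rows):
        if all(isinstance(v, (int, float)) for v in col):
            result.append(sum(col) / len(col))
        else:
            result.append(Counter(col).most_common(1)[0][0])
    return result

full_data = centroid(rows)
print(full_data)
```

Under that assumption the result lines up with the "Full Data" column of the Weka output; applying the same helper to only the rows of one cluster would give that cluster's column.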

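3-2 The EM algorithm

Unlike the SimpleKMeans output above, the table header here does not show the number of instances; the parentheses in the header show each cluster's prior probability instead. The cells show the parameters of a normal distribution for numeric attributes, or frequency counts for nominal attributes. The fractional counts reveal the "soft" nature of EM: any instance can be split across several clusters. At the end of the output, the log likelihood of the model with respect to the training data is shown.

The result of the run is as follows: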

=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N 2 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     weather
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    evaluate on training data


=== Clustering model (full training set) ===


EM
==

Number of clusters: 2
Number of iterations performed: 7


              Cluster
Attribute           0       1
               (0.35)  (0.65)
==============================
outlook
  sunny         3.8732  3.1268
  overcast      1.7746  4.2254
  rainy         2.1889  4.8111
  [total]       7.8368 12.1632
temperature
  mean         76.9173 71.8054
  std. dev.     5.8302  5.8566

humidity
  mean         90.1132 77.1719
  std. dev.     3.8066  9.1962

windy
  TRUE            3.14    4.86
  FALSE         3.6967  6.3033
  [total]       6.8368 11.1632
play
  yes           2.1227  8.8773
  no            4.7141  2.2859
  [total]       6.8368 11.1632


Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       4 ( 29%)
1      10 ( 71%)
// log likelihood
Log likelihood: -9.13037

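3-3 DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN uses a Euclidean distance measure to decide which instances belong to the same cluster. Unlike partitioning methods, DBSCAN determines the number of clusters automatically, can find clusters of arbitrary shape, and introduces the notion of outliers. Clusters are formed under two user-specified constraints: the neighborhood distance ε and the minimum number of points per cluster, minPts. Instances that do not belong to any cluster are called outliers.
OPTICS is an extension of DBSCAN towards hierarchical clustering. OPTICS imposes an ordering on the instances; when this ordering is visualized in two dimensions it reveals the hierarchical structure of the clusters. The ordering is driven by the distance measure, so that instances closest to each other end up next to each other in the list.

The end product of OPTICS is an ordering from which clusters can be extracted for any freely chosen reachability distance.

OPTICS additionally stores the core distance and the reachability distance of every object.
Clusters are then extracted from the ordering information that OPTICS produces.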
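The algorithm is described as follows:
Algorithm: OPTICS
Input: sample set D, neighborhood radius E, and MinPts, the minimum number of points in the E-neighborhood for a point to be a core object
Output: an ordering of the sample points together with their reachability distances
Method:
 1. Create two queues, an ordered queue and a result queue. (The ordered queue stores core objects and the objects directly density-reachable from them, sorted by ascending reachability distance; the result queue stores the output order of the sample points.)

 2. If all points in D have been processed, the algorithm ends. Otherwise, pick a point that is unprocessed (not yet in the result queue) and is a core object, find all points directly density-reachable from it, and put each of those points that is not already in the result queue into the ordered queue, sorted by reachability distance.
 3. If the ordered queue is empty, go back to step 2. Otherwise, take the first point from the ordered queue (the one with the smallest reachability distance) for expansion, and save it to the result queue if it is not already there.
    3.1 Check whether the expansion point is a core object. If it is not, return to step 3; otherwise find all points directly density-reachable from it.
    3.2 If a directly density-reachable point is already in the result queue, skip it; otherwise continue with the next step.
    3.3 If the point is already in the ordered queue and its new reachability distance is smaller than the old one, replace the old value with the new one and re-sort the ordered queue.
    3.4 If the point is not in the ordered queue, insert it and re-sort the ordered queue.
 4. The algorithm ends; output the ordered sample points in the result queue.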
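
The queue-based procedure above can be sketched in plain Python as follows. This is an illustrative simplification, not Weka's OPTICS class. The ordered queue is a heap with stale entries skipped instead of being re-sorted, every unprocessed point may start a new pass (non-core points are simply emitted with an undefined reachability distance), and a point counts as its own neighbour when the core distance is computed. All names and the toy data are assumptions of this example.

```python
import heapq
import math

def optics(points, eps, min_pts):
    n = len(points)
    reach = [None] * n         # reachability distance; None means UNDEFINED
    core = [None] * n          # core distance; None means UNDEFINED
    processed = [False] * n
    order = []                 # the result queue: output order of the points

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nb):
        if len(nb) < min_pts:
            return None        # not a core object
        return sorted(math.dist(points[i], points[j]) for j in nb)[min_pts - 1]

    def update(i, nb, seeds):
        """Steps 3.2 to 3.4: insert neighbours or lower their reachability."""
        for j in nb:
            if processed[j]:
                continue
            new_reach = max(core[i], math.dist(points[i], points[j]))
            if reach[j] is None or new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    def process(i, seeds):
        nb = neighbours(i)
        processed[i] = True
        core[i] = core_distance(i, nb)
        order.append(i)
        if core[i] is not None:
            update(i, nb, seeds)

    for start in range(n):
        if processed[start]:
            continue
        seeds = []             # the ordered queue, smallest reachability first
        process(start, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if not processed[j]:   # skip stale heap entries
                process(j, seeds)
    return order, reach, core

# Toy data: two small groups far apart.
pts = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1), (10, 2)]
order, reach, core = optics(pts, eps=1.5, min_pts=2)
for i in order:
    print(i, pts[i], "c_dist:", core[i], "r_dist:", reach[i])
```

In the printed ordering, a point whose reachability distance is undefined marks the start of a new dense region, which mirrors the role of the `r_dist: UNDEFINED` rows in the Weka listing.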

OPTICS results in WEKA

=== Run information ===

Scheme:       weka.clusterers.OPTICS -E 0.2 -M 5 -A "weka.core.EuclideanDistance -R first-last" -db-output .
Relation:     iris
Instances:    150
Attributes:   5
              sepallength
              sepalwidth
              petallength
              petalwidth
Ignored:
              class
Test mode:    evaluate on training data


=== Clustering model (full training set) ===

OPTICS clustering results
============================================================================================

Clustered DataObjects: 150
Number of attributes: 4
Epsilon: 0.2; minPoints: 5
Write results to file: no
Distance-type: 
Number of generated clusters: 0
Elapsed time: .02

(  0.) 5.1,3.5,1.4,0.2                           -->  c_dist: 0.05         r_dist: UNDEFINED   
( 17.) 5.1,3.5,1.4,0.3                           -->  c_dist: 0.061        r_dist: 0.05
( 39.) 5.1,3.4,1.5,0.2                           -->  c_dist: 0.05         r_dist: 0.05
(  4.) 5,3.6,1.4,0.2                             -->  c_dist: 0.071        r_dist: 0.05
( 27.) 5.2,3.5,1.5,0.2                           -->  c_dist: 0.053        r_dist: 0.05
( 28.) 5.2,3.4,1.4,0.2                           -->  c_dist: 0.058        r_dist: 0.05
(  7.) 5,3.4,1.5,0.2                             -->  c_dist: 0.058        r_dist: 0.05
( 40.) 5,3.5,1.3,0.3                             -->  c_dist: 0.068        r_dist: 0.053
( 49.) 5,3.3,1.4,0.2                             -->  c_dist: 0.069        r_dist: 0.053
( 11.) 4.8,3.4,1.6,0.2                           -->  c_dist: 0.077        r_dist: 0.058
( 35.) 5,3.2,1.2,0.2                             -->  c_dist: 0.083        r_dist: 0.069
( 26.) 5,3.4,1.6,0.4                             -->  c_dist: 0.085        r_dist: 0.073
( 20.) 5.4,3.4,1.7,0.2                           -->  c_dist: 0.09         r_dist: 0.075
( 24.) 4.8,3.4,1.9,0.2                           -->  c_dist: 0.107        r_dist: 0.077
(  6.) 4.6,3.4,1.4,0.3                           -->  c_dist: 0.103        r_dist: 0.077
( 34.) 4.9,3.1,1.5,0.1                           -->  c_dist: 0.053        r_dist: 0.083
( 12.) 4.8,3,1.4,0.1                             -->  c_dist: 0.053        r_dist: 0.053
( 37.) 4.9,3.1,1.5,0.1                           -->  c_dist: 0.053        r_dist: 0.053
(  9.) 4.9,3.1,1.5,0.1                           -->  c_dist: 0.053        r_dist: 0.053
( 30.) 4.8,3.1,1.6,0.2                           -->  c_dist: 0.053        r_dist: 0.053
( 29.) 4.7,3.2,1.6,0.2                           -->  c_dist: 0.053        r_dist: 0.053
(  2.) 4.7,3.2,1.3,0.2                           -->  c_dist: 0.071        r_dist: 0.053
(  3.) 4.6,3.1,1.5,0.2                           -->  c_dist: 0.06         r_dist: 0.053
( 47.) 4.6,3.2,1.4,0.2                           -->  c_dist: 0.058        r_dist: 0.053
(  1.) 4.9,3,1.4,0.2                             -->  c_dist: 0.06         r_dist: 0.053
( 42.) 4.4,3.2,1.3,0.2                           -->  c_dist: 0.083        r_dist: 0.058
( 25.) 5,3,1.6,0.2                               -->  c_dist: 0.067        r_dist: 0.06
( 45.) 4.8,3,1.4,0.3                             -->  c_dist: 0.083        r_dist: 0.06
( 38.) 4.4,3,1.3,0.2                             -->  c_dist: 0.083        r_dist: 0.077
( 13.) 4.3,3,1.1,0.1                             -->  c_dist: 0.123        r_dist: 0.083
(  8.) 4.4,2.9,1.4,0.2                           -->  c_dist: 0.126        r_dist: 0.083
( 23.) 5.1,3.3,1.7,0.5                           -->  c_dist: 0.128        r_dist: 0.085
( 48.) 5.3,3.7,1.5,0.2                           -->  c_dist: 0.088        r_dist: 0.088
( 10.) 5.4,3.7,1.5,0.2                           -->  c_dist: 0.1          r_dist: 0.088
( 19.) 5.1,3.8,1.5,0.3                           -->  c_dist: 0.081        r_dist: 0.088
( 21.) 5.1,3.7,1.5,0.4                           -->  c_dist: 0.095        r_dist: 0.081
( 44.) 5.1,3.8,1.9,0.4                           -->  c_dist: 0.099        r_dist: 0.081
( 46.) 5.1,3.8,1.6,0.2                           -->  c_dist: 0.095        r_dist: 0.081
( 36.) 5.5,3.5,1.3,0.2                           -->  c_dist: 0.095        r_dist: 0.09
( 31.) 5.4,3.4,1.5,0.4                           -->  c_dist: 0.103        r_dist: 0.09
( 43.) 5,3.5,1.6,0.6                             -->  c_dist: 0.132        r_dist: 0.093
(  5.) 5.4,3.9,1.7,0.4                           -->  c_dist: 0.108        r_dist: 0.099
( 18.) 5.7,3.8,1.7,0.3                           -->  c_dist: 0.129        r_dist: 0.108
( 16.) 5.4,3.9,1.3,0.4                           -->  c_dist: 0.123        r_dist: 0.108
( 22.) 4.6,3.6,1,0.2                             -->  c_dist: 0.143        r_dist: 0.115
( 14.) 5.8,4,1.2,0.2                             -->  c_dist: 0.168        r_dist: 0.129
( 32.) 5.2,4.1,1.5,0.1                           -->  c_dist: 0.164        r_dist: 0.136
( 33.) 5.5,4.2,1.4,0.2                           -->  c_dist: 0.154        r_dist: 0.154
( 15.) 5.7,4.4,1.5,0.4                           -->  c_dist: UNDEFINED    r_dist: 0.154
(100.) 6.3,3.3,6,2.5                             -->  c_dist: 0.153        r_dist: UNDEFINED   
(115.) 6.4,3.2,5.3,2.3                           -->  c_dist: 0.119        r_dist: 0.153
(136.) 6.3,3.4,5.6,2.4                           -->  c_dist: 0.127        r_dist: 0.119
(140.) 6.7,3.1,5.6,2.4                           -->  c_dist: 0.095        r_dist: 0.119
(120.) 6.9,3.2,5.7,2.3                           -->  c_dist: 0.108        r_dist: 0.095
(143.) 6.8,3.2,5.9,2.3                           -->  c_dist: 0.103        r_dist: 0.095
(145.) 6.7,3,5.2,2.3                             -->  c_dist: 0.114        r_dist: 0.095
(144.) 6.7,3.3,5.7,2.5                           -->  c_dist: 0.122        r_dist: 0.095
(124.) 6.7,3.3,5.7,2.1                           -->  c_dist: 0.13         r_dist: 0.103
(139.) 6.9,3.1,5.4,2.1                           -->  c_dist: 0.11         r_dist: 0.108
(102.) 7.1,3,5.9,2.1                             -->  c_dist: 0.144        r_dist: 0.11
(112.) 6.8,3,5.5,2.1                             -->  c_dist: 0.106        r_dist: 0.11
(104.) 6.5,3,5.8,2.2                             -->  c_dist: 0.114        r_dist: 0.106
(147.) 6.5,3,5.2,2                               -->  c_dist: 0.11         r_dist: 0.106
(141.) 6.9,3.1,5.1,2.3                           -->  c_dist: 0.11         r_dist: 0.11
(110.) 6.5,3.2,5.1,2                             -->  c_dist: 0.132        r_dist: 0.11
(116.) 6.5,3,5.5,1.8                             -->  c_dist: 0.11         r_dist: 0.11
(103.) 6.3,2.9,5.6,1.8                           -->  c_dist: 0.128        r_dist: 0.11
( 77.) 6.7,3,5,1.7                               -->  c_dist: 0.133        r_dist: 0.11
(137.) 6.4,3.1,5.5,1.8                           -->  c_dist: 0.119        r_dist: 0.11
(128.) 6.4,2.8,5.6,2.1                           -->  c_dist: 0.119        r_dist: 0.114
(132.) 6.4,2.8,5.6,2.2                           -->  c_dist: 0.141        r_dist: 0.114
(111.) 6.4,2.7,5.3,1.9                           -->  c_dist: 0.11         r_dist: 0.119
(123.) 6.3,2.7,4.9,1.8                           -->  c_dist: 0.123        r_dist: 0.11
(146.) 6.3,2.5,5,1.9                             -->  c_dist: 0.163        r_dist: 0.11
(126.) 6.2,2.8,4.8,1.8                           -->  c_dist: 0.117        r_dist: 0.117
(127.) 6.1,3,4.9,1.8                             -->  c_dist: 0.102        r_dist: 0.117
(138.) 6,3,4.8,1.8                               -->  c_dist: 0.1          r_dist: 0.102
(149.) 5.9,3,5.1,1.8                             -->  c_dist: 0.128        r_dist: 0.1
( 70.) 5.9,3.2,4.8,1.8                           -->  c_dist: 0.131        r_dist: 0.1
(148.) 6.2,3.4,5.4,2.3                           -->  c_dist: 0.175        r_dist: 0.119
( 83.) 6,2.7,5.1,1.6                             -->  c_dist: 0.129        r_dist: 0.12
(133.) 6.3,2.8,5.1,1.5                           -->  c_dist: 0.13         r_dist: 0.129
(134.) 6.1,2.6,5.6,1.4                           -->  c_dist: 0.193        r_dist: 0.129
( 54.) 6.5,2.8,4.6,1.5                           -->  c_dist: 0.103        r_dist: 0.13
( 58.) 6.6,2.9,4.6,1.3                           -->  c_dist: 0.097        r_dist: 0.103
( 74.) 6.4,2.9,4.3,1.3                           -->  c_dist: 0.106        r_dist: 0.097
( 75.) 6.6,3,4.4,1.4                             -->  c_dist: 0.083        r_dist: 0.097
( 65.) 6.7,3.1,4.4,1.4                           -->  c_dist: 0.103        r_dist: 0.083
( 86.) 6.7,3.1,4.7,1.5                           -->  c_dist: 0.099        r_dist: 0.083
( 76.) 6.8,2.8,4.8,1.4                           -->  c_dist: 0.136        r_dist: 0.097
( 52.) 6.9,3.1,4.9,1.5                           -->  c_dist: 0.11         r_dist: 0.099
( 51.) 6.4,3.2,4.5,1.5                           -->  c_dist: 0.11         r_dist: 0.099
( 50.) 7,3.2,4.7,1.4                             -->  c_dist: 0.148        r_dist: 0.102
( 97.) 6.2,2.9,4.3,1.3                           -->  c_dist: 0.084        r_dist: 0.106
( 63.) 6.1,2.9,4.7,1.4                           -->  c_dist: 0.093        r_dist: 0.084
( 71.) 6.1,2.8,4,1.3                             -->  c_dist: 0.112        r_dist: 0.084
( 91.) 6.1,3,4.6,1.4                             -->  c_dist: 0.097        r_dist: 0.084
( 78.) 6,2.9,4.5,1.5                             -->  c_dist: 0.106        r_dist: 0.093
( 73.) 6.1,2.8,4.7,1.2                           -->  c_dist: 0.123        r_dist: 0.093
( 61.) 5.9,3,4.2,1.5                             -->  c_dist: 0.108        r_dist: 0.097
( 66.) 5.6,3,4.5,1.5                             -->  c_dist: 0.11         r_dist: 0.108
( 96.) 5.7,2.9,4.2,1.3                           -->  c_dist: 0.066        r_dist: 0.108
( 55.) 5.7,2.8,4.5,1.3                           -->  c_dist: 0.106        r_dist: 0.066
( 88.) 5.6,3,4.1,1.3                             -->  c_dist: 0.094        r_dist: 0.066
( 95.) 5.7,3,4.2,1.2                             -->  c_dist: 0.106        r_dist: 0.066
( 99.) 5.7,2.8,4.1,1.3                           -->  c_dist: 0.073        r_dist: 0.066
( 82.) 5.8,2.7,3.9,1.2                           -->  c_dist: 0.09         r_dist: 0.073
( 94.) 5.6,2.7,4.2,1.3                           -->  c_dist: 0.086        r_dist: 0.073
( 90.) 5.5,2.6,4.4,1.2                           -->  c_dist: 0.107        r_dist: 0.086
( 92.) 5.8,2.6,4,1.2                             -->  c_dist: 0.095        r_dist: 0.088
( 67.) 5.8,2.7,4.1,1                             -->  c_dist: 0.114        r_dist: 0.09
( 89.) 5.5,2.5,4,1.3                             -->  c_dist: 0.094        r_dist: 0.094
( 53.) 5.5,2.3,4,1.3                             -->  c_dist: 0.141        r_dist: 0.094
( 69.) 5.6,2.5,3.9,1.1                           -->  c_dist: 0.089        r_dist: 0.094
( 80.) 5.5,2.4,3.8,1.1                           -->  c_dist: 0.099        r_dist: 0.089
( 81.) 5.5,2.4,3.7,1                             -->  c_dist: 0.141        r_dist: 0.089
( 79.) 5.7,2.6,3.5,1                             -->  c_dist: 0.119        r_dist: 0.094
( 64.) 5.6,2.9,3.6,1.3                           -->  c_dist: 0.12         r_dist: 0.094
( 84.) 5.4,3,4.5,1.5                             -->  c_dist: 0.144        r_dist: 0.11
( 56.) 6.3,3.3,4.7,1.6                           -->  c_dist: 0.146        r_dist: 0.11
( 59.) 5.2,2.7,3.9,1.4                           -->  c_dist: 0.154        r_dist: 0.126
( 72.) 6.3,2.5,4.9,1.5                           -->  c_dist: 0.145        r_dist: 0.13
( 85.) 6,3.4,4.5,1.6                             -->  c_dist: 0.181        r_dist: 0.131
(142.) 5.8,2.7,5.1,1.9                           -->  c_dist: 0.135        r_dist: 0.135
(101.) 5.8,2.7,5.1,1.9                           -->  c_dist: 0.135        r_dist: 0.135
(113.) 5.7,2.5,5,2                               -->  c_dist: 0.172        r_dist: 0.135
(121.) 5.6,2.8,4.9,2                             -->  c_dist: 0.148        r_dist: 0.135
( 68.) 6.2,2.2,4.5,1.5                           -->  c_dist: UNDEFINED    r_dist: 0.145
( 87.) 6.3,2.3,4.4,1.3                           -->  c_dist: 0.17         r_dist: 0.145
(130.) 7.4,2.8,6.1,1.9                           -->  c_dist: 0.155        r_dist: 0.148
(108.) 6.7,2.5,5.8,1.8                           -->  c_dist: UNDEFINED    r_dist: 0.151
(119.) 6,2.2,5,1.5                               -->  c_dist: UNDEFINED    r_dist: 0.151
(125.) 7.2,3.2,6,1.8                             -->  c_dist: 0.181        r_dist: 0.154
(105.) 7.6,3,6.6,2.1                             -->  c_dist: 0.164        r_dist: 0.155
(107.) 7.3,2.9,6.3,1.8                           -->  c_dist: 0.158        r_dist: 0.155
(122.) 7.7,2.8,6.7,2                             -->  c_dist: 0.16         r_dist: 0.155
(129.) 7.2,3,5.8,1.6                             -->  c_dist: 0.184        r_dist: 0.158
( 93.) 5,2.3,3.3,1                               -->  c_dist: 0.16         r_dist: 0.16
( 57.) 4.9,2.4,3.3,1                             -->  c_dist: 0.18         r_dist: 0.16
( 60.) 5,2,3.5,1                                 -->  c_dist: UNDEFINED    r_dist: 0.16
( 98.) 5.1,2.5,3,1.1                             -->  c_dist: 0.18         r_dist: 0.16
(118.) 7.7,2.6,6.9,2.3                           -->  c_dist: UNDEFINED    r_dist: 0.16
(135.) 7.7,3,6.1,2.3                             -->  c_dist: UNDEFINED    r_dist: 0.164
( 62.) 6,2.2,4,1                                 -->  c_dist: 0.173        r_dist: 0.17
(114.) 5.8,2.8,5.1,2.4                           -->  c_dist: UNDEFINED    r_dist: 0.179
(109.) 7.2,3.6,6.1,2.5                           -->  c_dist: UNDEFINED    r_dist: 0.199
(106.) 4.9,2.5,4.5,1.7                           -->  c_dist: UNDEFINED    r_dist: 0.2
(117.) 7.7,3.8,6.7,2.2                           -->  c_dist: UNDEFINED    r_dist: UNDEFINED   
(131.) 7.9,3.8,6.4,2                             -->  c_dist: UNDEFINED    r_dist: UNDEFINED   
( 41.) 4.5,2.3,1.3,0.3                           -->  c_dist: UNDEFINED    r_dist: UNDEFINED   



Time taken to build model (full training data) : 0.17 seconds

=== Model and evaluation on training set ===

Clustered Instances


Unclustered instances : 150

OPTICS can convey more hierarchical clustering information than DBSCAN.

Reposted from: https://www.cnblogs.com/mrzhang123/p/5365813.html
