Machine Learning in Action by Matlab (4): The Bisecting K-means Algorithm

http://blog.csdn.net/llp1992/article/details/45096063
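When we implemented the K-means algorithm earlier, we mentioned its inherent drawbacks:

1. It may converge to a local minimum
2. It converges slowly on large data sets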

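At the end of the previous post, the suggested remedy for getting stuck in a local minimum was to run K-means several times and keep the clustering with the smallest distortion function J. That is hardly satisfying: what we really want is a method that gives a near-optimal clustering in a single run.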
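The restart strategy described above can be sketched as follows. This is a minimal illustration and not the code from the previous post: the toy data, `numRuns`, the helper `sqDist`, and the choice of starting from randomly picked data points are all assumptions made for the example.

```matlab
% Toy data: two well-separated groups of points (made up for illustration).
X = [0 0; 0 1; 1 0; 1 1; 8 8; 8 9; 9 8; 9 9];
K = 2;
numRuns = 5;                                   % hypothetical number of restarts

% Hypothetical helper: n-by-K matrix of squared Euclidean distances
% between the rows of X and the rows of C.
sqDist = @(X, C) sum(bsxfun(@minus, permute(X, [1 3 2]), permute(C, [3 1 2])).^2, 3);

allJ  = zeros(1, numRuns);
bestJ = inf;
for r = 1:numRuns
    % Assumption: start from K distinct data points chosen at random.
    p = randperm(size(X, 1));
    C = X(p(1:K), :);
    for iter = 1:50                            % fixed iteration budget, for simplicity
        [~, idx] = min(sqDist(X, C), [], 2);   % assignment step
        for k = 1:K                            % update step
            if any(idx == k)                   % keep the old centroid if a cluster is empty
                C(k, :) = mean(X(idx == k, :), 1);
            end
        end
    end
    allJ(r) = sum(min(sqDist(X, C), [], 2));   % distortion J of this run
    if allJ(r) < bestJ                         % keep the run with the smallest J
        bestJ = allJ(r);
        bestC = C;
    end
end
fprintf('best J over %d runs: %g\n', numRuns, bestJ);
```

The cost of this approach is visible in the loop: every restart repeats the full K-means run, and there is still no guarantee that any of the runs finds the best partition.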

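The root cause of this weakness is that K-means is sensitive to the initial choice of the K centroids. A poor choice can easily leave the algorithm stuck in a local minimum.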
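To see how much the starting centroids matter, here is a small deterministic sketch. The four points and the two initializations are made up for illustration, and `sqDist` is a hypothetical helper, not something from the original post.

```matlab
% Four points on the corners of a wide rectangle (toy data).
X = [0 0; 0 1; 4 0; 4 1];

% Two hypothetical initializations for K = 2.
inits = {[0 0.5; 4 0.5], ...   % one centroid per short side
         [2 0;   2 1]};        % one centroid per long side

% Hypothetical helper: n-by-K matrix of squared Euclidean distances.
sqDist = @(X, C) sum(bsxfun(@minus, permute(X, [1 3 2]), permute(C, [3 1 2])).^2, 3);

J = zeros(1, numel(inits));
for t = 1:numel(inits)
    C = inits{t};
    for iter = 1:20                            % far more iterations than 4 points need
        [~, idx] = min(sqDist(X, C), [], 2);   % assign each point to its nearest centroid
        for k = 1:size(C, 1)
            if any(idx == k)
                C(k, :) = mean(X(idx == k, :), 1);   % move the centroid to the cluster mean
            end
        end
    end
    J(t) = sum(min(sqDist(X, C), [], 2));      % distortion reached from this start
    fprintf('start %d: J = %g\n', t, J(t));
end
```

With these inputs the first start ends at J = 1, while the second stays at J = 16: its centroids (2,0) and (2,1) are already the means of the points assigned to them, so the iteration never leaves that poorer partition.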

For this reason, the bisecting K-means algorithm was proposed: it weakens the influence of the initial centroids on the final clustering.

The bisecting K-means algorithm

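Before introducing bisecting K-means, we need one definition: SSE (Sum of Squared Error), a measure of clustering quality. SSE is in fact the distortion function we used in the K-means algorithm:

$$J = \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^{2}$$

Here $x^{(i)}$ is the i-th of the m data points, $c^{(i)}$ is the index of the cluster it is assigned to, and $\mu_{c^{(i)}}$ is the centroid of that cluster.

For a single cluster, SSE is the sum of the squared distances from each point in the cluster to its centroid, so it measures how good the clustering is. For a fixed number of clusters, the smaller the SSE, the better the clustering.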
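As a small worked example of the definition, the sketch below computes the SSE of a hand-made clustering. The data, the labels, and the variable names are assumptions chosen only to keep the arithmetic easy to check.

```matlab
% Toy data: four 2-D points in two clusters (values made up for illustration).
X      = [1 1; 1 3; 5 5; 7 5];
labels = [1; 1; 2; 2];        % cluster index of each point
K      = 2;

sse = 0;
for k = 1:K
    members  = X(labels == k, :);                   % points of cluster k
    centroid = mean(members, 1);                    % centroid of cluster k
    diffs    = bsxfun(@minus, members, centroid);   % offsets from the centroid
    sse      = sse + sum(diffs(:).^2);              % add this cluster's squared error
end
fprintf('SSE = %g\n', sse);
```

Cluster 1 has centroid (1,2) and contributes 1 + 1 = 2; cluster 2 has centroid (6,5) and contributes another 2, so the total SSE is 4.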

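The main idea of bisecting K-means:
Start with all points in a single cluster and split that cluster in two. Then repeatedly pick the cluster whose split reduces the clustering cost function (the sum of squared errors) the most, and split it in two. Continue until the number of clusters equals the user-specified k.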

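The pseudocode for bisecting k-means is as follows:

Treat all data points as a single cluster

    While the number of clusters is less than k

      For each cluster

          Compute the total error

          Run k-means clustering (k=2) on the given cluster

          Compute the total error after splitting that cluster in two

      Split the cluster whose split gives the smallest total error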
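The selection step in the last line of the pseudocode can be illustrated with a few numbers. The per-cluster error values below are hypothetical; in a real run they would come from the current assignment and from a trial 2-means split of each cluster.

```matlab
% Hypothetical SSE of each of the 3 current clusters, before any new split.
sseNow   = [120 40 75];
% Hypothetical SSE of each cluster after running 2-means on it alone.
sseSplit = [ 50 30 20];

total = zeros(1, numel(sseNow));
for j = 1:numel(sseNow)
    others   = sum(sseNow) - sseNow(j);   % error of the clusters left untouched
    total(j) = sseSplit(j) + others;      % total error if cluster j is the one split
end
[minTotal, bestClusterToSplit] = min(total);
fprintf('split cluster %d, total error %g\n', bestClusterToSplit, minTotal);
```

Here the candidate totals are 165, 225 and 180, so cluster 1 is split. Picking the smallest total is the same as picking the largest reduction `sseNow(j) - sseSplit(j)`, which is 70 for cluster 1.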

Matlab implementation

```matlab
function bikMeans
%%
clc
clear
close all
%%
biK = 4;
biDataSet = load('testSet.txt');
[row,col] = size(biDataSet);
% Matrix that stores the centroids
biCentSet = zeros(biK,col);
% Start with a single cluster
numCluster = 1;
% Column 1: index of the centroid each point is assigned to;
% column 2: squared distance from the point to that centroid
biClusterAssume = zeros(row,2);
% Initialize the first centroid as the mean of all points
% (no semicolon, so the matrix is echoed to the console)
biCentSet(1,:) = mean(biDataSet,1)
for i = 1:row
    biClusterAssume(i,1) = numCluster;
    biClusterAssume(i,2) = distEclud(biDataSet(i,:),biCentSet(1,:));
end
while numCluster < biK
    % Use inf rather than a fixed number so that any real SSE is smaller
    minSSE = inf;
    % Find the cluster whose split gives the smallest total SSE
    for j = 1:numCluster
        curCluster = biDataSet(biClusterAssume(:,1) == j,:);
        [spiltCentSet,spiltClusterAssume] = kMeans(curCluster,2);
        % SSE of cluster j after the split
        spiltSSE = sum(spiltClusterAssume(:,2));
        % SSE of all the clusters that are left untouched
        noSpiltSSE = sum(biClusterAssume(biClusterAssume(:,1) ~= j,2));
        curSSE = spiltSSE + noSpiltSSE;
        fprintf('Error after splitting cluster %d: %f \n', j, curSSE)
        if (curSSE < minSSE)
            minSSE = curSSE;
            bestClusterToSpilt = j;
            bestClusterAssume = spiltClusterAssume;
            bestCentSet = spiltCentSet;
        end
    end
    bestClusterToSpilt
    bestCentSet
    % Update the number of clusters
    numCluster = numCluster + 1;
    % Relabel the two halves. Both masks are built first so that the second
    % assignment cannot overwrite labels written by the first one
    % (this matters when bestClusterToSpilt is 2).
    firstHalf = bestClusterAssume(:,1) == 1;
    secondHalf = bestClusterAssume(:,1) == 2;
    bestClusterAssume(firstHalf,1) = bestClusterToSpilt;
    bestClusterAssume(secondHalf,1) = numCluster;
    % Update the centroid of the split cluster and add the new centroid
    biCentSet(bestClusterToSpilt,:) = bestCentSet(1,:);
    biCentSet(numCluster,:) = bestCentSet(2,:);
    biCentSet
    % Update the assignment and error of every point in the split cluster
    biClusterAssume(biClusterAssume(:,1) == bestClusterToSpilt,:) = bestClusterAssume;
end

figure
%scatter(dataSet(:,1),dataSet(:,2),5)
for i = 1:biK
    pointCluster = find(biClusterAssume(:,1) == i);
    scatter(biDataSet(pointCluster,1),biDataSet(pointCluster,2),5)
    hold on
end
%hold on
scatter(biCentSet(:,1),biCentSet(:,2),300,'+')
hold off

end

% Squared Euclidean distance (no square root, so it can be summed into the SSE)
function dist = distEclud(vecA,vecB)
    dist  = sum(power((vecA-vecB),2));
end

% K-means algorithm
function [centSet,clusterAssment] = kMeans(dataSet,K)

[row,col] = size(dataSet);
% Matrix that stores the centroids
centSet = zeros(K,col);
% Randomly initialize the centroids inside the bounding box of the data
for i= 1:col
    minV = min(dataSet(:,i));
    rangV = max(dataSet(:,i)) - minV;
    centSet(:,i) = repmat(minV,[K,1]) + rangV*rand(K,1);
end

% Stores the cluster each point is assigned to and its squared distance to the centroid
clusterAssment = zeros(row,2);
clusterChange = true;
while clusterChange
    clusterChange = false;
    % Compute the cluster each point should be assigned to
    for i = 1:row
        % This part could probably be optimized
        minDist = inf;
        minIndex = 0;
        for j = 1:K
            distCal = distEclud(dataSet(i,:) , centSet(j,:));
            if (distCal < minDist)
                minDist = distCal;
                minIndex = j;
            end
        end
        if minIndex ~= clusterAssment(i,1)
            clusterChange = true;
        end
        clusterAssment(i,1) = minIndex;
        clusterAssment(i,2) = minDist;
    end

    % Update the centroid of each cluster
    for j = 1:K
        simpleCluster = find(clusterAssment(:,1) == j);
        if isempty(simpleCluster)
            % Empty cluster: re-seed at a random data point to avoid a NaN centroid
            centSet(j,:) = dataSet(randi(row),:);
        else
            % mean(...,1) keeps a row vector even when the cluster has one point
            centSet(j,:) = mean(dataSet(simpleCluster,:),1);
        end
    end
end
end
```
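The comment in `kMeans` notes that the point-by-point assignment loop could probably be optimized. One possible direction, sketched here under the assumption that a vectorized distance computation is what is wanted, is to compute one column of squared distances per centroid and let `min` pick the nearest one. The toy `dataSet` and `centSet` values are made up for illustration.

```matlab
% Hypothetical toy inputs, only for illustration.
dataSet = [0 0; 1 1; 9 9; 10 10; 0 1];
centSet = [0.5 0.5; 9.5 9.5];
K = size(centSet, 1);

% One column of squared distances per centroid (row-by-K matrix).
distMat = zeros(size(dataSet, 1), K);
for j = 1:K
    diffs = bsxfun(@minus, dataSet, centSet(j, :));   % offsets from centroid j
    distMat(:, j) = sum(diffs.^2, 2);                 % squared Euclidean distance
end

% Nearest centroid and its squared distance for every point at once.
[minDist, minIndex] = min(distMat, [], 2);
clusterAssment = [minIndex, minDist];
disp(clusterAssment)
```

The result has the same two-column layout as `clusterAssment` in the listing, so it could replace the inner double loop; the convergence check would then compare `minIndex` with the previous first column.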

The iterations of the algorithm are shown below

biCentSet =

   -0.1036    0.0543
         0         0
         0         0
         0         0

Error after splitting cluster 1: 792.916857

bestClusterToSpilt =

     1

bestCentSet =

   -0.2897   -2.8394
    0.0825    2.9480

biCentSet =

   -0.2897   -2.8394
    0.0825    2.9480
         0         0
         0         0

Error after splitting cluster 1: 409.871545
Error after splitting cluster 2: 532.999616

bestClusterToSpilt =

     1

bestCentSet =

   -3.3824   -2.9473
    2.8029   -2.7315

biCentSet =

   -3.3824   -2.9473
    0.0825    2.9480
    2.8029   -2.7315
         0         0

Error after splitting cluster 1: 395.669052
Error after splitting cluster 2: 149.954305
Error after splitting cluster 3: 393.431098

bestClusterToSpilt =

     2

bestCentSet =

    2.6265    3.1087
   -2.4615    2.7874

biCentSet =

   -3.3824   -2.9473
    2.6265    3.1087
    2.8029   -2.7315
   -2.4615    2.7874

Final result

[Figure 1: final clustering produced by the bisecting K-means algorithm]

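When clustering with bisecting K-means, different initial centroids still give slightly different results, because the method only weakens the influence of the random centroids on the clustering and cannot eliminate it. Reaching the global minimum is therefore not guaranteed, although in practice the final result is usually at or close to it.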
