Mean Shift Clustering Algorithm

Clustering Algorithms - Mean Shift Algorithm

Introduction to Mean-Shift Algorithm

Mean shift is another powerful clustering algorithm used in unsupervised learning. Unlike K-means clustering, it makes no assumptions about the number or shape of the clusters; hence it is a non-parametric algorithm.

The Mean-Shift algorithm assigns data points to clusters iteratively by shifting points toward the region of highest data-point density, which serves as the cluster centroid.
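The shift toward higher density can be written as a weighted mean. The formula below is a common textbook formulation rather than something stated in the original text; the symbols are assumptions introduced here: x is the current position, x_i are the data points, K is a weighting kernel, and h is the bandwidth.

```latex
m(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{\lVert x_i - x \rVert}{h}\right) x_i}
            {\sum_{i=1}^{n} K\!\left(\frac{\lVert x_i - x \rVert}{h}\right)},
\qquad x \leftarrow m(x)
```

With a flat kernel, K is 1 for points inside the window of radius h and 0 outside, so m(x) is simply the average of the neighbors of x.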

The difference between the K-Means algorithm and Mean-Shift is that the latter does not require the number of clusters to be specified in advance, because the algorithm determines the number of clusters from the data.

Working of Mean-Shift Algorithm

We can understand how the Mean-Shift clustering algorithm works with the help of the following steps:

  • Step 1: Start by treating every data point as a cluster of its own.

  • Step 2: Compute the centroid for each point, that is, the mean of the data points in its neighborhood.

  • Step 3: Update each centroid's location to the newly computed mean.

  • Step 4: Repeat the process, so that the centroids move toward regions of higher density.

  • Step 5: Stop once the centroids reach a position from which they cannot move any further.
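The steps above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation: it assumes a flat kernel (every point inside the window counts equally), and the function name `mean_shift`, its parameters `bandwidth`, `max_iter`, and `tol`, and the tiny dataset are all hypothetical choices made for this example.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift over a list of equal-length coordinate tuples."""
    modes = []
    for start in points:              # Step 1: every point starts as its own cluster
        current = tuple(start)
        for _ in range(max_iter):     # Step 4: iterate toward higher density
            # Step 2: centroid = mean of all points inside the window
            window = [p for p in points if math.dist(p, current) <= bandwidth]
            shifted = tuple(sum(coord) / len(window) for coord in zip(*window))
            moved = math.dist(shifted, current)
            current = shifted         # Step 3: update the centroid location
            if moved < tol:           # Step 5: stop when it no longer moves
                break
        modes.append(current)

    # Merge converged positions that ended up closer than one bandwidth apart.
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if math.dist(mode, center) < bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels

# Two small, well-separated groups of 2D points (made-up data).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centers, labels = mean_shift(points, bandwidth=1.0)
print(len(centers), "clusters found")
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that the number of clusters is never passed in; it falls out of how many distinct positions the points converge to for the chosen bandwidth.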

Implementation in Python

This simple example shows how the Mean-Shift algorithm works. We first generate a dataset containing 3 blobs in three-dimensional space and plot its first two coordinates, and then apply the Mean-Shift algorithm to see the result.


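# Jupyter/IPython magic; omit this line when running as a plain script
%matplotlib inline
import numpy as np
from sklearn.cluster import MeanShift
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
# make_blobs is imported from sklearn.datasets; the old
# sklearn.datasets.samples_generator module was removed in newer releases
from sklearn.datasets import make_blobs
# Three blob centers in 3D space
centers = [[3,3,3],[4,5,5],[3,10,10]]
X, _ = make_blobs(n_samples = 700, centers = centers, cluster_std = 0.5)
# Plot the first two coordinates of the generated points
plt.scatter(X[:,0],X[:,1])
plt.show()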

# Fit Mean-Shift; with no bandwidth given, scikit-learn estimates one from the data
ms = MeanShift()
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
print(cluster_centers)
n_clusters_ = len(np.unique(labels))
print("Estimated clusters:", n_clusters_)
# Color each point by its cluster label (first two coordinates only)
colors = 10*['r.','g.','b.','c.','k.','y.','m.']
for i in range(len(X)):
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize = 3)
# Mark the estimated cluster centers in black
plt.scatter(cluster_centers[:,0],cluster_centers[:,1],
    marker=".",color='k', s=20, linewidths = 5, zorder=10)
plt.show()

Output


[[ 2.98462798  9.9733794  10.02629344]
 [ 3.94758484  4.99122771  4.99349433]
 [ 3.00788996  3.03851268  2.99183033]]
Estimated clusters: 3

Advantages and Disadvantages

Advantages

The following are some advantages of the Mean-Shift clustering algorithm:

  • It does not need to make any model assumptions, unlike K-means or Gaussian mixture models.

  • It can also model complex clusters that have a nonconvex shape.

  • It needs only one parameter, the bandwidth, which automatically determines the number of clusters.

  • It has no issue with local minima, unlike K-means.

  • Outliers do not cause problems.
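The effect of the bandwidth on the number of clusters can be checked with scikit-learn. The sketch below uses `MeanShift` and `estimate_bandwidth` from `sklearn.cluster`; the blob centers, `random_state`, `quantile`, the three bandwidth values, and the helper `count_clusters` are illustrative assumptions, not values from the original article.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three well-separated 2D blobs (illustrative data, fixed seed for repeatability).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [20, 0]],
                  cluster_std=0.5, random_state=0)

def count_clusters(bandwidth):
    """Fit Mean-Shift with the given bandwidth and return the number of clusters."""
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    return len(ms.cluster_centers_)

# Data-driven bandwidth estimate; quantile controls how local the estimate is.
estimated = estimate_bandwidth(X, quantile=0.2)

n_small = count_clusters(1.0)        # narrow window: at least one cluster per blob
n_auto = count_clusters(estimated)   # bandwidth estimated from the data
n_large = count_clusters(100.0)      # window covers everything: a single cluster

print("bandwidth 1.0      ->", n_small, "clusters")
print("bandwidth %.2f     ->" % estimated, n_auto, "clusters")
print("bandwidth 100.0    ->", n_large, "clusters")
```

A window wider than the whole dataset collapses everything into one cluster, while a narrow window keeps the blobs apart, so the bandwidth takes over the role that the cluster count plays in K-means.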

Disadvantages

The following are some disadvantages of the Mean-Shift clustering algorithm:

  • The Mean-Shift algorithm does not work well on high-dimensional data, where the number of clusters can change abruptly.

  • We do not have any direct control over the number of clusters, yet some applications need a specific number of clusters.

  • It cannot differentiate between meaningful and meaningless modes.

Translated from: https://www.tutorialspoint.com/machine_learning_with_python/clustering_algorithms_mean_shift_algorithm.htm
