13. Clustering

Clustering

Unsupervised learing introduction

K-means algorithm

Input:

  • K (number of clusters)
  • Training set

K-means algroithm

Randomly initialize K cluster centroids

Repeat {
for to m
:=index (from 1 to K) of cluster centroid closest to
for to K
:=average (mean) of points assigned to cluster
}

K-means for non-separated clusters

Optimization Objective

K-means optimization objective

= index of cluster (1,2,...,K) to which example is currently assigned.
= cluster centriod ()
= cluster centroid of cluster to which example has been assigned

Distortion:

Random Initalization

  • Randomly pick K trianing examples.
  • Set equal to these K examples.

Random initialization many times:

For i=1 to 100 {
Randomly initialize K-means.
Run K-means. Get
Compute cost function (distortion)

}

Pick clustering that gave lowest cost J.

, if K is very large, Randomly initilization many times may help less.

Choosing the number of clusters

  • by hand (see the data set's figure)
  • Elbow method: compute cost J form 1-..., and draw a figure of it. choose the number at the elbow.
  • Evaluate K-means based on a metric for how well it performs for that later purpose. Choose a K that serve the purpose.

你可能感兴趣的:(13. Clustering)