Mahout: An overview of clustering techniques

Different kinds of clustering problems

  • EXCLUSIVE CLUSTERING In exclusive clustering, an item belongs exclusively to one cluster, not several.
  • OVERLAPPING CLUSTERING Overlapping (non-exclusive) clustering lets an item belong to several clusters at once: Harry Potter can sit not only in fiction but also in young adult and fantasy. An overlapping clustering algorithm such as fuzzy k-means supports this directly, and it also reports the degree to which each object is associated with each cluster.



  • HIERARCHICAL CLUSTERING Suppose we have two clusters of books, one for fantasy and one for space travel. Harry Potter is in the fantasy cluster, but both clusters can be viewed as subclusters of fiction. Hence, we can construct a fiction cluster by merging these and other similar clusters.



  • PROBABILISTIC CLUSTERING A probabilistic model is usually a characteristic shape or a type of distribution of a set of points in an n-dimensional space. Probabilistic clustering algorithms try to fit such a model to the data set.
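
To make the difference between exclusive and overlapping clustering in the list above concrete, here is a minimal sketch in plain Python. It does not use Mahout's API. It assumes the textbook fuzzy c-means membership formula with a fuzziness parameter m, and the genre centers and the book's 2-D feature vector are made-up values for illustration only.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of `point` in each center (textbook fuzzy c-means formula).

    `m` > 1 is the fuzziness parameter; the returned degrees sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs only to those centers.
    if any(d == 0.0 for d in dists):
        on_center = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(on_center)
        return [v / total for v in on_center]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

def exclusive_assignment(point, centers):
    """Index of the single nearest center (exclusive clustering)."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

# Hypothetical genre centers and a hypothetical feature vector for one book.
genres = ["fantasy", "young adult", "space travel"]
centers = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
book = (0.5, 0.0)

degrees = fuzzy_memberships(book, centers)
hard_label = genres[exclusive_assignment(book, centers)]

for genre, degree in zip(genres, degrees):
    print(f"{genre}: {degree:.3f}")
print("exclusive label:", hard_label)
```

The exclusive assignment keeps only the nearest center, while the fuzzy version keeps a weight for every cluster, so the same book can count mostly toward one genre and partly toward another.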



Different clustering approaches

  • FIXED NUMBER OF CENTERS These clustering methods fix the number of clusters ahead of time.
  • BOTTOM-UP APPROACH: FROM POINTS TO CLUSTERS VIA GROUPING Each point starts as its own cluster, and the closest clusters are merged step by step.



  • TOP-DOWN APPROACH: SPLITTING THE GIANT CLUSTER All points start in one giant cluster, which is split repeatedly into smaller clusters.
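
As an illustration of the fixed-number-of-centers approach from the list above, the following is a minimal k-means sketch in plain Python. It is not Mahout's implementation. The number of clusters is fixed up front by the length of `initial_centers`, and the sample points and starting centers are assumed values chosen for the example.

```python
import math

def kmeans(points, initial_centers, max_iterations=100):
    """Plain k-means: the number of clusters is fixed by len(initial_centers)."""
    centers = [tuple(c) for c in initial_centers]
    assignment = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep a center that lost all points
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, assignment

# Hypothetical 2-D points forming two visible groups, with k = 2 starting centers.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, labels = kmeans(points, initial_centers=[(0, 0), (10, 10)])
print(final_centers)
print(labels)
```

However the points are distributed, the result always has exactly as many clusters as starting centers, which is the defining constraint of this family of methods.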


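
The bottom-up approach can be sketched in plain Python as well. This is a simplified illustration, not Mahout code: it assumes single-linkage distance (the distance between the two closest members) as the merge criterion and stops at a caller-chosen number of clusters. The function name and the sample points are hypothetical.

```python
import math

def agglomerate(points, target_clusters):
    """Bottom-up clustering: start with one cluster per point, merge the closest pair."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical 2-D points: two tight pairs and one distant outlier.
sample = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
three = agglomerate(sample, 3)
two = agglomerate(sample, 2)
print(three)
print(two)
```

Recording each merge instead of discarding it would give the cluster hierarchy described under hierarchical clustering. The top-down approach runs in the opposite direction, starting from one cluster that holds every point and splitting it.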
