Intro to DBSCAN

DBSCAN

Density-Based Spatial Clustering of Applications with Noise

It can discover clusters of arbitrary shape.

A cluster is defined as a maximal set of density-connected points.

Two parameters

Eps: Maximum radius of the neighbourhood.

MinPts: Minimum number of points in the Eps-Neighbourhood of a point.

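Suppose we have a point q and fixed values of Eps and MinPts. The Eps-neighbourhood of q is the set of points within distance Eps of it:

$N_{Eps}(q) = \{p \in D \mid dist(p, q) \le Eps\}$

where D is the dataset. If this neighbourhood contains at least MinPts points, i.e. $|N_{Eps}(q)| \ge MinPts$, we say q is a core point.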
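The core-point test can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes 2-D points stored as tuples, Euclidean distance via `math.dist` (Python 3.8+), the usual convention that q counts as its own neighbour, and "dense" meaning at least MinPts points. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def eps_neighbourhood(points: Sequence[Point], q_idx: int, eps: float) -> List[int]:
    """Return the indices of all points within distance eps of points[q_idx].

    The point q itself is included, since its distance to itself is 0.
    """
    q = points[q_idx]
    return [i for i, p in enumerate(points) if math.dist(p, q) <= eps]


def is_core(points: Sequence[Point], q_idx: int, eps: float, min_pts: int) -> bool:
    """q is a core point if its Eps-neighbourhood holds at least min_pts points."""
    return len(eps_neighbourhood(points, q_idx, eps)) >= min_pts


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
print(eps_neighbourhood(points, 0, eps=1.0))   # [0, 1, 2]
print(is_core(points, 0, eps=1.0, min_pts=3))  # True
print(is_core(points, 3, eps=1.0, min_pts=3))  # False
```

The brute-force scan costs O(n) per query; real implementations usually use a spatial index instead, but the definition is the same.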

Three types of points

Core point: its Eps-neighbourhood is dense, i.e. it contains at least MinPts points.

Border point: its neighbourhood is not dense (fewer than MinPts points), but it lies in the Eps-neighbourhood of a core point (it is directly density-reachable from a core point), so it belongs to that core point's cluster.

Noise/Outlier: neither a core point nor a border point; it belongs to no cluster and cannot be reached from any cluster.
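The three point types can be assigned directly from these definitions. The sketch below is a hypothetical helper, assuming 2-D tuple points, Euclidean distance, and that each point counts as its own neighbour; a point is labelled border when it is not core but some core point lies within Eps of it.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def classify_points(points: Sequence[Point], eps: float, min_pts: int) -> List[str]:
    """Label every point as 'core', 'border' or 'noise'."""
    n = len(points)
    # Eps-neighbourhood of each point (a point counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            # Not dense itself, but within Eps of some core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(classify_points(line, eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'border', 'noise']
```

Because distance is symmetric, checking whether a core point is in p's neighbourhood is the same as checking whether p is in a core point's neighbourhood.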

Directly density-reachable: A point p is directly density-reachable from a point q if:

p belongs to the Eps-neighbourhood of q: $p \in N_{Eps}(q)$

q itself is a core point: $|N_{Eps}(q)| \ge MinPts$
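The two conditions translate into a single check. This is an illustrative sketch (hypothetical function name, 2-D tuple points, Euclidean distance assumed); the usage lines show that the relation is not symmetric.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directly_density_reachable(points: Sequence[Point], p_idx: int, q_idx: int,
                               eps: float, min_pts: int) -> bool:
    """True if points[p_idx] is directly density-reachable from points[q_idx]."""
    q = points[q_idx]
    neighbourhood = [i for i, x in enumerate(points) if math.dist(x, q) <= eps]
    in_neighbourhood = p_idx in neighbourhood    # condition 1: p in N_Eps(q)
    q_is_core = len(neighbourhood) >= min_pts    # condition 2: q is a core point
    return in_neighbourhood and q_is_core


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(directly_density_reachable(line, 0, 1, eps=1.0, min_pts=3))  # True
print(directly_density_reachable(line, 1, 0, eps=1.0, min_pts=3))  # False
```

Point 0 is within Eps of the core point 1, so it is directly density-reachable from it; the reverse fails because point 0 has only two neighbours and is therefore not a core point.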

Density-reachable

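A point p is density-reachable from a point q if there is a chain of points $p_1, \dots, p_n$ such that $p_1 = q$, $p_n = p$, and $p_{i+1}$ is directly density-reachable from $p_i$ for every i. This relation is not symmetric: a border point can be reached from a core point, but nothing can be reached from a border point.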
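Searching for such a chain is a breadth-first search that starts at q and only expands through core points, because every link except the last must start at a core point. A minimal sketch, assuming 2-D tuple points and Euclidean distance; the function name is hypothetical.

```python
import math
from collections import deque
from typing import Sequence, Tuple

Point = Tuple[float, float]


def density_reachable(points: Sequence[Point], p_idx: int, q_idx: int,
                      eps: float, min_pts: int) -> bool:
    """True if a chain q = p1, ..., pn = p of direct density-reachability exists."""
    if p_idx == q_idx:
        return True  # the trivial chain of length one
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    visited = {q_idx}
    queue = deque([q_idx])
    while queue:
        x = queue.popleft()
        if len(neighbours[x]) < min_pts:
            continue  # a non-core point cannot extend the chain
        for y in neighbours[x]:
            if y == p_idx:
                return True
            if y not in visited:
                visited.add(y)
                queue.append(y)
    return False


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
print(density_reachable(line, 4, 1, eps=1.0, min_pts=3))  # True:  1 -> 2 -> 3 -> 4
print(density_reachable(line, 1, 4, eps=1.0, min_pts=3))  # False: 4 is not a core point
print(density_reachable(line, 5, 1, eps=1.0, min_pts=3))  # False
```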

Density-connected

A point p is density-connected to a point q if there is a point o such that both p and q are density-reachable from o. This is what lets two border points, neither of which is density-reachable from the other, end up in the same cluster: it is enough that some point o reaches both of them.
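Density-connectivity can be checked by computing, for each candidate o, the set of points density-reachable from it. The sketch below is deliberately brute force (one search per point) and assumes 2-D tuple points and Euclidean distance; `reachable_from` and `density_connected` are hypothetical names.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def reachable_from(neighbours: List[List[int]], o: int, min_pts: int) -> Set[int]:
    """All indices density-reachable from o (o itself included)."""
    seen = {o}
    queue = deque([o])
    while queue:
        x = queue.popleft()
        if len(neighbours[x]) < min_pts:
            continue  # only core points pass reachability on
        for y in neighbours[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def density_connected(points: Sequence[Point], p_idx: int, q_idx: int,
                      eps: float, min_pts: int) -> bool:
    """True if some point o has both p and q density-reachable from it."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    for o in range(n):
        reach = reachable_from(neighbours, o, min_pts)
        if p_idx in reach and q_idx in reach:
            return True
    return False


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
# Points 0 and 4 are both border points, yet core point 1 reaches both.
print(density_connected(line, 0, 4, eps=1.0, min_pts=3))  # True
print(density_connected(line, 0, 5, eps=1.0, min_pts=3))  # False
```

Here neither border point is density-reachable from the other, but they are density-connected through o = 1, so they belong to the same cluster.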

Algorithm

Arbitrarily select a point p.

Retrieve all points density-reachable from p with respect to Eps and MinPts.

If p is a core point, a cluster is formed; it contains every point density-reachable from p, including the cluster's border points.

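If p is not a core point, no points are density-reachable from p, so DBSCAN skips to the next point. Such a point is provisionally labelled noise (an outlier); if it is later found in the neighbourhood of a core point, it becomes a border point of that cluster.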


Continue the process until all the points have been processed.
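The steps above can be sketched as a short Python function. This is an illustrative, brute-force version (O(n²) distance computations), not any particular library's implementation; it assumes 2-D tuple points, Euclidean distance, that each point counts as its own neighbour, and it numbers clusters from 0 with -1 for noise. The names `dbscan`, `region_query` and `NOISE` are hypothetical.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    """Return a cluster id (0, 1, ...) per point, or NOISE (-1)."""
    n = len(points)

    def region_query(i: int) -> List[int]:
        # Eps-neighbourhood of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels: List[Optional[int]] = [None] * n  # None = not processed yet
    cluster_id = -1
    for p in range(n):                  # visit the points one by one
        if labels[p] is not None:
            continue
        neighbours = region_query(p)
        if len(neighbours) < min_pts:   # not a core point: skip for now
            labels[p] = NOISE           # may later become a border point
            continue
        cluster_id += 1                 # p is a core point: start a new cluster
        labels[p] = cluster_id
        queue = deque(neighbours)
        while queue:                    # collect everything density-reachable from p
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # former "noise" is a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbours = region_query(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is core: keep expanding
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))   # [0, 0, 0, 0, 1, 1, 1, 1, -1]

line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(dbscan(line, eps=1.0, min_pts=3))  # [0, 0, 0, 0, -1]
```

In the second example point 0 is visited first and marked noise, then relabelled as a border point of cluster 0 once the core point 1 expands its cluster.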

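However, DBSCAN is sensitive to the choice of Eps and MinPts: too small an Eps leaves most points as noise, while too large an Eps merges separate clusters into one.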
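The effect of Eps can be seen by counting clusters and noise points on one small dataset while varying only Eps. The sketch below is an assumption-laden illustration: it counts clusters as connected groups of core points lying within Eps of each other, which should match the clusters DBSCAN forms (border points join a cluster but never link two clusters). The helper name, the data and the Eps values are made up for the example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cluster_and_noise_counts(points: Sequence[Point], eps: float,
                             min_pts: int) -> Tuple[int, int]:
    """Return (number of clusters, number of noise points) for given Eps/MinPts."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}

    # Clusters = connected groups of core points lying within Eps of each other.
    seen = set()
    clusters = 0
    for c in core:
        if c in seen:
            continue
        clusters += 1
        seen.add(c)
        stack = [c]
        while stack:
            x = stack.pop()
            for y in neighbours[x]:
                if y in core and y not in seen:
                    seen.add(y)
                    stack.append(y)

    # Noise = non-core points with no core point in their neighbourhood.
    noise = sum(
        1 for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    )
    return clusters, noise


pts = [(0, 0), (0, 1), (1, 0), (1, 1),   # blob A
       (3, 0), (3, 1), (4, 0), (4, 1),   # blob B, 2 units to the right of A
       (20, 20)]                         # isolated point
for eps in (0.5, 1.5, 2.5):
    print(eps, cluster_and_noise_counts(pts, eps, min_pts=3))
# 0.5 (0, 9)   eps too small: everything is noise
# 1.5 (2, 1)   two clusters plus one outlier
# 2.5 (1, 1)   eps too large: the two blobs merge
```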
