Hierarchical clustering implementation

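Single linkage (nearest neighbor): the distance between two clusters is the distance between their two closest objects, one taken from each cluster;

Complete linkage (furthest neighbor): the distance between two clusters is the distance between their two furthest objects, one taken from each cluster;

Group average linkage: the distance between two clusters is the average distance over all pairs of objects, one taken from each cluster;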
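
The three definitions above can be compared side by side on a tiny example. The following is only a minimal sketch: the class name `LinkageDemo`, its method names, and the choice of Euclidean distance are assumptions made for illustration and do not come from the original post.

```java
// Hypothetical helper class, written only to illustrate the three linkage rules.
class LinkageDemo {
    // Euclidean distance between two points of the same dimension (assumed metric).
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            double diff = p[k] - q[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Single linkage: smallest distance over all cross-cluster pairs.
    static double singleLinkage(double[][] a, double[][] b) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : a)
            for (double[] q : b)
                best = Math.min(best, distance(p, q));
        return best;
    }

    // Complete linkage: largest distance over all cross-cluster pairs.
    static double completeLinkage(double[][] a, double[][] b) {
        double worst = 0.0;
        for (double[] p : a)
            for (double[] q : b)
                worst = Math.max(worst, distance(p, q));
        return worst;
    }

    // Group average linkage: mean distance over all cross-cluster pairs.
    static double averageLinkage(double[][] a, double[][] b) {
        double sum = 0.0;
        for (double[] p : a)
            for (double[] q : b)
                sum += distance(p, q);
        return sum / (a.length * b.length);
    }

    public static void main(String[] args) {
        double[][] a = {{0.0}, {1.0}};  // cluster A: points 0 and 1
        double[][] b = {{3.0}, {6.0}};  // cluster B: points 3 and 6
        System.out.println(singleLinkage(a, b));    // prints 2.0 (pair 1 and 3)
        System.out.println(completeLinkage(a, b));  // prints 6.0 (pair 0 and 6)
        System.out.println(averageLinkage(a, b));   // prints 4.0 ((3 + 6 + 2 + 5) / 4)
    }
}
```

On the same pair of clusters the three rules give 2, 6 and 4, so the choice of linkage can change which clusters are merged first.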

The concrete algorithm is explained below, using single-link as the example.

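Repeat until only one cluster remains:

    find the two closest clusters (i, j);

    set row i to the element-wise minimum of row i and row j;

    set column i to the element-wise minimum of column i and column j;

    set row j and column j to infinity, since cluster j has been merged into cluster i;

    for every i' with dmin[i'] == j, set dmin[i'] to i, then recompute dmin[i];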
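
A single pass of the steps above can be traced on a small distance matrix. This is a sketch under stated assumptions: the class `MergeStepDemo`, the method `merge`, and the 4x4 matrix are hypothetical and chosen only to make the update visible. Here `dmin[r]` holds the column index of the smallest entry in row `r`.

```java
import java.util.Arrays;

// Hypothetical class illustrating one merge step of single-link clustering.
class MergeStepDemo {
    static final double INF = Double.POSITIVE_INFINITY;

    // Merge cluster j into cluster i, updating d and dmin in place.
    // d must be symmetric with INF on the diagonal.
    static void merge(double[][] d, int[] dmin, int i, int j) {
        int n = d.length;
        // Row i and column i become the element-wise minimum of i and j.
        for (int k = 0; k < n; k++)
            if (d[j][k] < d[i][k]) d[i][k] = d[k][i] = d[j][k];
        d[i][i] = INF;
        // Retire cluster j by filling its row and column with infinity.
        for (int k = 0; k < n; k++)
            d[j][k] = d[k][j] = INF;
        // Redirect pointers from j to i and recompute the nearest neighbor of i.
        for (int k = 0; k < n; k++) {
            if (dmin[k] == j) dmin[k] = i;
            if (d[i][k] < d[i][dmin[i]]) dmin[i] = k;
        }
    }

    public static void main(String[] args) {
        double[][] d = {
            {INF, 1,   4,   9},
            {1,   INF, 2,   7},
            {4,   2,   INF, 3},
            {9,   7,   3,   INF}
        };
        int[] dmin = {1, 0, 1, 2};  // nearest neighbor of each row
        merge(d, dmin, 0, 1);       // clusters 0 and 1 are the closest pair
        System.out.println(Arrays.toString(d[0]));  // prints [Infinity, Infinity, 2.0, 7.0]
        System.out.println(Arrays.toString(dmin));  // prints [2, 0, 0, 2]
    }
}
```

After the merge, row 0 represents the combined cluster {0, 1}. The entry `dmin[1]` is stale, but row 1 contains only infinity, so it is never selected as part of a closest pair again.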

The complete program:

package snippet;

import java.util.Scanner;

public class Snippet {
	// Euclidean distance between two vectors of the same dimension
	static double distance(double[] a, double[] b) {
		double sum = 0.0;
		for (int k = 0; k < a.length; k++)
			sum += (a[k] - b[k]) * (a[k] - b[k]);
		return Math.sqrt(sum);
	}

	public static void main(String[] args) {
		Scanner in = new Scanner(System.in);
		int M = in.nextInt();  // dimension of each vector
		int N = in.nextInt();  // number of vectors
		// read in N named vectors of dimension M
		double[][] vectors = new double[N][M];
		String[] names = new String[N];
		for (int i = 0; i < N; i++) {
			names[i] = in.next();
			for (int j = 0; j < M; j++)
				vectors[i][j] = in.nextDouble();
		}
		double INFINITY = Double.POSITIVE_INFINITY;
		// d[i][j] = distance between clusters i and j (INFINITY on the diagonal)
		double[][] d = new double[N][N];
		// dmin[i] = index of the cluster closest to cluster i
		int[] dmin = new int[N];
		for (int i = 0; i < N; i++) {
			for (int j = 0; j < N; j++) {
				if (i == j) d[i][j] = INFINITY;
				else d[i][j] = distance(vectors[i], vectors[j]);
				if (d[i][j] < d[i][dmin[i]]) dmin[i] = j;
			}
		}

		// N-1 merges reduce N clusters to one
		for (int s = 0; s < N-1; s++) {
			// find closest pair of clusters (i1, i2)
			int i1 = 0;
			for (int i = 0; i < N; i++)
				if (d[i][dmin[i]] < d[i1][dmin[i1]]) i1 = i;
			int i2 = dmin[i1];
			// report the merge; i1 stays as the representative of the merged cluster
			System.out.println("merge " + names[i2] + " into " + names[i1]
					+ " at distance " + d[i1][i2]);
			// overwrite row i1 with minimum of entries in row i1 and i2
			for (int j = 0; j < N; j++)
				if (d[i2][j] < d[i1][j]) d[i1][j] = d[j][i1] = d[i2][j];
			d[i1][i1] = INFINITY;
			// infinity-out old row i2 and column i2
			for (int i = 0; i < N; i++)
				d[i2][i] = d[i][i2] = INFINITY;
			// update dmin: entries that previously pointed to i2 now
			// point to i1, and the nearest neighbor of i1 is recomputed
			for (int j = 0; j < N; j++) {
				if (dmin[j] == i2) dmin[j] = i1;
				if (d[i1][j] < d[i1][dmin[i1]]) dmin[i1] = j;
			}
		}
	}
}

Reference: www.cs.princeton.edu/courses/archive/spring10/cos233/lectures/cos233-234-lecture7.pdf
