The Java version of k-means I wrote earlier turned out rather ugly, so here is a simple Python version to go with it.
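# kmeans
import math

def distance(a, b):
    # Euclidean distance between two points of the same dimension
    return math.sqrt(sum([math.pow(a[i] - b[i], 2) for i in range(len(a))]))

def doKmeansCluster(data, cnum, itnum):
    # use the first cnum points as the initial centers
    c = data[:cnum]
    groups = []
    for time in range(itnum):
        groups = [[] for i in range(len(c))]
        for d in data:
            # assign each point to its nearest center
            best = distance(d, c[0])
            index = 0
            for i in range(1, len(c)):
                dis = distance(d, c[i])
                if dis < best:
                    best, index = dis, i
            groups[index].append(d)
        newc = []
        for k, g in enumerate(groups):
            print(g)
            if not g:
                # an empty group has no mean, so keep its previous center
                newc.append(c[k])
                continue
            # transpose the matrix so that all values of the same dimension sit in one list
            trans = [[r[col] for r in g] for col in range(len(g[0]))]
            # get the new center by summing each dimension and dividing by the group size
            avg = [float(sum(col)) / len(col) for col in trans]
            newc.append(avg)
        c = newc
    # return the final centers and the groups from the last iteration
    return c, groups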
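I only tested it briefly and did not think much about edge cases or input constraints, so feel free to rewrite it yourself if you are so inclined.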
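For anyone who wants to repeat that kind of quick check, here is a minimal sketch. It is not the exact test I ran: the sample points are made up, and `kmeans` is a hypothetical compact restatement of the same idea (first `cnum` points as seeds, fixed number of iterations), written so the snippet runs on its own.

```python
import math

def distance(a, b):
    # Euclidean distance between two equal-length points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(data, cnum, itnum):
    # hypothetical compact variant: the first cnum points seed the centers
    centers = [list(p) for p in data[:cnum]]
    groups = []
    for _ in range(itnum):
        groups = [[] for _ in centers]
        for p in data:
            # index of the nearest center (ties go to the lowest index)
            nearest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            groups[nearest].append(p)
        # new center = per-dimension mean; an empty group keeps its old center
        centers = [
            [sum(col) / float(len(g)) for col in zip(*g)] if g else centers[k]
            for k, g in enumerate(groups)
        ]
    return centers, groups

# made-up sample data: two well-separated blobs
points = [[1.0, 1.0], [9.0, 9.0], [1.0, 2.0],
          [2.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centers, groups = kmeans(points, 2, 5)
print(centers)
print(groups)
```

With this sample the three points near (1, 1) should land in the first group and the three near (9, 9) in the second. Seeding from the first `cnum` points keeps the run deterministic, but it also means the result depends on the input order, which is one of the constraints worth handling in a more careful rewrite.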