K-means&PCA on handwritten digits

Key techniques

  • PCA dimensionality reduction
  • K-means clustering
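As a rough illustration of the first technique, the sketch below projects the scaled digits onto their top two principal components by hand, using an SVD, and compares the result with scikit-learn's `PCA`. The helper name `pca_project` is hypothetical and not part of any library. The option `svd_solver='full'` is an assumption made here so that both computations use an exact SVD.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale


def pca_project(X, n_components=2):
    """Project X onto its top principal components (illustrative helper)."""
    # PCA works on centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


X = scale(load_digits().data)
manual = pca_project(X)
reference = PCA(n_components=2, svd_solver='full').fit_transform(X)

# Principal components are only defined up to sign, so compare absolute values
print(manual.shape)
print(np.allclose(np.abs(manual), np.abs(reference), atol=1e-6))
```

The comparison uses absolute values because each component may come out with its sign flipped, which does not change the geometry of the projection.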
import numpy as np
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

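# Fix the global NumPy seed so the K-means initialisation is reproducible
np.random.seed(7)

# Load the 8x8 digit images (64 features each) and standardise every feature
digits = load_digits()
data = scale(digits.data)
n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target  # true digit classes, used only to colour the plot

print("n_digits: %d, \t n_samples %d, \t n_features %d" % (n_digits, n_samples, n_features))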

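# Project the 64-dimensional data onto its first two principal components
reduce_data = PCA(n_components=2).fit_transform(data)
# Cluster the 2-D projection into one cluster per digit class
kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10)
kmeans.fit(reduce_data)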

plt.figure()
colors = ['b', 'c', 'g', 'k', 'm', 'r', 'navy', 'y', 'darkorange', 'turquoise']
target_names = range(10)
centroids = kmeans.cluster_centers_
# Draw each sample in the colour of its true digit label
for color, digit in zip(colors, target_names):
    plt.scatter(reduce_data[labels == digit, 0], reduce_data[labels == digit, 1], s=2, color=color, lw=2, label=digit)
plt.legend(loc='best', shadow=False, scatterpoints=1)
# Overlay the K-means centroids; cluster indices are arbitrary, so a centroid's colour need not match the digit it covers
plt.scatter(centroids[:, 0], centroids[:, 1], marker='v', s=100, linewidths=3, color=colors, zorder=10)
plt.show()

With the sklearn library this turns out to be very simple.
The result is shown below:


Figure: clustering_on_hw_digits.png

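In the plot, points are coloured by their true digit label and the triangles mark the K-means centroids. The digits are not cleanly separated in two dimensions, but samples of the same digit still group together. Other clustering methods may give better results; they will be covered later.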
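To put rough numbers on "not cleanly separated", one option is to check how much variance the two principal components retain and how well the K-means clusters agree with the true digit labels. The sketch below is only an illustration: the variable names are hypothetical, `random_state=7` is an assumption standing in for the global seed used above, and the three scores are just one possible choice from `sklearn.metrics`.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

digits = load_digits()
X = scale(digits.data)
y = digits.target

# Fraction of the total variance kept by the 2-D projection
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
retained = pca.explained_variance_ratio_.sum()

# Same K-means settings as above; random_state is an assumption for repeatability
kmeans = KMeans(init='k-means++', n_clusters=10, n_init=10, random_state=7)
pred = kmeans.fit_predict(X_2d)

# Label-based scores: they compare the cluster assignments with the true digits
scores = {
    'homogeneity': metrics.homogeneity_score(y, pred),
    'completeness': metrics.completeness_score(y, pred),
    'adjusted_rand': metrics.adjusted_rand_score(y, pred),
}

print("variance retained by 2 components: %.3f" % retained)
for name, value in scores.items():
    print("%s: %.3f" % (name, value))
```

Homogeneity and completeness lie between 0 and 1, and the adjusted Rand index is at most 1, with higher values meaning closer agreement with the true labels. Running the same scores on a different clustering method would give a like-for-like comparison.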
