机器学习——fashion_mnist

机器学习——fashion_mnist

这里写目录标题

  • 机器学习——fashion_mnist
  • 数据
  • PCA
  • model+predicted

数据

本实验使用Fashion-MNIST数据集,包括t-shirt(T恤),trouser(牛仔裤),pullover(套衫)等在内的10个类别的图像共计70000张。

#读取训练集、测试集的数据和标签
train_images, train_labels = load_mnist(r"E:\code\jupyter\fashion-mnist",kind='train')
test_images, test_labels = load_mnist(r'E:\code\jupyter\fashion-mnist',kind='t10k')
train_images = train_images / 255.0
test_images = test_images / 255.0

print("Train set: {} images".format(len(train_images)))
print("Val set: {} images\n".format(len(test_images)))

在这里插入图片描述

#特征值归一化处理
scaler = StandardScaler()
scaler.fit(train_images_pca)
train_images_pca = scaler.transform(train_images_pca)
scaler.fit(test_images_pca)
test_images_pca = scaler.transform(test_images_pca)
train_images = np.reshape(train_images, (train_images.shape[0], -1))
test_images = np.reshape(test_images, (test_images.shape[0], -1))
# 画出特征个数和所携带信息数的曲线图
explained_variance_ratio = []
for i in range(1,300): 
    pca = PCA(n_components=i).fit(train_images)
    explained_variance_ratio.append(pca.explained_variance_ratio_.sum())
plt.plot(range(1,300),explained_variance_ratio)
plt.show()

机器学习——fashion_mnist_第1张图片

PCA

#选择维度200,pca
pca = PCA(n_components = 200)
pca.fit(train_images)
train_images_pca = pca.transform(train_images)
test_images_pca = pca.transform(test_images)
#划分训练集和测试集
train_x, test_x, train_y, test_y = train_test_split(train_images_pca, train_labels, random_state=1, train_size = 0.8)

model+predicted

#使用svm、mlp和randomforest三种model进行训练
svc = svm.SVC(kernel = 'rbf', C = 1)
svc.fit(train_x, train_y)

mlp = MLPClassifier(solver = 'lbfgs', activation = 'tanh', alpha = 1e-5, hidden_layer_sizes = [100, 100])
mlp.fit(train_x, train_y)

rdf = RandomForestClassifier(n_estimators = 500, max_depth = 12, random_state = 2)
rdf.fit(train_x, train_y)
result = pd.DataFrame()
result['index'] = ['score_train', 'score_test', 'score_train_pca', 'score_test_pca']
result['svc'] = [svc.score(train_x, train_y), svc.score(test_x, test_y), svc.score(train_images_pca, train_labels), svc.score(test_images_pca, test_labels)]
result['mlp'] = [mlp.score(train_x, train_y), mlp.score(test_x, test_y), mlp.score(train_images_pca, train_labels), mlp.score(test_images_pca, test_labels)]
result['rdf'] = [rdf.score(train_x, train_y), rdf.score(test_x, test_y), rdf.score(train_images_pca, train_labels), rdf.score(test_images_pca, test_labels)]
result

机器学习——fashion_mnist_第2张图片
svc在train上自验结果最低,但是test结果最好;mlp在train上可能存在过拟合现象;rdf整体表现一般,但是运行速度快

你可能感兴趣的:(机器学习,tensorflow,python)