Getting Started with Kaggle (Part 1): Solving MNIST with KNN

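Development platform: Google Colab + Python 3.6

Packages: pandas, sklearn

pandas is used to read and process the CSV files (tutorial link)

sklearn is a widely used third-party Python module for machine learning (tutorial link)

Tutorial link: an explanation of KNN and how to use it with sklearn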

As before, first mount the Google Drive folder in Colab:


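# Mount Google Drive into the Colab VM
from google.colab import drive
drive.mount('/content/drive/')

# Work from the kaggle folder on Drive so relative paths like mnist/train.csv resolve
import os
os.chdir("/content/drive/My Drive/kaggle")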

Next, import the packages (matplotlib and seaborn are not actually used here; I import them out of habit):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
# Show matplotlib figures inline in the notebook instead of in a pop-up window
%matplotlib inline

np.random.seed(2)

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
import itertools
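The imports above include `train_test_split` and `confusion_matrix`, but this post never uses them. They are handy for estimating accuracy locally before spending a Kaggle submission. The sketch below runs without the Kaggle CSVs because it uses sklearn's small built-in `load_digits` dataset (8×8 digits, not the Kaggle MNIST files) as a stand-in. In the notebook you would split `x_train`/`y_train` the same way. The 20% hold-out size and `random_state=2` are arbitrary assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in digits dataset (1797 samples, 8x8 images) as a stand-in for train.csv
digits = load_digits()
x, y = digits.data, digits.target

# Hold out 20% of the labelled data for validation; stratify keeps class proportions
x_tr, x_val, y_tr, y_val = train_test_split(
    x, y, test_size=0.2, random_state=2, stratify=y
)

clf = KNeighborsClassifier()  # same defaults as in the post (k=5)
clf.fit(x_tr, y_tr)
y_pred = clf.predict(x_val)

acc = accuracy_score(y_val, y_pred)
cm = confusion_matrix(y_val, y_pred)  # rows: true digit, columns: predicted digit
print(f"validation accuracy: {acc:.3f}")
print(cm)
```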

Load the data

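# train.csv: first column is the label, the remaining 784 columns are pixel values
train_data = pd.read_csv("mnist/train.csv")
# test.csv: pixel columns only, no label
test_data = pd.read_csv("mnist/test.csv")

x_train = train_data.values[:,1:]   # features: pixel values
y_train = train_data.values[:,0]    # labels: digits 0-9
test_value = test_data.values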
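Here is the slicing above on concrete data. In Kaggle's train.csv the first column is `label` and the other 784 columns (`pixel0` … `pixel783`) hold the 28×28 grayscale pixels. test.csv has only the pixel columns. The sketch below builds a tiny synthetic DataFrame with that layout, filled with random values (not real data). It shows what `values[:,1:]` and `values[:,0]` return and how one row reshapes back into an image.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv: 3 rows, a "label" column plus 784 pixel columns.
# Values are random; only the column layout mirrors the Kaggle file.
rng = np.random.default_rng(0)
columns = ["label"] + [f"pixel{i}" for i in range(784)]
data = np.hstack([
    rng.integers(0, 10, size=(3, 1)),     # digit labels 0-9
    rng.integers(0, 256, size=(3, 784)),  # grayscale pixel intensities 0-255
])
fake_train = pd.DataFrame(data, columns=columns)

x = fake_train.values[:, 1:]  # all pixel columns -> shape (3, 784)
y = fake_train.values[:, 0]   # label column -> shape (3,)

# Each row is a flattened 28x28 image; reshape it to view it as an image
first_image = x[0].reshape(28, 28)
print(x.shape, y.shape, first_image.shape)  # (3, 784) (3,) (28, 28)
```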

Define the KNN classifier

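def knnClassfiyer(value, label):
    # Default KNeighborsClassifier: k=5 neighbors, Euclidean distance, majority vote
    knnclf = KNeighborsClassifier()
    # np.ravel flattens the labels into the 1-D array that fit expects
    knnclf.fit(value, np.ravel(label))
    return knnclf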
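`KNeighborsClassifier()` with no arguments uses k = 5 neighbors, Minkowski distance with p = 2 (Euclidean distance) and uniform weights, so the prediction is a plain majority vote. The NumPy sketch below illustrates that decision rule from scratch. It is not sklearn's actual implementation, which may use tree structures instead of brute force and breaks ties its own way. The function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Hypothetical from-scratch KNN: majority vote among the k nearest
    training points under Euclidean distance (brute force)."""
    # Cast to float so uint8 pixel data cannot overflow when subtracting
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    x_query = np.asarray(x_query, dtype=float)
    predictions = []
    for q in x_query:
        # Euclidean distance from this query point to every training point
        dists = np.sqrt(((x_train - q) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote over their labels (ties go to the smallest label here)
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Toy data: two well-separated clusters labelled 0 and 1
x_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(x_toy, y_toy, [[0.5, 0.5], [10.5, 10.5]], k=3))  # [0 1]
```

The brute-force loop shows why prediction on the full dataset is expensive. Every query computes a distance to every training row.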

Train and predict

knnclf = knnClassfiyer(x_train,y_train)
test_label = knnclf.predict(test_value)

 

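The predicted labels are now stored in test_label. The predict step takes a long time: KNN does almost no work in fit and defers everything to predict, where each of the 28,000 test images is compared against all 42,000 training images (784 pixels each).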
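One common way to speed this up, not used in this post, is to reduce dimensionality with PCA before KNN. Passing `n_jobs=-1` also lets sklearn use all CPU cores. The sketch below chains these in a `Pipeline` on `load_digits`, again as a stand-in for the Kaggle data. `n_components=30` is an arbitrary assumption, not a tuned value. Check the accuracy trade-off on a validation split before relying on it.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

digits = load_digits()
x_tr, x_val, y_tr, y_val = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=2
)

# Project to fewer dimensions first so each distance computation is cheaper;
# n_jobs=-1 lets KNeighborsClassifier use all CPU cores when predicting
model = make_pipeline(
    PCA(n_components=30, random_state=2),
    KNeighborsClassifier(n_neighbors=5, n_jobs=-1),
)
model.fit(x_tr, y_tr)
val_acc = model.score(x_val, y_val)  # accuracy of the final KNN step
print(f"PCA(30) + KNN validation accuracy: {val_acc:.3f}")
```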

Save the predictions (Kaggle expects a CSV submission with the columns ImageId and Label)


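# Wrap the predictions in a Series named "Label"
test_label = pd.Series(test_label,name="Label")

# ImageId runs from 1 to the number of test images (28000)
submission = pd.concat([pd.Series(range(1,len(test_label)+1),name = "ImageId"),test_label],axis = 1)

submission.to_csv("mnist/Result_sklearn_KNN.csv",index=False)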
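Kaggle expects a header row `ImageId,Label` with ImageId starting at 1. The hypothetical `make_submission` helper below builds the same two columns. It takes the ImageId range from the number of predictions instead of hard-coding it, and the output is written to an in-memory buffer so the format can be checked. The dummy predictions are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

def make_submission(predictions):
    """Hypothetical helper: build a Digit Recognizer submission DataFrame."""
    predictions = np.asarray(predictions)
    return pd.DataFrame({
        "ImageId": np.arange(1, len(predictions) + 1),  # 1-based image ids
        "Label": predictions,
    })

submission = make_submission([7, 2, 1])  # dummy predictions, not real results
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
# ImageId,Label
# 1,7
# 2,2
# 3,1
```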

 

Submit with the Kaggle API (kaggle.json must first be placed in /root/.kaggle; see the earlier post on using Colab with Kaggle)

!mkdir -p /root/.kaggle
!cp /content/drive/'My Drive'/kaggle/kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
!kaggle competitions submit -c digit-recognizer -f mnist/Result_sklearn_KNN.csv -m "fourth submit"
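The Kaggle CLI reads its API token from `~/.kaggle/kaggle.json` (here `/root/.kaggle/kaggle.json`) and warns if the file is readable by other users. The same setup can be done from Python. The sketch below is a hypothetical helper, `install_kaggle_credentials`. It creates the directory if needed, copies the token and restricts the file to its owner. The paths in the usage comment are the ones from this post.

```python
import os
import shutil

def install_kaggle_credentials(src_json, kaggle_dir=os.path.expanduser("~/.kaggle")):
    """Hypothetical helper: copy kaggle.json into kaggle_dir with owner-only permissions."""
    os.makedirs(kaggle_dir, exist_ok=True)  # like `mkdir -p`
    dest = os.path.join(kaggle_dir, "kaggle.json")
    shutil.copyfile(src_json, dest)          # copy the API token file
    os.chmod(dest, 0o600)                    # read/write for the owner only
    return dest

# Usage in Colab (paths from this post):
# install_kaggle_credentials("/content/drive/My Drive/kaggle/kaggle.json", "/root/.kaggle")
```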

When it finishes you should see "Successfully submitted to Digit Recognizer". Then check the score under My Submissions on Kaggle.

 

 
