CS231N2017_assignment1_knn Homework Notes

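I'm a beginner currently working through the cs231n course, and I hope to finish the accompanying assignments one step at a time. If you spot any mistakes in this post, corrections are welcome.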

Preparation

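Environment: Anaconda, Windows, Python 3
Course homepage: http://cs231n.stanford.edu/syllabus.html
Chinese translation of the notes and assignments: https://zhuanlan.zhihu.com/p/21930884
Course videos with Chinese subtitles: https://www.bilibili.com/video/av58778425
(The course was taken down from NetEase Cloud Classroom, so I used this copy instead.)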

Download the assignment starter code: http://cs231n.github.io/assignments2017/assignment1/
Under "Working locally", find "Get the code as a zip file here." and click the link.

Doing the Assignment

1. Loading the dataset

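1.1 Download the CIFAR-10 dataset (the Python version, about 163 MB): http://www.cs.toronto.edu/~kriz/cifar.html
Extract it into the cs231n/datasets directory (the notebook expects the extracted folder cs231n/datasets/cifar-10-batches-py).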

1.2 Open Anaconda Prompt, type jupyter notebook and press Enter.
[Figure 1]
In the Jupyter interface, open knn.ipynb and k_nearest_neighbor.py (located in cs231n/classifiers/).
[Figure 2]

2. Code walkthrough

Note: in what follows, plain entries refer to code in knn.ipynb, and "---->" marks code in k_nearest_neighbor.py.

2.1 Load the data and print the shapes of the training and test sets
[Figure 3]
Output:
[Figure]
The training set contains 50,000 images of size 32×32×3;
the test set contains 10,000 images of size 32×32×3.

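2.2 Visualize some examples from the dataset
[Figure 4]
Functions used:
np.flatnonzero(a): returns the indices of the non-zero elements of a (after flattening it). In the code above, a is the boolean expression y_train == y, so the result is the indices of all training images of class y.
Example:
[Figure 5]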
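A text version of such an example, as a minimal sketch: the label vector y_train below is made up for illustration and is not real CIFAR-10 data.

```python
import numpy as np

# Hypothetical labels standing in for y_train
y_train = np.array([3, 0, 3, 1, 3, 2])
y = 3  # class whose examples we want

mask = (y_train == y)          # boolean array: True where the label equals y
idxs = np.flatnonzero(mask)    # indices of the True entries
print(mask)
print(idxs)                    # [0 2 4]
```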
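numpy.random.choice(a, size=None, replace=True, p=None): draws elements from the 1-D array a (or from np.arange(a) if a is an int) with probabilities p (uniform if p is omitted) and returns an array of shape size. replace controls whether an element may be drawn more than once; the default is True (sampling with replacement), so replace=False must be passed explicitly to get distinct samples.
Example:
[Figure 6]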
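A short sketch of the difference between replace=False and the default replace=True. The index array is made up, and the seed is fixed only so repeated runs give the same draw; the exact values drawn depend on NumPy's generator, so they are not quoted here.

```python
import numpy as np

np.random.seed(0)  # reproducible draws

idxs = np.array([0, 2, 4, 7, 9])  # e.g. indices of one class from np.flatnonzero

# replace=False: 3 distinct indices, as needed to show 3 different images
picked = np.random.choice(idxs, 3, replace=False)

# default replace=True: drawing 10 from only 5 candidates must produce repeats
with_repeats = np.random.choice(idxs, 10)

print(picked)
print(with_repeats)
```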
2.3 Subsample the dataset
To make the later code run faster, keep only 5,000 training images and 500 test images:
[Figure 7]
Reshape each 32×32×3 image into a single row of 32×32×3 = 3072 values:
[Figure]
Output:
[Figure]
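Function used:
numpy.reshape(a, newshape, order='C'): returns an array with the same elements arranged in shape newshape; the original array is left unchanged.
If you want to turn a into a single row but don't know the number of columns, pass -1 for that dimension, i.e. newshape=(1, -1), and NumPy infers it from the total size. Likewise, (-1, 1) turns a into a single column.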
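A minimal sketch of a reshape like the one applied to the image arrays in this step, using a fake all-zero batch instead of real CIFAR-10 images:

```python
import numpy as np

# A fake batch of 4 images with the CIFAR-10 shape 32x32x3
X = np.zeros((4, 32, 32, 3))

# Keep axis 0 (one row per image); -1 lets NumPy infer 32*32*3 = 3072 columns
X_rows = np.reshape(X, (X.shape[0], -1))
print(X_rows.shape)            # (4, 3072)

a = np.arange(6)
row = a.reshape(1, -1)         # shape (1, 6): one row
col = a.reshape(-1, 1)         # shape (6, 1): one column
print(row.shape, col.shape)    # (1, 6) (6, 1)
print(a.shape)                 # (6,): a itself is unchanged
```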

2.4 Import the k-nearest-neighbor classifier module
[Figure 8]
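2.5 Compute the L2 distance with two loops
The L2 distance between two images I_1 and I_2, summed over all pixels p, is:
$$d_2(I_1, I_2) = \sqrt{\sum_{p} \left( I_1^p - I_2^p \right)^2}$$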
---->
[Figure 10]
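If you see the error:
ModuleNotFoundError: No module named 'past'
install the future package, either from the Environments tab of Anaconda Navigator or simply with pip install future. (k_nearest_neighbor.py imports xrange from past.builtins, which the future package provides.)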
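A standalone sketch of the double loop, written as a plain function that takes X_train as an argument (in the assignment it is a method of the classifier that uses the stored training data instead). The tiny 2-D points are chosen so the distances are easy to check by hand (3-4-5 triangles):

```python
import numpy as np

def compute_distances_two_loops(X, X_train):
    """dists[i, j] = L2 distance between test row X[i] and training row X_train[j]."""
    num_test, num_train = X.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # square root of the sum of squared pixel differences
            dists[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return dists

X_train = np.array([[0.0, 0.0], [3.0, 4.0]])
X_test = np.array([[0.0, 0.0], [6.0, 8.0]])
dists = compute_distances_two_loops(X_test, X_train)
# expected values: [[0, 5], [10, 5]]
print(dists)
```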

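2.5 Visualize the distance matrix
[Figure 11]
The values in the matrix come from the two-loop computation above; if you have not implemented compute_distances_two_loops yet, it still returns all zeros and the image is completely black.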

2.6 kNN has no real training step (train simply stores the training data), so we go straight to prediction.
Set k=1:
[Figure 12]
---->
[Figure 13]
Output: accuracy 0.274000
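Functions used:
numpy.argsort(a, axis=-1, kind='quicksort', order=None): returns the indices that would sort a in ascending order. Taking the first k indices of a row of distances gives the k nearest neighbors, whose labels are then put to a vote.
numpy.bincount(x, weights=None, minlength=0): counts how many times each value occurs in the 1-D array x; x must contain non-negative integers.
numpy.argmax(a, axis=None, out=None): returns the index of the maximum value (not the value itself). Applied to the bincount result, that index is the most-voted label; when several labels tie, argmax returns the first one, i.e. the smallest label.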
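The three functions combine into the voting step roughly as follows. This is a standalone sketch with made-up distances and labels, not the assignment's predict_labels method verbatim; the k=2 call shows the tie-breaking behavior of argmax.

```python
import numpy as np

def predict_labels(dists, y_train, k=1):
    """Majority vote among the k nearest training points for each test row."""
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        nearest = np.argsort(dists[i])[:k]   # indices of the k smallest distances
        closest_y = y_train[nearest]         # 1-D array of neighbor labels
        counts = np.bincount(closest_y)      # counts[c] = votes for class c
        y_pred[i] = np.argmax(counts)        # first maximum: smaller label wins ties
    return y_pred

y_train = np.array([2, 1, 1, 2])
dists = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.4, 0.3, 0.2, 0.1]])
print(predict_labels(dists, y_train, k=1))  # [2 2]
print(predict_labels(dists, y_train, k=2))  # [1 1]: labels 2 and 1 tie, 1 wins
```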

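If you get ValueError: object too deep for desired array, shut down and restart the notebook kernel. In this step the error typically comes from np.bincount, which only accepts 1-D input, so also check that the array of neighbor labels passed to it is one-dimensional.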

Set k=5 for comparison:
[Figure 14]
Output: accuracy 0.278000
The accuracy is slightly higher than with k=1.

2.7 Compute the L2 distance with one loop
[Figure 15]
---->
[Figure 16]
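A sketch of the one-loop version as a standalone function (the signature, with X_train passed explicitly, is an assumption for illustration). Broadcasting subtracts one test row from all training rows at once, so only the loop over test images remains:

```python
import numpy as np

def compute_distances_one_loop(X, X_train):
    num_test = X.shape[0]
    dists = np.zeros((num_test, X_train.shape[0]))
    for i in range(num_test):
        # X_train: (num_train, D), X[i]: (D,), broadcast to (num_train, D)
        diff = X_train - X[i]
        # summing over the pixel axis gives one distance per training image
        dists[i, :] = np.sqrt(np.sum(diff ** 2, axis=1))
    return dists

rng = np.random.default_rng(0)
X_test = rng.random((4, 6))
X_train = rng.random((7, 6))
dists = compute_distances_one_loop(X_test, X_train)
print(dists.shape)  # (4, 7)
```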
2.8 Compute the L2 distance without loops
[Figure 17]
---->
[Figure 18]
For the principle behind the matrix computation in this code, see: https://blog.csdn.net/geekmanong/article/details/51524402

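np.dot(a, b): for 2-D arrays this is matrix multiplication, not element-wise multiplication (element-wise multiplication is a * b or np.multiply(a, b)).
np.newaxis: does not transpose the array; it inserts a new axis of length 1, e.g. a[:, np.newaxis] turns shape (n,) into (n, 1), which lets the squared norms broadcast against each other.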
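The no-loop version rests on expanding the square, $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x \cdot y$: the cross terms for all pairs come from a single matrix product, and np.newaxis shapes the two vectors of squared norms into a column and a row so that they broadcast to a full (num_test, num_train) matrix. A sketch under the same standalone-function assumption as above; the clipping with np.maximum is my own addition to guard against tiny negative values from floating-point round-off.

```python
import numpy as np

def compute_distances_no_loops(X, X_train):
    test_sq = np.sum(X ** 2, axis=1)[:, np.newaxis]         # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)[np.newaxis, :]  # (1, num_train)
    cross = np.dot(X, X_train.T)                            # (num_test, num_train)
    sq = test_sq + train_sq - 2 * cross                     # broadcasts to (num_test, num_train)
    return np.sqrt(np.maximum(sq, 0))

# np.dot is a matrix product; * is element-wise
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
prod = np.dot(a, b)   # matrix product: [[19, 22], [43, 50]]
elem = a * b          # element-wise:   [[5, 12], [21, 32]]

# np.newaxis adds an axis of length 1
v = np.arange(3)
print(v[:, np.newaxis].shape, v[np.newaxis, :].shape)  # (3, 1) (1, 3)
```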

2.9 Compare the running times of the three L2-distance implementations
[Figure 19]
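The timing comparison can be sketched with a small helper like the one below. The helper time_function and its use of time.perf_counter are my own version for illustration, and the distance functions are compact re-implementations so that the block runs on its own. Besides timing, it checks that the two results agree by taking the Frobenius norm of their difference. Measured times depend on the machine, so none are quoted.

```python
import time
import numpy as np

def two_loops(X, X_train):
    d = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        for j in range(X_train.shape[0]):
            d[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return d

def no_loops(X, X_train):
    sq = (np.sum(X ** 2, axis=1)[:, np.newaxis]
          + np.sum(X_train ** 2, axis=1)[np.newaxis, :]
          - 2 * X.dot(X_train.T))
    return np.sqrt(np.maximum(sq, 0))

def time_function(f, *args):
    """Run f(*args) once; return (elapsed seconds, result)."""
    tic = time.perf_counter()
    result = f(*args)
    return time.perf_counter() - tic, result

rng = np.random.default_rng(0)
X_test = rng.random((50, 300))
X_train = rng.random((200, 300))

t_two, d_two = time_function(two_loops, X_test, X_train)
t_none, d_none = time_function(no_loops, X_test, X_train)
print(f"two loops: {t_two:.4f} s, no loops: {t_none:.4f} s")

# The Frobenius norm of the difference should be close to 0 if both are correct
difference = np.linalg.norm(d_two - d_none, ord='fro')
print("difference:", difference)
```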
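2.10 Cross-validation
Split the training set into five equal folds and run kNN five times: each time, one fold serves as the validation set and the remaining four serve as the training set. Repeat this for every candidate k and compare the resulting accuracies.

[Figure 20]
Note:
For details on np.stack, np.vstack and np.hstack, see: https://blog.csdn.net/csdn15698845876/article/details/73380803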

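Output (note that 263 / 500 would be 0.526, so the "/ 500" in the first 50 lines is evidently a wrong denominator in the print statement; with 5,000 training images each validation fold holds 1,000 images, and 263 / 1000 = 0.263 matches the reported accuracy):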
Got 263 / 500 correct => accuracy: 0.263000
Got 257 / 500 correct => accuracy: 0.257000
Got 264 / 500 correct => accuracy: 0.264000
Got 278 / 500 correct => accuracy: 0.278000
Got 266 / 500 correct => accuracy: 0.266000
Got 239 / 500 correct => accuracy: 0.239000
Got 249 / 500 correct => accuracy: 0.249000
Got 240 / 500 correct => accuracy: 0.240000
Got 266 / 500 correct => accuracy: 0.266000
Got 254 / 500 correct => accuracy: 0.254000
Got 248 / 500 correct => accuracy: 0.248000
Got 266 / 500 correct => accuracy: 0.266000
Got 280 / 500 correct => accuracy: 0.280000
Got 292 / 500 correct => accuracy: 0.292000
Got 280 / 500 correct => accuracy: 0.280000
Got 262 / 500 correct => accuracy: 0.262000
Got 282 / 500 correct => accuracy: 0.282000
Got 273 / 500 correct => accuracy: 0.273000
Got 290 / 500 correct => accuracy: 0.290000
Got 273 / 500 correct => accuracy: 0.273000
Got 265 / 500 correct => accuracy: 0.265000
Got 296 / 500 correct => accuracy: 0.296000
Got 276 / 500 correct => accuracy: 0.276000
Got 284 / 500 correct => accuracy: 0.284000
Got 280 / 500 correct => accuracy: 0.280000
Got 260 / 500 correct => accuracy: 0.260000
Got 295 / 500 correct => accuracy: 0.295000
Got 279 / 500 correct => accuracy: 0.279000
Got 283 / 500 correct => accuracy: 0.283000
Got 280 / 500 correct => accuracy: 0.280000
Got 252 / 500 correct => accuracy: 0.252000
Got 289 / 500 correct => accuracy: 0.289000
Got 278 / 500 correct => accuracy: 0.278000
Got 282 / 500 correct => accuracy: 0.282000
Got 274 / 500 correct => accuracy: 0.274000
Got 270 / 500 correct => accuracy: 0.270000
Got 279 / 500 correct => accuracy: 0.279000
Got 279 / 500 correct => accuracy: 0.279000
Got 282 / 500 correct => accuracy: 0.282000
Got 285 / 500 correct => accuracy: 0.285000
Got 271 / 500 correct => accuracy: 0.271000
Got 288 / 500 correct => accuracy: 0.288000
Got 278 / 500 correct => accuracy: 0.278000
Got 269 / 500 correct => accuracy: 0.269000
Got 266 / 500 correct => accuracy: 0.266000
Got 256 / 500 correct => accuracy: 0.256000
Got 270 / 500 correct => accuracy: 0.270000
Got 263 / 500 correct => accuracy: 0.263000
Got 256 / 500 correct => accuracy: 0.256000
Got 263 / 500 correct => accuracy: 0.263000
k = 1, accuracy = 0.263000
k = 1, accuracy = 0.257000
k = 1, accuracy = 0.264000
k = 1, accuracy = 0.278000
k = 1, accuracy = 0.266000
k = 3, accuracy = 0.239000
k = 3, accuracy = 0.249000
k = 3, accuracy = 0.240000
k = 3, accuracy = 0.266000
k = 3, accuracy = 0.254000
k = 5, accuracy = 0.248000
k = 5, accuracy = 0.266000
k = 5, accuracy = 0.280000
k = 5, accuracy = 0.292000
k = 5, accuracy = 0.280000
k = 8, accuracy = 0.262000
k = 8, accuracy = 0.282000
k = 8, accuracy = 0.273000
k = 8, accuracy = 0.290000
k = 8, accuracy = 0.273000
k = 10, accuracy = 0.265000
k = 10, accuracy = 0.296000
k = 10, accuracy = 0.276000
k = 10, accuracy = 0.284000
k = 10, accuracy = 0.280000
k = 12, accuracy = 0.260000
k = 12, accuracy = 0.295000
k = 12, accuracy = 0.279000
k = 12, accuracy = 0.283000
k = 12, accuracy = 0.280000
k = 15, accuracy = 0.252000
k = 15, accuracy = 0.289000
k = 15, accuracy = 0.278000
k = 15, accuracy = 0.282000
k = 15, accuracy = 0.274000
k = 20, accuracy = 0.270000
k = 20, accuracy = 0.279000
k = 20, accuracy = 0.279000
k = 20, accuracy = 0.282000
k = 20, accuracy = 0.285000
k = 50, accuracy = 0.271000
k = 50, accuracy = 0.288000
k = 50, accuracy = 0.278000
k = 50, accuracy = 0.269000
k = 50, accuracy = 0.266000
k = 100, accuracy = 0.256000
k = 100, accuracy = 0.270000
k = 100, accuracy = 0.263000
k = 100, accuracy = 0.256000
k = 100, accuracy = 0.263000
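For reference, the fold logic described in 2.10 can be sketched as follows. It runs on synthetic two-cluster data (not CIFAR-10), uses np.array_split to cut the training data into folds and np.concatenate to join the remaining folds (np.vstack for the images and np.hstack for the 1-D labels would do the same), and divides by each validation fold's own size when computing accuracy. The function names are hypothetical.

```python
import numpy as np

def l2_distances(X, X_train):
    """Fully vectorized L2 distances, shape (len(X), len(X_train))."""
    sq = (np.sum(X ** 2, axis=1)[:, np.newaxis]
          + np.sum(X_train ** 2, axis=1)[np.newaxis, :]
          - 2 * X.dot(X_train.T))
    return np.sqrt(np.maximum(sq, 0))

def knn_predict(X_tr, y_tr, X, k):
    nearest = np.argsort(l2_distances(X, X_tr), axis=1)[:, :k]  # k nearest per row
    return np.array([np.argmax(np.bincount(y_tr[row])) for row in nearest])

def cross_validate(X_train, y_train, k_choices, num_folds=5):
    X_folds = np.array_split(X_train, num_folds)
    y_folds = np.array_split(y_train, num_folds)
    k_to_accuracies = {}
    for k in k_choices:
        k_to_accuracies[k] = []
        for i in range(num_folds):
            # fold i is the validation set; the other folds form the training set
            X_val, y_val = X_folds[i], y_folds[i]
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            y_pred = knn_predict(X_tr, y_tr, X_val, k)
            # accuracy relative to the size of this validation fold
            k_to_accuracies[k].append(np.mean(y_pred == y_val))
    return k_to_accuracies

# Synthetic data: two well-separated Gaussian blobs with labels 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

results = cross_validate(X, y, k_choices=[1, 3, 5])
for k, accs in results.items():
    print(f"k = {k}: " + ", ".join(f"{acc:.3f}" for acc in accs))
```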

2.11 Plot accuracy against k using the mean accuracy over the folds
[Figure 21]
Output:
[Figure 22]
k=10 gives the best mean accuracy (0.2802 over the five folds), narrowly ahead of k=12 (0.2794) and k=20 (0.2790).
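As a cross-check on the plot, the means can be recomputed directly from the cross-validation log in 2.10. The sketch below hard-codes those per-fold accuracies and picks the k with the highest mean; it also computes the standard deviation, which is a natural choice for error bars on the k-accuracy plot.

```python
import numpy as np

# Per-fold accuracies copied from the cross-validation log in 2.10
k_to_accuracies = {
    1:   [0.263, 0.257, 0.264, 0.278, 0.266],
    3:   [0.239, 0.249, 0.240, 0.266, 0.254],
    5:   [0.248, 0.266, 0.280, 0.292, 0.280],
    8:   [0.262, 0.282, 0.273, 0.290, 0.273],
    10:  [0.265, 0.296, 0.276, 0.284, 0.280],
    12:  [0.260, 0.295, 0.279, 0.283, 0.280],
    15:  [0.252, 0.289, 0.278, 0.282, 0.274],
    20:  [0.270, 0.279, 0.279, 0.282, 0.285],
    50:  [0.271, 0.288, 0.278, 0.269, 0.266],
    100: [0.256, 0.270, 0.263, 0.256, 0.263],
}

mean_acc = {k: np.mean(v) for k, v in k_to_accuracies.items()}
std_acc = {k: np.std(v) for k, v in k_to_accuracies.items()}

for k in sorted(mean_acc):
    print(f"k = {k:3d}: mean = {mean_acc[k]:.4f}, std = {std_acc[k]:.4f}")

best_k = max(mean_acc, key=mean_acc.get)
print("best k:", best_k)  # best k: 10
```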

2.12 Classify with the best k
Set k=10:
[Figure 23]
Output:
[Figure]

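3. Summary

The results above show that even in the best case kNN reaches only about 30% accuracy (the best single cross-validation fold above is 29.6%), so it is generally not used for image classification in practice.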

References:
1. https://blog.csdn.net/u014485485/article/details/79433514
2. https://blog.csdn.net/zhyh1435589631/article/details/54236643
3. https://blog.csdn.net/csdn15698845876/article/details/73380803
4. https://blog.csdn.net/geekmanong/article/details/51524402
