Authors: 寒小阳 && 龙心尘
Date: November 2015
Source: http://blog.csdn.net/han_xiaoyang/article/details/49949535
Notice: All rights reserved. Please credit the source when reposting. Thanks.
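1. The Image Classification Problem
This is a class of image-related problems that has attracted attention for a long time.
Given an input image, the task is to decide which of a given set of labels/categories it belongs to. It looks like a simple problem, yet after all these years it remains one of the core problems in computer vision. It has a great many applications, and its importance also shows in the fact that other computer vision problems (such as object recognition and image segmentation) can be built on top of it.
Let's walk through an example.
The computer is given a picture and has to output the probability that it belongs to each of 4 classes: {cat, dog, hat, cup}. Humans are remarkably capable creatures: one glance tells us it is a cat. A computer, however, cannot "see" the whole picture the way a person does. To it, the image is a large 3-dimensional array containing 248*400 pixels, each with values for three color channels, red, green and blue (RGB), and each value lying between 0 (black) and 255 (white). The computer has to decide that the picture is a "cat" based on these 248*400*3=297600 numbers.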
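To make those numbers concrete, here is a minimal sketch of what such an image looks like to a program. The array is filled with random values as a stand-in for a real photo (an assumption, since the picture itself is not available here), and the names `height`, `width`, `channels` and `image` are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the cat photo: random pixel values, not real image data.
height, width, channels = 248, 400, 3
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(height, width, channels), dtype=np.uint8)

print(image.shape)  # (248, 400, 3)
print(image.size)   # 297600 values in total
print(image[0, 0])  # the three RGB values of the top-left pixel
```

Everything a classifier knows about the picture is contained in this block of integers.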
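1.1 Why Image Recognition Is Hard
Image recognition looks straightforward, but it involves many challenges; we humans needed hundreds of millions of years of evolution to acquire such precise visual understanding. Some of the difficulties are:
- Viewpoint variation: rotating an object or viewing it from the side produces a completely different picture
- Scale variation: the same content can appear large or small
- Deformation: an object can take countless poses and shapes and still be the same thing
- Lighting and shadow interference/illusions
- Background clutter
- Intra-class variation (chairs, for instance, include armchairs, bar stools, dining chairs, recliners...)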
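1.2 How to Approach Recognition
First of all, a moment's thought shows that this is not like "sorting an array" or "finding the shortest path in a directed graph", where we can write down a procedure and rules that solve the problem directly. Defining what a cat is turns out to be very hard. So this kind of problem is tackled the same way as many other machine learning problems, with what is called the data-driven approach: we hand the computer a set of images for every class and let it learn what each class roughly looks like, much as a child learns about new things. The cat/dog/cup/hat examples in the figure below illustrate this: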
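1.3 The Image Classification Pipeline
The overall process is the same as in ordinary machine learning. Put simply, it has three steps:
* Input: N images, each labeled with one of K classes, serve as the training set the computer learns from
* Learning: the computer "observes" and "learns from" the images one by one
* Evaluation: just as we take exams at school to check what we have learned, we also test how well the computer has learned. We give it images whose classes it has not been told, let it predict them, and compare the predictions against the correct answers we already know.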
2. Nearest Neighbor Classifier
Let's start with a simple method, the Nearest Neighbor Classifier. It has nothing to do with the Convolutional Neural Networks used in deep learning, but it is a fairly simple method to implement.
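2.1 CIFAR-10
CIFAR-10 is a very widely used image classification dataset. It contains 60000 small images of 32*32 pixels, each labeled with one of 10 classes, split into a training set of 50000 images and a test set of 10000 images. Some example images are shown below:
In the figure above, the left side shows the ten classes with some example images for each. The right side shows, for a given image, the 10 nearest images as computed by pixel distance.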
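2.2 Simple Image Classification with Nearest Neighbor
Suppose we use CIFAR-10 as the training set and want to decide which CIFAR-10 class an unknown image belongs to. How should we go about it? One very intuitive idea: since we have the value of every pixel, we can compute the distance between the input image and each training image from those values, find the closest training image, and use its class as the prediction.
The idea is very direct, and this is exactly the "nearest neighbor" approach. A word of caution, though: in this setting the method is only moderately accurate. Look at the right side of the figure above: only 3 of the images have a nearest neighbor of the correct class.
Even so, this is the most conventional approach, so let's go through its implementation. The simplest way is to compare two vectors by their L1 distance (also called Manhattan or cityblock distance), given by: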
$$d_1 (I_1, I_2) = \sum_{p} \left| I^p_1 - I^p_2 \right|$$
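This simply takes the absolute difference at every pixel position and adds them all up. The accompanying figure illustrates this on a small example.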
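As a small worked example of the formula, the sketch below computes the L1 distance between two tiny 2x2 "images". The pixel values and the helper name `l1_distance` are made up for illustration.

```python
import numpy as np

def l1_distance(img_a, img_b):
    """Sum of absolute pixel-wise differences between two equally shaped images."""
    # Cast to a signed type first: subtracting uint8 arrays directly would wrap around.
    a = np.asarray(img_a, dtype=np.int64)
    b = np.asarray(img_b, dtype=np.int64)
    return int(np.sum(np.abs(a - b)))

# Made-up pixel values, chosen only to keep the arithmetic easy to follow.
I1 = np.array([[56, 32], [90, 23]], dtype=np.uint8)
I2 = np.array([[10, 20], [8, 10]], dtype=np.uint8)

print(l1_distance(I1, I2))  # 46 + 12 + 82 + 13 = 153
```

Two identical images have distance 0, and the distance grows as the pixel values drift apart.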
Let's first load the dataset into memory:
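import os
import pickle

import numpy as np


def load_CIFAR_batch(filename):
    """Load a single batch (CIFAR-10 is stored in batches).

    @param filename: path of the CIFAR batch file
    @return: X, Y: the data and labels in the batch
    """
    # The batch files were pickled with Python 2, so under Python 3 they must be
    # opened in binary mode and decoded with latin1.
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')
    X = datadict['data']
    Y = datadict['labels']
    # Each row holds 3072 values (3 channels x 32 x 32); rearrange to N x 32 x 32 x 3.
    X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
    Y = np.array(Y)
    return X, Y


def load_CIFAR10(ROOT):
    """Load the whole CIFAR-10 dataset.

    @param ROOT: root directory of the dataset
    @return: X_train, Y_train: training data and labels
             X_test, Y_test: test data and labels
    """
    xs = []
    ys = []
    # The training set is split across data_batch_1 ... data_batch_5.
    for b in range(1, 6):
        f = os.path.join(ROOT, "data_batch_%d" % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    X_train = np.concatenate(xs)
    Y_train = np.concatenate(ys)
    del X, Y
    X_test, Y_test = load_CIFAR_batch(os.path.join(ROOT, "test_batch"))
    return X_train, Y_train, X_test, Y_test


# Load the data, then flatten every image into one row of 32 * 32 * 3 = 3072 values.
X_train, Y_train, X_test, Y_test = load_CIFAR10('data/cifar10/')
Xtr_rows = X_train.reshape(X_train.shape[0], 32 * 32 * 3)
Xte_rows = X_test.reshape(X_test.shape[0], 32 * 32 * 3)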
Next we implement the nearest neighbor idea:
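class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """ "Training" here just means memorizing all the existing images -_-|| """
        # X is N x D with one flattened image per row; y holds the N labels.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ Prediction scans every training image, computes the distance,
        and returns the class of the image with the smallest distance. """
        num_test = X.shape[0]
        # The output uses the same dtype as the training labels.
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from the i-th test image to every training image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)
            Ypred[i] = self.ytr[min_index]
        return Ypred


nn = NearestNeighbor()
nn.train(Xtr_rows, Y_train)
Yte_predict = nn.predict(Xte_rows)
print('accuracy: %f' % (np.mean(Yte_predict == Y_test)))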
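The nearest neighbor approach reaches an accuracy of 38.6% on CIFAR-10. With 10 classes, random guessing would be right about 1/10 = 10% of the time, so the method does have some recognition ability. Still, this is far below human accuracy (94%) and the latest deep learning / convolutional neural network results (95%).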
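2.3 Distance Metrics for Nearest Neighbor
The distance metric used here is the L1 distance, but there are many other metrics besides L1.
- For example, the L2 distance (the familiar Euclidean distance) is computed as: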
$$d_2 (I_1, I_2) = \sqrt{\sum_{p} \left( I^p_1 - I^p_2 \right)^2}$$
- Another example is the cosine distance, computed as:
$$1 - \frac{I_1 \cdot I_2}{\|I_1\| \cdot \|I_2\|}$$
More distance metrics are listed on the SciPy page for distance computations.
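The three metrics above can be sketched in a few lines of NumPy. The function names `l1`, `l2` and `cosine_distance` are illustrative rather than taken from any library, and the two vectors are made-up values.

```python
import numpy as np

def l1(a, b):
    """Manhattan distance: sum of absolute differences."""
    return np.sum(np.abs(a - b))

def l2(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# b is simply a scaled copy of a (made-up values).
a = np.array([1.0, 0.0, 2.0])
b = np.array([2.0, 0.0, 4.0])

print(l1(a, b))               # 1 + 0 + 2 = 3.0
print(l2(a, b))               # sqrt(1 + 0 + 4), about 2.236
print(cosine_distance(a, b))  # about 0, because the vectors point the same way
```

The example shows how the choice of metric changes what "close" means: L1 and L2 see two different vectors, while the cosine distance ignores magnitude and treats them as essentially identical.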
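3. K Nearest Neighbor Classifier
This is a refinement of the nearest neighbor idea. When we classify with the nearest neighbor classifier and scan the CIFAR training set, we find that the single closest image is not always of the same class as the query image, yet many of the closest few are. It is therefore natural to extend the nearest neighbor to the k nearest neighbors, tally the class distribution among those points, and take the most frequent class as the prediction.
That is the idea behind KNN.
KNN is a very commonly used classification algorithm. One question remains, though: what value should K take? In other words, how many neighbors should we ask to vote for the result to be reliable?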
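The voting scheme just described can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the class name `KNearestNeighbor`, the tie-breaking rule (the smallest label wins) and the toy data are all assumptions made for the example.

```python
import numpy as np

class KNearestNeighbor:
    """Sketch of a k-NN classifier using L1 distance and majority voting."""

    def train(self, X, y):
        # As with plain nearest neighbor, "training" only memorizes the data.
        self.Xtr = np.asarray(X, dtype=np.float64)
        self.ytr = np.asarray(y)

    def predict(self, X, k=1):
        X = np.asarray(X, dtype=np.float64)
        y_pred = np.zeros(X.shape[0], dtype=self.ytr.dtype)
        for i in range(X.shape[0]):
            # L1 distance from the i-th query to every training point
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # Indices of the k closest training points
            nearest = np.argsort(distances)[:k]
            # Majority vote; on a tie, argmax picks the smallest label
            labels, counts = np.unique(self.ytr[nearest], return_counts=True)
            y_pred[i] = labels[np.argmax(counts)]
        return y_pred

# Toy data (made up): two well separated clusters with labels 0 and 1.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

knn = KNearestNeighbor()
knn.train(X_toy, y_toy)
print(knn.predict(np.array([[0.5, 0.5], [5.5, 5.5]]), k=3))  # [0 1]
```

With `k=1` this reduces to the plain nearest neighbor classifier, so the two can share one implementation.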
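3.1 Cross-Validation and Parameter Selection
Suppose that, in the current setting, we have decided to use KNN for image classification. Clearly some parameters will affect the final result, such as the choice of distance (L1, L2, cosine and so on) and the number of neighbors K. We can think of each parameter combination as producing a new model, which makes this a model selection problem. The most common way to handle model selection is to experiment on a cross-validation set.
The test set is precious data: it is used to judge how well a machine learning method works in this setting. If we selected model parameters on the test data and then used the same data to evaluate performance, that would clearly be unsound, because the parameters would very likely be overfitted to the test data and the evaluation would no longer be fair.
So we usually split the training data into two parts: a large part used for training, and a smaller part, the so-called cross-validation set, used for selecting model parameters. For example, with 50000 training images we can split them into a training set of 49000 and a cross-validation set of 1000.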
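# Hold out the first 1000 training images as the validation set
Xval_rows = Xtr_rows[:1000, :]
Yval = Y_train[:1000]
# Keep the remaining 49000 images for training
Xtr_rows = Xtr_rows[1000:, :]
Ytr = Y_train[1000:]

# NOTE: this assumes predict() has been extended to accept k, the number of
# neighbors that vote; the NearestNeighbor class above does not take this argument.
validation_accuracies = []
for k in [1, 3, 5, 7, 10, 20, 50, 100]:
    nn = NearestNeighbor()
    nn.train(Xtr_rows, Ytr)
    Yval_predict = nn.predict(Xval_rows, k=k)
    acc = np.mean(Yval_predict == Yval)
    print('accuracy: %f' % (acc,))
    # Record the validation accuracy obtained for this k
    validation_accuracies.append((k, acc))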
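A concept you will see in many places is k-fold cross-validation. It means splitting the original data into k parts and taking turns using k-1 of them as training data and the remaining one as validation data (so k-fold cross-validation yields k accuracy values). Here is an illustration of 5-fold cross-validation:
Below is the accuracy curve obtained with 5-fold cross-validation for different values of k (note that with 5-fold cross-validation there are 5 values for each k, and we take their mean as the accuracy for that k).
We can see that the best accuracy is reached at around k=7.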
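The k-fold procedure can be sketched as below. To keep the example small and runnable it uses synthetic two-cluster data instead of CIFAR-10, and the helper names `knn_predict` and `cross_validate` are hypothetical; the accuracies it prints say nothing about the CIFAR-10 results discussed above.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest (L1) training points for each query."""
    preds = np.empty(X_query.shape[0], dtype=y_train.dtype)
    for i, x in enumerate(X_query):
        nearest = np.argsort(np.sum(np.abs(X_train - x), axis=1))[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]
    return preds

def cross_validate(X, y, k_choices, num_folds=5):
    """Return {k: [accuracy on each fold]} for every candidate k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    results = {}
    for k in k_choices:
        accs = []
        for f in range(num_folds):
            # Fold f is held out for validation; the other folds form the training set.
            X_val, y_val = X_folds[f], y_folds[f]
            X_tr = np.concatenate(X_folds[:f] + X_folds[f + 1:])
            y_tr = np.concatenate(y_folds[:f] + y_folds[f + 1:])
            accs.append(float(np.mean(knn_predict(X_tr, y_tr, X_val, k) == y_val)))
        results[k] = accs
    return results

# Synthetic stand-in data: two Gaussian clusters, shuffled so each fold has both classes.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
order = rng.permutation(len(y))
X, y = X[order], y[order]

results = cross_validate(X, y, k_choices=[1, 3, 5, 7])
for k, accs in results.items():
    print(k, np.mean(accs))  # mean validation accuracy over the 5 folds

# Pick the k with the highest mean accuracy across folds.
best_k = max(results, key=lambda k: np.mean(results[k]))
print('best k:', best_k)
```

Averaging over folds makes the choice of k less sensitive to which images happen to land in the validation split, at the cost of running the classifier once per fold.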
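3.2 Pros and Cons of Nearest Neighbor Methods
The advantages of K nearest neighbor are plain to see: the idea is simple and clear, and it needs no training at all. For exactly that reason, however, prediction is very time-consuming, because every query has to be compared against every image in the training set.
In practice, what we care about most is the time spent on prediction. If an image recognition app took half an hour or an hour to return a result, you would uninstall it right away. Training time matters much less: training can take a while as long as recognition is fast once the model is deployed. The deep neural networks discussed later are like this: training is very time-consuming, but recognition is very fast.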
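One common way to cut down the cost of comparing every query with every training image is to replace the Python loop with matrix operations. The sketch below does this for the L2 distance (not the L1 distance used in the code earlier), using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. The function name `pairwise_l2` and the random data are assumptions made for illustration.

```python
import numpy as np

def pairwise_l2(X_query, X_train):
    """Return an (n_query, n_train) matrix of L2 distances without a Python loop."""
    query_sq = np.sum(X_query ** 2, axis=1)[:, None]   # shape (n_query, 1)
    train_sq = np.sum(X_train ** 2, axis=1)[None, :]   # shape (1, n_train)
    cross = X_query @ X_train.T                        # shape (n_query, n_train)
    # Rounding can push tiny values slightly below zero, so clip before the sqrt.
    sq_dists = np.maximum(query_sq + train_sq - 2.0 * cross, 0.0)
    return np.sqrt(sq_dists)

# Random stand-in data: 6 "training" rows and 3 "query" rows with 4 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 4))
X_query = rng.normal(size=(3, 4))

D = pairwise_l2(X_query, X_train)
print(D.shape)                   # (3, 6)
nearest = np.argmin(D, axis=1)   # index of the closest training row for each query
print(nearest)
```

This still compares against the full training set, so the amount of work is unchanged; it only moves that work into optimized matrix routines.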
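It is also worth mentioning that speeding up K nearest neighbor computation is still a very active topic today. Approximate Nearest Neighbor (ANN) algorithms give up a small amount of accuracy in exchange for a large gain in speed, so that approximate K nearest neighbors can be found fairly quickly. Many libraries for this exist, for example FLANN.
Finally, let's use one picture to show the shortcomings of classifying images by pixel-level distance. Using a technique called t-SNE, we lay out all CIFAR-10 images in two dimensions so that images placed closer together have smaller pixel-level distances. A quick look shows that the images closest to each other are not necessarily of the same class.
On closer inspection, you will find that images that are close at the pixel level have a lot in common in their overall color distribution, yet they do not necessarily belong to the same class.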