Lecture 2: Image Classification pipeline
Image Classification: a core task in Computer Vision
The Problem: Semantic Gap
The idea of a cat, or the label "cat", is a semantic label that we assign to the image, and there is a huge gap between that semantic idea and the pixel values the computer actually sees.
The challenges:
-Viewpoint variation
-Illumination
-Deformation
-Occlusion
-Background clutter
-Intra-class variation
Method
1. An image classifier
Find edges, find corners
2. Data-driven approach
1. Collect a dataset of images and labels
2. Use Machine Learning to train an image classifier
3. Evaluate the classifier on a withheld set of test images
2.1 First classifier: Nearest Neighbor Classifier
2.1.1 Distance Metric to compare images
Manhattan distance
2.2 K-Nearest Neighbors
Instead of copying the label from the single nearest neighbor, take a majority vote among the K closest points.
2.2.1 Distance Metric to compare images
2.2.2 Hyperparameters
-The best value of k to use
-The best distance metric to use
Hyperparameters are choices about the algorithm that we set rather than learn.
Cross-validation
Linear Classification
Lecture 3: Loss Functions and Optimization
Loss Function
A loss function quantifies how good any given value of W is.
1 SVM
1.1 Regularization
2 Softmax Classifier
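For reference, the standard definitions of these two losses from the course (here s_j is the score of class j for example i, y_i is its correct label, and \Delta is a fixed margin, typically 1):

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)    (multiclass SVM loss)

L_i = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}    (softmax cross-entropy loss)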
Optimization
slope
gradient
Image Features
The notes above were jotted down casually during the lectures just to help me stay focused; from here on I am reading the course notes and working through them more carefully.
The notes are at https://zhuanlan.zhihu.com/p/20900216?refer=intelligentunit
Image Classification
What is the image classification problem?
The image classification problem: given a fixed set of labels, take an input image and assign it the corresponding label from that set.
Image classification is one of the core problems in computer vision; many other problems, such as object detection and segmentation, can be reduced to image classification.
An example:
read an image and produce the probability that it belongs to each of the labels in the set {cat, dog, hat, mug}.
To the computer, the image is one huge 3-dimensional array of numbers.
The task of image classification is to turn those millions of numbers into a single simple label, such as "cat".
Challenges
Image classification is effortless for a person: when you see a picture, the corresponding concept arises naturally in your head. But to a computer, the picture is a 3-dimensional array of millions of numbers. Some common difficulties of image classification are listed below.
1. Viewpoint variation
2. Scale variation
3. Deformation
4. Occlusion
5. Illumination conditions
6. Background clutter
7. Intra-class variation
How do we classify images? The data-driven approach
Unlike writing a sorting algorithm, it is very hard to directly write down an algorithm that recognizes each class of object from its features. Instead, we take an approach much like teaching a child to recognize things: we provide many example images for every label, and the computer uses a learning algorithm to study these datasets and learn the visual appearance of each class. This is the data-driven approach.
The image classification pipeline
1. Input: a training set of N images, each labeled with one of K different class labels.
2. Learning: use the training set to learn what every one of the classes looks like. This step is called training a classifier, or learning a model.
3. Evaluation: have the trained classifier predict labels for images it has never seen before.
Two classifiers are introduced below.
1. Nearest Neighbor Classifier
The Nearest Neighbor classifier compares the test image with every image in the training set and predicts the label of the training image it differs from the least.
As the example results show, though, the accuracy is not very high.
How do we compare two images?
1.1 L1 distance
Subtract the two images' pixel arrays elementwise, then sum the absolute values of all entries of the resulting difference array.
The formula is as follows.
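Writing I_1 and I_2 for the two images, with the sum running over all pixels p:

d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|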
Code implementation
1. Load the CIFAR-10 data into memory,
split into 4 arrays: training data/labels and test data/labels.
Xtr: training set, Ytr: training labels
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072
2. Train and evaluate the classifier
accuracy: the fraction of predictions that are correct
train(X, y): train the classifier with the training data and labels
predict(X): predict the labels of new input data
nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the fraction
# of test examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % np.mean(Yte_predict == Yte))
3. Implementation of the Nearest Neighbor classifier
import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is a 1-dimensional array of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict a label for """
    num_test = X.shape[0]
    # make sure the output type matches the input type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      min_index = np.argmin(distances)  # get the index with the smallest distance
      Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example

    return Ypred
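Note the tradeoff this classifier makes: train costs nothing (it just memorizes the data), while every predict call compares a test image against all of the training images, so the entire cost is paid at test time. In practice we usually want the opposite: cheap predictions, even at the price of expensive training.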
1.2 L2 distance
The formula is as follows.
Compared with L1, we again compute the pixelwise differences, but now square each difference, add up all the squares, and take the square root of the sum.
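In the same notation as for L1:

d_2(I_1, I_2) = \sqrt{ \sum_p \left( I_1^p - I_2^p \right)^2 }

In the predict loop above, only the distance line changes; in numpy it could be written as (the argmin-based prediction stays the same):

distances = np.sqrt(np.sum(np.square(self.Xtr - X[i,:]), axis=1))

When we only need the nearest neighbor, the square root can even be dropped: it is monotonic, so it does not change the ordering of the distances.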
1.3 Comparing L1 and L2
Because L2 squares the differences, it amplifies large ones; when comparing two vectors, L2 is therefore far less tolerant of a single big difference than L1 is.
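A small worked example with illustrative numbers: take two pixelwise difference vectors, (4, 0) and (2, 2). Both have the same L1 distance, 4 + 0 = 2 + 2 = 4, but their L2 distances are sqrt(4^2 + 0^2) = 4 and sqrt(2^2 + 2^2) = sqrt(8) ≈ 2.83. Concentrating the same total difference in one coordinate gives a larger L2 distance, so L2 prefers many moderate disagreements to one large one.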
2 K-Nearest Neighbor Classifier
K means that instead of picking only the single closest image in the training set, we pick the k closest images and let them vote on the label (a minimal sketch of such a classifier follows the figure notes below).
In the figure:
the differently colored regions represent the decision boundaries;
white represents regions where the classification is ambiguous.
You can see that the NN classifier map contains many islands caused by anomalous data points,
while the 5-NN boundaries are smoother and generalize better.
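A minimal sketch of a k-nearest-neighbor classifier, assuming the same setup as the NearestNeighbor class above; the class name KNearestNeighbor and the use of np.bincount for the vote (which assumes non-negative integer class labels, as in CIFAR-10) are my own choices, not from the notes:

import numpy as np

class KNearestNeighbor(object):
  def train(self, X, y):
    # like NearestNeighbor above, just memorize all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X, k=1):
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    for i in range(num_test):
      # L1 distances from the i'th test image to every training image
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      # indices of the k training images with the smallest distances
      closest = np.argsort(distances)[:k]
      # majority vote over their labels (assumes non-negative integer labels)
      Ypred[i] = np.bincount(self.ytr[closest]).argmax()
    return Ypred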
2.1 Validation sets for hyperparameter tuning
Hyperparameters: the K in the K-Nearest Neighbor classifier, the choice of distance metric, and similar settings are all hyperparameters.
2.2 How to tune hyperparameters
They must not be tuned on the test set, or we risk overfitting to it.
2.2.1 Validation set
Take a portion of the training data and tune on it.
The code is as follows:
# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep last 49,000 for train
Ytr = Ytr[1000:]
# find hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:
  # use a particular value of k and evaluate on validation data
  nn = NearestNeighbor()
  nn.train(Xtr_rows, Ytr)
  # here we assume a modified NearestNeighbor class that can take a k as input
  Yval_predict = nn.predict(Xval_rows, k=k)
  acc = np.mean(Yval_predict == Yval)
  print('accuracy: %f' % (acc,))
  # keep track of what works on the validation set
  validation_accuracies.append((k, acc))
2.2.2 Cross-validation
For example, split the training set into five equal folds; use one fold for validation and the other four for training, then cycle so that each fold serves as the validation fold once, and take the average of the validation results as the final result.
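A minimal sketch of this 5-fold procedure in numpy, assuming the Xtr_rows/Ytr arrays from before and the KNearestNeighbor sketch given earlier (the fold splitting and variable names here are illustrative, not from the notes):

import numpy as np

num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

k_to_accuracy = {}
for k in [1, 3, 5, 10, 20, 50, 100]:
  fold_accs = []
  for fold in range(num_folds):
    # hold out one fold for validation, train on the other four
    X_val, y_val = X_folds[fold], y_folds[fold]
    X_train = np.concatenate(X_folds[:fold] + X_folds[fold + 1:])
    y_train = np.concatenate(y_folds[:fold] + y_folds[fold + 1:])
    knn = KNearestNeighbor()
    knn.train(X_train, y_train)
    fold_accs.append(np.mean(knn.predict(X_val, k=k) == y_val))
  # average validation accuracy over the five folds
  k_to_accuracy[k] = np.mean(fold_accs)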
2.2.3 In practice
Usually the original training set is simply split into a training set and a validation set, with 50%-90% of the data used for training.