An image is represented with three dimensions: height, width, and channels, i.e. w*h*c numbers in total.
Each number ranges from 0 (black) to 255 (white).
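A minimal sketch of what that representation looks like in numpy; the 32*32 RGB size here is just an assumed example (it matches the dataset used later):

import numpy as np

# a hypothetical 32 x 32 RGB image: height x width x channels
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(img.shape)              # (32, 32, 3), i.e. 32 * 32 * 3 = 3072 numbers
print(img.min(), img.max())   # every value lies in [0, 255]

# stretched out into a single row, as the classifiers below expect
row = img.reshape(-1)
print(row.shape)              # (3072,)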
Recognizing an image is very easy for a human, which is exactly why this task is a worthwhile challenge from the perspective of a computer vision algorithm. Some of the challenges:
- Viewpoint variation.
- Scale variation.
- Deformation. Many objects of interest are not rigid bodies and can be deformed in extreme ways.
- Occlusion.
- Illumination conditions.
- Background clutter. The objects of interest may blend into their environment, making them hard to identify.
- Intra-class variation. The classes of interest can often be relatively broad, such as chair. There are many different types of these objects, each with their own appearance.
The data-driven image classification pipeline:
- Input: the training set (images plus their labels).
- Learning: training a classifier, or learning a model.
- Evaluation: in the end, we evaluate the quality of the classifier by asking it to predict labels for a new set of images that it has never seen before.
The dataset: 60,000 images in total, 50,000 for training and 10,000 for testing, each 32*32 pixels, spread over 10 classes.
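A minimal sketch of how such a dataset ends up in the Xtr_rows / Xte_rows matrices that the code below assumes (the 50,000/10,000 split at 32*32 over 10 classes matches CIFAR-10); the random arrays here are placeholders standing in for a real dataset loader:

import numpy as np

# placeholder arrays with the shapes described above; a real loader would
# read the 50,000 training and 10,000 test images from disk instead
Xtr = np.random.randint(0, 256, size=(50000, 32, 32, 3), dtype=np.uint8)
Ytr = np.random.randint(0, 10, size=50000)
Xte = np.random.randint(0, 256, size=(10000, 32, 32, 3), dtype=np.uint8)
Yte = np.random.randint(0, 10, size=10000)

# flatten every image into a single row of 32 * 32 * 3 = 3072 numbers
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)  # Xtr_rows becomes 50,000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)  # Xte_rows becomes 10,000 x 3072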
The essence: given a test image, find the single training image (one row of x) with the smallest pixel-wise difference to it, and simply take that training image's label y as the answer.
Core implementation (using the L1 distance):
nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the average number
# of examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % (np.mean(Yte_predict == Yte)))
import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. Y is 1-dimension of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      min_index = np.argmin(distances)  # get the index with smallest distance
      Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example

    return Ypred
L2 distance (Euclidean distance):
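To switch from L1 to L2, only the distance computation inside predict changes; one way to write it with numpy:

# inside predict, replace the L1 line with the L2 (Euclidean) distance
distances = np.sqrt(np.sum(np.square(self.Xtr - X[i, :]), axis=1))
# when only the nearest neighbor is needed the sqrt can be dropped:
# the square root is a monotonic function, so it preserves the ordering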
The L2 result (35.4%) is somewhat lower than the L1 result (38.6%).
The L2 distance prefers many medium disagreements to one big one.
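A small numeric illustration of that preference (my own example, not from the notes): one big disagreement versus many medium ones with the same total absolute difference:

import numpy as np

a = np.array([4.0, 0.0, 0.0, 0.0])   # one big disagreement
b = np.array([1.0, 1.0, 1.0, 1.0])   # many medium disagreements

print(np.sum(np.abs(a)), np.sum(np.abs(b)))            # L1: 4.0 vs 4.0 (a tie)
print(np.sqrt(np.sum(a**2)), np.sqrt(np.sum(b**2)))    # L2: 4.0 vs 2.0 (b is closer)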
The essence is to find the k most similar training images (plain NN is the special case k = 1) and then let these k labels vote on the answer.
Higher values of k have a smoothing effect that makes the classifier more resistant to outliers.
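A minimal sketch of what such a classifier could look like; this KNearestNeighbor class is my own illustration (it is also the kind of "modified NearestNeighbor that can take a k as input" assumed by the validation code below):

import numpy as np

class KNearestNeighbor(object):
  """ illustrative k-NN classifier: L1 distance, k chosen at predict time """
  def train(self, X, y):
    # exactly like plain NN: just remember all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X, k=1):
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    for i in range(num_test):
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      nearest = np.argsort(distances)[:k]      # indices of the k closest training images
      votes = np.bincount(self.ytr[nearest])   # count each label (assumes integer labels, e.g. 0..9)
      Ypred[i] = np.argmax(votes)              # the most common label wins the vote
    return Ypred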
How should the value of k be chosen?
Do not use the test set to tune hyperparameters; doing so makes it easy to overfit your model to the test set.
Instead, split off a small part of the training data as validation data. Here, 1,000 of the 50,000 training images are held out as a validation set and used to search for the best k (the k with the highest validation accuracy).
# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep last 49,000 for train
Ytr = Ytr[1000:]

# find hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:

  # use a particular value of k and evaluation on validation data
  nn = NearestNeighbor()
  nn.train(Xtr_rows, Ytr)
  # here we assume a modified NearestNeighbor class that can take a k as input
  Yval_predict = nn.predict(Xval_rows, k=k)
  acc = np.mean(Yval_predict == Yval)
  print('accuracy: %f' % (acc,))

  # keep track of what works on the validation set
  validation_accuracies.append((k, acc))
N-fold cross-validation
When the dataset is small, the training data can be split into n folds, with each fold taking a turn as the validation set; every experiment with a particular hyperparameter value then yields n accuracy numbers.
Taking the choice of k as an example: after averaging over the folds we can plot k against accuracy, and in that plot accuracy peaks at around k = 7 (even though no experiment was run with k = 7 exactly).
The larger N is, the smoother this curve becomes, which makes it easier to pick out the best k.
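A minimal sketch of 5-fold cross-validation over k, assuming the Xtr_rows / Ytr arrays from before and the illustrative KNearestNeighbor class sketched above:

import numpy as np

num_folds = 5
ks = [1, 3, 5, 10, 20, 50, 100]

# split the training data into num_folds roughly equal parts
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

k_to_accuracy = {}
for k in ks:
  accs = []
  for i in range(num_folds):
    # fold i is the validation set, the remaining folds form the training set
    X_val, y_val = X_folds[i], y_folds[i]
    X_train = np.concatenate(X_folds[:i] + X_folds[i + 1:])
    y_train = np.concatenate(y_folds[:i] + y_folds[i + 1:])

    knn = KNearestNeighbor()
    knn.train(X_train, y_train)
    accs.append(np.mean(knn.predict(X_val, k=k) == y_val))

  # n accuracy values per k; their mean is what gets plotted against k
  k_to_accuracy[k] = np.mean(accs)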
The reason cross-validation is often not used in practice: it is computationally expensive, since everything has to be computed n times.
When there are many hyperparameters to tune, people tend to prefer a single split, using 50%-90% of the training data for training and the rest for validation.
But when there are only a few hyperparameters, cross-validation is the safer choice.
Typical choices are 3-fold, 5-fold or 10-fold cross-validation.
The nearest neighbor classifier needs very little training time, but test time is very long.
In practice we want the opposite (cheap at test time, even if training is expensive), and deep neural networks behave exactly that way.
There are algorithms that speed up nearest neighbor lookups, such as ANN (Approximate Nearest Neighbor) methods, which build an index over the images during the training phase (e.g. a kd-tree, or a preliminary k-means clustering).
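As a rough illustration of that "index at training time, query quickly at test time" idea, here is a sketch using scipy's kd-tree; the toy data is my own example, and kd-trees pay off mostly in low dimensions, not on 3072-dimensional image rows:

import numpy as np
from scipy.spatial import cKDTree

# toy low-dimensional data standing in for real features
X_train = np.random.rand(10000, 3)
y_train = np.random.randint(0, 10, size=10000)

tree = cKDTree(X_train)               # "training": build the index once

X_test = np.random.rand(5, 3)
dists, idx = tree.query(X_test, k=1)  # fast nearest-neighbor lookup at test time
predictions = y_train[idx]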
NN is a good choice in low dimensions, but it is not well suited to image classification.
Pixel-space NN does not capture the semantic differences between images; what it mostly picks up are differences in background and color.
Vocabulary:
presumably — probably, supposedly (adv.)
It is presumably due to the strong black background.
subtracted — taken away, removed (v.)
elementwise — applied element by element
stretched — pulled out, elongated (e.g. flattened into a row)
euclidean distance — the straight-line (L2) distance
monotonic function — a function that always moves in one direction (never both increasing and decreasing)
preserve — to keep, maintain (v.)
metric — a measure, a distance function; matrices — plural of matrix
unforgiving — harsh, leaving no room for error (adj.)
induce — to bring about, cause (v.)
dot products — inner products of vectors
tweaking hyperparameters — adjusting the hyperparameters
deploy — to put into production use (v.)
proxy — a stand-in, substitute (n.)
generalization of your classifier — how well your classifier generalizes to unseen data
arbitrarily — in an arbitrary way, at will (adv.)
error bars — bars on a plot showing the spread or uncertainty of a value
standard deviation — a measure of spread around the mean
in favor of — in support of, preferring
desirable — worth having, preferable (adj.)
trade off — to balance one thing against another
counter-intuitive — contrary to what one would naturally expect (adj.)
perceptual — relating to perception (adj.)
inadequate — insufficient (adj.)
embed — to insert into something else (v.)
embark — to start, set about doing something (v.)
as a rule of thumb — as a rough practical guideline