Course notes: http://cs231n.github.io/classification/
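This lecture covers three topics: the basic concepts of image classification, the Nearest Neighbor classifier, and validation sets / cross-validation.
Anyone who has taken an image processing course will already be familiar with the basic concepts of image recognition.
So here I mainly record the challenges that image recognition has to deal with: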
1. Viewpoint variation. A single instance of an object can be oriented in many ways with respect to the camera.
2. Scale variation. Visual classes often exhibit variation in their size (size in the real world, not only in terms of their extent in the image).
3. Deformation. Many objects of interest are not rigid bodies and can be deformed in extreme ways.
4. Occlusion. The objects of interest can be occluded. Sometimes only a small portion of an object (as little as a few pixels) could be visible.
5. Illumination conditions. The effects of illumination are drastic on the pixel level.
6. Background clutter. The objects of interest may blend into their environment, making them hard to identify.
7. Intra-class variation. The classes of interest can often be relatively broad, such as chair. There are many different types of these objects, each with their own appearance.
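To make the illumination and viewpoint points concrete, here is a small sketch on a synthetic "image" (random pixel values, an assumption standing in for a real photo). It measures the raw-pixel L1 distance between the image and two copies showing the same content: one uniformly brightened, one shifted by a single pixel. Neither change alters what the picture depicts, yet both give large pixel-level distances. `l1_distance` is a hypothetical helper written for this illustration.

```python
import numpy as np

def l1_distance(a, b):
    # Hypothetical helper: sum of absolute pixel differences.
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    return int(np.sum(np.abs(a.astype(np.int64) - b.astype(np.int64))))

rng = np.random.default_rng(0)
# Synthetic 32x32 RGB image; values kept in [0, 200] so brightening stays in range.
image = rng.integers(0, 201, size=(32, 32, 3), dtype=np.uint8)

# Illumination change: same content, every pixel 40 levels brighter.
brighter = (image.astype(np.int64) + 40).astype(np.uint8)

# Small viewpoint change: content shifted one pixel to the right
# (np.roll wraps the last column around to the first).
shifted = np.roll(image, shift=1, axis=1)

print("L1(image, brighter):", l1_distance(image, brighter))  # 40 * 32*32*3 = 122880
print("L1(image, shifted): ", l1_distance(image, shifted))
```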
```python
import os
import pickle

import numpy as np


class NearestNeighbor(object):
    """Nearest Neighbor classifier using the L1 (Manhattan) distance."""

    def __init__(self):
        pass

    def train(self, X, y):
        # "Training" just memorizes the data. Cast once here (instead of on
        # every predict iteration) so that subtracting uint8 pixels cannot wrap.
        self.Xtr = X.astype(np.int32)
        self.ytr = y

    def predict(self, X):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from the i-th test image to every training image
            distances = np.sum(np.abs(self.Xtr - X[i, :].astype(np.int32)), axis=1)
            min_index = np.argmin(distances)  # index of the closest training image
            Ypred[i] = self.ytr[min_index]    # predict that image's label
        return Ypred


def load_CIFAR10(file_path):
    """Load the Python version of CIFAR-10 (data_batch_1..5 and test_batch)."""
    train_data, train_label = [], []
    test_data, test_label = None, None
    for fname in sorted(os.listdir(file_path)):
        full_path = os.path.join(file_path, fname)
        if fname.startswith('data_batch'):
            with open(full_path, 'rb') as fo:
                # The batches were pickled under Python 2; latin1 decodes them correctly.
                batch = pickle.load(fo, encoding='latin1')
            train_data.append(batch['data'])
            train_label.extend(batch['labels'])
        elif fname.startswith('test_batch'):
            with open(full_path, 'rb') as fo:
                batch = pickle.load(fo, encoding='latin1')
            test_data = batch['data']
            test_label = batch['labels']
    return (np.concatenate(train_data), np.array(train_label),
            test_data, np.array(test_label))


if __name__ == "__main__":
    Xtr, Ytr, Xte, Yte = load_CIFAR10('/home/jeremy/Data/cifar-10-batches-py/')
    # Each row is already a flattened 32x32x3 image; reshape makes that explicit.
    Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
    Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)
    nn = NearestNeighbor()
    nn.train(Xtr_rows, Ytr)
    Yte_predict = nn.predict(Xte_rows)
    print("accuracy: %f" % np.mean(Yte_predict == Yte))
```
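The integer casts in the listing matter because CIFAR-10 stores pixels as `uint8`. Unsigned 8-bit subtraction wraps around modulo 256 instead of going negative, so taking `np.abs` afterwards gives the wrong distance. A minimal demonstration on two hand-picked pixel pairs:

```python
import numpy as np

a = np.array([3, 200], dtype=np.uint8)
b = np.array([5, 100], dtype=np.uint8)

# uint8 arithmetic wraps: 3 - 5 becomes 254, and abs() cannot undo that
wrapped = np.abs(a - b)
# signed arithmetic gives the intended |3 - 5| = 2
correct = np.abs(a.astype(np.int32) - b.astype(np.int32))

print(wrapped)  # [254 100]
print(correct)  # [  2 100]
```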
L1 vs. L2.
The code above uses the L1 distance, d1(I1, I2) = Σ_p |I1[p] - I2[p]|, where the sum runs over all pixels p. The other common choice is the L2 (Euclidean) distance, d2(I1, I2) = sqrt(Σ_p (I1[p] - I2[p])²). It is interesting to consider differences between the two metrics. In particular, the L2 distance is much more unforgiving than the L1 distance when it comes to differences between two vectors. That is, the L2 distance prefers many medium disagreements to one big one. L1 and L2 distances (or equivalently the L1/L2 norms of the differences between a pair of images) are the most commonly used special cases of a p-norm.
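A quick numeric check of "the L2 distance prefers many medium disagreements to one big one": the two hypothetical difference vectors below have the same L1 distance from zero, but L2 squares each component before summing, so the single large disagreement ends up twice as far away.

```python
import numpy as np

def l1(a, b):
    # sum of absolute differences
    return np.sum(np.abs(a - b))

def l2(a, b):
    # square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))

zero = np.zeros(4)
one_big = np.array([12.0, 0.0, 0.0, 0.0])     # one large disagreement
many_medium = np.array([3.0, 3.0, 3.0, 3.0])  # four medium disagreements

print(l1(zero, one_big), l1(zero, many_medium))  # 12.0 12.0
print(l2(zero, one_big), l2(zero, many_medium))  # 12.0 6.0
```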
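In the algorithm above, we predict the label of the single most similar training image. A natural improvement is to take the k most similar training images and let their labels vote on the final result (the k-Nearest Neighbor classifier). This raises a question, though: how should we choose the value of k?
In fact, k is not the only thing we have to choose; which distance metric to use (L1 or L2) is also our choice.
We collectively call these settings hyperparameters.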
These choices are called hyperparameters and they come up very often in the design of many Machine Learning algorithms that learn from data. It’s often not obvious what values/settings one should choose.
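A minimal sketch of the k-nearest-neighbor vote described above, with both hyperparameters (k and the distance metric) exposed as arguments. `knn_predict` is a hypothetical helper written for these notes, not code from the course; labels are assumed to be non-negative integers (as in CIFAR-10) so that `np.bincount` can count the votes. The toy data includes one mislabeled point near the origin to show how a larger k can outvote it.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=1, metric="l1"):
    """Hypothetical k-NN helper: majority vote over the k closest training rows."""
    Xtr = Xtr.astype(np.float64)
    Xte = Xte.astype(np.float64)
    ypred = np.empty(Xte.shape[0], dtype=ytr.dtype)
    for i, x in enumerate(Xte):
        diff = Xtr - x
        if metric == "l1":
            distances = np.sum(np.abs(diff), axis=1)
        elif metric == "l2":
            distances = np.sqrt(np.sum(diff ** 2, axis=1))
        else:
            raise ValueError("metric must be 'l1' or 'l2'")
        nearest = np.argsort(distances)[:k]  # indices of the k closest training rows
        votes = np.bincount(ytr[nearest])    # votes[c] = how many neighbors have label c
        ypred[i] = np.argmax(votes)          # ties go to the smallest label
    return ypred

# Toy data: two clusters, plus one mislabeled (label 1) point at (0.4, 0.4).
Xtr = np.array([[0, 0], [1, 0], [0, 1], [0.4, 0.4], [10, 10], [11, 10], [10, 11]])
ytr = np.array([0, 0, 0, 1, 1, 1, 1])
Xte = np.array([[0.5, 0.5], [10.5, 10.5]])

print(knn_predict(Xtr, ytr, Xte, k=1, metric="l2"))  # [1 1]  follows the outlier
print(knn_predict(Xtr, ytr, Xte, k=3, metric="l2"))  # [0 1]  outlier is outvoted
```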
You might be tempted to suggest that we should try out many different values and see what works best. That is a fine idea and that’s indeed what we will do, but this must be done very carefully. In particular, we cannot use the test set for the purpose of tweaking hyperparameters. Whenever you’re designing Machine Learning algorithms, you should think of the test set as a very precious resource that should ideally never be touched until one time at the very end.
If you only use the test set once at the end, it remains a good proxy for measuring the generalization of your classifier (we will see much more discussion surrounding generalization later in the class).
Evaluate on the test set only a single time, at the very end.
Although we cannot use the test set, we can use a substitute "fake test set" instead: the validation set. The validation set is made up of a small portion of the original training set:
Original Train set = Train set + Validation set
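A minimal sketch of tuning k on a held-out validation split. It assumes synthetic data (two Gaussian blobs) in place of CIFAR-10 and a compact hypothetical `knn_predict` helper (L2 distance, majority vote); the 80/20 split and the list of k values are arbitrary choices for illustration. The test set never appears in the loop: it would be used once, with the chosen k, at the very end.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k):
    # Hypothetical helper: L2 distance from every test row to every training row.
    d = np.sqrt(((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[idx]).argmax() for idx in nearest])

rng = np.random.default_rng(0)
# Synthetic two-class data standing in for the original training set.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Original Train set = Train set + Validation set (here 80% / 20%).
perm = rng.permutation(len(X))
train_idx, val_idx = perm[:160], perm[160:]
Xtr, ytr = X[train_idx], y[train_idx]
Xval, yval = X[val_idx], y[val_idx]

validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:
    acc = np.mean(knn_predict(Xtr, ytr, Xval, k) == yval)
    validation_accuracies.append((k, acc))
    print("k = %3d, validation accuracy = %.3f" % (k, acc))

best_k, best_acc = max(validation_accuracies, key=lambda t: t[1])
print("best k on the validation set:", best_k)
```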
In practice, people prefer to avoid cross-validation in favor of a single validation split, since cross-validation can be computationally expensive. Typical splits use 50%-90% of the training data for training and the rest for validation. However, this depends on multiple factors: for example, if the number of hyperparameters is large, you may prefer bigger validation splits. If the number of examples in the validation set is small (perhaps only a few hundred or so), it is safer to use cross-validation. Typical choices seen in practice are 3-fold, 5-fold, or 10-fold cross-validation.
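A sketch of 5-fold cross-validation for choosing k, again assuming synthetic data and a hypothetical `knn_predict` helper (L2 distance, majority vote). The training data is shuffled and split into 5 folds; each fold serves once as the validation set while the other 4 are used for training, and the 5 accuracies are averaged for each candidate k. `cross_validate_k` is likewise a name made up for this illustration.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k):
    # Hypothetical helper: L2 distance + majority vote over the k nearest rows.
    d = np.sqrt(((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[idx]).argmax() for idx in nearest])

def cross_validate_k(X, y, k_choices, num_folds=5, seed=0):
    """Return {k: mean validation accuracy over num_folds folds}."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), num_folds)
    results = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            val_idx = folds[i]  # fold i is the validation set this round
            train_idx = np.concatenate([folds[j] for j in range(num_folds) if j != i])
            pred = knn_predict(X[train_idx], y[train_idx], X[val_idx], k)
            accs.append(np.mean(pred == y[val_idx]))
        results[k] = float(np.mean(accs))
    return results

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

cv_results = cross_validate_k(X, y, k_choices=[1, 3, 5, 10, 20])
for k, acc in cv_results.items():
    print("k = %2d, mean 5-fold accuracy = %.3f" % (k, acc))
best_k = max(cv_results, key=cv_results.get)
print("best k by cross-validation:", best_k)
```

Compared with the single split, every example is used for validation exactly once, which is why this is the safer choice when the data set is small, at roughly num_folds times the compute.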
In summary: