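This series records my notes for the cs231n course. After each lecture I write things down to check how much I actually understood.
cs231n is the deep learning and computer vision course taught at Stanford by Fei-Fei Li.
Course page:
http://vision.stanford.edu/teaching/cs231n/syllabus.html
Course slides:
https://github.com/autoliuweijie/DeepLearning/tree/master/cs231n/Slides
A Zhihu column has translated the course notes:
https://zhuanlan.zhihu.com/p/21930884?refer=intelligentunit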
1、Image classification
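Image classification is harder than it looks, because many problems come up in practice: camera viewpoint (rotation, translation, etc.), illumination, deformation of the object, occlusion, background clutter, variation within a class (e.g. different breeds of cat), and so on.
Despite all these problems, deep learning methods can be very robust.
2、Data-driven classification
Deep learning takes a data-driven approach to classification. Data-driven means that a large amount of data is collected in a preparation stage, a model is trained on these samples, and the trained model is then used to classify input images.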
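3、KNN classifier
A KNN classifier searches the training samples for the K samples closest to the test sample, then looks at how the labels of those K samples are distributed and assigns the most frequent label to the test sample.
This raises a question: how do we measure closeness?
Here we use the L1 distance or the L2 distance. Which metric to use can be treated as a hyperparameter.
K can be treated as a hyperparameter as well.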
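To make the two distance metrics concrete, here is a minimal sketch that computes both for a pair of vectors. The vectors `a` and `b` are made-up values, not real image data.

```python
import numpy as np

# Two tiny made-up "images", flattened to vectors (illustrative values only).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 0.0, 3.0, 8.0])

# L1 (Manhattan) distance: sum of absolute differences.
l1 = np.sum(np.abs(a - b))

# L2 (Euclidean) distance: square root of the sum of squared differences.
l2 = np.sqrt(np.sum(np.square(a - b)))

print(l1, l2)
```

The two metrics can rank neighbours differently, which is why the choice is tuned like any other hyperparameter.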
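So how should the hyperparameters be chosen?
Should we simply try every setting and keep the one that performs best? Doing this against the test set is not acceptable: hyperparameters chosen for their test-set performance tend to overfit the test set.
The usual approach is k-fold cross-validation, for example 5-fold. Split the training samples into 5 subsets; each subset serves once as the validation set while the other four form the training set. The procedure is therefore repeated 5 times, and the accuracy averaged over the 5 runs is taken as the result for that hyperparameter setting.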
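The fold bookkeeping described above can be sketched as follows. This is only an illustration on a hypothetical toy dataset (`X`, `y` are invented here); it shows how each fold takes its turn as the validation set and does not train anything.

```python
import numpy as np

num_folds = 5
# Hypothetical toy dataset: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2

# np.array_split returns a list of num_folds arrays.
X_folds = np.array_split(X, num_folds)
y_folds = np.array_split(y, num_folds)

splits = []
for j in range(num_folds):
    # Fold j is held out for validation; the other folds form the training set.
    X_tr = np.vstack(X_folds[:j] + X_folds[j + 1:])
    y_tr = np.hstack(y_folds[:j] + y_folds[j + 1:])
    X_val, y_val = X_folds[j], y_folds[j]
    splits.append((X_tr, y_tr, X_val, y_val))
    print(j, X_tr.shape, X_val.shape)
```

Every sample ends up in the validation set exactly once, so no data is wasted and the real test set stays untouched until the very end.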
The concrete KNN steps described in the assignment:
We would now like to classify the test data with the kNN classifier.
Recall that we can break down this process into two steps:
- First we must compute the distances between all test examples and all train examples.
- Given these distances, for each test example we find the k nearest examples and have them vote for the label
Lets begin with computing the distance matrix between all training and
test examples. For example, if there are Ntr training examples and Nte
test examples, this stage should result in a Nte x Ntr matrix where
each element (i,j) is the distance between the i-th test and j-th
train example.
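The final result is an Nte x Ntr matrix.
The L2 distance is used here. For two vectors a and b it is d(a, b) = sqrt(sum_p (a_p - b_p)^2).
First, compute dists in the simplest way, with two nested loops.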
1、two loops
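import numpy as np

def compute_distances_two_loops(self, X):
    # Compute the L2 distance between every test point in X and every
    # training point in self.X_train, using two nested loops.
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Distance between the i-th test and the j-th training example.
            dists[i, j] = np.sqrt(np.sum(np.square(X[i, :] - self.X_train[j, :])))
    return dists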
Once we have dists, we use it to decide the class of each test image from its K nearest training images.
2、 predict labels
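def predict_labels(self, dists, k=1):
    # Given the distance matrix, predict a label for each test point.
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        # Training labels ordered from the nearest to the farthest example.
        labels = self.y_train[np.argsort(dists[i, :])]
        closest_y = labels[0:k]
        # Majority vote among the k nearest labels. This assumes non-negative
        # integer labels; ties go to the smaller label.
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred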
np.argsort returns the indices that would sort the array in ascending order.
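As a small illustration of how argsort feeds the vote, the sketch below works through one test example by hand. The distances and labels are made up, and the labels are assumed to be non-negative integers so that `np.bincount` can count them.

```python
import numpy as np

# Hypothetical distances from one test example to six training examples.
dist_row = np.array([0.9, 0.2, 0.7, 0.1, 0.4, 0.3])
# Hypothetical integer labels of those six training examples.
y_train = np.array([2, 1, 2, 1, 0, 0])

k = 3
order = np.argsort(dist_row)      # indices sorted by ascending distance
closest_y = y_train[order[:k]]    # labels of the k nearest neighbours
# Majority vote: count each label and take the most frequent one.
prediction = np.argmax(np.bincount(closest_y))

print(order, closest_y, prediction)
```

With k = 3 the nearest labels are two votes for class 1 and one vote for class 0, so the prediction is class 1.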
3、one loop
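def compute_distances_one_loop(self, X):
    # Compute the distances with a single loop over the test points.
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        # Broadcasting subtracts X[i, :] from every training row at once;
        # axis=1 then sums the squared differences within each row.
        dists[i, :] = np.sqrt(np.sum(np.square(self.X_train - X[i, :]), axis=1))
    return dists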
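On the use of axis:
Keep in mind that axis=1 operates within each row, so for a matrix whose rows are vectors it gives one result per vector.
axis=0 operates down each column, combining the rows element by element and giving one result per column.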
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
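To connect axis back to the one-loop distance code, here is a minimal sketch on a hypothetical 3 x 2 training matrix. The values are chosen by hand so the distances are easy to check.

```python
import numpy as np

# Hypothetical training set: 3 examples with 2 features each.
X_train = np.array([[0.0, 0.0],
                    [3.0, 4.0],
                    [6.0, 8.0]])
x = np.array([0.0, 0.0])  # one test example

diff = X_train - x                         # broadcasting: x is subtracted from every row
sq_sum = np.sum(np.square(diff), axis=1)   # one sum per row, shape (3,)
dists_row = np.sqrt(sq_sum)                # distance from x to each training example

col_sum = np.sum(X_train, axis=0)          # one sum per column, shape (2,)

print(dists_row, col_sum)
```

Summing with axis=1 leaves one number per training example, which is exactly one row of the dists matrix.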
4、matrix
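def compute_distances_no_loops(self, X):
    # Uses ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, with no explicit loops.
    M = np.dot(X, self.X_train.T)               # cross terms, shape (num_test, num_train)
    te = np.square(X).sum(axis=1)               # squared norms of the test rows
    tr = np.square(self.X_train).sum(axis=1)    # squared norms of the training rows
    # te[:, np.newaxis] is a column, so it broadcasts against the row tr.
    # Unlike np.matrix(te).T, this keeps dists a plain ndarray, which
    # predict_labels needs when it indexes dists[i, :].
    dists = np.sqrt(-2 * M + tr + te[:, np.newaxis])
    return dists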
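Here the formula is expanded first, ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, and each term is then computed with a matrix operation.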
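One way to convince yourself that the expansion is right is to compare it against the naive double loop on small random data. The sketch below does that; the array sizes and the random seed are arbitrary choices of mine, and the `np.maximum(..., 0)` clamp is an extra safeguard against tiny negative values from floating-point rounding, not something in the code above.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(4, 5)        # hypothetical test data: 4 examples, 5 features
X_train = rng.rand(6, 5)  # hypothetical training data: 6 examples, 5 features

# Naive reference: explicit double loop.
naive = np.zeros((4, 6))
for i in range(4):
    for j in range(6):
        naive[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))

# Vectorized version based on ||x||^2 - 2 x.y + ||y||^2.
cross = X.dot(X_train.T)              # shape (4, 6)
te = np.sum(X ** 2, axis=1)           # shape (4,)
tr = np.sum(X_train ** 2, axis=1)     # shape (6,)
sq = te[:, np.newaxis] - 2 * cross + tr
vectorized = np.sqrt(np.maximum(sq, 0))  # clamp guards against rounding below zero

print(np.allclose(naive, vectorized))
```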
5、cross validation
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

# Split the training data and labels into num_folds folds.
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

# k_to_accuracies[k] holds one accuracy per fold for that value of k.
k_to_accuracies = {}
for k in k_choices:
    k_to_accuracies[k] = []

for k in k_choices:
    print('evaluating k=%d' % k)
    for j in range(num_folds):
        # Fold j is the validation set; the remaining folds are the training set.
        X_train_cv = np.vstack(X_train_folds[0:j] + X_train_folds[j+1:])
        X_test_cv = X_train_folds[j]
        y_train_cv = np.hstack(y_train_folds[0:j] + y_train_folds[j+1:])
        y_test_cv = y_train_folds[j]

        classifier.train(X_train_cv, y_train_cv)
        dists_cv = classifier.compute_distances_no_loops(X_test_cv)
        y_test_pred = classifier.predict_labels(dists_cv, k)

        # Divide by the size of this fold, not by the size of the real test set.
        num_correct = np.sum(y_test_pred == y_test_cv)
        accuracy = float(num_correct) / len(y_test_cv)
        k_to_accuracies[k].append(accuracy)

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
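Once the per-fold accuracies are collected, a natural next step is to average them for each k and keep the k with the highest mean. The sketch below shows one way to do that. The accuracy values are invented purely for illustration and are not results from the assignment.

```python
import numpy as np

# Invented accuracies for illustration only; NOT results from the assignment.
k_to_accuracies = {
    1: [0.50, 0.52, 0.51],
    5: [0.56, 0.58, 0.57],
    10: [0.54, 0.55, 0.56],
}

# Mean accuracy over the folds for each candidate k.
mean_acc = {k: np.mean(v) for k, v in k_to_accuracies.items()}

# Pick the k with the highest mean cross-validation accuracy.
best_k = max(mean_acc, key=mean_acc.get)

print(best_k)
```

The classifier would then be retrained on the full training set with `best_k` and evaluated once on the test set.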