CS231n 2018 Assignment 1: SVM

Multiclass Support Vector Machine exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights
In [2]:
# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

from __future__ import print_function

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing

In [3]:
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except NameError:
    # Nothing has been loaded yet on the first run.
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)
In [4]:
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
[Figure: sample CIFAR-10 training images, seven per class]
In [5]:
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)
In [6]:
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)
In [7]:
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
[130.64189796 135.98173469 132.47391837 130.05569388 135.34804082
 131.75402041 130.96055102 136.14328571 132.47636735 131.48467347]
[Figure: the mean training image]
In [8]:
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
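The mean is computed from the training split only and then subtracted from every split, so no statistics from the validation or test data leak into preprocessing. A minimal sketch with small random arrays (X_tr and X_va are hypothetical stand-ins for X_train and X_val):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.uniform(0, 255, size=(100, 6))   # toy "training images", 6 features each
X_va = rng.uniform(0, 255, size=(20, 6))    # toy "validation images"

mean_image = X_tr.mean(axis=0)              # statistics from the training split only
X_tr_centered = X_tr - mean_image           # broadcasting subtracts per feature
X_va_centered = X_va - mean_image           # reuse the *training* mean

# Training features are now zero-mean; validation features are only approximately so.
print(np.abs(X_tr_centered.mean(axis=0)).max())
```

The notebook's in-place form (X_train -= mean_image) needs a floating-point array; NumPy refuses to write a float result back into an integer array such as uint8 pixels.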
In [9]:
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
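The bias trick can be checked on a toy problem: appending a column of ones to X and stacking the bias vector b under W gives exactly the same scores as X·W + b. The sizes below are hypothetical and only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 5, 4, 3                          # toy sizes (hypothetical); CIFAR-10 uses D = 3072, C = 10
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C))
b = rng.standard_normal(C)

# Bias trick: append a constant-1 feature to X and stack b as an extra last row of W.
X_aug = np.hstack([X, np.ones((N, 1))])   # shape (N, D + 1)
W_aug = np.vstack([W, b])                 # shape (D + 1, C)

scores_separate = X.dot(W) + b            # explicit bias
scores_folded = X_aug.dot(W_aug)          # bias folded into the weights
print(np.allclose(scores_separate, scores_folded))  # True
```

This is why the data grows from 3072 to 3073 columns, W is created with 3073 rows, and the weight visualization later strips the last row with best_svm.W[:-1,:].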

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.
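For reference, the multiclass SVM loss that svm_loss_naive evaluates, written with scores s_ij = x_i · w_j (w_j is column j of W), a margin Δ (assumed to be 1, as in the starter code) and regularization strength λ (the reg argument, assumed to multiply the sum of squared weights), together with the gradient that the next step asks you to derive. The notation is ours, not the starter code's:

```latex
\begin{aligned}
L &= \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq y_i} \max\left(0,\; s_{ij} - s_{iy_i} + \Delta\right)
     + \lambda \sum_{k,l} W_{kl}^{2},
  \qquad s_{ij} = x_i^{\top} w_j \\
\nabla_{w_j} L &= \frac{1}{N}\sum_{i:\, y_i \neq j} \mathbb{1}\left(s_{ij} - s_{iy_i} + \Delta > 0\right) x_i
  - \frac{1}{N}\sum_{i:\, y_i = j} \Bigl(\sum_{k \neq j} \mathbb{1}\left(s_{ik} - s_{ij} + \Delta > 0\right)\Bigr) x_i
  + 2\lambda w_j
\end{aligned}
```

Each positive margin pushes the wrong class's weights toward x_i and the correct class's weights away from it.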

In [10]:
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))
loss: 9.024059

The grad returned from the function above is currently all zeros. Derive the gradient of the SVM cost function and implement it inline inside svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.
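One way to fill in the gradient is to accumulate it inside the same loops that compute the loss. The sketch below is a hypothetical standalone version (svm_loss_naive_sketch, not the notebook's function), assuming the starter code's conventions: W of shape (D, C), a margin of 1, and an L2 penalty of reg * sum(W**2).

```python
import numpy as np

def svm_loss_naive_sketch(W, X, y, reg, delta=1.0):
    """Multiclass SVM (hinge) loss and gradient with explicit loops.

    W: (D, C) weights, X: (N, D) data, y: (N,) integer labels.
    Assumes an L2 penalty of reg * sum(W**2) and a margin of delta = 1.
    """
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_classes = W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_score + delta
            if margin > 0:
                loss += margin
                # d(margin)/dW[:, j] = X[i] and d(margin)/dW[:, y[i]] = -X[i]
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
    # Average the data term over the examples, then add the regularization term.
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW
```

For a near-zero W every margin is about 1, so the loss is close to C - 1; that is why the loss printed above for 10 classes is about 9.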

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:
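The idea behind a sparse gradient check is to perturb a few randomly chosen entries of W by ±h, take a centered difference of the loss, and compare it with the analytic gradient through a relative error. The helper below is a hypothetical sketch of that idea; the provided grad_check_sparse may differ in its defaults and details.

```python
import numpy as np

def grad_check_sparse_sketch(f, x, analytic_grad, num_checks=10, h=1e-5, seed=0):
    """Compare analytic_grad with centered differences of f at random entries of x.

    Hypothetical helper; x must be a float array (it is perturbed in place and restored).
    Returns the relative error for each sampled entry.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(num_checks):
        ix = tuple(rng.integers(0, n) for n in x.shape)   # one random entry
        old = x[ix]
        x[ix] = old + h
        f_plus = f(x)
        x[ix] = old - h
        f_minus = f(x)
        x[ix] = old                                       # restore the original value
        numerical = (f_plus - f_minus) / (2 * h)
        analytic = analytic_grad[ix]
        denom = abs(numerical) + abs(analytic)
        rel = 0.0 if denom == 0 else abs(numerical - analytic) / denom
        print('numerical: %f analytic: %f, relative error: %e' % (numerical, analytic, rel))
        errors.append(rel)
    return errors
```

As a quick usage example, f(v) = sum(v**2) has gradient 2v, so every sampled entry should report a tiny relative error.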

In [11]:
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)
numerical: 9.652329 analytic: 9.652329, relative error: 2.631778e-12
numerical: 3.027077 analytic: 3.027077, relative error: 2.894781e-11
numerical: -1.506732 analytic: -1.506732, relative error: 1.506061e-11
numerical: -16.596870 analytic: -16.596870, relative error: 5.994837e-12
numerical: 6.837730 analytic: 6.837730, relative error: 2.072676e-11
numerical: -4.684541 analytic: -4.684541, relative error: 1.920132e-11
numerical: -9.329935 analytic: -9.329935, relative error: 1.519624e-11
numerical: -33.272571 analytic: -33.272571, relative error: 8.557246e-12
numerical: -40.044622 analytic: -40.044622, relative error: 1.399815e-11
numerical: -25.729769 analytic: -25.729769, relative error: 1.288088e-11
numerical: -10.181922 analytic: -10.181922, relative error: 7.999948e-12
numerical: -0.695167 analytic: -0.695167, relative error: 3.820667e-10
numerical: 6.849284 analytic: 6.849284, relative error: 1.342249e-11
numerical: 34.644642 analytic: 34.644642, relative error: 9.415071e-12
numerical: -9.204112 analytic: -9.204112, relative error: 8.060574e-12
numerical: 6.290410 analytic: 6.290410, relative error: 2.208406e-11
numerical: -31.186315 analytic: -31.186315, relative error: 5.067459e-12
numerical: -6.503792 analytic: -6.503792, relative error: 1.688283e-11
numerical: 34.955946 analytic: 34.967654, relative error: 1.674422e-04
numerical: -8.008497 analytic: -8.008497, relative error: 1.713785e-11

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: strictly speaking, the SVM loss function is not differentiable.

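Your Answer: The hinge loss is not differentiable at its kink (where a margin is exactly zero), so the loss function is not continuously differentiable. When the numerical step straddles a kink, the numerical and analytic gradients disagree; this is the likely cause of the one dimension above whose relative error is 1.674422e-04. Reducing the step size h can reduce the probability of this happening.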
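A one-dimensional example of a failing gradient check, using f(x) = max(0, x) (the shape of a single hinge term) as a stand-in for the SVM loss. Just to the right of the kink the true derivative is 1, but a centered difference whose step straddles the kink reports a value in between:

```python
def relu(x):
    """max(0, x): a single hinge with its kink at 0."""
    return max(0.0, x)

def centered_diff(f, x, h):
    """Centered finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1e-6         # just to the right of the kink
analytic = 1.0   # the true derivative of max(0, x) for x > 0

print(centered_diff(relu, x, 1e-5))  # about 0.55: x - h falls on the flat side of the kink
print(centered_diff(relu, x, 1e-7))  # about 1.0: both points stay on the linear piece
```

With the smaller step both evaluation points stay on the same linear piece, which is consistent with the answer above: a smaller h makes straddling a kink less likely, although it can never rule it out.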

In [23]:
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))
Naive loss: 9.024059e+00 computed in 0.163455s
Vectorized loss: 9.024059e+00 computed in 0.008023s
difference: 0.000000
In [24]:
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)
Naive loss and gradient: computed in 0.142701s
Vectorized loss and gradient: computed in 0.007845s
difference: 0.000000
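A minimal vectorized sketch of the loss and gradient, under the same assumptions as a loop-based version (W of shape (D, C), margin 1, penalty reg * sum(W**2)). The name svm_loss_vectorized_sketch is hypothetical. The key step is an (N, C) coefficient matrix that records, for each example, +1 for every class with a positive margin and minus the count of such classes for the correct class, so the whole gradient becomes one matrix product:

```python
import numpy as np

def svm_loss_vectorized_sketch(W, X, y, reg, delta=1.0):
    """Vectorized multiclass SVM loss and gradient (no Python loops).

    W: (D, C), X: (N, D), y: (N,) integer labels.
    Assumes an L2 penalty of reg * sum(W**2) and a margin of delta = 1.
    """
    num_train = X.shape[0]
    rows = np.arange(num_train)
    scores = X.dot(W)                                   # (N, C) class scores
    correct = scores[rows, y][:, np.newaxis]            # (N, 1) correct-class scores
    margins = np.maximum(0, scores - correct + delta)   # (N, C) hinge terms
    margins[rows, y] = 0                                # the correct class adds no loss
    loss = margins.sum() / num_train + reg * np.sum(W * W)

    # +1 where a margin is positive; the correct class gets minus the number of positives.
    coeff = (margins > 0).astype(float)
    coeff[rows, y] = -coeff.sum(axis=1)
    dW = X.T.dot(coeff) / num_train + 2 * reg * W
    return loss, dW
```

Replacing the nested Python loops with a few array operations is what produces the roughly 18 to 20 times speedup in the timings above.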

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss and the gradient, and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.
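Minibatch SGD repeatedly samples a batch, evaluates the loss and gradient on it, and steps against the gradient. The sketch below mirrors what the LinearSVM.train call is expected to do, but it is a hypothetical standalone function; the starter code's LinearClassifier.train may differ in details such as initialization and whether batches are sampled with replacement (assumed here, for speed).

```python
import numpy as np

def sgd_train_sketch(loss_fn, X, y, num_classes, learning_rate=1e-3, reg=1e-5,
                     num_iters=100, batch_size=200, seed=0):
    """Minibatch SGD for a linear classifier (hypothetical helper).

    loss_fn(W, X_batch, y_batch, reg) must return (loss, dW).
    Returns the learned W and the per-iteration loss history.
    """
    rng = np.random.default_rng(seed)
    num_train, dim = X.shape
    W = 0.001 * rng.standard_normal((dim, num_classes))  # small random initialization
    loss_history = []
    for _ in range(num_iters):
        # Sample a minibatch with replacement (cheaper than reshuffling every step).
        idx = rng.choice(num_train, batch_size, replace=True)
        loss, dW = loss_fn(W, X[idx], y[idx], reg)
        loss_history.append(loss)
        W -= learning_rate * dW   # vanilla gradient step
    return W, loss_history
```

In the notebook, loss_fn would be svm_loss_vectorized, whose (W, X, y, reg) argument order matches the one assumed here.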

In [25]:
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))
iteration 0 / 1500: loss 17.422355
iteration 100 / 1500: loss 8.242404
iteration 200 / 1500: loss 6.312624
iteration 300 / 1500: loss 5.756854
iteration 400 / 1500: loss 5.143419
iteration 500 / 1500: loss 5.038155
iteration 600 / 1500: loss 5.276090
iteration 700 / 1500: loss 5.720771
iteration 800 / 1500: loss 5.172697
iteration 900 / 1500: loss 5.215311
iteration 1000 / 1500: loss 5.389784
iteration 1100 / 1500: loss 5.734771
iteration 1200 / 1500: loss 5.009499
iteration 1300 / 1500: loss 5.207837
iteration 1400 / 1500: loss 5.394275
That took 10.697339s
In [26]:
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()
[Figure: training loss versus iteration number]
In [27]:
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))
training accuracy: 0.372388
validation accuracy: 0.368000
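Prediction for a linear classifier is just the highest-scoring class for each row, and accuracy is the fraction of matching labels. A hypothetical stand-in for LinearSVM.predict, with a tiny hand-checkable example:

```python
import numpy as np

def predict_sketch(W, X):
    """Hypothetical stand-in for LinearSVM.predict: highest-scoring class per row."""
    scores = X.dot(W)                  # (N, C) class scores
    return np.argmax(scores, axis=1)   # (N,) predicted labels

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
X = np.array([[1.0, 0.0],              # scores [0, 1] -> class 1
              [0.0, 1.0],              # scores [1, 0] -> class 0
              [2.0, 3.0]])             # scores [3, 2] -> class 0
y = np.array([1, 0, 1])
y_pred = predict_sketch(W, X)
print(y_pred)                          # [1 0 0]
print(np.mean(y == y_pred))            # 2 of 3 correct, about 0.667
```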
In [33]:
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [2e-7, 5e-7]
regularization_strengths = [2.5e-5, 5e-5]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
# Your code
itertimes = 1201
for learningrate in learning_rates:
    for reg in regularization_strengths:
        clf = LinearSVM()
        clf.train(X_train, y_train, learning_rate=learningrate, reg=reg,
                  num_iters=itertimes, batch_size=2000, verbose=True)
        y_val_pred = clf.predict(X_val)
        acc_val = np.mean(y_val == y_val_pred, dtype=np.float32)
        y_train_pred = clf.predict(X_train)
        acc_train = np.mean(y_train == y_train_pred, dtype=np.float32)
        results[(learningrate, reg)] = (acc_train, acc_val)
        if acc_val > best_val:
            best_val = acc_val
            best_svm = clf
#                              END OF YOUR CODE                                #
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)
iteration 0 / 1201: loss 9.396445
iteration 100 / 1201: loss 4.804635
iteration 200 / 1201: loss 4.668281
iteration 300 / 1201: loss 4.400242
iteration 400 / 1201: loss 4.233463
iteration 500 / 1201: loss 4.461487
iteration 600 / 1201: loss 4.312132
iteration 700 / 1201: loss 4.153705
iteration 800 / 1201: loss 4.215334
iteration 900 / 1201: loss 4.230379
iteration 1000 / 1201: loss 4.337701
iteration 1100 / 1201: loss 4.384940
iteration 1200 / 1201: loss 3.968596
iteration 0 / 1201: loss 9.306032
iteration 100 / 1201: loss 4.679320
iteration 200 / 1201: loss 4.513356
iteration 300 / 1201: loss 4.513817
iteration 400 / 1201: loss 4.465918
iteration 500 / 1201: loss 4.291443
iteration 600 / 1201: loss 4.231527
iteration 700 / 1201: loss 4.133021
iteration 800 / 1201: loss 4.179305
iteration 900 / 1201: loss 4.263181
iteration 1000 / 1201: loss 4.392984
iteration 1100 / 1201: loss 4.393957
iteration 1200 / 1201: loss 4.160691
iteration 0 / 1201: loss 9.299597
iteration 100 / 1201: loss 4.578729
iteration 200 / 1201: loss 4.287941
iteration 300 / 1201: loss 4.235768
iteration 400 / 1201: loss 4.336439
iteration 500 / 1201: loss 4.198212
iteration 600 / 1201: loss 4.095109
iteration 700 / 1201: loss 4.378531
iteration 800 / 1201: loss 4.099602
iteration 900 / 1201: loss 3.984169
iteration 1000 / 1201: loss 4.101741
iteration 1100 / 1201: loss 4.026497
iteration 1200 / 1201: loss 4.092931
iteration 0 / 1201: loss 8.793367
iteration 100 / 1201: loss 4.408832
iteration 200 / 1201: loss 4.427092
iteration 300 / 1201: loss 4.164823
iteration 400 / 1201: loss 4.229861
iteration 500 / 1201: loss 4.046057
iteration 600 / 1201: loss 4.185190
iteration 700 / 1201: loss 4.040508
iteration 800 / 1201: loss 3.923125
iteration 900 / 1201: loss 4.109174
iteration 1000 / 1201: loss 4.003199
iteration 1100 / 1201: loss 4.039926
iteration 1200 / 1201: loss 4.274549
lr 2.000000e-07 reg 2.500000e-05 train accuracy: 0.414184 val accuracy: 0.399000
lr 2.000000e-07 reg 5.000000e-05 train accuracy: 0.412571 val accuracy: 0.400000
lr 5.000000e-07 reg 2.500000e-05 train accuracy: 0.424224 val accuracy: 0.419000
lr 5.000000e-07 reg 5.000000e-05 train accuracy: 0.417592 val accuracy: 0.401000
best validation accuracy achieved during cross-validation: 0.419000
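The grid above tries only two values per hyperparameter. A common alternative is random search with values drawn uniformly on a log scale, which matches the log axes of the plot that follows. This is a hypothetical helper illustrating that swapped-in technique, not part of the starter code; the ranges in the example simply span the grid values used above:

```python
import numpy as np

def sample_hyperparams_sketch(num_samples, lr_range, reg_range, seed=0):
    """Random search: draw (learning_rate, reg) pairs uniformly in log10 space."""
    rng = np.random.default_rng(seed)
    log_lr = rng.uniform(np.log10(lr_range[0]), np.log10(lr_range[1]), size=num_samples)
    log_reg = rng.uniform(np.log10(reg_range[0]), np.log10(reg_range[1]), size=num_samples)
    return [(10.0 ** a, 10.0 ** b) for a, b in zip(log_lr, log_reg)]

# Ranges spanning the grid values tried above (an illustrative choice, not tuned).
pairs = sample_hyperparams_sketch(8, lr_range=(2e-7, 5e-7), reg_range=(2.5e-5, 5e-5))
for lr, reg in pairs:
    print('lr %e reg %e' % (lr, reg))
```

Each sampled pair would then be trained and scored exactly like one iteration of the nested loop above.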
In [34]:
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100  # marker area in points^2
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results]
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()
[Figure: training and validation accuracy for each (learning rate, regularization strength) pair]
In [35]:
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)
linear SVM on raw pixels final test set accuracy: 0.385000
In [36]:
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
[Figure: learned SVM weights for each class, visualized as images]

Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.

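Your answer: From a signal-detection perspective, this is correlation detection: each class's weights act as a template, and the classifier checks which class's template the image has the largest correlation with.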
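The correlation-detection view can be illustrated with toy data: each column of W acts as a class template, and the score for a class is the inner product of the image with that template (strictly an unnormalized correlation rather than a correlation coefficient). The templates below are random orthonormal vectors, a hypothetical stand-in for learned CIFAR-10 weights:

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 12, 3                                   # toy image size and class count (hypothetical)
# Orthonormal columns stand in for three clearly distinct class templates.
templates, _ = np.linalg.qr(rng.standard_normal((D, C)))

# A test "image": class 2's template plus bounded noise (norm at most 0.05 * sqrt(12) < 0.5).
x = templates[:, 2] + 0.05 * rng.uniform(-1.0, 1.0, size=D)

scores = x.dot(templates)                      # inner product with each template
print(scores.argmax())                         # 2: the best-matching template wins
```

Because the noise norm is below 0.5, the matching template's score stays above every other score, so the argmax is guaranteed to pick class 2.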
