CS231n Spring 2019 Assignment 1—two_layer_net/features

At this point only two ipynb assignments remain in assignment1: two_layer_net.ipynb and features.ipynb. With the groundwork laid in the earlier parts, neither is hard to finish. The course website has three notes on neural networks, but I did not really need their contents while actually doing the assignments:

  • Neural Networks Part 1: Setting up the Architecture
  • Neural Networks Part 2: Setting up the Data and the Loss
  • Neural Networks Part 3: Learning and Evaluation

Their content is still very important, covering preprocessing, activation functions, weight initialization, batch normalization, regularization, and optimization methods, so I may write a separate post later summarizing the key points. For these assignments, lecture 4 is the main thing to watch; its biggest takeaway is learning to derive backpropagation with a computational graph.
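As a tiny illustration of that computational-graph idea (not part of the assignment; the numbers are just the classic lecture example), here is how a forward and backward pass look when you multiply local gradients node by node:

# forward/backward pass through f(x, y, z) = (x + y) * z,
# applying the chain rule one node at a time
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y            # add node, q = 3.0
f = q * z            # multiply node, f = -12.0

# backward pass, starting from df/df = 1 at the output
df_dq = z            # multiply node: gradient w.r.t. q is the other input, z
df_dz = q            # multiply node: gradient w.r.t. z is the other input, q
df_dx = df_dq * 1.0  # add node passes the gradient through unchanged
df_dy = df_dq * 1.0

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0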

two_layer_net

Compared with the earlier linear models, the neural network model (here, a fully connected network) is a step up: it adds a nonlinearity, so it can handle problems that are not linearly separable. Before starting the assignment it is worth reading the tutorial note Putting it together: Minimal Neural Network Case Study, which shows a nonlinear problem that no single line or hyperplane can separate and that a neural network handles easily. A 2-layer fully connected network looks like this (the input layer is not counted when counting layers):

[Figure: schematic of a two-layer neural network model]

Each layer is really still a linear map, just followed by an activation. In this two_layer_net.ipynb the model architecture is:
input - fully connected layer - ReLU - fully connected layer - softmax
The assignment is much like the earlier softmax one, with one extra layer of computation: for backpropagation, just follow the architecture above and work through it node by node. The only tricky part is the matrix-on-matrix gradients; the supplementary handout the course provides, Backpropagation for a Linear Layer, makes them clear.
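The key results from that handout, for a linear layer Y = X·W + b (X of shape N×D, W of shape D×C) with upstream gradient dL/dY, are:

dL/dX = dL/dY · W^T                      (shape N×D)
dL/dW = X^T · dL/dY                      (shape D×C)
dL/db = sum of dL/dY over the N rows     (shape (C,))

Matching these shapes is exactly what the backward pass does; with them in hand, the loss part of TwoLayerNet is: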

def loss(self, X, y=None, reg=0.0):
    """
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape

    # Compute the forward pass
    scores = None
    #############################################################################
    # TODO: Perform the forward pass, computing the class scores for the input. #
    # Store the result in the scores variable, which should be an array of      #
    # shape (N, C).                                                             #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # layer 1
    hidden_in = X.dot(W1) + b1
    # pass hidden_in through the ReLU activation
    hidden_out = np.maximum(hidden_in, 0)
    # layer 2
    scores = hidden_out.dot(W2) + b2



    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # If the targets are not given then jump out, we're done
    if y is None:
        return scores

    # Compute the loss
    loss = 0.0
    #############################################################################
    # TODO: Finish the forward pass, and compute the loss. This should include  #
    # both the data loss and L2 regularization for W1 and W2. Store the result  #
    # in the variable loss, which should be a scalar. Use the Softmax           #
    # classifier loss.                                                          #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # for numerical stability, subtract each row's maximum before exponentiating
    scores = scores - np.max(scores, axis=1, keepdims=True)
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    correct_logprobs = -np.log(probs[range(N),y])
    data_loss = np.sum(correct_logprobs) / N
    # note: to match the checks in the notebook, the regularization term here
    # is NOT multiplied by the usual 0.5 factor
    reg_loss = reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    loss = data_loss + reg_loss


    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Backward pass: compute gradients
    grads = {}
    #############################################################################
    # TODO: Compute the backward pass, computing the derivatives of the weights #
    # and biases. Store the results in the grads dictionary. For example,       #
    # grads['W1'] should store the gradient on W1, and be a matrix of same size #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    dscores = probs.copy()
    dscores[range(N), y] -= 1
    dscores /= N
    # use dimension analysis to get each gradient's shape right
    grads['W2'] = np.dot(hidden_out.T, dscores) + 2 * reg * W2
    grads['b2'] = np.sum(dscores, axis=0)
    # do not forget the derivative of ReLU
    grad_hidden_out = np.dot(dscores, W2.T)
    grad_hidden_in = (hidden_out > 0) * grad_hidden_out

    grads['W1'] = np.dot(X.T, grad_hidden_in) + 2 * reg * W1
    grads['b1'] = np.sum(grad_hidden_in, axis=0)


    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, grads
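Before training, it is worth comparing these analytic gradients against a centered finite-difference estimate, which is essentially what the gradient-check cell in the notebook does. A minimal self-contained sketch (net is assumed to be a TwoLayerNet instance and X, y a small batch):

import numpy as np

def rel_error(a, b):
    # relative error used to compare analytic and numeric gradients
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

def numeric_gradient(f, w, h=1e-5):
    # centered finite differences: perturb each entry of w and re-evaluate f
    grad = np.zeros_like(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = w[ix]
        w[ix] = old + h
        fxph = f()
        w[ix] = old - h
        fxmh = f()
        w[ix] = old
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# usage sketch:
# loss, grads = net.loss(X, y, reg=0.05)
# for name in ['W1', 'b1', 'W2', 'b2']:
#     num = numeric_gradient(lambda: net.loss(X, y, reg=0.05)[0], net.params[name])
#     print(name, rel_error(num, grads[name]))  # expect roughly 1e-8 or smaller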

We also have to fill in the train and predict methods.
The part of train we write ourselves:

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# np.random.choice cannot draw more samples than the population when replace=False,
# but batch_size may exceed num_train when replace=True

sample_index = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[sample_index]
y_batch = y[sample_index]

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

#########################################################################
# TODO: Use the gradients in the grads dictionary to update the         #
# parameters of the network (stored in the dictionary self.params)      #
# using stochastic gradient descent. You'll need to use the gradients   #
# stored in the grads dictionary defined above.                         #
#########################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

self.params['W1'] += -learning_rate * grads['W1']
self.params['b1'] += -learning_rate * grads['b1']
self.params['W2'] += -learning_rate * grads['W2']
self.params['b2'] += -learning_rate * grads['b2']

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

The predict part:

def predict(self, X):
    """
    Use the trained weights of this two-layer network to predict labels for
    data points. For each data point we predict scores for each of the C
    classes, and assign each data point to the class with the highest score.

    Inputs:
    - X: A numpy array of shape (N, D) giving N D-dimensional data points to
    classify.

    Returns:
    - y_pred: A numpy array of shape (N,) giving predicted labels for each of
    the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
    to have class c, where 0 <= c < C.
    """
    y_pred = None

    ###########################################################################
    # TODO: Implement this function; it should be VERY simple!                #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # do the forward pass
    # layer 1
    hidden_pred_in = X.dot(self.params['W1']) + self.params['b1']
    # pass through the ReLU activation
    hidden_pred_out = np.maximum(hidden_pred_in, 0)
    # layer 2
    score = hidden_pred_out.dot(self.params['W2']) + self.params['b2']
    y_pred = np.argmax(score, axis=1)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return y_pred

When tuning hyperparameters on the validation set, I searched over learning_rate, batch_size, regularization_strength, and hidden_size. After a long run on CPU, with 2000 iterations and a learning rate decay of 0.95, the best combination was lr = 0.001000, bs = 512, rs = 0.150000, hs = 150, which reaches 53.9% accuracy on the validation set and 53.2% on the test set.
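For reference, the search itself was nothing fancy, roughly the nested loop below. It assumes the notebook's variables (X_train, y_train, X_val, y_val, input_size, num_classes) and the TwoLayerNet constructor/train signature from the assignment code, so the ranges and keyword names are only a sketch:

best_val_acc = -1.0
best_net = None
results = {}

# hypothetical search ranges; widen or narrow them as time allows
for lr in [5e-4, 1e-3, 2e-3]:
    for bs in [256, 512]:
        for rs in [0.05, 0.15, 0.5]:
            for hs in [100, 150]:
                net = TwoLayerNet(input_size, hs, num_classes)
                net.train(X_train, y_train, X_val, y_val,
                          num_iters=2000, batch_size=bs,
                          learning_rate=lr, learning_rate_decay=0.95,
                          reg=rs, verbose=False)
                val_acc = (net.predict(X_val) == y_val).mean()
                results[(lr, bs, rs, hs)] = val_acc
                if val_acc > best_val_acc:
                    best_val_acc, best_net = val_acc, net

print('best validation accuracy: %f' % best_val_acc)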

features

In this section we no longer train a classifier or neural network directly on raw image pixels; instead we train on feature vectors extracted from each image, specifically the concatenation of two kinds of features (a sketch of the extraction call follows the list):

  • Histogram of Oriented Gradients (HOG): mainly captures texture
  • color histogram: mainly captures color information
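The extraction itself is already written for you; roughly, the notebook builds a list of feature functions and concatenates their outputs for each image. The helper names below (hog_feature, color_histogram_hsv, extract_features in cs231n.features) are as I remember them from the assignment code, so double-check them against your copy:

from cs231n.features import hog_feature, color_histogram_hsv, extract_features

num_color_bins = 10  # number of bins in the HSV color histogram
feature_fns = [hog_feature,
               lambda img: color_histogram_hsv(img, nbin=num_color_bins)]

# each image becomes the concatenation of its HOG and color-histogram features
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)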

Apart from the hyperparameter-tuning code (which is much the same as in two_layer_net), everything in this section is already provided.

Train SVM on features

Training an SVM classifier on these features and tuning it, the best combination found was:

lr 1.000000e-07 reg 5.000000e+05 train accuracy: 0.417265 val accuracy: 0.425000 test accuracy: 0.413

Compared with the best test accuracy of 0.361 obtained by training an SVM directly on raw pixels, this is a clear improvement.
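The tuning loop that produced this is essentially the same one as in the earlier svm notebook, only run on the feature matrices. A sketch, assuming the LinearSVM class from cs231n.classifiers and the X_*_feats arrays from the extraction step:

from cs231n.classifiers.linear_classifier import LinearSVM

learning_rates = [1e-9, 1e-8, 1e-7]
regularization_strengths = [5e4, 5e5, 5e6]

best_val_acc, best_svm, results = -1.0, None, {}
for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train_feats, y_train, learning_rate=lr, reg=reg, num_iters=1500)
        train_acc = (svm.predict(X_train_feats) == y_train).mean()
        val_acc = (svm.predict(X_val_feats) == y_val).mean()
        results[(lr, reg)] = (train_acc, val_acc)
        if val_acc > best_val_acc:
            best_val_acc, best_svm = val_acc, svm

print('best validation accuracy: %f' % best_val_acc)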

Neural Network on image features

When tuning hyperparameters on the validation set, I searched over learning_rate, batch_size, and regularization_strength. After a long run on CPU, with 1500 iterations and a learning rate decay of 0.95, the best combination was lr = 0.300000, bs = 512, rs = 0.000300, reaching 60.0% accuracy on the validation set and 57.2% on the test set, which matches the reference given in the notebook:

you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.

Links

Posts for the previous and next assignments:

  • Previous post: svm/softmax
  • Next post: Fully-Connected Neural Nets
