At this point, only two ipynb notebooks remain in assignment 1: two_layer_net.ipynb and features.ipynb. With the groundwork from the earlier parts, neither is hard to finish. The course site has three notes on neural networks, although I did not end up using much from them while actually doing the assignments:
- Neural Networks Part 1: Setting up the Architecture
- Neural Networks Part 2: Setting up the Data and the Loss
- Neural Networks Part 3: Learning and Evaluation
Their content is still very important, covering preprocessing, activation functions, weight initialization, batch normalization, regularization, and optimization methods, so I plan to write a separate post later summarizing the key points. For these assignments it is enough to watch lecture 4; the biggest takeaway there is learning to derive backpropagation with a computational graph.
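As a toy illustration of that computational-graph idea (my own example, not taken from the lecture), here is the forward and backward pass through f(x, w, b) = max(0, w*x + b), checked against a numerical gradient:

```python
# Toy computational graph: f(x, w, b) = max(0, w*x + b)
x, w, b = -2.0, -3.0, 1.0

# Forward pass, one node at a time
s = w * x + b          # multiply-add node: s = 7.0
f = max(s, 0.0)        # ReLU node: f = 7.0

# Backward pass: walk the graph in reverse, chaining local gradients
df = 1.0                            # gradient of f w.r.t. itself
ds = df * (1.0 if s > 0 else 0.0)   # ReLU gate passes the gradient since s > 0
dw = ds * x                         # ds/dw = x  -> -2.0
dx = ds * w                         # ds/dx = w  -> -3.0
db = ds * 1.0                       # ds/db = 1  ->  1.0

# Numerical check of dw with centered differences
h = 1e-5
dw_num = (max((w + h) * x + b, 0.0) - max((w - h) * x + b, 0.0)) / (2 * h)
print(dw, dw_num)  # both approximately -2.0
```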
two_layer_net
Compared with the earlier linear models, a neural network (in this part, a fully connected network) is a step up: it adds a nonlinear component and can therefore handle problems that are not linearly separable. Before starting the assignment it is worth reading the course note Putting it together: Minimal Neural Network Case Study, which shows data that no single line or hyperplane can separate, which is exactly where a neural network comes in. A 2-layer fully connected network looks like this (the input layer is not counted in the layer count):
Each layer is still just a linear map followed by an activation; in this two_layer_net.ipynb the model architecture is:
input - fully connected layer - ReLU - fully connected layer - softmax
The assignment is much like the earlier softmax one, just with one extra layer to compute. For the backward pass, follow the architecture above and compute the gradient node by node. For the matrix-to-matrix gradients, read the handout Backpropagation for a Linear Layer; once you have read it, everything falls into place.
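To make the result of that handout concrete: for Y = XW + b with upstream gradient dY, the backward pass is dX = dY·Wᵀ, dW = Xᵀ·dY, and db is the column-wise sum of dY. A small self-contained check, with toy shapes of my own choosing:

```python
import numpy as np

np.random.seed(0)
N, D, C = 4, 5, 3           # toy shapes, chosen only for illustration
X = np.random.randn(N, D)
W = np.random.randn(D, C)
b = np.random.randn(C)
dY = np.random.randn(N, C)  # pretend upstream gradient from the next layer

# Analytic gradients from the "Backpropagation for a Linear Layer" note
dX = dY.dot(W.T)            # (N, D)
dW = X.T.dot(dY)            # (D, C)
db = dY.sum(axis=0)         # (C,)

# Numerical check of one entry of dW; the surrogate "loss" sum(Y * dY)
# has gradient exactly dY with respect to Y
h = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[2, 1] += h
Wm[2, 1] -= h
num = (np.sum((X.dot(Wp) + b) * dY) - np.sum((X.dot(Wm) + b) * dY)) / (2 * h)
print(dW[2, 1], num)        # should agree to ~1e-8
```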
The loss method of TwoLayerNet:
```python
def loss(self, X, y=None, reg=0.0):
    """
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape

    # Compute the forward pass
    scores = None
    #############################################################################
    # TODO: Perform the forward pass, computing the class scores for the input. #
    # Store the result in the scores variable, which should be an array of      #
    # shape (N, C).                                                             #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # layer 1: affine transform
    hidden_in = X.dot(W1) + b1
    # apply the ReLU activation
    hidden_out = np.maximum(hidden_in, 0)
    # layer 2: affine transform gives the class scores
    scores = hidden_out.dot(W2) + b2
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # If the targets are not given then jump out, we're done
    if y is None:
        return scores

    # Compute the loss
    loss = 0.0
    #############################################################################
    # TODO: Finish the forward pass, and compute the loss. This should include  #
    # both the data loss and L2 regularization for W1 and W2. Store the result  #
    # in the variable loss, which should be a scalar. Use the Softmax           #
    # classifier loss.                                                          #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # for numerical stability, subtract the row-wise maximum before exponentiating
    scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    exp_scores = np.exp(scores)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    correct_logprobs = -np.log(probs[range(N), y])
    data_loss = np.sum(correct_logprobs) / N
    # note: judging by the checks in the .ipynb, there is no 0.5 factor
    # on the regularization term here
    reg_loss = reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    loss = data_loss + reg_loss
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Backward pass: compute gradients
    grads = {}
    #############################################################################
    # TODO: Compute the backward pass, computing the derivatives of the weights #
    # and biases. Store the results in the grads dictionary. For example,       #
    # grads['W1'] should store the gradient on W1, and be a matrix of same size #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dscores = probs.copy()
    dscores[range(N), y] -= 1
    dscores /= N
    # use dimension analysis to get the gradient shapes right
    grads['W2'] = np.dot(hidden_out.T, dscores) + 2 * reg * W2
    grads['b2'] = np.sum(dscores, axis=0)
    # do not forget the derivative of the ReLU
    grad_hidden_out = np.dot(dscores, W2.T)
    grad_hidden_in = (hidden_out > 0) * grad_hidden_out
    grads['W1'] = np.dot(X.T, grad_hidden_in) + 2 * reg * W1
    grads['b1'] = np.sum(grad_hidden_in, axis=0)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, grads
```
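The notebook then compares these analytic gradients against numerical ones. A minimal sketch of such a check (assuming the TwoLayerNet constructor signature TwoLayerNet(input_size, hidden_size, output_size, std) from cs231n/classifiers/neural_net.py; the toy shapes and reg value are mine):

```python
import numpy as np
from cs231n.classifiers.neural_net import TwoLayerNet

np.random.seed(0)
net = TwoLayerNet(input_size=4, hidden_size=10, output_size=3, std=1e-1)
X = np.random.randn(5, 4)
y = np.array([0, 1, 2, 2, 1])

loss, grads = net.loss(X, y, reg=0.05)

# Centered finite differences on a single entry of W1
h = 1e-5
W1 = net.params['W1']
W1[0, 0] += h
loss_plus, _ = net.loss(X, y, reg=0.05)
W1[0, 0] -= 2 * h
loss_minus, _ = net.loss(X, y, reg=0.05)
W1[0, 0] += h  # restore the original weight
print(grads['W1'][0, 0], (loss_plus - loss_minus) / (2 * h))  # should agree closely
```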
We also need to fill in parts of the train and predict methods.
The part of train we have to write ourselves:
```python
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Cannot take a larger sample than population when 'replace=False',
# but batch_size can be larger than num_train if 'replace=True'
sample_index = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[sample_index]
y_batch = y[sample_index]
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

#########################################################################
# TODO: Use the gradients in the grads dictionary to update the         #
# parameters of the network (stored in the dictionary self.params)      #
# using stochastic gradient descent. You'll need to use the gradients   #
# stored in the grads dictionary defined above.                         #
#########################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
self.params['W1'] += -learning_rate * grads['W1']
self.params['b1'] += -learning_rate * grads['b1']
self.params['W2'] += -learning_rate * grads['W2']
self.params['b2'] += -learning_rate * grads['b2']
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
```
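A quick demonstration of the replace=True point in the sampling comment above (the numbers are illustrative):

```python
import numpy as np

num_train, batch_size = 5, 8
print(np.random.choice(num_train, batch_size, replace=True))
# e.g. [3 0 3 1 4 4 0 2] -- indices may repeat, so batch_size > num_train is fine
# np.random.choice(num_train, batch_size, replace=False) would instead raise
# "ValueError: Cannot take a larger sample than population when 'replace=False'"
```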
The predict part:
```python
def predict(self, X):
    """
    Use the trained weights of this two-layer network to predict labels for
    data points. For each data point we predict scores for each of the C
    classes, and assign each data point to the class with the highest score.

    Inputs:
    - X: A numpy array of shape (N, D) giving N D-dimensional data points to
      classify.

    Returns:
    - y_pred: A numpy array of shape (N,) giving predicted labels for each of
      the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
      to have class c, where 0 <= c < C.
    """
    y_pred = None

    ###########################################################################
    # TODO: Implement this function; it should be VERY simple!                #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    # do the forward pass
    # layer 1
    hidden_pred_in = X.dot(self.params['W1']) + self.params['b1']
    # pass through the ReLU activation
    hidden_pred_out = np.maximum(hidden_pred_in, 0)
    # layer 2
    score = hidden_pred_out.dot(self.params['W2']) + self.params['b2']
    y_pred = np.argmax(score, axis=1)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return y_pred
```
When tuning hyperparameters on the validation set, I searched over learning_rate, batch_size, regularization_strength, and hidden_size. After a long run on CPU, with 2000 iterations and a learning-rate decay of 0.95, the best combination was lr = 0.001, bs = 512, rs = 0.15, hs = 150, reaching 53.9% accuracy on the validation set and 53.2% on the test set.
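A rough sketch of the kind of search loop I mean (the grid below is illustrative and narrower than what I actually swept; X_train, y_train, X_val, y_val are the CIFAR-10 splits from the notebook, and train/predict have the signatures from neural_net.py):

```python
import numpy as np
from cs231n.classifiers.neural_net import TwoLayerNet

input_size = 32 * 32 * 3
num_classes = 10
best_val, best_net = -1.0, None

for hs in [100, 150]:                   # hidden_size
    for lr in [1e-3, 5e-4]:             # learning_rate
        for rs in [0.15, 0.25]:         # regularization_strength
            for bs in [256, 512]:       # batch_size
                net = TwoLayerNet(input_size, hs, num_classes)
                net.train(X_train, y_train, X_val, y_val,
                          num_iters=2000, batch_size=bs,
                          learning_rate=lr, learning_rate_decay=0.95,
                          reg=rs, verbose=False)
                val_acc = (net.predict(X_val) == y_val).mean()
                if val_acc > best_val:
                    best_val, best_net = val_acc, net
                print('lr %e bs %d rs %e hs %d val %.3f' % (lr, bs, rs, hs, val_acc))
```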
features
In this part the classifier and the neural network are trained not directly on image pixels but on feature vectors extracted from the images, specifically the concatenation of two kinds of features (see the sketch after the list):
- Histogram of Oriented Gradients (HOG): mainly captures texture information
- Color histogram: mainly captures color information
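The feature matrix is built roughly like this (a sketch following the notebook; hog_feature, color_histogram_hsv, and extract_features come from cs231n/features.py, and num_color_bins is the notebook's setting):

```python
from cs231n.features import hog_feature, color_histogram_hsv, extract_features

num_color_bins = 10  # number of bins in the HSV color histogram
feature_fns = [hog_feature,
               lambda img: color_histogram_hsv(img, nbin=num_color_bins)]

# Each row of X_*_feats is the HOG descriptor concatenated with the color histogram
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)
```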
Apart from writing a bit of code to tune hyperparameters (much the same as in two_layer_net), everything in this part is provided.
Train SVM on features
Training an SVM classifier on the features and tuning it, the best combination I found is:
lr 1.000000e-07 reg 5.000000e+05 train accuracy: 0.417265 val accuracy: 0.425000 test accuracy: 0.413
Compared with the best test accuracy of 0.361 for the SVM trained directly on raw pixels, this is a clear improvement.
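The tuning loop follows the same pattern as in the earlier svm notebook; a sketch assuming the LinearSVM class from cs231n/classifiers/linear_classifier.py and the feature matrices built above (the grid and num_iters are illustrative):

```python
import numpy as np
from cs231n.classifiers.linear_classifier import LinearSVM

learning_rates = [1e-9, 1e-8, 1e-7]
regularization_strengths = [5e4, 5e5, 5e6]
best_val, best_svm = -1.0, None

for lr in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train_feats, y_train, learning_rate=lr, reg=reg,
                  num_iters=1500, verbose=False)
        val_acc = np.mean(svm.predict(X_val_feats) == y_val)
        if val_acc > best_val:
            best_val, best_svm = val_acc, svm
        print('lr %e reg %e val accuracy: %f' % (lr, reg, val_acc))
print('best validation accuracy achieved: %f' % best_val)
```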
Neural Network on image features
When tuning hyperparameters on the validation set, I searched over learning_rate, batch_size, and regularization_strength. After a long run on CPU, with 1500 iterations and a learning-rate decay of 0.95, the best combination was lr = 0.3, bs = 512, rs = 0.0003, reaching 60.0% accuracy on the validation set and 57.2% on the test set, which meets the reference given in the notebook:
you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.
Links
For the previous and next posts in this assignment series, see:
- Previous post: svm/softmax
- Next post: Fully-Connected Neural Nets