First, implement the network's forward pass to compute the class scores and the loss. The activation function is ReLU, i.e. max(0, x).
The loss() function in neural_net.py:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
h1 = np.maximum(0,X.dot(W1) + b1)
scores = h1.dot(W2) + b2
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# exponentiate the scores, shape (N, C)
exp_scores = np.exp(scores)
# sum each row, shape (N, 1)
row_sum = np.sum(exp_scores, axis=1).reshape(N, 1)
norm_scores = exp_scores / row_sum
data_loss = -np.sum(np.log(norm_scores[np.arange(N), y])) / N  # average cross-entropy loss
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
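One practical caveat: np.exp(scores) can overflow when the scores are large. Subtracting the per-row maximum before exponentiating is a standard fix and leaves the softmax probabilities unchanged; a minimal sketch of that variant of the lines above:

# Softmax is invariant to shifting each row, so shift the row max to 0 for numerical stability
shifted = scores - np.max(scores, axis=1, keepdims=True)
exp_scores = np.exp(shifted)                                          # (N, C)
norm_scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # (N, C)
data_loss = -np.sum(np.log(norm_scores[np.arange(N), y])) / N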
Next comes the backward pass, which computes the gradients. This part is somewhat tricky, so I record my own understanding below.
The overall procedure is the chain rule, applied step by step. Note that the gradient of each variable always has the same shape as the variable itself.
Below is the computation graph: read left to right it gives the forward pass, with the computed values written above each edge; read right to left it gives backpropagation, with the gradients written below each edge.
Convention: every derivative is the derivative of the loss with respect to that variable, i.e. $\frac{\partial Loss}{\partial \text{variable}}$. In the code the $\partial Loss$ numerator is therefore omitted and only the denominator is written, e.g. dscores stands for $\frac{\partial Loss}{\partial scores}$.
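The only step that is not mechanical is the gradient of the softmax cross-entropy loss with respect to the scores. For a single example $i$ with softmax probabilities $p$ and correct class $y_i$:

$$L_i = -\log p_{y_i}, \qquad p_j = \frac{e^{s_j}}{\sum_k e^{s_k}}, \qquad \frac{\partial L_i}{\partial s_j} = p_j - \mathbf{1}[j = y_i]$$

Averaging over the $N$ examples gives exactly the first three lines below: copy the probabilities, subtract 1 at the correct class, and divide by N.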
dscores = norm_scores.copy()
dscores[range(N),y] -= 1
dscores /= N #(N,C)
db2 = np.sum(dscores,axis=0) #(C,)
dh1 = dscores.dot(W2.T) #(N,H)
dW2 = h1.T.dot(dscores) + reg * W2 #(H,C)
# The ReLU gate passes gradient only where the forward activation was positive
dRelu = (h1 > 0) * dh1 #(N,H)
dW1 = X.T.dot(dRelu) + reg * W1 #(D,H)
db1 = np.sum(dRelu,axis=0) #(H,)
grads['b2'] = db2
grads['W2'] = dW2
grads['W1'] = dW1
grads['b1'] = db1
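A quick way to gain confidence in these analytic gradients is a centered-difference numerical check against loss(). The helper below is a hypothetical standalone sketch (the assignment also provides eval_numerical_gradient in cs231n/gradient_check.py for the same purpose); a relative error around 1e-8 or smaller indicates the backward pass is correct.

import numpy as np

def rel_error(a, b):
    # Maximum relative error between two gradient arrays
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

def num_grad(f, x, h=1e-5):
    # Centered finite differences: df/dx_i ≈ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)
        x[ix] = old - h
        fxmh = f(x)
        x[ix] = old
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# Hypothetical usage (names here are illustrative, not from the assignment):
# f = lambda _: net.loss(X_dev, y_dev, reg=0.05)[0]   # assumes loss() returns (loss, grads)
# print(rel_error(num_grad(f, net.params['W1']), grads['W1']))  # expect ~1e-8 or smaller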
Training the network
This part mainly involves sampling a random minibatch of data and updating the parameters with the computed gradients.
train() in neural_net.py:
# Randomly sample a minibatch of training data (with replacement)
idx = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[idx, :]
y_batch = y[idx]
# Vanilla SGD update: step each parameter against its gradient
self.params['W2'] += -learning_rate * grads['W2']
self.params['b2'] += -learning_rate * grads['b2']
self.params['W1'] += -learning_rate * grads['W1']
self.params['b1'] += -learning_rate * grads['b1']
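For completeness: later in train() the provided skeleton decays the learning rate once per epoch; this is what the learning_rate_decay=0.95 argument used in the experiments below controls. Roughly, assuming iterations_per_epoch = max(num_train / batch_size, 1) as in the skeleton:

# At the end of each epoch, shrink the step size
if it % iterations_per_epoch == 0:
    learning_rate *= learning_rate_decay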
predict()
# With y=None, loss() returns only the class scores; take the argmax per row
scores = self.loss(X)
y_pred = np.argmax(scores, axis=1)
Next, the hyperparameters are chosen by running my own validation experiments.
Result: with hidden_size=150, reg=0.09 and learning_rate=1e-3, accuracy reaches 53.7%.
Code:
input_size = 32 * 32 * 3
hidden_size = [100, 125, 150]
num_classes = 10
reg = [0.03, 0.05, 0.09]
learning_rate = [1e-3]
best_acc = 0.40
best_net = None

for hs in hidden_size:
    for r in reg:
        for lr in learning_rate:
            # Train a freshly initialized network for each hyperparameter combination
            net = TwoLayerNet(input_size, hs, num_classes)
            stats = net.train(X_train, y_train, X_val, y_val,
                              num_iters=2000, batch_size=200,
                              learning_rate=lr, learning_rate_decay=0.95,
                              reg=r, verbose=False)
            # Predict on the validation set
            val_acc = (net.predict(X_val) == y_val).mean()
            print('hidden_size:%d, reg:%f, learning_rate:%f' % (hs, r, lr))
            print('Validation accuracy: ', val_acc)
            if val_acc > best_acc:
                best_acc = val_acc
                best_net = net

# Plot the loss and accuracy curves of the most recent training run
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()
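Grid search over a few hand-picked values is enough here; for a wider sweep, a common alternative is to sample the learning rate and regularization strength at random on a log scale. A hypothetical sketch reusing the same data and TwoLayerNet (the ranges below are illustrative):

# Random search: sample hyperparameters on a log scale instead of a fixed grid
for trial in range(10):
    lr = 10 ** np.random.uniform(-4, -2)   # learning rate sampled from [1e-4, 1e-2]
    r = 10 ** np.random.uniform(-2, 0)     # regularization sampled from [1e-2, 1]
    hs = int(np.random.choice([100, 125, 150]))
    net = TwoLayerNet(input_size, hs, num_classes)
    net.train(X_train, y_train, X_val, y_val,
              num_iters=1000, batch_size=200,
              learning_rate=lr, learning_rate_decay=0.95,
              reg=r, verbose=False)
    val_acc = (net.predict(X_val) == y_val).mean()
    print('hs=%d  lr=%.2e  reg=%.2e  val_acc=%.3f' % (hs, lr, r, val_acc))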
Visualizing the first-layer weights W1 of the best network:
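The notebook provides a show_net_weights helper (built on visualize_grid) for this; as a rough stand-alone sketch, each column of W1 can be reshaped back into a 32x32x3 image and tiled, assuming the CIFAR-10 input layout used above and the best_net found during the search:

# Reshape W1 (3072, hidden_size) so that each hidden unit becomes one 32x32x3 image
W1 = best_net.params['W1'].reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
for i in range(20):                                    # show the first 20 hidden units
    plt.subplot(4, 5, i + 1)
    img = W1[i]
    img = (img - img.min()) / (img.max() - img.min())  # rescale to [0, 1] for imshow
    plt.imshow(img)
    plt.axis('off')
plt.show()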
Finally, the accuracy on the test set: Test accuracy: 0.545
Inline Question
Now that you have trained a Neural Network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.
$\color{blue}{\textit{Your Answer:}}$
1, 3
$\color{blue}{\textit{Your Explanation:}}$
When the gap is large, the model has most likely overfit the training data. Training on a larger dataset and increasing the regularization strength both reduce overfitting and therefore shrink the gap; adding more hidden units only increases model capacity and tends to widen it.