Summary
These are my study notes for the neural network part of Stanford's cs231n course, written both as a review of what I learned and as a summary of the problems I ran into while coding. The notes mainly follow the order in which the assignment is implemented. All figures come from the course slides; about 90% of the TODO code was written by myself, and the remaining parts reference solutions shared by other people online.
Two-layer neural network
The assignment first asks us to implement a simple two-layer neural network.
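The structure is input -> fully connected layer -> ReLU -> fully connected layer -> softmax, so the class scores are computed as scores = max(0, X·W1 + b1)·W2 + b2, and the softmax loss is applied on top of the scores.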
Forward pass:
Points to note:
- Since the loss is computed with the softmax function, the scores have to be exponentiated, which can easily overflow numerically. Therefore we subtract each sample's maximum score from all of that sample's scores before exponentiating; subtracting a constant has no effect on the gradients (see the short derivation after this list).
- When computing each sample's maximum value:
a = np.max(scores, axis=1, keepdims=True)
The keepdims=True here is essential; without it the result has shape (N,) instead of (N, 1) and cannot be broadcast against the (N, C) score matrix.
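Why the shift is safe: subtracting the same constant C from every score of a sample cancels between the numerator and the denominator of the softmax, so the probabilities, and therefore the loss and its gradients, are unchanged:

$$\frac{e^{s_j - C}}{\sum_k e^{s_k - C}} = \frac{e^{-C}\,e^{s_j}}{e^{-C}\sum_k e^{s_k}} = \frac{e^{s_j}}{\sum_k e^{s_k}}$$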
# Compute the forward pass
scores = None
#############################################################################
# TODO: Perform the forward pass, computing the class scores for the input. #
# Store the result in the scores variable, which should be an array of #
# shape (N, C). #
#############################################################################
h1 = X.dot(W1) + b1        # hidden-layer pre-activation, shape (N, H); b1 is 1-D so no transpose is needed
h12 = np.maximum(0, h1)    # ReLU activation, shape (N, H)
scores = h12.dot(W2) + b2  # class scores, shape (N, C)
# If the targets are not given then jump out, we're done
if y is None:
    return scores
# Compute the loss
loss = None
#############################################################################
# TODO: Finish the forward pass, and compute the loss. This should include #
# both the data loss and L2 regularization for W1 and W2. Store the result #
# in the variable loss, which should be a scalar. Use the Softmax #
# classifier loss. So that your results match ours, multiply the #
# regularization loss by 0.5 #
#############################################################################
# To avoid numerical overflow in the exponential, subtract each sample's
# maximum score from its scores before exponentiating; subtracting a
# constant does not change the gradients.
a = np.max(scores, axis=1, keepdims=True)
# keepdims=True is essential here, otherwise broadcasting fails
scores -= a
a = np.exp(scores)                    # exponentiated scores, shape (N, C)
c1 = np.sum(a, axis=1)                # per-sample normalizer, shape (N,)
c2 = 1 / c1
c3 = a[np.arange(a.shape[0]), y]      # exponentiated score of the correct class
b1 = c2 * c3                          # softmax probability of the correct class
L1 = np.log(b1)
loss = -np.sum(L1)
# equivalent one-liner:
# loss = -np.sum(np.log(a[np.arange(a.shape[0]), y] / np.sum(a, axis=1)))
loss /= X.shape[0]
loss += 0.5 * reg * np.sum(W1 * W1)
loss += 0.5 * reg * np.sum(W2 * W2)
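As a quick sanity check, with very small random initial weights and reg set to 0 the loss should come out close to -log(1/C), which is about 2.3 for the C = 10 CIFAR-10 classes; this is the standard way to confirm the softmax loss is implemented correctly.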
Backward pass:
Backpropagation is used to compute the gradients step by step. Note that this code reuses the name db1 for two different things; the variable naming is poor and it gets confused with the actual gradient of the bias b1.
Points to note:
- When computing dc1, a few extra steps are needed to turn dc1 from a 1-D vector into a 2-D matrix, again because the 1-D vector does not broadcast correctly. The way I did it feels clumsy; a possibly cleaner alternative is sketched after the snippet below.
- When computing da, the gradient of a[y] has to be handled separately from the other entries. As the formula shows, a[y] is the numerator, and a[y] appears in both the numerator and the denominator, so the two gradient contributions must be added together. The same thing happens in the SVM computational graph below: W feeds into the loss along two paths, so its gradient is also the sum of two parts, for the same reason as in softmax.
# add the two gradient contributions together
da[:, np.arange(da.shape[1])] = dc1
da[np.arange(da.shape[0]), y] += dc3
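A possibly cleaner alternative (my own sketch, not part of the assignment; it reuses the intermediate names from the backward code below, and dscores_alt is a hypothetical name): np.newaxis avoids the manual z1 construction, and the two contributions collapse into the standard closed-form softmax gradient.
# instead of building z1 by hand, np.newaxis (or reshape(-1, 1)) gives the (N, 1) shape directly
dc1 = dc1[:, np.newaxis]                    # shape (N, 1), broadcasts over the C columns
# the two contributions also fold into the closed-form softmax gradient p - one_hot(y)
p = a / np.sum(a, axis=1, keepdims=True)    # softmax probabilities, shape (N, C)
dscores_alt = p.copy()
dscores_alt[np.arange(p.shape[0]), y] -= 1  # subtract 1 at the correct class
dscores_alt /= X.shape[0]                   # equals the dscores = a * da computed below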
Code:
# Backward pass: compute gradients
grads = {}
#############################################################################
# TODO: Compute the backward pass, computing the derivatives of the weights #
# and biases. Store the results in the grads dictionary. For example, #
# grads['W1'] should store the gradient on W1, and be a matrix of same size #
#############################################################################
db1 = -1 / b1 * 1 / X.shape[0]      # gradient w.r.t. the correct-class probability (the name db1 is reused, see the note above)
dc2 = db1 * c3
dc3 = db1 * c2
dc1 = dc2 * (-1 / np.square(c1))
# these steps turn dc1 from a 1-D vector into an (N, 1) matrix so that it can broadcast
z1 = np.zeros([dc1.shape[0], 1])
z1[:, 0] = dc1
dc1 = z1
da = np.zeros(a.shape)
da[:, np.arange(da.shape[1])] = dc1
# Note: both c1 and c3 use a[y], so when differentiating w.r.t. a[y] the two contributions must be added!
da[np.arange(da.shape[0]), y] += dc3
dscores = a * da
dW2 = h12.T.dot(dscores)          # gradient of the second-layer weights
dh12 = dscores.dot(W2.T)
dh1 = dh12 * (h1 > 0)             # backprop through the ReLU
dW1 = X.T.dot(dh1)
db2 = np.sum(dscores.T, axis=1)
db1 = np.sum(dh1.T, axis=1)       # the actual gradient of the bias b1
dW1 += reg * W1                   # add the regularization gradients
dW2 += reg * W2
grads['W1'] = dW1
grads['W2'] = dW2
grads['b1'] = db1
grads['b2'] = db2
Training the network
Mini-batch gradient descent is used, so a batch has to be sampled first.
inde = np.random.choice(X.shape[0], batch_size, replace=True)
X_batch = X[inde, :]
y_batch = y[inde]
Parameter update:
# update the parameters with momentum
mu = 0.9
v_w1 = mu * v_w1 - learning_rate * grads['W1']
self.params['W1'] += v_w1
v_w2 = mu * v_w2 - learning_rate * grads['W2']
self.params['W2'] += v_w2
v_b1 = mu * v_b1 - learning_rate * grads['b1']
self.params['b1'] += v_b1
v_b2 = mu * v_b2 - learning_rate * grads['b2']
self.params['b2'] += v_b2
# update the parameters with vanilla SGD
# self.params['W1'] += -learning_rate * grads['W1']
# self.params['W2'] += -learning_rate * grads['W2']
# self.params['b1'] += -learning_rate * grads['b1']
# self.params['b2'] += -learning_rate * grads['b2']
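The momentum update assumes the velocity buffers v_w1, v_w2, v_b1, v_b2 already exist; a minimal sketch of the initialization I would place before the training loop (this placement inside train() is my own choice, not shown in the assignment skeleton):
v_w1 = np.zeros_like(self.params['W1'])   # velocity buffers start at zero
v_w2 = np.zeros_like(self.params['W2'])
v_b1 = np.zeros_like(self.params['b1'])
v_b2 = np.zeros_like(self.params['b2'])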
Both momentum and vanilla SGD are implemented here; more update rules are described in detail in the multi-layer network section below.
Everything above is the code we have to fill in inside the assignment's Neural_net.py file; the complete code is attached at the end.
Hyperparameter search
After finishing every part above we can train on the data. Below is the IPython notebook code that sweeps over hyperparameters and keeps the best model:
best_net = None # store the best model into this
#best parameters by Yan Wei:lr 0.000150 reg 0.040000 hs 100 val accuracy: 0.514000
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained #
# model in best_net. #
# #
# To help debug your network, it may help to use visualizations similar to the #
# ones we used above; these visualizations will have significant qualitative #
# differences from the ones we saw above for the poorly tuned network. #
# #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to #
# write code to sweep through possible combinations of hyperparameters #
# automatically like we did on the previous exercises. #
#################################################################################
input_size = 32 * 32 * 3
hidden_size_2 = 50
num_classes = 10
results = {}
best_val = -1
best_stats = -1
learning_rates = [1.5e-4, 2e-4, 3e-4]
regularization_strengths = [0.02, 0.03, 0.04]
hidden_size_test = [100]
for lr in learning_rates:
    for reg in regularization_strengths:
        for hs in hidden_size_test:
            net2 = TwoLayerNet(input_size, hs, num_classes)
            # Train the network
            stats2 = net2.train(X_train, y_train, X_val, y_val,
                                num_iters=1000, batch_size=200,
                                learning_rate=lr, learning_rate_decay=0.95,
                                reg=reg, verbose=True)
            # Predict on the validation set
            val_acc2 = (net2.predict(X_val) == y_val).mean()
            print 'Validation accuracy: ', val_acc2
            print 'lr: %f reg: %f hs: %d' % (lr, reg, hs)
            if val_acc2 > best_val:
                best_val = val_acc2
                best_net = net2
                best_stats = stats2
            results[(lr, reg, hs)] = val_acc2
for lr, reg, hs in sorted(results):
    val_accuracy = results[(lr, reg, hs)]
    print 'lr %f reg %f hs %d val accuracy: %f' % (
        lr, reg, hs, val_accuracy)
print 'best validation accuracy achieved during cross-validation: %f' % best_val
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats2['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
plt.plot(stats2['train_acc_history'], label='train')
plt.plot(stats2['val_acc_history'], label='val')
plt.legend(['train', 'val'])
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()
Below are some screenshots of the results:
The final accuracy on the test set is 49.6%; due to time constraints I only tried a few casually chosen hyperparameters, so the accuracy is not high.
Finally, my handwritten derivation notes are attached; they are rough scratch work kept for my own future review.
A multi-layer network will be implemented in the (multi-layer) neural network post.