cs231n assignment(1.4):two_layer_net

two_layer_net

In this exercise we use a two-layer fully connected neural network for classification.
The architecture is:
input - fully connected layer - ReLU - fully connected layer - softmax
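
The loss method below unpacks its weights and biases from a self.params dictionary. As a rough sketch (the exact hidden size, standard deviation, and initialization follow the assignment's TwoLayerNet; treat this constructor as an assumption), the parameters can be set up with small random weights and zero biases:

import numpy as np

class TwoLayerNet(object):
  def __init__(self, input_size, hidden_size, output_size, std=1e-4):
    # Small random weights and zero biases; shapes match the layer sizes.
    self.params = {}
    self.params['W1'] = std * np.random.randn(input_size, hidden_size)
    self.params['b1'] = np.zeros(hidden_size)
    self.params['W2'] = std * np.random.randn(hidden_size, output_size)
    self.params['b2'] = np.zeros(output_size)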

Computing the loss and gradients of the two-layer network:

def ReLU(x):
    # ReLU non-linearity, applied elementwise.
    return np.maximum(0, x)

  # loss() is a method of TwoLayerNet: it computes the softmax loss for a
  # batch X with labels y, and the gradients of the loss w.r.t. all parameters.
  def loss(self, X, y=None, reg=0.0):
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape
    # Compute the forward pass
    scores = None

    s1 = X.dot(W1) + b1        # first affine layer
    h1 = ReLU(s1)              # hidden activations
    scores = h1.dot(W2) + b2   # class scores

    # If the targets are not given then jump out, we're done
    if y is None:
      return scores

    # Compute the loss
    loss = None
    # Shift scores by their row-wise max for numerical stability in the softmax.
    f_max = np.max(scores, axis=1, keepdims=True)
    f_scores = scores - f_max
    prob = np.exp(f_scores) / np.sum(np.exp(f_scores), axis=1, keepdims=True)
    # Average cross-entropy loss over the batch, plus L2 regularization on the weights.
    loss = np.sum(-np.log(prob[np.arange(N), y]))
    loss = loss / N + 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)

    # Backward pass: compute gradients
    grads = {}

    # Gradient of the softmax loss w.r.t. the scores: (prob - one_hot(y)) / N.
    dscores = prob
    dscores[np.arange(N), y] -= 1
    dscores /= N
    # Backprop into the second (output) layer.
    dh1 = dscores.dot(W2.T)
    dW2 = h1.T.dot(dscores)
    db2 = np.sum(dscores, axis=0)
    # Backprop through the ReLU: the gradient is zero wherever the input was <= 0.
    dh1[s1 <= 0] = 0
    ds1 = dh1
    # Backprop into the first layer.
    dW1 = np.dot(X.T, ds1)
    db1 = np.sum(ds1, axis=0)
    # Add the gradient of the L2 regularization term.
    dW1 += reg * W1
    dW2 += reg * W2
    # Store the gradients.
    grads['W1'] = dW1
    grads['W2'] = dW2
    grads['b1'] = db1
    grads['b2'] = db2
    return loss, grads
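
A quick way to sanity-check the backward pass is to compare one analytic gradient entry against a centered finite-difference estimate. The snippet below is a minimal sketch assuming the loss method above is attached to the TwoLayerNet class sketched earlier; the toy shapes and the probed index are arbitrary.

np.random.seed(0)
net = TwoLayerNet(input_size=4, hidden_size=10, output_size=3, std=1e-1)
X = np.random.randn(5, 4)          # 5 toy samples, 4 features
y = np.array([0, 1, 2, 2, 1])      # toy class labels

loss, grads = net.loss(X, y, reg=0.05)

# Perturb a single entry of W1 and recompute the loss on both sides.
W1 = net.params['W1']
i, j, h = 1, 2, 1e-5
W1[i, j] += h
loss_plus, _ = net.loss(X, y, reg=0.05)
W1[i, j] -= 2 * h
loss_minus, _ = net.loss(X, y, reg=0.05)
W1[i, j] += h                      # restore the original value

numeric = (loss_plus - loss_minus) / (2 * h)
print('analytic:', grads['W1'][i, j], 'numeric:', numeric)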

On tuning the hyperparameters:
A loss curve that decreases roughly linearly suggests the learning rate may be too low. No gap between training and validation accuracy suggests the model has low capacity, so we should increase its size; a larger model overfits more readily, which would show up as a gap between the two accuracies.
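
One concrete way to act on these observations is a small sweep over learning rate and hidden size. The loop below is a hypothetical sketch using only the loss/grads interface above with plain mini-batch SGD; X_train, y_train, X_val, y_val are assumed to be prepared elsewhere, and 10 classes are assumed as in CIFAR-10.

best_val, best_net = -1.0, None
for hidden_size in (50, 100, 150):
  for lr in (1e-4, 1e-3, 1e-2):
    net = TwoLayerNet(input_size=X_train.shape[1], hidden_size=hidden_size,
                      output_size=10, std=1e-4)
    for it in range(1000):
      # Sample a mini-batch and take one vanilla SGD step on every parameter.
      idx = np.random.choice(X_train.shape[0], 200)
      loss, grads = net.loss(X_train[idx], y_train[idx], reg=0.25)
      for p in net.params:
        net.params[p] -= lr * grads[p]
    # Validation accuracy: calling loss() without labels returns the scores.
    val_acc = (net.loss(X_val).argmax(axis=1) == y_val).mean()
    if val_acc > best_val:
      best_val, best_net = val_acc, net
print('best validation accuracy:', best_val)

Selecting the configuration with the highest validation accuracy, rather than training accuracy, is what guards against the overfitting mentioned above.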
