Machine Learning (4): Understanding the SVM Loss Function and Gradient Expressions, with Implementation Notes

1. For SVM, this post summarizes two ways of implementing the loss in code: a plain for-loop version and a vectorized (matrix-operation) version.
First, the SVM loss function and its gradient.
Final form of the loss function:
L = (1/N) ∑_i L_i + λ R(W)
where:
R(W) = ∑_k ∑_s (W_{k,s})²
L_i = ∑_{j≠y_i} max(0, (x_i W)_j − (x_i W)_{y_i} + Δ)
Interpretation: y_i tells us ahead of time which class is the correct one for sample i. The terms that survive the max come from wrong classes whose scores get within Δ of (or exceed) the correct-class score; if the classifier were right, class y_i should have the highest score.
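A quick check with made-up numbers: suppose one sample has scores x_i W = [3.2, 5.1, −1.7] over three classes, the correct class is y_i = 0, and Δ = 1. Then

L_i = max(0, 5.1 − 3.2 + 1) + max(0, −1.7 − 3.2 + 1) = max(0, 2.9) + max(0, −3.9) = 2.9

Only the first wrong class violates the margin, so it alone contributes to the loss.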

Final form of the gradient:
∇_{W_{y_i}} L_i = −(∑_{j≠y_i} 1((x_i W)_j − (x_i W)_{y_i} + Δ > 0)) · x_iᵀ + 2λ W_{y_i}    (column j = y[i])
∇_{W_j} L_i = 1((x_i W)_j − (x_i W)_{y_i} + Δ > 0) · x_iᵀ + 2λ W_j    (columns j ≠ y[i])

Note that 1(·) is the indicator function: 1(true expression) = 1 and 1(false expression) = 0.
Interpretation: there are two cases. When building the dW matrix (one per image), the column j = y[i] needs special care (this is the part I find less obvious), while the update of every other column is easier to follow. A small worked example follows, and the for-loop code after it makes the same point concrete.
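Continuing the same made-up example (scores [3.2, 5.1, −1.7], y_i = 0): only class j = 1 has a positive margin, so the indicator is 1 for j = 1 and 0 for j = 2. This sample therefore contributes +x_iᵀ to column 1 of dW, nothing to column 2, and −1 · x_iᵀ to the correct-class column 0, where the factor 1 is the number of classes with a positive margin. In general, the y[i]-th column collects −x_iᵀ once per violating class, which is exactly what the loop code below accumulates.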
First, the for-loop implementation:

import numpy as np

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:  # loss and gradient are updated only when the indicator 1((x_i W)_j - (x_i W)_{y_i} + 1 > 0) fires
        # margin > 0 means a wrong class scored too high; only then does this pair (i, j) produce any loss.
        loss += margin
        dW[:, y[i]] += -X[i, :]
        dW[:, j] += X[i, :]
        # How to read the two lines above: each image corresponds to one pass of the i loop, and
        # within it j sweeps over all num_classes columns, so every column of dW gets visited.
        # (Each image effectively contributes its own dW; once the i loop finishes, the num_train
        # per-image dW matrices have been summed together.) dW[:, y[i]] += -X[i, :] runs once for
        # every j whose margin is positive, so column y[i] accumulates the negated sum of what the
        # other columns receive, while each other column j is touched at most once per image.

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train
  # Add regularization to the loss and to the gradient.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W
  return loss, dW
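
Before moving to the vectorized version, it is worth verifying the analytic gradient against a numerical one. The sketch below is not part of the original assignment code; the helper name grad_check_sparse, the tiny random shapes, and the step size h are all illustrative assumptions. It perturbs a few random entries of W and compares a centered finite-difference estimate with the corresponding entry of dW returned by svm_loss_naive.

import numpy as np

def grad_check_sparse(f, W, analytic_grad, num_checks=5, h=1e-5):
    # Compare the analytic gradient with a centered finite-difference
    # estimate at a few randomly chosen coordinates of W.
    for _ in range(num_checks):
        ix = tuple(np.random.randint(dim) for dim in W.shape)  # random (row, col)
        old = W[ix]
        W[ix] = old + h
        fxph = f(W)              # f(W + h)
        W[ix] = old - h
        fxmh = f(W)              # f(W - h)
        W[ix] = old              # restore the entry
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / \
                    (abs(grad_numerical) + abs(grad_analytic) + 1e-12)
        print(ix, grad_numerical, grad_analytic, rel_error)

# Tiny made-up data: D = 5 features, C = 3 classes, N = 4 samples.
W = 0.01 * np.random.randn(5, 3)
X = np.random.randn(4, 5)
y = np.random.randint(3, size=4)
loss, dW = svm_loss_naive(W, X, y, reg=0.0)
grad_check_sparse(lambda w: svm_loss_naive(w, X, y, 0.0)[0], W, dW)

With reg = 0 the relative errors should be tiny, except at the occasional kink of the hinge where the loss is not differentiable.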

Now the vectorized (matrix-operation) implementation:

def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.
  Inputs and outputs are the same as svm_loss_naive above.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero
  num_classes = W.shape[1]
  num_train = X.shape[0]
  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  scores = X.dot(W)
  score_correct = scores[np.arange(num_train), y]  # using the known labels y, pick out the correct-class score for every sample; this gives an array of shape (N,)

  score_correct = np.reshape(score_correct, (num_train, 1))  # reshape to (N, 1) so it broadcasts against scores; note that .T does nothing on a 1-D numpy array (an easy pitfall)
  margin_matrix = scores - score_correct + 1  # vectorized form of (x_i W)_j - (x_i W)_{y_i} + Δ, with Δ = 1

  margin_matrix[np.arange(num_train), y] = 0  # the loss formula excludes j = y[i]: that column is the correct class every other column is compared against, so zero out its margin

  margin_matrix[margin_matrix<=0] = 0  # hinge: only positive margins survive

  loss = np.sum(margin_matrix)  # sum the hinge losses over all samples and classes

  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the gradient for the structured SVM     #
  # loss, storing the result in dW.                                           #
  #                                                                           #
  # Hint: Instead of computing the gradient from scratch, it may be easier    #
  # to reuse some of the intermediate values that you used to compute the     #
  # loss.                                                                     #
  #############################################################################
  margin_matrix[margin_matrix>0] = 1  # this realizes the indicator term 1((x_i W)_j - (x_i W)_{y_i} + 1 > 0) from the gradient formula

  row_sum = np.sum(margin_matrix, axis=1)  # for each sample, count how many wrong classes have a positive margin
  margin_matrix[np.arange(num_train), y] = -row_sum
  # the two lines above play the role of dW[:, y[i]] += -X[i, :] in the for-loop version
  dW = np.dot(X.T, margin_matrix) / num_train + reg * W
  # the shapes line up: X.T is (D, N) and margin_matrix is (N, C), so dW is (D, C), the same shape as W
  return loss, dW
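
As a sanity check, the two implementations should produce the same loss and gradient; the vectorized one is simply much faster. The snippet below is only an illustrative sketch: the CIFAR-10-like shapes (D = 3073 with the bias folded in, C = 10) and the regularization strength are assumptions, not values from this post.

import time
import numpy as np

np.random.seed(0)
W = 0.0001 * np.random.randn(3073, 10)   # assumed CIFAR-10-like shapes
X = np.random.randn(500, 3073)
y = np.random.randint(10, size=500)

tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X, y, reg=0.000005)
toc = time.time()
print('naive loss: %e, took %fs' % (loss_naive, toc - tic))

tic = time.time()
loss_vec, grad_vec = svm_loss_vectorized(W, X, y, reg=0.000005)
toc = time.time()
print('vectorized loss: %e, took %fs' % (loss_vec, toc - tic))

# Both the losses and the gradients should match up to floating-point noise.
print('loss difference:', abs(loss_naive - loss_vec))
print('gradient difference:', np.linalg.norm(grad_naive - grad_vec, ord='fro'))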

Later on, W is updated using gradient descent.
The update itself is just one line of code:

self.W += -learning_rate * grad

The only difference is that grad is usually computed on a small minibatch rather than on the entire training set, simply to speed up each update. A minimal sketch of such a step follows.
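
The sketch below shows what one minibatch SGD step might look like. It is not the original classifier class (which keeps the weights in self.W); the function name sgd_step, the batch size, and the hyperparameter values are illustrative assumptions.

import numpy as np

def sgd_step(W, X, y, learning_rate=1e-7, reg=2.5e4, batch_size=200):
    # Sample a small minibatch, compute loss and gradient on it only,
    # then take one gradient-descent step on W.
    num_train = X.shape[0]
    idx = np.random.choice(num_train, batch_size, replace=True)
    X_batch, y_batch = X[idx], y[idx]
    loss, grad = svm_loss_vectorized(W, X_batch, y_batch, reg)
    W += -learning_rate * grad   # same update rule as self.W += -learning_rate * grad
    return W, loss

Called in a loop for many iterations, this is essentially the whole training procedure.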
