SVM/hinge loss function

loss function

Assignment 1 of the CS231n course involves the SVM loss function, which, on closer inspection, refers to the hinge loss. Its formula is:

$L_i = \sum_{j \neq y_i} \max(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta)$
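
To make the formula concrete, here is a tiny worked example with made-up scores (a minimal sketch; the numbers are hypothetical and not from the assignment):

import numpy as np

# One example with 3 classes; the correct class is y_i = 0 and delta = 1.
scores = np.array([3.0, 5.0, -2.0])   # the scores W^T x_i
y_i = 0
delta = 1.0

margins = np.maximum(0, scores - scores[y_i] + delta)
margins[y_i] = 0                       # the correct class contributes nothing
L_i = np.sum(margins)                  # max(0, 5-3+1) + max(0, -2-3+1) = 3 + 0 = 3
print(L_i)                             # 3.0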

  • Loop-based implementation:
import numpy as np


def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin


  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train

  # Add regularization to the loss.
  loss += 0.5 * reg * np.sum(W * W)

  # In this first version only the loss is computed; dW stays zero and is
  # filled in later in the gradient section.
  return loss, dW
  • Corresponding vectorized code (my own implementation); a quick sanity check of both versions follows the code:
def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  scores = X.dot(W)
  num_train = X.shape[0]
  rows = np.arange(num_train)
  correct_class_score = scores[rows, y]
  margins = np.maximum(0, scores - np.reshape(correct_class_score, [num_train, 1]) + 1)
  margins[rows, y] = 0  # the correct class contributes no loss
  loss = np.sum(margins)
  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  # As in the naive version above, dW is computed in the gradient section below.

  return loss, dW
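
A quick sanity check of the two versions above (a minimal sketch, assuming both functions are defined as written; only the loss values are compared, since neither version fills in dW yet):

import numpy as np

# Small random problem: N=5 examples, D=4 features, C=3 classes (hypothetical sizes).
np.random.seed(0)
X = np.random.randn(5, 4)
W = np.random.randn(4, 3) * 0.01
y = np.random.randint(0, 3, size=5)

loss_naive, _ = svm_loss_naive(W, X, y, reg=0.1)
loss_vec, _ = svm_loss_vectorized(W, X, y, reg=0.1)
print(loss_naive, loss_vec)            # the two values should match
assert np.isclose(loss_naive, loss_vec)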

gradient of loss function

The gradient of the hinge loss is derived as follows:
Starting from $L_i = \sum_{j \neq y_i} \max(0,\; w_j^T x_i - w_{y_i}^T x_i + \Delta)$, consider two cases (a vectorized summary follows after them):

  • $j \neq y_i$:
    Taking the partial derivative with respect to $w_j$, only the first term of $w_j^T x_i - w_{y_i}^T x_i + \Delta$ depends on $w_j$; the second term is a constant. If $w_j^T x_i - w_{y_i}^T x_i + \Delta > 0$, the derivative is $x_i$; if $w_j^T x_i - w_{y_i}^T x_i + \Delta \le 0$, the max returns the constant 0 and the derivative is 0. Putting the two together, the partial derivative is:
    $\nabla_{w_j} L_i = \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0)\, x_i$
  • $j = y_i$:
    Taking the partial derivative with respect to $w_{y_i}$, only the second term of $w_j^T x_i - w_{y_i}^T x_i + \Delta$ depends on $w_{y_i}$; the first term is a constant (since $j \neq y_i$ in the sum). If $w_j^T x_i - w_{y_i}^T x_i + \Delta > 0$, the derivative of that term is $-x_i$; if $w_j^T x_i - w_{y_i}^T x_i + \Delta \le 0$, the max again returns the constant 0 and the derivative is 0. Note that, because of the sum over $j \neq y_i$, the partial derivative becomes:

    $\nabla_{w_{y_i}} L_i = -\sum_{j \neq y_i} \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0)\, x_i$
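
Stacking these per-example gradients gives a compact matrix form (my own summary of the two cases above, not taken from the assignment handout). Let $M$ be the $N \times C$ matrix with entries

$M_{ij} = \mathbb{1}(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0)$ for $j \neq y_i$, and $M_{i,y_i} = -\sum_{j \neq y_i} M_{ij}$.

Then, adding the gradient of the regularization term $\frac{1}{2}\,\mathrm{reg}\sum W^2$,

$\nabla_W L = \frac{1}{N}\, X^{\top} M + \mathrm{reg} \cdot W$

This $M$ is exactly the margins01 array built in the vectorized implementation below, and $X^{\top} M$ corresponds to the np.dot(X.transpose(), margins01) line.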

  • Loop-based implementation

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin
        dW[:, y[i]] -= X[i]  # the correct-class column loses x_i
        dW[:, j] += X[i]     # column j gains x_i


  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train

  # Add regularization to the loss and the gradient.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg*W

  return loss, dW
  • Vectorized implementation (a numerical gradient check follows the code)
def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  scores = np.dot(X,W)
  num_train = X.shape[0]
  rows = np.arange(num_train)
  correct_class_score = scores[rows, y]
  margins = np.maximum(0, scores - np.reshape(correct_class_score, [num_train, 1]) + 1)
  margins[rows, y] = 0  # the correct class contributes no loss
  loss = np.sum(margins)
  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)

  # Indicator matrix: 1 where a margin is positive; the correct-class column
  # gets minus the number of positive margins in its row (see the derivation above).
  margins01 = 1 * (margins > 0)
  margins01[rows, y] = -1 * np.sum(margins01, axis=1)
  dW = np.dot(X.transpose(), margins01)
  dW /= num_train
  dW += reg * W

  return loss, dW
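
To verify the analytic gradient, it can be compared against a numerically estimated one (a minimal sketch; the helper numerical_gradient below is written here for illustration and is not part of the assignment code, and it assumes svm_loss_vectorized is defined as above):

import numpy as np

def numerical_gradient(f, W, h=1e-5):
  """Estimate df/dW with central differences; f takes W and returns a scalar loss."""
  grad = np.zeros_like(W)
  it = np.nditer(W, flags=['multi_index'])
  while not it.finished:
    idx = it.multi_index
    old = W[idx]
    W[idx] = old + h
    fp = f(W)
    W[idx] = old - h
    fm = f(W)
    W[idx] = old                      # restore the original value
    grad[idx] = (fp - fm) / (2 * h)
    it.iternext()
  return grad

# Hypothetical small problem for the check.
np.random.seed(1)
X = np.random.randn(10, 6)
y = np.random.randint(0, 4, size=10)
W = np.random.randn(6, 4) * 0.01

loss, dW = svm_loss_vectorized(W, X, y, reg=0.1)
dW_num = numerical_gradient(lambda W_: svm_loss_vectorized(W_, X, y, reg=0.1)[0], W)
print(np.max(np.abs(dW - dW_num)))    # should be close to 0 (hinge kinks aside)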
