Assignment 1 of CS231n involves the SVM loss function; after a bit of digging, this turns out to be the hinge loss. For a single example $x_i$ with label $y_i$, its formula is:

$$L_i = \sum_{j \neq y_i} \max\left(0,\ w_j^T x_i - w_{y_i}^T x_i + \Delta\right)$$
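As a quick numeric sanity check (a minimal sketch with made-up scores, not part of the assignment), suppose one example gets scores [3.2, 5.1, -1.7], its correct class is 0, and $\Delta = 1$:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])  # raw class scores for one example
y_i = 0                              # index of the correct class
delta = 1.0

# sum over the incorrect classes of max(0, s_j - s_{y_i} + delta)
margins = np.maximum(0, scores - scores[y_i] + delta)
margins[y_i] = 0
print(margins.sum())  # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9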
import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                if j != y[i]:
                    loss += margin

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += 0.5 * reg * np.sum(W * W)

    # dW is still all zeros here; the gradient is filled in after the derivation below.
    return loss, dW
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    scores = X.dot(W)                       # (N, C) score matrix
    num_train = X.shape[0]
    rows = np.arange(num_train)
    correct_class_score = scores[rows, y]   # true-class score for each example
    margins = np.maximum(0, scores - correct_class_score.reshape(num_train, 1) + 1)
    margins[rows, y] = 0                    # the correct class does not contribute
    loss = np.sum(margins)
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # The vectorized gradient is implemented in the final version below.
    return loss, dW
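The vectorized loss can be spot-checked against the loop version on small random inputs (a quick sketch; the shapes and regularization strength below are arbitrary):

np.random.seed(0)
D, C, N = 10, 4, 5                      # feature dim, classes, minibatch size
W = np.random.randn(D, C) * 0.01
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

loss_naive, _ = svm_loss_naive(W, X, y, reg=0.1)
loss_vec, _ = svm_loss_vectorized(W, X, y, reg=0.1)
print(abs(loss_naive - loss_vec))       # should be ~0 (floating-point noise)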
The derivative of the hinge loss is derived as follows.

From the formula $L_i = \sum_{j \neq y_i} \max\left(0,\ w_j^T x_i - w_{y_i}^T x_i + \Delta\right)$, consider two cases.

Case 1: $j = y_i$ (the weight column of the correct class). Taking the partial derivative with respect to $w_{y_i}$, only the second part of each term $w_j^T x_i - w_{y_i}^T x_i + \Delta$ depends on $w_{y_i}$; the first part is a constant, since the sum runs over $j \neq y_i$. If $w_j^T x_i - w_{y_i}^T x_i + \Delta > 0$, that term contributes $-x_i$; if $w_j^T x_i - w_{y_i}^T x_i + \Delta \le 0$, the $\max$ returns the constant 0, so the contribution is 0. Note that because of the sum $\sum$, these contributions accumulate over all $j \neq y_i$, giving

$$\frac{\partial L_i}{\partial w_{y_i}} = -\left(\sum_{j \neq y_i} \mathbb{1}\!\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right)\right) x_i$$

Case 2: $j \neq y_i$ (the weight column of an incorrect class). Here $w_j$ appears in exactly one term of the sum, and only in its first part, so

$$\frac{\partial L_i}{\partial w_j} = \mathbb{1}\!\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right) x_i$$

where $\mathbb{1}(\cdot)$ is the indicator function.
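To make the two cases concrete, the per-example gradient can be written directly from the indicator formulas and compared against a finite-difference estimate (a small sketch on random numbers, not the assignment code):

np.random.seed(1)
D, C = 6, 3
W = np.random.randn(D, C)
x = np.random.randn(D)
y_i = 1
delta = 1.0

def loss_i(W):
    # per-example hinge loss
    s = x.dot(W)
    m = np.maximum(0, s - s[y_i] + delta)
    m[y_i] = 0
    return m.sum()

# analytic gradient from the two cases above
s = x.dot(W)
ind = ((s - s[y_i] + delta) > 0).astype(float)
ind[y_i] = 0
dW_i = np.outer(x, ind)              # case j != y_i: indicator * x_i
dW_i[:, y_i] = -ind.sum() * x        # case j == y_i: -(count of positive margins) * x_i

# finite-difference check on one entry
h = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[2, 0] += h
Wm[2, 0] -= h
print(dW_i[2, 0], (loss_i(Wp) - loss_i(Wm)) / (2 * h))  # should agree closely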
Loop-based implementation (now with the gradient filled in):
def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                if j != y[i]:
                    loss += margin
                    dW[:, y[i]] += -1 * X[i]  # case j == y_i: accumulate -x_i
                    dW[:, j] += 1 * X[i]      # case j != y_i: accumulate +x_i

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss (and its gradient).
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

    return loss, dW
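The assignment notebook also runs a sparse numerical gradient check; an equivalent plain-numpy check over a handful of random entries (a rough sketch) looks like this:

np.random.seed(2)
D, C, N = 8, 4, 6
W = np.random.randn(D, C) * 0.01
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
reg = 0.1

_, dW = svm_loss_naive(W, X, y, reg)

h = 1e-5
for _ in range(5):                      # check a few random entries of dW
    i, j = np.random.randint(D), np.random.randint(C)
    Wp, Wm = W.copy(), W.copy()
    Wp[i, j] += h
    Wm[i, j] -= h
    num = (svm_loss_naive(Wp, X, y, reg)[0] - svm_loss_naive(Wm, X, y, reg)[0]) / (2 * h)
    print(dW[i, j], num)                # each pair should match to several decimals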
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    scores = X.dot(W)                       # (N, C) score matrix
    num_train = X.shape[0]
    rows = np.arange(num_train)
    correct_class_score = scores[rows, y]   # true-class score for each example
    margins = np.maximum(0, scores - correct_class_score.reshape(num_train, 1) + 1)
    margins[rows, y] = 0                    # the correct class does not contribute
    loss = np.sum(margins)
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # Gradient: indicator of positive margins for the incorrect classes,
    # minus the count of positive margins in the correct-class column.
    margins01 = (margins > 0).astype(float)
    margins01[rows, y] = -np.sum(margins01, axis=1)
    dW = X.T.dot(margins01)
    dW /= num_train
    dW += reg * W

    return loss, dW
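With both gradients in place, the two implementations should agree up to floating-point noise, and the vectorized one should be noticeably faster. A rough usage sketch (the shapes below only approximate the assignment's dev set and are an assumption here):

import time

np.random.seed(3)
D, C, N = 3073, 10, 500                 # CIFAR-10-like shapes with a bias dimension
W = np.random.randn(D, C) * 0.0001
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

t0 = time.time()
loss_n, dW_n = svm_loss_naive(W, X, y, reg=5e-6)
t1 = time.time()
loss_v, dW_v = svm_loss_vectorized(W, X, y, reg=5e-6)
t2 = time.time()

print('loss diff:', abs(loss_n - loss_v))
print('grad diff:', np.linalg.norm(dW_n - dW_v, ord='fro'))
print('naive: %.3fs  vectorized: %.3fs' % (t1 - t0, t2 - t1))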