Multiclass Support Vector Machine Loss + A Hands-On Python Implementation on the MNIST Dataset
The multiclass SVM is trained by minimizing the regularized hinge loss over all classes and samples:

$$\min_{\mathbf{w}_{c}, b_{c}} \frac{1}{2} \sum_{c=1}^{C} \mathbf{w}_{c}^{T} \mathbf{w}_{c}+\theta \sum_{i=1}^{N} \sum_{c=1, c \neq y_{i}}^{C} \max \left(0,\mathbf{w}_{c}^{T} x_{i}+b_{c}-\mathbf{w}_{y_{i}}^{T} x_{i}-b_{y_{i}}+\Delta\right)\tag{12}$$
The matrix $\mathbf{W}$ holds the weights and has shape $D \times C$, where $D$ is the feature dimension and $C$ is the number of classes.
The matrix $\mathbf{X}$ holds the samples and has shape $N \times D$, where $N$ is the number of samples.
The scores are computed as $f = \mathbf{X}\mathbf{W}$, which has shape $N \times C$; each row contains the class scores for one sample.
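As a quick sanity check on the shapes, here is a toy sketch with made-up sizes $N=4$, $D=3$, $C=2$:

import numpy as np

X = np.zeros((4, 3))  # N=4 samples, D=3 features
W = np.zeros((3, 2))  # D=3 features, C=2 classes
f = X.dot(W)          # scores, shape (4, 2): one row of class scores per sample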
The loss for the $i$-th sample (without the regularization term) is computed as:
$$L_{i}=\sum_{c \neq y_{i}} \max \left(0, \mathbf{W}_{:, c}^{T} x_{i, :}-\mathbf{W}_{:, y_{i}}^{T} x_{i, :}+\Delta\right)\tag{13}$$
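For example, suppose the three class scores for sample $i$ are $(3.2, 5.1, -1.7)$, the correct class is the first one, and $\Delta = 1$. Then $L_i = \max(0, 5.1-3.2+1) + \max(0, -1.7-3.2+1) = 2.9 + 0 = 2.9$: only the second class violates the margin and contributes to the loss.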
The partial derivatives are computed as follows:
$$\frac{\partial L_{i}}{\partial \mathbf{W}_{:, y_{i}}}=-\left(\sum_{c \neq y_{i}} 1\left(\mathbf{W}_{:, c}^{T} x_{i, :}-\mathbf{W}_{:, y_{i}}^{T} x_{i, :}+\Delta>0\right)\right) x_{i, :}\tag{14}$$
and likewise
$$\frac{\partial L_{i}}{\partial \mathbf{W}_{:, c}}=1\left(\mathbf{W}_{:, c}^{T} x_{i, :}-\mathbf{W}_{:, y_{i}}^{T} x_{i, :}+\Delta>0\right) x_{i, :}\tag{15}$$
where $1(\cdot)$ is the indicator function: the correct-class column accumulates $-x_{i,:}$ once for every class whose margin is violated, and each violating class $c$ receives $+x_{i,:}$. A complete implementation on a three-class subset of MNIST follows:
from __future__ import division
import numpy as np
"""
@author: Devinzhang 2019-10-01
This dataset is part of MNIST dataset,but there is only 3 classes,
classes = {0:'0',1:'1',2:'2'},and images are compressed to 14*14
pixels and stored in a matrix with the corresponding label, at the
end the shape of the data matrix is
num_of_images x 14*14(pixels)+1(lable)
"""
def load_data(split_ratio):
    tmp = np.load("data.npy")
    data = tmp[:, :-1]
    label = tmp[:, -1]
    mean_data = np.mean(data, axis=0)
    # the first split_ratio fraction of the rows is held out as the test set
    train_data = data[int(split_ratio*data.shape[0]):] - mean_data
    train_label = label[int(split_ratio*data.shape[0]):]
    test_data = data[:int(split_ratio*data.shape[0])] - mean_data
    test_label = label[:int(split_ratio*data.shape[0])]
    return train_data, train_label, test_data, test_label
"""compute the hingle loss without using vector operation,
While dealing with a huge dataset,this will have low efficiency
X's shape [n,14*14+1],Y's shape [n,],W's shape [num_class,14*14+1]"""
def lossAndGradNaive(X,Y,W,reg):
dW=np.zeros(W.shape)
loss = 0.0
num_class=W.shape[0]
num_X=X.shape[0]
for i in range(num_X):
scores=np.dot(W,X[i])
cur_scores=scores[int(Y[i])]
for j in range(num_class):
if j==Y[i]:
continue
margin=scores[j]-cur_scores+1
if margin>0:
loss+=margin
dW[j,:]+=X[i]
dW[int(Y[i]),:]-=X[i]
loss/=num_X
dW/=num_X
loss+=reg*np.sum(W*W)
dW+=2*reg*W
return loss,dW
def lossAndGradVector(X, Y, W, reg):
    dW = np.zeros(W.shape)
    N = X.shape[0]
    Y_ = X.dot(W.T)  # scores, shape [N, num_class]
    margin = Y_ - Y_[range(N), Y.astype(int)].reshape([-1, 1]) + 1.0
    margin[range(N), Y.astype(int)] = 0.0
    margin = (margin > 0)*margin
    loss = 0.0
    loss += np.sum(margin)/N
    loss += reg*np.sum(W*W)
    # for one sample, X[i] has to be subtracted from the correct-class row several times
    countsX = (margin > 0).astype(int)
    countsX[range(N), Y.astype(int)] = -np.sum(countsX, axis=1)
    dW += np.dot(countsX.T, X)/N + 2*reg*W
    return loss, dW
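As a sanity check, the naive and vectorized implementations should agree up to numerical precision. Below is a minimal sketch (the helper compare_naive_and_vectorized is my own illustration; it assumes the two functions above are in scope and uses small random data instead of the MNIST subset):

def compare_naive_and_vectorized():
    # tiny random problem: 5 samples, 4 features plus a bias column, 3 classes
    rng = np.random.RandomState(0)
    X = np.hstack([rng.randn(5, 4), np.ones((5, 1))])
    Y = rng.randint(0, 3, size=5).astype(float)
    W = rng.randn(3, 5) * 0.01
    loss_naive, grad_naive = lossAndGradNaive(X, Y, W, 0.5)
    loss_vec, grad_vec = lossAndGradVector(X, Y, W, 0.5)
    print("loss difference:", abs(loss_naive - loss_vec))
    print("max gradient difference:", np.max(np.abs(grad_naive - grad_vec)))

The training loop below calls lossAndGradNaive; swapping in lossAndGradVector produces the same updates but is much faster on larger batches.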
def predict(X, W):
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append the bias column
    Y_ = np.dot(X, W.T)
    Y_pre = np.argmax(Y_, axis=1)
    return Y_pre

def accuracy(X, Y, W):
    Y_pre = predict(X, W)
    acc = (Y_pre == Y).mean()
    return acc

def model(X, Y, alpha, steps, reg):
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append the bias column
    W = np.random.randn(3, X.shape[1]) * 0.0001   # 3 classes
    for step in range(steps):
        loss, grad = lossAndGradNaive(X, Y, W, reg)
        W -= alpha*grad
        print("The {} step, loss={}, accuracy={}".format(step, loss, accuracy(X[:, :-1], Y, W)))
    return W
train_data,train_label,test_data,test_label=load_data(0.2)
W=model(train_data,train_label,0.0001,25,0.5)
print("Test accuracy of the model is {}".format(accuracy(test_data,test_label,W)))
ssh://[email protected]:22/home/zhangkf/anaconda3/envs/tf2c/bin/python -u /home/zhangkf/tf/TF1/svm_test/SVM_Version1.py
The 0 step, loss=1.9669065288495866, accuracy=0.9075144508670521
The 1 step, loss=0.5496390813958856, accuracy=0.9248554913294798
The 2 step, loss=0.42061570363890477, accuracy=0.9364161849710982
The 3 step, loss=0.3262427878538634, accuracy=0.9421965317919075
The 4 step, loss=0.2702680562078132, accuracy=0.9595375722543352
The 5 step, loss=0.21727828871056154, accuracy=0.9653179190751445
The 6 step, loss=0.16696083041345366, accuracy=0.9710982658959537
The 7 step, loss=0.1301956858983133, accuracy=0.9826589595375722
The 8 step, loss=0.1055701823919878, accuracy=0.9826589595375722
The 9 step, loss=0.08860423223431467, accuracy=0.9884393063583815
The 10 step, loss=0.07615233446374876, accuracy=0.9884393063583815
The 11 step, loss=0.06636955383351115, accuracy=0.9884393063583815
The 12 step, loss=0.06109815579247061, accuracy=0.9884393063583815
The 13 step, loss=0.05525243126969201, accuracy=0.9884393063583815
The 14 step, loss=0.04872271524678853, accuracy=0.9884393063583815
The 15 step, loss=0.04280360346485378, accuracy=0.9884393063583815
The 16 step, loss=0.036075227483702635, accuracy=0.9884393063583815
The 17 step, loss=0.03085022739599956, accuracy=0.9884393063583815
The 18 step, loss=0.024139528055343917, accuracy=0.9942196531791907
The 19 step, loss=0.017860570111603725, accuracy=0.9942196531791907
The 20 step, loss=0.013240863121183073, accuracy=0.9942196531791907
The 21 step, loss=0.00984510428960199, accuracy=1.0
The 22 step, loss=0.007724562778701633, accuracy=1.0
The 23 step, loss=0.00481805318548185, accuracy=1.0
The 24 step, loss=0.0029332622896068253, accuracy=1.0
Test accuracy of the model is 0.9767441860465116
Process finished with exit code 0
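For reference, the following are alternative, assignment-style implementations of the same loss: a naive and a vectorized minibatch version in which W has shape (D, C), followed by per-example versions that use the CIFAR-10 bias-trick layout.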
def svm_loss_naive(W, X, y, reg):
    """
    SVM loss function, naive version.
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    # accumulate the loss over every sample
    for i in range(num_train):
        scores = X[i].dot(W)  # (1, C)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            # evaluate the SVM hinge loss
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            # only when margin > 0 is there a loss, and only then is the gradient accumulated
            if margin > 0:  # max(0, sj - s_yi + 1)
                loss += margin
                # from the formula: dW_yi Li = -xi^T (sum_{j != yi} 1(xi Wj - xi Wyi + 1 > 0)) + 2*lambda*W_yi
                dW[:, y[i]] += -X[i, :]  # y[i] is the correct class
                # from the formula: dW_j Li = xi^T 1(xi Wj - xi Wyi + 1 > 0) + 2*lambda*W_j
                dW[:, j] += X[i, :]
    # average loss over the training data
    loss /= num_train
    dW /= num_train
    # regularization loss
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W
    return loss, dW
def svm_loss_vectorized(W, X, y, reg):
    """
    SVM loss function, vectorized version.
    Structured SVM loss function, vectorized implementation. Inputs and outputs
    are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    scores = X.dot(W)  # N by C: number of samples by number of classes
    num_train = X.shape[0]
    num_classes = W.shape[1]
    scores_correct = scores[np.arange(num_train), y]
    scores_correct = np.reshape(scores_correct, (num_train, 1))  # N*1, the correct-class score of each sample
    margins = scores - scores_correct + 1.0  # N by C, the margin at every entry of the scores matrix
    margins[np.arange(num_train), y] = 0.0  # set each sample's correct-class margin to 0
    margins[margins <= 0] = 0.0  # max(0, x)
    loss += np.sum(margins) / num_train  # sum all losses and take the average
    loss += 0.5 * reg * np.sum(W * W)  # regularization
    # compute the gradient
    margins[margins > 0] = 1.0  # entries with margin > 0 get gradient 1
    row_sum = np.sum(margins, axis=1)  # N*1, count of violated margins per sample
    margins[np.arange(num_train), y] = -row_sum  # correct-class position = -(count of violated margins)
    dW += np.dot(X.T, margins)/num_train + reg * W  # D by C
    return loss, dW
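The analytic gradient from svm_loss_vectorized can be checked against a centered finite-difference estimate at a few random coordinates. Below is a minimal sketch; the helper grad_check_sparse is written out here purely for illustration (it is not part of the code above) and assumes svm_loss_vectorized is in scope:

def grad_check_sparse(f, W, analytic_grad, num_checks=10, h=1e-5):
    # compare the analytic gradient with a centered numerical estimate at random coordinates of W
    rng = np.random.RandomState(1)
    for _ in range(num_checks):
        ix = tuple(rng.randint(m) for m in W.shape)
        old_val = W[ix]
        W[ix] = old_val + h
        fxph = f(W)      # f(W + h)
        W[ix] = old_val - h
        fxmh = f(W)      # f(W - h)
        W[ix] = old_val  # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic) + 1e-12)
        print("numerical: %f analytic: %f, relative error: %e" % (grad_numerical, grad_analytic, rel_error))

# usage sketch on random data: N=10 samples, D=5 features, C=3 classes
rng = np.random.RandomState(0)
X_check = rng.randn(10, 5)
y_check = rng.randint(0, 3, size=10)
W_check = rng.randn(5, 3) * 0.01
_, grad_analytic = svm_loss_vectorized(W_check, X_check, y_check, 0.0)
grad_check_sparse(lambda w: svm_loss_vectorized(w, X_check, y_check, 0.0)[0], W_check, grad_analytic)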
def L_i(x, y, W):
    """
    Unvectorized version. Compute the multiclass SVM loss for a single example (x, y)
    - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
      with an appended bias dimension in the 3073-rd position (i.e. bias trick)
    - y is an integer giving the index of the correct class (e.g. between 0 and 9 in CIFAR-10)
    - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
    """
    delta = 1.0  # see notes about delta later in this section
    scores = W.dot(x)  # scores becomes of size 10 x 1, the scores for each class
    correct_class_score = scores[y]
    D = W.shape[0]  # number of classes, e.g. 10
    loss_i = 0.0
    for j in range(D):  # iterate over all wrong classes
        if j == y:
            # skip the true class to only loop over incorrect classes
            continue
        # accumulate loss for the i-th example
        loss_i += max(0, scores[j] - correct_class_score + delta)
    return loss_i
def L_i_vectorized(x, y, W):
    """
    A faster half-vectorized implementation. Half-vectorized
    refers to the fact that for a single example the implementation contains
    no for loops, but there is still one loop over the examples (outside this function).
    """
    delta = 1.0
    scores = W.dot(x)
    # compute the margins for all classes in one vector operation
    margins = np.maximum(0, scores - scores[y] + delta)
    # at the y-th position scores[y] - scores[y] cancels and gives delta; we want
    # to ignore the y-th position and only count the margins of the wrong classes
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
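The outer loop over examples that this docstring refers to could look like the following sketch (X_cols, y and W are assumed variables, with one example stored per column of X_cols):

# average multiclass SVM loss over a dataset stored column-wise (D x N)
losses = [L_i_vectorized(X_cols[:, i], y[i], W) for i in range(X_cols.shape[1])]
data_loss = np.mean(losses)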
def L(X, y, W):
    """
    Fully-vectorized implementation:
    - X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
    - y is an array of integers specifying the correct class (e.g. a 50,000-D array)
    - W are the weights (e.g. 10 x 3073)
    """
    # evaluate loss over all examples in X without using any for loops
    # left as exercise to reader in the assignment
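The assignment leaves L as an exercise. One possible fully-vectorized version, consistent with L_i_vectorized above and the column-wise layout described in the docstring, might look like this sketch (my own version, not an official solution):

def L_vectorized(X, y, W, delta=1.0):
    # X: D x N (one example per column), y: length-N integer labels, W: C x D
    num_train = X.shape[1]
    scores = W.dot(X)                                  # C x N class scores
    correct_scores = scores[y, np.arange(num_train)]   # score of the correct class, length N
    margins = np.maximum(0, scores - correct_scores + delta)
    margins[y, np.arange(num_train)] = 0               # the correct class contributes no loss
    return np.sum(margins) / num_train                 # average data loss (no regularization term)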