$$L_r(b_1, b_2, y) = \sum_{i=1}^{N} \left\{ \frac{1}{2}(1 - y_i)\left\|b_{i,1} - b_{i,2}\right\|_2^2 + \frac{1}{2} y_i \max\left(m - \left\|b_{i,1} - b_{i,2}\right\|_2^2,\, 0\right) + \alpha\left(\left\| |b_{i,1}| - 1 \right\|_1 + \left\| |b_{i,2}| - 1 \right\|_1\right) \right\}$$
Specifically, for a pair of images $I_1, I_2 \in \Omega$ and the corresponding binary network outputs $b_1, b_2$, we define $y = 0$ if they are similar and $y = 1$ otherwise.
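A minimal PyTorch sketch of this pairwise loss is given below; the helper name `pairwise_binary_loss` and the default values for the margin `m` and regularization weight `alpha` are illustrative assumptions, not taken from the original formulation.

```python
import torch

def pairwise_binary_loss(b1, b2, y, m=2.0, alpha=0.01):
    """b1, b2: (N, K) real-valued codes; y: (N,) with 0 = similar, 1 = dissimilar."""
    d = (b1 - b2).pow(2).sum(dim=1)                    # squared L2 distance per pair
    similar = 0.5 * (1 - y) * d                        # pull similar pairs together
    dissimilar = 0.5 * y * torch.clamp(m - d, min=0)   # push dissimilar pairs beyond margin m
    reg = alpha * ((b1.abs() - 1).abs().sum(dim=1) +
                   (b2.abs() - 1).abs().sum(dim=1))    # drive code entries toward +-1
    return (similar + dissimilar + reg).sum()
```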
creates a criterion that measures the mean absolute value of the element-wise difference between input x and target y:
$$\text{loss}(x, y) = \frac{1}{n} \sum_i |x_i - y_i|$$
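This is PyTorch's nn.L1Loss; a minimal usage sketch (tensor shapes and values are arbitrary):

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()
x = torch.randn(3, 5, requires_grad=True)   # input
y = torch.randn(3, 5)                       # target, same shape as the input
loss = criterion(x, y)                      # mean of |x_i - y_i|
loss.backward()
```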
$$\text{loss}(x, y) = \frac{1}{n} \sum_i |x_i - y_i|^2$$
measures the mean squared error between n elements in the input x and target y.
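This corresponds to nn.MSELoss; a minimal sketch:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
x = torch.randn(3, 5, requires_grad=True)
y = torch.randn(3, 5)
loss = criterion(x, y)                      # mean of (x_i - y_i)^2
```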
the loss can be described as:
$$\text{loss}(x, \text{class}) = -\log\frac{e^{x_{\text{class}}}}{\sum_j e^{x_j}} = -x_{\text{class}} + \log\Big(\sum_j e^{x_j}\Big)$$
or in the case of the weights argument being specified:
$$\text{loss}(x, \text{class}) = w_{\text{class}} \Big(-x_{\text{class}} + \log\Big(\sum_j e^{x_j}\Big)\Big)$$
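This corresponds to nn.CrossEntropyLoss, which expects raw, unnormalized scores and integer class labels; the per-class weights below are an arbitrary illustration:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 1.0, 1.0]))  # weight is optional
x = torch.randn(3, 4, requires_grad=True)   # raw scores for C = 4 classes
target = torch.tensor([0, 3, 1])            # one class index per sample
loss = criterion(x, target)
```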
the loss can be described as:
$$\text{loss}(x, \text{class}) = -x_{\text{class}}$$
or in the case of the weights argument it is specified as follows:
$$\text{loss}(x, \text{class}) = -w_{\text{class}} \, x_{\text{class}}$$
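This corresponds to nn.NLLLoss, which expects log-probabilities as input (e.g. the output of log_softmax); a minimal sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.NLLLoss()
x = F.log_softmax(torch.randn(3, 4, requires_grad=True), dim=1)  # log-probabilities
target = torch.tensor([1, 0, 3])
loss = criterion(x, target)                 # averages -x[i, target[i]] over the batch
```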
the loss can be described as:
$$\text{loss}(x, \text{target}) = \frac{1}{n} \sum_i \text{target}_i \big(\log(\text{target}_i) - x_i\big)$$
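This corresponds to nn.KLDivLoss, where the input is given as log-probabilities and the target as probabilities; a minimal sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.KLDivLoss()
x = F.log_softmax(torch.randn(3, 4, requires_grad=True), dim=1)  # input: log-probabilities
target = F.softmax(torch.randn(3, 4), dim=1)                     # target: probabilities
loss = criterion(x, target)
```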
creates a criterion that measures the Binary Cross Entropy between the target and the output:
$$\text{loss}(o, t) = -\frac{1}{n} \sum_i \big(t_i \log(o_i) + (1 - t_i) \log(1 - o_i)\big)$$
or in the case of the weights argument being specified:
$$\text{loss}(o, t) = -\frac{1}{n} \sum_i w_i \big(t_i \log(o_i) + (1 - t_i) \log(1 - o_i)\big)$$
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets $t_i$ should be numbers between 0 and 1.
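This corresponds to nn.BCELoss; note the outputs must already lie in (0, 1), e.g. after a sigmoid:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
o = torch.sigmoid(torch.randn(3, requires_grad=True))  # outputs already squashed into (0, 1)
t = torch.tensor([1.0, 0.0, 1.0])                      # targets between 0 and 1
loss = criterion(o, t)
```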
This is the Binary Cross Entropy between the target and the output logits (no sigmoid applied):
$$\text{loss}(o, t) = -\frac{1}{n} \sum_i \big(t_i \log(\text{sigmoid}(o_i)) + (1 - t_i) \log(1 - \text{sigmoid}(o_i))\big)$$
or in the case of the weights argument being specified:
$$\text{loss}(o, t) = -\frac{1}{n} \sum_i w_i \big(t_i \log(\text{sigmoid}(o_i)) + (1 - t_i) \log(1 - \text{sigmoid}(o_i))\big)$$
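This corresponds to nn.BCEWithLogitsLoss, which takes raw logits and applies the sigmoid internally:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
o = torch.randn(3, requires_grad=True)      # raw logits; sigmoid is applied inside the loss
t = torch.tensor([1.0, 0.0, 1.0])
loss = criterion(o, t)
```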
If $y = 1$, it is assumed that the first input should be ranked higher (have a larger value) than the second input, and vice versa for $y = -1$.
the loss function for each sample in the mini-batch is:
$$\text{loss}(x, y) = \max\big(0, -y\,(x_1 - x_2) + \text{margin}\big)$$
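This corresponds to nn.MarginRankingLoss; the margin value below is an arbitrary choice:

```python
import torch
import torch.nn as nn

criterion = nn.MarginRankingLoss(margin=0.5)
x1 = torch.randn(4, requires_grad=True)
x2 = torch.randn(4, requires_grad=True)
y = torch.tensor([1.0, -1.0, 1.0, -1.0])    # 1: x1 should rank higher, -1: x2 should
loss = criterion(x1, x2, y)
```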
$$\text{loss}(x, y) = \frac{1}{n} \sum_i \begin{cases} x_i, & y_i = 1 \\ \max(0, \text{margin} - x_i), & y_i = -1 \end{cases}$$
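This corresponds to nn.HingeEmbeddingLoss, typically applied to distances between pairs; a minimal sketch with an arbitrary margin:

```python
import torch
import torch.nn as nn

criterion = nn.HingeEmbeddingLoss(margin=1.0)
x = torch.randn(4, requires_grad=True)       # e.g. distances produced by a network
y = torch.tensor([1.0, -1.0, -1.0, 1.0])     # 1 for similar pairs, -1 for dissimilar
loss = criterion(x, y)
```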
$$\text{loss}(x, y) = \frac{\sum_{i,j} \max\big(0, 1 - (x_{y_j} - x_i)\big)}{\text{x.size}(0)}$$
where $i = 0$ to x.size(0), $j = 0$ to y.size(0), $y_j \neq 0$, and $i \neq y_j$ for all $i$ and $j$.
x and y must have the same size.
The criterion only considers the first non-zero $y_j$ targets. This allows different samples to have a variable number of target classes.
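This corresponds to nn.MultiLabelMarginLoss. Note that in current PyTorch the per-sample target lists class indices and is padded with -1 after the last valid label:

```python
import torch
import torch.nn as nn

criterion = nn.MultiLabelMarginLoss()
x = torch.randn(1, 4, requires_grad=True)
y = torch.tensor([[3, 0, -1, -1]])   # the sample belongs to classes 3 and 0; -1 pads the rest
loss = criterion(x, y)
```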
Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. It is less sensitive to outliers than the MSELoss and in some cases prevents exploding gradients (e.g. see “Fast R-CNN” paper by Ross Girshick). Also known as the Huber loss:
$$\text{loss}(x, y) = \frac{1}{n} \sum_i \begin{cases} 0.5\,(x_i - y_i)^2, & |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$$
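This corresponds to nn.SmoothL1Loss; a minimal sketch:

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss()
x = torch.randn(3, 5, requires_grad=True)
y = torch.randn(3, 5)
loss = criterion(x, y)   # quadratic for small errors, linear for large ones
```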
$$\text{loss}(x, y) = \frac{\sum_i \log\big(1 + e^{-y_i x_i}\big)}{\text{x.nelement}()}$$
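This corresponds to nn.SoftMarginLoss, a two-class logistic loss with targets in {1, -1}; a minimal sketch:

```python
import torch
import torch.nn as nn

criterion = nn.SoftMarginLoss()
x = torch.randn(5, requires_grad=True)
y = torch.tensor([1.0, -1.0, 1.0, 1.0, -1.0])   # targets in {1, -1}
loss = criterion(x, y)
```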
$$\text{loss}(x, y) = -\sum_i \Big( y_i \log\frac{1}{1 + e^{-x_i}} + (1 - y_i) \log\frac{e^{-x_i}}{1 + e^{-x_i}} \Big)$$
where $i = 0$ to x.nelement()−1 and $y_i \in \{0, 1\}$.
y and x must have the same size.
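This corresponds to nn.MultiLabelSoftMarginLoss, which takes one score per class and a multi-hot target:

```python
import torch
import torch.nn as nn

criterion = nn.MultiLabelSoftMarginLoss()
x = torch.randn(2, 4, requires_grad=True)        # one score per class
y = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])         # multi-hot targets in {0, 1}
loss = criterion(x, y)
```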
$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & y = -1 \end{cases}$$
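This corresponds to nn.CosineEmbeddingLoss; the margin and embedding size below are arbitrary:

```python
import torch
import torch.nn as nn

criterion = nn.CosineEmbeddingLoss(margin=0.2)
x1 = torch.randn(4, 8, requires_grad=True)
x2 = torch.randn(4, 8, requires_grad=True)
y = torch.tensor([1.0, -1.0, 1.0, -1.0])   # 1 for similar pairs, -1 for dissimilar
loss = criterion(x1, x2, y)
```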
$$\text{loss}(x, y) = \frac{\sum_i \max\big(0, \text{margin} - x_y + x_i\big)^p}{\text{x.size}(0)}$$
where $i = 0$ to x.size(0) and $i \neq y$.
Optionally, you can give non-equal weighting on classes by passing a 1D weights tensor into the constructor.
The loss function then becomes:
$$\text{loss}(x, y) = \frac{\sum_i \max\big(0, w_y (\text{margin} - x_y + x_i)\big)^p}{\text{x.size}(0)}$$
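This corresponds to nn.MultiMarginLoss; the per-class weights and p = 1 below are arbitrary choices:

```python
import torch
import torch.nn as nn

criterion = nn.MultiMarginLoss(p=1, margin=1.0,
                               weight=torch.tensor([1.0, 2.0, 1.0, 1.0]))  # weight is optional
x = torch.randn(3, 4, requires_grad=True)
y = torch.tensor([0, 3, 1])   # correct class index per sample
loss = criterion(x, y)
```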
The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.
$$L(a, p, n) = \frac{1}{N} \sum_{i=1}^{N} \max\big\{ d(a_i, p_i) - d(a_i, n_i) + \text{margin},\, 0 \big\}$$
where $d(x_i, y_i) = \left\| x_i - y_i \right\|_2^2$.
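This corresponds to nn.TripletMarginLoss; swap=True enables the distance swap mentioned above, and the embedding size is arbitrary:

```python
import torch
import torch.nn as nn

criterion = nn.TripletMarginLoss(margin=1.0, p=2, swap=True)
anchor = torch.randn(8, 128, requires_grad=True)
positive = torch.randn(8, 128, requires_grad=True)
negative = torch.randn(8, 128, requires_grad=True)
loss = criterion(anchor, positive, negative)
```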