Specifically, for a pair of images I1, I2 ∈ Ω and the corresponding binary network outputs b1, b2, we define y = 0 if they are similar and y = 1 otherwise.
Creates a criterion that measures the mean absolute value of the element-wise difference between input x and target y:
Creates a criterion that measures the mean squared error between the n elements of the input x and the target y.
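As a rough illustration of the two criteria described above, the following plain-Python sketch assumes the usual forms (1/n) Σ |x_i − y_i| and (1/n) Σ (x_i − y_i)². The function names `l1_loss` and `mse_loss` are hypothetical helpers chosen for this example and operate on flat Python lists rather than tensors.

```python
def l1_loss(x, y):
    """Mean absolute element-wise difference between x and y (assumed form)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / len(x)


def mse_loss(x, y):
    """Mean squared element-wise difference between x and y (assumed form)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


x = [0.5, 2.0, -1.0]
y = [1.0, 2.0, 1.0]
print(l1_loss(x, y))   # (0.5 + 0.0 + 2.0) / 3
print(mse_loss(x, y))  # (0.25 + 0.0 + 4.0) / 3
```

The squared version penalizes the large error in the last element far more heavily than the absolute version does, which is the practical difference between the two criteria.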
The loss can be described as:
or, when the weights argument is specified:
The loss can be described as:
or, when the weights argument is specified, as follows:
The loss can be described as:
Creates a criterion that measures the binary cross entropy between the target and the output:
or, when the weights argument is specified:
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets t_i should be numbers between 0 and 1.
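The binary cross entropy described above can be sketched in plain Python. This is a minimal illustration that assumes the standard form −(1/n) Σ w_i [t_i log(o_i) + (1 − t_i) log(1 − o_i)], with every weight equal to 1 when no weights are given. The function name `bce_loss` and its list-based signature are hypothetical, and the outputs are assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def bce_loss(outputs, targets, weights=None):
    """Binary cross entropy over flat lists (assumed standard form).

    outputs: probabilities strictly inside (0, 1)
    targets: numbers between 0 and 1
    weights: optional per-element rescaling factors (assumption: one per element)
    """
    n = len(outputs)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for o, t, w in zip(outputs, targets, weights):
        total += w * (t * math.log(o) + (1.0 - t) * math.log(1.0 - o))
    return -total / n


# An output of 0.5 is maximally uncertain, so each element costs log(2).
print(bce_loss([0.5, 0.5], [1.0, 0.0]))
```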
This binary cross entropy between the target and the output logits (no sigmoid applied) is:
or, when the weights argument is specified:
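Both the unweighted and the weighted logits-based forms can be sketched together in plain Python. The example assumes the loss equals the binary cross entropy of sigmoid(x) and rewrites it as max(x, 0) − x·t + log(1 + exp(−|x|)), an algebraically equivalent expression that avoids overflow for large logits. The name `bce_with_logits` and the per-element `weights` list are hypothetical choices for this sketch.

```python
import math


def bce_with_logits(logits, targets, weights=None):
    """Binary cross entropy on raw logits, using a numerically stable rewrite.

    For each logit x and target t this computes
        max(x, 0) - x * t + log(1 + exp(-|x|)),
    which equals -[t * log(s) + (1 - t) * log(1 - s)] with s = sigmoid(x).
    """
    n = len(logits)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for x, t, w in zip(logits, targets, weights):
        total += w * (max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x))))
    return total / n


print(bce_with_logits([0.0], [1.0]))     # sigmoid(0) = 0.5, so the loss is log(2)
print(bce_with_logits([1000.0], [1.0]))  # very confident and correct: no overflow
```

Applying a sigmoid first and then taking a logarithm would lose precision for large positive or negative logits, which is the usual motivation for folding the two steps into one expression.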
If y == 1, it is assumed that the first input should be ranked higher (have a larger value) than the second input, and vice versa for y == -1.
The loss function for each sample in the mini-batch is:
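The ranking behaviour described above can be illustrated with a short plain-Python sketch. It assumes the per-sample loss takes the common margin ranking form max(0, −y · (x1 − x2) + margin), which the text here does not spell out. The function name `margin_ranking_loss` and the default margin of 0 are assumptions made for the example.

```python
def margin_ranking_loss(x1, x2, y, margin=0.0):
    """Per-sample ranking loss (assumed form): max(0, -y * (x1 - x2) + margin).

    y == 1  means x1 should be larger than x2;
    y == -1 means x2 should be larger than x1.
    """
    if y not in (1, -1):
        raise ValueError("y must be 1 or -1")
    return max(0.0, -y * (x1 - x2) + margin)


print(margin_ranking_loss(2.0, 1.0, 1))   # correct order, no penalty
print(margin_ranking_loss(1.0, 2.0, 1))   # wrong order, penalized by the gap
print(margin_ranking_loss(2.0, 1.0, -1, margin=0.5))
```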
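$$\text{loss}(x, y) = \frac{1}{n} \sum_i \begin{cases} x_i, & \text{if } y_i = 1 \\ \max(0, \text{margin} - x_i), & \text{if } y_i = -1 \end{cases}$$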
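The piecewise definition above can be sketched in plain Python. This is a minimal illustration: the function name `hinge_embedding_loss`, the list-based inputs, and the default margin of 1 are assumptions for the example, and the per-element terms are summed before dividing by n.

```python
def hinge_embedding_loss(x, y, margin=1.0):
    """Average of x_i where y_i == 1 and max(0, margin - x_i) where y_i == -1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    total = 0.0
    for xi, yi in zip(x, y):
        if yi == 1:
            total += xi
        elif yi == -1:
            total += max(0.0, margin - xi)
        else:
            raise ValueError("labels must be 1 or -1")
    return total / len(x)


# x is typically a distance: small for similar pairs, large for dissimilar ones.
print(hinge_embedding_loss([0.2, 0.3, 2.0], [1, -1, -1]))
```

Dissimilar pairs (y = −1) stop contributing once their distance exceeds the margin, as the third element in the usage line shows.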
where i = 0 to x.size(0), j = 0 to y.size(0), y_j ≠ 0, and i ≠ y_j for all i and j.
x and y must have the same size.
The criterion only considers the first non-zero y_j targets. This allows different samples to have a variable number of target classes.
Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. It is less sensitive to outliers than the MSELoss and in some cases prevents exploding gradients (e.g. see “Fast R-CNN” paper by Ross Girshick). Also known as the Huber loss:
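$$\text{loss}(x, y) = \frac{1}{n} \sum_i \begin{cases} 0.5\,(x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$$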
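The piecewise formula above translates directly into a few lines of plain Python. This is only a sketch on flat lists; the function name `smooth_l1_loss` is a hypothetical helper for this example.

```python
def smooth_l1_loss(x, y):
    """Squared term for absolute errors below 1, L1 term otherwise, averaged over n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    total = 0.0
    for xi, yi in zip(x, y):
        d = abs(xi - yi)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(x)


# The small error (0.5) is squared; the large error (3.0) grows only linearly.
print(smooth_l1_loss([0.0, 0.0], [0.5, 3.0]))
```

The two branches meet at |x_i − y_i| = 1, where both give 0.5, so the loss is continuous, and the linear branch is what limits the influence of outliers.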
where i = 0 to x.nelement() - 1 and y_i ∈ {0, 1}.
y and x must have the same size.
$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases}$$
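The cosine-based formula above can be sketched in plain Python for a single pair of vectors. The helper names `cosine_similarity` and `cosine_embedding_loss`, and the default margin of 0, are assumptions for this illustration. Both vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors given as lists."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


def cosine_embedding_loss(x1, x2, y, margin=0.0):
    """1 - cos for similar pairs (y == 1), max(0, cos - margin) for y == -1."""
    c = cosine_similarity(x1, x2)
    if y == 1:
        return 1.0 - c
    if y == -1:
        return max(0.0, c - margin)
    raise ValueError("y must be 1 or -1")


print(cosine_embedding_loss([1.0, 0.0], [0.0, 1.0], 1))   # orthogonal but labelled similar
print(cosine_embedding_loss([1.0, 0.0], [0.0, 1.0], -1))  # orthogonal and labelled dissimilar
```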
where i = 0 to x.size(0) and i ≠ y.
Optionally, you can give non-equal weighting on classes by passing a 1D weights tensor into the constructor.
The loss function then becomes:
The distance swap is described in detail in the paper "Learning shallow convolutional feature descriptors with triplet losses" by V. Balntas, E. Riba et al.
where $d(x_i, y_i) = \lVert x_i - y_i \rVert_2^2$
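To make the distance and the swap concrete, here is a small plain-Python sketch for a single triplet. It assumes the common triplet form max(0, d(a, p) − d(a, n) + margin), which is not written out in the text here, and uses the squared Euclidean distance d defined above. With the swap enabled, the anchor-negative distance is replaced by the smaller of the anchor-negative and positive-negative distances. The names `squared_l2` and `triplet_margin_loss` and the `swap` flag are hypothetical choices for this example.

```python
def squared_l2(u, v):
    """Squared Euclidean distance d(u, v) = ||u - v||_2^2."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def triplet_margin_loss(anchor, positive, negative, margin=1.0, swap=False):
    """Triplet loss for one (anchor, positive, negative) triple (assumed form)."""
    d_ap = squared_l2(anchor, positive)
    d_an = squared_l2(anchor, negative)
    if swap:
        # Distance swap: use the harder of the two negative distances.
        d_an = min(d_an, squared_l2(positive, negative))
    return max(0.0, d_ap - d_an + margin)


a, p, n = [0.0, 0.0], [1.0, 0.0], [3.0, 0.0]
print(triplet_margin_loss(a, p, n, margin=5.0))             # uses d(a, n) = 9
print(triplet_margin_loss(a, p, n, margin=5.0, swap=True))  # uses d(p, n) = 4
```

In this usage example the negative is closer to the positive than to the anchor, so the swap turns a triplet with zero loss into one that still contributes a gradient.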