An Introduction to Loss Functions in PyTorch


  • Mean Absolute Error

    torch.nn.L1Loss

    It measures the mean absolute error.
    $$\mathrm{loss}(x, y) = |x - y|$$

    What does it mean?
    It measures the numerical distance between the estimated and actual values.
    It is the simplest form of error metric. The absolute value of the error is taken because otherwise negative errors would cancel out positive ones, which would make the metric unreliable.
    The lower the MAE, the better the model.

    When to use it?

    • Regression problems
    • Simplistic model
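As a minimal sketch of the point above, the snippet below compares `nn.L1Loss` with a hand-computed mean absolute error and with the plain signed mean, where positive and negative errors partly cancel. The tensor values are made up for illustration.

```python
import torch
import torch.nn as nn

# Illustrative values only (not from the article).
prediction = torch.tensor([2.5, 0.0, 2.0, 8.0])
target = torch.tensor([3.0, -0.5, 2.0, 7.0])

# nn.L1Loss averages |prediction - target| over all elements by default.
mae = nn.L1Loss()(prediction, target)

# The same quantity computed by hand.
manual_mae = (prediction - target).abs().mean()

# Without the absolute value, errors of opposite sign cancel each other.
signed_mean = (prediction - target).mean()

print(mae.item(), manual_mae.item(), signed_mean.item())
```

Here the signed mean comes out smaller than the MAE, which is why the absolute value is needed for a trustworthy error measure.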

  • Mean Squared Error

    torch.nn.MSELoss

    It measures the mean squared error.
    $$\mathrm{loss}(x, y) = (x - y)^2$$

    What does it mean?
    Squaring the difference between the prediction and the actual value amplifies large errors. If the model is off by 200, the error is 40000; if it is off by 0.1, the error is 0.01. This penalizes the model heavily when it makes large mistakes and tolerates small ones.

    When to use it?

    • Regression problems
    • The numerical values of the features are not large
    • Problem is not very high dimensional
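The 200 versus 0.1 example can be checked directly. This is a small sketch that uses `reduction="none"` to see the per-element squared errors before averaging; the zero targets are an arbitrary choice so that each prediction equals its own error.

```python
import torch
import torch.nn as nn

# With a target of 0, each prediction is exactly its error.
prediction = torch.tensor([200.0, 0.1])
target = torch.zeros(2)

# reduction="none" keeps one squared error per element.
per_element = nn.MSELoss(reduction="none")(prediction, target)

# The default reduction="mean" averages the squared errors.
mse = nn.MSELoss()(prediction, target)

print(per_element)  # roughly [40000, 0.01]
print(mse.item())
```

The single large error dominates the mean, which is the amplification effect described above.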

  • Smooth L1 Loss

    torch.nn.SmoothL1Loss

    Also known as Huber loss.
    $$\mathrm{loss}(x, y) = \begin{cases} 0.5 (x - y)^2 & \text{if } |x - y| < 1 \\ |x - y| - 0.5 & \text{otherwise} \end{cases}$$

    What does it mean?
    It uses a squared term if the absolute error falls below 1 and an absolute term otherwise. It is less sensitive to outliers than the mean squared error loss and in some cases prevents exploding gradients.

    When to use it?

    • Regression
    • When the features have large values
    • Well suited for most problems
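To make the two regimes concrete, the sketch below evaluates `nn.SmoothL1Loss` (with its default threshold of 1) next to `nn.MSELoss` and `nn.L1Loss` on one small error and one outlier. The values are illustrative only.

```python
import torch
import torch.nn as nn

# One small error (0.5) and one outlier (10.0), measured against zero targets.
prediction = torch.tensor([0.5, 10.0])
target = torch.zeros(2)

smooth_l1 = nn.SmoothL1Loss(reduction="none")(prediction, target)
mse = nn.MSELoss(reduction="none")(prediction, target)
l1 = nn.L1Loss(reduction="none")(prediction, target)

# Small error: quadratic branch, 0.5 * 0.5**2 = 0.125.
# Outlier: linear branch, 10.0 - 0.5 = 9.5, far below the squared error of 100.
print(smooth_l1, mse, l1)
```

The outlier contributes 9.5 instead of 100, so it pulls the total loss (and the gradients) much less than it would under mean squared error.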

  • Negative Log-Likelihood Loss

    torch.nn.NLLLoss

    $$\mathrm{loss}(x, y) = -\log y$$
    where y is the probability the model assigns to the correct label.

    What does it mean?
    It maximizes the overall probability of the data. It penalizes the model when it predicts the correct class with a low probability and rewards it when the prediction is made with a high probability.

    When to use it?

    • Classification
    • Smaller quicker training
    • Simple tasks
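A short sketch of this behaviour follows. Note that `nn.NLLLoss` expects log-probabilities as input, so the probabilities are passed through `log()` first. The two probability vectors are invented for illustration; both put the most mass on the correct class (index 0), but with different confidence.

```python
import torch
import torch.nn as nn

nll = nn.NLLLoss()
target = torch.tensor([0])  # the correct class is index 0

# Illustrative predicted probabilities for a 3-class problem.
confident = torch.tensor([[0.9, 0.05, 0.05]])
unsure = torch.tensor([[0.4, 0.3, 0.3]])

# NLLLoss takes log-probabilities, not raw probabilities.
confident_loss = nll(confident.log(), target)
unsure_loss = nll(unsure.log(), target)

# Each loss is simply -log(probability of the correct class).
print(confident_loss.item(), unsure_loss.item())
```

The correct but less confident prediction receives the larger loss, matching the description above.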

  • Cross-Entropy Loss

    torch.nn.CrossEntropyLoss

    $$\mathrm{loss}(x, y) = -\sum x \log y$$
    where x is the true probability of a label and y is the predicted probability of that label.

    What does it mean?
    Cross-entropy as a loss function is used to learn the probability distribution of the data. While other loss functions like squared loss penalize wrong predictions, cross entropy gives a greater penalty when incorrect predictions are made with high confidence. It penalizes both wrong but confident predictions and correct but less confident predictions. What differentiates it from negative log-likelihood loss in PyTorch is the expected input: CrossEntropyLoss takes raw, unnormalized scores and applies log-softmax internally, while NLLLoss expects log-probabilities.

    When to use it?

    • Classification tasks
    • For building a confident model
    • For higher precision/recall values
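The sketch below illustrates two things with made-up logits: a confident prediction is penalized far more when it is wrong than when it is right, and `nn.CrossEntropyLoss` on raw scores gives the same value as `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn

# Raw scores (logits) that strongly favour class 0. Illustrative values.
logits = torch.tensor([[4.0, 0.0, 0.0]])

cross_entropy = nn.CrossEntropyLoss()

# Same confident prediction, scored against two different true labels.
loss_when_right = cross_entropy(logits, torch.tensor([0]))
loss_when_wrong = cross_entropy(logits, torch.tensor([1]))

# CrossEntropyLoss on logits equals NLLLoss on log-softmax outputs.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll_when_wrong = nn.NLLLoss()(log_probs, torch.tensor([1]))

print(loss_when_right.item(), loss_when_wrong.item(), nll_when_wrong.item())
```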

  • Kullback-Leibler divergence

    torch.nn.KLDivLoss

    KL divergence gives a measure of how two probability distributions differ from each other.

    $$\mathrm{loss}(x, y) = -x \log(y / x)$$
    where x is the true probability of a label and y is the predicted probability of that label.

    What does it mean?
    It is quite similar to cross-entropy loss. The distinction is that it measures the difference between the predicted and actual probability distributions, which captures the information lost when the prediction stands in for the true distribution. The farther the predicted probability distribution is from the true probability distribution, the greater the loss. It does not penalize the model based on the confidence of the prediction, as cross-entropy loss does, but on how different the prediction is from the ground truth. It often outperforms mean squared error, especially when the data is not normally distributed. Cross entropy is more widely used because KL divergence can be broken down into the cross entropy plus a term that does not depend on the prediction, so minimizing the cross entropy is the same as minimizing the KL divergence:
    $$KL = -x \log(y / x) = -x \log(y) + x \log(x)$$
    where -x log(y) is the cross entropy and -x log(x) is the entropy of the true distribution, which is constant with respect to the prediction y.

    When to use it?

    • Classification tasks
    • The same result can be achieved with cross entropy with less computation, so it is usually avoided.
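The decomposition of KL divergence into cross entropy minus entropy can be verified numerically. In this sketch the two distributions are invented for illustration. Note that `nn.KLDivLoss` expects the prediction as log-probabilities and, by default, the target as plain probabilities.

```python
import torch
import torch.nn as nn

# Illustrative distributions over three classes.
true_dist = torch.tensor([[0.7, 0.2, 0.1]])
pred_dist = torch.tensor([[0.5, 0.3, 0.2]])

# First argument: log of the prediction. Second argument: true probabilities.
kl = nn.KLDivLoss(reduction="sum")(pred_dist.log(), true_dist)

# The two pieces of the decomposition, computed by hand.
cross_entropy = -(true_dist * pred_dist.log()).sum()
entropy = -(true_dist * true_dist.log()).sum()

# A perfect prediction gives zero divergence.
kl_same = nn.KLDivLoss(reduction="sum")(true_dist.log(), true_dist)

print(kl.item(), (cross_entropy - entropy).item(), kl_same.item())
```

Because the entropy term depends only on the true distribution, the KL divergence and the cross entropy differ by a constant, so they share the same minimizer.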

  • Margin Ranking Loss

    torch.nn.MarginRankingLoss

    It measures the loss given inputs x1, x2, and a label tensor y with values 1 or -1. If y == 1, it is assumed that the first input should be ranked higher than the second input, and vice versa.

    $$\mathrm{loss}(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + \mathrm{margin})$$

    What does it mean?
    The label y states which of the inputs x1 and x2 should be ranked higher. Assuming margin has the default value of 0, if y and (x1 - x2) have the same sign, then the loss will be zero. This means that the input expected to rank higher (x1 when y == 1, x2 when y == -1) was in fact ranked higher.

    When to use it?

    • GANs
    • Ranking tasks.
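A minimal sketch of the formula with margin 0 follows. The scores are made up; in both pairs the label says x1 should rank higher, but only the first pair is ordered correctly.

```python
import torch
import torch.nn as nn

# Illustrative scores for two pairs of items.
x1 = torch.tensor([0.8, 0.2])
x2 = torch.tensor([0.3, 0.9])

# y = 1 means "x1 should be ranked higher than x2" for both pairs.
y = torch.tensor([1.0, 1.0])

loss_fn = nn.MarginRankingLoss(margin=0.0, reduction="none")
loss = loss_fn(x1, x2, y)

# Pair 1: x1 > x2, correct order, loss 0.
# Pair 2: x1 < x2, wrong order, loss equals the gap x2 - x1 (about 0.7).
print(loss)
```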

  • Hinge Embedding Loss

    torch.nn.HingeEmbeddingLoss

    It measures the loss given an input tensor x and a label tensor y with values 1 or -1. The input x is usually a distance between two inputs, such as their L1 pairwise distance, so the loss measures whether the two inputs are similar (y == 1) or dissimilar (y == -1).

    $$\mathrm{loss}(x, y) = \begin{cases} x & \text{if } y = 1 \\ \max(0, \mathrm{margin} - x) & \text{if } y = -1 \end{cases}$$

    What does it mean?
    For similar pairs (y == 1) the loss is the distance itself, so the model is pushed to bring them closer together. For dissimilar pairs (y == -1) the loss is zero once the distance exceeds the margin (default value 1), so the model is penalized only while the pair is closer than the margin.

    When to use it?

    • Learning nonlinear embeddings
    • Semi-supervised learning
    • Deciding whether two inputs are similar or dissimilar
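As a hedged sketch of how `nn.HingeEmbeddingLoss` is typically fed: it takes a single input tensor, here the L1 distance between two hypothetical embedding batches `a` and `b`, plus a label of 1 (similar) or -1 (dissimilar) per pair. The embeddings and labels are invented for illustration.

```python
import torch
import torch.nn as nn

# Two hypothetical batches of 2-dimensional embeddings.
a = torch.tensor([[1.0, 2.0], [0.0, 0.0]])
b = torch.tensor([[1.0, 2.5], [0.3, 0.4]])

# L1 distance per pair: about [0.5, 0.7].
distance = (a - b).abs().sum(dim=1)

# First pair is labelled similar (1), second pair dissimilar (-1).
y = torch.tensor([1.0, -1.0])

loss_fn = nn.HingeEmbeddingLoss(margin=1.0, reduction="none")
loss = loss_fn(distance, y)

# Similar pair: loss is the distance itself (0.5).
# Dissimilar pair: loss is max(0, margin - distance) = 1.0 - 0.7 = 0.3.
print(loss)
```

The dissimilar pair stops contributing to the loss once its distance grows past the margin.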
