What are the benefits and weaknesses of various binary classification metrics?

Accuracy
Definition - Proportion of instances you predict correctly.
Strengths - Very intuitive and easy to explain.
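Weaknesses - Works poorly when the classes are imbalanced and the signal in the data is weak relative to the imbalance: a model that always predicts the majority class can score well while learning nothing. Also, accuracy is computed from hard class labels, so you cannot express your uncertainty about a particular prediction.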
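
To make the class-imbalance weakness concrete, here is a minimal sketch in plain Python. The dataset (95 negatives, 5 positives) and the always-negative "model" are made up for illustration and are not from any real problem.

```python
def accuracy(y_true, y_pred):
    """Proportion of instances predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A trivial "model" that ignores its input and always predicts the majority class.
always_negative = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_negative)

# Fraction of the actual positives that the model found.
positives_found = sum(1 for t, p in zip(y_true, always_negative) if t == 1 and p == 1)
positive_recall = positives_found / sum(y_true)

print(baseline_accuracy)  # 0.95
print(positive_recall)    # 0.0
```

The baseline reaches 95% accuracy without identifying a single positive instance, which is why accuracy alone can be misleading on imbalanced data.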

Area under the curve (AUC)
Definition (intuitive) - Given a random positive instance and a random negative instance, the probability that your model gives the positive instance the higher score.
Definition (direct) - The area under the ROC curve.
Strengths - Works well when you want to measure how well your model separates the two classes, independent of any particular decision threshold.
Weaknesses - AUC depends only on the ranking of your prediction scores, not on their actual values, so scores from a model tuned for AUC may not be interpretable as probabilities. You may therefore be unable to express your uncertainty about a prediction, or even to state the probability that an item is positive.
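
The intuitive definition and the ranking-only weakness can both be sketched in a few lines of plain Python. The function name pairwise_auc, the labels, and the scores below are made up for illustration. Ties are counted as half a win, which is the usual convention. In practice you would probably use a library routine instead of this quadratic loop.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# Dividing every score by 100 keeps the ranking but makes the
# values useless as probabilities.
squashed = [s / 100 for s in scores]

auc_original = pairwise_auc(y_true, scores)
auc_squashed = pairwise_auc(y_true, squashed)

print(auc_original == auc_squashed)  # True
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9 for both sets of scores, even though the second set no longer looks anything like probabilities.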

LogLoss / Deviance
Strengths - Your estimates can be interpreted as probabilities.
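Weaknesses - The penalty grows without bound as a predicted probability approaches 0 or 1 for an instance of the opposite class. If many of your predictions are near those boundaries, a handful of confident false positives or false negatives can dominate the metric.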
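
For reference, log loss is the negative mean log-likelihood of the true labels under your predicted probabilities: the average of -(y*log(p) + (1-y)*log(1-p)). The sketch below, in plain Python, illustrates the sensitivity near the boundaries. The two sets of predictions are made up, and the clipping constant eps is an assumption added here so that log(0) is never evaluated.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, probs):
        # Clip so that log(0) is never evaluated (eps is an illustrative choice).
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Four positive instances; both hypothetical models get the last one
# wrong at a 0.5 threshold, so their accuracy is identical.
y_true = [1, 1, 1, 1]
cautious = [0.7, 0.7, 0.7, 0.3]
overconfident = [0.99, 0.99, 0.99, 0.01]

loss_cautious = log_loss(y_true, cautious)
loss_overconfident = log_loss(y_true, overconfident)

print(loss_overconfident > loss_cautious)  # True
```

The overconfident model is closer to the truth on three of the four instances, yet its single confident mistake costs more than it gains on the others, so its log loss is worse overall.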

F-score, Mean Average Precision, Cohen's Kappa

These are more specialized and less often used for general binary classification tasks. You may see them in specific subfields (e.g. F-score in NLP and precision-based metrics in information retrieval).
