

• Recall, Precision, F-score



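True Positive (TP): the actual class is positive and the predicted class is positive.
False Positive (FP): the actual class is negative, but the predicted class is positive.
False Negative (FN): the actual class is positive, but the predicted class is negative.
True Negative (TN): the actual class is negative and the predicted class is negative.
These four counts form the confusion matrix (Confusion Matrix), shown in the table below.

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |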
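As a minimal sketch (the helper name `confusion_counts` and the sample labels are hypothetical), the four counts can be tallied from paired true and predicted labels, treating one label as the positive class:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task; `positive` marks the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels for eight samples (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
```

Every sample lands in exactly one cell of the confusion matrix, so the four counts always sum to the number of samples.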

Precision $P$:
$$P = \frac{TP}{TP+FP}$$
Recall $R$:
$$R = \frac{TP}{TP+FN}$$
$F_1$ score:
$$F_1 = \frac{2 \cdot P \cdot R}{P+R}$$
General form of $F_1$:
$$F_\beta = \frac{(1+\beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$$
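The formulas above translate directly into code. This is an illustrative sketch, not a reference implementation: the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention (libraries handle that case in different ways).

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (P, R, F_beta) from confusion counts."""
    # P = TP / (TP + FP); assumed to be 0.0 if nothing was predicted positive
    p = tp / (tp + fp) if tp + fp else 0.0
    # R = TP / (TP + FN); assumed to be 0.0 if there are no actual positives
    r = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * p + r
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F_1
    f = (1 + b2) * p * r / denom if denom else 0.0
    return p, r, f


# Hypothetical counts: TP = 3, FP = 2, FN = 1
p, r, f1 = precision_recall_fbeta(3, 2, 1)
_, _, f2 = precision_recall_fbeta(3, 2, 1, beta=2.0)
_, _, f05 = precision_recall_fbeta(3, 2, 1, beta=0.5)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")
```

This prints `P=0.600 R=0.750 F1=0.667 F2=0.714 F0.5=0.625`. With β = 2 the score moves toward recall (0.75), and with β = 0.5 it moves toward precision (0.6): β > 1 weights recall more heavily, β < 1 weights precision more heavily.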


When dealing with multiple classes, there are two common ways of averaging these measures (i.e. recall, precision, and F1), namely macro-averaging and micro-averaging. The macro-average weights all classes equally, regardless of how many documents belong to each. The micro-average weights all documents equally, thus favouring performance on common classes. Different classifiers may perform differently on common and rare categories: learning algorithms see far more training examples from populous classes, which risks local over-fitting to those classes.

$$Macro\_P = \frac{1}{n}\sum_{i=1}^{n} P_i$$
$$Macro\_R = \frac{1}{n}\sum_{i=1}^{n} R_i$$
$$Macro\_F = \frac{1}{n}\sum_{i=1}^{n} F_i$$
or, alternatively, the harmonic mean of the macro-averaged precision and recall:
$$Macro\_F = \frac{2 \cdot Macro\_P \cdot Macro\_R}{Macro\_P + Macro\_R}$$
where $n$ is the number of classes, $P_i = \frac{TP_i}{TP_i+FP_i}$ and $R_i = \frac{TP_i}{TP_i+FN_i}$ are the precision and recall of class $i$, and $F_i = \frac{2 \cdot P_i \cdot R_i}{P_i + R_i}$.
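A sketch of macro-averaging that computes both $Macro\_F$ variants from per-class counts. The function names and the counts (a common class A and a rare class B, e.g. from a multilabel task) are hypothetical, chosen only to show how a rare class pulls macro scores down; treating 0/0 as 0.0 is an assumption.

```python
def safe_div(num, den):
    # Assumed convention: a 0/0 ratio is reported as 0.0
    return num / den if den else 0.0


def macro_scores(counts):
    """counts: list of (TP_i, FP_i, FN_i) tuples, one per class."""
    n = len(counts)
    ps, rs, fs = [], [], []
    for tp, fp, fn in counts:
        p = safe_div(tp, tp + fp)  # P_i
        r = safe_div(tp, tp + fn)  # R_i
        ps.append(p)
        rs.append(r)
        fs.append(safe_div(2 * p * r, p + r))  # F_i
    macro_p = sum(ps) / n
    macro_r = sum(rs) / n
    macro_f_mean = sum(fs) / n  # Macro_F as the mean of per-class F_i
    macro_f_hm = safe_div(2 * macro_p * macro_r, macro_p + macro_r)  # from Macro_P, Macro_R
    return macro_p, macro_r, macro_f_mean, macro_f_hm


# Hypothetical per-class counts: common class A, rare class B
counts = [(90, 10, 10), (1, 1, 3)]
print(macro_scores(counts))
```

Class A has P = R = F = 0.9 while class B has P = 0.5, R = 0.25, F ≈ 0.333, so Macro_P = 0.7 and Macro_R = 0.575; the two Macro_F variants come out near 0.617 (mean of $F_i$) and 0.631 (harmonic mean). The two definitions generally disagree, so it is worth stating which one is reported.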
$$Micro\_P = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FP_i}$$
$$Micro\_R = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FN_i}$$
$$Micro\_F = \frac{2 \cdot Micro\_P \cdot Micro\_R}{Micro\_P + Micro\_R}$$
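Micro-averaging pools the counts before dividing. The sketch below (hypothetical function name and hypothetical per-class counts for a common class A and a rare class B, e.g. from a multilabel task; 0/0 treated as 0.0 by assumption) computes $Micro\_P$, $Micro\_R$, and their harmonic mean:

```python
def micro_scores(counts):
    """counts: list of (TP_i, FP_i, FN_i) tuples, one per class."""
    tp = sum(c[0] for c in counts)  # sum of TP_i
    fp = sum(c[1] for c in counts)  # sum of FP_i
    fn = sum(c[2] for c in counts)  # sum of FN_i
    micro_p = tp / (tp + fp) if tp + fp else 0.0
    micro_r = tp / (tp + fn) if tp + fn else 0.0
    micro_f = (2 * micro_p * micro_r / (micro_p + micro_r)
               if micro_p + micro_r else 0.0)
    return micro_p, micro_r, micro_f


# Hypothetical per-class counts: common class A, rare class B
counts = [(90, 10, 10), (1, 1, 3)]
print(micro_scores(counts))
```

Here Micro_P = 91/102 ≈ 0.892, Micro_R = 91/104 = 0.875, and Micro_F = 182/206 ≈ 0.883: the pooled scores are dominated by the populous class A, and the rare class B barely moves them. In single-label multiclass classification with every class included, each misclassified sample is one FP (for the predicted class) and one FN (for its true class), so $\sum FP_i = \sum FN_i$ and Micro_P = Micro_R = accuracy.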
“macro” simply calculates the mean of the binary metrics, giving equal weight to each class. In problems where infrequent classes are nonetheless important, macro-averaging may be a means of highlighting their performance. On the other hand, the assumption that all classes are equally important is often untrue, such that macro-averaging will over-emphasize the typically low performance on an infrequent class.
“weighted” accounts for class imbalance by computing the average of binary metrics in which each class’s score is weighted by its presence in the true data sample.
“micro” gives each sample-class pair an equal contribution to the overall metric (except as affected by sample weights). Rather than summing the metric per class, it sums the numerators and denominators that make up the per-class metrics to calculate an overall quotient. Micro-averaging may be preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.
“samples” applies only to multilabel problems. It does not calculate a per-class measure; instead, it calculates the metric over the true and predicted classes for each sample in the evaluation data and returns their (sample_weight-weighted) average.
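The “weighted” and “samples” options can be sketched as follows. These option names match values of the `average` parameter of scikit-learn's metric functions (e.g. `f1_score`), but the code is an independent, simplified illustration, not scikit-learn's implementation: function names and data are hypothetical, sample weights are ignored, and 0/0 is treated as 0.0 by assumption. In `weighted_f1`, each class is weighted by its support, i.e. its number of true samples $TP_i + FN_i$.

```python
def f1_from_counts(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN), equal to 2PR / (P + R) whenever TP > 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(counts):
    """counts: list of (TP_i, FP_i, FN_i); class i is weighted by its support TP_i + FN_i."""
    supports = [tp + fn for tp, _, fn in counts]
    total = sum(supports)
    return sum(f1_from_counts(*c) * s for c, s in zip(counts, supports)) / total


def samples_f1(true_sets, pred_sets):
    """Unweighted mean of per-sample F1 between true and predicted label sets."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        tp = len(t & p)          # labels correctly predicted for this sample
        denom = len(t) + len(p)  # |true| + |pred| = 2TP + FP + FN
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-class counts: common class A (support 100), rare class B (support 4)
counts = [(90, 10, 10), (1, 1, 3)]
print(weighted_f1(counts))

# Hypothetical multilabel data: one label set per sample
y_true = [{"cat", "dog"}, {"dog"}, {"bird", "cat"}]
y_pred = [{"cat"}, {"dog"}, {"dog"}]
print(samples_f1(y_true, y_pred))
```

With these counts the weighted F1 is 137/156 ≈ 0.878, close to class A's 0.9 because A holds 100 of the 104 true samples. For the three multilabel samples the per-sample F1 scores are 2/3, 1, and 0, so the samples average is 5/9 ≈ 0.556.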
