Evaluating Classification Problems (Binary & Multiclass)


Recall, Precision, and the F-score

For a binary classification problem, every example can be placed into one of four categories according to its true class and the class predicted by the classifier:

True Positive (TP): the true class is positive and the predicted class is positive.
False Positive (FP): the true class is negative and the predicted class is positive.
False Negative (FN): the true class is positive and the predicted class is negative.
True Negative (TN): the true class is negative and the predicted class is negative.
From these four counts we can build the confusion matrix (Confusion Matrix) shown below.

|                   | Predicted positive | Predicted negative |
| ----------------- | ------------------ | ------------------ |
| Actually positive | TP                 | FN                 |
| Actually negative | FP                 | TN                 |
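As a quick illustration, the four counts can be read off a confusion matrix computed with scikit-learn; the labels below are made up purely for illustration.

```python
# Minimal sketch: counting TP/FP/FN/TN with scikit-learn (illustrative labels).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels (1 = positive, 0 = negative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # classifier predictions

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tp, fp, fn, tn)  # 3 1 1 3
```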

Precision (P):
$P = \frac{TP}{TP+FP}$
Recall (R):
$R = \frac{TP}{TP+FN}$
The $F_1$ score:
$F_1 = \frac{2 \cdot P \cdot R}{P+R}$
The general form of the F-score, $F_\beta$:
$F_\beta = \frac{(1+\beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$
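A minimal sketch of these formulas, computed directly from TP/FP/FN counts (the values reuse the illustrative counts from the snippet above):

```python
# Precision, recall and F-beta computed straight from the definitions above.
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f_beta(p, r, beta=1.0):
    # beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
    if p == 0.0 and r == 0.0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

tp, fp, fn = 3, 1, 1           # illustrative counts
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f_beta(p, r))      # 0.75 0.75 0.75 (F1)
print(f_beta(p, r, beta=2.0))  # F2 weights recall more heavily
```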

Macro-average and micro-average
If there is only a single binary confusion matrix, the metrics above are sufficient for evaluation and raise no ambiguity. When we need to aggregate the metrics over n binary confusion matrices, however, we use the macro-average and the micro-average. Macro-averaging and micro-averaging are commonly used to evaluate text classifiers.

When dealing with multiple classes there are two possible ways of averaging these measures (i.e. recall, precision, F1-measure), namely, macro-average and micro-average. The macro-average weights all the classes equally, regardless of how many documents belong to each. The micro-average weights all the documents equally, thus favouring the performance on common classes. Different classifiers will perform differently in common and rare categories. Learning algorithms are trained more often on more populated classes, thus risking local over-fitting.

Macro-averaging first computes the metric for each class separately and then takes the arithmetic mean over all classes. Compared with micro-averaged metrics, macro-averaged metrics are more strongly influenced by small classes.
$Macro\_P = \frac{1}{n} \sum_{i=1}^{n} P_i$
$Macro\_R = \frac{1}{n} \sum_{i=1}^{n} R_i$
$Macro\_F = \frac{1}{n} \sum_{i=1}^{n} F_i$
$Macro\_F = \frac{2 \cdot Macro\_P \cdot Macro\_R}{Macro\_P + Macro\_R}$
where $P_i = \frac{TP_i}{TP_i+FP_i}$ and $R_i = \frac{TP_i}{TP_i+FN_i}$.
Most of these formulas are unambiguous; the one point to note is that two formulas for the macro-averaged F-score are given above: one averages the per-class F-scores, the other computes the F-score from the macro-averaged precision and recall. Both conventions are in use, although they do not in general give the same value, as the sketch below illustrates.
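A small sketch of macro-averaging under these definitions; the per-class (TP_i, FP_i, FN_i) counts are hypothetical and only serve to show both Macro_F conventions:

```python
# Macro-averaging: average the per-class metrics (hypothetical counts).
per_class = [(30, 10, 5), (8, 2, 12), (2, 6, 4)]  # (TP_i, FP_i, FN_i) per class

precisions = [tp / (tp + fp) for tp, fp, _ in per_class]
recalls    = [tp / (tp + fn) for tp, _, fn in per_class]
f1s        = [2 * p * r / (p + r) if (p + r) else 0.0
              for p, r in zip(precisions, recalls)]

macro_p = sum(precisions) / len(precisions)
macro_r = sum(recalls) / len(recalls)

# The two Macro_F conventions mentioned above:
macro_f_mean_of_f1 = sum(f1s) / len(f1s)                          # mean of per-class F1
macro_f_from_pr    = 2 * macro_p * macro_r / (macro_p + macro_r)  # F1 of Macro_P, Macro_R
print(macro_p, macro_r, macro_f_mean_of_f1, macro_f_from_pr)
```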
Micro-averaging pools every instance in the data set, regardless of class, into a single global confusion matrix and then computes the metrics from that matrix.
$Micro\_P = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FP_i}$
$Micro\_R = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FN_i}$
$Micro\_F = \frac{2 \cdot Micro\_P \cdot Micro\_R}{Micro\_P + Micro\_R}$
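For comparison, a sketch of micro-averaging over the same hypothetical counts: the per-class counts are pooled first, and the usual formulas are then applied to the totals.

```python
# Micro-averaging: pool the counts, then apply the precision/recall/F1 formulas.
per_class = [(30, 10, 5), (8, 2, 12), (2, 6, 4)]  # same hypothetical (TP_i, FP_i, FN_i)

tp_sum = sum(tp for tp, _, _ in per_class)
fp_sum = sum(fp for _, fp, _ in per_class)
fn_sum = sum(fn for _, _, fn in per_class)

micro_p = tp_sum / (tp_sum + fp_sum)
micro_r = tp_sum / (tp_sum + fn_sum)
micro_f = 2 * micro_p * micro_r / (micro_p + micro_r)
print(micro_p, micro_r, micro_f)
```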

In scikit-learn, these averaging strategies correspond to the average parameter of the classification metrics:

“macro” simply calculates the mean of the binary metrics, giving equal weight to each class. In problems where infrequent classes are nonetheless important, macro-averaging may be a means of highlighting their performance. On the other hand, the assumption that all classes are equally important is often untrue, such that macro-averaging will over-emphasize the typically low performance on an infrequent class.
“weighted” accounts for class imbalance by computing the average of binary metrics in which each class’s score is weighted by its presence in the true data sample.
“micro” gives each sample-class pair an equal contribution to the overall metric (except as a result of sample-weight). Rather than summing the metric per class, this sums the dividends and divisors that make up the per-class metrics to calculate an overall quotient. Micro-averaging may be preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.
“samples” applies only to multilabel problems. It does not calculate a per-class measure, instead calculating the metric over the true and predicted classes for each sample in the evaluation data, and returning their (sample_weight-weighted) average.
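A short sketch of these options in practice, using scikit-learn's precision_recall_fscore_support on made-up multiclass labels:

```python
# Comparing the average= options of scikit-learn's classification metrics.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 2, 1]  # illustrative multiclass labels
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

for avg in ("macro", "micro", "weighted"):
    p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average=avg)
    print(f"{avg:>8}: P={p:.3f} R={r:.3f} F1={f:.3f}")
# average="samples" only applies to multilabel indicator targets.
```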
