Evaluating prediction results
The precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. Intuitively, precision is the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0.
The recall is the ratio tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives. Intuitively, recall is the ability of the classifier to find all the positive samples. The best value is 1 and the worst value is 0.
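As a quick sanity check, here is a minimal sketch (with made-up labels, purely for illustration) that counts tp, fp and fn directly and computes precision and recall from the ratios above:
import numpy as np
y_true = np.array([1, 1, 1, 0, 0, 0])       # hypothetical ground-truth labels
y_pred = np.array([1, 1, 0, 1, 0, 0])       # hypothetical predictions
tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive and actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive but actually negative
fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative but actually positive
print('precision =', tp / (tp + fp))        # 2 / 3 ≈ 0.667
print('recall    =', tp / (tp + fn))        # 2 / 3 ≈ 0.667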
F1 score, also known as balanced F-score or F-measure. The formula for the F1 score is F1 = 2 * (precision * recall) / (precision + recall); the best value is 1 and the worst value is 0.
More generally, the F-beta score is the weighted harmonic mean of precision and recall. The 'beta' parameter determines the weight of precision in the combined score: 'beta < 1' lends more weight to precision, while 'beta > 1' favors recall ('beta -> 0' considers only precision, 'beta -> inf' only recall).
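Written out, the general formula is F_beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). Below is a small sketch, using made-up labels chosen so that precision and recall differ, comparing that hand computation against sklearn.metrics.fbeta_score:
import numpy as np
from sklearn.metrics import fbeta_score, precision_score, recall_score
y_true = np.array([1, 1, 1, 1, 0, 0])   # hypothetical labels where precision != recall
y_pred = np.array([1, 1, 0, 0, 1, 0])
p = precision_score(y_true, y_pred)     # 2 / 3
r = recall_score(y_true, y_pred)        # 1 / 2
for beta in (0.5, 1.0, 2.0):
    # weighted harmonic mean of precision and recall
    by_hand = (1 + beta**2) * p * r / (beta**2 * p + r)
    print(beta, by_hand, fbeta_score(y_true, y_pred, beta=beta))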
Detailed reference links for further study
sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False)
y_true: ground-truth labels, as a list of class labels.
y_pred: predicted labels, as a list of class labels.
labels: optional list of label indices to include in the report. Usually this does not need to be set (if you do set it, e.g. with 200 classes, use labels = range(200) and pass labels=labels to sklearn.metrics.classification_report). Sometimes omitting it causes an error: for example, if there are 200 classes in total but the true labels of the test set only cover 199 of them (one class has no data), the call fails unless this parameter is set; see the short sketch after this list.
target_names: display names matching the labels, i.e. a list of strings shown in the report, one name per entry in labels.
sample_weight: per-sample weights; usually not needed, but available when required.
digits: number of digits used when formatting the reported values, i.e. the precision of the output.
output_dict: if True, return the report as a dict instead of a string; I rarely use it.
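A minimal sketch of the labels/target_names interaction described above, with hypothetical class names: class 2 never appears in the test data, so without labels= the call would fail because only 2 classes are detected while 3 target_names are given.
from sklearn.metrics import classification_report
y_true = [0, 1, 1, 0]                  # class 2 never appears in the true labels
y_pred = [0, 0, 1, 0]                  # ...nor in the predictions
labels = [0, 1, 2]                     # list every class explicitly
target_names = ['cat', 'dog', 'bird']  # hypothetical display names, one per label
print(classification_report(y_true, y_pred, labels=labels, target_names=target_names, digits=3))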
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
"""
根据每个类别进行precision, recall, f1-score评价指标,其中support 表示该类中样本的个数
"""
print(classification_report(y_true, y_pred, target_names=target_names))
################### Output
              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

    accuracy                           0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
y_true is the array of true labels and y_pred the array of predicted labels.
support: the number of test samples belonging to the class in that row; in the table above, class 0 has 1 sample in the test set.
precision: precision = TP / (TP + FP), i.e. of all samples the model predicted as this class, the fraction that were predicted correctly.
recall: recall = TP / (TP + FN), i.e. of all test samples that actually belong to this class, the fraction that were predicted correctly.
f1-score: F1 = 2 * precision * recall / (precision + recall).
accuracy: computed over all samples; here 3 of the 5 samples are predicted correctly, so accuracy (the micro average) is 3/5 = 0.6. When every column of that row would hold the same value, only one value is printed, which is why the output above shows a single 0.6.
macro avg: the unweighted mean of each metric over the classes, e.g. for precision (0.50 + 0.00 + 1.00) / 3 = 0.50.
weighted avg: the support-weighted mean, so classes with more test samples carry more weight; e.g. for precision (0.50*1 + 0.00*1 + 1.00*3) / 5 = 0.70 (see the sketch below).
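As a check on those last two rows, a tiny sketch that reproduces the macro and weighted averages of the precision column from the per-class values and supports in the report above:
import numpy as np
precision = np.array([0.50, 0.00, 1.00])  # per-class precision from the report
support = np.array([1, 1, 3])             # per-class sample counts
print('macro avg   ', precision.mean())                             # (0.50 + 0.00 + 1.00) / 3 = 0.50
print('weighted avg', (precision * support).sum() / support.sum())  # (0.50*1 + 0.00*1 + 1.00*3) / 5 = 0.70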
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score
from sklearn.metrics import precision_recall_fscore_support, classification_report
if __name__ == "__main__":
    y_true = np.array([1, 1, 1, 1, 0, 0])
    y_hat = np.array([1, 0, 1, 1, 1, 1])
    print('Accuracy:\t', accuracy_score(y_true, y_hat))

    # The precision is the ratio 'tp / (tp + fp)' where 'tp' is the number of
    # true positives and 'fp' the number of false positives. The precision is
    # intuitively the ability of the classifier not to label as positive a
    # sample that is negative.
    # The best value is 1 and the worst value is 0.
    precision = precision_score(y_true, y_hat)
    print('Precision:\t', precision)

    # The recall is the ratio 'tp / (tp + fn)' where 'tp' is the number of
    # true positives and 'fn' the number of false negatives. The recall is
    # intuitively the ability of the classifier to find all the positive samples.
    # The best value is 1 and the worst value is 0.
    recall = recall_score(y_true, y_hat)
    print('Recall: \t', recall)

    # F1 score, also known as balanced F-score or F-measure.
    # The F1 score can be interpreted as a weighted average of the precision and
    # recall, where an F1 score reaches its best value at 1 and worst score at 0.
    # The relative contribution of precision and recall to the F1 score are
    # equal. The formula for the F1 score is:
    #     F1 = 2 * (precision * recall) / (precision + recall)
    print('f1 score: \t', f1_score(y_true, y_hat))
    print(2 * (precision * recall) / (precision + recall))

    # The F-beta score is the weighted harmonic mean of precision and recall,
    # reaching its optimal value at 1 and its worst value at 0.
    # The 'beta' parameter determines the weight of precision in the combined
    # score. 'beta < 1' lends more weight to precision, while 'beta > 1'
    # favors recall ('beta -> 0' considers only precision, 'beta -> inf' only recall).
    print('F-beta:')
    for beta in np.logspace(-3, 3, num=7, base=10):
        fbeta = fbeta_score(y_true, y_hat, beta=beta)
        print('\tbeta=%9.3f\tF-beta=%.5f' % (beta, fbeta))
        # print((1 + beta**2) * precision * recall / (beta**2 * precision + recall))

    print(precision_recall_fscore_support(y_true, y_hat, beta=1))
    print(classification_report(y_true, y_hat))
################ Output
Accuracy: 0.5
Precision: 0.6
Recall: 0.75
f1 score: 0.6666666666666665
0.6666666666666665
F-beta:
beta= 0.001 F-beta=0.60000
beta= 0.010 F-beta=0.60001
beta= 0.100 F-beta=0.60119
beta= 1.000 F-beta=0.66667
beta= 10.000 F-beta=0.74815
beta= 100.000 F-beta=0.74998
beta= 1000.000 F-beta=0.75000
(array([0. , 0.6]), array([0. , 0.75]), array([0. , 0.66666667]), array([2, 4], dtype=int64))
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.60      0.75      0.67         4

    accuracy                           0.50         6
   macro avg       0.30      0.38      0.33         6
weighted avg       0.40      0.50      0.44         6