Plotting a PR curve (precision-recall curve) in Python

The function that computes the points of a precision-recall curve in Python is:

sklearn.metrics.precision_recall_curve(y_true, probas_pred, pos_label=None, sample_weight=None)

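Given the true labels and the predicted scores, this function computes a precision-recall curve by varying the decision threshold.

Note: this function is restricted to binary classification tasks.

Precision is tp / (tp + fp), where tp is the number of true positives and fp is the number of false positives.

Recall is tp / (tp + fn), where tp is the number of true positives and fn is the number of false negatives.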
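The two definitions above can be checked by hand at a single threshold. The following is a minimal sketch, assuming a small made-up data set and an arbitrarily chosen cutoff of 0.4; neither value comes from the library.

```python
import numpy as np

# Illustrative labels and scores (assumed values, not from any real model)
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
threshold = 0.4  # arbitrary cutoff chosen for this sketch

# A sample is predicted positive when its score reaches the threshold
y_pred = y_scores >= threshold

tp = np.sum(y_pred & (y_true == 1))   # predicted positive, actually positive
fp = np.sum(y_pred & (y_true == 0))   # predicted positive, actually negative
fn = np.sum(~y_pred & (y_true == 1))  # predicted negative, actually positive

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)  # 0.5 0.5
```

Repeating this calculation for every distinct score used as a threshold yields the points of the precision-recall curve.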

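Parameters:

  • y_true: array, shape = [n_samples]: true binary labels. If the labels are not {-1, 1} or {0, 1}, pos_label must be given explicitly.
  • probas_pred: array, shape = [n_samples]: predicted probabilities of the positive class, or decision-function scores.
  • pos_label: int or str, default=None: label of the positive class. When pos_label=None, it is set to 1 if y_true is in {-1, 1} or {0, 1}; otherwise an error is raised.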
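As an illustration of the pos_label parameter, here is a minimal sketch with string labels. The label names "neg" and "pos" and the scores are assumptions made up for this example.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical string labels; "pos" is the class we treat as positive
y_true = np.array(["neg", "neg", "pos", "pos"])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Labels are not {-1, 1} or {0, 1}, so pos_label is given explicitly
precision, recall, thresholds = precision_recall_curve(
    y_true, y_scores, pos_label="pos"
)
```

The result should match what the same scores give with labels encoded as 0 and 1, since only the identity of the positive class matters.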

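Returns:

  • precision: array, shape = [n_thresholds + 1]: precision values; the last element is 1.
  • recall: array, shape = [n_thresholds + 1]: recall values; the last element is 0.
  • thresholds: array, shape = [n_thresholds <= len(np.unique(probas_pred))]: thresholds on the decision function used to compute precision and recall.

Example: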

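>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
>>> # The output below is from an older scikit-learn release; newer releases
>>> # also keep the lowest score (0.1) as a threshold, so each array gains
>>> # one leading element.
>>> precision
array([0.66666667, 0.5, 1., 1.])
>>> recall
array([1., 0.5, 0.5, 0.])
>>> thresholds
array([0.35, 0.4, 0.8])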
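The function returns only the points of the curve; drawing it is left to a plotting library. Below is a minimal sketch using matplotlib, which is an assumption of this example (the text above does not name a plotting library). The output file name pr_curve.png is also hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

fig, ax = plt.subplots()
# Step plot: precision only changes where a threshold is crossed
ax.step(recall, precision, where="post")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_xlim(0.0, 1.0)
ax.set_ylim(0.0, 1.05)
ax.set_title("Precision-recall curve")
fig.savefig("pr_curve.png")  # hypothetical file name; plt.show() also works
```

A step plot is used here rather than straight line segments because linear interpolation between points of a precision-recall curve can look more optimistic than the underlying values.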

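sklearn.metrics.average_precision_score computes the average precision (AP) from prediction scores. This score summarizes the precision-recall curve and corresponds to the area under it. The value lies between 0 and 1, and higher is better.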

sklearn.metrics.average_precision_score(y_true, y_score, average='macro', pos_label=1, sample_weight=None)

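Parameters:

  • y_true: array, shape = [n_samples] or [n_samples, n_classes]: true binary labels or binary label indicators.
  • y_score: array, shape = [n_samples] or [n_samples, n_classes]: target scores; these can be probability estimates of the positive class, confidence values, or non-thresholded decision values.
  • average: string, one of [None, 'micro', 'macro' (default), 'samples', 'weighted']: how the per-class scores are averaged for multilabel input; None returns one score per class.
  • pos_label: int or str (default=1): label of the positive class; only used for binary y_true.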
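To illustrate the average parameter, here is a minimal sketch on multilabel input. The label-indicator matrix and the scores are made-up values, chosen only so that each class has both positive and negative samples.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Illustrative multilabel data: 4 samples, 2 classes (assumed values)
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.2], [0.3, 0.8], [0.6, 0.3], [0.1, 0.4]])

# average=None returns one AP value per class
per_class = average_precision_score(y_true, y_score, average=None)
# 'macro' is the unweighted mean of the per-class values
macro = average_precision_score(y_true, y_score, average="macro")
# 'micro' pools every (sample, class) pair into one binary problem
micro = average_precision_score(y_true, y_score, average="micro")
print(per_class, macro, micro)
```

'macro' treats every class equally regardless of how many positives it has, while 'micro' lets classes with more samples weigh more.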

Returns:

average_precision: float

Example:

import numpy as np
from sklearn.metrics import average_precision_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
average_precision_score(y_true, y_scores)
# 0.83...
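The value above can be reproduced from the curve itself. scikit-learn defines AP as the step-wise sum AP = Σ_n (R_n − R_{n−1}) P_n, where P_n and R_n are the precision and recall at the n-th threshold, rather than a trapezoidal area. The following sketch recomputes it from the output of precision_recall_curve; the variable names ap_manual and ap_sklearn are just for this illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true, y_scores)

# recall is returned in decreasing order, so np.diff(recall) is <= 0;
# the leading minus sign turns each step into a positive recall increment.
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap_sklearn = average_precision_score(y_true, y_scores)

print(ap_manual, ap_sklearn)  # both approximately 0.8333
```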

 
