Using ROC curves in Python 3 (implemented with sklearn.metrics)

The main function needed is roc_curve (from sklearn.metrics import roc_curve, auc).

roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
Compute Receiver operating characteristic (ROC)

Note: this implementation is restricted to the binary classification task.

Read more in the scikit-learn User Guide.


Parameters
----------
y_true : array, shape = [n_samples]
    True binary labels in range {0, 1} or {-1, 1}.  If labels are not
    binary, pos_label should be explicitly given.

y_score : array, shape = [n_samples]
    Target scores, can either be probability estimates of the positive
    class, confidence values, or non-thresholded measure of decisions
    (as returned by "decision_function" on some classifiers).

pos_label : int or str, default=None
    Label considered as positive and others are considered negative.

sample_weight : array-like of shape = [n_samples], optional
    Sample weights.

drop_intermediate : boolean, optional (default=True)
    Whether to drop some suboptimal thresholds which would not appear
    on a plotted ROC curve. This is useful in order to create lighter
    ROC curves.

    .. versionadded:: 0.17
       parameter *drop_intermediate*.
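The effect of drop_intermediate can be illustrated with a small pure-Python sketch. drop_intermediate_points is a hypothetical helper, not a scikit-learn function, and it assumes the library keeps a point only where the second difference of the cumulative FP or TP counts is nonzero (a reading of the internals, not a documented API). A point in the middle of an evenly spaced straight run adds nothing to the plotted curve, so it is removed.

```python
def drop_intermediate_points(fps, tps, thresholds):
    """Drop middle points of evenly spaced straight runs of the ROC curve.

    fps / tps are cumulative false/true positive counts at each threshold,
    ordered from the highest threshold to the lowest.
    """
    n = len(fps)
    if n <= 2:
        # Nothing lies between the first and last point.
        return list(fps), list(tps), list(thresholds)
    keep = [0]  # the first point is always kept
    for i in range(1, n - 1):
        # Second differences: if both are zero, the step into point i equals
        # the step out of it, so i sits on a straight segment between neighbours.
        d2_fp = fps[i + 1] - 2 * fps[i] + fps[i - 1]
        d2_tp = tps[i + 1] - 2 * tps[i] + tps[i - 1]
        if d2_fp != 0 or d2_tp != 0:
            keep.append(i)
    keep.append(n - 1)  # the last point is always kept
    return (
        [fps[i] for i in keep],
        [tps[i] for i in keep],
        [thresholds[i] for i in keep],
    )


# One positive scored highest, then three negatives: the TP count stays at 1
# while FP grows by 1 per step, so the two middle points are dropped.
print(drop_intermediate_points([0, 1, 2, 3], [1, 1, 1, 1], [0.9, 0.6, 0.5, 0.4]))
# ([0, 3], [1, 1], [0.9, 0.4])
```

Only the corners of the curve survive, which is why the docstring calls the result a "lighter" ROC curve.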

Returns
-------
fpr : array, shape = [>2]
    Increasing false positive rates such that element i is the false
    positive rate of predictions with score >= thresholds[i].

tpr : array, shape = [>2]
    Increasing true positive rates such that element i is the true
    positive rate of predictions with score >= thresholds[i].

thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute
    fpr and tpr. `thresholds[0]` represents no instances being predicted
    and is arbitrarily set to `max(y_score) + 1`.
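The definitions of fpr and tpr above (rates of predictions with score >= thresholds[i]) can be checked with a naive pure-Python sketch. roc_points is a hypothetical helper, not scikit-learn's implementation: it assumes labels are already 0/1, always prepends max(y_score) + 1, and never drops intermediate thresholds, so its output can be longer than what roc_curve returns.

```python
def roc_points(y_true, y_score):
    """Naive ROC computation: one pass over the data per threshold.

    y_true must already be binary (1 = positive, 0 = negative).
    Returns (fpr, tpr, thresholds) with thresholds in decreasing order.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Candidate thresholds: every distinct score, highest first, preceded by
    # max(y_score) + 1 so that the first point predicts no positives at all.
    distinct = sorted(set(y_score), reverse=True)
    thresholds = [distinct[0] + 1] + distinct
    fpr, tpr = [], []
    for t in thresholds:
        # A sample is predicted positive when its score >= threshold.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr, thresholds


fpr, tpr, thresholds = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr)         # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)         # [0.0, 0.5, 0.5, 1.0, 1.0]
print(thresholds)  # [1.8, 0.8, 0.4, 0.35, 0.1]
```

Rescanning every sample for every threshold costs O(n²) time; it is written this way only to mirror the definition literally.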

See also
--------
roc_auc_score : Compute Area Under the Curve (AUC) from prediction scores
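sklearn.metrics.auc, imported together with roc_curve in this post, is documented as computing the area under a curve with the trapezoidal rule. A hand-rolled equivalent is sketched below; trapezoid_auc is a hypothetical name, and the fpr/tpr values are the ones printed in the docstring example.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i]).

    x must be sorted in increasing order (fpr from roc_curve already is).
    """
    area = 0.0
    for i in range(1, len(x)):
        # Each slice is a trapezoid: width times the mean of the two heights.
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2
    return area


# fpr and tpr from the docstring example
print(trapezoid_auc([0.0, 0.5, 0.5, 1.0], [0.5, 0.5, 1.0, 1.0]))  # 0.75
```

Vertical steps (two points with the same fpr) have zero width and contribute nothing, so prepending the (0, 0) point does not change the result.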

Notes
-----
Since the thresholds are sorted from low to high values, they
are reversed upon returning them to ensure they correspond to both ``fpr``
and ``tpr``, which are sorted in reversed order during their calculation.
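The note above concerns how the library orders thresholds internally. One common way to organize the computation is to sort the samples once by decreasing score and accumulate TP and FP counts, emitting one point per distinct score. The sketch below does this with a hypothetical helper, binary_clf_counts (not a scikit-learn function; any resemblance to the library's internals is an assumption). Because it walks from the highest score down, the thresholds come out already in decreasing order, and dividing by the totals turns the counts into fpr and tpr.

```python
def binary_clf_counts(y_true, y_score):
    """Cumulative FP/TP counts at each distinct threshold, highest score first.

    Lowering the threshold to the next distinct score adds the samples with
    that score to the predicted positives, so one sorted pass suffices.
    """
    # Sample indices ordered by score, highest first.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    fps, tps, thresholds = [], [], []
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Record a point only after the last sample of each run of tied scores.
        is_last = rank == len(order) - 1
        if is_last or y_score[order[rank + 1]] != y_score[i]:
            fps.append(fp)
            tps.append(tp)
            thresholds.append(y_score[i])
    return fps, tps, thresholds


fps, tps, thresholds = binary_clf_counts([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
fpr = [f / fps[-1] for f in fps]  # fps[-1] = total number of negatives
tpr = [t / tps[-1] for t in tps]  # tps[-1] = total number of positives
print(thresholds)  # [0.8, 0.4, 0.35, 0.1]
print(fpr)         # [0.0, 0.5, 0.5, 1.0]
print(tpr)         # [0.5, 0.5, 1.0, 1.0]
```

After the sort this is a single O(n) pass, and for this data the rates match the fpr and tpr shown in the docstring example.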

Examples
--------
>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. ,  0.5,  0.5,  1. ])
>>> tpr
array([ 0.5,  0.5,  1. ,  1. ])
>>> thresholds
array([ 0.8 ,  0.4 ,  0.35,  0.1 ])



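```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc


def test_draw_roc():
    y_test = [0, 0, 1, 1]           # true labels: 0 = negative, 1 = positive
    scores = [0.1, 0.4, 0.35, 0.8]  # predicted scores for the positive class
    fpr, tpr, threshold = roc_curve(y_test, scores)
    print('y_test:%s, scores:%s' % (y_test, scores))
    print('fpr:%s, tpr:%s, threshold:%s' % (fpr, tpr, threshold))
    print('auc:%s' % auc(fpr, tpr))  # area under the ROC curve
    plt.plot(fpr, tpr)
    plt.xlabel('FPR')
    plt.ylabel('TPR')
    plt.show()  # the figure is not displayed without this call


test_draw_roc()
```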


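  1. y_test holds the true labels; there are two classes, 0 and 1.
  2. scores holds the predicted scores.
  3. roc_curve(y_test, scores) computes the TPR and FPR at each threshold. Each value is obtained as follows:
    1) The docstring describes thresholds as "Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1." So max(y_score) + 1 is not the number of thresholds but the value of the first one: here 0.8 + 1 = 1.8, which is above every score, so no sample is predicted positive. The remaining thresholds are the distinct scores in decreasing order (0.8, 0.4, 0.35, 0.1), and a sample is predicted positive when its score >= threshold. Since scores and y_test both have length 4, there are at most 4 + 1 = 5 thresholds; depending on the scikit-learn version and on drop_intermediate, fewer may be returned (the docstring example above returns four), and recent releases (scikit-learn 1.3 and later) use np.inf instead of max(y_score) + 1 for thresholds[0].
    FPR = FP/(FP + TN): the fraction of actual negatives that are wrongly predicted as positive (false alarm rate).
    TPR = TP/(TP + FN): the fraction of actual positives that are correctly predicted as positive (hit rate, also called recall).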
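
| threshold | y_test | predict (score >= threshold) | TP | FP | TN | FN | FPR | TPR |
|---|---|---|---|---|---|---|---|---|
| 1.8 | [0, 0, 1, 1] | [0, 0, 0, 0] | 0 | 0 | 2 | 2 | 0 | 0 |
| 0.8 | [0, 0, 1, 1] | [0, 0, 0, 1] | 1 | 0 | 2 | 1 | 0 | 1/2 = 0.5 |
| 0.4 | [0, 0, 1, 1] | [0, 1, 0, 1] | 1 | 1 | 1 | 1 | 1/2 = 0.5 | 1/2 = 0.5 |
| 0.35 | [0, 0, 1, 1] | [0, 1, 1, 1] | 2 | 1 | 1 | 0 | 1/2 = 0.5 | 2/2 = 1 |
| 0.1 | [0, 0, 1, 1] | [1, 1, 1, 1] | 2 | 2 | 0 | 0 | 2/2 = 1 | 2/2 = 1 |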
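The table can be reproduced with a few lines of pure Python. confusion_counts is a hypothetical helper, and the threshold list is copied from the table (including 1.8 = max(scores) + 1) rather than obtained from roc_curve.

```python
def confusion_counts(y_true, y_score, threshold):
    """Predictions plus TP, FP, TN, FN when every score >= threshold is positive."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return y_pred, tp, fp, tn, fn


y_test = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
for t in [1.8, 0.8, 0.4, 0.35, 0.1]:
    y_pred, tp, fp, tn, fn = confusion_counts(y_test, scores, t)
    fpr = fp / (fp + tn)  # false alarms among the actual negatives
    tpr = tp / (tp + fn)  # hits among the actual positives
    print(t, y_pred, tp, fp, tn, fn, fpr, tpr)
```

Each printed line corresponds to one row of the table, with FPR and TPR shown as decimals.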
