Using ROC curves in Python 3 (with sklearn.metrics)

The main function needed is roc_curve (from sklearn.metrics import roc_curve,auc).
Its docstring is reproduced here first for reference:

roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
Compute Receiver operating characteristic (ROC)

Note: this implementation is restricted to the binary classification task.

Read more in the User Guide.

Parameters
----------

y_true : array, shape = [n_samples]
    True binary labels in range {0, 1} or {-1, 1}.  If labels are not
    binary, pos_label should be explicitly given.

y_score : array, shape = [n_samples]
    Target scores, can either be probability estimates of the positive
    class, confidence values, or non-thresholded measure of decisions
    (as returned by "decision_function" on some classifiers).

pos_label : int or str, default=None
    Label considered as positive and others are considered negative.

sample_weight : array-like of shape = [n_samples], optional
    Sample weights.

drop_intermediate : boolean, optional (default=True)
    Whether to drop some suboptimal thresholds which would not appear
    on a plotted ROC curve. This is useful in order to create lighter
    ROC curves.

    .. versionadded:: 0.17
       parameter *drop_intermediate*.

Returns
-------
fpr : array, shape = [>2]
    Increasing false positive rates such that element i is the false
    positive rate of predictions with score >= thresholds[i].

tpr : array, shape = [>2]
    Increasing true positive rates such that element i is the true
    positive rate of predictions with score >= thresholds[i].

thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute
    fpr and tpr. `thresholds[0]` represents no instances being predicted
    and is arbitrarily set to `max(y_score) + 1`.

See also
--------
roc_auc_score : Compute Area Under the Curve (AUC) from prediction scores

Notes
-----
Since the thresholds are sorted from low to high values, they
are reversed upon returning them to ensure they correspond to both ``fpr``
and ``tpr``, which are sorted in reversed order during their calculation.

References
----------
.. [1] Wikipedia entry for the Receiver operating characteristic


Examples
--------
>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. ,  0.5,  0.5,  1. ])
>>> tpr
array([ 0.5,  0.5,  1. ,  1. ])
>>> thresholds
array([ 0.8 ,  0.4 ,  0.35,  0.1 ])

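Definition of the ROC curve (reference: https://blog.csdn.net/abcjennifer/article/details/7359370)
In one sentence: each point on the ROC curve corresponds to one threshold, and for a given classifier every threshold yields one TPR and one FPR.
It follows from this definition that an ROC curve is, in practice, a polyline: a finite set of scores gives a finite set of thresholds, and therefore a finite set of points joined by straight segments.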
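To make the polyline point concrete, here is a minimal sketch. It assumes a reasonably recent scikit-learn and uses a four-sample toy data set; `roc_vertices` is a hypothetical helper name, not part of sklearn. With `drop_intermediate=False`, `roc_curve` should return one vertex per distinct score plus one extra starting vertex at (0, 0).

```python
import numpy as np
from sklearn.metrics import roc_curve


def roc_vertices(y_true, y_score):
    """Return the (fpr, tpr) vertices of the ROC polyline (hypothetical helper)."""
    # drop_intermediate=False keeps every threshold, including collinear points
    fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)
    return list(zip(fpr.tolist(), tpr.tolist())), thresholds


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
vertices, thresholds = roc_vertices(y_true, y_score)

# A finite set of scores gives a finite list of vertices;
# plotting simply joins consecutive vertices with straight segments.
n_distinct = len(np.unique(y_score))
print(len(vertices), n_distinct + 1)
for point in vertices:
    print(point)
```

Both numbers on the first printed line should agree, which is why the plotted "curve" is a staircase-like polyline and not a smooth curve.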

An example:

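> import matplotlib.pyplot as plt
> from sklearn.metrics import roc_curve, auc
>
> def test_draw_roc():
>     # True binary labels and the classifier's score for each sample
>     y_test = [0, 0, 1, 1]
>     scores = [0.1, 0.4, 0.35, 0.8]
>     # One (FPR, TPR) point per threshold; thresholds are in decreasing order
>     fpr, tpr, threshold = roc_curve(y_test, scores)
>     print('y_test:%s, scores:%s' % (y_test, scores))
>     print('fpr:%s, tpr:%s, threshold:%s' % (fpr, tpr, threshold))
>     # Join the points to draw the ROC polyline
>     plt.plot(fpr, tpr)
>     plt.show()
>
> test_draw_roc()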
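`auc` is imported at the top of this post but never called. As a rough sketch of how it could be used (assuming the same toy data), `auc` integrates the polyline returned by `roc_curve` with the trapezoidal rule, while `roc_auc_score` computes the same area directly from the labels and scores.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_test = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, _ = roc_curve(y_test, scores)

# Area under the ROC polyline, integrated with the trapezoidal rule
area_from_curve = auc(fpr, tpr)
# The same quantity computed directly from labels and scores
area_direct = roc_auc_score(y_test, scores)

print('AUC from curve: %.2f' % area_from_curve)
print('AUC direct:     %.2f' % area_direct)
```

For this data both values come out to 0.75: of the four (positive, negative) pairs, three are ranked correctly by the scores.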

Running the code above gives the following result:
[Figure 2: output of the code above]
Here is what the code does:

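  1. y_test holds the true labels. There are two classes, 0 and 1.
  2. scores holds the predicted score for each sample.
  3. roc_curve(y_test, scores) computes the TPR and FPR at each threshold. The individual values are obtained as follows:
    1) The thresholds. The docstring says:
    thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute
    fpr and tpr. thresholds[0] represents no instances being predicted
    and is arbitrarily set to max(y_score) + 1.
    So the first threshold is max(y_score) + 1 = 0.8 + 1 = 1.8, and the remaining thresholds are the distinct score values in decreasing order. In this example scores and y_test both have length 4 and all four scores are distinct, so there are 4 + 1 = 5 thresholds.
    2) The TPR and FPR at each threshold:
    A sample is predicted as 1 when its score is greater than or equal to the threshold.
    FPR = FP/(FP + TN): the false alarm rate, i.e. the fraction of actual negatives that are wrongly predicted as positive.
    TPR = TP/(TP + FN): the hit rate, i.e. the fraction of actual positives that are correctly predicted as positive.

threshold  y_test        predict    TP  FP  TN  FN  FPR      TPR
1.8        [0, 0, 1, 1]  [0,0,0,0]  0   0   2   2   0/2=0    0/2=0
0.8        [0, 0, 1, 1]  [0,0,0,1]  1   0   2   1   0/2=0    1/2=0.5
0.4        [0, 0, 1, 1]  [0,1,0,1]  1   1   1   1   1/2=0.5  1/2=0.5
0.35       [0, 0, 1, 1]  [0,1,1,1]  2   1   1   0   1/2=0.5  2/2=1
0.1        [0, 0, 1, 1]  [1,1,1,1]  2   2   0   0   2/2=1    2/2=1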
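The table can be reproduced by hand in plain Python. This is only an illustrative sketch, not how sklearn computes the curve internally; `confusion_at` is a hypothetical helper, and the first threshold follows the docstring quoted above (max(y_score) + 1). Recent scikit-learn releases may report that first threshold as inf instead of 1.8, but the resulting (FPR, TPR) point is the same.

```python
def confusion_at(threshold, y_true, y_score):
    """Count TP, FP, TN, FN when score >= threshold is predicted as 1."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


y_test = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Extra first threshold (assumption taken from the docstring), then the
# distinct scores in decreasing order
thresholds = [max(scores) + 1] + sorted(set(scores), reverse=True)

rows = []
for t in thresholds:
    tp, fp, tn, fn = confusion_at(t, y_test, scores)
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    tpr = tp / (tp + fn)  # share of actual positives predicted positive
    rows.append((t, tp, fp, tn, fn, fpr, tpr))
    print(t, tp, fp, tn, fn, fpr, tpr)
```

Each printed line corresponds to one row of the table: threshold, TP, FP, TN, FN, FPR, TPR.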
