The main function needed is roc_curve (from sklearn.metrics import roc_curve, auc).
First, here is the function's docstring, recorded for reference:
roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
Compute Receiver operating characteristic (ROC)
Note: this implementation is restricted to the binary classification task.
Read more in the :ref:`User Guide <roc_metrics>`.
Parameters
----------
y_true : array, shape = [n_samples]
True binary labels in range {0, 1} or {-1, 1}. If labels are not
binary, pos_label should be explicitly given.
y_score : array, shape = [n_samples]
Target scores, can either be probability estimates of the positive
class, confidence values, or non-thresholded measure of decisions
(as returned by "decision_function" on some classifiers).
pos_label : int or str, default=None
Label considered as positive and others are considered negative.
sample_weight : array-like of shape = [n_samples], optional
Sample weights.
drop_intermediate : boolean, optional (default=True)
Whether to drop some suboptimal thresholds which would not appear
on a plotted ROC curve. This is useful in order to create lighter
ROC curves.
.. versionadded:: 0.17
parameter *drop_intermediate*.
Returns
-------
fpr : array, shape = [>2]
Increasing false positive rates such that element i is the false
positive rate of predictions with score >= thresholds[i].
tpr : array, shape = [>2]
Increasing true positive rates such that element i is the true
positive rate of predictions with score >= thresholds[i].
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute
fpr and tpr. `thresholds[0]` represents no instances being predicted
and is arbitrarily set to `max(y_score) + 1`.
See also
--------
roc_auc_score : Compute Area Under the Curve (AUC) from prediction scores
Notes
-----
Since the thresholds are sorted from low to high values, they
are reversed upon returning them to ensure they correspond to both ``fpr``
and ``tpr``, which are sorted in reversed order during their calculation.
References
----------
.. [1] `Wikipedia entry for the Receiver operating characteristic
<https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
Examples
--------
>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. , 0.5, 0.5, 1. ])
>>> tpr
array([ 0.5, 0.5, 1. , 1. ])
>>> thresholds
array([ 0.8 , 0.4 , 0.35, 0.1 ])
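As a rough illustration of the drop_intermediate parameter described in the docstring, the sketch below runs roc_curve with and without it. The data here is made up for illustration and is not from the docstring; the exact number of thresholds returned may differ between scikit-learn versions, so the code only compares the two results.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Illustrative data (an assumption, not from the docstring):
# the scores separate the two classes perfectly.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])

# Keep every threshold.
fpr_full, tpr_full, thr_full = roc_curve(y_true, y_score, drop_intermediate=False)

# Default behaviour: drop thresholds whose points lie on a straight segment.
fpr_lite, tpr_lite, thr_lite = roc_curve(y_true, y_score, drop_intermediate=True)

print("thresholds kept:", len(thr_full), "vs", len(thr_lite))
print("AUC:", auc(fpr_full, tpr_full), "vs", auc(fpr_lite, tpr_lite))
```

The dropped points sit on straight segments of the curve, so the plotted shape and the area under it should stay the same while the arrays get shorter.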
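ROC curve definition (reference): https://blog.csdn.net/abcjennifer/article/details/7359370
In one sentence: each point on the ROC curve corresponds to one threshold, and for a given classifier each threshold yields one TPR and one FPR.
It follows from this definition that, in practice, an ROC curve is really a polyline: a finite set of samples produces only finitely many thresholds, and the plotted curve simply connects the corresponding points.
An example: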
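To make the "one threshold, one point" idea concrete, here is a minimal sketch that computes the (FPR, TPR) pair for a single threshold by hand. The helper name rates_at_threshold is hypothetical (it is not part of scikit-learn), and it assumes labels in {0, 1} with a sample predicted positive when its score is >= the threshold.

```python
import numpy as np

def rates_at_threshold(y_true, y_score, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are predicted positive.

    Hypothetical helper for illustration; assumes labels are 0 (negative) or 1 (positive).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_score) >= threshold

    tp = np.sum(y_pred & (y_true == 1))   # positives correctly flagged
    fp = np.sum(y_pred & (y_true == 0))   # negatives wrongly flagged
    fn = np.sum(~y_pred & (y_true == 1))  # positives missed
    tn = np.sum(~y_pred & (y_true == 0))  # negatives correctly left alone

    fpr = float(fp / (fp + tn))
    tpr = float(tp / (tp + fn))
    return fpr, tpr

y_test = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# One threshold gives exactly one point on the ROC curve.
print(rates_at_threshold(y_test, scores, 0.8))
```

Sweeping the threshold over the distinct score values and connecting the resulting points gives the polyline.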
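> def test_draw_roc():
>     import matplotlib.pyplot as plt
>     from sklearn.metrics import roc_curve, auc
>     # True labels and classifier scores for four samples.
>     y_test = [0, 0, 1, 1]
>     scores = [0.1, 0.4, 0.35, 0.8]
>     # One (fpr, tpr) point per returned threshold.
>     fpr, tpr, threshold = roc_curve(y_test, scores)
>     print('y_test:%s, scores:%s' % (y_test, scores))
>     print('fpr:%s, tpr:%s, threshold:%s' % (fpr, tpr, threshold))
>     # Connecting the points draws the ROC curve as a polyline.
>     plt.plot(fpr, tpr)
>     plt.show()
>
> test_draw_roc()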
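The example imports auc but never uses it. As a small sketch of how it might be used, the block below computes the area under the same curve in three ways: with auc on the returned points, with roc_auc_score directly on the scores, and with a hand-written trapezoidal sum (the variable names are illustrative).

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

y_test = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, _ = roc_curve(y_test, scores)

# Area under the polyline through the (fpr, tpr) points.
area_from_curve = auc(fpr, tpr)

# Same quantity computed directly from labels and scores.
area_direct = roc_auc_score(y_test, scores)

# Manual trapezoidal rule: width of each segment times its average height.
area_manual = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

print(area_from_curve, area_direct, area_manual)
```

Because the curve is a polyline, the trapezoidal sum is exact for it rather than an approximation.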
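As the docstring says, thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1 (newer scikit-learn releases use np.inf for this first threshold instead).
So the first threshold is max(y_score) + 1, which is 1.8 here, and the remaining thresholds are taken from the scores themselves. In this example, scores and y_test both have length 4, which gives 5 thresholds. In the table below, predict marks the samples whose score is >= threshold:

threshold | y_test | predict | TP | FP | TN | FN | FPR | TPR |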
---|---|---|---|---|---|---|---|---|
1.8 | [0, 0, 1, 1] | [0,0,0,0] | 0 | 0 | 2 | 2 | 0 | 0 |
0.8 | [0, 0, 1, 1] | [0,0,0,1] | 1 | 0 | 2 | 1 | 0 | 1/2=0.5 |
0.4 | [0, 0, 1, 1] | [0,1,0,1] | … | | | | | |
0.35 | [0, 0, 1, 1] | [0,1,1,1] | … | | | | | |
0.1 | [0, 0, 1, 1] | [1,1,1,1] | 2 | 2 | 0 | 0 | 2/2=1 | 2/2=1 |
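The rows of this table can be recomputed with a short script. The sketch below is one possible way to do it, assuming the thresholds listed in the table (max(scores) + 1 followed by each score in decreasing order) and using confusion_matrix to get the four counts; the rows list and its column order are choices made here for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_test = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Thresholds as listed in the table: max(scores) + 1, then the scores in decreasing order.
thresholds = [scores.max() + 1] + sorted(scores, reverse=True)

rows = []
for t in thresholds:
    # A sample is predicted positive when its score is >= the threshold.
    y_pred = (scores >= t).astype(int)
    # With labels=[0, 1], ravel() yields the counts in the order TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    fpr = float(fp / (fp + tn))
    tpr = float(tp / (tp + fn))
    rows.append((float(t), int(tp), int(fp), int(tn), int(fn), fpr, tpr))

print("threshold | TP | FP | TN | FN | FPR | TPR")
for t, tp, fp, tn, fn, fpr, tpr in rows:
    print(f"{t:g} | {tp} | {fp} | {tn} | {fn} | {fpr:g} | {tpr:g}")
```

Passing labels=[0, 1] keeps the confusion matrix 2x2 even for the first and last thresholds, where every prediction falls into a single class.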