Computing trapezoidal area in Python: the area under the ROC curve (AUC) with sklearn

sklearn.metrics.auc

sklearn.metrics.auc(x, y, reorder=False)

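A general-purpose function that computes the area under a curve using the trapezoidal rule. The reorder argument in the signature above comes from older scikit-learn releases; it was deprecated in version 0.20 and removed in 0.22.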

import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
# The labels are {1, 2}, so the positive class must be named explicitly.
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
print(metrics.auc(fpr, tpr))  # 0.75
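To make the trapezoidal rule concrete, the sketch below sums the trapezoids between consecutive curve points by hand. The helper name trapezoid_area is hypothetical (it is not part of sklearn), and the fpr/tpr values are the ROC points of the four-sample example above, hard-coded so that the block runs on its own.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through (x[i], y[i]).

    Hypothetical helper for illustration; assumes x is sorted in
    non-decreasing order, as the fpr values of a ROC curve are.
    """
    area = 0.0
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        mean_height = (y[i] + y[i - 1]) / 2.0
        area += width * mean_height
    return area


# ROC points for y = [1, 1, 2, 2], pred = [0.1, 0.4, 0.35, 0.8], pos_label=2
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]

print(trapezoid_area(fpr, tpr))  # 0.75
```

The vertical steps of the curve have zero width and contribute nothing; only the two horizontal segments (areas 0.25 and 0.5) add up to the result.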

sklearn.metrics.roc_auc_score

sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)

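Computes the area under the ROC curve (ROC AUC) from prediction scores.

It applies only to binary classification, or to multilabel classification in label indicator format.

y_true: array, shape = [n_samples] or [n_samples, n_classes]

The true labels.

y_score: array, shape = [n_samples] or [n_samples, n_classes]

The predicted scores: probability estimates of the positive class, confidence values, or the return value of the classifier's "decision_function" method.

average: string, [None, 'micro', 'macro' (default), 'samples', 'weighted']

The averaging applied across labels for multilabel targets; None returns one score per label.

sample_weight: array-like of shape = [n_samples], optional

Per-sample weights.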
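As an illustration of the label indicator format and the average parameter, the sketch below scores a made-up two-label problem. The arrays are invented for this example; average=None returns one AUC per label, and 'macro' is their unweighted mean.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up multilabel data: 4 samples, 2 labels, one column per label.
Y_true = np.array([[1, 0],
                   [0, 1],
                   [1, 1],
                   [0, 0]])
Y_score = np.array([[0.9, 0.2],
                    [0.3, 0.8],
                    [0.6, 0.4],
                    [0.2, 0.7]])

per_label = roc_auc_score(Y_true, Y_score, average=None)  # one AUC per column
macro = roc_auc_score(Y_true, Y_score, average='macro')   # unweighted mean

print(per_label)  # per-label AUCs: 1.0 for column 0, 0.75 for column 1
print(macro)      # 0.875
```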

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc_score(y_true, y_scores))  # 0.75
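One way to sanity-check the 0.75 that roc_auc_score returns for this data is the equivalent ranking view of ROC AUC: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below is a brute-force illustration with a hypothetical helper name, not how sklearn computes the value.

```python
def pairwise_auc(y_true, y_score, pos_label=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == pos_label]
    neg = [s for t, s in zip(y_true, y_score) if t != pos_label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, y_scores))  # 0.75
```

Three of the four pairs are ordered correctly; the only miss is the positive scored 0.35 against the negative scored 0.4.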

A complete example

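from sklearn import datasets, svm, metrics, model_selection, preprocessing

iris = datasets.load_iris()
# Keep only classes 1 and 2 (a binary problem) and the first two features.
x = iris.data[iris.target != 0, :2]
x = preprocessing.StandardScaler().fit_transform(x)
y = iris.target[iris.target != 0]

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x, y, test_size=0.1, random_state=25)

clf = svm.SVC(kernel='linear')
clf.fit(x_train, y_train)

# f1_score uses its default pos_label=1 here.
print(metrics.f1_score(y_test, clf.predict(x_test)))
# Output: 0.75

# For a binary SVC, larger decision_function values favor class 2,
# so class 2 is passed as pos_label.
fpr, tpr, thresholds = metrics.roc_curve(
    y_test, clf.decision_function(x_test), pos_label=2)
print(metrics.auc(fpr, tpr))
# Output: 0.95833333333333337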
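The roc_curve call above can be read as a threshold sweep over the decision scores: each distinct score is tried as a cut-off, and the resulting false-positive and true-positive rates form one point of the curve. The following is a simplified sketch of that idea, assuming both classes are present; the helper roc_points is hypothetical and skips refinements of sklearn's implementation such as dropping intermediate points. It uses the small four-sample data set from earlier rather than the iris split, so the result can be checked by hand.

```python
def roc_points(y_true, y_score, pos_label):
    """Return (fpr, tpr) lists from a threshold sweep, highest score first."""
    n_pos = sum(1 for t in y_true if t == pos_label)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]  # start at the origin (threshold above every score)
    for thr in sorted(set(y_score), reverse=True):
        # Predict positive when score >= thr, then count hits and false alarms.
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t == pos_label)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t != pos_label)
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


fpr, tpr = roc_points([1, 1, 2, 2], [0.1, 0.4, 0.35, 0.8], pos_label=2)
print(fpr)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```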

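Summary

roc_auc_score is the AUC of the ROC curve built from prediction scores; internally it calls auc, as this code from an older scikit-learn release shows: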

def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
    if len(np.unique(y_true)) != 2:
        raise ValueError("Only one class present in y_true. ROC AUC score "
                         "is not defined in that case.")

    fpr, tpr, thresholds = roc_curve(y_true, y_score,
                                     sample_weight=sample_weight)
    return auc(fpr, tpr, reorder=True)

Both approaches give the same result.

import numpy as np
from sklearn import metrics
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

print(roc_auc_score(y_true, y_scores))
# 0.75

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_scores, pos_label=1)
print(metrics.auc(fpr, tpr))
# 0.75

Note that roc_auc_score has no pos_label parameter, whereas roc_curve handles an unspecified pos_label as follows:

classes = np.unique(y_true)
if (pos_label is None and
        not (array_equal(classes, [0, 1]) or
             array_equal(classes, [-1, 1]) or
             array_equal(classes, [0]) or
             array_equal(classes, [-1]) or
             array_equal(classes, [1]))):
    raise ValueError("Data is not binary and pos_label is not specified")
elif pos_label is None:
    pos_label = 1.

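In other words, roc_auc_score can be used directly only when the labels in y_true satisfy the conditions above (for example {0, 1} or {-1, 1}); otherwise, use roc_curve together with auc. This describes the scikit-learn release quoted here; newer releases may binarize the labels internally and accept other label pairs.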

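import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])

# The higher scores belong to class 2, so it is the positive class.
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
print(metrics.auc(fpr, tpr))
# 0.75

print(metrics.roc_auc_score(y, pred))
# ValueError (in the scikit-learn version used here):
# Data is not binary and pos_label is not specified

# The labels {1, 2} do not satisfy roc_curve's default pos_label conditions,
# hence the error. Relabeling the classes fixes it:
y = np.array([0, 0, 1, 1])  # or np.array([-1, -1, 1, 1])
print(metrics.roc_auc_score(y, pred))
# 0.75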
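When the labels are not {0, 1} or {-1, 1}, the roc_curve plus auc route described above can be wrapped in a small helper so that the positive class is always stated explicitly. The function name auc_with_pos_label is hypothetical; it only combines the two sklearn calls already used in this post.

```python
import numpy as np
from sklearn import metrics


def auc_with_pos_label(y_true, y_score, pos_label):
    """ROC AUC via roc_curve + auc, with the positive class named explicitly."""
    fpr, tpr, _ = metrics.roc_curve(y_true, y_score, pos_label=pos_label)
    return metrics.auc(fpr, tpr)


y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])

print(auc_with_pos_label(y, pred, pos_label=2))  # 0.75
print(auc_with_pos_label(y, pred, pos_label=1))  # 0.25, class 1 gets the lower scores
```

Swapping the positive class turns an AUC of a into 1 - a when there are no tied scores, so the choice of pos_label should match the class that the scores are meant to favor.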
