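scikit-learn offers 3 different approaches to evaluate the quality of a model's predictions:
- Estimator score method: estimators have a score method providing a default evaluation criterion for the problem they are designed to solve.
- Scoring parameter: model-evaluation tools using cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy.
- Metric functions: the metrics module implements functions assessing prediction error for specific purposes.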
Finally, Dummy estimators are useful to get a baseline value of those metrics for random predictions.
For “pairwise” metrics, between samples and not estimators or predictions, see the Pairwise metrics, Affinities and Kernels section.
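The sketch below illustrates the dummy-baseline idea. It compares a DummyClassifier that always predicts the most frequent class with a real model. The breast-cancer dataset, the train/test split and the LogisticRegression pipeline are arbitrary choices made for illustration. They are not part of the original text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: always predict the most frequent class seen during fit
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
# A real model to compare against the baseline
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)  # accuracy of the constant prediction
model_acc = model.score(X_test, y_test)
print(baseline_acc, model_acc)
```

A model that does not clearly beat `baseline_acc` has learned little beyond the class distribution.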
The scoring parameter: defining model evaluation rules
| Scoring | Function | Comment |
|---|---|---|
| Classification | | |
| 'accuracy' | metrics.accuracy_score | |
| 'average_precision' | metrics.average_precision_score | |
| 'f1' | metrics.f1_score | for binary targets |
| 'f1_micro' | metrics.f1_score | micro-averaged |
| 'f1_macro' | metrics.f1_score | macro-averaged |
| 'f1_weighted' | metrics.f1_score | weighted average |
| 'f1_samples' | metrics.f1_score | by multilabel sample |
| 'neg_log_loss' | metrics.log_loss | requires predict_proba support |
| 'precision' etc. | metrics.precision_score | suffixes apply as with 'f1' |
| 'recall' etc. | metrics.recall_score | suffixes apply as with 'f1' |
| 'roc_auc' | metrics.roc_auc_score | |
| Clustering | | |
| 'adjusted_rand_score' | metrics.adjusted_rand_score | |
| Regression | | |
| 'neg_mean_absolute_error' | metrics.mean_absolute_error | |
| 'neg_mean_squared_error' | metrics.mean_squared_error | |
| 'neg_median_absolute_error' | metrics.median_absolute_error | |
| 'r2' | metrics.r2_score | |
Usage example: cross_val_score(clf, X, y, scoring='neg_log_loss')
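Here is a minimal sketch of the scoring parameter in use. It assumes the iris dataset and a LogisticRegression classifier, both arbitrary choices. The same estimator is cross-validated under two predefined scoring strings. Note the sign convention: 'neg_log_loss' returns negated losses, so its values are at most 0, and larger is still better.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# The scoring string selects the metric; 'neg_log_loss' needs predict_proba
acc = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
nll = cross_val_score(clf, X, y, cv=5, scoring='neg_log_loss')

print(acc.mean())  # higher is better
print(nll.mean())  # negated log loss: closer to 0 is better
```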
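Note: be careful when computing AUC. sklearn.metrics.roc_auc_score(y_actual, y_pred) is the function that computes the ROC AUC. sklearn.metrics.auc(x, y), by contrast, computes the area between a polyline and the x-axis, where x holds the x-coordinates of the polyline's points and y their y-coordinates. Also note that y_actual must contain binary class labels, not continuous values. If it is continuous, you have to write your own area computation on top of sklearn.metrics.auc.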
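The following small sketch on made-up toy data illustrates the difference described in the note. roc_auc_score takes labels and scores directly. auc only integrates whatever (x, y) points it is given. Continuous targets are rejected by roc_auc_score.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]            # made-up binary labels
y_score = [0.1, 0.4, 0.35, 0.8]  # made-up continuous scores

# roc_auc_score takes the labels and scores directly
auc_direct = roc_auc_score(y_true, y_score)

# auc() just integrates a polyline, so the ROC points must be computed first
fpr, tpr, _ = roc_curve(y_true, y_score)
auc_from_curve = auc(fpr, tpr)

# auc() on arbitrary points: area under the segment from (0, 0) to (1, 1)
triangle = auc([0.0, 1.0], [0.0, 1.0])

# Continuous targets are rejected by roc_auc_score
try:
    roc_auc_score([0.2, 0.7, 1.5, 3.0], y_score)
    raised = False
except ValueError:
    raised = True

print(auc_direct, auc_from_curve, triangle, raised)  # 0.75 0.75 0.5 True
```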
[Common cases: predefined values]
>>> import numpy as np
>>> from sklearn.metrics import make_scorer
>>> def my_custom_loss_func(ground_truth, predictions):
...     diff = np.abs(ground_truth - predictions).max()
...     return np.log(1 + diff)
...
>>> # loss will negate the return value of my_custom_loss_func,
>>> # which will be np.log(2), 0.693, given the values for ground_truth
>>> # and predictions defined below.
>>> loss = make_scorer(my_custom_loss_func, greater_is_better=False)
>>> score = make_scorer(my_custom_loss_func, greater_is_better=True)
>>> ground_truth = [[1], [1]]
>>> predictions = [0, 1]
>>> from sklearn.dummy import DummyClassifier
>>> clf = DummyClassifier(strategy='most_frequent', random_state=0)
>>> clf = clf.fit(ground_truth, predictions)
>>> loss(clf, ground_truth, predictions)
-0.69...
>>> score(clf, ground_truth, predictions)
0.69...

Custom scoring function for cross-validation:
def rocAucScorer(*args):
    '''
    Custom ROC-AUC scorer: rocAucScorer(clf, x_test, y_true)
    :param x_test: test-set features
    :param y_true: ground-truth y_test values
    '''
    import numpy as np
    from sklearn import metrics
    # metric on (y_true, y_pred): binarize y_true (> 0 -> 1.0) and clip predictions to [0, 1]
    fun = lambda yt, ys: metrics.roc_auc_score([1.0 if _ > 0.0 else 0.0 for _ in yt],
                                               np.select([ys < 0.0, ys > 1.0, True], [0.0, 1.0, ys]))
    return metrics.make_scorer(fun, greater_is_better=True)(*args)
[Defining your scoring strategy from metric functions]
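make_scorer also forwards extra keyword arguments to the wrapped metric. This is handy for parameterized metrics such as fbeta_score. The sketch below uses the breast-cancer dataset and a scaled LogisticRegression pipeline purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# beta=2 is passed through to fbeta_score on every call (F2 weights recall more than precision)
ftwo_scorer = make_scorer(fbeta_score, beta=2)

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5, scoring=ftwo_scorer)
print(scores.mean())
```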
Alternatively, you can generate even more flexible model scorers by constructing your own scoring object from scratch, without using the make_scorer factory. For a callable to be a scorer, it needs to meet the protocol specified by the following two rules:
1. It can be called with parameters (estimator, X, y), where estimator is the model that should be evaluated, X is validation data, and y is the ground truth target for X (in the supervised case) or None (in the unsupervised case).
2. It returns a floating point number that quantifies the estimator prediction quality on X, with reference to y. Again, by convention higher numbers are better, so if your scorer returns a loss, that value should be negated.

def rocAucScorer(clf, x_test, y_true):
    '''
    Custom ROC-AUC scorer following the (estimator, X, y) protocol
    :param clf: the fitted estimator to evaluate
    :param x_test: test-set features
    :param y_true: ground-truth y_test values
    '''
    import numpy as np
    from sklearn import metrics
    ys = clf.predict(x_test)
    # binarize the ground truth (> 0 -> 1.0) and clip the predictions to [0, 1]
    score = metrics.roc_auc_score([1.0 if _ > 0.0 else 0.0 for _ in y_true],
                                  np.select([ys < 0.0, ys > 1.0, True], [0.0, 1.0, ys]))
    return score
[Implementing your own scoring object]
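The sketch below defines two scorers written directly against the (estimator, X, y) protocol, without make_scorer. The function names proba_auc_scorer and neg_brier_scorer are hypothetical. The synthetic data from make_classification is only there to make the example runnable. The second scorer shows the negation convention for a loss.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import cross_val_score

def proba_auc_scorer(estimator, X, y):
    """Hypothetical scorer: ROC AUC from predict_proba (higher is better)."""
    return roc_auc_score(y, estimator.predict_proba(X)[:, 1])

def neg_brier_scorer(estimator, X, y):
    """Hypothetical scorer: the Brier score is a loss, so it is negated."""
    return -brier_score_loss(y, estimator.predict_proba(X)[:, 1])

X, y = make_classification(n_samples=300, random_state=0)
clf = LogisticRegression()

# cross_val_score accepts any callable that follows the scorer protocol
auc_scores = cross_val_score(clf, X, y, cv=5, scoring=proba_auc_scorer)
brier_scores = cross_val_score(clf, X, y, cv=5, scoring=neg_brier_scorer)
print(auc_scores.mean(), brier_scores.mean())
```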
Some of these are restricted to the binary classification case:

| Function | Description |
|---|---|
| matthews_corrcoef(y_true, y_pred[, ...]) | Compute the Matthews correlation coefficient (MCC) for binary classes |
| precision_recall_curve(y_true, probas_pred) | Compute precision-recall pairs for different probability thresholds |
| roc_curve(y_true, y_score[, pos_label, ...]) | Compute Receiver operating characteristic (ROC) |
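Here is a quick sanity check for the binary-only group, using made-up labels. matthews_corrcoef should agree with the textbook MCC formula computed from the 2x2 confusion matrix. This is an illustrative sketch, not a description of how scikit-learn computes MCC internally.

```python
import math

from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [1, 1, 1, 0]  # made-up labels
y_pred = [1, 0, 1, 1]

mcc = matthews_corrcoef(y_true, y_pred)

# Textbook formula from the confusion matrix (rows = true, columns = predicted)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(mcc, manual)  # both equal -1/3
```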
Others also work in the multiclass case:

| Function | Description |
|---|---|
| cohen_kappa_score(y1, y2[, labels, weights]) | Cohen's kappa: a statistic that measures inter-annotator agreement. |
| confusion_matrix(y_true, y_pred[, labels, ...]) | Compute confusion matrix to evaluate the accuracy of a classification |
| hinge_loss(y_true, pred_decision[, labels, ...]) | Average hinge loss (non-regularized) |
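The sketch below illustrates the multiclass group on a small made-up example. confusion_matrix yields a 3x3 table (rows are true classes, columns are predicted classes). Cohen's kappa can be recomputed from that table as (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e the agreement expected by chance.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]  # made-up labels for 3 classes
y_pred = [0, 0, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
kappa = cohen_kappa_score(y_true, y_pred)

# Recompute (unweighted) kappa from the confusion matrix
n = cm.sum()
p_o = np.trace(cm) / n                                       # observed agreement
p_e = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2       # chance agreement
manual = (p_o - p_e) / (1 - p_e)
print(cm)
print(kappa, manual)  # both equal 3/7
```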
Some also work in the multilabel case:

| Function | Description |
|---|---|
| accuracy_score(y_true, y_pred[, normalize, ...]) | Accuracy classification score. |
| classification_report(y_true, y_pred[, ...]) | Build a text report showing the main classification metrics |
| f1_score(y_true, y_pred[, labels, ...]) | Compute the F1 score, also known as balanced F-score or F-measure |
| fbeta_score(y_true, y_pred, beta[, labels, ...]) | Compute the F-beta score |
| hamming_loss(y_true, y_pred[, labels, ...]) | Compute the average Hamming loss. |
| jaccard_similarity_score(y_true, y_pred[, ...]) | Jaccard similarity coefficient score |
| log_loss(y_true, y_pred[, eps, normalize, ...]) | Log loss, aka logistic loss or cross-entropy loss. |
| precision_recall_fscore_support(y_true, y_pred) | Compute precision, recall, F-measure and support for each class |
| precision_score(y_true, y_pred[, labels, ...]) | Compute the precision |
| recall_score(y_true, y_pred[, labels, ...]) | Compute the recall |
| zero_one_loss(y_true, y_pred[, normalize, ...]) | Zero-one classification loss. |
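In the multilabel case the targets form a binary indicator matrix, and the functions above differ in how strict they are. The sketch below uses a made-up 2-sample, 2-label example. accuracy_score computes subset accuracy, where a sample counts only if all its labels match. hamming_loss counts individual wrong labels. f1_score(average='samples') averages a per-sample F1.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Binary indicator matrices: one row per sample, one column per label (made-up data)
y_true = np.array([[0, 1], [1, 1]])
y_pred = np.array([[1, 1], [1, 1]])

subset_acc = accuracy_score(y_true, y_pred)                 # whole rows must match
ham = hamming_loss(y_true, y_pred)                          # share of wrong individual labels
f1_samples = f1_score(y_true, y_pred, average='samples')    # F1 per sample, then averaged
print(subset_acc, ham, f1_samples)  # 0.5, 0.25 and 5/6
```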
And some work with binary and multilabel (but not multiclass) problems:

| Function | Description |
|---|---|
| average_precision_score(y_true, y_score[, ...]) | Compute average precision (AP) from prediction scores |
| roc_auc_score(y_true, y_score[, average, ...]) | Compute Area Under the Curve (AUC) from prediction scores |
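The sketch below applies the binary/multilabel functions to a made-up multilabel indicator matrix. With average=None both functions return one value per label (column). The default average='macro' takes the unweighted mean over labels.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Made-up multilabel targets (2 labels) and continuous per-label scores
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.5], [0.3, 0.8], [0.6, 0.3], [0.1, 0.4]])

# One value per label (column)
ap_per_label = average_precision_score(y_true, y_score, average=None)
auc_per_label = roc_auc_score(y_true, y_score, average=None)

# Default average='macro': unweighted mean over labels
ap_macro = average_precision_score(y_true, y_score)
print(ap_per_label, auc_per_label, ap_macro)
```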
# calculate accuracy
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred_class))
# examine the class distribution of the testing set (using a Pandas Series method)
y_test.value_counts()
# calculate the percentage of ones
y_test.mean()
# calculate the percentage of zeros
1 - y_test.mean()
# calculate null accuracy (for binary classification problems coded as 0/1)
max(y_test.mean(), 1 - y_test.mean())
# calculate null accuracy (for multi-class classification problems)
y_test.value_counts().head(1) / len(y_test)
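Here is the same null-accuracy calculation as a self-contained sketch. The Series below is a made-up stand-in for y_test, since the original data is not shown. Null accuracy is the accuracy obtained by always predicting the most frequent class, so a useful model should beat it.

```python
import pandas as pd

# Made-up stand-in for y_test: a binary target coded as 0/1
y_test = pd.Series([0, 0, 0, 1, 1, 0, 0, 1, 0, 0])

print(y_test.value_counts())   # class distribution
pct_ones = y_test.mean()       # share of 1s, since the labels are 0/1
pct_zeros = 1 - y_test.mean()  # share of 0s

# Binary 0/1 case: accuracy of always predicting the majority class
null_acc_binary = max(pct_ones, pct_zeros)

# General case: count of the most frequent class divided by the number of samples
null_acc_multi = y_test.value_counts().head(1).iloc[0] / len(y_test)
print(null_acc_binary, null_acc_multi)
```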
# print the first 25 true and predicted responses
print('True:', y_test.values[0:25])
print('Pred:', y_pred_class[0:25])
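# IMPORTANT: first argument is true values, second argument is predicted values
print(metrics.confusion_matrix(y_test, y_pred_class))
[[118  12]
 [ 47  15]]
# save the confusion matrix and slice it into four pieces (rows = actual, columns = predicted)
confusion = metrics.confusion_matrix(y_test, y_pred_class)
TP = confusion[1, 1]
TN = confusion[0, 0]
FP = confusion[0, 1]
FN = confusion[1, 0]
Classification accuracy: the overall proportion of correct predictions
print((TP + TN) / float(TP + TN + FN + FP))
print(metrics.accuracy_score(y_test, y_pred_class))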
Classification error (misclassification rate): the proportion of samples the classifier misclassifies
print((FP + FN) / float(TP + TN + FN + FP))
print(1 - metrics.accuracy_score(y_test, y_pred_class))
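Sensitivity (True Positive Rate, recall): among actual positives, the proportion predicted correctly
print(TP / float(TP + FN))
print(metrics.recall_score(y_test, y_pred_class))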
Specificity: among actual negatives, the proportion predicted correctly
print(TN / float(TN + FP))
False Positive Rate: among actual negatives, the proportion predicted incorrectly
print(FP / float(TN + FP))
specificity = TN / float(TN + FP)
print(1 - specificity)
Precision: among predicted positives, the proportion that are actually positive
print(TP / float(TP + FP))
precision = metrics.precision_score(y_test, y_pred_class)
print(precision)
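F1 score: the harmonic mean of precision and recall
recall = metrics.recall_score(y_test, y_pred_class)
print((2 * precision * recall) / (precision + recall))
print(metrics.f1_score(y_test, y_pred_class))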
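The formulas above can be checked end to end on toy data. This sketch uses made-up 0/1 labels, because the original y_test and y_pred_class are not shown. It calls ravel() on the 2x2 confusion matrix, whose row-major order for labels [0, 1] is tn, fp, fn, tp.

```python
from sklearn import metrics

# Made-up 0/1 ground truth and predictions, for illustration only
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 1, 0, 0]

# For labels [0, 1], ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
total = tp + tn + fp + fn

accuracy = (tp + tn) / total
error = (fp + fn) / total        # 1 - accuracy
sensitivity = tp / (tp + fn)     # recall, true positive rate
specificity = tn / (tn + fp)
fpr = fp / (tn + fp)             # 1 - specificity
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

# Each hand-computed value should match the corresponding sklearn function
print(accuracy, metrics.accuracy_score(y_true, y_pred))
print(sensitivity, metrics.recall_score(y_true, y_pred))
print(precision, metrics.precision_score(y_true, y_pred))
print(f1, metrics.f1_score(y_true, y_pred))
```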
from sklearn import metrics
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
             precision    recall  f1-score   support

          0       0.96      0.77      0.86      4950
          1       0.18      0.62      0.28       392

avg / total       0.90      0.76      0.81      5342
Note: for example, 0.18 is the precision for the positive class 1.
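The per-class rows of the report can also be obtained as arrays with precision_recall_fscore_support. The sketch below uses made-up labels, not the data behind the report above. Each array has one entry per class in sorted label order, so index 1 corresponds to the row for positive class 1.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

y_true = [0, 0, 0, 1, 1, 0, 1, 0]  # made-up labels
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]

# One entry per class, in sorted label order (class 0, then class 1)
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)
print(classification_report(y_true, y_pred))
print(precision[1], recall[1], support)  # precision/recall of class 1, and support per class
```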
sklearn.metrics.precision_recall_curve(y_true, probas_pred, pos_label=None, sample_weight=None)
y_true : array, shape = [n_samples]
True targets of binary classification in range {-1, 1} or {0, 1}, i.e. binary labels.
probas_pred : array, shape = [n_samples]
Estimated probabilities or decision function. These must be continuous scores.
def plotPR(yt, ys, title=None):
    '''
    Plot the precision-recall curve
    :param yt: ground-truth y values
    :param ys: predicted scores (continuous)
    :param title: name used in the plot title and the output file name
    '''
    import os
    from matplotlib import pyplot as plt
    from sklearn import metrics
    precision, recall, thresholds = metrics.precision_recall_curve(yt, ys)
    # draw the curve in both orientations: x=precision (orange) and x=recall (blue)
    plt.plot(precision, recall, 'darkorange', lw=1, label='x=precision')
    plt.plot(recall, precision, 'blue', lw=1, label='x=recall')
    plt.legend(loc='best')
    plt.plot([0, 1], [0, 1], color='navy', linestyle='--')
    plt.title('Precision-Recall curve for %s' % title)
    plt.ylabel('Recall')
    plt.xlabel('Precision')
    # CWD: output directory assumed to be defined elsewhere in the original project
    plt.savefig(os.path.join(CWD, 'middlewares/pr-%s.png' % title))
    plt.close()  # start a fresh figure on the next call
[sklearn.metrics.precision_recall_curve]
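The small sketch below uses made-up scores to show what precision_recall_curve returns and two ways to summarize the curve. auc(recall, precision) interpolates linearly between points (trapezoidal rule). average_precision_score uses a step-wise sum instead, so the two numbers generally differ.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # continuous scores, not hard labels

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# precision/recall carry one extra final point (precision=1, recall=0) with no threshold
assert len(precision) == len(recall) == len(thresholds) + 1

# Two different summaries of the same curve
pr_auc_trapezoid = auc(recall, precision)     # linear interpolation between points
ap = average_precision_score(y_true, y_score)  # step-wise sum of (R_n - R_{n-1}) * P_n
print(pr_auc_trapezoid, ap)
```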
[Evaluation metrics and methods for machine learning models: ROC curve and AUC]
sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
y_true : array, shape = [n_samples]
True targets of binary classification in range {-1, 1} or {0, 1}, i.e. binary labels.
y_score : array, shape = [n_samples]
Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by "decision_function" on some classifiers). These should also be continuous scores.
def plotRUC(yt, ys, title=None):
    '''
    Plot the ROC curve and report its AUC
    :param yt: ground-truth y values
    :param ys: predicted scores (continuous)
    :param title: name used in the plot title and the output file name
    '''
    import os
    from matplotlib import pyplot as plt
    from sklearn import metrics
    f_pos, t_pos, thresh = metrics.roc_curve(yt, ys)
    auc_area = metrics.auc(f_pos, t_pos)
    print('auc_area: {}'.format(auc_area))
    plt.plot(f_pos, t_pos, 'darkorange', lw=2, label='AUC = %.2f' % auc_area)
    plt.legend(loc='lower right')
    plt.plot([0, 1], [0, 1], color='navy', linestyle='--')
    plt.title('ROC-AUC curve for %s' % title)
    plt.ylabel('True Pos Rate')
    plt.xlabel('False Pos Rate')
    # CWD: output directory assumed to be defined elsewhere in the original project
    plt.savefig(os.path.join(CWD, 'middlewares/roc-%s.png' % title))
    plt.close()  # start a fresh figure on the next call
[roc_curve(y_true, y_score[, pos_label, ...])]
[roc_auc_score(y_true, y_score[, average, ...])]
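The binary-label requirement for roc_curve is less strict than it looks. The scikit-learn docstring adds that if labels are not {-1, 1} or {0, 1}, pos_label should be given explicitly. Here is a sketch with made-up labels {1, 2}.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Made-up labels {1, 2}: class 2 is declared the positive class via pos_label
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
roc_auc_value = auc(fpr, tpr)  # area under the (fpr, tpr) polyline
print(roc_auc_value)
```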
ref: [Evaluation metrics and methods for machine learning models]
[scikit-learn User Guide]
[Model selection and evaluation]
[3.3. Model evaluation: quantifying the quality of predictions]
[3.5. Validation curves: plotting scores to evaluate models]