ML: Common Metrics for Evaluating Classification Models - Confusion Matrix & ROC Curve & AUC & Others

Confusion Matrix


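1. Basic concepts

Abbreviation  Full name       Meaning
TP            True Positive   The prediction is correct; the model predicted "Positive"
FN            False Negative  The prediction is wrong; the model predicted "Negative"
FP            False Positive  The prediction is wrong; the model predicted "Positive"
TN            True Negative   The prediction is correct; the model predicted "Negative"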

2. Metric definitions

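Metric                Definition                                     Remarks
Accuracy              (TP + TN) / (TP + FN + FP + TN)                Of all samples, the fraction the model predicted correctly
Precision             TP / (TP + FP)                                 Of the samples the model labeled Positive, the fraction that are actually Positive
Recall / Sensitivity  TP / (TP + FN)                                 Of the samples that are actually Positive, the fraction the model correctly found
F1-Score              2 * Precision * Recall / (Precision + Recall)  Ranges from 0 to 1; 1 is best, 0 is worst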
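
As a quick check of these definitions, here is a minimal sketch in plain Python that computes the four metrics from confusion-matrix counts. The function name classification_metrics and the counts are hypothetical, chosen only for illustration; returning 0.0 when a denominator is zero is also an assumption, not something the definitions above prescribe.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (assumption: report 0.0 in that case)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 40 TP, 10 FN, 20 FP, 30 TN
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, precision (40/60) is lower than recall (40/50) because the model produces more false positives than false negatives; F1 falls between the two.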

ROC Curve

Receiver Operating Characteristic curve, also translated as the "sensitivity curve"

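The ROC curve plots the true positive rate (TPR, i.e. Recall) against the false positive rate (FPR = FP / (FP + TN)) as the decision threshold varies. The more the curve bulges toward the upper-left corner, the better the model performs.

AUC: the area under the ROC curve (the shaded region in the plot). A value of 1 corresponds to a perfect classifier and 0.5 to random guessing; it is not discussed further here.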
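
To make the link between the ROC curve and AUC concrete, here is a minimal sketch in plain Python that sweeps the threshold over the scores, collects the (FPR, TPR) points, and integrates them with the trapezoidal rule. The helper names roc_points and auc_trapezoid and the four-sample data set are hypothetical; this is an illustration of the idea, not how any particular library implements it.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points obtained by sweeping the threshold over the scores."""
    pos = sum(y_true)              # number of actual positives
    neg = len(y_true) - pos        # number of actual negatives
    points = [(0.0, 0.0)]          # threshold above every score: nothing predicted positive
    # Visit each distinct score as a threshold, from highest to lowest
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_score)
area = auc_trapezoid(points)
print(points)
print(area)
```

This sketch assumes both classes are present in y_true; otherwise one of the rates would divide by zero.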

Appendix: sklearn evaluation metrics (see the official scikit-learn documentation)

Scoring                         Function                                Remarks
Classification
'accuracy'                      metrics.accuracy_score
'balanced_accuracy'             metrics.balanced_accuracy_score
'average_precision'             metrics.average_precision_score
'brier_score_loss'              metrics.brier_score_loss
'f1'                            metrics.f1_score                        For binary targets
'f1_micro'                      metrics.f1_score
'f1_macro'                      metrics.f1_score
'f1_weighted'                   metrics.f1_score
'f1_samples'                    metrics.f1_score
'precision' etc.                metrics.precision_score                 Same suffixes as 'f1' apply
'recall' etc.                   metrics.recall_score                    Same suffixes as 'f1' apply
'jaccard' etc.                  metrics.jaccard_score                   Same suffixes as 'f1' apply
'neg_log_loss'                  metrics.log_loss                        Requires predict_proba support
'roc_auc'                       metrics.roc_auc_score
Clustering
'adjusted_mutual_info_score'    metrics.adjusted_mutual_info_score
'adjusted_rand_score'           metrics.adjusted_rand_score
'completeness_score'            metrics.completeness_score
'fowlkes_mallows_score'         metrics.fowlkes_mallows_score
'homogeneity_score'             metrics.homogeneity_score
'mutual_info_score'             metrics.mutual_info_score
'normalized_mutual_info_score'  metrics.normalized_mutual_info_score
'v_measure_score'               metrics.v_measure_score
Regression
'explained_variance'            metrics.explained_variance_score
'r2'                            metrics.r2_score
'max_error'                     metrics.max_error
'neg_mean_absolute_error'       metrics.mean_absolute_error
'neg_mean_squared_error'        metrics.mean_squared_error
'neg_mean_squared_log_error'    metrics.mean_squared_log_error
'neg_median_absolute_error'     metrics.median_absolute_error

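Note: for functions ending in _score, a larger value means a better model; for functions ending in _error or _loss, a smaller value is better. The scoring strings for the latter carry a neg_ prefix because the scorer returns the negated value, so that "larger is better" holds for every scoring string.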
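
As an illustration of how these scoring strings are used, here is a minimal sketch that passes 'neg_mean_squared_error' to cross_val_score. The synthetic data set from make_regression and the choice of LinearRegression are assumptions made only for this example; any regressor would do.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption for illustration only)
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

# The scorer returns the negated MSE for each of the 5 folds,
# so every entry is <= 0 and values closer to 0 are better
neg_mse = cross_val_score(LinearRegression(), X, y, cv=5,
                          scoring="neg_mean_squared_error")

# Flip the sign to recover the usual (non-negative) mean squared error
mse = -neg_mse
print(mse.mean())
```

Flipping the sign at the end gives the ordinary mean squared error, which is usually easier to report.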


Sklearn example

By oopcode on Stack Overflow (modified).

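import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score

np.random.seed(0)  # fix the seed so the run is reproducible

lr = LogisticRegression()
X = np.random.rand(20, 2)           # 20 samples, 2 features
y = np.random.randint(2, size=20)   # 20 random binary labels, one per sample
lr.fit(X, y)

# lr.predict returns hard 0/1 labels; to trace a full ROC curve,
# pass the positive-class probabilities lr.predict_proba(X)[:, 1] instead
FP_rate, TP_rate, thresholds = roc_curve(y, lr.predict(X))
print(auc(FP_rate, TP_rate))
print(roc_auc_score(y, lr.predict(X)))
# The two calls print the same AUC value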
