First off: there is some rambling in this post, and gaps are inevitable; rather than listening to me go on for a few minutes, you might as well just read it.
It has been a while, but I'm back with an update.
Partly because I have picked up some new, genuinely useful material,
and partly to share what I have learned with my fellow big data classmates, in the hope that it helps with your course projects.
Accuracy is generally used to evaluate a classifier; it is the number of correctly classified samples divided by the total number of samples.
from sklearn.metrics import accuracy_score
# y_true: the ground-truth labels
# y_pred: the model's predictions for the same samples (here the test set, so it lines up with y_test)
print('Accuracy score: ', accuracy_score(y_test, y_pred))
I will skip the textbook definitions and formulas and put things in plain language instead.
There are also plenty of concepts around the confusion matrix, which I likewise skip here.
Speaking loosely and unprofessionally:
Precision: P = positives predicted correctly / (positives predicted correctly + positives predicted wrongly), i.e. TP / (TP + FP)
Recall: R = positives predicted correctly / (positives predicted correctly + positives missed), i.e. TP / (TP + FN)
F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R)
When both P and R are high, F1 is also high
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy score: ', accuracy_score(y_test, predictions))
print('Precision score: ', precision_score(y_test, predictions))
print('Recall score: ', recall_score(y_test, predictions))
print('F1 score: ', f1_score(y_test, predictions))
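Since the formulas above are built out of confusion-matrix counts, here is a minimal sketch of pulling TP/FP/FN/TN out directly with sklearn's confusion_matrix; the toy labels are invented purely for illustration.
from sklearn.metrics import confusion_matrix
# Toy binary labels, purely illustrative
y_test      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
print('Precision = TP / (TP + FP) =', tp / (tp + fp))
print('Recall    = TP / (TP + FN) =', tp / (tp + fn))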
The receiver operating characteristic curve, or ROC curve for short.
Its x-axis is the false positive rate (FPR) and its y-axis is the true positive rate (TPR).
Bear with this for now; I will flesh it out when I have time.
Here is an official example:
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
# y_score = classifier.fit(X_train, y_train).decision_function(X_test)
y_scores = np.array([0.1, 0.4, 0.35, 0.8])  # probability estimates of the positive class, or non-thresholded decision scores; they need not sum to 1
roc_auc_score(y_true, y_scores)
>>> 0.75
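If you want the curve itself rather than just the area under it, roc_curve returns the (FPR, TPR) pairs; below is a minimal sketch that reuses the toy y_true / y_scores from the official example above (the matplotlib plotting is my own addition, not part of that example).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
# roc_curve sweeps the decision threshold and returns one (FPR, TPR) pair per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
plt.plot(fpr, tpr, label='AUC = %.2f' % roc_auc_score(y_true, y_scores))
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()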
I also saw how another expert uses it:
from sklearn.metrics import roc_auc_score
# calculating roc_auc_score for xgboost
auc_xgb = roc_auc_score(y_test, ypred)
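One caveat with that usage: roc_auc_score expects scores or probabilities rather than hard 0/1 labels, so ypred should ideally come from predict_proba rather than predict. A minimal sketch, assuming an already-fitted binary classifier named model (e.g. an xgboost XGBClassifier) and an X_test / y_test split; none of these names come from the original snippet.
from sklearn.metrics import roc_auc_score
# `model`, X_test and y_test are assumed to exist already (see the note above)
ypred = model.predict_proba(X_test)[:, 1]   # probability of the positive class
auc_xgb = roc_auc_score(y_test, ypred)
print('AUC: ', auc_xgb)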
For multiclass problems, precision_score, recall_score and f1_score complain under the default (binary) averaging; just add the average parameter, e.g. average='macro', and the same calls work.
For more details, see the official documentation.
print('Precision score: ', precision_score(y_test, y_pred, average='macro'))
print('Recall score: ', recall_score(y_test, y_pred, average='macro'))
print('F1 score: ', f1_score(y_test, y_pred, average='macro'))
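A tiny multiclass toy case to show the parameter in action (the three-class labels below are invented for illustration); with average='macro' the per-class scores are averaged with equal weight.
from sklearn.metrics import precision_score, recall_score, f1_score
# Three-class toy labels, purely illustrative
y_test = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
print('Precision score: ', precision_score(y_test, y_pred, average='macro'))
print('Recall score: ', recall_score(y_test, y_pred, average='macro'))
print('F1 score: ', f1_score(y_test, y_pred, average='macro'))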
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y, pred_y)   # MAE
from sklearn.metrics import mean_squared_error   # note: the function is mean_squared_error, not mean_square_error
mean_squared_error(y, pred_y)   # MSE
from sklearn.metrics import r2_score
r2_score(y, pred_y)   # R^2 (coefficient of determination)
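For context, here is a self-contained sketch that actually produces y and pred_y; the synthetic dataset and the plain LinearRegression are placeholders I chose for illustration, not anything from the original post.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Synthetic regression data, split into train and held-out test sets
X, y_all = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y = train_test_split(X, y_all, random_state=0)
# Fit a simple linear model and predict on the test set
pred_y = LinearRegression().fit(X_train, y_train).predict(X_test)
print('MAE:', mean_absolute_error(y, pred_y))
print('MSE:', mean_squared_error(y, pred_y))
print('R2: ', r2_score(y, pred_y))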
Finally, an example, or rather something convenient for you to copy.
For instance, for Gaussian Naive Bayes (sklearn's GaussianNB), a classic classification model, we can do:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy score: ', accuracy_score(y_test, predictions))
print('Precision score: ', precision_score(y_test, predictions))
print('Recall score: ', recall_score(y_test, predictions))
print('F1 score: ', f1_score(y_test, predictions))
>>> Accuracy score: 0.9885139985642498
>>> Precision score: 0.9720670391061452
>>> Recall score: 0.9405405405405406
>>> F1 score: 0.9560439560439562
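For completeness, a sketch of how predictions might be produced before those metric calls; the breast-cancer dataset and the default split are assumptions chosen only because the task is binary, so the numbers you get will differ from the output above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# Placeholder binary dataset; swap in your own course-project data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Fit Gaussian Naive Bayes and predict on the held-out test set;
# the four print statements above can then be run unchanged.
predictions = GaussianNB().fit(X_train, y_train).predict(X_test)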