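Read the data from data.csv and complete the following:
1. Count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
2. Compute precision, recall, and F1-score for each class (positive and negative).
3. Compute the macro average of precision, recall, and F1-score, and compute the accuracy.
4. Plot the ROC curve and compute the area under the curve, AUC (the sklearn package may be used).
Converting predicted probabilities to predicted classes:
Using 0.5 as the threshold, binarize the predict_prob column to 0/1: values below 0.5 become 0 and values of 0.5 or above become 1. Storing the result in a separate predicted-class array is recommended.
Note: in the data labels, 1 denotes a positive example and 0 denotes a negative example.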
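The thresholding rule above can be sketched on its own before touching the real file. This is a minimal illustration: the helper name binarize and the sample probabilities are assumptions made for the example and do not come from data.csv.

```python
import numpy as np

def binarize(probs, threshold=0.5):
    """Return 0/1 classes: 1 where prob >= threshold, otherwise 0."""
    probs = np.asarray(probs, dtype=float)
    return (probs >= threshold).astype(int)

# Hypothetical probabilities chosen to exercise the boundary at exactly 0.5
example_probs = [0.10, 0.49, 0.50, 0.93]
example_classes = binarize(example_probs)
print(example_classes)  # [0 0 1 1]
```

Returning a new array leaves the original probabilities untouched, which matters later because the ROC curve is computed from probabilities rather than from the 0/1 classes.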
import pandas as pd
import numpy as np
DATA_PATH = 'data.csv'
data = pd.read_csv(DATA_PATH)
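# Threshold predict_prob at 0.5 on a copy, so that `data` keeps the raw
# probabilities needed later for the ROC curve.
data0 = data.copy()
# Values below 0.5 become 0; all other values become 1.
data0['predict_prob'] = np.where(data0['predict_prob'] < 0.5, 0, 1)
data0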
def count(data, a, b):
    # Number of rows whose predicted class equals a and whose true label equals b.
    matches = (data['predict_prob'] == a) & (data['label'] == b)
    return int(matches.sum())
TP = count(data0, 1, 1)
TN = count(data0, 0, 0)
FP = count(data0, 1, 0)
FN = count(data0, 0, 1)
print(str(TP)+'\n'+str(TN)+'\n'+str(FP)+'\n'+str(FN))
Output (TP, TN, FP, FN):
348
198
14
9
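As a cross-check on the four counts, the same tally can be sketched with NumPy boolean masks. The function name confusion_counts and the two short arrays are hypothetical, made up for illustration, and are not taken from data.csv.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, tn, fp, fn

# Hypothetical labels and predictions
toy_true = [1, 1, 1, 0, 0, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 1, 1, 0, 0]
toy_counts = confusion_counts(toy_true, toy_pred)
print(toy_counts)  # (3, 3, 1, 1)
```

The four counts always sum to the number of samples, which is a quick sanity check on any confusion-matrix computation.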
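# Per-class precision (leading P = positive class, leading N = negative class)
PP = TP/(TP+FP)
NP = TN/(TN+FN)
# Per-class recall
PR = TP/(TP+FN)
NR = TN/(TN+FP)
# Per-class F1: harmonic mean of that class's precision and recall
PF1 = 2*(PP*PR/(PP+PR))
NF1 = 2*(NP*NR/(NP+NR))
print('%.4f\n%.4f\n%.4f\n%.4f\n%.4f\n%.4f\n' %(PP, NP, PR, NR, PF1, NF1))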
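Output (positive precision, negative precision, positive recall, negative recall, positive F1, negative F1):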
0.9613
0.9565
0.9748
0.9340
0.9680
0.9451
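# Macro averages: unweighted mean of the two per-class values
P = (PP+NP)/2
R = (PR+NR)/2
F1 = (PF1+NF1)/2
# Accuracy: share of all samples that were classified correctly
A = (TP+TN)/(TP+FP+TN+FN)
print('%.4f\n%.4f\n%.4f\n%.4f'%(P, R, F1, A))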
Output (macro precision, macro recall, macro F1, accuracy):
0.9589
0.9544
0.9566
0.9596
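The macro averages and accuracy can be re-derived from the four reported counts alone (TP = 348, TN = 198, FP = 14, FN = 9). The sketch below does that in one place; the function name macro_metrics and the dictionary keys are assumptions introduced for this example.

```python
def macro_metrics(tp, tn, fp, fn):
    """Macro-averaged precision, recall, F1, plus accuracy, from confusion counts."""
    # Positive class treats label 1 as the target; negative class treats label 0 as the target.
    prec_pos, prec_neg = tp / (tp + fp), tn / (tn + fn)
    rec_pos, rec_neg = tp / (tp + fn), tn / (tn + fp)
    f1_pos = 2 * prec_pos * rec_pos / (prec_pos + rec_pos)
    f1_neg = 2 * prec_neg * rec_neg / (prec_neg + rec_neg)
    return {
        'macro_precision': (prec_pos + prec_neg) / 2,
        'macro_recall': (rec_pos + rec_neg) / 2,
        'macro_f1': (f1_pos + f1_neg) / 2,
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
    }

# Counts reported earlier in this document
summary = macro_metrics(tp=348, tn=198, fp=14, fn=9)
for name, value in summary.items():
    print('%s: %.4f' % (name, value))
```

Note that macro F1 here is the mean of the two per-class F1 values, matching the computation above, and not the F1 of the macro precision and macro recall.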
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

def plot_roc(labels, predict_prob):
    # roc_curve expects scores or probabilities, not thresholded 0/1 classes
    false_positive_rate, true_positive_rate, thresholds = roc_curve(labels, predict_prob)
    roc_auc = auc(false_positive_rate, true_positive_rate)
    plt.title('ROC')
    plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.4f' % roc_auc)
    plt.legend(loc='lower right')
    plt.plot([0, 1], [0, 1], 'r--')  # chance-level diagonal
    plt.ylabel('TPR')
    plt.xlabel('FPR')
    plt.show()

# Pass the original `data`, whose predict_prob column holds probabilities
plot_roc(data['label'], data['predict_prob'])
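AUC also has a ranking interpretation: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below computes AUC that way with NumPy only, as a way to sanity-check the value shown in the plot legend. The function name auc_by_pairs and the four-sample example are assumptions for illustration and are not drawn from data.csv.

```python
import numpy as np

def auc_by_pairs(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Pairwise score differences: one row per positive, one column per negative
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

# Hypothetical labels and scores
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_by_pairs(toy_labels, toy_scores)
print(toy_auc)  # 0.75
```

The pairwise matrix needs memory proportional to the number of positives times the number of negatives, so this form suits small datasets; for larger ones the sklearn functions used above are the practical choice.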