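For the rationale, see my advisor's blog post: "Rank-based evaluation metrics (especially for recommender systems and multi-label learning)".
Because the algorithm outputs real-valued scores rather than hard labels, the choice of threshold affects the measured performance.
Building on the confusion matrix of binary classification, the maximum value of $F_1$ that the algorithm can reach over all thresholds is called Peak-$F_1$.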
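For reference, the quantities involved can be written out as follows. This is a sketch using the standard precision/recall definitions; the symbols $k$ (number of top-ranked entries treated as positive) and $n$ (total number of entries) are notation introduced here, not taken from the original post.

$$
P_k = \frac{TP_k}{k}, \qquad
R_k = \frac{TP_k}{TP + FN}, \qquad
F_1(k) = \frac{2 P_k R_k}{P_k + R_k}, \qquad
\text{Peak-}F_1 = \max_{1 \le k \le n} F_1(k)
$$

Here $TP_k$ is the number of true positives among the $k$ highest-scored entries, and $TP + FN$ is the total number of actual positives.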
Python implementation:
def compute_peak_f1(self):
    # Assumes `import numpy as np` at module level.
    # Flatten predictions and ground truth, then rank all entries by descending score.
    temp_predict_vector = self.predict_prob_matrix.reshape(-1)
    temp_predict_sort_index = np.argsort(-temp_predict_vector)
    temp_test_target_vector = self.test_target.reshape(-1)
    temp_test_target_sort = temp_test_target_vector[temp_predict_sort_index]
    temp_f1_list = []
    # Total number of actual positives (TP + FN).
    TP_FN = np.sum(self.test_target > 0)
    for i in range(temp_predict_sort_index.size):
        # Treat the top (i + 1) ranked entries as predicted positives.
        TP = np.sum(temp_test_target_sort[0:i + 1] == 1)
        P = TP / (i + 1)
        R = TP / TP_FN
        temp_f1 = 0
        if (P + R) != 0:
            temp_f1 = 2.0 * P * R / (P + R)
        temp_f1_list.append(temp_f1)
    temp_f1_list = np.array(temp_f1_list)
    temp_max_f1_index = np.argmax(temp_f1_list)
    peak_f1 = np.max(temp_f1_list)
    # temp_max_f1_index is a position in the sorted order, so map it back
    # through temp_predict_sort_index to get the score at the cut-off.
    threshold_value = temp_predict_vector[temp_predict_sort_index[temp_max_f1_index]]
    self.threshold_value = threshold_value
    print("compute_peak_f1:", peak_f1)
    return peak_f1
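This is pasted directly from the algorithm's code.
predict_prob_matrix holds the model's predicted scores, and test_target holds the corresponding ground-truth labels of the test set.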
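The same Peak-$F_1$ scan can also be done without the inner `np.sum` by using a cumulative sum. The following is a minimal standalone sketch, not the original class method: the function name `peak_f1` and its `(scores, labels)` signature are hypothetical, and it assumes 0/1 labels. It relies on the identity $F_1 = 2\,TP_k / (k + TP + FN)$, which follows from substituting $P = TP_k/k$ and $R = TP_k/(TP+FN)$ into $2PR/(P+R)$.

```python
import numpy as np


def peak_f1(scores, labels):
    """Return (peak F1, score at the best cut-off) for 0/1 labels.

    Hypothetical helper for illustration; not part of the original class.
    """
    scores = np.asarray(scores, dtype=float).reshape(-1)
    labels = np.asarray(labels).reshape(-1)

    # Rank entries by descending score and mark which ranked entries are positive.
    order = np.argsort(-scores)
    hits = labels[order] == 1

    total_pos = int(hits.sum())  # TP + FN
    if total_pos == 0:
        # No positives at all: F1 is 0 at every cut-off.
        return 0.0, None

    tp = np.cumsum(hits)                  # true positives among the top k
    k = np.arange(1, labels.size + 1)     # number of predicted positives
    f1 = 2.0 * tp / (k + total_pos)       # equals 2PR / (P + R)

    best = int(np.argmax(f1))
    return float(f1[best]), float(scores[order[best]])


if __name__ == "__main__":
    value, threshold = peak_f1([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(value, threshold)
```

The cumulative sum makes the scan linear after sorting, whereas recomputing `np.sum` on a growing slice at every step is quadratic in the number of entries.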
MATLAB code:
function [score] = F1(label_prob, label_target)
    % Rank entries by descending predicted score.
    [~, temp] = sort(-label_prob);
    allLabelSort = label_target(temp);
    tempF1 = zeros(1, numel(temp));
    allTP = sum(label_target == 1);  % total number of actual positives
    for i = 1:numel(temp)
        % Treat the top i ranked entries as predicted positives.
        sliceArray = allLabelSort(1:i);
        TP = sum(sliceArray == 1);
        P = TP / i;
        R = TP / allTP;
        if (P + R == 0)
            tempF1(i) = 0;
        else
            tempF1(i) = (2.0 * P * R) / (P + R);
        end
    end
    score = max(tempF1);
end
Version 1 (AUC from the rank-sum formula):
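def compute_auc(self):
    # Assumes `import numpy as np` at module level and 0/1 labels in test_target.
    temp_predict_vector = self.predict_prob_matrix.reshape(-1)
    temp_test_target_vector = self.test_target.reshape(-1)
    # Ascending order: position i in this index array has rank i + 1.
    temp_predict_sort_index = np.argsort(temp_predict_vector)
    # M = number of positives, N = number of negatives.
    M, N = 0, 0
    for i in range(temp_predict_vector.size):
        if temp_test_target_vector[i] == 1:
            M += 1
        else:
            N += 1
    # sigma = sum of the ranks of the positive samples.
    sigma = 0
    for i in range(temp_predict_vector.size - 1, -1, -1):
        if temp_test_target_vector[temp_predict_sort_index[i]] == 1:
            sigma += i + 1
    # Rank-sum form of AUC. Tied scores get arbitrary ranks here, so the
    # result can differ slightly from a tie-aware implementation.
    auc = (sigma - (M + 1) * M / 2) / (M * N)
    print("compute_auc:", auc)
    return auc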
Version 2 (using scikit-learn):
def computeAUC(self):
    # Assumes `from sklearn import metrics` at module level.
    tempProbVector = self.predict_prob_matrix.reshape(-1)
    tempTargetVector = self.test_target.reshape(-1)
    auc = metrics.roc_auc_score(tempTargetVector, tempProbVector)
    print("computeAUC:", auc)
    return auc
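The two versions above can disagree when several predictions share the same score, because a plain `argsort` gives tied entries arbitrary distinct ranks. A minimal standalone sketch of the rank-sum formula with average ranks for ties is shown below. The function name `rank_sum_auc` and its signature are hypothetical, it assumes 0/1 labels, and returning NaN when one class is missing is a choice made here for illustration.

```python
import numpy as np


def rank_sum_auc(scores, labels):
    """AUC via the rank-sum formula, using average ranks for tied scores.

    Hypothetical helper for illustration; not part of the original class.
    """
    scores = np.asarray(scores, dtype=float).reshape(-1)
    labels = np.asarray(labels).reshape(-1)

    pos = labels == 1
    m = int(pos.sum())        # number of positives
    n = labels.size - m       # number of negatives
    if m == 0 or n == 0:
        # AUC is undefined when one class is missing.
        return float("nan")

    # Distinct score values in ascending order, with the group each entry belongs to.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)           # last 1-based rank of each tie group
    first_rank = last_rank - counts + 1     # first 1-based rank of each tie group
    avg_rank = (first_rank + last_rank) / 2.0
    ranks = avg_rank[inverse]               # average rank of every entry

    sigma = ranks[pos].sum()                # rank sum of the positives
    return float((sigma - m * (m + 1) / 2.0) / (m * n))


if __name__ == "__main__":
    print(rank_sum_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

With average ranks, a positive and a negative that share a score contribute one half to the count of correctly ordered pairs, which is the usual convention for ties.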
MATLAB implementation:
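function [result] = AUC(output, test_targets)
    % Sort scores in ascending order; I(i) is the index of the entry with rank i.
    [~, I] = sort(output);
    % M = number of positives, N = number of negatives.
    M = 0; N = 0;
    for i = 1:length(output)
        if (test_targets(i) == 1)
            M = M + 1;
        else
            N = N + 1;
        end
    end
    % sigma = sum of the ranks of the positive samples.
    sigma = 0;
    for i = M+N:-1:1
        if (test_targets(I(i)) == 1)
            sigma = sigma + i;
        end
    end
    result = (sigma - (M + 1) * M / 2) / (M * N);
end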
Python implementation (NDCG):
Version 1:
def compute_ndgc(self):
    # Assumes `import math` and `import numpy as np` at module level.
    temp_predict_vector = self.predict_prob_matrix.reshape(-1)
    temp_test_target_vector = self.test_target.reshape(-1)
    # True labels reordered by descending predicted score.
    temp_predict_sort_index = np.argsort(-temp_predict_vector)
    temp_predict_target_sort = temp_test_target_vector[temp_predict_sort_index]
    # Ideal ordering: true labels sorted in descending order.
    temp_target_sort = np.sort(temp_test_target_vector)
    temp_target_sort = np.flipud(temp_target_sort)
    # DCG of the predicted ordering.
    dcg = 0
    for i in range(temp_predict_vector.size):
        rel = temp_predict_target_sort[i]
        denominator = math.log2(i + 2)
        dcg += rel / denominator
    # DCG of the ideal ordering.
    idcg = 0
    for i in range(temp_predict_vector.size):
        rel = temp_target_sort[i]
        denominator = math.log2(i + 2)
        idcg += rel / denominator
    ndcg = dcg / idcg
    print("compute_ndgc: ", ndcg)
    return ndcg
Version 2:
def computeNDCG(self):
    # Assumes `import numpy as np` at module level.
    # Get the predicted score vector and the original target vector.
    tempProbVector = self.predict_prob_matrix.reshape(-1)
    tempTargetVector = self.test_target.reshape(-1)
    # Reorder the original 1/0 labels by descending predicted score.
    temp = np.argsort(-tempProbVector)
    allLabelSort = tempTargetVector[temp]
    # Ideal ordering: 1111...10000...0
    sortedTargetVector = np.sort(tempTargetVector)[::-1]
    # Compute DCG using the predicted order; rel is the true label at each
    # position, which in practice looks like 111110111101110000001000100.
    DCG = 0
    for i in range(temp.size):
        rel = allLabelSort[i]
        denominator = np.log2(i + 2)
        DCG += (rel / denominator)
    # Compute iDCG using the ideal order: 11111111110000000000.
    iDCG = 0
    for i in range(temp.size):
        rel = sortedTargetVector[i]
        denominator = np.log2(i + 2)
        iDCG += (rel / denominator)
    ndcg = DCG / iDCG
    print("computeNDCG: ", ndcg)
    return ndcg
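Both loops above apply the same position discount $1/\log_2(i+2)$, so the computation can be written with one shared discount vector. The following is a minimal standalone sketch under that observation. The function name `ndcg_score_flat` and its signature are hypothetical, and returning 0.0 when there are no relevant entries is a choice made here to avoid dividing by zero; the original code does not handle that case.

```python
import numpy as np


def ndcg_score_flat(scores, labels):
    """NDCG over one flattened ranking, with relevance taken from `labels`.

    Hypothetical helper for illustration; not part of the original class.
    """
    scores = np.asarray(scores, dtype=float).reshape(-1)
    labels = np.asarray(labels, dtype=float).reshape(-1)

    # Discount for 0-based position i is 1 / log2(i + 2).
    discounts = 1.0 / np.log2(np.arange(labels.size) + 2)

    # DCG: true labels in the order induced by descending predicted score.
    dcg = np.sum(labels[np.argsort(-scores)] * discounts)
    # iDCG: true labels in the best possible (descending) order.
    idcg = np.sum(np.sort(labels)[::-1] * discounts)

    if idcg == 0:
        # No relevant entries: define NDCG as 0.0 here (an assumption).
        return 0.0
    return float(dcg / idcg)


if __name__ == "__main__":
    print(ndcg_score_flat([0.9, 0.8, 0.7], [1, 0, 1]))
```

In the example, the predicted order puts a negative in second place, so the DCG is $1 + 1/\log_2 4 = 1.5$ against an ideal value of $1 + 1/\log_2 3$.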
MATLAB implementation:
function [ndcg] = NDCG(label_prob, label_target)
    [~, temp] = sort(-label_prob);      % rank entries by descending predicted score
    allLabelSort = label_target(temp);  % true labels in the predicted order
    % Ideal ordering: true labels sorted in descending order.
    % 'descend' works for both row and column vectors, whereas fliplr
    % leaves a column vector unchanged.
    sortedTargetVector = sort(label_target, 'descend');
    dcg = 0;
    for i = 1:numel(temp)
        rel = allLabelSort(i);
        denominator = log2(i + 1);
        dcg = dcg + (rel / denominator);
    end
    idcg = 0;  % the ideal DCG is obtained by ranking according to the true labels
    for i = 1:numel(temp)
        rel = sortedTargetVector(i);
        denominator = log2(i + 1);
        idcg = idcg + (rel / denominator);
    end
    ndcg = dcg / idcg;
end
From now on I'll just copy and paste from here, laying the groundwork for future laziness.