The relationship between the NDCG metric and k

Experimenting with different k

    import math
    import random


    def cal_list_ndcg(mids, rele_table, pred_table, n):
        """Return NDCG@n for the items in mids, ranked by their pred_table scores."""

        def dcg_at_n(ranked):
            # Gain (2**rel - 1), discounted by log2 of the 1-based position plus 1.
            return sum((2 ** rele_table[m] - 1) / math.log2(i + 2)
                       for i, m in enumerate(ranked[:n]))

        # Ideal ordering: highest true relevance first.
        ideal = sorted(mids, reverse=True, key=lambda x: rele_table[x])
        # Actual ordering: highest predicted score first.
        ranked = sorted(mids, reverse=True, key=lambda x: pred_table[x])
        return dcg_at_n(ranked) / dcg_at_n(ideal)


    mids = ['kol', 'media', 'other', 'guandian', 'taolun', 'f012']

    # Ground-truth relevance of each item.
    rele_table = {'kol': 0.1, 'media': 0.2, 'other': 0.3,
                  'guandian': 0.4, 'taolun': 0.5, 'f012': 0.6}

    # Scores predicted by the model. With these exact scores NDCG is below 1.0;
    # per the text, the listings further down come from predictions ordered the
    # same as (and later opposite to) rele_table.
    pred_table = {'kol': 0.34828833, 'media': 0.31925637, 'other': 0.30245525,
                  'guandian': 0.13245525, 'taolun': 0.83245525, 'f012': 0.37245525}

    for k in range(3, 7):
        # Random baseline, redrawn for every k.
        pred_rand = {m: random.random() for m in mids}

        value = cal_list_ndcg(mids, rele_table, pred_table, k)
        ndcg_rand = cal_list_ndcg(mids, rele_table, pred_rand, k)
        print(k)
        print(value, ndcg_rand)          # NDCG@k: model, random baseline
        print(value / k, ndcg_rand / k)  # the same two values divided by k

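How dcg changes with rel(i)

Each term grows with the relevance rel(i) and shrinks with the position i. Since every term is non-negative (for non-negative relevance), the larger i gets, that is, the more positions are accumulated, the larger dcg gets:
dcg += ((2**rele_table[m] - 1) / (math.log2(i+2)))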
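
The accumulation step above can be checked with a minimal sketch. The helper names `dcg_term` and `running_dcg` are made up for illustration and are not part of the original code, and the sketch assumes non-negative relevance values like the ones in rele_table.

```python
import math


def dcg_term(rel, i):
    """One DCG term for relevance rel at 0-based position i (hypothetical helper)."""
    return (2 ** rel - 1) / math.log2(i + 2)


def running_dcg(rels):
    """Cumulative DCG after each position, for relevances given in ranked order."""
    total, out = 0.0, []
    for i, rel in enumerate(rels):
        total += dcg_term(rel, i)
        out.append(total)
    return out


# Relevance values from rele_table, in an arbitrary ranked order.
cumulative = running_dcg([0.5, 0.6, 0.1, 0.2, 0.3, 0.4])
print(cumulative)  # each entry is at least as large as the previous one

# Same position, higher relevance: the term is larger.
print(dcg_term(0.6, 0) > dcg_term(0.1, 0))
# Same relevance, later position: the term is smaller.
print(dcg_term(0.6, 5) < dcg_term(0.6, 0))
```

So dcg can only grow as more positions are included, but how much it grows depends on which relevance lands at which position.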

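Results

Each group shows k, then NDCG@k for the model and for the random baseline, then both values divided by k.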

3
1.0 0.4281434559617804
0.3333333333333333 0.14271448532059347
4
1.0 0.768846962242674
0.25 0.1922117405606685
5
1.0 0.8405557275318762
0.2 0.16811114550637524
6
1.0 0.8676354240144575
0.16666666666666666 0.14460590400240958

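Conclusion

When the predicted ranking matches the ideal ranking, NDCG@k itself stays at 1.0 for every k. What gets smaller as k grows is the second printed row, NDCG divided by k.
The likely reason: as more items are considered, the denominator (the count k) increases while the numerator (NDCG) stays the same, so the quotient falls.

Then why does the numerator, that is dcg / idcg, stay the same?
Because the two rankings are identical, so dcg == idcg.

The ideal idcg should grow by smaller and smaller increments as k increases, because the most relevant items are already ranked first.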
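
The shrinking increments of idcg can be seen directly with a small sketch. It reuses the relevance values from rele_table; the variable names `ideal_rels` and `increments` are illustrative only.

```python
import math

# Relevance values from rele_table, sorted highest first (the ideal ranking).
ideal_rels = sorted([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], reverse=True)

# increments[i] is what idcg gains when the cutoff k goes from i to i + 1.
increments = [(2 ** rel - 1) / math.log2(i + 2)
              for i, rel in enumerate(ideal_rels)]

idcg = 0.0
for k, inc in enumerate(increments, start=1):
    idcg += inc
    print(k, round(inc, 4), round(idcg, 4))  # k, increment, idcg@k
```

Both effects push the same way here: later positions hold less relevant items and are discounted more heavily, so each extra position adds less to idcg than the one before.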

For a fixed k, the larger the actual dcg is, the larger the overall ratio becomes.

However, if the ranking is completely reversed:

3
0.3001289811601986 0.5473110827374296
0.1000429937200662 0.18243702757914318
4
0.40407676211684973 0.7345030054951796
0.10101919052921243 0.1836257513737949
5
0.5293144802949171 0.5967650853888128
0.10586289605898343 0.11935301707776255
6
0.68132617129277 0.888722526516354
0.11355436188212832 0.148120421086059

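As the first column shows, NDCG grows as k grows here. A possible explanation is that a larger k is more tolerant of ranking errors.

So with a good ranking, the larger k is, the smaller NDCG divided by k becomes (the numerator is always 1 while the denominator, the count k, keeps growing).
With a poor ranking, the larger k is, the larger NDCG becomes (idcg grows only slowly because the items it adds late have low relevance, while dcg picks up the high-relevance items that were ranked last, so the ratio rises).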
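
Both cases can be reproduced with a short sketch that skips the dictionaries and works on relevance values listed in predicted order. The helper `ndcg_at_k` and the names `perfect` and `reversed_order` are assumptions made for this illustration, not part of the original code.

```python
import math


def ndcg_at_k(ranked_rels, k):
    """NDCG@k for relevance values listed in predicted order (hypothetical helper)."""

    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    # The ideal ranking is the same values sorted highest first.
    return dcg(ranked_rels) / dcg(sorted(ranked_rels, reverse=True))


perfect = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]  # same order as the ideal ranking
reversed_order = perfect[::-1]            # exactly the opposite order

perfect_scores = {k: ndcg_at_k(perfect, k) for k in range(3, 7)}
reversed_scores = {k: ndcg_at_k(reversed_order, k) for k in range(3, 7)}

for k in range(3, 7):
    # k, then NDCG@k and NDCG@k / k for the perfect and the reversed ranking
    print(k, perfect_scores[k], perfect_scores[k] / k,
          reversed_scores[k], reversed_scores[k] / k)
```

With the perfect ranking NDCG@k is 1.0 at every k, so only the division by k moves the number. With the reversed ranking NDCG@k rises with k, and the values should line up with the first column of the second result listing.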
