Paddle Deep Learning Quick Start

1. Defining the Dataset

train_dataset = BaseDataset(train_df)
test_dataset = BaseDataset(test_df)

Here, every entry in each column of train_df and test_df is a list.
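The post does not show how BaseDataset is defined, so the following is only a minimal sketch of what a dataset over list-valued columns might look like. The class name ListColumnDataset and the column names hist_item_ids and target_item_id are hypothetical. It is written as a plain class so that it runs without Paddle installed; in real use it would presumably subclass paddle.io.Dataset, which relies on the same __getitem__ / __len__ pair.

```python
import numpy as np
import pandas as pd


class ListColumnDataset:
    """Map-style dataset over a DataFrame whose cells are lists.

    Assumption: a stand-in for the post's BaseDataset, not its actual code.
    """

    def __init__(self, df, columns=("hist_item_ids", "target_item_id")):
        # Column names are hypothetical; substitute the real ones.
        self.columns = list(columns)
        self.df = df.reset_index(drop=True)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # Convert every list-valued cell of this row to an int64 array.
        return tuple(np.asarray(row[c], dtype="int64") for c in self.columns)

    def __len__(self):
        return len(self.df)


# Toy frame: each cell holds a list, as described above.
toy_train_df = pd.DataFrame({
    "hist_item_ids": [[3, 7, 9], [4, 1, 2]],
    "target_item_id": [[5], [8]],
})
toy_train_dataset = ListColumnDataset(toy_train_df)
hist, target = toy_train_dataset[0]
```

Each sample comes back as a tuple of arrays, one per column, which is the shape a batching loader usually expects.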

2. Metric Computation

2.1 Hit Rate

def hitrate(test_df,k=20):
    user_num = test_df['user_id'].nunique()
    test_gd_df = test_df[test_df['ranking']<=k].reset_index(drop=True)
    return test_gd_df['label'].sum() / user_num

Hit Rate@k = number of clicked items ranked within the top k / number of users
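A small worked example may make the ratio concrete. The toy DataFrame below is made up for illustration (three users, one positive item each); the function body repeats the one above so that the block runs on its own.

```python
import pandas as pd


def hitrate(test_df, k=20):
    user_num = test_df['user_id'].nunique()
    test_gd_df = test_df[test_df['ranking'] <= k].reset_index(drop=True)
    return test_gd_df['label'].sum() / user_num


# Hypothetical toy data: one row per (user, candidate item).
toy_df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3],
    'ranking': [1, 2, 1, 30, 1, 2],
    'label':   [1, 0, 0, 1, 0, 1],
})

# k=20: users 1 and 3 have their positive item in the top 20,
# user 2's positive item sits at rank 30, so 2 hits / 3 users.
hr_at_20 = hitrate(toy_df, k=20)

# k=1: only user 1's positive item is ranked first, so 1 hit / 3 users.
hr_at_1 = hitrate(toy_df, k=1)
```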

2.2 NDCG

NDCG stands for Normalized Discounted Cumulative Gain. It is used to evaluate the quality of a ranked result list.

import math
import numpy as np

def ndcg(test_df, k=20):
    '''
    idcg@k is always 1 (true when each user has exactly one positive item)
    dcg@k: 1/log_2(ranking+1) -> log(2)/log(ranking+1)
    '''
    user_num = test_df['user_id'].nunique()
    # keep only the items ranked within the top k
    test_gd_df = test_df[test_df['ranking'] <= k].reset_index(drop=True)
    # keep only the positive (clicked) items
    test_gd_df = test_gd_df[test_gd_df['label'] == 1].reset_index(drop=True)
    test_gd_df['ndcg'] = math.log(2) / np.log(test_gd_df['ranking'] + 1)
    return test_gd_df['ndcg'].sum() / user_num
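To see how the discount works, here is a worked example on made-up toy data (three users, one positive item each, so IDCG@k is 1 for every user). The function is repeated so that the block is self-contained.

```python
import math

import numpy as np
import pandas as pd


def ndcg(test_df, k=20):
    user_num = test_df['user_id'].nunique()
    test_gd_df = test_df[test_df['ranking'] <= k].reset_index(drop=True)
    test_gd_df = test_gd_df[test_gd_df['label'] == 1].reset_index(drop=True)
    test_gd_df['ndcg'] = math.log(2) / np.log(test_gd_df['ranking'] + 1)
    return test_gd_df['ndcg'].sum() / user_num


# Hypothetical toy data: one row per (user, candidate item).
toy_df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3],
    'ranking': [1, 2, 1, 30, 1, 2],
    'label':   [1, 0, 0, 1, 0, 1],
})

# Per-user contributions at k=20:
#   user 1: positive at rank 1  -> log(2)/log(2) = 1
#   user 2: positive at rank 30 -> outside the top 20, contributes 0
#   user 3: positive at rank 2  -> log(2)/log(3)
ndcg_at_20 = ndcg(toy_df, k=20)
expected = (1 + math.log(2) / math.log(3)) / 3
```

A positive item at rank 1 earns full credit, while the same item at rank 2 earns only log(2)/log(3), which is how NDCG rewards placing the hit higher, something Hit Rate ignores.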

Suggestion

user_num = test_df['user_id'].nunique()

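Both metric functions repeat this line inside their bodies. Defining user_num once as a class variable would avoid recomputing it for every metric.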
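One possible way to apply this suggestion is sketched below. The class name RankingMetrics and the helper _top_k are hypothetical, and user_num is stored as an instance attribute rather than a true class variable, on the assumption that it should be tied to one particular test_df.

```python
import math

import numpy as np
import pandas as pd


class RankingMetrics:
    """Hypothetical wrapper that computes user_num once for all metrics."""

    def __init__(self, test_df):
        self.test_df = test_df
        # Computed a single time and reused by every metric below.
        self.user_num = test_df['user_id'].nunique()

    def _top_k(self, k):
        # Rows whose item is ranked within the top k.
        return self.test_df[self.test_df['ranking'] <= k]

    def hitrate(self, k=20):
        return self._top_k(k)['label'].sum() / self.user_num

    def ndcg(self, k=20):
        top_k = self._top_k(k)
        hits = top_k[top_k['label'] == 1]
        gains = math.log(2) / np.log(hits['ranking'] + 1)
        return gains.sum() / self.user_num


# Hypothetical toy data: one row per (user, candidate item).
toy_df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3],
    'ranking': [1, 2, 1, 30, 1, 2],
    'label':   [1, 0, 0, 1, 0, 1],
})

metrics = RankingMetrics(toy_df)
hr = metrics.hitrate(k=20)
nd = metrics.ndcg(k=20)
```

The saving matters most when several values of k are evaluated on the same large test_df, since nunique() is then called once instead of once per metric call.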
