Recommender systems: notes on recall and precision in experiments

recall

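$$\text{Recall} = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |T(u)|}$$

Recommend N items to user u (denote the list R(u)), and let T(u) be the set of items that the user likes in the test set; recall is then computed as above. Recall describes what fraction of the user-item rating records in the test set are included in the final recommendation lists, so the denominator, the sum of |T(u)| over all users, is the total size of the test set.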

precision

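$$\text{Precision} = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |R(u)|}$$

Precision describes what fraction of the final recommendation lists consists of user-item rating records that actually occurred.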
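The two definitions can be sketched in a few lines of plain Python. This is only an illustration: the function name `precision_recall`, the dictionary layout, and the toy data are assumptions made for this example and do not come from any of the papers discussed below.

```python
def precision_recall(recommended, relevant):
    """Compute precision and recall summed over all users.

    recommended: dict mapping user -> list of recommended items, R(u)
    relevant:    dict mapping user -> set of test items the user likes, T(u)
    """
    hits = 0
    n_recommended = 0
    n_relevant = 0
    for user, test_items in relevant.items():
        rec = recommended.get(user, [])
        hits += len(set(rec) & set(test_items))  # |R(u) ∩ T(u)|
        n_recommended += len(rec)                # |R(u)|
        n_relevant += len(test_items)            # |T(u)|
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


# Toy data (hypothetical): two users, three recommendations each.
recommended = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant = {"u1": {"a", "x"}, "u2": {"e", "f", "y"}}

precision, recall = precision_recall(recommended, relevant)
print(precision, recall)
```

Here three of the six recommended items are hits, and the test set holds five items in total, so the two metrics differ only in their denominators.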

Examples from papers are summarized below; this post will be updated over time.

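  • In GRU4REC, when computing recall for the top 20, each session has exactly one target, so the length of T(u) is 1. The metric asks whether the top 20 recommendations (the items corresponding to the 20 highest softmax probabilities) contain that target; if they do, the session counts as 1.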
def get_recall(indices, targets):
    """Calculate the recall score for the given predictions and targets.

    Args:
        indices (B x k): torch.LongTensor. Top-k indices predicted by the model.
        targets (B): torch.LongTensor. Actual target indices.

    Returns:
        recall (float): the recall score
    """
    # Repeat each target across its row so it can be compared with all k predictions.
    targets = targets.view(-1, 1).expand_as(indices)
    # Each row of `hits` is the (row, column) position of one match.
    hits = (targets == indices).nonzero()
    if len(hits) == 0:
        return 0.0
    n_hits = hits.size(0)
    # Divide by the batch size B, since every session has exactly one target.
    recall = float(n_hits) / targets.size(0)
    return recall

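In practice, recall is first computed for each batch, and the batch recalls are then averaged over the epoch.
Alternatively, sum all the hits in an epoch and divide by the total number of test targets in that epoch. The two approaches give the same number only when every batch has the same size.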
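A small sketch of the difference between the two aggregation strategies. The hit flags below are made-up numbers (1 means the target was in the top-k list for that session), and the helper `batch_recall` is a hypothetical name used only here.

```python
def batch_recall(hit_flags):
    """Recall of one batch: hits divided by the number of targets in the batch."""
    return sum(hit_flags) / len(hit_flags)


# Hypothetical epoch with three batches; the last batch is smaller than the others.
batches = [[1, 0, 1, 1], [0, 1, 1, 1], [0, 0]]

# Strategy 1: average the per-batch recalls.
mean_of_batches = sum(batch_recall(b) for b in batches) / len(batches)

# Strategy 2: pool all hits and divide by the total number of test targets.
total_hits = sum(sum(b) for b in batches)
total_targets = sum(len(b) for b in batches)
global_recall = total_hits / total_targets

print(mean_of_batches, global_recall)
```

With a short final batch the averaged value gives that batch the same weight as the full ones, so pooling the hits is the safer choice when batch sizes vary.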

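  • Caser generates a recommendation list of N items for each user and uses the last 20% of each user's data as the test set. Precision and the related metrics are then computed between the predicted list and the test list, exactly as the formulas define them. Only membership matters; the items do not have to appear in the same order.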
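One way to hold out the last 20% of each user's interactions could look like the sketch below. It assumes each user's items are already sorted by time and rounds the test size down; the function `split_last_percent` and the toy sequences are hypothetical and are not taken from the Caser code.

```python
def split_last_percent(sequence, test_percent=20):
    """Split a chronologically ordered sequence into (train, test).

    The last `test_percent` percent of the items go to the test part.
    Integer arithmetic rounds the test size down, so sequences shorter
    than 5 items end up with an empty test part at 20%.
    """
    n_test = len(sequence) * test_percent // 100
    split = len(sequence) - n_test
    return sequence[:split], sequence[split:]


# Hypothetical per-user item sequences, oldest interaction first.
user_sequences = {
    "u1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "u2": [11, 12, 13, 14, 15],
}

splits = {user: split_last_percent(seq) for user, seq in user_sequences.items()}
for user, (train_items, test_items) in splits.items():
    print(user, train_items, test_items)
```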

import numpy as np


def _compute_precision_recall(targets, predictions, k):
    # Keep only the top-k predictions; order inside the top k does not matter.
    pred = predictions[:k]
    num_hit = len(set(pred).intersection(set(targets)))
    precision = float(num_hit) / len(pred)
    recall = float(num_hit) / len(targets)
    return precision, recall


def evaluate_ranking(model, test, train=None, k=10):
    """
    Compute Precision@k, Recall@k scores and average precision (AP).
    One score is given for every user with interactions in the test
    set, representing the AP, Precision@k and Recall@k of all their
    test items.

    Parameters
    ----------
    model: fitted instance of a recommender model
        The model to evaluate.
    test: :class:`spotlight.interactions.Interactions`
        Test interactions.
    train: :class:`spotlight.interactions.Interactions`, optional
        Train interactions. If supplied, rated items in
        interactions will be excluded.
    k: int or array of int,
        The maximum number of predicted items
    """
    test = test.tocsr()
    if train is not None:
        train = train.tocsr()

    if not isinstance(k, list):
        ks = [k]
    else:
        ks = k

    precisions = [list() for _ in range(len(ks))]
    recalls = [list() for _ in range(len(ks))]
    apks = list()

    for user_id, row in enumerate(test):
        # Skip users with no test interactions.
        if not len(row.indices):
            continue

        # Negate the scores so that argsort ranks items from highest to lowest.
        predictions = -model.predict(user_id)
        predictions = predictions.argsort()

        if train is not None:
            rated = set(train[user_id].indices)
        else:
            rated = []

        # Drop items the user already interacted with during training.
        predictions = [p for p in predictions if p not in rated]
        targets = row.indices

        for i, _k in enumerate(ks):
            precision, recall = _compute_precision_recall(targets, predictions, _k)
            precisions[i].append(precision)
            recalls[i].append(recall)

        # One AP value per user. _compute_apk is not shown in this post
        # and has to be defined elsewhere.
        apks.append(_compute_apk(targets, predictions, k=np.inf))

    precisions = [np.array(i) for i in precisions]
    recalls = [np.array(i) for i in recalls]

    if not isinstance(k, list):
        precisions = precisions[0]
        recalls = recalls[0]

    mean_aps = np.mean(apks)
    return precisions, recalls, mean_aps


  • Next Item Recommendation with Self-Attention

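The computation in this paper is similar to GRU4REC's: for HR@50, it checks whether the target appears in the generated top-50 ranking, scoring 1 if it does and 0 if it does not.
MRR measures how well the model orders its ranking. Intuitively, the higher the ground-truth item is ranked, the better. MRR looks at the position of the ground-truth item and takes the reciprocal of that position.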
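Both metrics can be sketched together in plain Python. The helper `hit_and_reciprocal_rank`, the toy sessions, and the choice to score a target outside the top k as 0 for MRR are assumptions made for this illustration, not details confirmed by the paper.

```python
def hit_and_reciprocal_rank(ranked_items, target, k):
    """Return (hit, reciprocal_rank) for one session.

    hit is 1 if the target is among the top-k ranked items, else 0.
    reciprocal_rank is 1 / position (1-based) of the target in the top k,
    or 0.0 when the target is not in the top k (an assumption of this sketch).
    """
    top_k = ranked_items[:k]
    if target in top_k:
        rank = top_k.index(target) + 1
        return 1, 1.0 / rank
    return 0, 0.0


# Hypothetical sessions: (ranked list from the model, ground-truth next item).
sessions = [
    (["a", "b", "c"], "a"),  # target at rank 1
    (["a", "b", "c"], "c"),  # target at rank 3
    (["a", "b", "c"], "z"),  # target not in the list
]

k = 3
results = [hit_and_reciprocal_rank(ranked, target, k) for ranked, target in sessions]
hr_at_k = sum(hit for hit, _ in results) / len(results)
mrr_at_k = sum(rr for _, rr in results) / len(results)
print(hr_at_k, mrr_at_k)
```

The second session counts as a full hit for HR@3 but contributes only 1/3 to MRR@3, which is why MRR rewards placing the ground-truth item near the top.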
