Recommender Systems: Neighborhood-Based Algorithms

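I have recently been reading Xiang Liang's 《推荐系统实践》 (Recommender System Practice). The book contains only code fragments, not complete programs, so I reconstructed part of the code, starting from the book's fragments and following its descriptions.
UserCF (user-based collaborative filtering):
Let $N(u)$ denote the set of items on which user u has given positive feedback, and $N(v)$ the corresponding set for user v. The Jaccard similarity is then: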

$$ w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} $$

The cosine similarity is:
$$ w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}} $$
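To make the two similarity measures concrete, here is a minimal sketch that evaluates both on two small item sets. The function names `jaccard` and `cosine` and the toy sets are assumptions made for illustration; they do not come from the book.

```python
import math

def jaccard(nu, nv):
    """Jaccard similarity between two sets of positively rated items."""
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def cosine(nu, nv):
    """Cosine similarity for implicit (0/1) feedback, computed on item sets."""
    if not nu or not nv:
        return 0.0
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

# Hypothetical toy data: items liked by users u and v.
N_u = {"a", "b", "d"}
N_v = {"a", "c"}

print(jaccard(N_u, N_v))  # 1 shared item out of 4 distinct items
print(cosine(N_u, N_v))   # 1 / sqrt(3 * 2)
```

Both measures count the same overlap; they differ only in how the overlap is normalised.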

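Once the interest similarity between users is known, UserCF recommends to a user the items liked by the K users whose interests are most similar to theirs. The following formula measures how interested user u is in item i under UserCF:
$$ p(u,i) = \sum_{v \in S(u,K) \cap N(i)} w_{uv} \, r_{vi} $$

where $S(u,K)$ is the set of the K users whose interests are closest to those of user u, and $N(i)$ is the set of users who have interacted with item i. With implicit feedback, $r_{vi} = 1$.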
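As a worked example of this formula, the sketch below transcribes it literally for a single (user, item) pair. The helper name `usercf_score`, the toy `train` dict and the hand-written similarity values in `W` are assumptions for illustration only.

```python
def usercf_score(u, i, train, W, K=3):
    """p(u, i): sum of w_uv * r_vi over v in S(u, K) ∩ N(i)."""
    # S(u, K): the K users most similar to u
    neighbours = sorted(W[u].items(), key=lambda kv: kv[1], reverse=True)[:K]
    # keep only neighbours who interacted with i, i.e. v in N(i)
    return sum(wuv * train[v][i] for v, wuv in neighbours if i in train[v])

# Hypothetical toy data: train[user][item] = r (always 1 for implicit feedback)
train = {
    "A": {"a": 1, "b": 1},
    "B": {"a": 1, "c": 1},
    "C": {"b": 1, "c": 1, "d": 1},
}
# Hand-specified similarities for user A (assumed values, for illustration only)
W = {"A": {"B": 0.5, "C": 0.4}}

print(usercf_score("A", "c", train, W, K=2))  # B and C both liked c
print(usercf_score("A", "d", train, W, K=2))  # only C liked d
print(usercf_score("A", "c", train, W, K=1))  # only B is in S(A, 1)
```

Shrinking K removes the weaker neighbour C from $S(u,K)$, so the score for item c drops to the contribution of B alone.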
The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 12:46:42 2017

@author: lanlandetian
"""
import math
import operator


'''
# Naive O(|U|^2) version, kept for reference (disabled).
# W is the similarity matrix
def UserSimilarity(train):
    W = dict()
    for u in train.keys():
        W[u] = dict()
        for v in train.keys():
            if u == v:
                continue
            common = set(train[u]) & set(train[v])
            W[u][v] = len(common) / math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W
'''



def UserSimilarity(train):
    # build inverse table for item_users
    item_users = dict()
    for u,items in train.items():
        for i in items.keys():
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    #calculate co-rated items between users
    C = dict()
    N = dict()
    for i,users in item_users.items():
        for u in users:
            N.setdefault(u,0)
            N[u] += 1
            C.setdefault(u,{})
            for v in users:
                if u == v:
                    continue
                C[u].setdefault(v,0)
                C[u][v] += 1

    #calculate final similarity matrix W
    #build W as a new dict so the co-occurrence counts in C stay untouched
    W = dict()
    for u, related_users in C.items():
        W[u] = dict()
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W


def Recommend(user,train,W,K = 3):
    rank = dict()
    interacted_items = train[user]
    for v, wuv in sorted(W[user].items(), key = operator.itemgetter(1), \
                         reverse = True)[0:K]:
        for i, rvi in train[v].items():
            #we should filter items user interacted before 
            if i in interacted_items:
                continue
            rank.setdefault(i,0)
            rank[i] += wuv * rvi
    return rank

def Recommendation(users, train, W, K = 3):
    result = dict()
    for user in users:
        rank = Recommend(user,train,W,K)
        R = sorted(rank.items(), key = operator.itemgetter(1), \
                   reverse = True)
        result[user] = R
    return result                 

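Improved user similarity (UserCF_IIF):
When two users have both interacted with an unpopular item, that says more about how similar their interests are than shared behaviour on a popular item. The improved user similarity is therefore:

$$ w_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log(1 + |N(i)|)}}{\sqrt{|N(u)|\,|N(v)|}} $$

In this formula, the factor $\frac{1}{\log(1 + |N(i)|)}$ penalizes the influence of popular items on the similarity.
The code is similar to that of UserCF.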
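Since the listing for this variant is not shown, here is a minimal sketch of how the penalty could be added to the inverted-table approach used for UserCF. The function name `user_similarity_iif` and the toy data are assumptions; this is one possible reading of the formula, not the book's code.

```python
import math

def user_similarity_iif(train):
    """User similarity with a 1 / log(1 + |N(i)|) penalty on popular items."""
    # inverted table: item -> set of users who interacted with it
    item_users = {}
    for u, items in train.items():
        for i in items:
            item_users.setdefault(i, set()).add(u)

    C = {}  # C[u][v]: sum over shared items of 1 / log(1 + |N(i)|)
    N = {}  # N[u]: number of items user u interacted with
    for i, users in item_users.items():
        penalty = 1.0 / math.log(1 + len(users))
        for u in users:
            N[u] = N.get(u, 0) + 1
            C.setdefault(u, {})
            for v in users:
                if u != v:
                    C[u][v] = C[u].get(v, 0.0) + penalty

    # normalise by sqrt(|N(u)| * |N(v)|) into a fresh dict
    return {u: {v: cuv / math.sqrt(N[u] * N[v]) for v, cuv in related.items()}
            for u, related in C.items()}

# Hypothetical toy data: item "a" is popular, item "b" is rarer.
train = {
    "A": {"a": 1, "b": 1},
    "B": {"a": 1, "c": 1},
    "C": {"a": 1, "b": 1},
}
W = user_similarity_iif(train)
print(W["A"]["B"])  # A and B share only the popular item a
print(W["A"]["C"])  # A and C also share the rarer item b
```

The only change relative to plain cosine similarity is that each shared item adds `penalty` instead of 1 to the co-occurrence count.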

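ItemCF (item-based collaborative filtering):
Let $N(i)$ denote the set of users who have interacted with item i. The similarity between items i and j is then

$$ w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}} $$

Once the item similarities are known, ItemCF computes the interest of user u in item i with the following formula:
$$ p(u,i) = \sum_{j \in N(u) \cap S(i,K)} w_{ij} \, r_{uj} $$

where $S(i,K)$ is the set of the K items most similar to item i, $N(u)$ is the set of items user u has interacted with, and $r_{uj}$ is the interest of user u in item j.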
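The formula can be transcribed literally for one (user, item) pair, as in the sketch below. The helper name `itemcf_score` and the similarity values in `W` are assumptions chosen for illustration.

```python
def itemcf_score(u, i, train, W, K=3):
    """p(u, i): sum of w_ij * r_uj over j in N(u) ∩ S(i, K)."""
    # S(i, K): the K items most similar to item i
    similar = sorted(W[i].items(), key=lambda kv: kv[1], reverse=True)[:K]
    # keep only items the user has interacted with, i.e. j in N(u)
    return sum(wij * train[u][j] for j, wij in similar if j in train[u])

# Hypothetical toy data: user A interacted with items a and b.
train = {"A": {"a": 1, "b": 1}}
# Hand-specified similarities for candidate item c (assumed values)
W = {"c": {"a": 0.6, "b": 0.3, "d": 0.5}}

print(itemcf_score("A", "c", train, W, K=2))  # S(c, 2) = {a, d}; only a is in N(A)
print(itemcf_score("A", "c", train, W, K=3))  # a and b both contribute
```

Note that this sketch starts from the candidate item and looks up its K nearest items, whereas the `Recommend` function in the listing starts from each item the user already has and walks to that item's K nearest neighbours, so the two can select different neighbours.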

The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 13:09:26 2017

@author: lanlandetian
"""

import math
import operator


def ItemSimilarity(train):
    #calculate co-rated users between items
    #train is the user-item table: user -> {item: rating}
    C = dict()
    N = dict()
    for u,items in train.items():
        for i in items:
            N.setdefault(i,0)
            N[i] += 1
            C.setdefault(i,{})
            for j in items:
                if i == j:
                    continue
                C[i].setdefault(j,0)
                C[i][j] += 1

    #calculate final similarity matrix W
    #build W as a new dict so the co-occurrence counts in C stay untouched
    W = dict()
    for i,related_items in C.items():
        W[i] = dict()
        for j,cij in related_items.items():
            W[i][j] = cij / math.sqrt(N[i] * N[j])
    return W


def Recommend(user_id,train, W,K = 3):
    rank = dict()
    ru = train[user_id]
    for i,pi in ru.items():
        for j,wij in sorted(W[i].items(), \
                           key = operator.itemgetter(1), reverse = True)[0:K]:
            if j in ru:
                continue
            rank.setdefault(j,0)
            rank[j] += pi * wij
    return rank


#class Node:
#    def __init__(self):
#        self.weight = 0
#        self.reason = dict()
#    
#def Recommend(user_id,train, W,K =3):
#    rank = dict()
#    ru = train[user_id]
#    for i,pi in ru.items():
#        for j,wij in sorted(W[i].items(), \
#                           key = operator.itemgetter(1), reverse = True)[0:K]:
#            if j in ru:
#                continue
#            if j not in rank:
#                rank[j] = Node()
#            rank[j].reason.setdefault(i,0)
#            rank[j].weight += pi * wij
#            rank[j].reason[i] = pi * wij
#    return rank

def Recommendation(users, train, W, K = 3):
    result = dict()
    for user in users:
        rank = Recommend(user,train,W,K)
        R = sorted(rank.items(), key = operator.itemgetter(1), \
                   reverse = True)
        result[user] = R
    return result

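Improved item similarity (ItemCF_IUF):
Active users should contribute less to item similarity than inactive users, so an IUF (Inverse User Frequency) term is added to correct the item similarity formula:

$$ w_{ij} = \frac{\sum_{u \in N(i) \cap N(j)} \frac{1}{\log(1 + |N(u)|)}}{\sqrt{|N(i)|\,|N(j)|}} $$

Here $N(u)$ is the set of items user u has interacted with. The code is similar to that of ItemCF.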
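As with the user-side variant, the listing is not shown, so the following is a minimal sketch of how the IUF weight could be folded into the co-occurrence count. The function name `item_similarity_iuf` and the toy data are assumptions, not the book's code.

```python
import math

def item_similarity_iuf(train):
    """Item similarity with a 1 / log(1 + |N(u)|) penalty on active users."""
    C = {}  # C[i][j]: sum over shared users of 1 / log(1 + |N(u)|)
    N = {}  # N[i]: number of users who interacted with item i
    for u, items in train.items():
        if not items:
            continue  # a user with no items contributes nothing
        penalty = 1.0 / math.log(1 + len(items))
        for i in items:
            N[i] = N.get(i, 0) + 1
            C.setdefault(i, {})
            for j in items:
                if i != j:
                    C[i][j] = C[i].get(j, 0.0) + penalty

    # normalise by sqrt(|N(i)| * |N(j)|) into a fresh dict
    return {i: {j: cij / math.sqrt(N[i] * N[j]) for j, cij in related.items()}
            for i, related in C.items()}

# Hypothetical toy data: user B is more active than user A.
train = {
    "A": {"a": 1, "b": 1},
    "B": {"a": 1, "b": 1, "c": 1},
}
W = item_similarity_iuf(train)
print(W["a"]["b"])  # both users link a and b, B with a smaller weight
print(W["a"]["c"])  # only the more active user B links a and c
```

Each user adds the same `penalty` to every item pair in their history, so a user with a long history spreads a smaller weight over many more pairs.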

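In addition, the book represents the dataset as a dict, so I implemented the whole pipeline on GitHub, including data loading and the final cross-validation.
The GitHub repository is: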
https://github.com/1092798448/RecSys.git
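For readers who want to try the listings without the repository, here is a rough sketch of the dict layout they expect (`train[user][item] = rating`) together with a simple M-fold split. The function name `split_data`, its parameters and the input format (a list of (user, item) pairs) are assumptions for illustration; the repository's own code may be organised differently.

```python
import random

def split_data(pairs, M, k, seed):
    """Split (user, item) pairs into train/test dicts; fold k of M is the test set."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, item in pairs:
        # each pair draws a fold number in [0, M - 1]; fold k goes to the test set
        target = test if rng.randint(0, M - 1) == k else train
        target.setdefault(user, {})[item] = 1  # implicit feedback: r = 1
    return train, test

# Hypothetical interaction log
pairs = [("A", "a"), ("A", "b"), ("B", "a"), ("B", "c"), ("C", "b"), ("C", "d")]

M = 3
for k in range(M):
    # the same seed on every round gives each pair the same fold number,
    # so every pair lands in the test set exactly once across the M rounds
    train, test = split_data(pairs, M, k, seed=42)
    print(k, train, test)
```

Running the similarity and recommendation functions on `train` for each k and averaging the metrics measured on `test` gives an M-fold cross-validation.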
