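I have recently been reading Xiang Liang's 《推荐系统实践》 (Recommender System Practice). The book contains only code fragments rather than complete programs, so I reconstructed part of the code, building on the book's fragments and following its descriptions.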
UserCF algorithm (user-based collaborative filtering):
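Let N(u) denote the set of items on which user u has given positive feedback, and let N(v) denote the corresponding set for user v. The Jaccard similarity is then:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$

The code that follows uses the cosine similarity instead:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$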
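As a quick illustration of these set-based similarities, the following minimal sketch computes the Jaccard similarity and, for comparison, the cosine similarity of two item sets. The function names `jaccard_similarity` and `cosine_similarity` and the sample sets are illustrative assumptions, not code from the book.

```python
import math


def jaccard_similarity(items_u, items_v):
    """Jaccard similarity between two sets of positively rated items."""
    union = items_u | items_v
    if not union:
        # Assumption: two users without any feedback get similarity 0.
        return 0.0
    return len(items_u & items_v) / len(union)


def cosine_similarity(items_u, items_v):
    """Cosine similarity between two item sets (binary feedback)."""
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / math.sqrt(len(items_u) * len(items_v))


n_u = {"a", "b", "d"}
n_v = {"a", "c"}
print(jaccard_similarity(n_u, n_v))  # 1 shared item out of 4 distinct -> 0.25
print(cosine_similarity(n_u, n_v))   # 1 / sqrt(3 * 2)
```

Both measures share the same numerator and differ only in how they normalize by the sizes of the two sets.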
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 12:46:42 2017
@author: lanlandetian
"""
import math
import operator
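'''
# Naive version that compares every pair of users; kept for reference only.
# W is the similarity matrix
def UserSimilarity(train):
    W = dict()
    for u in train.keys():
        W[u] = dict()
        for v in train.keys():
            if u == v:
                continue
            # train[u] is a dict, so intersect the sets of its keys
            W[u][v] = len(set(train[u]) & set(train[v]))
            W[u][v] /= math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W
'''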
def UserSimilarity(train):
    # build the inverted table: item -> set of users who interacted with it
    item_users = dict()
    for u, items in train.items():
        for i in items.keys():
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    # count co-rated items between users
    C = dict()  # C[u][v]: number of items both u and v interacted with
    N = dict()  # N[u]: number of items u interacted with
    for i, users in item_users.items():
        for u in users:
            N.setdefault(u, 0)
            N[u] += 1
            C.setdefault(u, {})
            for v in users:
                if u == v:
                    continue
                C[u].setdefault(v, 0)
                C[u][v] += 1

    # calculate the final similarity matrix W
    # (built as new dicts: C.copy() is shallow and would overwrite the counts in C)
    W = dict()
    for u, related_users in C.items():
        W[u] = dict()
        for v, cuv in related_users.items():
            W[u][v] = cuv / math.sqrt(N[u] * N[v])
    return W
def Recommend(user, train, W, K=3):
    rank = dict()
    interacted_items = train[user]
    # take the K users most similar to `user`
    for v, wuv in sorted(W[user].items(), key=operator.itemgetter(1),
                         reverse=True)[0:K]:
        for i, rvi in train[v].items():
            # filter out items the user has already interacted with
            if i in interacted_items:
                continue
            rank.setdefault(i, 0)
            rank[i] += wuv * rvi
    return rank
def Recommendation(users, train, W, K=3):
    result = dict()
    for user in users:
        rank = Recommend(user, train, W, K)
        R = sorted(rank.items(), key=operator.itemgetter(1),
                   reverse=True)
        result[user] = R
    return result
Improved user similarity (UserCF_IIF algorithm):
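Two users acting on the same unpopular item says more about how similar their interests are than two users acting on the same popular item. The improved user similarity therefore penalizes popular items, where N(i) is the set of users who interacted with item i:

$$w_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log(1 + |N(i)|)}}{\sqrt{|N(u)|\,|N(v)|}}$$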
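The popularity penalty can be sketched by adapting the inverted-table approach of `UserSimilarity`. This is a minimal sketch, assuming each shared item i contributes 1 / log(1 + |N(i)|) instead of 1; the function name `user_similarity_iif` is an illustrative choice, and `train` is assumed to be a `{user: {item: rating}}` dict as in the code above.

```python
import math


def user_similarity_iif(train):
    """Return {u: {v: similarity}} with popular items down-weighted."""
    # Build the inverted table: item -> set of users who interacted with it.
    item_users = {}
    for u, items in train.items():
        for i in items:
            item_users.setdefault(i, set()).add(u)

    C = {}  # C[u][v]: popularity-penalized co-occurrence weight
    N = {}  # N[u]: number of items user u interacted with
    for i, users in item_users.items():
        # The more users an item has, the less it contributes to each pair.
        weight = 1.0 / math.log(1 + len(users))
        for u in users:
            N[u] = N.get(u, 0) + 1
            C.setdefault(u, {})
            for v in users:
                if u == v:
                    continue
                C[u][v] = C[u].get(v, 0.0) + weight

    # Normalize by the activity of both users.
    W = {}
    for u, related in C.items():
        W[u] = {v: cuv / math.sqrt(N[u] * N[v]) for v, cuv in related.items()}
    return W


train = {
    "A": {"x": 1, "y": 1},
    "B": {"x": 1, "z": 1},
    "C": {"x": 1, "y": 1},
}
W = user_similarity_iif(train)
print(W["A"])  # C shares two items with A, B only the popular item x
```

The returned structure matches that of `UserSimilarity`, so it should be usable with the same `Recommend` function.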
ItemCF algorithm (item-based collaborative filtering):
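Let N(i) denote the set of users who have interacted with item i. The similarity between item i and item j is then:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$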
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sun Dec 31 13:09:26 2017
@author: lanlandetian
"""
import math
import operator
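def ItemSimilarity(train):
    # count co-rated users between items by scanning the user-item table
    C = dict()  # C[i][j]: number of users who interacted with both i and j
    N = dict()  # N[i]: number of users who interacted with i
    for u, items in train.items():
        for i in items:
            N.setdefault(i, 0)
            N[i] += 1
            C.setdefault(i, {})
            for j in items:
                if i == j:
                    continue
                C[i].setdefault(j, 0)
                C[i][j] += 1

    # calculate the final similarity matrix W
    # (built as new dicts: C.copy() is shallow and would overwrite the counts in C)
    W = dict()
    for i, related_items in C.items():
        W[i] = dict()
        for j, cij in related_items.items():
            W[i][j] = cij / math.sqrt(N[i] * N[j])
    return W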
def Recommend(user_id, train, W, K=3):
    rank = dict()
    ru = train[user_id]
    for i, pi in ru.items():
        # take the K items most similar to item i
        for j, wij in sorted(W[i].items(),
                             key=operator.itemgetter(1), reverse=True)[0:K]:
            # skip items the user has already interacted with
            if j in ru:
                continue
            rank.setdefault(j, 0)
            rank[j] += pi * wij
    return rank
#class Node:
#    def __init__(self):
#        self.weight = 0
#        self.reason = dict()
#
#def Recommend(user_id,train, W,K =3):
#    rank = dict()
#    ru = train[user_id]
#    for i,pi in ru.items():
#        for j,wij in sorted(W[i].items(), \
#                            key = operator.itemgetter(1), reverse = True)[0:K]:
#            if j in ru:
#                continue
#            if j not in rank:
#                rank[j] = Node()
#            rank[j].reason.setdefault(i,0)
#            rank[j].weight += pi * wij
#            rank[j].reason[i] = pi * wij
#    return rank
def Recommendation(users, train, W, K=3):
    result = dict()
    for user in users:
        rank = Recommend(user, train, W, K)
        R = sorted(rank.items(), key=operator.itemgetter(1),
                   reverse=True)
        result[user] = R
    return result
Improved item similarity (ItemCF_IUF):
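Active users should contribute less to item similarity than inactive users, so an IUF (Inverse User Frequency) term is added to correct the item similarity formula, where N(u) is the set of items user u has interacted with:

$$w_{ij} = \frac{\sum_{u \in N(i) \cap N(j)} \frac{1}{\log(1 + |N(u)|)}}{\sqrt{|N(i)|\,|N(j)|}}$$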
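The IUF correction can be sketched as a small variation of `ItemSimilarity`. This is a minimal sketch, assuming each user u contributes 1 / log(1 + |N(u)|) to every item pair in their history instead of 1; the function name `item_similarity_iuf` is an illustrative choice, and `train` is assumed to be a `{user: {item: rating}}` dict as in the code above.

```python
import math


def item_similarity_iuf(train):
    """Return {i: {j: similarity}} with active users down-weighted."""
    C = {}  # C[i][j]: activity-penalized co-occurrence weight
    N = {}  # N[i]: number of users who interacted with item i
    for u, items in train.items():
        if not items:
            continue  # a user without items contributes nothing (avoids log(1) = 0)
        # The more items a user has touched, the less each pair counts.
        weight = 1.0 / math.log(1 + len(items))
        for i in items:
            N[i] = N.get(i, 0) + 1
            C.setdefault(i, {})
            for j in items:
                if i == j:
                    continue
                C[i][j] = C[i].get(j, 0.0) + weight

    # Normalize by the popularity of both items.
    W = {}
    for i, related in C.items():
        W[i] = {j: cij / math.sqrt(N[i] * N[j]) for j, cij in related.items()}
    return W


train = {
    "A": {"x": 1, "y": 1},
    "B": {"x": 1, "y": 1, "z": 1},
}
W = item_similarity_iuf(train)
print(W["x"])  # y co-occurs with x for both users, z only for the more active user B
```

The returned structure matches that of `ItemSimilarity`, so it should be usable with the same ItemCF `Recommend` function.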
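In addition, the book represents the dataset as a dict. Following that convention, I implemented the whole workflow on GitHub, including data loading and the final cross-validation.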
The GitHub repository is at:
https://github.com/1092798448/RecSys.git
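To make the dict-based dataset and the cross-validation step more concrete, here is a hypothetical sketch; it is not the code from the repository. It assumes the raw data is a list of `(user, item, rating)` records and that folds are assigned at random; the names `split_data`, `records`, `M`, `k`, and `seed` are illustrative.

```python
import random


def split_data(records, M, k, seed=0):
    """Split (user, item, rating) records into train/test nested dicts.

    Each record is assigned a random bucket in [0, M); bucket k becomes the
    test set. Calling this with k = 0..M-1 and the same seed yields M folds.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for user, item, rating in records:
        target = test if rng.randrange(M) == k else train
        # Nested dict layout: {user: {item: rating}}
        target.setdefault(user, {})[item] = rating
    return train, test


records = [("A", "x", 1), ("A", "y", 1), ("B", "x", 1), ("B", "z", 1)]
for k in range(2):
    train, test = split_data(records, M=2, k=k, seed=42)
    print(k, train, test)
```

Because the seed is fixed, every record falls into the test set of exactly one fold, which is what allows the M evaluation results to be averaged.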