Recommender System Notes: Recommending 10 Movies with a Nearest-Neighbor Algorithm

Environment: surprise (link to surprise) + Python 2.7

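The approach is outlined in the figure below:

[Figure 1: implementation flow for recommending 10 movies with a nearest-neighbor algorithm]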

The code is as follows:

# coding: utf-8  (source file encoding)
'''
Using the MovieLens data, find the 10 nearest neighbors of a given movie title.
'''
# Import libraries
import os
import io

from surprise import Dataset
from surprise import KNNBaseline

# Load the built-in MovieLens 100k dataset
data = Dataset.load_builtin('ml-100k')

# Build a training set from all ratings (stored sparsely)
train_set = data.build_full_trainset()

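# Similarity options: Pearson-baseline similarity, item-based (user_based=False)
sim_options = {'name': 'pearson_baseline', 'user_based': False}
# Create the KNNBaseline instance
algo = KNNBaseline(sim_options=sim_options)
# Fit the model on the training set
# (fit() requires surprise >= 1.0.5; older releases use train())
algo.fit(train_set)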

# Build the raw id <-> movie title dictionaries
def read_item_names():
    file_name = (os.path.expanduser('~') +
                 '/.surprise_data/ml-100k/ml-100k/u.item')
    # Map raw id -> title and title -> raw id
    rid_to_name = {}
    name_to_rid = {}
    with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
        for line in f:
            fields = line.split('|')
            rid_to_name[fields[0]] = fields[1]
            name_to_rid[fields[1]] = fields[0]
    return rid_to_name, name_to_rid

film_name = 'Toy Story (1995)'
# Convert the title to a raw id, then to the inner id the algorithm expects
rid_to_name, name_to_rid = read_item_names()
film_rid = name_to_rid[film_name]
film_iid = algo.trainset.to_inner_iid(film_rid)
# k sets the number of neighbors; get_neighbors returns their inner ids
film_neighbors = algo.get_neighbors(film_iid, k=10)

# Convert the neighbors' inner ids back to movie titles
rid_list = (algo.trainset.to_raw_iid(inner_id) for inner_id in film_neighbors)
name_list = (rid_to_name[rid] for rid in rid_list)
print('The 10 nearest neighbors of ' + film_name + ' are: ')
for name in name_list:
    print(name)


Output:

The 10 nearest neighbors of Toy Story (1995) are: 
Beauty and the Beast (1991)
Raiders of the Lost Ark (1981)
That Thing You Do! (1996)
Lion King, The (1994)
Craft, The (1996)
Liar Liar (1997)
Aladdin (1992)
Cool Hand Luke (1967)
Winnie the Pooh and the Blustery Day (1968)
Indiana Jones and the Last Crusade (1989)
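
Conceptually, a neighbor lookup like the one that produced this list ranks every other item by its similarity to the query item and keeps the top k. The following is a minimal sketch of that idea, not surprise's actual implementation: the tiny similarity matrix `sim` and the helper `top_k_neighbors` are hypothetical names made up for illustration.

```python
import heapq


def top_k_neighbors(sim, iid, k):
    """Return the inner ids of the k items most similar to item iid.

    sim is a square item-item similarity matrix (a list of lists);
    the query item itself is excluded from the candidates.
    """
    candidates = [(other, sim[iid][other])
                  for other in range(len(sim)) if other != iid]
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return [other for other, _ in best]


# Hypothetical 4x4 similarity matrix over inner ids 0..3
sim = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.9],
    [0.5, 0.2, 0.9, 1.0],
]
neighbors = top_k_neighbors(sim, 0, 2)
print(neighbors)  # [1, 3]
```

Because the result is a list of inner ids, it still has to be mapped back to raw ids and then to titles, which is what the last part of the script does.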

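Additional notes:

rid is a movie's raw id, as it appears in the dataset files; iid is its inner id, the index that surprise assigns to each movie in the training set and uses to address the similarity matrix.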
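
To make the distinction concrete, here is a small sketch of how a raw-id to inner-id mapping could be built. It only mimics the idea behind `to_inner_iid` and `to_raw_iid`; the class `IdMap` and the sample ids are hypothetical, and surprise's own bookkeeping may differ in detail.

```python
class IdMap(object):
    """Assign consecutive inner ids (0, 1, 2, ...) to raw ids in first-seen order."""

    def __init__(self):
        self.raw_to_inner = {}   # raw id -> inner id
        self.inner_to_raw = []   # inner id -> raw id (list index is the inner id)

    def add(self, raw_id):
        # Reuse the existing inner id if this raw id was seen before.
        if raw_id not in self.raw_to_inner:
            self.raw_to_inner[raw_id] = len(self.inner_to_raw)
            self.inner_to_raw.append(raw_id)
        return self.raw_to_inner[raw_id]

    def to_inner(self, raw_id):
        return self.raw_to_inner[raw_id]

    def to_raw(self, inner_id):
        return self.inner_to_raw[inner_id]


# Hypothetical raw ids, kept as strings the way they are read from a file
id_map = IdMap()
for raw_id in ['242', '302', '1', '242']:
    id_map.add(raw_id)

print(id_map.to_inner('1'))  # 2
print(id_map.to_raw(0))      # 242
```

The inner ids are dense and start at 0, so they can index a matrix directly, whereas raw ids are arbitrary labels taken from the data.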

Link to the full code
