Recommender Systems: Testing the surprise Library

1: Loading the dataset

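from surprise import Dataset, Reader

def load_format2trainset():
    file_path = "F:\\ML\\recommendation_data\\music_playlist_farmat.txt"
    # Describe the file format: one rating per line, fields separated by commas
    reader = Reader(line_format='user item rating timestamp', sep=',')
    # Read the ratings from the file
    music_data = Dataset.load_from_file(file_path, reader=reader)
    print("Building the trainset...")
    # Use the whole dataset for training (no train/test split)
    retrainset = music_data.build_full_trainset()
    return retrainset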
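To make the role of line_format concrete, here is a simplified, stdlib-only sketch of how such a format string maps the fields of one line to (user, item, rating, timestamp). `parse_line` is a hypothetical helper written for illustration, not surprise's own code; it follows the convention that raw ids read from a file stay strings while the rating becomes a float.

```python
def parse_line(line, line_format="user item rating timestamp", sep=","):
    """Split one line into (user, item, rating, timestamp) following line_format."""
    names = line_format.split()
    values = [v.strip() for v in line.strip().split(sep)]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}: {line!r}")
    fields = dict(zip(names, values))
    # Raw ids stay strings; only the rating is converted to a number.
    # The timestamp is None when line_format has no timestamp field.
    return (fields["user"], fields["item"], float(fields["rating"]),
            fields.get("timestamp"))


# Hypothetical lines of the playlist file: playlist id, song id, rating, timestamp
for line in ["playlist_1,song_42,1,1500000000", "playlist_1,song_7,1,1500000100"]:
    print(parse_line(line))
```

Because the field order comes from line_format, the same helper handles files whose columns are arranged differently, e.g. `line_format="item user rating"` with `sep=" "`.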

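The main classes used are:

Reader: parses a file that contains ratings.

Dataset: provides the dataset operations. Its main loading methods are:

    load_builtin('dataset name')  # load a built-in dataset, e.g. 'ml-100k'
    load_from_df()                # load data from a pandas DataFrame
    load_from_file()              # load your own ratings file
    load_from_folds()             # load several files that define predefined folds, for example: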

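import os
from surprise import Dataset, Reader

# Assumed location of the MovieLens 100k files (u1.base ... u5.test); adjust to your setup
files_dir = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/')
reader = Reader('ml-100k')  # built-in reader format for the MovieLens 100k files

# folds_files is a list of tuples containing file paths:
# [(u1.base, u1.test), (u2.base, u2.test), ... (u5.base, u5.test)]
train_file = files_dir + 'u%d.base'
test_file = files_dir + 'u%d.test'
folds_files = [(train_file % i, test_file % i) for i in (1, 2, 3, 4, 5)]
data = Dataset.load_from_folds(folds_files, reader=reader)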
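Operations on the dataset include:
build_full_trainset()            # no split: return the whole dataset as a single trainset
split(n_folds=5, shuffle=True)   # split the dataset into n_folds folds for cross-validation (older versions; newer versions use surprise.model_selection, e.g. KFold)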
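The idea behind splitting into folds can be sketched in plain Python: shuffle the ratings once, cut them into n_folds nearly equal parts, and let each part serve exactly once as the test set while the rest form the trainset. `k_fold_split` is a hypothetical helper for illustration, not surprise's implementation.

```python
import random


def k_fold_split(ratings, n_folds=5, shuffle=True, seed=None):
    """Yield (trainset, testset) pairs; every rating lands in exactly one testset."""
    ratings = list(ratings)
    if shuffle:
        random.Random(seed).shuffle(ratings)
    fold_size, remainder = divmod(len(ratings), n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds get one extra rating, so sizes differ by at most 1.
        stop = start + fold_size + (1 if fold < remainder else 0)
        testset = ratings[start:stop]
        trainset = ratings[:start] + ratings[stop:]
        yield trainset, testset
        start = stop


# Hypothetical (user, item, rating) triples
ratings = [("u%d" % (i % 3), "i%d" % i, float(i % 5 + 1)) for i in range(12)]
for train, test in k_fold_split(ratings, n_folds=5, seed=0):
    print(len(train), len(test))
```

Averaging an error metric over the n_folds test sets gives a less noisy estimate than a single train/test split.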

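2: Choosing an algorithm. The surprise library mainly offers two families of algorithms: neighborhood-based (KNN) collaborative filtering and matrix factorization, alongside a few others listed below.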

random_pred.NormalPredictor: Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.
baseline_only.BaselineOnly: Algorithm predicting the baseline estimate for given user and item.
knns.KNNBasic: A basic collaborative filtering algorithm.
knns.KNNWithMeans: A basic collaborative filtering algorithm, taking into account the mean ratings of each user.
knns.KNNWithZScore: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.
knns.KNNBaseline: A basic collaborative filtering algorithm taking into account a baseline rating.
matrix_factorization.SVD: The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.
matrix_factorization.SVDpp: The SVD++ algorithm, an extension of SVD taking into account implicit ratings.
matrix_factorization.NMF: A collaborative filtering algorithm based on Non-negative Matrix Factorization.
slope_one.SlopeOne: A simple yet accurate collaborative filtering algorithm.
co_clustering.CoClustering: A collaborative filtering algorithm based on co-clustering.
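NormalPredictor, the simplest entry in the list, is a useful sanity-check baseline: it fits a normal distribution to the training ratings (maximum-likelihood mean and standard deviation) and answers every query with a random draw from it, ignoring who the user or item is. The class below is a hypothetical stdlib sketch of that idea, not surprise's code; surprise's predict() would additionally clip the draw into the rating scale.

```python
import math
import random


class NormalPredictorSketch:
    """Fits a normal distribution to the training ratings and samples from it."""

    def fit(self, ratings):
        values = [r for (_, _, r) in ratings]
        n = len(values)
        # Maximum-likelihood estimates: mean and (biased, divide-by-n) standard deviation.
        self.mu = sum(values) / n
        self.sigma = math.sqrt(sum((r - self.mu) ** 2 for r in values) / n)
        return self

    def estimate(self, rng=random):
        # Each prediction is one random draw; the user and item play no role.
        return rng.gauss(self.mu, self.sigma)


# Hypothetical (user, item, rating) triples
train = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0), ("u2", "i3", 1.0)]
algo = NormalPredictorSketch().fit(train)
print(algo.mu, algo.sigma)
print(algo.estimate(random.Random(42)))
```

Any algorithm worth using should beat this random baseline on RMSE or MAE.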

3: Training the model

Below is some sample code (incomplete; it uses playlist recommendation as the example):

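    from surprise import KNNBaseline

    trainset = load_format2trainset()   # trainset built in step 1
    algo = KNNBaseline()   # user-based by default, so similarities are between users (here: playlists)
    algo.fit(trainset)    # train the model
    # name_id_dic maps playlist names to playlist ids (raw ids); listid is the index of the chosen playlist
    current_playlist = list(name_id_dic.keys())[listid]
    playlist_id = name_id_dic[current_playlist]
    # Convert the raw user id to the internal user id => to_inner_uid
    playlist_inner_id = algo.trainset.to_inner_uid(playlist_id)
    # Get the 10 nearest neighbor playlists; the returned values are inner ids
    playlist_neighbors = algo.get_neighbors(playlist_inner_id, k=10)
    # Map the inner ids back to raw playlist ids
    playlist_neighbors = [algo.trainset.to_raw_uid(inner_id) for inner_id in playlist_neighbors]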
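Conceptually, get_neighbors ranks every other user by its similarity to the target and keeps the top k. The sketch below shows that ranking with cosine similarity over co-rated items; it is an illustration under assumptions, not surprise's implementation. surprise computes the full similarity matrix during fit() and lets you choose the measure through sim_options (the default is MSD), and its neighbors are inner ids, whereas this sketch works directly on hypothetical raw keys.

```python
import math


def cosine_sim(ratings_a, ratings_b):
    """Cosine similarity computed over co-rated items only (item -> rating dicts)."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def get_neighbors(target, ur, k):
    """Return the k users most similar to `target`, most similar first."""
    scored = [(other, cosine_sim(ur[target], ur[other])) for other in ur if other != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Hypothetical playlist -> {song: rating} data
ur = {
    "p1": {"s1": 5, "s2": 3, "s3": 4},
    "p2": {"s1": 5, "s2": 3, "s3": 4},
    "p3": {"s1": 1, "s2": 5},
    "p4": {"s4": 2},
}
print(get_neighbors("p1", ur, k=2))
```

If every rating in a playlist dataset is the same value (e.g. all 1s), cosine over co-rated items gives 1.0 for any overlap, so the choice of similarity measure matters for implicit data like this.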

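  • This mainly uses the Trainset class and its methods (see the Trainset documentation). Its main methods and attributes are:

 to_inner_uid(ruid): convert a user raw id to an inner id.

 to_inner_iid(riid): convert an item raw id to an inner id.

 to_raw_iid(iiid): convert an item inner id to a raw id.

 to_raw_uid(iuid): convert a user inner id to a raw id.

 all_items(): generator over the inner ids of all items.

 all_users(): generator over the inner ids of all users.

 all_ratings(): generator yielding (uid, iid, rating) tuples, where the ids are inner ids.

 build_testset(): return all ratings in the trainset as a list of (raw uid, raw iid, rating) tuples, usable as a test set.

 ur: user ratings. A dict whose keys are user inner ids and whose values are lists of (item_inner_id, rating) tuples.

 ir: item ratings. A dict whose keys are item inner ids and whose values are lists of (user_inner_id, rating) tuples.

 n_users / n_items / n_ratings: the number of users, items and ratings in the dataset.

 rating_scale: the rating scale, as a (min, max) tuple.

 global_mean: the mean of all ratings.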
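The attributes above fit together as follows: raw ids are mapped to consecutive integer inner ids, and ur and ir are two views of the same ratings, one grouped by user and one by item. `MiniTrainset` is a hypothetical, stripped-down stand-in written to show these relationships; it is not surprise's Trainset class, and it assigns inner ids in order of first appearance.

```python
from collections import defaultdict


class MiniTrainset:
    """Hypothetical, stripped-down stand-in for surprise's Trainset."""

    def __init__(self, raw_ratings, rating_scale=(1, 5)):
        raw_ratings = list(raw_ratings)
        self._raw2inner_uid, self._raw2inner_iid = {}, {}
        self.ur = defaultdict(list)  # inner uid -> [(inner iid, rating), ...]
        self.ir = defaultdict(list)  # inner iid -> [(inner uid, rating), ...]
        for ruid, riid, rating in raw_ratings:
            # Inner ids are 0, 1, 2, ... handed out in order of first appearance.
            uid = self._raw2inner_uid.setdefault(ruid, len(self._raw2inner_uid))
            iid = self._raw2inner_iid.setdefault(riid, len(self._raw2inner_iid))
            self.ur[uid].append((iid, rating))
            self.ir[iid].append((uid, rating))
        self._inner2raw_uid = {inner: raw for raw, inner in self._raw2inner_uid.items()}
        self._inner2raw_iid = {inner: raw for raw, inner in self._raw2inner_iid.items()}
        self.n_users = len(self._raw2inner_uid)
        self.n_items = len(self._raw2inner_iid)
        self.n_ratings = len(raw_ratings)
        self.rating_scale = rating_scale
        self.global_mean = sum(r for _, _, r in raw_ratings) / self.n_ratings

    def to_inner_uid(self, ruid):
        if ruid not in self._raw2inner_uid:
            # An unknown raw id is an error (surprise also raises ValueError here).
            raise ValueError(f"User {ruid} is not part of the trainset.")
        return self._raw2inner_uid[ruid]

    def to_inner_iid(self, riid):
        if riid not in self._raw2inner_iid:
            raise ValueError(f"Item {riid} is not part of the trainset.")
        return self._raw2inner_iid[riid]

    def to_raw_uid(self, iuid):
        return self._inner2raw_uid[iuid]

    def to_raw_iid(self, iiid):
        return self._inner2raw_iid[iiid]

    def all_users(self):
        yield from range(self.n_users)

    def all_items(self):
        yield from range(self.n_items)

    def all_ratings(self):
        # Walk the user-grouped view; every rating appears exactly once.
        for uid, user_ratings in self.ur.items():
            for iid, rating in user_ratings:
                yield uid, iid, rating


trainset = MiniTrainset([
    ("playlist_A", "song_1", 4.0),
    ("playlist_B", "song_2", 2.0),
    ("playlist_A", "song_2", 3.0),
])
print(trainset.to_inner_uid("playlist_B"), trainset.ur[0], trainset.global_mean)
```

Working with dense integer ids internally lets an algorithm store per-user and per-item values (biases, factors, similarity rows) in plain arrays, which is why the raw/inner conversion methods exist at all.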

   The algorithm base class (AlgoBase) mainly provides:

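  • fit(trainset): train the algorithm on the given trainset. Returns self.
  • get_neighbors(iid, k): iid is the inner id of a user or an item (depending on whether similarities are user-based or item-based), and k is the number of neighbors. Returns the inner ids of the k nearest neighbors. Only relevant for algorithms that use a similarity measure, such as the KNN algorithms.
  • predict(uid, iid, r_ui=None, clip=True, verbose=False): predict the rating of a given user for a given item. The algorithm converts the raw ids to inner ids and then runs its estimation function; when the prediction is impossible, for example because the user or the item is unknown, it returns the global mean of all ratings instead.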

Parameters:
uid: (raw) id of the user.
iid: (raw) id of the item.
r_ui (float): the true rating $r_{ui}$. Optional, default is None.
clip (bool): whether to clip the estimation into the rating scale. For example, if $\hat{r}_{ui}$ is 5.5 while the rating scale is $[1, 5]$, then $\hat{r}_{ui}$ is set to 5. The same goes if $\hat{r}_{ui} < 1$. Default is True.
verbose (bool): whether to print details of the prediction. Default is False.
Returns:
A Prediction object containing:
The (raw) user id uid.
The (raw) item id iid.
The true rating r_ui ($r_{ui}$).
The estimated rating ($\hat{r}_{ui}$).
Some additional details about the prediction that might be useful for later analysis.
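The predict() flow described above (raw ids to inner ids, estimate, fall back to the global mean when estimation is impossible, then optionally clip) can be sketched in plain Python. Every name here (`predict`, `PredictionImpossible`, `toy_estimate`, the id dictionaries) is a hypothetical stand-in written for illustration; only the returned tuple's fields mirror the Prediction contents listed above.

```python
from collections import namedtuple

Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


class PredictionImpossible(Exception):
    """Raised by an estimator when it cannot estimate a rating."""


def predict(uid, iid, raw2inner_uid, raw2inner_iid, estimate, global_mean,
            rating_scale=(1, 5), r_ui=None, clip=True):
    # 1. Convert raw ids to inner ids (None marks an id unknown to the trainset).
    inner_uid = raw2inner_uid.get(uid)
    inner_iid = raw2inner_iid.get(iid)
    details = {}
    try:
        # 2. Let the algorithm-specific estimator compute the rating.
        est = estimate(inner_uid, inner_iid)
        details["was_impossible"] = False
    except PredictionImpossible as exc:
        # 3. Fall back to the global mean when estimation is impossible.
        est = global_mean
        details["was_impossible"] = True
        details["reason"] = str(exc)
    # 4. Optionally clip the estimate into the rating scale.
    if clip:
        low, high = rating_scale
        est = min(high, max(low, est))
    return Prediction(uid, iid, r_ui, est, details)


def toy_estimate(u, i):
    """Hypothetical estimator: needs both ids to be known, otherwise gives up."""
    if u is None or i is None:
        raise PredictionImpossible("User and/or item is unknown.")
    return 5.5  # deliberately outside [1, 5] to show clipping


uids, iids = {"u1": 0}, {"i1": 0}
print(predict("u1", "i1", uids, iids, toy_estimate, global_mean=3.2))
print(predict("u9", "i1", uids, iids, toy_estimate, global_mean=3.2))
```

Recording was_impossible in the details lets you count afterwards how many test predictions silently fell back to the global mean, which can otherwise flatter or distort an evaluation.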

