recsyscode - implementations of classic algorithms of recommender system - Google Project Hosting



Important update: the svd++ model and the combine model (the combination of svd++ and the latent factor global neighborhood model, lfGNbr) had a serious bug: sumQE was updated incorrectly. It has now been fixed, so please update your code with the "svn update" command. The change is: "sumQE[K_NUM+1] = eui * q[itemI][k];" was changed to "sumQE[k] += eui * q[itemI][k];"
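To see why the fix matters: in SVD++ the user representation is p_u + |N(u)|^(-1/2) * sum of y_j over j in N(u), so the gradient for each implicit factor y_j collects e_ui * q_i over the items the user rated. One common way to train this is to accumulate that sum per latent factor k while looping over a user's ratings, then update every y_j once per user. The minimal sketch below assumes that scheme. Only sumQE, eui, q, itemI and K_NUM come from the notice above; the Matrix alias, the function names, the 0-based vector layout, and the parameters lrate and lambda are hypothetical and not this project's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Accumulates sumQE[k] = sum over the user's rated items of eui * q[itemI][k].
// ratedItems[n] is the item of the n-th rating, errors[n] its error eui.
std::vector<double> accumulateSumQE(const std::vector<int>& ratedItems,
                                    const std::vector<double>& errors,
                                    const Matrix& q, std::size_t K_NUM) {
    assert(ratedItems.size() == errors.size());
    std::vector<double> sumQE(K_NUM, 0.0);
    for (std::size_t n = 0; n < ratedItems.size(); ++n) {
        const int itemI = ratedItems[n];
        const double eui = errors[n];
        for (std::size_t k = 0; k < K_NUM; ++k) {
            // Corrected form: one running sum per factor k.
            // The old "sumQE[K_NUM+1] = eui * q[itemI][k];" overwrote a single
            // slot, so only the last product survived and no factor got its sum.
            sumQE[k] += eui * q[itemI][k];
        }
    }
    return sumQE;
}

// Hypothetical batched update of the implicit factors y_j, once per user,
// taking N(u) to be the user's rated items. lrate and lambda are assumed
// learning-rate and regularization parameters.
void updateImplicitFactors(const std::vector<int>& ratedItems,
                           const std::vector<double>& sumQE, Matrix& y,
                           double lrate, double lambda) {
    if (ratedItems.empty()) return;
    const double norm = 1.0 / std::sqrt(static_cast<double>(ratedItems.size()));
    for (int j : ratedItems)
        for (std::size_t k = 0; k < sumQE.size(); ++k)
            y[j][k] += lrate * (norm * sumQE[k] - lambda * y[j][k]);
}
```

With two rated items, errors 0.5 and -1.0, and q rows {1, 2} and {3, 4}, the accumulator yields {-2.5, -3.0}, whereas the buggy line would have left only the last product in one slot. Batching the y_j update once per user is a speed/accuracy trade-off: it applies regularization once per user instead of once per rating.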


Why start this project?

I ran into many difficulties when implementing the classic recommender system algorithms:

(1) For large-scale data (the Netflix dataset, with about 100M ratings), using the CPU and memory carelessly does not work. Because of the sheer amount of data, the algorithms and data structures must be designed compactly to keep time and memory consumption acceptable.

(2) Data initialization and parameter settings have a great influence on the results. To reproduce Koren's results, my first SVD implementation took about 2 weeks, including 4 days of parameter tuning.

(3) Other difficulties, which I'll skip here :-) ...
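Difficulty (1) above is largely a question of memory layout. One common approach, sketched here under assumptions, is a CSR-style layout: ratings are grouped by user, and each user's ratings occupy a contiguous range of two parallel arrays. The struct name, field widths, and the requirement that input triples arrive sorted by user are all hypothetical, not this project's actual data format. 16-bit item ids suffice because the Netflix data has 17,770 movies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR-style layout: the ratings of user u occupy positions
// [offset[u], offset[u + 1]) in the parallel arrays item and rating.
struct CompactRatings {
    std::vector<std::uint32_t> offset;  // numUsers + 1 prefix sums
    std::vector<std::uint16_t> item;    // 16-bit item ids
    std::vector<std::uint8_t> rating;   // rating values 1..5 in one byte

    std::size_t numRatings(std::size_t u) const { return offset[u + 1] - offset[u]; }
};

// Builds the layout from (user, item, rating) triples already sorted by user.
CompactRatings buildCompact(std::size_t numUsers,
                            const std::vector<std::uint32_t>& users,
                            const std::vector<std::uint16_t>& items,
                            const std::vector<std::uint8_t>& values) {
    assert(users.size() == items.size() && items.size() == values.size());
    CompactRatings r;
    r.offset.assign(numUsers + 1, 0);
    for (std::size_t n = 0; n < users.size(); ++n) {
        assert(users[n] < numUsers);
        assert(n == 0 || users[n - 1] <= users[n]);  // input must be grouped by user
        ++r.offset[users[n] + 1];                     // count ratings per user
    }
    for (std::size_t u = 0; u < numUsers; ++u)
        r.offset[u + 1] += r.offset[u];               // turn counts into start offsets
    r.item = items;
    r.rating = values;
    return r;
}
```

A training loop then visits user u's ratings with `for (auto n = r.offset[u]; n < r.offset[u + 1]; ++n)`. At about 3 bytes per rating, 100M ratings take roughly 300 MB plus a 4-byte offset per user, while a naive `struct { int user; int item; double rating; }` typically occupies 16 bytes per rating, about 1.6 GB.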

To lower the barrier to entering the recommender system field, I provide the implementation details of these algorithms in the form of code, together with Koren's papers, so that newcomers to recommender systems can get started as quickly as possible and people already working in the field have a good reference. I also hope more and more people will enter this interesting and useful area!
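Difficulty (2) above, initialization and parameter settings, can be made concrete with a minimal biased-SVD sketch in the style of Koren's papers: the prediction is mu + b_u + b_i + q_i . p_u, trained by stochastic gradient descent on each known rating. The initialization range, the learning rate lrate, and the regularization lambda are placeholders for illustration, not the values this project or Koren used. They are exactly the knobs that need tuning. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical biased-SVD model: prediction = mu + bu[u] + bi[i] + q[i] . p[u].
struct SvdModel {
    double mu = 0.0;                        // global mean rating
    std::vector<double> bu, bi;             // user and item biases
    std::vector<std::vector<double>> p, q;  // user and item factor vectors
};

// Biases start at zero; factors start at small random values.
// The range [0, 0.1/sqrt(K)) is an assumed placeholder, not a tuned setting.
SvdModel initModel(std::size_t users, std::size_t items, std::size_t K,
                   double mu, unsigned seed) {
    assert(K > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 0.1 / std::sqrt(static_cast<double>(K)));
    SvdModel m;
    m.mu = mu;
    m.bu.assign(users, 0.0);
    m.bi.assign(items, 0.0);
    m.p.assign(users, std::vector<double>(K));
    m.q.assign(items, std::vector<double>(K));
    for (auto& row : m.p) for (double& x : row) x = dist(gen);
    for (auto& row : m.q) for (double& x : row) x = dist(gen);
    return m;
}

double predict(const SvdModel& m, std::size_t u, std::size_t i) {
    double dot = 0.0;
    for (std::size_t k = 0; k < m.p[u].size(); ++k) dot += m.q[i][k] * m.p[u][k];
    return m.mu + m.bu[u] + m.bi[i] + dot;
}

// One stochastic gradient descent step on rating r of user u for item i.
// lrate (learning rate) and lambda (regularization) are the parameters to tune.
double sgdStep(SvdModel& m, std::size_t u, std::size_t i, double r,
               double lrate, double lambda) {
    const double eui = r - predict(m, u, i);
    m.bu[u] += lrate * (eui - lambda * m.bu[u]);
    m.bi[i] += lrate * (eui - lambda * m.bi[i]);
    for (std::size_t k = 0; k < m.p[u].size(); ++k) {
        const double puk = m.p[u][k];  // old value, so q's update uses the pre-update p
        m.p[u][k] += lrate * (eui * m.q[i][k] - lambda * puk);
        m.q[i][k] += lrate * (eui * puk - lambda * m.q[i][k]);
    }
    return eui;
}
```

In practice mu would be the mean of the training ratings, and training loops over all ratings for several epochs. Changing the initialization range or lrate/lambda can noticeably change the final RMSE, which is why tuning took days.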

Code Description:

(1) All of the code is written in C++, which is efficient enough for large datasets such as the Netflix dataset; scripting languages are too slow.

(2) The code is released under the GPL v3 license. Please retain the copyright information when using the code.

(3) Algorithms implemented so far: baseline predictor, knn, svd, svd++, asymmetric-svd, global neighborhood based model (gNbr), and a combination of svd++ and gNbr.

(4) Code style

(5) The code surely has many imperfections and mistakes. If you find a bug or mistake, please email me. I also hope you will join me in improving this project.

(6) All the code has been tested on Debian 6.0 and RHEL AS4 (gcc 3.4.6); other environments have not been tested because of limited resources.
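For orientation, the prediction rules behind the first models in item (3) can be written in the standard notation of Koren's KDD 2008 paper. Here mu is the global mean rating, b_u and b_i are user and item biases, p_u and q_i are user and item factor vectors, N(u) is the set of items for which user u gave implicit feedback, and y_j are implicit item factors. This summarizes the published models rather than transcribing this project's code; each model is trained by minimizing the regularized squared error over the known ratings.

```latex
\begin{aligned}
\text{baseline:}\quad & \hat r_{ui} = \mu + b_u + b_i \\
\text{svd:}\quad      & \hat r_{ui} = \mu + b_u + b_i + q_i^{T} p_u \\
\text{svd++:}\quad    & \hat r_{ui} = \mu + b_u + b_i + q_i^{T}\Big(p_u + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j\Big)
\end{aligned}
```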

Some useful links:

(1) Quick start: a getting-started guide for this project

(2) How to obtain the Netflix training and test sets used in this project: Netflix dataset preprocessing

(3) Steps for using the knn model in this project

(4) Problems I encountered during implementation

(5) Results of the knn algorithm

(6) Results of the svd algorithm

(7) Currently available datasets

(8) Recommender Systems Handbook download

(9) Recommender Systems: An Introduction download

(10) Some papers related to this project

I hope more people will join this project and contribute more algorithm implementations. For example, implementations of the RBM model and the temporal model are still missing.

Anyone who wants to join the development or discuss the project can contact me here, or email me directly at honglianglv at gmail.

ps: This is my first open-source project. I have benefited from open-source software for many years, and this is my first step toward giving back to the open-source community. I hope to share more in the future.
