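Mahout offers little for content-based recommendation. The first two are examples of collaborative filtering, which makes recommendations purely from the known relationships between users and items.
The data here is expressed as preference values, which indicate how much a user likes an item. Each line has the form userID,itemID,preference.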
1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
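At this point we can analyze what the users like. For example, users 1 and 5 have similar tastes: both like 101 best, then 102, then 103. User 5 rated 104 and 106 at 4, so recommending those items to user 1 should work well. Below is our first recommender program.
Here is the code: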
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import java.io.*;
import java.util.*;
class RecommenderIntro {
    public static void main(String[] args) throws Exception {
        // Load the userID,itemID,preference data from the CSV file
        DataModel model = new FileDataModel(new File("/Users/ericxk/Downloads/test.csv"));
        // Measure how similar two users are with the Pearson correlation
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Use the 2 most similar users as the neighborhood
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        // Build the recommender
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Recommend 2 items for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 2);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
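To make the "users 1 and 5 have similar tastes" observation concrete, here is a minimal sketch that computes a Pearson correlation by hand over the items both users rated (101, 102, 103). It does not use Mahout; the class name PearsonSketch and the helper prefs are made up for illustration, and Mahout's own PearsonCorrelationSimilarity may differ in details such as weighting options.

```java
import java.util.*;

class PearsonSketch {
    // Build a small itemID -> preference map (hypothetical helper, not a Mahout API).
    static Map<Integer, Double> prefs(int[] items, double[] values) {
        Map<Integer, Double> m = new HashMap<>();
        for (int i = 0; i < items.length; i++) {
            m.put(items[i], values[i]);
        }
        return m;
    }

    // Pearson correlation computed only over the items that both users have rated.
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<Integer> common = new ArrayList<>();
        for (Integer item : a.keySet()) {
            if (b.containsKey(item)) {
                common.add(item);
            }
        }
        int n = common.size();
        if (n < 2) {
            return Double.NaN; // not enough overlap to say anything
        }
        double meanA = 0, meanB = 0;
        for (Integer item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double sumAB = 0, sumAA = 0, sumBB = 0;
        for (Integer item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            sumAB += da * db;
            sumAA += da * da;
            sumBB += db * db;
        }
        double denom = Math.sqrt(sumAA * sumBB);
        return denom == 0 ? Double.NaN : sumAB / denom;
    }

    public static void main(String[] args) {
        // Preferences of users 1 and 5 taken from the data listed above
        Map<Integer, Double> user1 = prefs(new int[]{101, 102, 103}, new double[]{5, 3, 2.5});
        Map<Integer, Double> user5 = prefs(new int[]{101, 102, 103, 104, 105, 106},
                                           new double[]{4, 3, 2, 4, 3.5, 4});
        System.out.println(pearson(user1, user5));
    }
}
```

For users 1 and 5 this comes out at roughly 0.94, which matches the intuition that their tastes are close.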
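A common approach is to evaluate the quality of the estimated preference values, that is, how closely the estimated preferences match the actual ones.
The book uses the average difference and the root mean square as evaluation metrics here; the code below uses the average difference.
The code is as follows: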
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.common.RandomUtils;
import java.io.*;
class RecommenderIntro {
    public static void main(String[] args) throws Exception {
        // Use a fixed random seed so that results are repeatable
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("/Users/ericxk/Downloads/test.csv"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // Builder that constructs the recommender under evaluation
        RecommenderBuilder builder = new RecommenderBuilder() {
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Train on 70% of the data and test on the remaining 30%
        double score = evaluator.evaluate(builder, null, model, 0.7, 1.0);
        System.out.println(score);
    }
}
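To use the root mean square instead, replace AverageAbsoluteDifferenceRecommenderEvaluator with RMSRecommenderEvaluator. A small gripe: these class names are really long. The upside is that you can tell what each one does just by reading its name.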
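To show what the two metrics measure, here is a small sketch that computes the average absolute difference and the root mean square by hand. The class name ErrorMetricsSketch and the sample values are invented for illustration; they are not output from Mahout or from the data above.

```java
class ErrorMetricsSketch {
    // Average of |actual - estimated| over all test preferences.
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    // Square root of the average of (actual - estimated)^2.
    static double rootMeanSquare(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimated[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        // Hypothetical held-out preferences and the recommender's estimates for them
        double[] actual = {3.0, 4.0, 5.0};
        double[] estimated = {3.5, 4.0, 3.0};
        System.out.println(averageAbsoluteDifference(actual, estimated)); // about 0.83
        System.out.println(rootMeanSquare(actual, estimated));            // about 1.19
    }
}
```

In both cases a lower score is better, and 0 means every estimate matched exactly. The root mean square punishes the single large miss (5.0 estimated as 3.0) more heavily, which is why it is higher here.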
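Precision and recall are both easy to understand; look them up if they are unfamiliar. Before showing the precision and recall code, there is a problem to point out. A recommender is supposed to recommend items the user has not yet interacted with, and the best recommendations are not necessarily among the items the user already knows, yet the test framework only picks "good" recommendations from the user's existing preferences. As an analogy, a problem may have many solutions; yours might not be among the reference answers and still be correct. Back to recommenders: if the preferences are Boolean (0 or 1), this becomes even more awkward, because there is no notion of relative preference with which to pick out a subset of good items. Precision and recall are still useful, but they are not a perfect choice. Here is the code: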
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.common.RandomUtils;
import java.io.*;
class RecommenderIntro {
    public static void main(String[] args) throws Exception {
        // Use a fixed random seed so that results are repeatable
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("/Users/ericxk/Downloads/test.csv"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // Builder that constructs the recommender under evaluation
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Evaluate precision and recall at 2 recommendations
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 2,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}
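For readers who want the definitions spelled out, here is a minimal sketch of precision and recall for a single user's recommendation list. It only illustrates the two ratios; the class name PrecisionRecallSketch and the sample item IDs are hypothetical, and the way Mahout chooses the set of "good" items is not reproduced here.

```java
import java.util.*;

class PrecisionRecallSketch {
    // Number of recommended items that are also in the relevant ("good") set.
    static int hits(List<Integer> recommended, Set<Integer> relevant) {
        int count = 0;
        for (Integer item : recommended) {
            if (relevant.contains(item)) {
                count++;
            }
        }
        return count;
    }

    // Precision: fraction of the recommended items that are relevant.
    static double precision(List<Integer> recommended, Set<Integer> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: fraction of the relevant items that were recommended.
    static double recall(List<Integer> recommended, Set<Integer> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical example: 2 recommendations, 3 items the user is known to like
        List<Integer> recommended = Arrays.asList(104, 106);
        Set<Integer> relevant = new HashSet<>(Arrays.asList(104, 105, 107));
        System.out.println(precision(recommended, relevant)); // 1 of 2 recommendations is relevant
        System.out.println(recall(recommended, relevant));    // 1 of 3 relevant items was recommended
    }
}
```

This also shows the limitation described earlier: item 106 counts as a miss simply because it is not in the known "good" set, even though the user might actually like it.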
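The book points to the site https://www.grouplens.org/, but the specific dataset link it gives is wrong; it should be https://www.grouplens.org/datasets/movielens/. I downloaded a dataset of a few MB and ran the evaluation on it. The deviation was 0.926020408163265, which is not a great result.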
The book then introduces other recommenders, such as org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender. Unfortunately, the Mahout 0.9 I am using has no slopeone package under impl.recommender.