Setup:
maven: download and configure; run mvn install in the Mahout directory to build Mahout
eclipse: import the jars and build the test examples
hadoop: distributed mode
Mahout: download and configure /etc/profile
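The /etc/profile step above might look like the following. This is a sketch only; the install paths are assumptions and should be adjusted to the actual locations:

```shell
# Assumed install locations -- change to match your machine
export HADOOP_HOME=/usr/hadoop
export MAHOUT_HOME=/usr/local/mahout
export PATH=$PATH:$MAHOUT_HOME/bin
```

After editing, run `source /etc/profile` so the current shell picks up the new variables.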
Recommender system example:
1. Create a new Java project and a new class Test
2. Reference: http://blog.csdn.net/aidayei/article/details/6626699
package org.apache.mahout.fpm.pfpgrowth;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import java.io.*;
import java.util.*;

public class Test {

    private Test() {}

    public static void main(String[] args) throws Exception {
        // Steps: 1 build the data model  2 compute similarity  3 find the k nearest neighbors  4 build the recommender
        DataModel model = new FileDataModel(new File("/usr/hadoop/testdata/cf.txt")); // the file name must be an absolute path
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        List<RecommendedItem> recommendations = recommender.recommend(1, 2); // recommend two item IDs to user 1
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
Data preparation: test.txt (the code above loads it as /usr/hadoop/testdata/cf.txt)
Column 1 is the UserID, column 2 the ItemID, column 3 the Preference Value, i.e. the rating
1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
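The similarity step can be sanity-checked by hand on this data: Pearson correlation is computed over the items two users have both rated, with each user's ratings centered on their own mean over those items. A minimal sketch in plain Java, independent of Mahout (the class and helper names here are ours):

```java
import java.util.*;

public class PearsonCheck {
    // Pearson correlation over co-rated items: center each user's co-rated
    // ratings on that user's mean, then take the normalized dot product.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) if (b.containsKey(item)) common.add(item);
        double meanA = 0, meanB = 0;
        for (Long item : common) { meanA += a.get(item); meanB += b.get(item); }
        meanA /= common.size();
        meanB /= common.size();
        double dot = 0, normA = 0, normB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA, db = b.get(item) - meanB;
            dot += da * db;
            normA += da * da;
            normB += db * db;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<Long, Double> u1 = new HashMap<>();
        u1.put(101L, 5.0); u1.put(102L, 3.0); u1.put(103L, 2.5);
        Map<Long, Double> u5 = new HashMap<>();
        u5.put(101L, 4.0); u5.put(102L, 3.0); u5.put(103L, 2.0);
        u5.put(104L, 4.0); u5.put(105L, 3.5); u5.put(106L, 4.0);
        // Co-rated items 101, 102, 103 give r = 2.5 / sqrt(7) ≈ 0.9449
        System.out.printf("sim(1,5) = %.4f%n", pearson(u1, u5));
    }
}
```

On the data above, users 1 and 5 share items 101, 102, 103 and come out at about 0.9449, i.e. strongly similar.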
Output:
RecommendedItem[item:104, value:4.257081]
RecommendedItem[item:106, value:4.0]
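These two values can be reproduced by hand. With N=2, user 1's nearest neighbors on this data are user 4 (Pearson similarity 1.0 over co-rated items 101 and 103) and user 5 (≈0.9449 as computed above), and the user-based recommender estimates a preference as the similarity-weighted average of the neighbors' ratings for the item. A minimal sketch in plain Java (the class and helper names are ours):

```java
public class EstimateCheck {
    // Similarity-weighted average of the neighbors' ratings for one item.
    static double estimate(double[] sims, double[] ratings) {
        double num = 0, den = 0;
        for (int i = 0; i < sims.length; i++) {
            num += sims[i] * ratings[i];
            den += sims[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        double sim4 = 1.0;                  // sim(user 1, user 4) over items 101, 103
        double sim5 = 2.5 / Math.sqrt(7.0); // sim(user 1, user 5) ≈ 0.9449
        // Item 104: rated 4.5 by user 4 and 4.0 by user 5
        System.out.printf("item 104 -> %.6f%n",
                estimate(new double[]{sim4, sim5}, new double[]{4.5, 4.0}));
        // Item 106: rated 4.0 by both neighbors, so the estimate is exactly 4.0
        System.out.printf("item 106 -> %.6f%n",
                estimate(new double[]{sim4, sim5}, new double[]{4.0, 4.0}));
    }
}
```

Item 104 comes out at (1.0·4.5 + 0.9449·4.0) / (1.0 + 0.9449) ≈ 4.257081 and item 106 at 4.0, matching the output above.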
References
1. Setup: http://blog.csdn.net/chjshan55/article/details/5923646. From a senior labmate: http://hi.baidu.com/czb_xyls/blog/item/76019d02cfa3cd101c95833a.html
2. Testing: reading files from HDFS. They are serialized into HDFS, so a command is needed to dump them: bin/mahout vectordump --seqFile /user/hadoopuser/output/data/part-00000
Open questions:
1. Which command copies a file from HDFS to the local file system?
2. How does the algorithm work internally? How is the recommender system parallelized?