http://archive.apache.org/dist/mahout/
Since I'm on Ubuntu, the one I downloaded is
<mahout.version>0.9</mahout.version>
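Of course, Mahout's environment variables had already been set up long before.
Next, list the packages you need as dependencies. They go between <dependencies> and </dependencies>.
Each dependency is wrapped in <dependency></dependency>
and has four child elements: groupId, artifactId, version, and scope.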
</pre><pre name="code" class="html"><project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>yjj</groupId>
  <artifactId>maven-mahout</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>maven-mahout</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <mahout.version>0.9</mahout.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.9</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-integration</artifactId>
      <version>0.9</version>
      <scope>compile</scope>
      <exclusions>
        <exclusion>
          <artifactId>jetty</artifactId>
          <groupId>org.mortbay.jetty</groupId>
        </exclusion>
        <exclusion>
          <artifactId>cassandra-all</artifactId>
          <groupId>org.apache.cassandra</groupId>
        </exclusion>
        <exclusion>
          <artifactId>hector-core</artifactId>
          <groupId>me.prettyprint</groupId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
Under the project's src/main/java directory, create a new class UserCF.java to implement user-based collaborative filtering recommendations.
user id1 5.000000 3.000000 2.500000 0.000000 0.000000 0.000000 0.000000
user id2 2.000000 2.500000 5.000000 2.000000 0.000000 0.000000 0.000000
user id3 2.500000 0.000000 0.000000 4.000000 4.500000 0.000000 5.000000
user id4 5.000000 0.000000 3.000000 4.500000 0.000000 4.000000 0.000000
user id5 4.000000 3.000000 2.000000 4.000000 3.500000 4.000000 0.000000
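import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserCF {

    final static int NEIGHBORHOOD_NUM = 3;
    final static int RECOMMENDER_NUM = 2;

    public static void main(String[] args) throws IOException, TasteException {
        String file = "item.csv"; // data file
        DataModel model = new FileDataModel(new File(file)); // build the data model
        UserSimilarity user = new EuclideanDistanceSimilarity(model); // similarity based on Euclidean distance
        // number of nearest neighbors to use
        NearestNUserNeighborhood neighbor = new NearestNUserNeighborhood(NEIGHBORHOOD_NUM, user, model);
        // build the user-based recommender
        UserBasedRecommender r = new GenericUserBasedRecommender(model, neighbor, user);

        LongPrimitiveIterator iter = model.getUserIDs();
        // iterate over every user
        while (iter.hasNext()) {
            long uid = iter.nextLong();
            // generate RECOMMENDER_NUM recommendations for user uid
            List<RecommendedItem> list = r.recommend(uid, RECOMMENDER_NUM);
            // print each recommended item ID and uid's predicted preference for it
            System.out.printf("uid:%s", uid);
            for (RecommendedItem ritem : list) {
                System.out.printf("(%s,%f)", ritem.getItemID(), ritem.getValue());
            }
            System.out.println();
        }
    }
}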
uid:1(106,4.000000)(104,3.853723)
uid:2(105,4.055916)
uid:3(106,4.000000)(103,3.258251)
uid:4(105,3.884381)(102,3.000000)
uid:5
// This fragment goes inside main() and also needs:
// import org.apache.mahout.cf.taste.impl.common.FastIDSet;

// Create a 5x7 two-dimensional array; Java initializes every element to 0.
double[][] a = new double[5][7];

// The DataModel hands out user IDs as a LongPrimitiveIterator and each user's
// item IDs as a FastIDSet, so walk through them one by one.
LongPrimitiveIterator iterUser = model.getUserIDs();
while (iterUser.hasNext()) {
    long uid = iterUser.nextLong();
    FastIDSet items = model.getItemIDsFromUser(uid);
    long[] lItems = items.toArray();
    for (int i = 0; i < lItems.length; i++) {
        int itemid = (int) lItems[i];
        // users 1..5 map to rows 0..4, items 101..107 map to columns 0..6
        a[(int) uid - 1][itemid - 100 - 1] = model.getPreferenceValue(uid, itemid);
    }
}

// Print the matrix, one user per row.
for (int i = 0; i < 5; i++) {
    System.out.print("user id" + (i + 1) + " ");
    for (int j = 0; j < 7; j++) {
        System.out.printf("%f\t", a[i][j]);
    }
    System.out.println();
}
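As a cross-check on the index arithmetic above (uid - 1 for the row, itemid - 101 for the column), the same table can be rebuilt from the raw rating lines with plain JDK code. This is only a sketch: the class name RatingMatrixSketch, the pivot helper, and the hard-coded copy of the sample ratings are made up for illustration, and the ID offsets fit this data set only.

```java
// Sketch only, no Mahout involved: pivots "userID,itemID,preference" lines
// into a dense users x items array. All names here are hypothetical.
class RatingMatrixSketch {

    // Copy of the sample ratings used in this post.
    static final String[] SAMPLE = {
        "1,101,5.0", "1,102,3.0", "1,103,2.5",
        "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
        "3,101,2.5", "3,104,4.0", "3,105,4.5", "3,107,5.0",
        "4,101,5.0", "4,103,3.0", "4,104,4.5", "4,106,4.0",
        "5,101,4.0", "5,102,3.0", "5,103,2.0", "5,104,4.0", "5,105,3.5", "5,106,4.0",
    };

    // Assumption: user IDs start at 1 and item IDs start at 101, as in SAMPLE.
    static double[][] pivot(String[] lines, int users, int items) {
        double[][] a = new double[users][items]; // new arrays are zero-filled
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            int row = Integer.parseInt(fields[0]) - 1;   // user 1 -> row 0
            int col = Integer.parseInt(fields[1]) - 101; // item 101 -> column 0
            a[row][col] = Double.parseDouble(fields[2]);
        }
        return a;
    }

    public static void main(String[] args) {
        double[][] a = pivot(SAMPLE, 5, 7);
        for (int i = 0; i < a.length; i++) {
            System.out.print("user id" + (i + 1) + " ");
            for (double v : a[i]) {
                System.out.printf("%f\t", v);
            }
            System.out.println();
        }
    }
}
```

A zero in the array means "no rating", which is safe here only because none of the sample preferences is 0.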
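// Distance between users id1 and id2 (row indices into a), computed only over
// the items both of them rated (a 0 entry means "not rated").
public static double getDis(double[][] a, int id1, int id2) {
    double ret = 0;
    double num = 0;
    for (int i = 0; i < 7; i++) {
        double d1 = a[id1][i], d2 = a[id2][i];
        if (d1 != 0 && d2 != 0) {
            double dSub = d1 - d2;
            ret += dSub * dSub;
            num += 1.0;
        }
    }
    if (num != 0)
        return Math.sqrt(ret / num);
    else
        return Double.MAX_VALUE; // no items in common
}
Next, compute the distances between the 5 users and print them: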
double[][] dis = new double[5][5];
for (int i = 0; i < 5; i++) {
    for (int j = 0; j < 5; j++) {
        dis[i][j] = getDis(a, i, j);
        System.out.printf("%f\t", dis[i][j]);
    }
    System.out.println();
}
This gives the following result:
0.000000 2.273030 2.500000 0.353553 0.645497
2.273030 0.000000 1.457738 2.533114 2.076656
2.500000 1.457738 0.000000 1.802776 1.040833
0.353553 2.533114 1.802776 0.000000 0.750000
0.645497 2.076656 1.040833 0.750000 0.000000
To verify this, ask the recommender for each user's nearest neighbors and check whether they agree with the matrix above:
for (int i = 1; i <= 5; i++) {
    // the 4 users most similar to user i, most similar first
    long[] users = r.mostSimilarUserIDs(i, 4);
    System.out.println(users[0] + ":" + users[1] + ":" + users[2] + ":" + users[3]);
}
The result is:
4:5:2:3
3:5:1:4
5:2:4:1
1:5:3:2
1:4:3:2
This matches the distance matrix above exactly: each row lists the other users in order of increasing distance.
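The agreement can also be reproduced without Mahout. The sketch below is not Mahout's code: it hard-codes the 5x7 rating table, uses the same distance as getDis, and sorts the other users by that distance. It then makes two assumptions that are not stated in this post: that similarity is 1 / (1 + distance), and that a predicted preference is the similarity-weighted average of the neighbors' ratings. Under those assumptions the estimate for user 1 and item 104 comes out at about 3.8537, in line with the 3.853723 in the recommender output. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: checks the neighbor ordering and one prediction by hand.
class NeighborCheckSketch {

    // Rows are users 1..5, columns are items 101..107, 0 means "not rated".
    static final double[][] RATINGS = {
        {5.0, 3.0, 2.5, 0.0, 0.0, 0.0, 0.0},
        {2.0, 2.5, 5.0, 2.0, 0.0, 0.0, 0.0},
        {2.5, 0.0, 0.0, 4.0, 4.5, 0.0, 5.0},
        {5.0, 0.0, 3.0, 4.5, 0.0, 4.0, 0.0},
        {4.0, 3.0, 2.0, 4.0, 3.5, 4.0, 0.0},
    };

    // Same computation as getDis: root of the mean squared difference
    // over the items both users rated.
    static double distance(int u, int v) {
        double sum = 0;
        int n = 0;
        for (int i = 0; i < RATINGS[u].length; i++) {
            double d1 = RATINGS[u][i], d2 = RATINGS[v][i];
            if (d1 != 0 && d2 != 0) {
                sum += (d1 - d2) * (d1 - d2);
                n++;
            }
        }
        return n == 0 ? Double.MAX_VALUE : Math.sqrt(sum / n);
    }

    // Row indices of the howMany closest other users, nearest first.
    static int[] nearest(int user, int howMany) {
        Integer[] others = new Integer[RATINGS.length - 1];
        int k = 0;
        for (int v = 0; v < RATINGS.length; v++) {
            if (v != user) {
                others[k++] = v;
            }
        }
        Arrays.sort(others, Comparator.comparingDouble((Integer v) -> distance(user, v)));
        int[] result = new int[howMany];
        for (int i = 0; i < howMany; i++) {
            result[i] = others[i];
        }
        return result;
    }

    // Assumption: similarity = 1 / (1 + distance), and the estimate is the
    // similarity-weighted average over the neighbors who rated the item.
    static double estimate(int user, int[] neighbors, int item) {
        double weighted = 0, total = 0;
        for (int nb : neighbors) {
            double rating = RATINGS[nb][item];
            if (rating != 0) {
                double sim = 1.0 / (1.0 + distance(user, nb));
                weighted += sim * rating;
                total += sim;
            }
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        for (int u = 0; u < RATINGS.length; u++) {
            int[] nn = nearest(u, 4);
            // convert row indices back to user IDs for printing
            System.out.println((nn[0] + 1) + ":" + (nn[1] + 1) + ":" + (nn[2] + 1) + ":" + (nn[3] + 1));
        }
        // user 1 (row 0), its 3 nearest neighbors, item 104 (column 3)
        System.out.printf("%.4f%n", estimate(0, nearest(0, 3), 3));
    }
}
```

The five neighbor lines printed by main are the same as the mostSimilarUserIDs output above. The sketch does not try to mirror every rule the real recommender applies when deciding which items to recommend.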