Course link: https://www.imooc.com/video/15790
Code link: https://github.com/SkillyZ/java-spring/tree/master/skilly-hadoop
Java access interfaces and programming steps for Hadoop: https://www.cnblogs.com/zhangyinhua/p/7678704.html#_lab2_1_1
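MapReduce steps (ItemCF)
1. Build the rating matrix from the user behavior list.
Input: user ID, item ID, score
Output: item ID (row) - user ID (column)_score
2. Use the rating matrix to build the item-to-item similarity matrix.
Input: output of step 1
Cache: output of step 1
(the input and the cache are the same file)
Output: item ID (row) - item ID (column)_similarity
3. Transpose the rating matrix.
Input: output of step 1
Output: user ID (row) - item ID (column)_score
4. Multiply the item-to-item similarity matrix by the rating matrix (as transposed in step 3).
Input: output of step 2
Cache: output of step 3
Output: item ID (row) - user ID (column)_score
5. Using the rating matrix, set to 0 the scores in the step 4 output for items the user has already interacted with.
Input: output of step 4
Cache: output of step 1
Output: user ID (row) - item ID (column)_score (the final recommendation list)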
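Step 1 has no code in these notes. The following is a minimal plain-Java sketch of what it produces, without any Hadoop dependency. The class and method names are hypothetical, and summing repeated scores for the same (item, user) pair is an assumption about how multiple behaviors are combined, not something the source states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: mirrors a map phase keyed on item ID plus a reduce phase.
final class RatingMatrixBuilder {

    // Turns "userID,itemID,score" records into rows of the form
    // "itemID \t userID_score,userID_score,...".
    static List<String> build(List<String> behaviorLines) {
        Map<String, Map<String, Double>> byItem = new TreeMap<>();
        for (String line : behaviorLines) {
            String[] fields = line.split(",");
            String user = fields[0];
            String item = fields[1];
            double score = Double.parseDouble(fields[2]);
            // Assumption: repeated behaviors of one user on one item are summed.
            byItem.computeIfAbsent(item, k -> new TreeMap<>())
                  .merge(user, score, Double::sum);
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> itemEntry : byItem.entrySet()) {
            StringBuilder sb = new StringBuilder(itemEntry.getKey()).append('\t');
            String separator = "";
            for (Map.Entry<String, Double> userScore : itemEntry.getValue().entrySet()) {
                sb.append(separator)
                  .append(userScore.getKey()).append('_').append(userScore.getValue());
                separator = ",";
            }
            rows.add(sb.toString());
        }
        return rows;
    }
}
```

Each returned string has the same "row, tab, column_value list" layout that the mappers in the later steps split on.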
Step 2 mapper (cosine similarity between the input row and every cached row):
    @Override
    public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
        // Input line format: row \t column_value,column_value,...
        String[] rowAndLine = text.toString().split("\t");
        String rowMatrix1 = rowAndLine[0];
        String[] columnValues1 = rowAndLine[1].split(",");

        // Euclidean norm of the left-matrix row
        double denomination1 = 0;
        for (String columnValue : columnValues1) {
            double score = Double.parseDouble(columnValue.split("_")[1]);
            denomination1 += score * score;
        }
        denomination1 = Math.sqrt(denomination1);

        for (String line : cacheList) {
            String[] cachedRowAndLine = line.split("\t");
            String rowMatrix2 = cachedRowAndLine[0];
            String[] columnValues2 = cachedRowAndLine[1].split(",");

            // Euclidean norm of the cached row
            double denomination2 = 0;
            for (String columnValue : columnValues2) {
                double score = Double.parseDouble(columnValue.split("_")[1]);
                denomination2 += score * score;
            }
            denomination2 = Math.sqrt(denomination2);

            // Dot product of the two rows. A double is used so that
            // fractional scores do not throw NumberFormatException.
            double numerator = 0;
            // Iterate over every column of the left row
            for (String columnValue : columnValues1) {
                String column1 = columnValue.split("_")[0];
                String value1 = columnValue.split("_")[1];
                // Iterate over every column of the cached row
                for (String column2 : columnValues2) {
                    if (column2.startsWith(column1 + "_")) {
                        String value2 = column2.split("_")[1];
                        // Accumulate the product
                        numerator += Double.parseDouble(value1) * Double.parseDouble(value2);
                    }
                }
            }
            // Skip pairs with zero similarity (this also avoids dividing 0 by 0)
            if (numerator == 0) {
                continue;
            }
            double cos = numerator / (denomination1 * denomination2);
            outKey.set(rowMatrix1);
            outValue.set(rowMatrix2 + "_" + decimalFormat.format(cos));
            context.write(outKey, outValue);
        }
    }
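The similarity computed by the mapper above is the cosine of two sparse rows: the dot product over shared columns divided by the product of the two Euclidean norms. A minimal standalone sketch of the same calculation follows. It is plain Java with hypothetical names, and it swaps the nested `startsWith` scan for a hash-map lookup, which is a design choice of this sketch rather than the author's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: cosine similarity between two "col_value,col_value" rows.
final class RowCosine {

    // Parses "col_value,col_value,..." into a column -> value map.
    static Map<String, Double> parseRow(String columns) {
        Map<String, Double> row = new HashMap<>();
        for (String columnValue : columns.split(",")) {
            String[] parts = columnValue.split("_");
            row.put(parts[0], Double.parseDouble(parts[1]));
        }
        return row;
    }

    // Euclidean norm of a sparse row.
    static double norm(Map<String, Double> row) {
        double sumOfSquares = 0;
        for (double value : row.values()) {
            sumOfSquares += value * value;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product over shared columns divided by the product of the norms.
    static double cosine(String columns1, String columns2) {
        Map<String, Double> a = parseRow(columns1);
        Map<String, Double> b = parseRow(columns2);
        double dot = 0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0 ? 0 : dot / denominator;
    }
}
```

For example, `RowCosine.cosine("A_3,B_4", "A_3")` divides the dot product 9 by the norms 5 and 3. Two rows with no column in common give 0, which is the case the mapper skips.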
Step 3 mapper (matrix transpose):
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Input line format: row \t column_value,column_value,...
        String[] rowAndLine = value.toString().split("\t");
        String row = rowAndLine[0];
        String[] lines = rowAndLine[1].split(",");
        for (int i = 0; i < lines.length; i++) {
            String column = lines[i].split("_")[0];
            String valueStr = lines[i].split("_")[1];
            // Swap the roles: the column becomes the key, the row moves into the value
            outKey.set(column);
            outValue.set(row + "_" + valueStr);
            context.write(outKey, outValue);
        }
    }
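The transpose mapper above only emits (column, row_value) pairs; grouping those pairs by key then yields the transposed rows. The sketch below shows both halves together in plain Java, under the assumption that the grouping side simply joins the values with commas. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: transposes a matrix stored as "row \t col_value,col_value" lines.
final class MatrixTranspose {

    static List<String> transpose(List<String> rows) {
        // Map side: re-key every cell by its column.
        Map<String, List<String>> byColumn = new TreeMap<>();
        for (String line : rows) {
            String[] rowAndColumns = line.split("\t");
            String row = rowAndColumns[0];
            for (String columnValue : rowAndColumns[1].split(",")) {
                String[] parts = columnValue.split("_");
                byColumn.computeIfAbsent(parts[0], k -> new ArrayList<>())
                        .add(row + "_" + parts[1]);
            }
        }
        // Reduce side (assumed): join the cells collected for each column.
        List<String> transposed = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byColumn.entrySet()) {
            transposed.add(entry.getKey() + "\t" + String.join(",", entry.getValue()));
        }
        return transposed;
    }
}
```

Applied to an item-by-user rating matrix, this produces the user-by-item layout that step 4 reads from the cache.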
Step 4 mapper (matrix multiplication against the cached, transposed right matrix):
    @Override
    public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
        // Input line format: row \t column_value,column_value,...
        String[] rowAndLine = text.toString().split("\t");
        String rowMatrix1 = rowAndLine[0];
        String[] columnValues1 = rowAndLine[1].split(",");

        for (String line : cacheList) {
            String[] cachedRowAndLine = line.split("\t");
            String rowMatrix2 = cachedRowAndLine[0];
            String[] columnValues2 = cachedRowAndLine[1].split(",");

            // Matrix multiplication: dot product of the left row and the cached row
            double result = 0;
            // Iterate over every column of the left row
            for (String columnValue : columnValues1) {
                String column1 = columnValue.split("_")[0];
                String value1 = columnValue.split("_")[1];
                // Iterate over every column of the cached row
                for (String column2 : columnValues2) {
                    if (column2.startsWith(column1 + "_")) {
                        String value2 = column2.split("_")[1];
                        // Accumulate the product
                        result += Double.parseDouble(value1) * Double.parseDouble(value2);
                    }
                }
            }
            // Skip zero entries to keep the output sparse
            if (result == 0) {
                continue;
            }
            outKey.set(rowMatrix1);
            outValue.set(rowMatrix2 + "_" + decimalFormat.format(result));
            context.write(outKey, outValue);
        }
    }
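The mapper above relies on the right-hand matrix being cached in transposed form, so that each cached row is one column of the original matrix and a cell of the product is a row-by-row dot product. A minimal plain-Java sketch of that idea follows. The names are hypothetical, the results are left unformatted rather than passed through a `DecimalFormat`, and the left row is indexed with a hash map instead of a nested scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: multiplies sparse matrices stored as "row \t col_value,..." lines.
final class SparseMatrixMultiply {

    // left: rows of the left matrix.
    // rightTransposed: rows of the transposed right matrix (one row per original column).
    // Returns "leftRow \t rightColumn_value" for every non-zero cell of the product.
    static List<String> multiply(List<String> left, List<String> rightTransposed) {
        List<String> cells = new ArrayList<>();
        for (String leftLine : left) {
            String[] leftParts = leftLine.split("\t");
            // Index the left row by column for constant-time lookups.
            Map<String, Double> leftRow = new HashMap<>();
            for (String columnValue : leftParts[1].split(",")) {
                String[] parts = columnValue.split("_");
                leftRow.put(parts[0], Double.parseDouble(parts[1]));
            }
            for (String rightLine : rightTransposed) {
                String[] rightParts = rightLine.split("\t");
                double dot = 0;
                for (String columnValue : rightParts[1].split(",")) {
                    String[] parts = columnValue.split("_");
                    Double leftValue = leftRow.get(parts[0]);
                    if (leftValue != null) {
                        dot += leftValue * Double.parseDouble(parts[1]);
                    }
                }
                if (dot != 0) {
                    cells.add(leftParts[0] + "\t" + rightParts[0] + "_" + dot);
                }
            }
        }
        return cells;
    }
}
```

In the ItemCF pipeline, `left` would hold the similarity rows from step 2 and `rightTransposed` the user-by-item rows from step 3, giving one predicted score per (item, user) pair.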
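Step 5 mapper (drop scores for items the user has already interacted with):
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Input line format (step 4 output): item \t user_score,user_score,...
        String[] itemAndScores = value.toString().split("\t");
        String itemMatrix1 = itemAndScores[0];
        String[] userScoreArrayMatrix1 = itemAndScores[1].split(",");

        // cacheList holds the rating matrix from step 1 in the same format
        for (String line : cacheList) {
            String[] cachedItemAndScores = line.split("\t");
            String itemMatrix2 = cachedItemAndScores[0];
            String[] userScoreArrayMatrix2 = cachedItemAndScores[1].split(",");

            // Only compare rows that refer to the same item ID
            if (itemMatrix1.equalsIgnoreCase(itemMatrix2)) {
                // Iterate over the columns of matrix1
                for (String userScoreMatrix1 : userScoreArrayMatrix1) {
                    boolean flag = false;
                    String userMatrix1 = userScoreMatrix1.split("_")[0];
                    String scoreMatrix1 = userScoreMatrix1.split("_")[1];
                    // Iterate over the columns of matrix2
                    for (String userScoreMatrix2 : userScoreArrayMatrix2) {
                        String userMatrix2 = userScoreMatrix2.split("_")[0];
                        if (userMatrix1.equalsIgnoreCase(userMatrix2)) {
                            // The user has already rated this item
                            flag = true;
                            break;
                        }
                    }
                    if (!flag) {
                        outKey.set(userMatrix1);
                        outValue.set(itemMatrix1 + "_" + scoreMatrix1);
                        // Write inside the loop so that every recommendation is emitted,
                        // not only the last one that was set
                        context.write(outKey, outValue);
                    }
                }
            }
        }
    }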
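The filtering in step 5 amounts to a set-membership test: a predicted (item, user) score is kept only if that user does not appear in the item's row of the original rating matrix. A minimal plain-Java sketch under that reading follows. The names are hypothetical, and rated items are dropped outright rather than written with a score of 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: removes already-rated items from the predicted scores.
final class RatedItemFilter {

    // recommendations: step 4 output, "item \t user_score,user_score,...".
    // ratings: step 1 output in the same format.
    // Returns "user \t item_score" for every pair the user has not rated yet.
    static List<String> filter(List<String> recommendations, List<String> ratings) {
        // Index the rating matrix: item -> set of users who rated it.
        Map<String, Set<String>> ratedBy = new HashMap<>();
        for (String line : ratings) {
            String[] parts = line.split("\t");
            Set<String> users = ratedBy.computeIfAbsent(parts[0], k -> new HashSet<>());
            for (String userScore : parts[1].split(",")) {
                users.add(userScore.split("_")[0]);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String line : recommendations) {
            String[] parts = line.split("\t");
            String item = parts[0];
            Set<String> users = ratedBy.getOrDefault(item, Collections.<String>emptySet());
            for (String userScore : parts[1].split(",")) {
                String[] fields = userScore.split("_");
                if (!users.contains(fields[0])) {
                    kept.add(fields[0] + "\t" + item + "_" + fields[1]);
                }
            }
        }
        return kept;
    }
}
```

Grouping the returned pairs by user then gives the final per-user recommendation list described in step 5.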
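The user-based variant is essentially the same as the item-based recommendation algorithm, except that users are used as the rows and items as the columns from the start.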
Content-based recommendation algorithm:
Item feature modeling
Step 1:
Step 2:
Step 3:
Step 4:
Step 5: set some of the values to 0
Code steps: