Implementing Item-Based Collaborative Filtering with MapReduce

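Contents

        • Preface
        • Step 1: Compute the user-item rating matrix from the known user behavior list
          • Input
          • Output
          • Code
            • Mapper implementation (Map1.java)
            • Reducer implementation (Red1.java)
            • Driver class (Run1.java)
        • Step 2: Derive the item-item similarity matrix from the user-item rating matrix
          • Input
          • Cache
          • Output
          • Code
            • Mapper implementation (Map2.java)
            • Reducer implementation (Red2.java)
            • Driver class (Run2.java)
        • Step 3: Transpose the user-item rating matrix
          • Input
          • Output
          • Code
            • Mapper implementation (Map3.java)
            • Reducer implementation (Red3.java)
            • Driver class (Run3.java)
        • Step 4: Item-item similarity matrix x user-item rating matrix = pseudo-recommendation list
          • Input
          • Cache
          • Output
          • Code
            • Mapper implementation (Map4.java)
            • Reducer implementation (Red4.java)
            • Driver class (Run4.java)
        • Step 5: Set to 0 the entries of the pseudo-recommendation list for items the user has already interacted with
          • Input
          • Cache
          • Output
          • Code
            • Mapper implementation (Map5.java)
            • Reducer implementation (Red5.java)
            • Driver class (Run5.java)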

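Preface

For an illustrated walkthrough of the item-based collaborative filtering algorithm, see the blog post "推荐系统----基于物品的协同过滤" (Recommender Systems: Item-Based Collaborative Filtering); the MapReduce implementation follows below. There is a lot of code, but most of it was produced by copy and paste. I think the core logic comes down to matrix transposition and matrix multiplication: step 2 is a matrix multiplication, step 3 is a transposition, and step 4 is another multiplication. For how to implement matrix transposition and multiplication in MapReduce, see the blog post "MapReduce实现矩阵乘法" (Matrix Multiplication with MapReduce).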

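Step 1: Compute the user-item rating matrix from the known user behavior list

Input

User behavior list

Each behavior carries a score: a click is 1 point, a search 3 points, a favorite 5 points, and a payment 10 points.

User Item Score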
A 1 1
C 3 5
B 2 3
B 5 3
B 6 5
A 2 10
C 3 10
C 4 5
C 1 5
A 1 1
A 6 5
A 4 3

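The input file looks like this; it is simply the user behavior list above. Map1 splits each line on commas, so each record is stored as userID,itemID,score.

[Image 1: the input file]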

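Output

User-item rating matrix

The goal of step 1 is to turn the user behavior list above into the following user-item rating matrix. Rows are items, columns are users, and repeated behaviors by the same user on the same item are summed.

Item A B C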
1 2.0 0.0 5.0
2 10.0 3.0 0.0
3 0.0 0.0 15.0
4 3.0 0.0 5.0
5 0.0 3.0 0.0
6 5.0 5.0 0.0

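The output file looks like this; it is the user-item rating matrix above. Each line holds an item ID, a tab, and comma-separated user_score pairs, and only the users who actually rated the item appear.

[Image 2: the step 1 output file]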
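To make the aggregation concrete, here is a minimal plain-Java sketch of what step 1 computes, without any Hadoop classes. The class name RatingMatrixSketch and the method aggregate are made up for this illustration, and the comma-separated record layout is assumed from the split(",") call in Map1.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch of step 1; not part of the original jar.
class RatingMatrixSketch {

    // Sums the scores per (item, user) pair. Outer key: item ID, inner key: user ID.
    static Map<String, Map<String, Integer>> aggregate(String[] records) {
        Map<String, Map<String, Integer>> matrix = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(",");   // assumed layout: userID,itemID,score
            String user = fields[0];
            String item = fields[1];
            int score = Integer.parseInt(fields[2]);
            matrix.computeIfAbsent(item, k -> new TreeMap<>())
                  .merge(user, score, Integer::sum);
        }
        return matrix;
    }

    public static void main(String[] args) {
        // The behavior list from the Input section.
        String[] records = {
            "A,1,1", "C,3,5", "B,2,3", "B,5,3", "B,6,5", "A,2,10",
            "C,3,10", "C,4,5", "C,1,5", "A,1,1", "A,6,5", "A,4,3"
        };
        aggregate(records).forEach((item, row) -> System.out.println(item + "\t" + row));
    }
}
```

User A clicked item 1 twice, so the two scores of 1 add up to the 2.0 shown in the matrix; likewise C's 5 and 10 on item 3 add up to 15.0.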

Code
Mapper implementation (Map1.java)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map1 extends Mapper<LongWritable, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue =new Text();
   
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        
        // each input line is userID,itemID,score
        String[] values = value.toString().split(",");
        String userID = values[0];
        String itemID = values[1];
        String score = values[2];
        
        outKey.set(itemID);
        outValue.set(userID + "_" + score);
        
        context.write(outKey, outValue);
    }
}
Reducer implementation (Red1.java)
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red1 extends Reducer<Text, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue = new Text();
    
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
    	String itemID = key.toString();
    	
    	Map<String,Integer> map = new HashMap<String,Integer>();
    	
    	for(Text value:values){
    		String userID = value.toString().split("_")[0];
    		String score = value.toString().split("_")[1];
    		if(map.get(userID) == null){
    			map.put(userID, Integer.valueOf(score));
    		}else{
    			Integer preScore = map.get(userID);
    			map.put(userID, preScore + Integer.valueOf(score));
    		}
    	}
    	
        StringBuilder sBuilder = new StringBuilder();
        for(Map.Entry<String,Integer> entry : map.entrySet()) {
            String userID = entry.getKey();
            String score = String.valueOf(entry.getValue());
            sBuilder.append(userID + "_" + score + ",");
        }
        String line = null;
        if(sBuilder.toString().endsWith(",")) {
            line = sBuilder.substring(0,sBuilder.length()-1);
        }
        
        outKey.set(key);
        outValue.set(line);
        
        context.write(outKey, outValue);
    }
}

Driver class (Run1.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Run1 {
    private static String inPath = "/user/hadoop/user_matrix.txt";
    
    private static String outPath = "/user/hadoop/output/Tuser_matrix.txt";
    
    private static String hdfs ="hdfs://Master:9000";
    
    public int run() {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", hdfs);
            Job job = Job.getInstance(conf,"step1");
            
            job.setJarByClass(Run1.class);
            job.setMapperClass(Map1.class);
            job.setReducerClass(Red1.class);
            
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            
            FileSystem fs = FileSystem.get(conf);
            Path inputPath = new Path(inPath);
            if(fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            
            Path outputPath = new Path(outPath);
            fs.delete(outputPath,true);
            
            FileOutputFormat.setOutputPath(job, outputPath);
            
            return job.waitForCompletion(true)?1:-1;
        
        }catch(IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return -1;
    }
    public static void main(String[] args) {
        int result = -1;
        result = new Run1().run();
        if(result==1) {
            System.out.println("step1 success...");
        }else if(result==-1) {
            System.out.println("step1 failed...");
        }
    }
}

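Step 2: Derive the item-item similarity matrix from the user-item rating matrix

Input

User-item rating matrix (the output of step 1)

See the output of step 1 above: the user-item rating matrix.

Cache

User-item rating matrix

The input and the cache are the same file.

Output

Item-item similarity matrix

Item 1 2 3 4 5 6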
1 1.0 0.36 0.93 0.99 0 0.26
2 0.36 1.0 0 0.49 0.29 0.88
3 0.93 0 1.0 0.86 0 0
4 0.99 0.49 0.86 1.0 0 0.36
5 0 0.29 0 0 1.0 0.71
6 0.26 0.88 0 0.36 0.71 1.0

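Step 2 has two parts. One part is a matrix multiplication that computes the dot product of two row vectors; the other computes the Euclidean norm of each row (the square root of the sum of its squared entries). The dot product is the numerator and the product of the two norms is the denominator, which gives the cosine similarity of the two items.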
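The cosine computation can be tried out on two rows of the rating matrix with a small plain-Java sketch. The class CosineSketch and its method names are invented for this example; the row strings use the user_score layout that Red1 writes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of the similarity computed in step 2.
class CosineSketch {

    // Parses a row such as "A_2,C_5" into {A=2.0, C=5.0}.
    static Map<String, Double> parseRow(String row) {
        Map<String, Double> vector = new HashMap<>();
        for (String pair : row.split(",")) {
            String[] keyAndValue = pair.split("_");
            vector.put(keyAndValue[0], Double.parseDouble(keyAndValue[1]));
        }
        return vector;
    }

    // Euclidean norm: square root of the sum of squared entries.
    static double norm(Map<String, Double> vector) {
        double sumOfSquares = 0;
        for (double x : vector.values()) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product over shared users, divided by the product of the two norms.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        return dot / (norm(a) * norm(b));
    }

    public static void main(String[] args) {
        Map<String, Double> item1 = parseRow("A_2,C_5");
        Map<String, Double> item2 = parseRow("A_10,B_3");
        System.out.println(cosine(item1, item2));
    }
}
```

For items 1 and 2 only user A is shared, so the dot product is 2 x 10 = 20 and the result is roughly 0.36, the value in row 1, column 2 of the similarity matrix.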

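In HDFS the output file actually looks like this. Each line holds an item ID, a tab, and comma-separated item_similarity pairs; pairs whose similarity is 0 are not written.

[Image 3: the step 2 output file]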

Code
Mapper implementation (Map2.java)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map2 extends Mapper<LongWritable, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue =new Text();
    private DecimalFormat df = new DecimalFormat("0.00");
    
    private List<String> cacheList = new ArrayList<String>();
   
    protected void setup(Context context)
            throws IOException, InterruptedException {
        super.setup(context);
        FileReader fr = new FileReader("itemsource1");
        BufferedReader br = new BufferedReader(fr);
        String line = null;
        while((line=br.readLine())!=null) {
            cacheList.add(line);
        }
        fr.close();
        br.close();
    }

    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        String row_matrix1 = value.toString().split("\t")[0];
        String[] column_value_array_matrix1 = value.toString().split("\t")[1].split(",");
        
        double denominator1 = 0;
        for(String column_value:column_value_array_matrix1){
        	String score = column_value.split("_")[1];
        	denominator1 += Double.valueOf(score)*Double.valueOf(score);
        	
        }
        denominator1 = Math.sqrt(denominator1);
        
        for(String line:cacheList) {
            String row_matrix2 = line.toString().split("\t")[0];
            String[] column_value_array_matrix2 = line.toString().split("\t")[1].split(",");
            
            double denominator2 = 0;
            for(String column_value:column_value_array_matrix2){
            	String score = column_value.split("_")[1];
            	denominator2 += Double.valueOf(score)*Double.valueOf(score);
            	
            }
            denominator2 = Math.sqrt(denominator2);
            
            
            // dot product over the users that rated both items
            int numerator = 0;
            for(String column_value_matrix1:column_value_array_matrix1) {
                String column_matrix1 = column_value_matrix1.split("_")[0];
                String value_matrix1 = column_value_matrix1.split("_")[1];
                
                for(String column_value_matrix2:column_value_array_matrix2) {
                    if(column_value_matrix2.startsWith(column_matrix1 + "_")) {
                        String value_matrix2 = column_value_matrix2.split("_")[1];
                        numerator += Integer.valueOf(value_matrix1) *Integer.valueOf(value_matrix2); 
                    }
                }
            }
            // cosine similarity = dot product / (norm of row 1 * norm of row 2)
            double cos = numerator / (denominator1*denominator2);
            if(cos == 0){
            	continue;
            }
            outKey.set(row_matrix1);
            outValue.set(row_matrix2+"_"+df.format(cos));
            context.write(outKey, outValue);
        }
    }
}

Reducer implementation (Red2.java)
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red2 extends Reducer<Text, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue = new Text();
    
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for(Text text:values) {
            sb.append(text+",");
        }
        String line = null;
        if(sb.toString().endsWith(",")) {
            line = sb.substring(0,sb.length()-1);
        }
        
        outKey.set(key);
        outValue.set(line);
        
        context.write(outKey, outValue);
    } 
}
Driver class (Run2.java)
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Run2 {
    private static String inPath = "/user/hadoop/output/Tuser_matrix.txt";
    private static String outPath = "/user/hadoop/output/step2_output.txt";
    private static String cache = "/user/hadoop/output/Tuser_matrix.txt/part-r-00000";  
    private static String hdfs ="hdfs://Master:9000";   
    public int run() throws URISyntaxException {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", hdfs);
            Job job = Job.getInstance(conf,"step2");
            
            job.addCacheArchive(new URI(cache+"#itemsource1"));
            
            job.setJarByClass(Run2.class);
            job.setMapperClass(Map2.class);
            job.setReducerClass(Red2.class);
            
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            
            FileSystem fs = FileSystem.get(conf);
            Path inputPath = new Path(inPath);
            if(fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            
            Path outputPath = new Path(outPath);
            fs.delete(outputPath,true);
            
            FileOutputFormat.setOutputPath(job, outputPath);
            System.out.println("111111...");
            return job.waitForCompletion(true)?1:-1;
        
        } catch(IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch(URISyntaxException e) {
            e.printStackTrace();
        }
        return -1;                                                      
    }
    public static void main(String[] args) {
        try {
            int result=-1;
            result = new Run2().run();
        
            if(result == 1) {
                System.out.println("step2 success...");
            }
            else if(result == -1){
                System.out.println("step2 failed...");
            }
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }
}

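Step 3: Transpose the user-item rating matrix

Input

The output of step 1, i.e. the user-item rating matrix

Output

The transpose of the step 1 output, i.e. the transposed user-item rating matrix

In HDFS the output file actually looks like this. Each line holds a user ID, a tab, and comma-separated item_score pairs.

[Image 4: the step 3 output file]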
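The transposition can be sketched in plain Java on the same line format. TransposeSketch is a name made up for this example; the grouping by column mirrors what the shuffle does between Map3 and Red3, except that here the result is sorted and the order inside a line is deterministic, which a real reducer does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch of the transposition done in step 3.
class TransposeSketch {

    // Input lines: "row<TAB>col_value,col_value". Output lines use the same
    // layout with rows and columns swapped.
    static List<String> transpose(List<String> lines) {
        Map<String, List<String>> cellsByColumn = new TreeMap<>();
        for (String line : lines) {
            String[] rowAndCells = line.split("\t");
            String row = rowAndCells[0];
            for (String cell : rowAndCells[1].split(",")) {
                String[] columnAndValue = cell.split("_");
                // the old column becomes the key, the old row moves into the cell
                cellsByColumn.computeIfAbsent(columnAndValue[0], k -> new ArrayList<>())
                             .add(row + "_" + columnAndValue[1]);
            }
        }
        List<String> transposed = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : cellsByColumn.entrySet()) {
            transposed.add(entry.getKey() + "\t" + String.join(",", entry.getValue()));
        }
        return transposed;
    }

    public static void main(String[] args) {
        // First two rows of the rating matrix from step 1.
        List<String> lines = Arrays.asList("1\tA_2,C_5", "2\tA_10,B_3");
        transpose(lines).forEach(System.out::println);
    }
}
```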

Code
Mapper implementation (Map3.java)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map3 extends Mapper<LongWritable, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue =new Text();
    
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        String[] rowAndLine = value.toString().split("\t");
        
        String row = rowAndLine[0];
        String[] lines = rowAndLine[1].split(",");

        for(int i=0;i<lines.length;i++) {
            String column = lines[i].split("_")[0];
            String valueStr = lines[i].split("_")[1];
            //key:column value:rownumber_value
            outKey.set(column);
            outValue.set(row+"_"+valueStr);
            context.write(outKey, outValue);
        }
    }
}
Reducer implementation (Red3.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red3 extends Reducer<Text, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue = new Text();
    
    protected void reduce(Text key, Iterable<Text> values,Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for(Text text:values) {
            sb.append(text+",");
        }
        String line = null;
        if(sb.toString().endsWith(",")) {
            line = sb.substring(0,sb.length()-1);
        }
        outKey.set(key);
        outValue.set(line);
        
        context.write(outKey, outValue);
    }   
}
Driver class (Run3.java)
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Run3 {
    private static String inPath = "/user/hadoop/output/Tuser_matrix.txt";
    private static String outPath = "/user/hadoop/output/step3_output.txt";
    private static String hdfs ="hdfs://Master:9000";
    
    public int run() {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", hdfs);
            Job job = Job.getInstance(conf,"step3");
            
            job.setJarByClass(Run3.class);
            job.setMapperClass(Map3.class);
            job.setReducerClass(Red3.class);
            
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            
            FileSystem fs = FileSystem.get(conf);
            Path inputPath = new Path(inPath);
            if(fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            
            Path outputPath = new Path(outPath);
            fs.delete(outputPath,true);
            FileOutputFormat.setOutputPath(job, outputPath);
            return job.waitForCompletion(true)?1:-1;
        
        }catch(IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return -1;
    }
    public static void main(String[] args) {
        int result = -1;
        result = new Run3().run();
        if(result==1) {
            System.out.println("step3 success...");
        }else if(result==-1) {
            System.out.println("step3 failed...");
        }
    }
}

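Step 4: Item-item similarity matrix x user-item rating matrix = pseudo-recommendation list

Input

The output of step 2, i.e. the item-item similarity matrix

Cache

The output of step 3, i.e. the transposed user-item rating matrix

Output

Pseudo-recommendation list

Item A B C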
1 9.9 2.4 23.9
2 16.6 8.3 4.3
3 4.4 0 24.0
4 11.7 3.3 22.9
5 6.5 7.4 0
6 15.4 9.8 3.1

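In HDFS the output file actually looks like this. Each line holds an item ID, a tab, and comma-separated user_score pairs.

[Image 5: the step 4 output file]

It may look as if this step already yields the final recommendation list, but it does not: the pseudo-recommendation list alone cannot tell us which items to recommend to which user. We still need to set to 0 the scores of items the user has already interacted with (items the user has already bought, favorited, or clicked are not recommended). Step 5 shows how.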
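A single cell of the pseudo-recommendation list can be checked by hand with a short plain-Java sketch. PseudoScoreSketch and its methods are invented for this example. The similarity row is item 1's row from the step 2 table, and the rating rows are user A's and user C's rows from the transposed matrix; the exact strings are my reconstruction of the file layout from the tables, not copied from the real output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of one cell of the step 4 product.
class PseudoScoreSketch {

    // Parses a row such as "1_1.00,2_0.36" into {1=1.0, 2=0.36}.
    static Map<String, Double> parseRow(String row) {
        Map<String, Double> vector = new HashMap<>();
        for (String pair : row.split(",")) {
            String[] keyAndValue = pair.split("_");
            vector.put(keyAndValue[0], Double.parseDouble(keyAndValue[1]));
        }
        return vector;
    }

    // Sum over shared item IDs of similarity * rating.
    static double dot(String similarityRow, String ratingRow) {
        Map<String, Double> similarities = parseRow(similarityRow);
        double sum = 0;
        for (Map.Entry<String, Double> rating : parseRow(ratingRow).entrySet()) {
            Double similarity = similarities.get(rating.getKey());
            if (similarity != null) {
                sum += similarity * rating.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String item1Similarities = "1_1.00,2_0.36,3_0.93,4_0.99,6_0.26";
        String userARatings = "1_2,2_10,4_3,6_5";
        String userCRatings = "1_5,3_15,4_5";
        System.out.println(dot(item1Similarities, userARatings));
        System.out.println(dot(item1Similarities, userCRatings));
    }
}
```

For item 1 and user A the sum is 1.00 x 2 + 0.36 x 10 + 0.99 x 3 + 0.26 x 5 = 9.87, which the table rounds to 9.9; for user C it is 5 + 13.95 + 4.95 = 23.9.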

Code
Mapper implementation (Map4.java)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map4 extends Mapper<LongWritable, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue =new Text();
    private DecimalFormat df = new DecimalFormat("0.00");
    private List<String> cacheList = new ArrayList<String>();
    
    protected void setup(Context context)
            throws IOException, InterruptedException {
        super.setup(context);
        FileReader fr = new FileReader("itemsource2");
        BufferedReader br = new BufferedReader(fr);
        String line = null;
        while((line=br.readLine())!=null) {
            cacheList.add(line);
        }
        fr.close();
        br.close();
    }

    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
            throws IOException, InterruptedException {
        String row_matrix1 = value.toString().split("\t")[0];
        String[] column_value_array_matrix1 = value.toString().split("\t")[1].split(",");
         
        for(String line:cacheList) {
            String row_matrix2 = line.toString().split("\t")[0];
            String[] column_value_array_matrix2 = line.toString().split("\t")[1].split(",");
            
            // dot product of one similarity row and one user's rating row
            double dotProduct = 0;
            for(String column_value_matrix1:column_value_array_matrix1) {
                String column_matrix1 = column_value_matrix1.split("_")[0];
                String value_matrix1 = column_value_matrix1.split("_")[1];
                
                for(String column_value_matrix2:column_value_array_matrix2) {
                    if(column_value_matrix2.startsWith(column_matrix1 + "_")) {
                        String value_matrix2 = column_value_matrix2.split("_")[1];
                        dotProduct += Double.valueOf(value_matrix1) *Integer.valueOf(value_matrix2); 
                    }
                }
            }

            outKey.set(row_matrix1);
            outValue.set(row_matrix2+"_"+df.format(dotProduct));
            context.write(outKey, outValue);
        }
    }
}
Reducer implementation (Red4.java)
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red4 extends Reducer<Text, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue = new Text();
    
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for(Text text:values) {
            sb.append(text+",");
        }
        String line = null;
        if(sb.toString().endsWith(",")) {
            line = sb.substring(0,sb.length()-1);
        }
        
        outKey.set(key);
        outValue.set(line);
        
        context.write(outKey, outValue);
    }   
}
Driver class (Run4.java)
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Run4 {
    private static String inPath = "/user/hadoop/output/step2_output.txt";
    private static String outPath = "/user/hadoop/output/step4_output.txt";
    private static String cache = "/user/hadoop/output/step3_output.txt/part-r-00000";
    private static String hdfs ="hdfs://Master:9000";
       
    public int run() throws URISyntaxException {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", hdfs);
            Job job = Job.getInstance(conf,"step4");
            
            job.addCacheArchive(new URI(cache+"#itemsource2"));
            
            job.setJarByClass(Run4.class);
            job.setMapperClass(Map4.class);
            job.setReducerClass(Red4.class);
            
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            
            FileSystem fs = FileSystem.get(conf);
            Path inputPath = new Path(inPath);
            if(fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            
            Path outputPath = new Path(outPath);
            fs.delete(outputPath,true);
            
            FileOutputFormat.setOutputPath(job, outputPath);
            System.out.println("111111...");
            return job.waitForCompletion(true)?1:-1;
        
        } catch(IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch(URISyntaxException e) {
            e.printStackTrace();
        }
        return -1;                                                      
    }
    public static void main(String[] args) {
        try {
            int result=-1;
            result = new Run4().run();
        
            if(result == 1) {
                System.out.println("step4 success...");
            }
            else if(result == -1){
                System.out.println("step4 failed...");
            }
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }
}

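Step 5: Set to 0 the entries of the pseudo-recommendation list for items the user has already interacted with

Input

The output of step 4, i.e. the pseudo-recommendation list

Cache

The output of step 1, i.e. the user-item rating matrix

Output

Final recommendation list

User 1 2 3 4 5 6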
A 0 0 4.44 0 6.45 0
B 2.38 0 0 3.27 0 0
C 0 4.25 0 0 0 3.10

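In HDFS the output file actually looks like this. Each line holds a user ID, a tab, and comma-separated item_score pairs for the items that user has not yet interacted with.

[Image 6: the step 5 output file]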
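The filtering rule can be sketched in plain Java for a single item. FilterSketch is a made-up name; the two input lines follow the item, tab, user_score layout of the step 4 and step 1 outputs, with example values taken from the tables above (9.87 and 23.90 are the unrounded scores behind the 9.9 and 23.9 in the pseudo-recommendation list).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch of the filtering done in step 5.
class FilterSketch {

    // Keeps only the predicted scores of users who have no rating for the item.
    // Returns records of the form "user<TAB>item_score".
    static List<String> filter(String predictedLine, String ratedLine) {
        String[] predicted = predictedLine.split("\t");
        String[] rated = ratedLine.split("\t");
        List<String> kept = new ArrayList<>();
        if (!predicted[0].equals(rated[0])) {
            return kept;   // the two lines describe different items
        }
        Set<String> usersWithBehavior = new HashSet<>();
        for (String cell : rated[1].split(",")) {
            usersWithBehavior.add(cell.split("_")[0]);
        }
        for (String cell : predicted[1].split(",")) {
            String[] userAndScore = cell.split("_");
            if (!usersWithBehavior.contains(userAndScore[0])) {
                kept.add(userAndScore[0] + "\t" + predicted[0] + "_" + userAndScore[1]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String predicted = "1\tA_9.87,B_2.38,C_23.90";   // item 1 in the pseudo-recommendation list
        String rated = "1\tA_2,C_5";                      // item 1 in the rating matrix
        filter(predicted, rated).forEach(System.out::println);
    }
}
```

Users A and C already interacted with item 1, so only B's score of 2.38 survives, which is the entry in row B, column 1 of the final recommendation list.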

Code
Mapper implementation (Map5.java)
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;


public class Map5 extends Mapper<LongWritable, Text, Text, Text> {
	private Text outKey = new Text();
	private Text outValue = new Text();

	private List<String> cacheList = new ArrayList<String>();
	private DecimalFormat df = new DecimalFormat("0.00");

	@Override
	protected void setup(Context context) throws IOException, InterruptedException {
		super.setup(context);

		FileReader fr = new FileReader("itemsource3");
		BufferedReader br = new BufferedReader(fr);

		String line = null;
		while ((line = br.readLine()) != null) {
			cacheList.add(line);
		}

		br.close();
		fr.close();
	}

	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		String item_matrix1 = value.toString().split("\t")[0];
		String[] user_score_array_matrix1 = value.toString().split("\t")[1].split(",");

		for (String line : cacheList) {
			String item_matrix2 = line.toString().split("\t")[0];
			String[] user_score_array_matrix2 = line.toString().split("\t")[1].split(",");

			// same item ID in both matrices
			if (item_matrix1.equals(item_matrix2)) {
				for (String user_score_matrix1 : user_score_array_matrix1) {
					boolean flag = false;
					String user_matrix1 = user_score_matrix1.split("_")[0];
					String score_matrix1 = user_score_matrix1.split("_")[1];

					for (String user_score_matrix2 : user_score_array_matrix2) {
						String user_matrix2 = user_score_matrix2.split("_")[0];
						if (user_matrix1.equals(user_matrix2)) {
							flag = true;
						}
					}

					if (false == flag) {
						outKey.set(user_matrix1);
						outValue.set(item_matrix1 + "_" + score_matrix1);
						context.write(outKey, outValue);
					}
				}
			}
		}
	}
}
Reducer implementation (Red5.java)
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red5 extends Reducer<Text, Text, Text, Text>{
    private Text outKey = new Text();
    private Text outValue = new Text();
    
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for(Text text:values) {
            sb.append(text+",");
        }
        String line = null;
        if(sb.toString().endsWith(",")) {
            line = sb.substring(0,sb.length()-1);
        }
        
        outKey.set(key);
        outValue.set(line);
        
        context.write(outKey, outValue);
    }   
}
Driver class (Run5.java)
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Run5 {
    private static String inPath = "/user/hadoop/output/step4_output.txt";
    private static String outPath = "/user/hadoop/output/step5_output.txt";
    private static String cache = "/user/hadoop/output/Tuser_matrix.txt/part-r-00000";
    private static String hdfs ="hdfs://Master:9000";
       
    public int run() throws URISyntaxException {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", hdfs);
            Job job = Job.getInstance(conf,"step5");
            
            job.addCacheArchive(new URI(cache+"#itemsource3"));
            
            job.setJarByClass(Run5.class);
            job.setMapperClass(Map5.class);
            job.setReducerClass(Red5.class);
            
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            
            FileSystem fs = FileSystem.get(conf);
            Path inputPath = new Path(inPath);
            if(fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            
            Path outputPath = new Path(outPath);
            fs.delete(outputPath,true);
            
            FileOutputFormat.setOutputPath(job, outputPath);
            System.out.println("111111...");
            return job.waitForCompletion(true)?1:-1;
        
        } catch(IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch(URISyntaxException e) {
            e.printStackTrace();
        }
        return -1;                                                      
    }
    public static void main(String[] args) {
        try {
            int result=-1;
            result = new Run5().run();
        
            if(result == 1) {
                System.out.println("step5 success...");
            }
            else if(result == -1){
                System.out.println("step5 failed...");
            }
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }
}

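At this point the jar contains 5 MapReduce jobs and 15 classes. If you are interested, try integrating them so that they run from a single class; that is not demonstrated here.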
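One possible shape for such an integration, as a rough plain-Java sketch only: run the steps in order and stop at the first one that does not return 1, the success code used by the run() methods above. PipelineSketch and runAll are hypothetical names, and the lambdas in main are stand-ins; in the real jar each one would presumably be something like () -> new Run1().run().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of chaining the five steps; not part of the original jar.
class PipelineSketch {

    // Runs the steps in order. Returns 0 if every step returns 1,
    // otherwise the 1-based number of the first step that failed.
    static int runAll(List<Callable<Integer>> steps) {
        for (int i = 0; i < steps.size(); i++) {
            int result;
            try {
                result = steps.get(i).call();
            } catch (Exception e) {
                result = -1;   // treat an exception as a failed step
            }
            if (result != 1) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<Callable<Integer>> steps = new ArrayList<>();
        steps.add(() -> 1);    // stand-in for step 1
        steps.add(() -> 1);    // stand-in for step 2
        steps.add(() -> -1);   // stand-in for a failing step 3
        steps.add(() -> 1);    // never reached
        System.out.println("first failing step: " + runAll(steps));
    }
}
```

Stopping early matters here because each step reads the output directory of an earlier one, so running step 4 after a failed step 2 would only fail again or read stale data.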
