MapReduce Multi-Table Join (with Example)

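Requirement: join the two tab-separated files below on the city ID, so that each factory is listed with the city it belongs to.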

 

address.txt:

1    Beijing
2    Guangzhou
3    Shenzhen
4    Xian

factory.txt:

Beijing Red Star    1
Shenzhen Thunder    3
Guangzhou Honda    2
Beijing Rising    1
Guangzhou Development Bank    2
Tencent    3
Back of Beijing    1

Result:
             factory                       city

             Beijing Red Star              Beijing
             Shenzhen Thunder              Shenzhen
             Guangzhou Honda               Guangzhou
             Beijing Rising                Beijing
             Guangzhou Development Bank    Guangzhou
             Tencent                       Shenzhen
             Back of Beijing               Beijing

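Analysis:
              map output:------<1, "0,Beijing Red Star">  ,  <1, "0,Back of Beijing">  ,  <1, "1,Beijing">
              reduce input:----<1, [ "0,Beijing Red Star", "0,Back of Beijing", "1,Beijing" ]>

             A join needs a tag. Both tables emit the shared city ID as the key, and every value is prefixed with a tag ("1" for a city from address.txt, "0" for a factory from factory.txt) so that the reducer can tell which table each value in the group came from.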
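The flow described in the analysis can be tried without a cluster. The following is a minimal in-memory sketch in plain Java, not the Hadoop job itself: the class name TaggedJoinSketch and its methods tag and join are hypothetical, a TreeMap stands in for the shuffle phase, and the tags assume "1" for cities and "0" for factories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedJoinSketch {
    // "Map" step: turn one tab-separated line into {joinKey, taggedValue}.
    static String[] tag(String fileName, String line) {
        String[] f = line.split("\t");
        if (fileName.endsWith("address.txt")) {
            return new String[] {f[0], "1," + f[1]}; // id -> "1,city"
        }
        return new String[] {f[1], "0," + f[0]};     // id -> "0,factory"
    }

    // "Shuffle" + "reduce": group values by key, then pair factories with cities.
    static List<String> join(List<String[]> tagged) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : tagged) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> cities = new ArrayList<>();
            List<String> factories = new ArrayList<>();
            for (String v : values) {
                String[] parts = v.split(",", 2); // parts[0] = tag, parts[1] = payload
                if (parts[0].equals("1")) {
                    cities.add(parts[1]);
                } else {
                    factories.add(parts[1]);
                }
            }
            // Inner join: a key with no factory (or no city) produces no rows.
            for (String city : cities) {
                for (String factory : factories) {
                    out.add(factory + "\t" + city);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> tagged = new ArrayList<>();
        tagged.add(tag("address.txt", "1\tBeijing"));
        tagged.add(tag("address.txt", "4\tXian"));
        tagged.add(tag("factory.txt", "Beijing Red Star\t1"));
        tagged.add(tag("factory.txt", "Back of Beijing\t1"));
        join(tagged).forEach(System.out::println);
    }
}
```

Because this behaves as an inner join, a city with no factories (Xian in the sample data) does not appear in the output.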
1.Mapper.class
 

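import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		String line = value.toString();
		FileSplit inputSplit = (FileSplit) context.getInputSplit();
		String filename = inputSplit.getPath().toString(); // path of the file this split belongs to
		// Skip header lines that only name the file.
		if (line.contains("address.txt") || line.contains("factory.txt")) {
			return;
		}
		String[] fields = line.split("\t"); // fields are tab-separated
		if (fields.length < 2) {
			return; // skip blank or malformed lines
		}
		if (filename.endsWith("address.txt")) {
			// address.txt is (id, city): key = id, value tagged "1"
			context.write(new Text(fields[0]), new Text("1," + fields[1]));
		} else {
			// factory.txt is (factory, id): key = id, value tagged "0"
			context.write(new Text(fields[1]), new Text("0," + fields[0]));
		}
	}
}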

2.Reducer.class

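import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class JoinReduce extends Reducer<Text, Text, Text, Text> {
	@Override
	protected void setup(Context context)
			throws IOException, InterruptedException {
		// Header row; setup() runs once per reduce task.
		context.write(new Text("factory"), new Text("city"));
	}

	@Override
	protected void reduce(Text key, Iterable<Text> values, Context context)
			throws IOException, InterruptedException {
		List<String> left = new ArrayList<>();  // city names (tag "1")
		List<String> right = new ArrayList<>(); // factory names (tag "0")
		for (Text v : values) {
			String[] parts = v.toString().split(",", 2); // parts[0] = tag, parts[1] = payload
			if ("1".equals(parts[0])) {
				left.add(parts[1]);
			} else {
				right.add(parts[1]);
			}
		}
		// Pair every factory with every city that shares this key (inner join).
		for (int i = 0; i < left.size(); i++) {
			for (int j = 0; j < right.size(); j++) {
				context.write(new Text(right.get(j)), new Text(left.get(i)));
			}
		}
	}
}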
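One detail of the reducer is worth isolating: how the tag is read back from each value. A substring test such as contains("1") would misclassify any factory whose name happens to include the digit 1, and split(",")[1] would cut off a name that contains a comma. The sketch below shows a prefix-based alternative; the class TagParsingSketch, its methods, and the sample values "0,Plant 1" and "0,Acme, Inc" are hypothetical and not part of the data set above.

```java
class TagParsingSketch {
    // A value belongs to the city table only if it starts with the tag "1,".
    static boolean isCity(String tagged) {
        return tagged.startsWith("1,");
    }

    // Split on the first comma only, so commas inside the payload survive.
    static String payload(String tagged) {
        return tagged.split(",", 2)[1];
    }

    public static void main(String[] args) {
        String factory = "0,Plant 1";    // hypothetical factory name containing "1"
        String withComma = "0,Acme, Inc"; // hypothetical factory name containing ","
        System.out.println(factory.contains("1")); // substring test: misreads it as a city
        System.out.println(isCity(factory));       // prefix test: correctly a factory
        System.out.println(withComma.split(",")[1]); // unlimited split truncates the name
        System.out.println(payload(withComma));      // limited split keeps it whole
    }
}
```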

3.Driver.class

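import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDriver {
	public static void main(String[] args)
			throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		conf.set("mapred.job.queue.name", "order");

		// Delete the output directory if it already exists; otherwise the job fails on start.
		Path outfile = new Path("file:///D:/输出结果/joinout");
		FileSystem fs = outfile.getFileSystem(conf);
		if (fs.exists(outfile)) {
			fs.delete(outfile, true);
		}

		Job job = Job.getInstance(conf);
		job.setJarByClass(JoinDriver.class);
		job.setJobName("Multi-table Join");
		job.setMapperClass(JoinMapper.class);
		job.setReducerClass(JoinReduce.class);

		// Map output and final output both use Text keys and Text values.
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);

		// The input directory holds both address.txt and factory.txt.
		FileInputFormat.addInputPath(job, new Path("file:///D:/测试数据/join连接/"));
		FileOutputFormat.setOutputPath(job, outfile);

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}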

4.Run result

Result:
             factory                       city

             Beijing Red Star              Beijing
             Shenzhen Thunder              Shenzhen
             Guangzhou Honda               Guangzhou
             Beijing Rising                Beijing
             Guangzhou Development Bank    Guangzhou
             Tencent                       Shenzhen
             Back of Beijing               Beijing

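Summary: when a join is used to relate tables, the crucial step is to fix the join field and the tag. The join field is emitted as the key so that matching records from both tables reach the same reduce call, the tag on each value lets the reducer separate the grouped values by source table, and the matched pairs are then written out with context.write().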
 










 
