MapReduce job composition: sequential, iterative, composite (dependency-based), and chained

1. Sequential composition

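Sequential composition runs jobs in a fixed order, for example mapreduce1 --> mapreduce2 --> mapreduce3.

That is, the output of mapreduce1 is the input of mapreduce2, and the output of mapreduce2 is the input of mapreduce3.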

The code snippet is as follows:

    Configuration conf = new Configuration();
    String inPath1 = "hdfs://hadoop0:9000/user/root/3D/";
    String outPath1 = "hdfs://hadoop0:9000/user/root/3DZout/";
    String outPath2 = "hdfs://hadoop0:9000/user/root/3DZout2/";
    String outPath3 = "hdfs://hadoop0:9000/user/root/3DZout3/";

    // job1 configuration: reads the raw input
    Job job1 = Job.getInstance(conf);
    job1.setJarByClass(Mode.class);
    job1.setMapperClass(Map1.class);
    job1.setReducerClass(Reduce1.class);
    job1.setMapOutputKeyClass(Text.class);
    job1.setMapOutputValueClass(IntWritable.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job1, new Path(inPath1));
    FileOutputFormat.setOutputPath(job1, new Path(outPath1));
    // Block until job1 finishes; abort the chain if it failed
    if (!job1.waitForCompletion(true)) {
        System.exit(1);
    }

    // job2 configuration: reads job1's output
    Job job2 = Job.getInstance(conf);
    job2.setJarByClass(Mode.class);
    job2.setMapperClass(Map2.class);
    job2.setReducerClass(Reduce2.class);
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job2, new Path(outPath1));
    FileOutputFormat.setOutputPath(job2, new Path(outPath2));
    if (!job2.waitForCompletion(true)) {
        System.exit(1);
    }

    // job3 configuration: reads job2's output
    Job job3 = Job.getInstance(conf);
    job3.setJarByClass(Mode.class);
    job3.setMapperClass(Map3.class);
    job3.setReducerClass(Reduce3.class);
    job3.setMapOutputKeyClass(Text.class);
    job3.setMapOutputValueClass(IntWritable.class);
    job3.setOutputKeyClass(Text.class);
    job3.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job3, new Path(outPath2));
    FileOutputFormat.setOutputPath(job3, new Path(outPath3));
    System.exit(job3.waitForCompletion(true) ? 0 : 1);

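When this configuration code runs, the sub-jobs execute one after another in the specified order. Because each sub-job consumes the output of the previous one, it must not start until the previous sub-job has finished. This is guaranteed by job.waitForCompletion(true), which blocks until the job completes and returns whether it succeeded.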
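The control flow of sequential composition can be sketched without a cluster. The in-memory Java sketch below is an illustration under stated assumptions, not the Hadoop API: `SequentialDriver` and `runAll` are hypothetical names, each "job" is a `UnaryOperator<List<String>>` that turns its input records into output records, and returning `null` stands in for `waitForCompletion(true)` returning `false`. The driver runs the jobs strictly one after another and stops at the first failure.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// In-memory sketch of sequential composition (not the Hadoop API).
// A hypothetical "job" maps its input records to output records; null models a failed job.
final class SequentialDriver {
    private SequentialDriver() {}

    static List<String> runAll(List<String> input, List<UnaryOperator<List<String>>> jobs) {
        List<String> data = input;
        for (int i = 0; i < jobs.size(); i++) {
            // Like waitForCompletion(true), this call returns only when the job is done,
            // so job i+1 never starts before job i has finished.
            List<String> output = jobs.get(i).apply(data);
            if (output == null) {
                throw new IllegalStateException("job" + (i + 1) + " failed; later jobs were not run");
            }
            data = output; // the output of job i becomes the input of job i+1
        }
        return data;
    }
}
```

In a Hadoop driver, the equivalent is to check the boolean returned by each `waitForCompletion(true)` call before submitting the next job.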

2. Iterative composition

Iteration can be thought of as a for or while loop that ends once certain conditions are met.

The author is still researching an iterative MapReduce algorithm; complete source code will be provided later.
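Since the author's iterative code is not available, here is only a hedged in-memory sketch of the usual driver-loop pattern. The workload is an assumption chosen for illustration (connected-component labelling by minimum-label propagation, a common iterative MapReduce example), and `IterativeLabels`, `round` and `run` are hypothetical names. Each round "maps" every node's label to its neighbours and "reduces" by keeping the smallest label; the loop ends when a round changes nothing or `maxIterations` is reached. In a real Hadoop driver, each round would be a separate Job whose input path is the previous round's output path.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of iterative composition (not the Hadoop API).
// Assumed workload: connected components by propagating the smallest node id.
final class IterativeLabels {
    private IterativeLabels() {}

    // One round = one "MapReduce job": every node sends its label to each neighbour (map),
    // and every node keeps the smallest label it received, including its own (reduce).
    static Map<Integer, Integer> round(Map<Integer, List<Integer>> graph, Map<Integer, Integer> labels) {
        Map<Integer, Integer> next = new HashMap<>(labels);
        for (Map.Entry<Integer, List<Integer>> node : graph.entrySet()) {
            int label = labels.get(node.getKey());
            for (int neighbour : node.getValue()) {
                next.merge(neighbour, label, Integer::min);
            }
        }
        return next;
    }

    // Driver loop: the output of round i is the input of round i+1.
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> graph, int maxIterations) {
        Map<Integer, Integer> labels = new HashMap<>();
        for (Integer node : graph.keySet()) {
            labels.put(node, node); // each node starts labelled with its own id
        }
        for (int i = 0; i < maxIterations; i++) {
            Map<Integer, Integer> next = round(graph, labels);
            if (next.equals(labels)) {
                break; // converged: this is the loop-exit condition
            }
            labels = next;
        }
        return labels;
    }
}
```

The iteration cap guards against inputs or bugs that would otherwise keep the loop running forever.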


3. Composition with complex dependencies

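When handling complex requirements, a single MapReduce program is sometimes not enough and several MapReduce programs are needed. This brings in dependencies between the jobs: a dependency means that the result of one M/R job is the input of another M/R job, and so on.

In this example, job1 and job2 run independently, and job3 depends on the results of both job1 and job2.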

The code is as follows:

    package com.hadoop.mapreduce;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class Mode {

        // All three jobs use the same word-count logic; only the wiring between them differs.

        // First job: splits each line into (word, 1) pairs
        public static class Map1 extends Mapper<Object, Text, Text, IntWritable> {
            Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer st = new StringTokenizer(value.toString());
                while (st.hasMoreTokens()) {
                    word.set(st.nextToken());
                    context.write(word, new IntWritable(1));
                }
            }
        }

        // Sums the counts for each word
        public static class Reduce1 extends Reducer<Text, IntWritable, Text, IntWritable> {
            IntWritable result = new IntWritable();
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Second job
        public static class Map2 extends Mapper<Object, Text, Text, IntWritable> {
            Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer st = new StringTokenizer(value.toString());
                while (st.hasMoreTokens()) {
                    word.set(st.nextToken());
                    context.write(word, new IntWritable(1));
                }
            }
        }

        public static class Reduce2 extends Reducer<Text, IntWritable, Text, IntWritable> {
            IntWritable result = new IntWritable();
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Third job
        public static class Map3 extends Mapper<Object, Text, Text, IntWritable> {
            Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer st = new StringTokenizer(value.toString());
                while (st.hasMoreTokens()) {
                    word.set(st.nextToken());
                    context.write(word, new IntWritable(1));
                }
            }
        }

        public static class Reduce3 extends Reducer<Text, IntWritable, Text, IntWritable> {
            IntWritable result = new IntWritable();
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws IOException, InterruptedException {
            String inPath1 = "hdfs://hadoop0:9000/user/root/3D/";
            String outPath1 = "hdfs://hadoop0:9000/user/root/3DZout/";
            String outPath2 = "hdfs://hadoop0:9000/user/root/3DZout2/";
            String outPath3 = "hdfs://hadoop0:9000/user/root/3DZout3/";
            String[] inOut = {inPath1, outPath1};
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, inOut).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: wordcount <in> [<in>...] <out>");
                System.exit(2);
            }
            // Delete the output directories if they already exist (FileOutputFormat refuses to overwrite them)
            for (String out : new String[] {outPath1, outPath2, outPath3}) {
                Path outDir = new Path(out);
                FileSystem fs = outDir.getFileSystem(conf);
                if (fs.exists(outDir)) {
                    fs.delete(outDir, true);
                }
            }

            // job1 configuration
            Job job1 = Job.getInstance(conf);
            job1.setJarByClass(Mode.class);
            job1.setMapperClass(Map1.class);
            job1.setReducerClass(Reduce1.class);
            job1.setMapOutputKeyClass(Text.class);
            job1.setMapOutputValueClass(IntWritable.class);
            job1.setOutputKeyClass(Text.class);
            job1.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job1, new Path(inPath1));
            FileOutputFormat.setOutputPath(job1, new Path(outPath1));
            // Wrap job1 in a ControlledJob so that JobControl can manage it
            ControlledJob ctrljob1 = new ControlledJob(conf);
            ctrljob1.setJob(job1);

            // job2 configuration (independent of job1)
            Job job2 = Job.getInstance(conf);
            job2.setJarByClass(Mode.class);
            job2.setMapperClass(Map2.class);
            job2.setReducerClass(Reduce2.class);
            job2.setMapOutputKeyClass(Text.class);
            job2.setMapOutputValueClass(IntWritable.class);
            job2.setOutputKeyClass(Text.class);
            job2.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job2, new Path(inPath1));
            FileOutputFormat.setOutputPath(job2, new Path(outPath2));
            // Wrap job2 in a ControlledJob
            ControlledJob ctrljob2 = new ControlledJob(conf);
            ctrljob2.setJob(job2);

            // job3 configuration: reads the outputs of both job1 and job2
            Job job3 = Job.getInstance(conf);
            job3.setJarByClass(Mode.class);
            job3.setMapperClass(Map3.class);
            job3.setReducerClass(Reduce3.class);
            job3.setMapOutputKeyClass(Text.class);
            job3.setMapOutputValueClass(IntWritable.class);
            job3.setOutputKeyClass(Text.class);
            job3.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job3, new Path(outPath1));
            FileInputFormat.addInputPath(job3, new Path(outPath2));
            FileOutputFormat.setOutputPath(job3, new Path(outPath3));
            ControlledJob ctrljob3 = new ControlledJob(conf);
            ctrljob3.setJob(job3);
            // job3 depends on job1 and job2: it starts only after both have succeeded
            ctrljob3.addDependingJob(ctrljob1);
            ctrljob3.addDependingJob(ctrljob2);

            // Main controller
            JobControl jobCtrl = new JobControl("myctrl");
            jobCtrl.addJob(ctrljob1);
            jobCtrl.addJob(ctrljob2);
            jobCtrl.addJob(ctrljob3);

            // JobControl is a Runnable and does nothing until it is started in its own thread
            Thread t = new Thread(jobCtrl);
            t.start();

            // Poll until every job has finished, sleeping between checks instead of busy-waiting
            while (!jobCtrl.allFinished()) {
                Thread.sleep(500);
            }
            System.out.println("Successful jobs: " + jobCtrl.getSuccessfulJobList());
            System.out.println("Failed jobs: " + jobCtrl.getFailedJobList());
            jobCtrl.stop();
            System.exit(jobCtrl.getFailedJobList().isEmpty() ? 0 : 1);
        }
    }
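To make JobControl's scheduling rule concrete, here is a simplified in-memory model of it. It is a sketch, not Hadoop's implementation: `MiniJobControl`, its methods and the `BooleanSupplier` job bodies are hypothetical, jobs run one at a time rather than concurrently, and the states only loosely mirror ControlledJob's WAITING, SUCCESS, FAILED and DEPENDENT_FAILED. A waiting job runs once all of its dependencies have succeeded, and is marked dependent-failed if any dependency failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Simplified in-memory model of JobControl-style dependency scheduling (not the Hadoop class).
final class MiniJobControl {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    private final Map<String, BooleanSupplier> bodies = new LinkedHashMap<>();
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();
    private final Map<String, State> states = new LinkedHashMap<>();
    private final List<String> order = new ArrayList<>();

    // Registers a job; its body returns true on success, like waitForCompletion.
    void addJob(String name, BooleanSupplier body, String... dependencies) {
        bodies.put(name, body);
        dependsOn.put(name, Arrays.asList(dependencies));
        states.put(name, State.WAITING);
    }

    // Keeps scanning the waiting jobs until a full pass makes no progress.
    void runAll() {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (String name : bodies.keySet()) {
                if (states.get(name) != State.WAITING) {
                    continue;
                }
                List<String> deps = dependsOn.get(name);
                boolean depFailed = deps.stream().anyMatch(
                        d -> states.get(d) == State.FAILED || states.get(d) == State.DEPENDENT_FAILED);
                boolean depsDone = deps.stream().allMatch(d -> states.get(d) == State.SUCCESS);
                if (depFailed) {
                    states.put(name, State.DEPENDENT_FAILED); // never run a job whose input is missing
                    progress = true;
                } else if (depsDone) {
                    order.add(name);
                    states.put(name, bodies.get(name).getAsBoolean() ? State.SUCCESS : State.FAILED);
                    progress = true;
                }
            }
        }
    }

    State state(String name) {
        return states.get(name);
    }

    List<String> runOrder() {
        return Collections.unmodifiableList(order);
    }
}
```

Registering job3 before job1 and job2 makes no difference: it simply stays WAITING until both of its dependencies have succeeded.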


4. Chained composition

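Chained MapReduce processes a task with several Mappers and finally outputs the result with a single Reducer. Note how this differs from iterative and composite MapReduce: the whole chain runs inside a single job.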

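A MapReduce job may have some pre-processing and post-processing steps. Implementing these steps as separate MapReduce jobs also works, but the extra jobs lengthen the overall processing time and add a lot of I/O, so this approach is inefficient.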

For this purpose Hadoop provides a chained Mapper (ChainMapper) and a chained Reducer (ChainReducer).

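ChainMapper allows multiple Map sub-tasks to be added and run within a single Map task, while ChainReducer allows further Map sub-tasks to run for follow-up processing after the Reduce step of a single Reduce task.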

    package com.hadoop.mapreduce;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
    import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class Chain {

        // First mapper in the chain: splits each input line into (word, 1) pairs
        public static class Map1 extends Mapper<LongWritable, Text, Text, IntWritable> {
            Text word = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer st = new StringTokenizer(value.toString());
                while (st.hasMoreTokens()) {
                    word.set(st.nextToken());
                    context.write(word, new IntWritable(1));
                }
            }
        }

        // The only reducer in the job: sums the counts for each word
        public static class Reduce1 extends Reducer<Text, IntWritable, Text, IntWritable> {
            IntWritable result = new IntWritable();
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Second mapper: consumes Map1's (word, 1) output and forwards each record unchanged.
        // An extra per-record step (filtering, normalization, ...) would go here.
        public static class Map2 extends Mapper<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void map(Text key, IntWritable value, Context context)
                    throws IOException, InterruptedException {
                context.write(key, value);
            }
        }

        // Third mapper: consumes Map2's output; its output is the reducer's input
        public static class Map3 extends Mapper<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void map(Text key, IntWritable value, Context context)
                    throws IOException, InterruptedException {
                context.write(key, value);
            }
        }

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            String inPath1 = "hdfs://hadoop0:9000/user/root/input/";
            String outPath1 = "hdfs://hadoop0:9000/user/root/3DZout/";
            String[] inOut = {inPath1, outPath1};
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, inOut).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: wordcount <in> [<in>...] <out>");
                System.exit(2);
            }
            // Delete the output directory if it already exists
            Path outDir = new Path(outPath1);
            FileSystem fs = outDir.getFileSystem(conf);
            if (fs.exists(outDir)) {
                fs.delete(outDir, true);
            }

            // Configure the single chained job
            Job job1 = Job.getInstance(conf);
            job1.setJarByClass(Chain.class);
            job1.setJobName("ChainJob");

            FileInputFormat.addInputPath(job1, new Path(inPath1));
            FileOutputFormat.setOutputPath(job1, outDir);

            // A chain can contain several Mappers: each Mapper's input is the previous Mapper's output,
            // and the last Mapper's output is the Reducer's input. There is only one Reducer in the whole job.
            ChainMapper.addMapper(job1, Map1.class, LongWritable.class, Text.class, Text.class, IntWritable.class, conf);
            ChainMapper.addMapper(job1, Map2.class, Text.class, IntWritable.class, Text.class, IntWritable.class, conf);
            ChainMapper.addMapper(job1, Map3.class, Text.class, IntWritable.class, Text.class, IntWritable.class, conf);

            // Execution order: map1 --> map2 --> map3 --> reduce1
            ChainReducer.setReducer(job1, Reduce1.class, Text.class, IntWritable.class, Text.class, IntWritable.class, conf);

            System.exit(job1.waitForCompletion(true) ? 0 : 1);
        }
    }
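The key property of chaining, that every record passes through all mappers inside one job with no intermediate job output, can also be sketched in memory. This is an illustration under assumptions, not the ChainMapper/ChainReducer implementation: `MiniChain` and `run` are hypothetical names, the mappers are `Function<String, Stream<String>>` stages, and the single reducer is hard-wired to count occurrences of each key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Stream;

// In-memory sketch of a chained job (not the ChainMapper/ChainReducer API):
// several mappers in sequence, then exactly one reducer.
final class MiniChain {
    private MiniChain() {}

    static Map<String, Integer> run(List<String> lines, List<Function<String, Stream<String>>> mappers) {
        Stream<String> records = lines.stream();
        for (Function<String, Stream<String>> mapper : mappers) {
            // The output of mapper i becomes the input of mapper i+1; nothing is materialised in between.
            records = records.flatMap(mapper);
        }
        // Single reducer: group the last mapper's output by key and count each key.
        // TreeMap keeps keys sorted, similar to the sorted keys a Hadoop reducer sees.
        Map<String, Integer> counts = new TreeMap<>();
        records.forEach(key -> counts.merge(key, 1, Integer::sum));
        return counts;
    }
}
```

Because the stream is processed lazily, each input line is pushed through the whole mapper chain in memory, which mirrors how ChainMapper hands records from one mapper to the next inside a single task instead of writing intermediate output to HDFS.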
