How FileInputFormat.setInputPaths Works

While reading the MapReduce source code today, I took a look at FileInputFormat's setInputPaths method:

  /**
   * Set the array of {@link Path}s as the list of inputs
   * for the map-reduce job.
   * 
   * @param job The job to modify 
   * @param inputPaths the {@link Path}s of the input directories/files 
   * for the map-reduce job.
   */ 
  public static void setInputPaths(Job job, 
                                   Path... inputPaths) throws IOException {
    Configuration conf = job.getConfiguration();
    Path path = inputPaths[0].getFileSystem(conf).makeQualified(inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for (int i = 1; i < inputPaths.length; i++) {
      // final public static String COMMA_STR = ",";
      // join the input paths with commas
      str.append(StringUtils.COMMA_STR);
      path = inputPaths[i].getFileSystem(conf).makeQualified(inputPaths[i]);
      str.append(StringUtils.escapeString(path.toString()));
    }

    // this sets the following key-value pair on the Configuration:
    //   public static final String INPUT_DIR = 
    //     "mapreduce.input.fileinputformat.inputdir";
    conf.set(INPUT_DIR, str.toString());
  }
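
To sanity-check this reading, here is a minimal probe (the class name and paths are made up; everything else is the standard Hadoop API) that calls setInputPaths and then reads the raw property back out of the Configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputDirProbe {

	public static void main(String[] args) throws Exception {
		Job job = Job.getInstance(new Configuration());

		// Let setInputPaths do its work on two local files
		FileInputFormat.setInputPaths(job,
				new Path("D:/hello.txt"), new Path("D:/friends.txt"));

		// Read back the property it wrote: both paths, qualified
		// with a filesystem scheme and joined by a comma
		System.out.println(job.getConfiguration()
				.get("mapreduce.input.fileinputformat.inputdir"));
		// prints something like: file:/D:/hello.txt,file:/D:/friends.txt
	}
}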

So calling setInputPaths really just sets a single key-value pair in the Configuration. My guess, then, was that a MapReduce driver does not need to call this method at all: setting the corresponding key-value pairs on the Configuration directly should have the same effect. The Driver was modified as follows:

package practice.top1_wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {

	public static void main(String[] args) throws Exception {
		
		args = new String[]{"D:/hello.txt",
			"D:/friends.txt",
			"D:/output"};
		
		// 1. Set up the configuration, writing the paths directly as key-value pairs
		Configuration conf = new Configuration();
		conf.set(FileInputFormat.INPUT_DIR, args[0] + "," + args[1]);
		conf.set(FileOutputFormat.OUTDIR, args[2]);
		
		Job job = Job.getInstance(conf);
		
		// 2. Set the jar loading path
		job.setJarByClass(Driver.class);
		
		// 3. Set the mapper
		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		
		// 4. Set the reducer
		job.setReducerClass(WordCountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		
		// 5. Input and output paths: replaced by the conf.set calls above
//		FileInputFormat.setInputPaths(job, new Path(args[0]), new Path(args[1]));
//		FileOutputFormat.setOutputPath(job, new Path(args[2]));
		
		// 6. Submit the job
		boolean result = job.waitForCompletion(true);
		System.exit(result ? 0 : 1);
		
	}
}

The job ran successfully. FileOutputFormat works on exactly the same principle. One caveat: setInputPaths also runs each path through makeQualified and StringUtils.escapeString before joining them, so writing the property directly only behaves identically for paths that need no escaping (in particular, paths containing no commas).
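
For completeness, a similar probe for the output side (again a sketch with a made-up class name; FileOutputFormat.OUTDIR and getOutputPath are the real Hadoop members, as used above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputDirProbe {

	public static void main(String[] args) throws Exception {
		Job job = Job.getInstance(new Configuration());

		// Set the output directory property directly...
		job.getConfiguration().set(FileOutputFormat.OUTDIR, "D:/output");

		// ...and it shows up through the regular accessor
		System.out.println(FileOutputFormat.getOutputPath(job));
		// prints: D:/output
	}
}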
