Dynamically getting each CombineTextInputFormat input file path in the map phase

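In MR programs written against the old API, the map.input.file parameter in the map's conf only returns the first input file of a CombineTextInputFormat split. In programs written against the new API, even the first input file cannot be obtained. This is because the TaskAttemptContext context parameter passed to createRecordReader and the context parameter available in map are not the same object.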

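Solution:

If you need to get the Combine input files dynamically, extend CombineTextInputFormat and override the createRecordReader method, capturing the Configuration object from the context passed to it. The mapper then reads map.input.file from that captured Configuration instead of from its own context.

Example: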

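import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class MyCombineTextInputFormat extends CombineTextInputFormat {
    // Configuration captured from the record reader's context, shared within the task JVM.
    private static Configuration conf;

    public static Configuration conf() {
        return conf;
    }

    // The parent method declares only IOException, so the override must not add InterruptedException.
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) throws IOException {
        // Keep a reference to the Configuration that the record reader works with.
        conf = context.getConfiguration();
        return super.createRecordReader(split, context);
    }
}

class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Read from the captured Configuration rather than from context.getConfiguration().
        Configuration conf = MyCombineTextInputFormat.conf();
        // Path of the file the current record comes from; empty string if the property is unset.
        String inputPath = conf.get("map.input.file", "");
    }
}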
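
The reasoning behind the static holder can be checked with a toy model that needs no Hadoop classes. This is only a sketch under stated assumptions: java.util.Properties stands in for Hadoop's Configuration, and the class and method names (ConfigSharingDemo, readThroughSeparateCopies, readThroughStaticHolder) are hypothetical. It assumes, as the article describes, that the record reader and the mapper each hold a different configuration object, and that the reader updates map.input.file on its own object when it moves to the next file of the combined split.

```java
import java.util.Properties;

public class ConfigSharingDemo {
    // Same key as in the article; here it is just a key in a Properties object.
    static final String INPUT_FILE_KEY = "map.input.file";

    // Stand-in for the static Configuration field in MyCombineTextInputFormat.
    private static Properties sharedConf;

    /** Returns an independent copy, modelling a context that holds its own configuration. */
    static Properties copyOf(Properties source) {
        Properties copy = new Properties();
        copy.putAll(source);
        return copy;
    }

    /** The problem: the reader writes to its copy, the mapper reads from another copy. */
    static String readThroughSeparateCopies(String path) {
        Properties jobConf = new Properties();
        Properties readerConf = copyOf(jobConf);
        Properties mapperConf = copyOf(jobConf);
        readerConf.setProperty(INPUT_FILE_KEY, path);
        // The mapper's copy never saw the update, so the default is returned.
        return mapperConf.getProperty(INPUT_FILE_KEY, "");
    }

    /** The fix: the mapper reads through a static reference to the reader's object. */
    static String readThroughStaticHolder(String firstPath, String secondPath) {
        Properties readerConf = new Properties();
        sharedConf = readerConf;                            // what createRecordReader does
        readerConf.setProperty(INPUT_FILE_KEY, firstPath);  // reader opens the first file
        readerConf.setProperty(INPUT_FILE_KEY, secondPath); // reader advances to the next file
        // Same object, so the latest value is visible.
        return sharedConf.getProperty(INPUT_FILE_KEY, "");
    }

    public static void main(String[] args) {
        System.out.println("separate copies: [" + readThroughSeparateCopies("/data/a.txt") + "]");
        System.out.println("static holder: [" + readThroughStaticHolder("/data/a.txt", "/data/b.txt") + "]");
    }
}
```

In this model the mapper-side read through separate copies yields an empty string, while the read through the static holder yields the most recently set path. The static field only works because the input format and the mapper run in the same task JVM. Whether the two contexts really hold separate Configuration objects depends on the Hadoop version, so treat the model as an illustration of the article's explanation rather than a description of Hadoop internals.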

 
