MapReduce: Custom RecordReader and Custom Partitioner Example

Requirement

Each line of the source file contains a single number. Compute the sum of the numbers on odd-numbered lines and the sum on even-numbered lines separately.
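For example, with a hypothetical input file containing the lines 1, 2, 3, 4, 5, the odd-numbered lines sum to 1 + 3 + 5 = 9 and the even-numbered lines sum to 2 + 4 = 6.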

Analysis

By default, TextInputFormat gives the Mapper the byte offset of each line as K1. We therefore need a custom RecordReader that makes K1 the line number, and then a custom Partitioner (grouping would also work) that routes odd- and even-numbered lines to separate partitions so each group can be summed independently.

Code

RecordReader:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
	// byte offset at which the split starts
	private long start;
	// current line number, used as the key
	private long lineNum;
	// byte offset at which the split ends: end = start + filesplit.getLength()
	private long end;
	// line reader over the underlying stream
	private LineReader in;
	private FSDataInputStream fileIn;
	private LongWritable key;
	private Text value;

	@Override
	public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
		FileSplit filesplit = (FileSplit) split;
		Path file = filesplit.getPath();
		start = filesplit.getStart();
		end = start + filesplit.getLength();
		Configuration conf = context.getConfiguration();
		FileSystem fs = file.getFileSystem(conf);
		fileIn = fs.open(file);
		fileIn.seek(start);
		in = new LineReader(fileIn);
		lineNum = 1;
	}

	@Override
	public boolean nextKeyValue() throws IOException, InterruptedException {
		if (key == null) {
			key = new LongWritable();
		}
		key.set(lineNum);
		if (value == null) {
			value = new Text();
		}
		// read one line at a time; readLine returns 0 at end of stream
		if (in.readLine(value) == 0) {
			return false;
		}
		// advance the line number
		lineNum++;
		return true;
	}

	@Override
	public LongWritable getCurrentKey() throws IOException, InterruptedException {
		return key;
	}

	@Override
	public Text getCurrentValue() throws IOException, InterruptedException {
		return value;
	}

	@Override
	public float getProgress() throws IOException, InterruptedException {
		// report how far we have read through the split
		if (start == end) {
			return 0.0f;
		}
		return Math.min(1.0f, (fileIn.getPos() - start) / (float) (end - start));
	}

	@Override
	public void close() throws IOException {
		in.close();
	}

}

InputFormat:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {

	@Override
	public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context)
			throws IOException, InterruptedException {
		return new MyRecordReader();
	}

	@Override
	protected boolean isSplitable(JobContext context, Path filename) {
		// keep each file in a single split so line numbers stay consistent
		return false;
	}

}

The default (identity) Mapper is used; it passes each (line number, line text) pair through to the shuffle unchanged.
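For readers who prefer an explicit class, the sketch below shows an equivalent identity Mapper (the class name MyMapper is arbitrary and this class is optional; using the stock Mapper class in the driver has the same effect):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Optional: an explicit identity Mapper, equivalent to using Mapper.class directly.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		// K1 is the line number produced by MyRecordReader; V1 is the line text.
		context.write(key, value);
	}
}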

Partitioner:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Assumes the job is configured with two reduce tasks, one per partition.
public class MyPartitioner extends Partitioner<LongWritable, Text> {

	@Override
	public int getPartition(LongWritable key, Text value, int numPartitions) {
		// Rewrite the key so that all even lines share key 2 and all odd lines
		// share key 1; each group then lands in its own partition.
		if (key.get() % 2 == 0) {
			key.set(2);
			return 0;
		} else {
			key.set(1);
			return 1;
		}
	}

}

Reducer:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

	@Override
	protected void reduce(LongWritable key, Iterable<Text> vs, Context context)
			throws IOException, InterruptedException {
		// sum is local so each key (1 = odd lines, 2 = even lines) gets its own total
		int sum = 0;
		for (Text t : vs) {
			sum += Integer.parseInt(t.toString());
		}
		context.write(key, new Text(String.valueOf(sum)));
	}

}
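Driver: the original post does not include one, so the following is a minimal sketch of how the pieces could be wired together; the class name OddEvenSumDriver and the argument paths are placeholders. The key settings are the custom input format, the custom partitioner, and two reduce tasks (one per partition).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; name and paths are placeholders.
public class OddEvenSumDriver {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf, "odd-even line sum");
		job.setJarByClass(OddEvenSumDriver.class);

		// custom input format: keys are line numbers instead of byte offsets
		job.setInputFormatClass(MyTextInputFormat.class);
		// default identity Mapper passes (line number, line text) straight through
		job.setMapperClass(Mapper.class);
		// custom partitioner splits odd and even lines; two reducers, one per partition
		job.setPartitionerClass(MyPartitioner.class);
		job.setNumReduceTasks(2);
		job.setReducerClass(MyReducer.class);

		job.setMapOutputKeyClass(LongWritable.class);
		job.setMapOutputValueClass(Text.class);
		job.setOutputKeyClass(LongWritable.class);
		job.setOutputValueClass(Text.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}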
