Hadoop version:
$ hadoop version
Hadoop 0.20.2-cdh3u4
Subversion git://ubuntu-slave01/var/lib/jenkins/workspace/CDH3u4-Full-RC/build/cdh3/hadoop20/0.20.2-cdh3u4/source -r 214dd731e3bdb687cb55988d3f47dd9e248c5690
Compiled by jenkins on Mon May 7 13:01:39 PDT 2012
From source with checksum a60c9795e41a3248b212344fb131c12c
The way the tests are written differs slightly depending on the MRUnit version; the version used here is declared by the following Maven dependency:
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.0.0</version>
    <classifier>hadoop1</classifier>
</dependency>
The commonly used classes are:
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
The mapper, combiner, and reducer implementations behave as follows:
CompMapper: turns an input key such as 222-333##id1##id2 into the key id1##id2 with the value 1L (one occurrence).
CompCombiner: sums the values for each identical key.
CompReducer: sums the long values for each key id1##id2, divides the total by a fixed constant, and emits the result as a double.
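The post does not show the source of these three classes. The following is a minimal sketch reconstructed from the descriptions above and from the expectations in the tests below; the divisor key MinhashOptionCreator.NUM_HASH_FUNCTIONS and its default of 10 are assumptions taken from the reducer test, and in practice each class would live in its own file.

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Emits "id1##id2" -> 1 for an input key like "222-333##id1##id2".
public class CompMapper extends Mapper<Text, LongWritable, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1L);
    private final Text outKey = new Text();

    @Override
    protected void map(Text key, LongWritable value, Context context)
            throws IOException, InterruptedException {
        // Drop the leading "222-333" segment, keep everything after the first "##".
        String s = key.toString();
        outKey.set(s.substring(s.indexOf("##") + 2));
        context.write(outKey, ONE);
    }
}

// Sums the counts for identical keys on the map side.
public class CompCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        context.write(key, new LongWritable(sum));
    }
}

// Sums the counts and divides by a configured constant, emitting a double.
public class CompReducer extends Reducer<Text, LongWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        // MinhashOptionCreator is a class of the surrounding project (its import
        // is omitted here); the key and default mirror the reducer test below.
        int numHash = context.getConfiguration().getInt(
                MinhashOptionCreator.NUM_HASH_FUNCTIONS, 10);
        context.write(key, new DoubleWritable(sum / (double) numHash));
    }
}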
The code for testing the mapper, combiner, and reducer:
import java.io.IOException;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

// CompMapper, CompCombiner, CompReducer and MinhashOptionCreator come from
// the project under test; the test class name here is assumed.
public class CompJobTest {

    private MapDriver<Text, LongWritable, Text, LongWritable> mapDriver;
    private ReduceDriver<Text, LongWritable, Text, DoubleWritable> reduceDriver;
    private ReduceDriver<Text, LongWritable, Text, LongWritable> combinerDriver;
    private MapReduceDriver<Text, LongWritable, Text, LongWritable, Text, LongWritable> mapCombinerDriver;
    private MapReduceDriver<Text, LongWritable, Text, LongWritable, Text, DoubleWritable> mapReducerDriver;

    @Before
    public void setUp() {
        CompMapper mapper = new CompMapper();
        CompCombiner combiner = new CompCombiner();
        CompReducer reducer = new CompReducer();
        mapDriver = new MapDriver<Text, LongWritable, Text, LongWritable>(mapper);
        reduceDriver = new ReduceDriver<Text, LongWritable, Text, DoubleWritable>(reducer);
        combinerDriver = new ReduceDriver<Text, LongWritable, Text, LongWritable>(combiner);
        mapCombinerDriver = new MapReduceDriver<Text, LongWritable, Text, LongWritable, Text, LongWritable>(
                mapper, combiner);
        mapReducerDriver = new MapReduceDriver<Text, LongWritable, Text, LongWritable, Text, DoubleWritable>(
                mapper, reducer);
    }

    // The mapper strips the leading segment and emits a count of 1.
    @Test
    public void testMapper() throws IOException {
        mapDriver.setInput(new Text("222-333##id1##id2"), new LongWritable(1L));
        mapDriver.withOutput(new Text("id1##id2"), new LongWritable(1L));
        mapDriver.runTest();
    }

    // The combiner sums the values 0..4 into 10 for the same key.
    @Test
    public void testCombiner() throws IOException {
        List<LongWritable> values = new ArrayList<LongWritable>();
        for (int i = 0; i < 5; i++) {
            values.add(new LongWritable(i));
        }
        combinerDriver.addInput(new Text("id1##id2"), values);
        combinerDriver.withOutput(new Text("id1##id2"), new LongWritable(10L));
        combinerDriver.runTest();
    }

    // The reducer sums the values and divides by the configured
    // number of hash functions (default 10).
    @Test
    public void testReducer() throws IOException {
        List<LongWritable> values = new ArrayList<LongWritable>();
        long count = 0;
        for (int i = 0; i < 5; i++) {
            count += i;
            values.add(new LongWritable(i));
        }
        reduceDriver.addInput(new Text("id1##id2"), values);
        int numHash = reduceDriver.getConfiguration().getInt(
                MinhashOptionCreator.NUM_HASH_FUNCTIONS, 10);
        DoubleWritable dw = new DoubleWritable();
        BigDecimal b1 = new BigDecimal(count);
        BigDecimal b2 = new BigDecimal(numHash);
        dw.set(b1.divide(b2).doubleValue());
        reduceDriver.withOutput(new Text("id1##id2"), dw);
        reduceDriver.runTest();
    }

    // Mapper followed by combiner: two inputs with the same derived key
    // are merged into a single count of 2.
    @Test
    public void testMapCombiner() throws IOException {
        mapCombinerDriver.addInput(new Text("222-333##id1##id2"), new LongWritable(1L));
        mapCombinerDriver.addInput(new Text("111-333##id1##id2"), new LongWritable(1L));
        mapCombinerDriver.withOutput(new Text("id1##id2"), new LongWritable(2L));
        mapCombinerDriver.runTest();
    }

    // Mapper followed by reducer: the summed count of 2 is divided by the
    // configured constant. Note the configuration must be read from
    // mapReducerDriver (the driver actually running this test), not from
    // reduceDriver as in the original listing.
    @Test
    public void testMapReducer() throws IOException {
        mapReducerDriver.addInput(new Text("222-333##id1##id2"), new LongWritable(1L));
        mapReducerDriver.addInput(new Text("111-333##id1##id2"), new LongWritable(1L));
        int numHash = mapReducerDriver.getConfiguration().getInt(
                MinhashOptionCreator.NUM_HASH_FUNCTIONS, 10);
        DoubleWritable dw = new DoubleWritable();
        BigDecimal b1 = new BigDecimal(2L);
        BigDecimal b2 = new BigDecimal(numHash);
        dw.set(b1.divide(b2).doubleValue());
        mapReducerDriver.withOutput(new Text("id1##id2"), dw);
        mapReducerDriver.runTest();
    }
}
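Because the expected reducer output depends on a configuration value, it can be clearer to set that value explicitly in the test instead of relying on the job default. A minimal sketch, assuming CompReducer reads its divisor from the same MinhashOptionCreator.NUM_HASH_FUNCTIONS key as the tests above:

    @Test
    public void testReducerWithExplicitConfig() throws IOException {
        // Pin the divisor to 5 instead of relying on the default of 10.
        reduceDriver.getConfiguration().setInt(MinhashOptionCreator.NUM_HASH_FUNCTIONS, 5);
        List<LongWritable> values = new ArrayList<LongWritable>();
        values.add(new LongWritable(3L));
        values.add(new LongWritable(7L));
        reduceDriver.addInput(new Text("id1##id2"), values);
        // (3 + 7) / 5 = 2.0
        reduceDriver.withOutput(new Text("id1##id2"), new DoubleWritable(2.0));
        reduceDriver.runTest();
    }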
Notes:
1. Make sure the MRUnit artifact matches your Hadoop version; for MRUnit 1.0.0, the hadoop1 classifier targets Hadoop 0.20/1.x and the hadoop2 classifier targets Hadoop 2.x.
2. If you get a java.lang.IncompatibleClassChangeError at runtime, it is a version mismatch; switch to the classifier that matches your Hadoop distribution.