Once a MapReduce job is packaged and submitted to a distributed environment, any problem means debugging remotely, then repackaging and redeploying. Unit testing the MapReduce code before release catches obvious bugs and logic errors early and improves development efficiency.
MRUnit is a framework developed at Cloudera specifically for unit testing MapReduce code written for Hadoop. With it you can test a Mapper in isolation using MapDriver, a Reducer in isolation using ReduceDriver, and a complete MapReduce job using MapReduceDriver. (Apache MRUnit ™ is a Java library that helps developers unit test Apache Hadoop map reduce jobs.)
Below, MRUnit is used to unit test the word-count example from "Hadoop development cycle (part 2): writing mapper and reducer programs". Before running the tests, the MRUnit jar must be on the classpath. The unit test code is as follows:
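If the project uses Maven, the MRUnit jar can be pulled in with a dependency declared roughly as follows (coordinates are for the 0.9.0-incubating release discussed here; the classifier selects the Hadoop 1.x or 2.x build, which matters for the error described at the end of this post):

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating</version>
  <!-- use classifier hadoop2 when running against Hadoop 2.x -->
  <classifier>hadoop1</classifier>
  <scope>test</scope>
</dependency>
```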
```java
package cn.com.yz.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

import cn.com.yz.mapreduce.WordCountMapper;
import cn.com.yz.mapreduce.WordCountReducer;

public class WordCountMapperReducerTest {

    MapDriver<Object, Text, Text, IntWritable> mapDriver;
    ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    MapReduceDriver<Object, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() {
        WordCountMapper mapper = new WordCountMapper();
        WordCountReducer reducer = new WordCountReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    } // end setUp()

    @Test
    public void testMapper() {
        String line = "Google coorperates with IBM in cloud area";
        mapDriver.withInput(new Object(), new Text(line));
        mapDriver.withOutput(new Text("Google"), new IntWritable(1))
                .withOutput(new Text("coorperates"), new IntWritable(1))
                .withOutput(new Text("with"), new IntWritable(1))
                .withOutput(new Text("IBM"), new IntWritable(1))
                .withOutput(new Text("in"), new IntWritable(1))
                .withOutput(new Text("cloud"), new IntWritable(1))
                .withOutput(new Text("area"), new IntWritable(1));
        mapDriver.runTest();
    } // end testMapper()

    @Test
    public void testReducer() {
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("Google"), values);
        reduceDriver.withOutput(new Text("Google"), new IntWritable(2));
        reduceDriver.runTest();
    } // end testReducer()

    @Test
    public void testMapperReducer() throws IOException {
        String line = "Google uses Map Reduce Model.";
        List<Pair<Text, IntWritable>> expected = new ArrayList<Pair<Text, IntWritable>>();
        mapReduceDriver.withInput(new Object(), new Text(line));
        List<Pair<Text, IntWritable>> out = mapReduceDriver.run();
        expected.add(new Pair<Text, IntWritable>(new Text("Google"), new IntWritable(1)));
        expected.add(new Pair<Text, IntWritable>(new Text("uses"), new IntWritable(1)));
        expected.add(new Pair<Text, IntWritable>(new Text("Map"), new IntWritable(1)));
        expected.add(new Pair<Text, IntWritable>(new Text("Reduce"), new IntWritable(1)));
        expected.add(new Pair<Text, IntWritable>(new Text("Model"), new IntWritable(1)));
        Assert.assertEquals(expected, out);
    } // end testMapperReducer()
}
```
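The expected outputs in `testMapper` assume that WordCountMapper splits each line on whitespace and emits `(word, 1)` for every token, and `testReducer` assumes WordCountReducer sums those 1s. Stripped of the Hadoop plumbing, that word-count logic can be sketched in plain Java (the class and method names here are illustrative, not from the original code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLogic {

    // Map phase: split a line on whitespace and emit each token.
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            tokens.add(itr.nextToken());
        }
        return tokens;
    }

    // Reduce phase: sum the 1s emitted for each distinct word.
    public static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String word : tokenize(line)) {
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {Google=1, coorperates=1, with=1, IBM=1, in=1, cloud=1, area=1}
        System.out.println(countWords("Google coorperates with IBM in cloud area"));
    }
}
```

Keeping this core logic in methods that do not touch Hadoop types is also a useful complement to MRUnit: the pure logic can be tested with ordinary JUnit, while MRUnit verifies the Writable-based wiring.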
The test results are as follows:
Troubleshooting a common error:

Error:
java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskInputOutputContext, but interface was expected
Solution:
Change the jar from mrunit-0.9.0-incubating-hadoop2.jar (the one suggested in the tutorial) to mrunit-0.9.0-incubating-hadoop1.jar, which matches the hadoop-1.0.4 on the classpath, and the tests run correctly. The error occurs because the hadoop2 build of MRUnit is compiled against the Hadoop 2 API, where TaskInputOutputContext is an interface, while in Hadoop 1.x it is a class; the classifier of the MRUnit jar must match the Hadoop version you run against.