Hadoop from Beginner to Expert 38: Unit-Testing MapReduce with MRUnit

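MRUnit is a JUnit-style unit-testing library for Hadoop MapReduce. With it you can test each part of a MapReduce program in isolation, instead of packaging the complete program and running it on a server every time.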

Files used in this section:

mrunit-1.1.0-hadoop2.jar (extraction code: ftjw)
apache-mrunit-1.1.0-hadoop2-bin.tar.gz (extraction code: 3q2n)

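1. Environment setup

1.1 Collect the required jars

(1) Copy the jars that MapReduce depends on from the following directories of the Hadoop installation: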

$HADOOP_HOME/share/hadoop/common
$HADOOP_HOME/share/hadoop/common/lib
$HADOOP_HOME/share/hadoop/mapreduce
$HADOOP_HOME/share/hadoop/mapreduce/lib

(2) Get the jars that the unit tests depend on:

mrunit-1.1.0-hadoop2.jar
all jars under apache-mrunit-1.1.0-hadoop2-bin\lib

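1.2 Add all the jars to the project's build path

(1) Create a lib directory under the project directory and copy all the jars into it;
(2) Select all the jars in lib, right-click, then choose Build Path > Add to Build Path.

1.3 Remove the conflicting jar from the project's build path

Under Referenced Libraries, find mockito-all-1.8.5.jar, right-click it, and choose Build Path > Remove from Build Path.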

2. Test code

Example: using the WordCount program, unit-test the Mapper, the Reducer, and the whole job in turn.

//WordCountMapper.java
package demo.mrunit;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// Type parameters: k1 = line offset, v1 = line text, k2 = word, v2 = count
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key1, Text value1, Context context)
      throws IOException, InterruptedException {
        // The line read from HDFS, e.g.: I love Beijing
        String str = value1.toString();
        // Split the line into words
        String[] words = str.split(" ");
        // Emit one pair per word to the Reducer, e.g. (I,1)
        for (String w : words) {
            // k2: the word, v2: a count of one
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
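
One detail of the mapper worth checking before writing expected outputs is how split(" ") treats consecutive spaces. The following is a minimal plain-Java sketch (no Hadoop classes involved); the class name TokenizeSketch and the splitOnWhitespace alternative are illustrative assumptions, not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, used only to illustrate tokenization behaviour.
class TokenizeSketch {

    // Same tokenization as WordCountMapper: split on a single space.
    static List<String> splitOnSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Assumed alternative: split on runs of whitespace and drop empty tokens.
    static List<String> splitOnWhitespace(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitOnSpace("I love Beijing"));       // [I, love, Beijing]
        System.out.println(splitOnSpace("I  love Beijing"));      // [I, , love, Beijing]
        System.out.println(splitOnWhitespace("I  love Beijing")); // [I, love, Beijing]
    }
}
```

With the original split(" "), a line containing two consecutive spaces yields an empty token, so the mapper would emit an extra pair with an empty key. A mapper test with such an input is a cheap way to decide which behaviour you want.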

//WordCountReducer.java
package demo.mrunit;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
// Type parameters: k3 = word, v3 = counts, k4 = word, v4 = total
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text k3, Iterable<LongWritable> v3, Context context)
      throws IOException, InterruptedException {
        // Sum the elements of the v3 collection
        long total = 0;
        for (LongWritable v : v3) {
            total = total + v.get();
        }
        // Emit (word, total)
        context.write(k3, new LongWritable(total));
    }
}

//WordCountUnitTest.java
package demo.mrunit;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;
public class WordCountUnitTest {
  @Test
  public void testMapper() throws Exception{
    // Set the hadoop.home.dir system property; adjust the path to your local Hadoop directory
    System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
    // Test the Mapper
    // Create the Mapper under test
    WordCountMapper mapper = new WordCountMapper();
    // Create a driver for the Mapper
    MapDriver<LongWritable, Text, Text, LongWritable> driver = new MapDriver<>(mapper);
    // Specify the Mapper's input: k1 and v1
    driver.withInput(new LongWritable(1), new Text("I love Beijing"));
    // Specify the expected Mapper output: k2 and v2, in the order they are emitted
    driver.withOutput(new Text("I"), new LongWritable(1))
          .withOutput(new Text("love"), new LongWritable(1))
          .withOutput(new Text("Beijing"), new LongWritable(1));
    // Run the test: compare the expected output with the actual result
    driver.runTest();
  }
  @Test
  public void testReduce() throws Exception{
    // Set the hadoop.home.dir system property; adjust the path to your local Hadoop directory
    System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
    // Test the Reducer
    // Create the Reducer under test
    WordCountReducer reducer = new WordCountReducer();
    // Create a driver for the Reducer
    ReduceDriver<Text, LongWritable, Text, LongWritable> driver = new ReduceDriver<>(reducer);
    // The Reducer's input is k3 plus v3 (a collection)
    // Build v3
    List<LongWritable> v3 = new ArrayList<>();
    // Add the v2 values to v3
    v3.add(new LongWritable(1));
    v3.add(new LongWritable(1));
    v3.add(new LongWritable(1));
    // Specify the Reducer's input, simulating map output delivered to the reducer
    driver.withInput(new Text("Beijing"), v3);
    // Specify the expected Reducer output
    driver.withOutput(new Text("Beijing"), new LongWritable(3));
    // Run the test
    driver.runTest();
  }
  @Test
  public void testJob() throws Exception{
    // Set the hadoop.home.dir system property; adjust the path to your local Hadoop directory
    System.setProperty("hadoop.home.dir", "E:\\hadoop-2.4.1\\hadoop-2.4.1");
    // Test WordCountMapper and WordCountReducer together as one job
    // Create the objects under test
    WordCountMapper mapper = new WordCountMapper();
    WordCountReducer reducer = new WordCountReducer();
    // Create the driver
    // Type parameters: map input (k1, v1), map output (k2, v2), reduce output (k4, v4)
    MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable>
       driver = new MapReduceDriver<>(mapper, reducer);
    // Specify the map input
    driver.withInput(new LongWritable(1), new Text("I love Beijing"))
          .withInput(new LongWritable(4), new Text("I love China"))
          .withInput(new LongWritable(7), new Text("Beijing is the capital of China"));
    // The expected output must be listed in sorted key order (uppercase sorts before lowercase)
    driver.withOutput(new Text("Beijing"), new LongWritable(2))
          .withOutput(new Text("China"), new LongWritable(2))
          .withOutput(new Text("I"), new LongWritable(2))
          .withOutput(new Text("capital"), new LongWritable(1))
          .withOutput(new Text("is"), new LongWritable(1))
          .withOutput(new Text("love"), new LongWritable(2))
          .withOutput(new Text("of"), new LongWritable(1))
          .withOutput(new Text("the"), new LongWritable(1));
    // Run the test
    driver.runTest();
  }
}
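
The expected output in testJob has to be written by hand in sorted key order, which is easy to get wrong. As a rough aid, the sketch below derives the same list in plain Java by counting words into a TreeMap. It is only an approximation of the job: the class name ExpectedOutputSketch is hypothetical, and String ordering is assumed to match the ordering of Text keys, which holds for the ASCII words used here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reproduces WordCount's result without Hadoop,
// to help write the withOutput(...) calls in the right order.
class ExpectedOutputSketch {

    // Counts words split on a single space; TreeMap keeps keys sorted.
    static Map<String, Long> wordCount(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "I love Beijing",
            "I love China",
            "Beijing is the capital of China");
        // Prints one "word<TAB>count" line per key, in sorted key order.
        for (Map.Entry<String, Long> e : wordCount(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the three input lines of testJob this yields the keys Beijing, China, I, capital, is, love, of, the, which is the order used in the withOutput calls: uppercase letters sort before lowercase ones.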

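A few notes on using MRUnit:

(1) Create an instance of the Mapper or Reducer under test;
(2) Pass the Mapper or Reducer instance to the corresponding driver;
(3) Specify the driver's input. withInput calls can be chained; for a MapDriver the output follows the input order, so keep the withInput calls in the same order as the matching withOutput calls;
(4) Specify the driver's expected output. withOutput calls can be chained too, and must list the records in the order the driver produces them;
(5) For a MapReduceDriver the expected output must also account for MapReduce's internal sort: records come out sorted by key;
(6) Run the unit test with driver.runTest();
(7) Running an MRUnit test is just running a JUnit test;
(8) If the test passes, that is, the output the driver computes from the input matches the expected output, JUnit shows a green bar;
(9) If the test fails, that is, the computed output differs from the expected output, an error message is printed; use it to fix either the MapReduce program or the test case.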
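
To make the ordering requirement concrete, here is a conceptual sketch of an order-sensitive and an order-insensitive comparison of expected and actual records. It is not MRUnit's implementation; the class OutputCompareSketch and the "key<TAB>value" string encoding of records are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of how expected and actual outputs can be compared.
class OutputCompareSketch {

    // Order-sensitive: records must be equal at every position.
    static boolean sameInOrder(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    // Order-insensitive: same records with the same multiplicities, in any order.
    static boolean sameIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("I\t1", "love\t1", "Beijing\t1");
        List<String> actual = Arrays.asList("Beijing\t1", "I\t1", "love\t1");
        System.out.println(sameInOrder(expected, actual));       // false
        System.out.println(sameIgnoringOrder(expected, actual)); // true
    }
}
```

The tests in this article rely on the order-sensitive behaviour of runTest(). To my knowledge the MRUnit drivers also offer a runTest(boolean) overload that relaxes the ordering check; treat that as something to verify against the API documentation of your MRUnit version.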