MapReduce Implementations of Some Algorithms: Unit-Testing a MapReduce Job

This doesn't strictly fit the series, but I'm parking it here for now; bear with me!

MRUnit: A Hadoop Testing Tool

MRUnit is a testing library for MapReduce, developed at Cloudera, that bridges JUnit's standard testing toolkit and MapReduce.
With MRUnit it is easy to test the individual parts of a MapReduce job: it cleanly separates Map from Reduce, so we can test the logic of the Map phase and the Reduce phase in isolation, and it can also drive a full map-then-reduce pass. This saves developers a lot of work and speeds up development.

Setting Up the Development Environment

1. Download the JUnit jar.
2. Download the MRUnit jar. Pay attention to the Hadoop version you are using: MRUnit ships in two flavors, hadoop1 and hadoop2. The hadoop1 flavor targets the older 1.x.x releases, and hadoop2 targets the newer ones. This article uses Hadoop 1.2.0, so it uses the hadoop1 flavor of MRUnit.
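
If the project uses Maven, the flavor is selected with a classifier. A sketch of the dependency (the `1.0.0` version number is an assumption; use the latest 1.x release available):

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.0.0</version>
  <!-- hadoop1 for Hadoop 1.x (e.g. 1.2.0), hadoop2 for Hadoop 2.x -->
  <classifier>hadoop1</classifier>
  <scope>test</scope>
</dependency>
```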

Hadoop Code

Let's use word count as the example.
package com.joey.mapred.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;

public class WordCount extends Configured implements Tool {

   static public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      final private static LongWritable ONE = new LongWritable(1);
      private Text tokenValue = new Text();

      @Override
      protected void map(LongWritable offset, Text text, Context context) throws IOException, InterruptedException {
         // Emit (token, 1) for every whitespace-separated token in the line.
         for (String token : text.toString().split("\\s+")) {
            tokenValue.set(token);
            context.write(tokenValue, ONE);
         }
      }
   }

   static public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      private LongWritable total = new LongWritable();

      @Override
      protected void reduce(Text token, Iterable<LongWritable> counts, Context context)
            throws IOException, InterruptedException {
         // Sum the partial counts for this token.
         long n = 0;
         for (LongWritable count : counts)
            n += count.get();
         total.set(n);
         context.write(token, total);
      }
   }

   public int run(String[] args) throws Exception {
      Configuration configuration = getConf();

      Job job = new Job(configuration, "Word Count");
      job.setJarByClass(WordCount.class);

      job.setMapperClass(WordCountMapper.class);
      job.setCombinerClass(WordCountReducer.class);
      job.setReducerClass(WordCountReducer.class);

      job.setInputFormatClass(TextInputFormat.class);
      job.setOutputFormatClass(TextOutputFormat.class);

      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(LongWritable.class);

      // Input and output paths come from the command line.
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));

      return job.waitForCompletion(true) ? 0 : -1;
   }

   public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new WordCount(), args));
   }
}
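
Stripped of the Hadoop types, the logic under test is just whitespace tokenization plus per-token summation. The following standalone sketch (the class name `WordCountLogic` is mine, not part of the job) makes the edge cases easy to reason about before writing any MRUnit assertions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountLogic {

    // What WordCountMapper does per line: split on runs of whitespace.
    // Note: a line with leading whitespace yields an empty first token,
    // which the mapper above would happily count as a word.
    static String[] tokenize(String line) {
        return line.split("\\s+");
    }

    // What WordCountReducer does per token: sum the emitted 1s.
    static Map<String, Long> count(String line) {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (String token : tokenize(line)) {
            Long n = counts.get(token);
            counts.put(token, n == null ? 1L : n + 1L);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {sky=4, oh=1, my=1, beautiful=1}
        System.out.println(count("sky sky sky oh my beautiful sky"));
    }
}
```

Because addition is associative and commutative, the same class can safely serve as both combiner and reducer, which is why `job.setCombinerClass(WordCountReducer.class)` above is legitimate.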



Test Case Code


package com.joey.mapred.wordcount;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {
               
   /* We declare three drivers: a MapDriver, a ReduceDriver, and a
      MapReduceDriver. The generic parameters of each must match the
      classes under test:

      WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable>
      WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> */
   MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable> mapReduceDriver;
   MapDriver<LongWritable, Text, Text, LongWritable> mapDriver;
   ReduceDriver<Text, LongWritable, Text, LongWritable> reduceDriver;

  
   // Create instances of our Mapper and Reducer and wire them into the
   // corresponding drivers via the setXXX() methods.
   @Before
   public void setUp() {
      WordCount.WordCountMapper mapper = new WordCount.WordCountMapper();
      WordCount.WordCountReducer reducer = new WordCount.WordCountReducer();
      mapDriver = new MapDriver<LongWritable, Text, Text, LongWritable>();
      mapDriver.setMapper(mapper);
      reduceDriver = new ReduceDriver<Text, LongWritable, Text, LongWritable>();
      reduceDriver.setReducer(reducer);
      mapReduceDriver = new MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable>();
      mapReduceDriver.setMapper(mapper);
      mapReduceDriver.setReducer(reducer);
   }

   @Test
   public void testMapper() throws IOException {
      // give one sample line of input to the mapper
      mapDriver.withInput(new LongWritable(1), new Text("sky sky sky oh my beautiful sky"));
      //expected output for the mapper
      mapDriver.withOutput(new Text("sky"), new LongWritable(1));
      mapDriver.withOutput(new Text("sky"), new LongWritable(1));
      mapDriver.withOutput(new Text("sky"), new LongWritable(1));
      mapDriver.withOutput(new Text("oh"), new LongWritable(1));
      mapDriver.withOutput(new Text("my"), new LongWritable(1));
      mapDriver.withOutput(new Text("beautiful"), new LongWritable(1));
      mapDriver.withOutput(new Text("sky"), new LongWritable(1));
      //runTest() method run the Mapper test with input
      mapDriver.runTest();
   }

   @Test
   public void testReducer() throws IOException {
      List<LongWritable> values = new ArrayList<LongWritable>();
      values.add(new LongWritable(1));
      values.add(new LongWritable(1));
      reduceDriver.withInput(new Text("sky"), values);
      reduceDriver.withOutput(new Text("sky"), new LongWritable(2));
      reduceDriver.runTest();
   }

   @Test
   public void testMapReduce() throws IOException {
      mapReduceDriver.withInput(new LongWritable(1), new Text("sky sky sky"));
      mapReduceDriver.withOutput(new Text("sky"), new LongWritable(3));

      mapReduceDriver.runTest();
   }
}

Problems with MRUnit

1. runTest() currently gives no meaningful information on failure, so it is better to call run() and assert on the returned output.
2. The documentation is sparse.
3. The runXxx() methods invoke the setup() method, which belongs to the new Hadoop API, not the old one.
4. Test cases are not executed in a distributed fashion.
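
Point 1 is worth illustrating. With MRUnit you would write something like `List<Pair<Text, LongWritable>> actual = mapDriver.withInput(...).run();` and then assert on `actual`; runTest() does the same comparison internally but reports only pass/fail. The payoff of run() plus assert is the failure message. A standalone sketch of the kind of comparison involved (the class name `RunAndAssert` and its helper are mine, not MRUnit API):

```java
import java.util.Arrays;
import java.util.List;

public class RunAndAssert {

    // JUnit's assertEquals does essentially this: on mismatch, the error
    // message shows both the expected and the actual output pairs.
    static void assertListEquals(List<?> expected, List<?> actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        // A matching comparison passes silently...
        assertListEquals(Arrays.asList("sky=3"), Arrays.asList("sky=3"));
        // ...while a mismatch would throw with a message naming both lists,
        // which runTest() never shows you.
        System.out.println("ok");
    }
}
```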

