[Pinned] Learning Hadoop Step by Step (9)

  Implementing a Reducer

    A map task reads its input, parses it, and emits key/value pairs, which the framework groups by key. A reduce task then collects the map outputs and processes them further through three phases: merge, sort, and reduce. Here we only care about the reduce phase, that is, the implementation of the reduce function.
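The merge/sort/reduce flow can be sketched in plain Java without any Hadoop dependency. The class and method names below (`ShuffleSketch`, `reduceAll`) are made up for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Merge + sort + reduce, condensed: group the (key, value) pairs emitted by
    // the map phase by key (a TreeMap keeps the keys sorted, as the shuffle does),
    // then run one reduce step per key -- here, a simple sum.
    static TreeMap<String, Integer> reduceAll(String[][] mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(kv[1]));
        }
        TreeMap<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Simulated map output: (word, 1) pairs, as a word-count mapper would emit.
        String[][] mapOutput = {
            {"apple", "1"}, {"banana", "1"}, {"apple", "1"}, {"apple", "1"}
        };
        System.out.println(reduceAll(mapOutput)); // {apple=3, banana=1}
    }
}
```

In the real framework the grouping and sorting happen across the network during the shuffle; the reducer only ever sees one key with its iterable of values at a time.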

    In practice we do not have to implement this from scratch: we simply extend the Reducer class that Hadoop provides. The main methods of the Reducer class are:
 
protected void setup(Context context
                     ) throws IOException, InterruptedException {
  // Your own initialization: read the job configuration, custom parameters,
  // the DistributedCache, and so on.
}

protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                      ) throws IOException, InterruptedException {
  // The main reduce logic. The default implementation below matches the old
  // IdentityReducer: it writes the data out unchanged. Override this method
  // to implement your own processing.
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}

protected void cleanup(Context context
                       ) throws IOException, InterruptedException {
  // Any cleanup work.
}



Let's close with an example again. The Reducer below implements the functionality of the Linux cut tool, and is the counterpart of the FieldSelectionMapper from the previous installment; it is configured in the same way. The output key/value fields are specified with mapreduce.fieldsel.reduce.output.key.value.fields.spec, in the same format as for FieldSelectionMapper:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.fieldsel.FieldSelectionHelper;

public class FieldSelectionReducer<K, V>
    extends Reducer<Text, Text, Text, Text> {

  private String fieldSeparator = "\t";

  private String reduceOutputKeyValueSpec;

  private List<Integer> reduceOutputKeyFieldList = new ArrayList<Integer>();

  private List<Integer> reduceOutputValueFieldList = new ArrayList<Integer>();

  private int allReduceValueFieldsFrom = -1;

  public static final Log LOG = LogFactory.getLog("FieldSelectionMapReduce");

  public void setup(Context context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();

    this.fieldSeparator =
      conf.get(FieldSelectionHelper.DATA_FIELD_SEPERATOR, "\t");
    
    this.reduceOutputKeyValueSpec =
      conf.get(FieldSelectionHelper.REDUCE_OUTPUT_KEY_VALUE_SPEC, "0-:");
    
    allReduceValueFieldsFrom = FieldSelectionHelper.parseOutputKeyValueSpec(
      reduceOutputKeyValueSpec, reduceOutputKeyFieldList,
      reduceOutputValueFieldList);
  }

  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    String keyStr = key.toString() + this.fieldSeparator;
    
    for (Text val : values) {
      FieldSelectionHelper helper = new FieldSelectionHelper();
      helper.extractOutputKeyValue(keyStr, val.toString(),
        fieldSeparator, reduceOutputKeyFieldList,
        reduceOutputValueFieldList, allReduceValueFieldsFrom, false, false);
      context.write(helper.getKey(), helper.getValue());
    }
  }
}
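To make the field selection itself concrete, here is a simplified, Hadoop-free sketch of what a call like extractOutputKeyValue boils down to for fixed field lists. It is illustrative only (`FieldSelectSketch` and `selectFields` are made-up names), and unlike the real FieldSelectionHelper it does not handle open-ended range specs such as "3-":

```java
import java.util.List;
import java.util.regex.Pattern;

public class FieldSelectSketch {
    // Split the line on the separator, then join the chosen field indices back
    // together -- the cut-like core of field selection. Indices past the end of
    // the line are simply skipped.
    static String selectFields(String line, String sep, List<Integer> fieldList) {
        String[] fields = line.split(Pattern.quote(sep), -1);
        StringBuilder sb = new StringBuilder();
        for (int i : fieldList) {
            if (sb.length() > 0) sb.append(sep);
            if (i < fields.length) sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "alice\t30\tnyc\tengineer";
        // Like `cut -f1,3`: pick fields 0 and 2 for the key, field 3 for the value.
        System.out.println(selectFields(line, "\t", List.of(0, 2)));
        System.out.println(selectFields(line, "\t", List.of(3)));
    }
}
```

In the real Reducer above, the key and value field lists come from parsing the mapreduce.fieldsel.reduce.output.key.value.fields.spec property in setup(), and the selection runs once per value inside reduce().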

