distributed

Path wraps a URI.
The Path(Path parent, Path child) constructor resolves the parent URI and the child URI into a new URI.
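That URI resolution can be sketched with java.net.URI (a simplified model, not Hadoop's exact implementation; the real Path also normalizes separators and handles scheme/authority edge cases):

```java
import java.net.URI;

public class PathJoin {
    // Sketch of how Path(Path parent, Path child) combines two URIs:
    // make sure the parent path ends with '/' so resolve() treats it
    // as a directory, then resolve the child against it.
    static URI join(URI parent, URI child) {
        String p = parent.getPath();
        if (!p.endsWith("/")) {
            parent = parent.resolve(p + "/");
        }
        return parent.resolve(child);
    }

    public static void main(String[] args) {
        URI parent = URI.create("hdfs://namenode/user/alice");
        URI child = URI.create("data/part-0");
        System.out.println(join(parent, child));
        // hdfs://namenode/user/alice/data/part-0
    }
}
```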

Mapper and Reducer: the Mapper performs the mapping, the Reducer performs the computation.
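As a minimal illustration of that split (plain Java, not the Hadoop API), word count with a map phase that emits (word, 1) pairs grouped by key, and a reduce phase that sums the values for each key:

```java
import java.util.*;

public class WordCountSketch {
    // "map" phase: turn input lines into (word, 1) pairs, grouped by word
    static Map<String, List<Integer>> mapPhase(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        return grouped;
    }

    // "reduce" phase: compute one output value per key
    static Map<String, Integer> reducePhase(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            out.put(e.getKey(), sum);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            reducePhase(mapPhase(List.of("a b a", "b a")));
        System.out.println(counts); // {a=3, b=2}
    }
}
```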

Reduce:
ReduceTask.java:
// apply reduce function
    try {
      Class keyClass = job.getMapOutputKeyClass();
      Class valClass = job.getMapOutputValueClass();
      ReduceValuesIterator values = new ReduceValuesIterator(rIter, comparator,
                                  keyClass, valClass, umbilical, job);
      values.informReduceProgress();
      while (values.more()) {
        reporter.incrCounter(REDUCE_INPUT_RECORDS, 1);
        reducer.reduce(values.getKey(), values, collector, reporter);
        values.nextKey();
        values.informReduceProgress();
      }

      //Clean up: repeated in catch block below
      reducer.close();
      out.close(reporter);
Collector:
ReduceTask.java:
final RecordWriter out =
      job.getOutputFormat().getRecordWriter(fs, job, finalName, reporter);

    OutputCollector collector = new OutputCollector() {
        public void collect(WritableComparable key, Writable value)
          throws IOException {
          out.write(key, value);
          reporter.incrCounter(REDUCE_OUTPUT_RECORDS, 1);
          reportProgress(umbilical);
        }
      };
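The anonymous OutputCollector above just forwards each record to the RecordWriter and bumps a counter. A self-contained sketch of that pattern (the names here are illustrative stand-ins, not Hadoop's classes):

```java
import java.util.*;

public class CollectorSketch {
    interface Collector<K, V> { void collect(K key, V value); }

    // A collector that writes records to a sink and counts them,
    // mimicking the anonymous OutputCollector in ReduceTask.
    static class CountingCollector implements Collector<String, Integer> {
        final List<String> sink = new ArrayList<>(); // stand-in for the RecordWriter
        int outputRecords = 0;                       // stand-in for REDUCE_OUTPUT_RECORDS

        public void collect(String key, Integer value) {
            sink.add(key + "\t" + value);            // out.write(key, value)
            outputRecords++;                         // reporter.incrCounter(...)
        }
    }

    public static void main(String[] args) {
        CountingCollector c = new CountingCollector();
        c.collect("a", 3);
        c.collect("b", 2);
        System.out.println(c.outputRecords); // 2
    }
}
```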
Map:
calls SequenceFile.MergeQueue.merge() to merge all the map outputs
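MergeQueue is essentially a k-way merge of sorted runs; a minimal sketch of that idea with a PriorityQueue (not Hadoop's actual implementation, which works on serialized keys):

```java
import java.util.*;

public class KWayMerge {
    // Merge k sorted lists into one sorted list by repeatedly
    // popping the smallest head among all runs, the way a merge
    // queue over sorted map outputs does.
    static List<Integer> merge(List<List<Integer>> runs) {
        // queue entries: {value, runIndex, offsetInRun}
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int i = 0; i < runs.size(); i++)
            if (!runs.get(i).isEmpty())
                pq.add(new int[]{runs.get(i).get(0), i, 0});

        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            out.add(e[0]);
            int next = e[2] + 1;                     // advance within that run
            if (next < runs.get(e[1]).size())
                pq.add(new int[]{runs.get(e[1]).get(next), e[1], next});
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
            List.of(1, 4), List.of(2, 3), List.of(0, 5))));
        // [0, 1, 2, 3, 4, 5]
    }
}
```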


Iter:
Unlike the iterators in the Java SDK, ReduceValuesIterator operates on the values one key at a time.

private void getNext() throws IOException {
      Writable lastKey = key;                     // save previous key
      try {
        key = (WritableComparable)ReflectionUtils.newInstance(keyClass, this.conf);
        value = (Writable)ReflectionUtils.newInstance(valClass, this.conf);
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
      more = in.next();
      if (more) {
        //de-serialize the raw key/value
        keyIn.reset(in.getKey().getData(), in.getKey().getLength());
        key.readFields(keyIn);
        valOut.reset();
        (in.getValue()).writeUncompressedBytes(valOut);
        valIn.reset(valOut.getData(), valOut.getLength());
        value.readFields(valIn);

        if (lastKey == null) {
          hasNext = true;
        } else {
          hasNext = (comparator.compare(key, lastKey) == 0);
        }
      } else {
        hasNext = false;
      }
    }
  }
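The same-key grouping that getNext() performs (hasNext stays true while comparator.compare(key, lastKey) == 0) can be sketched over an already-sorted list of pairs:

```java
import java.util.*;
import java.util.function.BiConsumer;

public class GroupByKeySketch {
    // Walk a sorted list of (key, value) pairs and invoke the "reduce"
    // callback once per distinct key, mimicking ReduceValuesIterator.
    static void reduceGroups(List<Map.Entry<String, Integer>> sorted,
                             BiConsumer<String, List<Integer>> reducer) {
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            List<Integer> values = new ArrayList<>();
            // like hasNext in getNext(): keep consuming while keys compare equal
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                values.add(sorted.get(i).getValue());
                i++;
            }
            reducer.accept(key, values);   // one reduce() call per key
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 7));
        reduceGroups(pairs, (k, vs) -> System.out.println(k + " -> " + vs));
        // a -> [1, 2]
        // b -> [7]
    }
}
```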
How many reduces?
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.
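Plugging in a hypothetical cluster makes the formula concrete (the node count and the slot setting below are illustrative assumptions, not values from the source):

```java
public class ReduceCount {
    // reduces = factor * nodes * mapred.tasktracker.reduce.tasks.maximum
    static int reduceCount(double factor, int nodes, int maxReducesPerNode) {
        return (int) (factor * nodes * maxReducesPerNode);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes, each TaskTracker configured with
        // mapred.tasktracker.reduce.tasks.maximum = 2, i.e. 20 reduce slots.
        int lower = reduceCount(0.95, 10, 2); // 19: one wave, slack for failures
        int upper = reduceCount(1.75, 10, 2); // 35: faster nodes start a second wave
        System.out.println(lower + " " + upper); // 19 35
    }
}
```

With 0.95 all reduces launch in a single wave with spare slots for retries; with 1.75 the faster nodes finish their first wave and immediately start a second, improving load balancing.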
