Reduce source code analysis

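A beginner's analysis of the MapReduce source code
reduce
ReduceTask.run();
The run method in class ReduceTask is called by YarnChild.
In the run method: RawKeyValueIterator rIter = null; This class is an iterator over raw key/value pairs. The Iterable values parameter of the Reducer class's reduce(key, Iterable values, context) method is also fed from it, one key/value pair at a time.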
In the run method: ShuffleConsumerPlugin shuffleConsumerPlugin = null; Class clazz =
job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);

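shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job); This plugin is responsible for the shuffle. A custom implementation can be plugged in through MRConfig.SHUFFLE_CONSUMER_PLUGIN; the default is Shuffle.class.
		The run method in Shuffle.class:
			Start the map-completion events fetcher thread: eventFetcher.start(); this thread polls for map-completion events so that the output of finished map tasks can be fetched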
			 Start the map-output fetcher threads
			Wait for shuffle to complete successfully
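The three steps listed for Shuffle.run() can be pictured with a small stand-alone model. This is only a sketch under simplifying assumptions: ShuffleSketch, its queue, and the "output-of-" strings are hypothetical stand-ins, not Hadoop classes, and real fetchers copy map output over the network rather than building strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the three steps in Shuffle.run(); none of these names are Hadoop APIs.
class ShuffleSketch {
    static List<String> shuffle(List<String> finishedMaps, int numFetchers) {
        BlockingQueue<String> completed = new LinkedBlockingQueue<>();
        Queue<String> fetched = new ConcurrentLinkedQueue<>();
        CountDownLatch remaining = new CountDownLatch(finishedMaps.size());

        // Step 1: the event fetcher announces which map tasks have completed.
        Thread eventFetcher = new Thread(() -> completed.addAll(finishedMaps));
        eventFetcher.start();

        // Step 2: fetcher threads copy the output of each completed map.
        List<Thread> fetchers = new ArrayList<>();
        for (int i = 0; i < numFetchers; i++) {
            Thread fetcher = new Thread(() -> {
                try {
                    while (true) {
                        String mapId = completed.take();
                        fetched.add("output-of-" + mapId);
                        remaining.countDown();
                    }
                } catch (InterruptedException e) {
                    // Interrupted by shuffle() once everything is fetched: just exit.
                }
            });
            fetcher.start();
            fetchers.add(fetcher);
        }

        // Step 3: wait until every map output has been fetched, then stop the fetchers.
        try {
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shuffle", e);
        } finally {
            fetchers.forEach(Thread::interrupt);
        }
        List<String> result = new ArrayList<>(fetched);
        Collections.sort(result);
        return result;
    }
}
```

Calling ShuffleSketch.shuffle with a list of map ids and at least one fetcher returns one fetched entry per map id, which mirrors the "wait for shuffle to complete" step.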
	 Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
RawComparator comparator = job.getOutputValueGroupingComparator();

if (useNewApi) {
  runNewReducer(job, umbilical, reporter, rIter, comparator, 
                keyClass, valueClass);
} 
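		  RawComparator comparator = job.getOutputValueGroupingComparator(); This is the grouping comparator, which can be customized with setGroupingComparatorClass(Class). It defines how records are grouped: by default, records with the same key form one group. The comparator's same-key result (nextKeyIsSame in ReduceContextImpl) decides whether consecutive records are handled by one reduce call.
			It is obtained by this code: RawComparator getOutputValueGroupingComparator() {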
Class theClass = getClass(
  JobContext.GROUP_COMPARATOR_CLASS, null, RawComparator.class);
if (theClass == null) {
  return getOutputKeyComparator();
}

return ReflectionUtils.newInstance(theClass, this);

}
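The lookup-with-fallback pattern in getOutputValueGroupingComparator() can be reproduced without Hadoop. The sketch below assumes a hypothetical ToyConf class and configuration key, and uses java.util.Comparator on String keys in place of RawComparator on raw bytes. It only illustrates the idea: use the configured grouping comparator class if there is one, otherwise fall back to the key (sort) comparator.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the lookup-with-fallback logic; ToyConf and its key name are hypothetical.
class ToyConf {
    static final String GROUP_COMPARATOR_CLASS = "toy.group.comparator.class";
    private final Map<String, Class<? extends Comparator<String>>> classes = new HashMap<>();

    void setGroupingComparatorClass(Class<? extends Comparator<String>> cls) {
        classes.put(GROUP_COMPARATOR_CLASS, cls);
    }

    // Default sort comparator: natural ordering of the keys.
    Comparator<String> getOutputKeyComparator() {
        return Comparator.naturalOrder();
    }

    Comparator<String> getOutputValueGroupingComparator() {
        Class<? extends Comparator<String>> theClass = classes.get(GROUP_COMPARATOR_CLASS);
        if (theClass == null) {
            // No grouping comparator configured: group with the sort comparator.
            return getOutputKeyComparator();
        }
        try {
            // Instantiate the configured class reflectively, as ReflectionUtils.newInstance does.
            return theClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + theClass, e);
        }
    }
}

// Example custom comparator: two keys belong together if they share their first character.
class FirstCharComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        return Character.compare(a.charAt(0), b.charAt(0));
    }
}
```

With no class set, "apple" and "avocado" compare as different keys; after setGroupingComparatorClass(FirstCharComparator.class) they compare as equal and would fall into the same group.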
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
The method in this class:
private void runNewReducer(JobConf job,
final TaskUmbilicalProtocol umbilical,
final TaskReporter reporter,
RawKeyValueIterator rIter,
RawComparator comparator,
Class keyClass,
Class valueClass
)
// wrap value iterator to report progress.
// make a task context so we can get the classes
// make a reducer
org.apache.hadoop.mapreduce.Reducer reducer =
(org.apache.hadoop.mapreduce.Reducer)
ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
org.apache.hadoop.mapreduce.RecordWriter trackedRW =
new NewTrackingRecordWriter(this, taskContext);
job.setBoolean("mapred.skip.on", isSkipping());
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
org.apache.hadoop.mapreduce.Reducer.Context
reducerContext = createReduceContext(reducer, job, getTaskID(),
rIter, reduceInputKeyCounter,
reduceInputValueCounter,
trackedRW, // the record writer, which provides the write method
committer,
reporter, comparator, keyClass,
valueClass);
try {
reducer.run(reducerContext);
reducerContext is backed by an object of class ReduceContextImpl: public ReduceContextImpl(Configuration conf, TaskAttemptID taskid,
RawKeyValueIterator input,
Counter inputKeyCounter,
Counter inputValueCounter,
RecordWriter output,
OutputCommitter committer,
StatusReporter reporter,
RawComparator comparator,
Class keyClass,
Class valueClass
)
Methods of the ReduceContextImpl object: nextKeyValue(), getCurrentKey(), getCurrentValue()
nextKeyValue()
public boolean nextKeyValue() throws IOException, InterruptedException {
if (!hasMore) {
key = null;
value = null;
return false;
}
firstValue = !nextKeyIsSame;
DataInputBuffer nextKey = input.getKey(); // input is the RawKeyValueIterator; it is positioned on one key/value record, much like a pointer to a memory address
currentRawKey.set(nextKey.getData(), nextKey.getPosition(),
nextKey.getLength() - nextKey.getPosition());
buffer.reset(currentRawKey.getBytes(), 0, currentRawKey.getLength());
key = keyDeserializer.deserialize(key); // deserialize the key
DataInputBuffer nextVal = input.getValue();
buffer.reset(nextVal.getData(), nextVal.getPosition(), nextVal.getLength()
- nextVal.getPosition());
value = valueDeserializer.deserialize(value); // deserialize the value; getCurrentValue() returns it

currentKeyLength = nextKey.getLength() - nextKey.getPosition();
currentValueLength = nextVal.getLength() - nextVal.getPosition();

if (isMarked) {
  backupStore.write(nextKey, nextVal);
}

hasMore = input.next();
if (hasMore) {
  nextKey = input.getKey(); // compare whether the current key and the next key are the same
  nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0, 
                                 currentRawKey.getLength(),
                                 nextKey.getData(),
                                 nextKey.getPosition(),
                                 nextKey.getLength() - nextKey.getPosition()
                                     ) == 0;
} else {
  nextKeyIsSame = false;
}
inputValueCounter.increment(1);
return true;

} This method reads the current key and value and returns a boolean.
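The core of nextKeyValue() is a one-record lookahead: it takes the record the input is positioned on, advances the input, and compares the current key with the next one. A minimal sketch of just that idea follows. LookaheadSketch is a hypothetical class that works on String pairs instead of serialized bytes and leaves out the counters and the backup store.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Toy model of the one-record lookahead in nextKeyValue(); not Hadoop code.
class LookaheadSketch {
    private final Iterator<String[]> input;      // sorted {key, value} pairs
    private final Comparator<String> comparator; // plays the role of the grouping comparator
    private String[] pending;                    // record the input is currently positioned on
    boolean hasMore;
    boolean nextKeyIsSame = false;
    String key;
    String value;

    LookaheadSketch(List<String[]> sorted, Comparator<String> comparator) {
        this.input = sorted.iterator();
        this.comparator = comparator;
        advance();
    }

    // Moves to the next record, like hasMore = input.next().
    private void advance() {
        hasMore = input.hasNext();
        pending = hasMore ? input.next() : null;
    }

    boolean nextKeyValue() {
        if (!hasMore) {
            key = null;
            value = null;
            return false;
        }
        // Take the current record.
        key = pending[0];
        value = pending[1];
        // Look one record ahead and remember whether it belongs to the same group.
        advance();
        nextKeyIsSame = hasMore && comparator.compare(key, pending[0]) == 0;
        return true;
    }
}
```

For the sorted input (a,1), (a,2), (b,3), nextKeyIsSame is true after the first call and false after the second, which is where the group for key a ends.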
getCurrentKey()
getCurrentValue()
In the Reducer's run method: public void run(Context context) throws IOException, InterruptedException {
setup(context);
try {
while (context.nextKey()) {
reduce(context.getCurrentKey(), context.getValues(), context);
// If a back up store is used, reset it
Iterator iter = context.getValues().iterator();
if(iter instanceof ReduceContext.ValueIterator) {
((ReduceContext.ValueIterator)iter).resetBackupStore();
}
}
} finally {
cleanup(context);
}
}
The reduce method called here is the Reducer's own method, which we can override.
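Reducer.run() is a template method: setup once, one reduce call per key group, cleanup once, even if reduce throws. The sketch below imitates that shape with hypothetical ToyReducer and SumReducer classes. One deliberate simplification: it collects each group's values into a list, whereas the real context streams values lazily from the iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy version of the Reducer.run() template; the class names are hypothetical.
abstract class ToyReducer {
    final List<String> log = new ArrayList<>();

    protected void setup() { log.add("setup"); }
    protected abstract void reduce(String key, Iterable<Long> values);
    protected void cleanup() { log.add("cleanup"); }

    // 'sorted' must already be ordered by key, as reduce input is after the merge.
    final void run(List<Map.Entry<String, Long>> sorted) {
        setup();
        try {
            int i = 0;
            while (i < sorted.size()) {               // plays the role of context.nextKey()
                String key = sorted.get(i).getKey();
                List<Long> values = new ArrayList<>();
                // Collect the consecutive records that share this key.
                while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                    values.add(sorted.get(i).getValue());
                    i++;
                }
                reduce(key, values);
            }
        } finally {
            cleanup();                                // always runs, as in the original
        }
    }
}

// Word-count style reducer: sums the values of each key.
class SumReducer extends ToyReducer {
    @Override
    protected void reduce(String key, Iterable<Long> values) {
        long sum = 0L;
        for (long v : values) {
            sum += v;
        }
        log.add(key + "=" + sum);
    }
}
```

Running SumReducer over (hello,1), (hello,1), (world,1) records the sequence setup, hello=2, world=1, cleanup in its log.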
This is our custom reduce method: protected void reduce(Text key, Iterable<LongWritable> values,
Context context) throws IOException, InterruptedException {
// <"hello", [1,1,1,1,1,1,1,1,1]>
// get the iterator and traverse values
Iterator<LongWritable> iterator = values.iterator();

	long sum = 0L;
	
	while (iterator.hasNext()) {
		LongWritable num = iterator.next();
		sum += num.get();
	}
	// wrap the sum in a LongWritable and write it out to HDFS
	outValue.set(sum);
	context.write(key, outValue);
}
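							Here, Iterable values is what context.getValues() returns, an iterable: private ValueIterable iterable = new ValueIterable();
								iterable is a ValueIterable; the iterator it returns implements:
									hasNext()
										 return firstValue || nextKeyIsSame; This returns true or false. If the next key compares equal to the current key, nextKeyIsSame is true and next() is called. One reduce call handles the records whose keys compare as the same; records with a different key go to a separate reduce call.
											nextKeyIsSame: nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0, 
                                 currentRawKey.getLength(),
                                 nextKey.getData(),
                                 nextKey.getPosition(),
                                 nextKey.getLength() - nextKey.getPosition()
                                     ) == 0; Whether records share one reduce call depends on the comparator.
									next()
												nextKeyValue(); This method fetches the next key and value and returns a boolean; each further call fetches the following value.
    return value;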
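Because group boundaries come from the comparator result and not from key equality, a custom grouping comparator can put records with different keys into a single reduce call. The sketch below shows this on plain strings. GroupingSketch and the composite key format "name#sequence" are assumptions made for the illustration, not something taken from Hadoop.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy demonstration: the grouping comparator decides where one reduce call ends.
class GroupingSketch {
    // Splits an already-sorted key list into runs of consecutive keys that compare as equal.
    static List<List<String>> group(List<String> sortedKeys, Comparator<String> grouping) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;
        String previous = null;
        for (String key : sortedKeys) {
            boolean sameAsPrevious = previous != null && grouping.compare(previous, key) == 0;
            if (!sameAsPrevious) {
                // Comparator says "different key": start a new group (a new reduce call).
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(key);
            previous = key;
        }
        return groups;
    }

    // Hypothetical composite key "name#sequence": compare only the part before '#'.
    // Every key passed to this comparator must contain a '#'.
    static final Comparator<String> BY_PREFIX =
        Comparator.comparing((String k) -> k.substring(0, k.indexOf('#')));
}
```

Grouping the keys a#1, a#2, b#1 with natural ordering gives three groups, while BY_PREFIX gives two, with a#1 and a#2 sharing one group. This is the mechanism that secondary-sort style jobs rely on.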
