During the copy phase on the reduce side, fetched map outputs are placed either in memory or directly on disk. Waiting until every file has been copied before merging would clearly hurt job throughput, so merging begins once copying has made enough progress. Two threads are responsible for this work: InMemFSMergeThread and LocalFSMerger, which merge in-memory segments and on-disk segments respectively.
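Conceptually this is a producer-consumer handshake: copier threads deposit finished segments and notify, while a merge thread sleeps until enough data has accumulated. The following is a minimal, self-contained sketch of that pattern only; class names, method names, and the threshold are invented for illustration and are not Hadoop code:

// Minimal model: a merger thread overlaps with the "copy phase" and starts
// merging as soon as enough segments are queued, instead of waiting for all
// copies to finish. All names here are invented for illustration.
import java.util.ArrayList;
import java.util.List;

public class OverlappedMergeSketch {
  private final List<String> segments = new ArrayList<>(); // finished copies
  private final Object dataAvailable = new Object();
  private static final int MERGE_THRESHOLD = 3;            // merge once 3 segments are ready
  private boolean closed = false;                          // set when copying is done

  void copierFinished(String segment) {
    synchronized (dataAvailable) {
      segments.add(segment);
      dataAvailable.notifyAll();                           // wake the merger
    }
  }

  void close() {
    synchronized (dataAvailable) {
      closed = true;
      dataAvailable.notifyAll();
    }
  }

  // Mirrors the run()/waitForDataToMerge() loop: block until a merge is
  // worthwhile (or copying has finished), then merge what is queued.
  void mergeLoop() throws InterruptedException {
    boolean exit = false;
    do {
      List<String> toMerge;
      synchronized (dataAvailable) {
        while (!closed && segments.size() < MERGE_THRESHOLD) {
          dataAvailable.wait();
        }
        exit = closed;
        toMerge = new ArrayList<>(segments);
        segments.clear();
      }
      if (!toMerge.isEmpty()) {
        System.out.println("merging " + toMerge);           // stand-in for the real merge
      }
    } while (!exit);
  }

  public static void main(String[] args) throws Exception {
    OverlappedMergeSketch m = new OverlappedMergeSketch();
    Thread merger = new Thread(() -> {
      try { m.mergeLoop(); } catch (InterruptedException ignored) { }
    });
    merger.start();
    for (int i = 0; i < 7; i++) {                           // the "copy phase"
      m.copierFinished("segment-" + i);
      Thread.sleep(50);
    }
    m.close();
    merger.join();
  }
}

Let's first look at the run method of the in-memory merge thread, InMemFSMergeThread: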
public void run() {
  LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
  try {
    boolean exit = false;
    do {
      exit = ramManager.waitForDataToMerge(); // check whether a merge is needed
      if (!exit) {
        doInMemMerge();                       // perform the in-memory merge
      }
    } while (!exit);
  } catch (Exception e) {
    LOG.warn(reduceTask.getTaskID() +
             " Merge of the inmemory files threw an exception: "
             + StringUtils.stringifyException(e));
    ReduceCopier.this.mergeThrowable = e;
  } catch (Throwable t) {
    String msg = getTaskID() + " : Failed to merge in memory"
                 + StringUtils.stringifyException(t);
    reportFatalError(getTaskID(), t, msg);
  }
}
Below are the conditions that trigger an in-memory merge; the comments in the code explain them well. The factors to watch are the amount of memory in use, the number of fully copied map outputs, and the number of stalled copier threads. A copier thread stalls when the memory it tries to reserve for map output would push usage past the threshold; see ShuffleRamManager.reserve() for details.
public boolean waitForDataToMerge() throws InterruptedException {
  boolean done = false;
  synchronized (dataAvailable) {
    // Start in-memory merge if manager has been closed or...
    while (!closed
           &&
           // In-memory threshold exceeded and at least two segments
           // have been fetched
           (getPercentUsed() < maxInMemCopyPer || numClosed < 2)
           &&
           // More than "mapred.inmem.merge.threshold" map outputs
           // have been fetched into memory
           (maxInMemOutputs <= 0 || numClosed < maxInMemOutputs)
           &&
           // More than MAX... threads are blocked on the RamManager
           // or the blocked threads are the last map outputs to be
           // fetched. If numRequiredMapOutputs is zero, either
           // setNumCopiedMapOutputs has not been called (no map outputs
           // have been fetched, so there is nothing to merge) or the
           // last map outputs being transferred without
           // contention, so a merge would be premature.
           (numPendingRequests <
                numCopiers * MAX_STALLED_SHUFFLE_THREADS_FRACTION &&
            (0 == numRequiredMapOutputs ||
             numPendingRequests < numRequiredMapOutputs))) {
      dataAvailable.wait();
    }
    done = closed;
  }
  return done;
}
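The thresholds used above are driven by job configuration. Assuming conf is the task's JobConf, the relevant Hadoop 1.x properties are read roughly as follows; the property names are from that branch, and the defaults shown are from memory and worth verifying against your version:

// Sketch: the knobs behind waitForDataToMerge() (defaults are assumptions).
float maxInMemCopyPer = conf.getFloat("mapred.job.shuffle.merge.percent", 0.66f);        // getPercentUsed() trigger
int   maxInMemOutputs = conf.getInt("mapred.inmem.merge.threshold", 1000);                // fetched-into-memory file count
float shuffleInputPer = conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);  // share of heap reserved for map outputs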
The merge here is worth comparing with the merge on the map side; the logic is largely the same: pick an output file name, build a writer, feed the segments into the merge, and, if a combiner is configured, run it before writing, otherwise write the merged data straight to the file.
private void doInMemMerge() throws IOException {
  if (mapOutputsFilesInMemory.size() == 0) {
    return;
  }

  //name this output file same as the name of the first file that is
  //there in the current list of inmem files (this is guaranteed to
  //be absent on the disk currently. So we don't overwrite a prev.
  //created spill). Also we need to create the output file now since
  //it is not guaranteed that this file will be present after merge
  //is called (we delete empty files as soon as we see them
  //in the merge method)

  //figure out the mapId
  TaskID mapId = mapOutputsFilesInMemory.get(0).mapId;

  List<Segment<K, V>> inMemorySegments = new ArrayList<Segment<K, V>>();
  long mergeOutputSize = createInMemorySegments(inMemorySegments, 0);
  int noInMemorySegments = inMemorySegments.size();

  Path outputPath =
      mapOutputFile.getInputFileForWrite(mapId, mergeOutputSize);

  Writer writer =
      new Writer(conf, rfs, outputPath,
                 conf.getMapOutputKeyClass(),
                 conf.getMapOutputValueClass(),
                 codec, null);

  RawKeyValueIterator rIter = null;
  try {
    LOG.info("Initiating in-memory merge with " + noInMemorySegments +
             " segments...");

    rIter = Merger.merge(conf, rfs,
                         (Class<K>) conf.getMapOutputKeyClass(),
                         (Class<V>) conf.getMapOutputValueClass(),
                         inMemorySegments, inMemorySegments.size(),
                         new Path(reduceTask.getTaskID().toString()),
                         conf.getOutputKeyComparator(), reporter,
                         spilledRecordsCounter, null);

    if (combinerRunner == null) {
      Merger.writeFile(rIter, writer, reporter, conf);
    } else {
      combineCollector.setWriter(writer);
      combinerRunner.combine(rIter, combineCollector);
    }
    writer.close();

    LOG.info(reduceTask.getTaskID() +
             " Merge of the " + noInMemorySegments +
             " files in-memory complete." +
             " Local file is " + outputPath + " of size " +
             localFileSys.getFileStatus(outputPath).getLen());
  } catch (Exception e) {
    //make sure that we delete the ondisk file that we created
    //earlier when we invoked cloneFileAttributes
    localFileSys.delete(outputPath, true);
    throw (IOException) new IOException
        ("Intermediate merge failed").initCause(e);
  }

  // Note the output of the merge
  FileStatus status = localFileSys.getFileStatus(outputPath);
  synchronized (mapOutputFilesOnDisk) {
    addToMapOutputFilesOnDisk(status);
  }
}
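createInMemorySegments, called at the top of doInMemMerge, is not shown above. Roughly, it drains mapOutputsFilesInMemory into Segment objects backed by in-memory readers and returns the total number of bytes handed to the merge. The sketch below conveys that logic only; it is not verbatim source, and the field and constructor details are approximated:

// Sketch of createInMemorySegments (approximated, not verbatim source).
private long createInMemorySegments(List<Segment<K, V>> inMemorySegments,
                                    long leaveBytes) throws IOException {
  long totalSize = 0;
  synchronized (mapOutputsFilesInMemory) {
    long fullSize = 0;
    for (MapOutput mo : mapOutputsFilesInMemory) {
      fullSize += mo.data.length;                 // bytes currently held in memory
    }
    // Keep draining until at most leaveBytes remain in memory
    // (0 here, so everything is handed to the merge).
    while (fullSize > leaveBytes) {
      MapOutput mo = mapOutputsFilesInMemory.remove(0);
      totalSize += mo.data.length;
      fullSize -= mo.data.length;
      // Wrap the in-memory buffer as a merge Segment.
      inMemorySegments.add(
          new Segment<K, V>(new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
                                                     mo.data, 0, mo.data.length),
                            true));
    }
  }
  return totalSize;
}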
Merging of on-disk files follows much the same pattern; for the details, see org.apache.hadoop.mapred.ReduceTask.ReduceCopier.LocalFSMerger.
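For reference, the core of LocalFSMerger has the same wait-then-merge shape, but it watches the set of on-disk files rather than memory usage. The sketch below is condensed and approximated rather than verbatim source; the logging and the actual Merger.merge call are elided:

// Sketch of LocalFSMerger.run() (condensed; details approximated).
public void run() {
  try {
    while (!exitLocalFSMerge) {
      synchronized (mapOutputFilesOnDisk) {
        // Wait until enough on-disk files have piled up to justify a merge.
        while (!exitLocalFSMerge &&
               mapOutputFilesOnDisk.size() < (2 * ioSortFactor - 1)) {
          mapOutputFilesOnDisk.wait();
        }
      }
      if (exitLocalFSMerge) {
        break;
      }
      // Take the io.sort.factor smallest files (the set is ordered by size)...
      List<Path> mapFiles = new ArrayList<Path>();
      synchronized (mapOutputFilesOnDisk) {
        for (int i = 0; i < ioSortFactor; i++) {
          FileStatus filestatus = mapOutputFilesOnDisk.first();
          mapOutputFilesOnDisk.remove(filestatus);
          mapFiles.add(filestatus.getPath());
        }
      }
      // ...merge them via Merger.merge(...) into a single new on-disk file,
      // which is then added back with addToMapOutputFilesOnDisk(...).
    }
  } catch (Exception e) {
    ReduceCopier.this.mergeThrowable = e;
  }
}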