First, look at the run() function of the in-memory merge thread, InMemFSMergeThread.
ReduceTask.java, line 2699
public void run() {
  LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
  try {
    boolean exit = false;
    do {
      exit = ramManager.waitForDataToMerge(); // check whether a merge is needed
      if (!exit) {
        doInMemMerge(); // perform the merge
      }
    } while (!exit);
  } catch (Exception e) {
    LOG.warn(reduceTask.getTaskID() +
             " Merge of the inmemory files threw an exception: " +
             StringUtils.stringifyException(e));
    ReduceCopier.this.mergeThrowable = e;
  } catch (Throwable t) {
    String msg = getTaskID() + " : Failed to merge in memory" +
                 StringUtils.stringifyException(t);
    reportFatalError(getTaskID(), t, msg);
  }
}
Below are the conditions for triggering an in-memory merge; the comments in the code already explain them well. The things to watch are the memory usage, the number of map outputs that have been fully copied, and the number of stalled copier threads. A copier thread stalls when the memory it wants to reserve for map output data would push usage past the threshold; see ShuffleRamManager.reserve().
ReduceTask.java, line 1165
public boolean waitForDataToMerge() throws InterruptedException {
  boolean done = false;
  synchronized (dataAvailable) {
    // Start in-memory merge if manager has been closed or...
    while (!closed
           &&
           // In-memory threshold exceeded and at least two segments
           // have been fetched
           (getPercentUsed() < maxInMemCopyPer || numClosed < 2)
           &&
           // More than "mapred.inmem.merge.threshold" map outputs
           // have been fetched into memory
           (maxInMemOutputs <= 0 || numClosed < maxInMemOutputs)
           &&
           // More than MAX... threads are blocked on the RamManager
           // or the blocked threads are the last map outputs to be
           // fetched. If numRequiredMapOutputs is zero, either
           // setNumCopiedMapOutputs has not been called (no map outputs
           // have been fetched, so there is nothing to merge) or the
           // last map outputs are being transferred without
           // contention, so a merge would be premature.
           (numPendingRequests < numCopiers * MAX_STALLED_SHUFFLE_THREADS_FRACTION
            && (0 == numRequiredMapOutputs
                || numPendingRequests < numRequiredMapOutputs))) {
      dataAvailable.wait();
    }
    done = closed;
  }
  return done;
}
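To see how this guarded wait is woken up, here is a minimal, self-contained sketch of both sides of the hand-off. The names (dataAvailable, numClosed, closeInMemoryFile) mirror ShuffleRamManager, but the body is a simplified reconstruction rather than Hadoop source, and the buffer size and thresholds are assumed values: every event that can change the predicate, a completed copy or a shutdown, notifies dataAvailable so the merge thread re-evaluates its condition.

// A sketch (not Hadoop source) of the wait/notify hand-off between the
// copier threads and the merge thread; sizes and thresholds are assumptions.
public class MiniRamManager {
  private final Object dataAvailable = new Object();
  private long size = 0;       // bytes of map output currently held in memory
  private int numClosed = 0;   // map outputs fully copied into memory
  private boolean closed = false;

  private static final long MAX_SIZE = 64L << 20;        // assumed 64 MB buffer
  private static final float MAX_INMEM_COPY_PER = 0.66f; // merge when 66% full
  private static final int MAX_INMEM_OUTPUTS = 1000;     // cf. mapred.inmem.merge.threshold

  private float getPercentUsed() { return (float) size / MAX_SIZE; }

  // Copier side: a map output of 'bytes' has been fully written into memory.
  public void closeInMemoryFile(long bytes) {
    synchronized (dataAvailable) {
      size += bytes;
      numClosed++;
      dataAvailable.notify(); // the predicate may now hold; wake the merger
    }
  }

  // Shuffle is over: force the merge thread out of its loop.
  public void close() {
    synchronized (dataAvailable) {
      closed = true;
      dataAvailable.notify();
    }
  }

  // Merge side: a cut-down version of the predicate shown above.
  public boolean waitForDataToMerge() throws InterruptedException {
    synchronized (dataAvailable) {
      while (!closed
             && (getPercentUsed() < MAX_INMEM_COPY_PER || numClosed < 2)
             && numClosed < MAX_INMEM_OUTPUTS) {
        dataAvailable.wait();
      }
      return closed;
    }
  }
}

The design point is that the merge thread owns none of the shared state; it only re-checks the predicate each time it is notified, which is why run() above can be a plain do/while loop around waitForDataToMerge().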
This merge is best read alongside the map-side merge; the logic is much the same: pick an output file name, construct a writer, gather the in-memory segments to be merged, then run the combiner over the merged stream if one is configured, or otherwise write it straight to the file.
ReduceTask.java, line 2722
private void doInMemMerge() throws IOException {
  if (mapOutputsFilesInMemory.size() == 0) {
    return;
  }

  //name this output file same as the name of the first file that is
  //there in the current list of inmem files (this is guaranteed to
  //be absent on the disk currently. So we don't overwrite a prev.
  //created spill). Also we need to create the output file now since
  //it is not guaranteed that this file will be present after merge
  //is called (we delete empty files as soon as we see them
  //in the merge method)

  //figure out the mapId
  TaskID mapId = mapOutputsFilesInMemory.get(0).mapId;

  List<Segment<K, V>> inMemorySegments = new ArrayList<Segment<K, V>>();
  long mergeOutputSize = createInMemorySegments(inMemorySegments, 0);
  int noInMemorySegments = inMemorySegments.size();

  Path outputPath = mapOutputFile.getInputFileForWrite(mapId, mergeOutputSize);

  Writer writer =
    new Writer(conf, rfs, outputPath,
               conf.getMapOutputKeyClass(),
               conf.getMapOutputValueClass(),
               codec, null);

  RawKeyValueIterator rIter = null;
  try {
    LOG.info("Initiating in-memory merge with " + noInMemorySegments +
             " segments...");

    rIter = Merger.merge(conf, rfs,
                         (Class<K>)conf.getMapOutputKeyClass(),
                         (Class<V>)conf.getMapOutputValueClass(),
                         inMemorySegments, inMemorySegments.size(),
                         new Path(reduceTask.getTaskID().toString()),
                         conf.getOutputKeyComparator(), reporter,
                         spilledRecordsCounter, null);

    if (combinerRunner == null) {
      Merger.writeFile(rIter, writer, reporter, conf);
    } else {
      combineCollector.setWriter(writer);
      combinerRunner.combine(rIter, combineCollector);
    }
    writer.close();

    LOG.info(reduceTask.getTaskID() +
             " Merge of the " + noInMemorySegments +
             " files in-memory complete." +
             " Local file is " + outputPath + " of size " +
             localFileSys.getFileStatus(outputPath).getLen());
  } catch (Exception e) {
    //make sure that we delete the ondisk file that we created
    //earlier when we invoked cloneFileAttributes
    localFileSys.delete(outputPath, true);
    throw (IOException)new IOException
            ("Intermediate merge failed").initCause(e);
  }

  // Note the output of the merge
  FileStatus status = localFileSys.getFileStatus(outputPath);
  synchronized (mapOutputFilesOnDisk) {
    addToMapOutputFilesOnDisk(status);
  }
}
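Merger.merge itself performs a k-way merge: it keeps the current head of every segment in a min-heap, repeatedly pops the smallest head, and advances the segment it came from. The toy below shows the same idea over plain sorted lists; it is only an illustration of the technique, since the real Merger streams serialized key/value bytes through a RawKeyValueIterator and never materializes lists like this.

import java.util.*;

// Toy k-way merge over already-sorted runs, illustrating the heap-based
// technique behind Merger.merge; not the Hadoop implementation.
public class KWayMerge {

  private static final class Head {
    final Iterator<String> it; // remainder of this run
    String value;              // current head of this run
    Head(Iterator<String> it) { this.it = it; this.value = it.next(); }
  }

  public static List<String> merge(List<List<String>> sortedRuns) {
    PriorityQueue<Head> heap =
        new PriorityQueue<>(Comparator.comparing((Head h) -> h.value));
    for (List<String> run : sortedRuns) {
      if (!run.isEmpty()) {
        heap.add(new Head(run.iterator()));
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Head h = heap.poll();   // smallest current head across all runs
      out.add(h.value);
      if (h.it.hasNext()) {   // advance this run and re-insert its head
        h.value = h.it.next();
        heap.add(h);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(merge(Arrays.asList(
        Arrays.asList("a", "d", "f"),
        Arrays.asList("b", "c", "e"))));
    // prints [a, b, c, d, e, f]
  }
}

With N total records across k segments, each record passes through the heap once, giving O(N log k); this is why both the map and reduce sides cap the fan-in per merge pass with io.sort.factor.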
Merging files on disk works much the same way; for the details, see org.apache.hadoop.mapred.ReduceTask.ReduceCopier.LocalFSMerger.
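Its trigger follows the same guarded-wait pattern. Paraphrasing the 0.20-era source from memory (the exact field names and the constant are worth verifying against your own tree), the thread sleeps until enough spill files have accumulated on disk:

// Paraphrased shape of LocalFSMerger.run(); not verbatim Hadoop source.
// ioSortFactor comes from io.sort.factor.
synchronized (mapOutputFilesOnDisk) {
  while (!exitLocalFSMerge &&
         mapOutputFilesOnDisk.size() < 2 * ioSortFactor - 1) {
    mapOutputFilesOnDisk.wait(); // woken when a new on-disk file is added
  }
}
// ...then merge the ioSortFactor smallest files into one larger file
// and add the result back into mapOutputFilesOnDisk.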