Below is the runTask method of ShuffleMapTask. Its main job is to call the write method of HashShuffleWriter, which performs the actual write of the map output.
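/**
 * Runs the map side of a shuffle: deserializes the RDD and its ShuffleDependency, computes
 * one partition, writes its records through a ShuffleWriter and returns the MapStatus.
 */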
override def runTask(context: TaskContext): MapStatus = {
  // Deserialize the RDD using the broadcast variable.
  // Record when deserialization starts
  val deserializeStartTime = System.currentTimeMillis()
  // Get a new instance of the closure serializer
  val ser = SparkEnv.get.closureSerializer.newInstance()
  // Deserialize the (RDD, ShuffleDependency) pair from the bytes held in taskBinary
  val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  // Time the executor spent on deserialization
  _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime
  metrics = Some(context.taskMetrics)
  var writer: ShuffleWriter[Any, Any] = null
  try {
    // Get the ShuffleManager of this executor
    val manager = SparkEnv.get.shuffleManager
    // Get a ShuffleWriter for this shuffle and partition. dep.shuffleHandle is the handle
    // registered for this shuffle (it carries the shuffle ID); partitionId is the RDD
    // partition this task computes, so each write operates on exactly one partition.
    writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
    // Compute the partition with rdd.iterator() and pass its records to writer.write()
    writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
    // Stop the writer and return the MapStatus it produces
    writer.stop(success = true).get
  } catch {
    case e: Exception =>
      try {
        if (writer != null) {
          writer.stop(success = false)
        }
      } catch {
        case e: Exception =>
          log.debug("Could not stop writer", e)
      }
      throw e
  }
}
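The error handling in runTask follows a common pattern: on the normal path, stop the writer with success = true and return its status; on any exception, try to stop it with success = false, swallow a secondary failure from stop, and rethrow the original exception. The sketch below reproduces only that control flow in plain Scala, without Spark. SimpleWriter, CountingWriter and runMapTask are hypothetical names invented for this illustration, and the String status merely stands in for MapStatus.

```scala
// Hypothetical stand-in for ShuffleWriter: consume records, then stop and report a status
trait SimpleWriter[K, V] {
  def write(records: Iterator[(K, V)]): Unit
  def stop(success: Boolean): Option[String]
}

// Illustrative writer that counts records and can be told to fail partway through
class CountingWriter(failAfter: Int = Int.MaxValue) extends SimpleWriter[String, Int] {
  var written = 0
  var stoppedWith: Option[Boolean] = None

  def write(records: Iterator[(String, Int)]): Unit =
    records.foreach { _ =>
      if (written >= failAfter) throw new RuntimeException("disk full")
      written += 1
    }

  // Only a successful stop yields a status, like stop(success = true).get in runTask
  def stop(success: Boolean): Option[String] = {
    stoppedWith = Some(success)
    if (success) Some(s"status($written records)") else None
  }
}

object MapTaskSketch {
  // Same control flow as runTask: the writer is created inside try, so it may still be null
  def runMapTask(newWriter: () => SimpleWriter[String, Int],
                 records: Iterator[(String, Int)]): String = {
    var writer: SimpleWriter[String, Int] = null
    try {
      writer = newWriter()
      writer.write(records)
      writer.stop(success = true).get
    } catch {
      case e: Exception =>
        try {
          if (writer != null) writer.stop(success = false)
        } catch {
          // A failure while stopping must not hide the original error
          case _: Exception => ()
        }
        throw e
    }
  }
}
```

Taking a writer factory instead of a ready-made writer mirrors runTask, where the writer is obtained inside the try block and may therefore still be null when the catch clause runs.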
The write method of HashShuffleWriter is as follows:
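/**
 * Write a bunch of records to this task's output.
 *
 * It mainly does two things:
 * 1) Decide whether map-side aggregation is needed. If it is, records that share a key are
 *    first combined into a single record, and only the combined records are written.
 * 2) Use the partitioner to decide which file (bucket) each record is written to.
 */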
override def write(records: Iterator[Product2[K, V]]): Unit = {
  // An aggregator is defined, so map-side combining may be required
  val iter = if (dep.aggregator.isDefined) {
    if (dep.mapSideCombine) {
      // Combine the records by key on the map side. For reduceByKey the aggregation is
      // split in two: one part runs here in the ShuffleMapTask, the other on the reduce
      // side, in the task that reads the shuffle output (e.g. a ResultTask)
      dep.aggregator.get.combineValuesByKey(records, context)
    } else {
      records
    }
  } else {
    require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
    records
  }
  // Use partitioner.getPartition to decide which file each record is written to
  for (elem <- iter) {
    // elem is a key-value pair (Product2[K, V]); the partitioner maps its key elem._1
    // to a bucketId, i.e. the reduce partition this record belongs to
    val bucketId = dep.partitioner.getPartition(elem._1)
    // shuffle.writers comes from FileShuffleBlockManager.forMapTask and holds one writer
    // per bucket; write the key elem._1 and the value elem._2 to the bucket's writer
    shuffle.writers(bucketId).write(elem._1, elem._2)
  }
}
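The two steps of write (optional map-side combining, then routing each record to a bucket by its key) can be modeled without Spark. This is a minimal sketch under stated assumptions: the partitioner is assumed to behave like Spark's HashPartitioner (non-negative key.hashCode modulo the number of partitions), a reduceByKey-style merge function stands in for the Aggregator's createCombiner/mergeValue logic, and in-memory vectors replace shuffle.writers(bucketId). HashShuffleWriteSketch and its methods are hypothetical names.

```scala
import scala.collection.mutable

object HashShuffleWriteSketch {
  // Assumed HashPartitioner-like behavior: non-negative key.hashCode modulo numPartitions
  def getPartition(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case k =>
      val raw = k.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
  }

  // Map-side combine: merge the values that share a key before anything is written
  def combineValuesByKey[K, V](records: Iterator[(K, V)], merge: (V, V) => V): Iterator[(K, V)] = {
    val combined = mutable.LinkedHashMap.empty[K, V]
    records.foreach { case (k, v) =>
      combined(k) = combined.get(k).map(merge(_, v)).getOrElse(v)
    }
    combined.iterator
  }

  // One in-memory "file" per bucket stands in for shuffle.writers(bucketId)
  def write[K, V](records: Iterator[(K, V)], numBuckets: Int,
                  mapSideCombine: Option[(V, V) => V]): Vector[Vector[(K, V)]] = {
    val iter = mapSideCombine match {
      case Some(merge) => combineValuesByKey(records, merge)
      case None        => records
    }
    val buckets = Vector.fill(numBuckets)(Vector.newBuilder[(K, V)])
    for ((k, v) <- iter) {
      buckets(getPartition(k, numBuckets)) += ((k, v))
    }
    buckets.map(_.result())
  }
}
```

For example, writing (1, 10), (2, 5), (1, 7), (4, 1) into 3 buckets with addition as the merge function puts (1, 17) and (4, 1) into bucket 1 and (2, 5) into bucket 2. The two records for key 1 leave the map task as one, which is the saving reduceByKey gets from its map-side half; the reduce side still has to merge partial results arriving from different map tasks.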
The class documentation of FileShuffleBlockResolver explains its design:
/**
 * Manages assigning disk-based block writers to shuffle tasks. Each shuffle task gets one file
 * per reducer (this set of files is called a ShuffleFileGroup).
 *
 * As an optimization to reduce the number of physical shuffle files produced, multiple shuffle
 * blocks are aggregated into the same file. There is one "combined shuffle file" per reducer
 * per concurrently executing shuffle task. As soon as a task finishes writing to its shuffle
 * files, it releases them for another task.
 *
 * Regarding the implementation of this feature, shuffle files are identified by a 3-tuple:
 *   - shuffleId: The unique id given to the entire shuffle stage.
 *   - bucketId: The id of the output partition (i.e., reducer id)
 *   - fileId: The unique id identifying a group of "combined shuffle files." Only one task at a
 *       time owns a particular fileId, and this id is returned to a pool when the task finishes.
 * Each shuffle file is then mapped to a FileSegment, which is a 3-tuple (file, offset, length)
 * that specifies where in a given file the actual block data is located.
 *
 * Shuffle file metadata is stored in a space-efficient manner. Rather than simply mapping
 * ShuffleBlockIds directly to FileSegments, each ShuffleFileGroup maintains a list of offsets for
 * each block stored in each file. In order to find the location of a shuffle block, we search the
 * files within the ShuffleFileGroups associated with the block's reducer.
 */
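The offset bookkeeping described in this comment can be illustrated with a small model: each group keeps, per reducer file, a list of offsets and lengths in the order its map tasks finished, so locating block (mapId, reducerId) means finding the map task's position in the group and reading the entry at that position. FileGroupSketch and ShuffleLookupSketch are hypothetical simplifications, not Spark's classes; FileSegment here is a local stand-in named after Spark's class, and the merged_shuffle_... file names are an illustrative assumption.

```scala
import scala.collection.mutable.ArrayBuffer

// Local stand-in named after Spark's FileSegment: where a block lives inside a file
final case class FileSegment(file: String, offset: Long, length: Long)

// Hypothetical model of one ShuffleFileGroup: one combined file per reducer,
// shared by the map tasks that use this group one after another
final class FileGroupSketch(shuffleId: Int, fileId: Int, numReducers: Int) {
  // File name per reducer; the naming scheme is an assumption for illustration
  val files: Vector[String] =
    Vector.tabulate(numReducers)(r => s"merged_shuffle_${shuffleId}_${r}_$fileId")
  // mapIds(i) is the i-th map task that wrote into this group
  private val mapIds = ArrayBuffer.empty[Int]
  // offsets(r)(i) and lengths(r)(i): block of map task mapIds(i) inside reducer file r
  private val offsets = Vector.fill(numReducers)(ArrayBuffer.empty[Long])
  private val lengths = Vector.fill(numReducers)(ArrayBuffer.empty[Long])

  // Called once a map task has written successfully: one (offset, length) per reducer file
  def recordMapOutput(mapId: Int, offs: Seq[Long], lens: Seq[Long]): Unit = {
    require(offs.size == numReducers && lens.size == numReducers)
    mapIds += mapId
    for (r <- 0 until numReducers) {
      offsets(r) += offs(r)
      lengths(r) += lens(r)
    }
  }

  // Locate block (mapId, reducerId) in this group, if one of its map tasks wrote it
  def getFileSegmentFor(mapId: Int, reducerId: Int): Option[FileSegment] = {
    val i = mapIds.indexOf(mapId)
    if (i < 0) None
    else Some(FileSegment(files(reducerId), offsets(reducerId)(i), lengths(reducerId)(i)))
  }
}

object ShuffleLookupSketch {
  // Search the groups of the shuffle until one of them holds the requested block
  def locate(groups: Seq[FileGroupSketch], mapId: Int, reducerId: Int): Option[FileSegment] =
    groups.view.flatMap(_.getFileSegmentFor(mapId, reducerId)).headOption
}
```

Following the comment's rule of one combined file per reducer per concurrently running task, 1,000 map tasks and 1,000 reducers would need 1,000 × 1,000 = 1,000,000 files without consolidation (one file per map task per reducer), while an executor running 8 tasks at a time would need only 8 × 1,000 = 8,000.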
The forMapTask method of this class is shown below:
/**
 * Get a ShuffleWriterGroup for the given map task, which will register it as complete
 * when the writers are closed successfully.
 * mapId is the ID of the RDD partition that this map task computes.
 */
def forMapTask(shuffleId: Int, mapId: Int, numBuckets: Int, serializer: Serializer,
    writeMetrics: ShuffleWriteMetrics): ShuffleWriterGroup = {
  new ShuffleWriterGroup {
    shuffleStates.putIfAbsent(shuffleId, new ShuffleState(numBuckets))
    private val shuffleState = shuffleStates(shuffleId)
    private var fileGroup: ShuffleFileGroup = null
    val openStartTime = System.nanoTime
    val serializerInstance = serializer.newInstance()
    // consolidateShuffleFiles (default false): when true, the writers reuse the files of a
    // ShuffleFileGroup that no other task is using, so map tasks that run one after another
    // on this executor append their blocks to the same numBuckets files. When false, every
    // map task creates its own file per output partition (bucket).
    val writers: Array[DiskBlockObjectWriter] = if (consolidateShuffleFiles) {
      // Take a file group that is not currently used by another task
      fileGroup = getUnusedFileGroup()
      Array.tabulate[DiskBlockObjectWriter](numBuckets) { bucketId =>
        // Block ID = (shuffleId, mapId, bucketId); mapId is the RDD partition ID
        val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
        // Write into the group's shared file for this bucket
        blockManager.getDiskWriter(blockId, fileGroup(bucketId), serializerInstance, bufferSize,
          writeMetrics)
      }
    } else {
      Array.tabulate[DiskBlockObjectWriter](numBuckets) { bucketId =>
        // Block ID = (shuffleId, mapId, bucketId); mapId is the RDD partition ID
        val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
        // Resolve this block's own file on local disk
        val blockFile = blockManager.diskBlockManager.getFile(blockId)
        // Write to a temporary file derived from blockFile's path, not to blockFile itself
        val tmp = Utils.tempFileWith(blockFile)
        blockManager.getDiskWriter(blockId, tmp, serializerInstance, bufferSize, writeMetrics)
      }
    }
    // Creating the file to write to and creating a disk writer both involve interacting with
    // the disk, so should be included in the shuffle write time.
    writeMetrics.incShuffleWriteTime(System.nanoTime - openStartTime)
    override def releaseWriters(success: Boolean) {
      if (consolidateShuffleFiles) {
        if (success) {
          // Record where this map task's block sits in each bucket file of the group
          val offsets = writers.map(_.fileSegment().offset)
          val lengths = writers.map(_.fileSegment().length)
          fileGroup.recordMapOutput(mapId, offsets, lengths)
        }
        // Return the file group to the pool so another task can append to it
        recycleFileGroup(fileGroup)
      } else {
        // Mark this map task (RDD partition mapId) as completed
        shuffleState.completedMapTasks.add(mapId)
      }
    }