DT大数据梦工厂 Spark Custom Class Notes (013)

Spark Streaming Source Code Walkthrough: Driver Fault Tolerance

Overview

Driver fault tolerance operates at three levels:

1. Data level: ReceivedBlockTracker manages the metadata of a Spark Streaming application, i.e. which blocks have been received and which batch they were allocated to.

2. Logic level: the DStream lineage, i.e. the logical plan of the computation.

3. Job scheduling level: JobGenerator, which keeps track of how far batch and job generation has progressed.
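All three levels hinge on the checkpoint directory: with checkpointing enabled, the metadata WAL, the DStream graph, and the generation progress can all be restored when the driver restarts. A minimal, illustrative recovery setup using only public APIs (the checkpoint path, app name, and batch interval below are placeholders, not from the original notes):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DriverRecoverySketch {
  // Illustrative path; in practice use a fault-tolerant file system such as HDFS.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("driver-ha-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))
    // The checkpoint directory is what gives ReceivedBlockTracker a place to
    // write its WAL and lets the DStream graph / JobGenerator state be restored.
    ssc.checkpoint(checkpointDir)
    // ... define input DStreams and output operations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, getOrCreate rebuilds the context from the checkpoint
    // instead of invoking createContext again.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}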


Source Code Analysis

Let's start with ReceivedBlockTracker (ReceivedBlockTracker.scala, lines 55-71):

/**
 * Class that keep track of all the received blocks, and allocate them to batches
 * when required. All actions taken by this class can be saved to a write ahead log
 * (if a checkpoint directory has been provided), so that the state of the tracker
 * (received blocks and block-to-batch allocations) can be recovered after driver failure.
 *
 * Note that when any instance of this class is created with a checkpoint directory,
 * it will try reading events from logs in the directory.
 */
private[streaming] class ReceivedBlockTracker(
    conf: SparkConf,
    hadoopConf: Configuration,
    streamIds: Seq[Int],
    clock: Clock,
    recoverFromWriteAheadLog: Boolean,
    checkpointDirOption: Option[String])
  extends Logging

The recoverFromWriteAheadLog parameter is direct evidence that the tracker relies on a write-ahead log (WAL).

The importance of ReceivedBlockTracker is apparent from its class comment alone: every action it takes can be persisted to the WAL (when a checkpoint directory is provided), so that its state can be rebuilt after a driver failure.
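For reference, ReceivedBlockTracker is constructed by ReceiverTracker on the driver; that call site is where recoverFromWriteAheadLog and checkpointDirOption get their values. A paraphrased sketch of it (approximate, not a verbatim quote; names may differ slightly between Spark versions):

// Inside ReceiverTracker (paraphrased):
private val receivedBlockTracker = new ReceivedBlockTracker(
  ssc.sparkContext.conf,
  ssc.sparkContext.hadoopConfiguration,
  receiverInputStreamIds,
  ssc.scheduler.clock,
  ssc.isCheckpointPresent,    // recoverFromWriteAheadLog: recover only when restarted from a checkpoint
  Option(ssc.checkpointDir))  // checkpointDirOption: the WAL lives under the checkpoint directory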

Next, let's look at how ReceivedBlockTracker receives and records block metadata.
ReceivedBlockTracker.addBlock is as follows (ReceivedBlockTracker.scala, lines 87-106):
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}
When ReceivedBlockTracker receives block metadata, it first writes a BlockAdditionEvent to the WAL for fault tolerance; only after that write succeeds is the metadata appended to the in-memory queue for the corresponding stream.
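The writeToLog call inside addBlock is where the WAL write actually happens. A paraphrased sketch of it (approximate, not a verbatim quote of the Spark source):

private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = {
  if (isWriteAheadLogEnabled) {
    try {
      // Serialize the event and append it to the write-ahead log.
      writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)), clock.getTimeMillis())
      true
    } catch {
      case NonFatal(e) =>
        logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e)
        false
    }
  } else {
    // No checkpoint directory, hence no WAL: report success so the in-memory
    // bookkeeping still works, just without recoverability.
    true
  }
}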

Next, let's examine the source of allocateBlocksToBatch (ReceivedBlockTracker.scala, lines 112-134):
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    val streamIdToBlocks = streamIds.map { streamId =>
        (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    } else {
      logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
    }
  } else {
    // This situation occurs when:
    // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
    // possibly processed batch job or half-processed batch job need to be processed again,
    // so the batchTime will be equal to lastAllocatedBatchTime.
    // 2. Slow checkpointing makes recovered batch time older than WAL recovered
    // lastAllocatedBatchTime.
    // This situation will only occurs in recovery time.
    logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
  }
}
Here writeToLog (sketched above) is the WAL write, and allocatedBlocks is the batch of metadata gathered for the given batch time, which is then handed to the corresponding job.
In other words, the allocation is logged to the WAL before any job consumes the metadata, so after a driver failure and recovery it is known exactly which blocks had been assigned to which batch, i.e. how far the computation over the data had progressed.
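Finally, allocateBlocksToBatch is driven from the job-scheduling layer: JobGenerator calls it immediately before generating the jobs for each batch, which is how the data, logic, and scheduling layers tie together. A paraphrased sketch of JobGenerator.generateJobs (approximate; details differ between Spark versions):

private def generateJobs(time: Time): Unit = {
  Try {
    // Allocate all pending received blocks to this batch first; this writes the
    // BatchAllocationEvent to the WAL before any job can consume the blocks.
    jobScheduler.receiverTracker.allocateBlocksToBatch(time)
    // Then build this batch's jobs from the DStream graph, over the blocks
    // that were just allocated.
    graph.generateJobs(time)
  } match {
    case Success(jobs) =>
      jobScheduler.submitJobSet(JobSet(time, jobs))
    case Failure(e) =>
      jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
}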

To be continued.

