DT Big Data Dream Factory Spark Customized Class Notes (012)

Spark Streaming Source Code Walkthrough: Executor Fault Tolerance

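Executor fault tolerance is achieved in two main ways:

1) Write-ahead log (WAL)

2) Relying on Spark RDD's own fault-tolerance mechanism (received blocks are stored, and replicated according to the storage level, through the BlockManager)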


These two approaches correspond to the two implementations of receivedBlockHandler (ReceiverSupervisorImpl.scala, lines 55-68):

private val receivedBlockHandler: ReceivedBlockHandler = {
  if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
    if (checkpointDirOption.isEmpty) {
      throw new SparkException(
        "Cannot enable receiver write-ahead log without checkpoint directory set. " +
          "Please use streamingContext.checkpoint() to set the checkpoint directory. " +
          "See documentation for more details.")
    }
    new WriteAheadLogBasedBlockHandler(env.blockManager, env.serializerManager, receiver.streamId,
      receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
  } else {
    new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
  }
}
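The selection logic above can be mimicked in plain Scala to see both branches and the failure case. This is a minimal sketch under assumptions: the configuration is a plain Map instead of a SparkConf, and ToyHandler, WalHandler, BlockManagerHandler and HandlerSelection are hypothetical stand-ins for the real Spark classes. Only the key spark.streaming.receiver.writeAheadLog.enable is the one Spark's WriteAheadLogUtils actually reads.

```scala
// Hypothetical stand-ins for Spark's two ReceivedBlockHandler implementations.
sealed trait ToyHandler
final case class WalHandler(checkpointDir: String) extends ToyHandler
case object BlockManagerHandler extends ToyHandler

object HandlerSelection {
  // The real config key; here the "conf" is just a Map (assumption, not SparkConf).
  val WalEnableKey = "spark.streaming.receiver.writeAheadLog.enable"

  // The WAL is disabled unless the key is explicitly set to "true".
  def enableReceiverLog(conf: Map[String, String]): Boolean =
    conf.getOrElse(WalEnableKey, "false").toBoolean

  // Mirrors the if/else in ReceiverSupervisorImpl: the WAL branch needs a checkpoint dir.
  // IllegalStateException stands in for Spark's SparkException.
  def select(conf: Map[String, String], checkpointDir: Option[String]): ToyHandler =
    if (enableReceiverLog(conf)) {
      val dir = checkpointDir.getOrElse(
        throw new IllegalStateException(
          "Cannot enable receiver write-ahead log without checkpoint directory set."))
      WalHandler(dir)
    } else {
      BlockManagerHandler
    }
}
```

In a real application the equivalent steps are setting that key to true on the SparkConf and calling streamingContext.checkpoint() with a directory, as the error message in the excerpt indicates.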

The first implementation is WriteAheadLogBasedBlockHandler (ReceivedBlockHandler.scala, lines 125-134):
private[streaming] class WriteAheadLogBasedBlockHandler(
    blockManager: BlockManager,
    serializerManager: SerializerManager,
    streamId: Int,
    storageLevel: StorageLevel,
    conf: SparkConf,
    hadoopConf: Configuration,
    checkpointDir: String,
    clock: Clock = new SystemClock
  )
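Instantiating it requires the path of the checkpoint directory (checkpointDir).
Note: the checkpoint directory usually lives on HDFS, which keeps 3 replicas by default, so there is no need to request additional replicas through storageLevel as well.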
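A toy write-ahead log makes the reasoning behind WriteAheadLogBasedBlockHandler concrete: because each block is also recorded in durable storage under the checkpoint directory, losing the executor's memory does not lose the data. This is a minimal sketch under assumptions: ToyWalBlockHandler and its methods are hypothetical, the "durable" log is an in-process buffer standing in for WAL files on HDFS, and the two writes happen sequentially, which does not reproduce how Spark schedules them.

```scala
import scala.collection.mutable

// Hypothetical toy model, not Spark's WriteAheadLogBasedBlockHandler.
final class ToyWalBlockHandler {
  // Stands in for the WAL files under the checkpoint directory (survives executor loss).
  private val durableLog = mutable.ArrayBuffer.empty[(String, Vector[String])]
  // Stands in for the executor's block storage (lost when the executor dies).
  private val memoryStore = mutable.Map.empty[String, Vector[String]]

  // Write-ahead: record the block in the log first, then keep it in memory.
  def storeBlock(blockId: String, records: Vector[String]): Unit = {
    durableLog.append((blockId, records))
    memoryStore(blockId) = records
  }

  def getBlock(blockId: String): Option[Vector[String]] = memoryStore.get(blockId)

  // Simulates executor failure: everything held in memory disappears.
  def crashExecutor(): Unit = memoryStore.clear()

  // Recovery replays the log to rebuild the lost blocks.
  def recoverFromLog(): Unit =
    durableLog.foreach { case (id, recs) => memoryStore(id) = recs }
}
```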

The second implementation is BlockManagerBasedBlockHandler (ReceivedBlockHandler.scala, lines 69-70):
private[streaming] class BlockManagerBasedBlockHandler(
    blockManager: BlockManager, storageLevel: StorageLevel)
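When relying on RDD's own fault tolerance, instantiation is simpler; the important parameter is storageLevel, which defaults to MEMORY_AND_DISK_SER_2 (memory and disk, serialized, 2 replicas).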
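The _2 suffix in MEMORY_AND_DISK_SER_2 is what makes this second approach fault tolerant: each received block is kept on two executors, so losing one still leaves a copy. The sketch below assumes a hypothetical ToyReplicatedStore class and a simple round-robin placement (not Spark's real peer-selection policy), and contrasts replication 2 with replication 1.

```scala
import scala.collection.mutable

// Hypothetical toy cluster: each executor id maps to the set of block ids it holds.
// Models only the replication count of a storage level, not Spark's placement policy.
final class ToyReplicatedStore(executorIds: Seq[String], replication: Int) {
  require(replication >= 1 && replication <= executorIds.size, "invalid replication")
  private val held = mutable.Map(executorIds.map(_ -> mutable.Set.empty[String]): _*)

  // Places the block on `replication` distinct executors, round-robin from `start`.
  def put(blockId: String, start: Int): Unit =
    (0 until replication).foreach { i =>
      held(executorIds((start + i) % executorIds.size)) += blockId
    }

  // Simulates losing an executor together with every block it held.
  def loseExecutor(id: String): Unit = held -= id

  // A block survives as long as at least one remaining executor holds a copy.
  def isAvailable(blockId: String): Boolean = held.values.exists(_.contains(blockId))
}
```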

The options available for storageLevel are defined by the constructor of StorageLevel (StorageLevel.scala, lines 39-45, part of Spark Core):
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable
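Each named storage level is simply a combination of these five fields; for example, MEMORY_AND_DISK_SER_2 corresponds to useDisk = true, useMemory = true, useOffHeap = false, deserialized = false, replication = 2. The hypothetical ToyStorageLevel case class below mirrors those fields so the combinations can be inspected without Spark on the classpath. It is not Spark's class, and its describe helper is for illustration only.

```scala
// Hypothetical mirror of the five StorageLevel constructor fields; not Spark's class.
final case class ToyStorageLevel(
    useDisk: Boolean,
    useMemory: Boolean,
    useOffHeap: Boolean,
    deserialized: Boolean,
    replication: Int = 1) {

  // Human-readable summary built only from the five flags.
  def describe: String = {
    val where = Seq(
      if (useDisk) Some("disk") else None,
      if (useMemory) Some("memory") else None,
      if (useOffHeap) Some("off-heap") else None).flatten.mkString("+")
    val form = if (deserialized) "deserialized" else "serialized"
    s"$where, $form, ${replication}x replicated"
  }
}

object ToyStorageLevel {
  // Flag combinations matching Spark's constants of the same names.
  val MEMORY_ONLY = ToyStorageLevel(false, true, false, true)
  val MEMORY_AND_DISK_SER_2 = ToyStorageLevel(true, true, false, false, 2)
  val DISK_ONLY = ToyStorageLevel(true, false, false, false)
}
```

In real code the constant is org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER_2, which is also the default storageLevel argument of StreamingContext.socketTextStream.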

