本期内容
本讲讲解sparkStreaming的driver部分的数据的接受和管理的部分,即receiverTracker,包括:
1.receiverTracker的架构设计
2.消息循环系统
3.receiverTracker的具体实现
通过前面的课程,我们知道:
receiverTracker以driver中具体自己的算法在具体的executor上启动receiver。
启动的方式是:把每个receiver封装成1个task,该tasks是job中唯一的task,也就是说,
有多少receiver就会分发多少个Job,每个Job只有一个task,该task内部只有1条数据即该receiver的数据。
receiverTracker在启动receiver的时候,有个receiverSupervisorImpl,它在启动的时候反过来帮我们启动receiver,receiver不断地接受数据,转过来通过blockGenerator把自己接受的数据封装成1个1个的block(背后有定时器),不断地把数据存储,有2中存储方式:第一种直接通过blockmanger存储,第二种先写WAL;存储过之后receiverSupervisorImpl会把存储的数据的元数据汇报给receiverTracker,(实际上汇报给的是receiverTracker的rpc通讯实体。),汇报的消息包括具体的数据的id,具体位置,多少条,大小等等。receiverTracker接受到这些数据后,转过来再进行一下步的数据管理工作。
本节讲解receiverTracker接受到这些数据后,会怎样具体进行一下步的处理。
ReceiverSupervisorImpl通过receivedBlockHandler来写数据,根据是否要写wal,receivedBlockHandler的具体实现有2种方式:
private val receivedBlockHandler: ReceivedBlockHandler = {
if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
if (checkpointDirOption.isEmpty) {
throw new SparkException(
"Cannot enable receiver write-ahead log without checkpoint directory set. " +
"Please use streamingContext.checkpoint() to set the checkpoint directory. " +
"See documentation for more details.")
}
//使用了wal时的实现
new WriteAheadLogBasedBlockHandler(env.blockManager, receiver.streamId,
receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
} else {
//没使用wal时的实现
new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
}
}
接受到数据且封装成block后的处理:
/** Store an ArrayBuffer of received data as a data block into Spark's memory. */
def pushArrayBuffer(
arrayBuffer: ArrayBuffer[_],
metadataOption: Option[Any],
blockIdOption: Option[StreamBlockId]
) {
//通过pushAndReportBlock()来存储数据且把元数据汇报给driver:
pushAndReportBlock(ArrayBufferBlock(arrayBuffer), metadataOption, blockIdOption)
}
/** Store a iterator of received data as a data block into Spark's memory. */
def pushIterator(
iterator: Iterator[_],
metadataOption: Option[Any],
blockIdOption: Option[StreamBlockId]
) {
//通过pushAndReportBlock()来存储数据且把元数据汇报给driver:
pushAndReportBlock(IteratorBlock(iterator), metadataOption, blockIdOption)
}
/** Store the bytes of received data as a data block into Spark's memory. */
def pushBytes(
bytes: ByteBuffer,
metadataOption: Option[Any],
blockIdOption: Option[StreamBlockId]
) {
//通过pushAndReportBlock()来存储数据且把元数据汇报给driver:
pushAndReportBlock(ByteBufferBlock(bytes), metadataOption, blockIdOption)
}
通过pushAndReportBlock()来存储数据且把元数据汇报给driver:
/** Store block and report it to driver */
def pushAndReportBlock(
receivedBlock: ReceivedBlock,
metadataOption: Option[Any],
blockIdOption: Option[StreamBlockId]
) {
val blockId = blockIdOption.getOrElse(nextBlockId)
val time = System.currentTimeMillis
val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
val numRecords = blockStoreResult.numRecords
//把存储的block信息封装起来。
val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
//发消息给ReceiverTracker,askWithRetry确保发送成功
trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
logDebug(s"Reported block $blockId")
}
其中用到了case class ReceivedBlockInfo()把欲存储的block信息封装起来。
/** Information about blocks received by the receiver */
private[streaming] case class ReceivedBlockInfo( streamId: Int, numRecords: Option[Long], metadataOption: Option[Any], blockStoreResult: ReceivedBlockStoreResult ) {
ReceivedBlockStoreResult封装了欲存储的block的元数据信息:
/** Trait that represents the metadata related to storage of blocks */
private[streaming] trait ReceivedBlockStoreResult {
// Any implementation of this trait will store a block id
def blockId: StreamBlockId
// Any implementation of this trait will have to return the number of records
//题外话:从这里得到一个启示,我们谈spark处理的数据量大小,专业的说法是处理多少条数据,而不是处理的数据的大小(因为每条数据可能有多个字段,当处理音频视频文件时数据量一般都很大)。
def numRecords: Option[Long]
}
我们说ReceiverSupervisorImpl发送的消息是发送给ReceiverTracker,以下源码是证据:
/* Remote RpcEndpointRef for the ReceiverTracker /
private val trackerEndpoint = RpcUtils.makeDriverRef(“ReceiverTracker”, env.conf, env.rpcEnv)
在ReceiverTracker启动时,新建的消息通讯体的名字确实是”ReceiverTracker”:
/** Start the endpoint and receiver execution thread. */
def start(): Unit = synchronized {
if (isTrackerStarted) {
throw new SparkException("ReceiverTracker already started")
}
if (!receiverInputStreams.isEmpty) {
endpoint = ssc.env.rpcEnv.setupEndpoint(
"ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
if (!skipReceiverLaunch) launchReceivers()
logInfo("ReceiverTracker started")
trackerState = Started
}
}
ReceiverTracker是整个block管理的中心,它使用ReceiverTrackerEndpoint消息循环体接受来自Receiver的消息,我们来看下该消息通讯体:
/** RpcEndpoint to receive messages from the receivers. */
private class ReceiverTrackerEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint {
// TODO Remove this thread pool after https://github.com/apache/spark/issues/7385 is merged
private val submitJobThreadPool = ExecutionContext.fromExecutorService(
ThreadUtils.newDaemonCachedThreadPool("submit-job-thread-pool"))
private val walBatchingThreadPool = ExecutionContext.fromExecutorService(
ThreadUtils.newDaemonCachedThreadPool("wal-batching-thread-pool"))
@volatile private var active: Boolean = true
override def receive: PartialFunction[Any, Unit] = {
// Local messages
case StartAllReceivers(receivers) =>
val scheduledLocations = schedulingPolicy.scheduleReceivers(receivers, getExecutors)
for (receiver <- receivers) {
val executors = scheduledLocations(receiver.streamId)
updateReceiverScheduledExecutors(receiver.streamId, executors)
receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
startReceiver(receiver, executors)
}
case RestartReceiver(receiver) =>
// Old scheduled executors minus the ones that are not active any more
val oldScheduledExecutors = getStoredScheduledExecutors(receiver.streamId)
val scheduledLocations = if (oldScheduledExecutors.nonEmpty) {
// Try global scheduling again
oldScheduledExecutors
} else {
val oldReceiverInfo = receiverTrackingInfos(receiver.streamId)
// Clear "scheduledLocations" to indicate we are going to do local scheduling
val newReceiverInfo = oldReceiverInfo.copy(
state = ReceiverState.INACTIVE, scheduledLocations = None)
receiverTrackingInfos(receiver.streamId) = newReceiverInfo
schedulingPolicy.rescheduleReceiver(
receiver.streamId,
receiver.preferredLocation,
receiverTrackingInfos,
getExecutors)
}
// Assume there is one receiver restarting at one time, so we don't need to update
// receiverTrackingInfos
startReceiver(receiver, scheduledLocations)
case c: CleanupOldBlocks =>
receiverTrackingInfos.values.flatMap(_.endpoint).foreach(_.send(c))
case UpdateReceiverRateLimit(streamUID, newRate) =>
for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
eP.send(UpdateRateLimit(newRate))
}
// Remote messages
case ReportError(streamId, message, error) =>
reportError(streamId, message, error)
}
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
// Remote messages
case RegisterReceiver(streamId, typ, host, executorId, receiverEndpoint) =>
val successful =
registerReceiver(streamId, typ, host, executorId, receiverEndpoint, context.senderAddress)
context.reply(successful)
case AddBlock(receivedBlockInfo) =>
if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
walBatchingThreadPool.execute(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
if (active) {
context.reply(addBlock(receivedBlockInfo))
} else {
throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
}
}
})
} else {
context.reply(addBlock(receivedBlockInfo))
}
case DeregisterReceiver(streamId, message, error) =>
deregisterReceiver(streamId, message, error)
context.reply(true)
// Local messages
case AllReceiverIds =>
context.reply(receiverTrackingInfos.filter(_._2.state != ReceiverState.INACTIVE).keys.toSeq)
case StopAllReceivers =>
assert(isTrackerStopping || isTrackerStopped)
stopReceivers()
context.reply(true)
}
/** * Return the stored scheduled executors that are still alive. */
private def getStoredScheduledExecutors(receiverId: Int): Seq[TaskLocation] = {
if (receiverTrackingInfos.contains(receiverId)) {
val scheduledLocations = receiverTrackingInfos(receiverId).scheduledLocations
if (scheduledLocations.nonEmpty) {
val executors = getExecutors.toSet
// Only return the alive executors
scheduledLocations.get.filter {
case loc: ExecutorCacheTaskLocation => executors(loc)
case loc: TaskLocation => true
}
} else {
Nil
}
} else {
Nil
}
}
/** * Start a receiver along with its scheduled executors */
private def startReceiver(
receiver: Receiver[_],
scheduledLocations: Seq[TaskLocation]): Unit = {
def shouldStartReceiver: Boolean = {
// It's okay to start when trackerState is Initialized or Started
!(isTrackerStopping || isTrackerStopped)
}
val receiverId = receiver.streamId
if (!shouldStartReceiver) {
onReceiverJobFinish(receiverId)
return
}
val checkpointDirOption = Option(ssc.checkpointDir)
val serializableHadoopConf =
new SerializableConfiguration(ssc.sparkContext.hadoopConfiguration)
// Function to start the receiver on the worker node
val startReceiverFunc: Iterator[Receiver[_]] => Unit =
(iterator: Iterator[Receiver[_]]) => {
if (!iterator.hasNext) {
throw new SparkException(
"Could not start receiver as object not found.")
}
if (TaskContext.get().attemptNumber() == 0) {
val receiver = iterator.next()
assert(iterator.hasNext == false)
val supervisor = new ReceiverSupervisorImpl(
receiver, SparkEnv.get, serializableHadoopConf.value, checkpointDirOption)
supervisor.start()
supervisor.awaitTermination()
} else {
// It's restarted by TaskScheduler, but we want to reschedule it again. So exit it.
}
}
// Create the RDD using the scheduledLocations to run the receiver in a Spark job
val receiverRDD: RDD[Receiver[_]] =
if (scheduledLocations.isEmpty) {
ssc.sc.makeRDD(Seq(receiver), 1)
} else {
val preferredLocations = scheduledLocations.map(_.toString).distinct
ssc.sc.makeRDD(Seq(receiver -> preferredLocations))
}
receiverRDD.setName(s"Receiver $receiverId")
ssc.sparkContext.setJobDescription(s"Streaming job running receiver $receiverId")
ssc.sparkContext.setCallSite(Option(ssc.getStartSite()).getOrElse(Utils.getCallSite()))
val future = ssc.sparkContext.submitJob[Receiver[_], Unit, Unit](
receiverRDD, startReceiverFunc, Seq(0), (_, _) => Unit, ())
// We will keep restarting the receiver job until ReceiverTracker is stopped
future.onComplete {
case Success(_) =>
if (!shouldStartReceiver) {
onReceiverJobFinish(receiverId)
} else {
logInfo(s"Restarting Receiver $receiverId")
self.send(RestartReceiver(receiver))
}
case Failure(e) =>
if (!shouldStartReceiver) {
onReceiverJobFinish(receiverId)
} else {
logError("Receiver has been stopped. Try to restart it.", e)
logInfo(s"Restarting Receiver $receiverId")
self.send(RestartReceiver(receiver))
}
}(submitJobThreadPool)
logInfo(s"Receiver ${receiver.streamId} started")
}
override def onStop(): Unit = {
submitJobThreadPool.shutdownNow()
active = false
walBatchingThreadPool.shutdown()
}
/** * Call when a receiver is terminated. It means we won't restart its Spark job. */
private def onReceiverJobFinish(receiverId: Int): Unit = {
receiverJobExitLatch.countDown()
receiverTrackingInfos.remove(receiverId).foreach { receiverTrackingInfo =>
if (receiverTrackingInfo.state == ReceiverState.ACTIVE) {
logWarning(s"Receiver $receiverId exited but didn't deregister")
}
}
}
/** Send stop signal to the receivers. */
private def stopReceivers() {
receiverTrackingInfos.values.flatMap(_.endpoint).foreach { _.send(StopReceiver) }
logInfo("Sent stop signal to all " + receiverTrackingInfos.size + " receivers")
}
}
我们来过滤下ReceiverTracker的源码。
** Enumeration to identify current state of a Receiver */
private[streaming] object ReceiverState extends Enumeration {
//type是scala语法,别名
type ReceiverState = Value
val INACTIVE, SCHEDULED, ACTIVE = Value
}
/** * Messages used by the NetworkReceiver and the ReceiverTracker to communicate * with each other. */
//sealed:是scala语法,表示该类包含它所有的子类, 即所有的消息类型都在这里。
private[streaming] sealed trait ReceiverTrackerMessage
private[streaming] case class RegisterReceiver( streamId: Int, typ: String, host: String, executorId: String, receiverEndpoint: RpcEndpointRef ) extends ReceiverTrackerMessage
private[streaming] case class AddBlock(receivedBlockInfo: ReceivedBlockInfo)
extends ReceiverTrackerMessage
private[streaming] case class ReportError(streamId: Int, message: String, error: String)
private[streaming] case class DeregisterReceiver(streamId: Int, msg: String, error: String)
extends ReceiverTrackerMessage
/** * Messages used by the driver and ReceiverTrackerEndpoint to communicate locally. */
private[streaming] sealed trait ReceiverTrackerLocalMessage
/** * This message will trigger ReceiverTrackerEndpoint to restart a Spark job for the receiver. */
private[streaming] case class RestartReceiver(receiver: Receiver[_])
extends ReceiverTrackerLocalMessage
/** * This message is sent to ReceiverTrackerEndpoint when we start to launch Spark jobs for receivers * at the first time. */
private[streaming] case class StartAllReceivers(receiver: Seq[Receiver[_]])
extends ReceiverTrackerLocalMessage
/** * This message will trigger ReceiverTrackerEndpoint to send stop signals to all registered * receivers. */
private[streaming] case object StopAllReceivers extends ReceiverTrackerLocalMessage
/** * A message used by ReceiverTracker to ask all receiver's ids still stored in * ReceiverTrackerEndpoint. */
//ReceiverTrackerEndpoint存储了AllReceiverIds信息
private[streaming] case object AllReceiverIds extends ReceiverTrackerLocalMessage
private[streaming] case class UpdateReceiverRateLimit(streamUID: Int, newRate: Long)
extends ReceiverTrackerLocalMessage
receiver执行的管理包括3个方面:receiver的启动与重新启动,receiver的回收,receiver执行过程中接受数据的管理。
/** * This class manages the execution of the receivers of ReceiverInputDStreams. Instance of * this class must be created after all input streams have been added and StreamingContext.start() * has been called because it needs the final set of input streams at the time of instantiation. * * @param skipReceiverLaunch Do not launch the receiver. This is useful for testing. */
private[streaming]
class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false)
//ReceiverTracker是driver级别,不需要发送到executor,故不需要序列化。
extends Logging {
//从DstreamGraph中取得所有的input streams信息
private val receiverInputStreams = ssc.graph.getReceiverInputStreams()
private val receiverInputStreamIds = receiverInputStreams.map { _.id }
private val receivedBlockTracker = new ReceivedBlockTracker(
ssc.sparkContext.conf,
ssc.sparkContext.hadoopConfiguration,
receiverInputStreamIds,
ssc.scheduler.clock,
ssc.isCheckpointPresent,
Option(ssc.checkpointDir)
)
//listenerBus对监控很关键
private val listenerBus = ssc.scheduler.listenerBus
我们来看下ReceiverTracker对receiveAndReply消息的处理:
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
// Remote messages
case RegisterReceiver(streamId, typ, host, executorId, receiverEndpoint) =>
val successful =
registerReceiver(streamId, typ, host, executorId, receiverEndpoint, context.senderAddress)
context.reply(successful)
case AddBlock(receivedBlockInfo) =>
if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
//这里用到了线程池里的线程,因为wal的处理是耗时的,不能阻塞主线程
walBatchingThreadPool.execute(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
if (active) {
context.reply(addBlock(receivedBlockInfo))
} else {
throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
}
}
})
} else {
//非wal模式的处理:
context.reply(addBlock(receivedBlockInfo))
}
case DeregisterReceiver(streamId, message, error) =>
deregisterReceiver(streamId, message, error)
context.reply(true)
// Local messages
case AllReceiverIds =>
context.reply(receiverTrackingInfos.filter(_._2.state != ReceiverState.INACTIVE).keys.toSeq)
case StopAllReceivers =>
assert(isTrackerStopping || isTrackerStopped)
stopReceivers()
context.reply(true)
}
无论是否启动了wal,都会调用receivedBlockTracker.addBlock()来处理:
/** Add new blocks for the given stream */
private def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
receivedBlockTracker.addBlock(receivedBlockInfo)
}
来看下receivedBlockTracker:
/**
* Class that keep track of all the received blocks, and allocate them to batches
* when required. All actions taken by this class can be saved to a write ahead log
* (if a checkpoint directory has been provided), so that the state of the tracker
* (received blocks and block-to-batch allocations) can be recovered after driver failure.
*
* Note that when any instance of this class is created with a checkpoint directory,
* it will try reading events from logs in the directory.
*/
private[streaming] class ReceivedBlockTracker(
conf: SparkConf,
hadoopConf: Configuration,
streamIds: Seq[Int],
clock: Clock,
recoverFromWriteAheadLog: Boolean,
checkpointDirOption: Option[String])
extends Logging {
来看下receivedBlockTracker.addBlock(receivedBlockInfo):
/** Add received block. This event will get written to the write ahead log (if enabled). */
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
try {
val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
if (writeResult) {
synchronized {
getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
}
logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
s"block ${receivedBlockInfo.blockStoreResult.blockId}")
} else {
logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
}
writeResult
} catch {
case NonFatal(e) =>
logError(s"Error adding block $receivedBlockInfo", e)
false
}
}
receivedBlockTracker把unallocated blocks 分配给batch:
/** * Allocate all unallocated blocks to the given batch. * This event will get written to the write ahead log (if enabled). */
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
val streamIdToBlocks = streamIds.map { streamId =>
(streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
}.toMap
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
lastAllocatedBatchTime = batchTime
} else {
logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
}
} else {
// This situation occurs when:
// 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
// possibly processed batch job or half-processed batch job need to be processed again,
// so the batchTime will be equal to lastAllocatedBatchTime.
// 2. Slow checkpointing makes recovered batch time older than WAL recovered
// lastAllocatedBatchTime.
// This situation will only occurs in recovery time.
logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
}
}
我们来看下ReceiverTracker对receive消息的处理:
override def receive: PartialFunction[Any, Unit] = {
// Local messages
case StartAllReceivers(receivers) =>
val scheduledLocations = schedulingPolicy.scheduleReceivers(receivers, getExecutors)
for (receiver <- receivers) {
val executors = scheduledLocations(receiver.streamId)
updateReceiverScheduledExecutors(receiver.streamId, executors)
receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
startReceiver(receiver, executors)
}
//当receiver出故障时,需要发消息重启
case RestartReceiver(receiver) =>
// Old scheduled executors minus the ones that are not active any more
val oldScheduledExecutors = getStoredScheduledExecutors(receiver.streamId)
val scheduledLocations = if (oldScheduledExecutors.nonEmpty) {
// Try global scheduling again
oldScheduledExecutors
} else {
val oldReceiverInfo = receiverTrackingInfos(receiver.streamId)
// Clear "scheduledLocations" to indicate we are going to do local scheduling
val newReceiverInfo = oldReceiverInfo.copy(
state = ReceiverState.INACTIVE, scheduledLocations = None)
receiverTrackingInfos(receiver.streamId) = newReceiverInfo
schedulingPolicy.rescheduleReceiver(
receiver.streamId,
receiver.preferredLocation,
receiverTrackingInfos,
getExecutors)
}
// Assume there is one receiver restarting at one time, so we don't need to update
// receiverTrackingInfos
startReceiver(receiver, scheduledLocations)
//清楚掉处理过了的不需要的block信息
case c: CleanupOldBlocks =>
receiverTrackingInfos.values.flatMap(_.endpoint).foreach(_.send(c))
//限制数据处理速度,限流
case UpdateReceiverRateLimit(streamUID, newRate) =>
for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
eP.send(UpdateRateLimit(newRate))
}
// Remote messages
//报错
case ReportError(streamId, message, error) =>
reportError(streamId, message, error)
}
总结:receiver接受到数据,合并并存储数据之后,ReceiverSupervisorImpl会把元数据汇报给ReceiverTracker,ReceiverTracker接受到元数据之后,内部有个ReceivedBlockTracker来会管理接受的数据的分配,JobGenerator会在每个batchDuartion内从ReceiverTracker获取元数据信息,据此生成rdd。
ReceivedBlockTracker在内部专门来管理block的元数据信息。
从设计模式角度讲,表面干活的是ReceiverTracker,而实际干活的是ReceivedBlockTracker,这是门面模式 Facade模式。
本次分享来自于王家林老师的课程‘源码版本定制发行班’,在此向王家林老师表示感谢!
王家林老师新浪微博:http://weibo.com/ilovepains
王家林老师博客:http://blog.sina.com.cn/ilovepains
欢迎大家交流技术知识!一起学习,共同进步!
笔者的微博:http://weibo.com/keepstriving