Spark Streaming is an extension of the Spark Core API for scalable, high-throughput, fault-tolerant processing of live data streams. It can ingest data from many sources such as Kafka, Flume, Twitter, ZeroMQ, Kinesis, or plain TCP sockets, and the data can be processed with complex algorithms expressed through high-level operators such as map, reduce, join and window. The processed results can be written to file systems, databases, live dashboards and other sinks.
Spark Streaming's main characteristics are ease of use (a high-level API very similar to batch Spark), fault tolerance, and seamless integration with the rest of the Spark stack.
The basic internal workflow of Spark Streaming is as follows: it receives a live input data stream and divides it into batches; for example, all data collected within one second is packaged into a single batch. Each batch is handed to the Spark engine for processing, and the output is itself a stream of results that is likewise made up of batches.
As the figure also shows, the input data is split into batches for processing. Strictly speaking, Spark Streaming is therefore not a true record-at-a-time streaming framework: it processes data in micro-batches.
Spark Streaming provides a high-level abstraction called a DStream (Discretized Stream), which represents a continuous stream of data. A DStream can be created from an input source such as Kafka, Flume or Kinesis, or by applying high-level operators such as map, reduce, join and window to other DStreams.
Internally, a DStream is a sequence of RDDs produced continuously over time. The RDD is Spark Core's core abstraction: an immutable, distributed dataset. Each RDD in a DStream contains the data of one time interval.
Operators applied to a DStream, such as map, are translated under the hood into operations on each RDD of that DStream. For example, applying map to a DStream produces a new DStream: for every time interval, map is applied to the input DStream's RDD, and the resulting RDD becomes that interval's RDD in the new DStream. The underlying RDD transformations are still executed by the Spark Core engine; Spark Streaming is a layer on top of Spark Core that hides these details and exposes a convenient high-level API.
In other words, Spark Streaming decomposes the streaming computation into a series of small batch jobs whose execution engine is Spark Core: the input data is sliced by batch size (for example one second) into a discretized stream, each slice is turned into an RDD (Resilient Distributed Dataset), transformations on DStreams become transformations on those RDDs, and the intermediate results are kept in memory.
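The flow above is easiest to see in a complete minimal application. The sketch below (host, port and batch interval are arbitrary values chosen for illustration) counts words arriving over a TCP socket in 1-second batches:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SocketWordCount")
    // Batch interval of 1 second: each second of input becomes one batch, i.e. one RDD per DStream
    val ssc = new StreamingContext(conf, Seconds(1))

    // Receiver-based input DStream reading lines from a TCP socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // DStream transformations; each is translated into RDD transformations on every batch
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // Output operation; one streaming job is generated for it in every batch
    counts.print()

    ssc.start()            // start the receivers and the job generator
    ssc.awaitTermination() // block until the context is stopped
  }
}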
StreamingContext initialization
Step 1: StreamingContext
Source: org.apache.spark.streaming.StreamingContext.scala
class StreamingContext private[streaming] (
_sc: SparkContext,
_cp: Checkpoint,
_batchDur: Duration
) extends Logging {
.......
/**
 * Key component: DStreamGraph.
 * It records the dependency relationships between the DStreams of a Spark Streaming
 * application and the operators applied between them.
 */
private[streaming] val graph: DStreamGraph = {
if (isCheckpointPresent) {
_cp.graph.setContext(this)
_cp.graph.restoreCheckpointData()
_cp.graph
} else {
require(_batchDur != null, "Batch duration for StreamingContext cannot be null")
val newGraph = new DStreamGraph()
newGraph.setBatchDuration(_batchDur)
newGraph
}
}
// Core component responsible for job scheduling
// Under the hood it still relies on Spark's core computing engine
private[streaming] val scheduler = new JobScheduler(this)
The JobScheduler is mainly responsible for driving periodic job generation through the JobGenerator, managing data receiving through the ReceiverTracker, and submitting the generated jobs for execution on the Spark engine through its internal thread pool.
Step 2: Once the StreamingContext has been initialized and key components such as the DStreamGraph and the JobScheduler have been created, methods such as StreamingContext.socketTextStream are called to create the input DStreams.
def socketTextStream(
hostname: String,
port: Int,
storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
): ReceiverInputDStream[String] = withNamedScope("socket text stream") {
socketStream[String](hostname, port, SocketReceiver.bytesToLines, storageLevel)
}
def socketStream[T: ClassTag](
hostname: String,
port: Int,
converter: (InputStream) => Iterator[T],
storageLevel: StorageLevel
): ReceiverInputDStream[T] = {
// Instantiate SocketInputDStream, a concrete subclass of DStream
new SocketInputDStream[T](this, hostname, port, converter, storageLevel)
}
Step 3: SocketInputDStream
private[streaming]
class SocketInputDStream[T: ClassTag](
_ssc: StreamingContext,
host: String,
port: Int,
bytesToObjects: InputStream => Iterator[T],
storageLevel: StorageLevel
) extends ReceiverInputDStream[T](_ssc) {
/**
 * Every input DStream must provide the getReceiver() method,
 * which returns the Receiver for this DStream.
 */
def getReceiver(): Receiver[T] = {
new SocketReceiver(host, port, bytesToObjects, storageLevel)
}
}
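getReceiver() is also the extension point for user-defined sources: an input DStream only has to hand back a Receiver implementation, and everything that follows in this walkthrough applies to it unchanged. A minimal sketch of such a custom receiver (the class is hypothetical and only for illustration):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver that emits one counter value per second
class CounterReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER_2) {
  def onStart(): Unit = {
    new Thread("counter-receiver") {
      setDaemon(true)
      override def run(): Unit = {
        var i = 0L
        while (!isStopped()) {
          store(s"tick-$i") // handed to the ReceiverSupervisor, just like SocketReceiver does
          i += 1
          Thread.sleep(1000)
        }
      }
    }.start()
  }
  def onStop(): Unit = { } // the receiving thread checks isStopped() and exits on its own
}

// Used from an application as: val ticks = ssc.receiverStream(new CounterReceiver)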
Starting the StreamingContext and launching receivers
Step 1: Once the StreamingContext has been initialized, StreamingContext.start() is called.
def start(): Unit = synchronized {
state match {
case INITIALIZED =>
startSite.set(DStream.getCreationSite())
StreamingContext.ACTIVATION_LOCK.synchronized {
StreamingContext.assertNoOtherContextIsActive()
try {
validate()
// Start the streaming scheduler in a new thread, so that thread local properties
// like call sites and job groups can be reset without affecting those of the
// current thread.
ThreadUtils.runInNewThread("streaming-start") {
sparkContext.setCallSite(startSite.get)
sparkContext.clearJobGroup()
sparkContext.setLocalProperty(SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL, "false")
savedProperties.set(SerializationUtils.clone(sparkContext.localProperties.get()))
// Key step: start a child thread for the local initialization work and to avoid blocking the main thread
// Calls JobScheduler's start() method
scheduler.start()
}
state = StreamingContextState.ACTIVE
scheduler.listenerBus.post(
StreamingListenerStreamingStarted(System.currentTimeMillis()))
} catch {
case NonFatal(e) =>
logError("Error starting the context, marking it as stopped", e)
scheduler.stop(false)
state = StreamingContextState.STOPPED
throw e
}
StreamingContext.setActiveContext(this)
}
logDebug("Adding shutdown hook") // force eager creation of logger
shutdownHookRef = ShutdownHookManager.addShutdownHook(
StreamingContext.SHUTDOWN_HOOK_PRIORITY)(() => stopOnShutdown())
// Registering Streaming Metrics at the start of the StreamingContext
assert(env.metricsSystem != null)
env.metricsSystem.registerSource(streamingSource)
uiTab.foreach(_.attach())
logInfo("StreamingContext started")
case ACTIVE =>
logWarning("StreamingContext has already been started")
case STOPPED =>
throw new IllegalStateException("StreamingContext has already been stopped")
}
}
Step 2: the scheduler.start() method
def start(): Unit = synchronized {
if (eventLoop != null) return // scheduler has already been started
logDebug("Starting JobScheduler")
eventLoop = new EventLoop[JobSchedulerEvent]("JobScheduler") {
override protected def onReceive(event: JobSchedulerEvent): Unit = processEvent(event)
override protected def onError(e: Throwable): Unit = reportError("Error in job scheduler", e)
}
// Start the event-loop thread that processes the JobScheduler's events
eventLoop.start()
// attach rate controllers of input streams to receive batch completion updates
for {
inputDStream <- ssc.graph.getInputStreams
rateController <- inputDStream.rateController
} ssc.addStreamingListener(rateController)
listenerBus.start()
// ReceiverTracker: the core component in charge of data receiving
receiverTracker = new ReceiverTracker(ssc)
inputInfoTracker = new InputInfoTracker(ssc)
val executorAllocClient: ExecutorAllocationClient = ssc.sparkContext.schedulerBackend match {
case b: ExecutorAllocationClient => b.asInstanceOf[ExecutorAllocationClient]
case _ => null
}
executorAllocationManager = ExecutorAllocationManager.createIfEnabled(
executorAllocClient,
receiverTracker,
ssc.conf,
ssc.graph.batchDuration.milliseconds,
clock)
executorAllocationManager.foreach(ssc.addStreamingListener)
// Start the ReceiverTracker
receiverTracker.start()
// The JobGenerator was already created when the JobScheduler was constructed; start it now
jobGenerator.start()
executorAllocationManager.foreach(_.start())
logInfo("Started JobScheduler")
}
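The listenerBus started here is also what drives the public StreamingListener API, so application code can observe batch progress. A small sketch (the log output is only illustrative):

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Prints the delays of every completed batch
class BatchTimeListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println(s"Batch ${info.batchTime}: processing ${info.processingDelay.getOrElse(-1L)} ms, " +
      s"scheduling delay ${info.schedulingDelay.getOrElse(-1L)} ms")
  }
}

// Registered with: ssc.addStreamingListener(new BatchTimeListener)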
Step 3: receiverTracker.start()
def start(): Unit = synchronized {
if (isTrackerStarted) {
throw new SparkException("ReceiverTracker already started")
}
// Receivers are only launched if there are receiver-based input streams
if (!receiverInputStreams.isEmpty) {
endpoint = ssc.env.rpcEnv.setupEndpoint(
"ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv)) //汇报状态信息
//内部的launchReceivers()方法,启动Receivers
if (!skipReceiverLaunch) launchReceivers()
logInfo("ReceiverTracker started")
trackerState = Started
}
}
// Every receiver-based input DStream created on a StreamingContext is registered in the DStreamGraph's receiverInputStreams
private val receiverInputStreams = ssc.graph.getReceiverInputStreams()
Step 4: the launchReceivers() method
private def launchReceivers(): Unit = {
// Call getReceiver() on every receiver input DStream to collect the receivers
val receivers = receiverInputStreams.map { nis =>
// Each receiver input DStream produces exactly one Receiver
val rcvr = nis.getReceiver()
rcvr.setReceiverId(nis.id)
rcvr
}
// Run a dummy Spark job to spread the receivers over different executors
runDummySparkJob()
logInfo("Starting " + receivers.length + " receivers")
// Send a StartAllReceivers(receivers) message to the ReceiverTrackerEndpoint
endpoint.send(StartAllReceivers(receivers))
}
Step 5: runDummySparkJob()
// Make sure all executors are alive and avoid all receivers ending up on a single node
private def runDummySparkJob(): Unit = {
if (!ssc.sparkContext.isLocal) {
ssc.sparkContext.makeRDD(1 to 50, 50).map(x => (x, 1)).reduceByKey(_ + _, 20).collect()
}
assert(getExecutors.nonEmpty)
}
Step 6: the StartAllReceivers message
// The receiver scheduling policy
private val schedulingPolicy = new ReceiverSchedulingPolicy()
override def receive: PartialFunction[Any, Unit] = {
// Local messages
case StartAllReceivers(receivers) =>
/**
 * scheduleReceivers decides which executors each receiver may run on.
 * receivers: the receivers to be started
 * getExecutors: the list of executors currently available in the cluster
 */
val scheduledLocations = schedulingPolicy.scheduleReceivers(receivers, getExecutors)
for (receiver <- receivers) {
// Look up, by the receiver's stream id, which executors this receiver may run on
val executors = scheduledLocations(receiver.streamId)
updateReceiverScheduledExecutors(receiver.streamId, executors)
// Remember the receiver's preferred location
receiverPreferredLocations(receiver.streamId) = receiver.preferredLocation
// Loop over the receivers and start them one at a time
startReceiver(receiver, executors)
}
// Restart a receiver
case RestartReceiver(receiver) =>
// Old scheduled executors minus the ones that are not active any more
// If the receiver failed, remove executors that are no longer active from the candidate
// executor list that was computed when the receiver was originally scheduled
val oldScheduledExecutors = getStoredScheduledExecutors(receiver.streamId)
// Re-acquire the executors to schedule on
val scheduledLocations = if (oldScheduledExecutors.nonEmpty) {
// Try global scheduling again
oldScheduledExecutors
} else {
// If the candidate executors are exhausted, call rescheduleReceiver to pick new ones
val oldReceiverInfo = receiverTrackingInfos(receiver.streamId)
// Clear "scheduledLocations" to indicate we are going to do local scheduling
val newReceiverInfo = oldReceiverInfo.copy(
state = ReceiverState.INACTIVE, scheduledLocations = None)
receiverTrackingInfos(receiver.streamId) = newReceiverInfo
schedulingPolicy.rescheduleReceiver(
receiver.streamId,
receiver.preferredLocation,
receiverTrackingInfos,
getExecutors)
}
// Assume there is one receiver restarting at one time, so we don't need to update
// receiverTrackingInfos
// Call startReceiver again
startReceiver(receiver, scheduledLocations)
case c: CleanupOldBlocks =>
receiverTrackingInfos.values.flatMap(_.endpoint).foreach(_.send(c))
case UpdateReceiverRateLimit(streamUID, newRate) =>
for (info <- receiverTrackingInfos.get(streamUID); eP <- info.endpoint) {
eP.send(UpdateRateLimit(newRate))
}
// Remote messages
case ReportError(streamId, message, error) =>
reportError(streamId, message, error)
}
As these comments show, Spark Streaming itself decides which executors the receivers run on; their placement is not left to Spark Core's ordinary task scheduling.
Step 7: schedulingPolicy.scheduleReceivers(receivers, getExecutors)
/**
 * scheduleReceivers first converts the executor list into a map keyed by host
 * (host -> "host:port" entries), then walks the receiver list and assigns nodes as follows:
 *
 * Take a receiver and check whether it has a preferredLocation (a preferred host). The method
 * is declared on the Receiver base class for subclasses to override. SocketReceiver in this
 * example does not override it, so it has no preferredLocation and the executor that runs the
 * receiver can be chosen freely.
 *
 * From this we can also conclude that each receiver runs on a single executor.
 */
def scheduleReceivers(
receivers: Seq[Receiver[_]],
executors: Seq[ExecutorCacheTaskLocation]): Map[Int, Seq[TaskLocation]] = {
// ExecutorCacheTaskLocation is a subclass of TaskLocation
if (receivers.isEmpty) {
return Map.empty
}
if (executors.isEmpty) {
// Without an explicit type parameter, Seq.empty is inferred as Seq[Nothing]
return receivers.map(_.streamId -> Seq.empty).toMap
}
// groupBy host, yielding Map[host, Seq[ExecutorCacheTaskLocation]]
val hostToExecutors = executors.groupBy(_.host)
// One ArrayBuffer[TaskLocation] slot per receiver, i.e. the array length equals receivers.length
val scheduledLocations = Array.fill(receivers.length)(new mutable.ArrayBuffer[TaskLocation])
/**
 * numReceiversOnExecutor counts how many receivers have been assigned to each executor.
 * (ExecutorCacheTaskLocation denotes data cached in an executor's memory, e.g. KafkaRDD
 * partitions cached on an executor; its toString has the form executor_$host_$executorId.)
 */
val numReceiversOnExecutor = mutable.HashMap[ExecutorCacheTaskLocation, Int]()
// Set the initial value to 0
// Initialize the count for every executor to 0
executors.foreach(e => numReceiversOnExecutor(e) = 0)
// Firstly, we need to respect "preferredLocation". So if a receiver has "preferredLocation",
// we need to make sure the "preferredLocation" is in the candidate scheduled executor list.
/**
 * If a receiver declares a preferredLocation (data locality), that host must be included in
 * its candidate executor list. SocketReceiver does not implement preferredLocation (it
 * returns None), so the foreach body below is never entered for it.
 */
for (i <- 0 until receivers.length) {
// Note: preferredLocation is host but executors are host_executorId
// preferredLocation only names a host, whereas executor locations carry host plus executorId (host_executorId)
receivers(i).preferredLocation.foreach { host =>
hostToExecutors.get(host) match {
// executorsOnHost is the Seq[ExecutorCacheTaskLocation] running on that host
case Some(executorsOnHost) =>
// preferredLocation is a known host. Select an executor that has the least receivers in
// this host
// Pick the executor on this host that currently has the fewest receivers assigned
val leastScheduledExecutor =
executorsOnHost.minBy(executor => numReceiversOnExecutor(executor))
scheduledLocations(i) += leastScheduledExecutor
numReceiversOnExecutor(leastScheduledExecutor) =
numReceiversOnExecutor(leastScheduledExecutor) + 1
case None =>
// preferredLocation is an unknown host.
// Note: There are two cases:
// 1. This executor is not up. But it may be up later.
// 2. This executor is dead, or it's not a host in the cluster.
// Currently, simply add host to the scheduled executors.
// Note: host could be `HDFSCacheTaskLocation`, so use `TaskLocation.apply` to handle
// this case
scheduledLocations(i) += TaskLocation(host)
}
}
}
// For those receivers that don't have preferredLocation, make sure we assign at least one
// executor to them.
for (scheduledLocationsForOneReceiver <- scheduledLocations.filter(_.isEmpty)) {
// Select the executor that has the least receivers
// minBy compares the (executor, count) entries of numReceiversOnExecutor by count and returns the one with the smallest count
val (leastScheduledExecutor, numReceivers) = numReceiversOnExecutor.minBy(_._2)
scheduledLocationsForOneReceiver += leastScheduledExecutor
numReceiversOnExecutor(leastScheduledExecutor) = numReceivers + 1
}
// Assign idle executors to receivers that have less executors
// Hand idle executors to the receivers that currently have the fewest candidate executors
val idleExecutors = numReceiversOnExecutor.filter(_._2 == 0).map(_._1)
for (executor <- idleExecutors) {
// Assign an idle executor to the receiver that has least candidate executors.
val leastScheduledExecutors = scheduledLocations.minBy(_.size)
leastScheduledExecutors += executor
}
// zip pairs up elements with the same index from the two sequences
receivers.map(_.streamId).zip(scheduledLocations).toMap
}
The scheduling process is therefore: receivers with a preferredLocation are first mapped onto the least-loaded executor of that host (or onto the bare host if no executor is known there yet); receivers without one get the globally least-loaded executor; finally, idle executors are spread over the receivers that have the fewest candidates.
Note that even if a receiver sets a preferredLocation and that host runs an executor of this application, the receiver is still not guaranteed to be scheduled onto that executor.
Step 8: startReceiver(receiver, executors)
private def startReceiver(
receiver: Receiver[_],
// scheduledLocations specifies the concrete locations where the receiver may run
scheduledLocations: Seq[TaskLocation]): Unit = {
// Check whether the tracker is in a state that allows starting receivers
def shouldStartReceiver: Boolean = {
// It's okay to start when trackerState is Initialized or Started
!(isTrackerStopping || isTrackerStopped)
}
val receiverId = receiver.streamId
// If the receiver should not be started, mark its job as finished and return
if (!shouldStartReceiver) {
onReceiverJobFinish(receiverId)
return
}
val checkpointDirOption = Option(ssc.checkpointDir)
val serializableHadoopConf =
new SerializableConfiguration(ssc.sparkContext.hadoopConfiguration)
// Function to start the receiver on the worker node
// startReceiverFunc encapsulates the work of starting the receiver on a worker node
val startReceiverFunc: Iterator[Receiver[_]] => Unit =
(iterator: Iterator[Receiver[_]]) => {
if (!iterator.hasNext) {
throw new SparkException(
"Could not start receiver as object not found.")
}
if (TaskContext.get().attemptNumber() == 0) {
val receiver = iterator.next()
assert(iterator.hasNext == false)
// ReceiverSupervisorImpl supervises the Receiver and is also responsible for storing the received data
val supervisor = new ReceiverSupervisorImpl(
receiver, SparkEnv.get, serializableHadoopConf.value, checkpointDirOption)
supervisor.start() // start the receiver
supervisor.awaitTermination()
} else {
// Restarting a receiver must go through the scheduling above again rather than relying on task retry, so simply exit here
// It's restarted by TaskScheduler, but we want to reschedule it again. So exit it.
}
}
// Create the RDD using the scheduledLocations to run the receiver in a Spark job
// Wrap the receiver in an RDD whose preferred locations are the scheduled locations
val receiverRDD: RDD[Receiver[_]] =
if (scheduledLocations.isEmpty) {
ssc.sc.makeRDD(Seq(receiver), 1)
} else {
val preferredLocations = scheduledLocations.map(_.toString).distinct
ssc.sc.makeRDD(Seq(receiver -> preferredLocations))
}
// As receiverId suggests, the RDD contains exactly one receiver
receiverRDD.setName(s"Receiver $receiverId")
ssc.sparkContext.setJobDescription(s"Streaming job running receiver $receiverId")
ssc.sparkContext.setCallSite(Option(ssc.getStartSite()).getOrElse(Utils.getCallSite()))
// Each receiver is started by its own Spark job submitted via SparkContext.submitJob,
// rather than by one job whose tasks start all receivers; an application may have many receivers
val future = ssc.sparkContext.submitJob[Receiver[_], Unit, Unit](
receiverRDD, startReceiverFunc, Seq(0), (_, _) => Unit, ())
// We will keep restarting the receiver job until ReceiverTracker is stopped
future.onComplete {
case Success(_) =>
if (!shouldStartReceiver) {
onReceiverJobFinish(receiverId)
} else {
logInfo(s"Restarting Receiver $receiverId")
self.send(RestartReceiver(receiver))
}
case Failure(e) =>
// shouldStartReceiver is true as long as the tracker is not stopping or stopped
if (!shouldStartReceiver) {
onReceiverJobFinish(receiverId)
} else {
logError("Receiver has been stopped. Try to restart it.", e)
logInfo(s"Restarting Receiver $receiverId")
// If the receiver job fails, ask the ReceiverTrackerEndpoint to launch a new Spark job to restart the receiver
self.send(RestartReceiver(receiver))
}
// Jobs are submitted asynchronously, which allows multiple receivers to be started concurrently
}(ThreadUtils.sameThread)
logInfo(s"Receiver ${receiver.streamId} started")
}
In startReceiver, the receiver is wrapped into an RDD whose preferred locations come from the scheduling result, and that RDD is submitted as a Spark job. The receiver therefore runs as a task inside an executor: the task takes the Receiver out of the RDD and applies startReceiverFunc to it, which creates a ReceiverSupervisorImpl to manage the concrete receiver.
Step 9: new ReceiverSupervisorImpl(receiver, SparkEnv.get, serializableHadoopConf.value, checkpointDirOption).start()
def start() {
// Create and start the registered BlockGenerators
onStart()
// Start the Receiver itself
startReceiver()
}
Step 10: the onStart() method of ReceiverSupervisorImpl
override protected def onStart() {
// Runs on the executor side; the BlockGenerators take care of storing the data after it has been received
registeredBlockGenerators.asScala.foreach { _.start() }
}
Step 11: startReceiver()
/** Start receiver */
def startReceiver(): Unit = synchronized {
try {
// Register with the driver and check whether the receiver is allowed to start
if (onReceiverStart()) {
logInfo(s"Starting receiver $streamId")
receiverState = Started
receiver.onStart()
logInfo(s"Called receiver $streamId onStart")
} else {
// The driver refused us
stop("Registered unsuccessfully because Driver refused to start receiver " + streamId, None)
}
} catch {
case NonFatal(t) =>
stop("Error starting receiver " + streamId, Some(t))
}
}
Step 12: onReceiverStart()
// Register the receiver's information with the ReceiverTracker, which decides whether it may start
override protected def onReceiverStart(): Boolean = {
val msg = RegisterReceiver(
streamId, receiver.getClass.getSimpleName, host, executorId, endpoint)
trackerEndpoint.askSync[Boolean](msg)
}
Step 13: receiver.onStart() (using SocketReceiver from SocketInputDStream as the example)
private[streaming]
class SocketReceiver[T: ClassTag](
host: String,
port: Int,
bytesToObjects: InputStream => Iterator[T],
storageLevel: StorageLevel
) extends Receiver[T](storageLevel) with Logging {
private var socket: Socket = _
def onStart() {
logInfo(s"Connecting to $host:$port")
try {
// Create a socket from the host and port supplied by the application
socket = new Socket(host, port)
} catch {
case e: ConnectException =>
restart(s"Error connecting to $host:$port", e)
return
}
logInfo(s"Connected to $host:$port")
// Start the thread that receives data over a connection
// Start a daemon thread that calls receive()
new Thread("Socket Receiver") {
setDaemon(true)
override def run() { receive() }
}.start()
}
def onStop() {
// in case restart thread close it twice
synchronized {
if (socket != null) {
socket.close()
socket = null
logInfo(s"Closed socket to $host:$port")
}
}
}
/** Create a socket connection and receive data until receiver is stopped */
// Read data from the socket until the receiver is stopped
def receive() {
try {
val iterator = bytesToObjects(socket.getInputStream())
while(!isStopped && iterator.hasNext) {
store(iterator.next()) // hand each received record over for storage
}
if (!isStopped()) {
restart("Socket data stream had no more data")
} else {
logInfo("Stopped receiving")
}
} catch {
case NonFatal(e) =>
logWarning("Error receiving data", e)
restart("Error receiving data", e)
} finally {
onStop()
}
}
}
Step 14: the store() method
/**
* Store a single item of received data to Spark's memory.
* These single items will be aggregated together into data blocks before
* being pushed into Spark's memory.
*/
def store(dataItem: T) {
supervisor.pushSingle(dataItem)
}
Step 15: the supervisor's pushSingle() method
Source: org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.scala
private val defaultBlockGenerator = createBlockGenerator(defaultBlockGeneratorListener)
def pushSingle(data: Any) {
// defaultBlockGenerator is a BlockGenerator
defaultBlockGenerator.addData(data)
}
Step 16: the BlockGenerator.addData(data) method
/**
* Push a single data item into the buffer.
* Called from defaultBlockGenerator.addData(data)
*/
def addData(data: Any): Unit = {
if (state == Active) {
waitToPush()
synchronized {
if (state == Active) {
// Append to currentBuffer (an ArrayBuffer[Any])
currentBuffer += data
} else {
throw new SparkException(
"Cannot add data as BlockGenerator has not been started or has been stopped")
}
}
} else {
throw new SparkException(
"Cannot add data as BlockGenerator has not been started or has been stopped")
}
}
Data receiving and storage
Step 1: Once the receiver has started, it begins receiving data from the external source. ReceiverSupervisor's onStart() method:
private val registeredBlockGenerators = new ConcurrentLinkedQueue[BlockGenerator]()
override protected def onStart() {
// Runs on the executor side; the BlockGenerators take care of storing the data after it has been received
registeredBlockGenerators.asScala.foreach { _.start() }
}
override def createBlockGenerator(
blockGeneratorListener: BlockGeneratorListener): BlockGenerator = {
// Cleanup BlockGenerators that have already been stopped
val stoppedGenerators = registeredBlockGenerators.asScala.filter{ _.isStopped() }
stoppedGenerators.foreach(registeredBlockGenerators.remove(_))
val newBlockGenerator = new BlockGenerator(blockGeneratorListener, streamId, env.conf)
registeredBlockGenerators.add(newBlockGenerator)
newBlockGenerator
}
Step 2: BlockGenerator's start() method
// Defaults to 200 ms
private val blockIntervalMs = conf.getTimeAsMs("spark.streaming.blockInterval", "200ms")
require(blockIntervalMs > 0, s"'spark.streaming.blockInterval' should be a positive value")
// blockIntervalTimer is a timer task that calls updateCurrentBuffer every blockIntervalMs (200 ms by default)
private val blockIntervalTimer =
new RecurringTimer(clock, blockIntervalMs, updateCurrentBuffer, "BlockGenerator")
// blockPushingThread keeps taking packaged blocks out of the blocking queue
private val blockPushingThread = new Thread() { override def run() { keepPushingBlocks() } }
// currentBuffer holds the raw records
@volatile private var currentBuffer = new ArrayBuffer[Any]
/** Start block generating and pushing threads. */
def start(): Unit = synchronized {
if (state == Initialized) {
state = Active
// Start blockIntervalTimer, which packages the raw records in currentBuffer into blocks
blockIntervalTimer.start()
// Start blockPushingThread, which drains the blocks in blocksForPushing and calls pushArrayBuffer()
blockPushingThread.start()
logInfo("Started BlockGenerator")
} else {
throw new SparkException(
s"Cannot start BlockGenerator as its not in the Initialized state [state = $state]")
}
}
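Because each block later becomes one partition of the batch's BlockRDD, the ratio of batch interval to block interval determines roughly how many tasks each receiver contributes per batch. A quick worked example (the values are arbitrary):

val batchIntervalMs = 2000L // batch duration passed to the StreamingContext
val blockIntervalMs = 200L  // spark.streaming.blockInterval (default)
val blocksPerReceiverPerBatch = batchIntervalMs / blockIntervalMs
// = 10 blocks, i.e. about 10 partitions (and therefore about 10 tasks) per receiver per batch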
/**
* Push a single data item into the buffer.
* Called from defaultBlockGenerator.addData(data)
*/
def addData(data: Any): Unit = {
if (state == Active) {
// Rate limiting: block until the rate limiter allows another record
waitToPush()
synchronized {
if (state == Active) {
currentBuffer += data
} else {
throw new SparkException(
"Cannot add data as BlockGenerator has not been started or has been stopped")
}
}
} else {
throw new SparkException(
"Cannot add data as BlockGenerator has not been started or has been stopped")
}
}
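waitToPush() is where the receiver's rate limit is enforced; the limit itself comes from configuration. A sketch of the relevant settings (the values are examples only):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Hard cap: at most 10,000 records per second per receiver
  .set("spark.streaming.receiver.maxRate", "10000")
  // Or let backpressure adjust the rate dynamically from batch-completion feedback
  .set("spark.streaming.backpressure.enabled", "true")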
Step 3: the updateCurrentBuffer function
// Capacity of blocksForPushing; defaults to 10 and is configurable
private val blockQueueSize = conf.getInt("spark.streaming.blockQueueSize", 10)
// blocksForPushing is a blocking queue
private val blocksForPushing = new ArrayBlockingQueue[Block](blockQueueSize)
/** Change the buffer to which single records are added to. */
// Package the data in currentBuffer into a block and add it to the blocksForPushing queue
private def updateCurrentBuffer(time: Long): Unit = {
try {
var newBlock: Block = null
synchronized {
// If the buffer is empty, no block is generated
if (currentBuffer.nonEmpty) {
val newBlockBuffer = currentBuffer
currentBuffer = new ArrayBuffer[Any]
// Create a unique BlockId based on the timestamp
val blockId = StreamBlockId(receiverId, time - blockIntervalMs)
listener.onGenerateBlock(blockId)
// Create the Block
newBlock = new Block(blockId, newBlockBuffer)
}
}
if (newBlock != null) {
// Put the block into the blocksForPushing queue
blocksForPushing.put(newBlock) // put is blocking when queue is full
}
} catch {
case ie: InterruptedException =>
logInfo("Block updating timer thread was interrupted")
case e: Exception =>
reportError("Error in block updating thread", e)
}
}
Step 4: the keepPushingBlocks() method, which keeps taking blocks out of the blocksForPushing queue
// Take the packaged blocks from the blocking queue and forward them through
// defaultBlockGeneratorListener (ReceiverSupervisorImpl) for the subsequent storage and reporting steps
private def keepPushingBlocks() {
logInfo("Started block pushing thread")
def areBlocksBeingGenerated: Boolean = synchronized {
state != StoppedGeneratingBlocks
}
try {
// While blocks are being generated, keep polling for to-be-pushed blocks and push them.
while (areBlocksBeingGenerated) {
// Poll the head of the blocksForPushing queue, waiting up to 10 ms
Option(blocksForPushing.poll(10, TimeUnit.MILLISECONDS)) match {
// Push the block downstream
case Some(block) => pushBlock(block)
case None =>
}
}
// At this point, state is StoppedGeneratingBlock. So drain the queue of to-be-pushed blocks.
logInfo("Pushing out the last " + blocksForPushing.size() + " blocks")
while (!blocksForPushing.isEmpty) {
val block = blocksForPushing.take()
logDebug(s"Pushing block $block")
pushBlock(block)
logInfo("Blocks left to push " + blocksForPushing.size())
}
logInfo("Stopped block pushing thread")
} catch {
case ie: InterruptedException =>
logInfo("Block pushing thread was interrupted")
case e: Exception =>
reportError("Error in block pushing thread", e)
}
}
Step 5: the pushBlock(block) method
private def pushBlock(block: Block) {
listener.onPushBlock(block.id, block.buffer)
logInfo("Pushed block " + block.id)
}
Step 6: the BlockGeneratorListener reacts to the onPushBlock event
Source: org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.scala
private val defaultBlockGeneratorListener = new BlockGeneratorListener {
def onAddData(data: Any, metadata: Any): Unit = { }
def onGenerateBlock(blockId: StreamBlockId): Unit = { }
def onError(message: String, throwable: Throwable) {
reportError(message, throwable)
}
// When onPushBlock fires, the listener stores the pushed block and reports it to the ReceiverTracker
def onPushBlock(blockId: StreamBlockId, arrayBuffer: ArrayBuffer[_]) {
// Store and report the block
pushArrayBuffer(arrayBuffer, None, Some(blockId))
}
}
Step 7: the pushArrayBuffer() method
def pushArrayBuffer(
arrayBuffer: ArrayBuffer[_],
metadataOption: Option[Any],
blockIdOption: Option[StreamBlockId]
) {
// Wrap the buffer in an ArrayBufferBlock
pushAndReportBlock(ArrayBufferBlock(arrayBuffer), metadataOption, blockIdOption)
}
Step 8: the pushAndReportBlock() method
// Store the block and report it to the driver
def pushAndReportBlock(
receivedBlock: ReceivedBlock,
metadataOption: Option[Any],
blockIdOption: Option[StreamBlockId]
) {
val blockId = blockIdOption.getOrElse(nextBlockId)
val time = System.currentTimeMillis
// Store the data as a block through receivedBlockHandler
val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
val numRecords = blockStoreResult.numRecords
// Wrap the result in a ReceivedBlockInfo
val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
if (!trackerEndpoint.askSync[Boolean](AddBlock(blockInfo))) {
throw new SparkException("Failed to add block to receiver tracker.")
}
logDebug(s"Reported block $blockId")
}
Step 9: receivedBlockHandler.storeBlock; ReceivedBlockHandler has two implementations
private val receivedBlockHandler: ReceivedBlockHandler = {
if (WriteAheadLogUtils.enableReceiverLog(env.conf)) {
if (checkpointDirOption.isEmpty) {
throw new SparkException(
"Cannot enable receiver write-ahead log without checkpoint directory set. " +
"Please use streamingContext.checkpoint() to set the checkpoint directory. " +
"See documentation for more details.")
}
// Used when the write-ahead log (WAL) is enabled
new WriteAheadLogBasedBlockHandler(env.blockManager, env.serializerManager, receiver.streamId,
receiver.storageLevel, env.conf, hadoopConf, checkpointDirOption.get)
} else {
// Used by default
new BlockManagerBasedBlockHandler(env.blockManager, receiver.storageLevel)
}
}
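Which handler is chosen therefore depends entirely on configuration. A sketch of what enabling the WAL path looks like on the application side (the checkpoint directory is an example path):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-enabled-app")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(1))
// A checkpoint directory is mandatory for the WAL-based handler
ssc.checkpoint("hdfs:///checkpoints/my-streaming-app")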
Step 10: the storeBlock() method of WriteAheadLogBasedBlockHandler
def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {
var numRecords = Option.empty[Long]
// Serialize the block so that it can be inserted into both
// Serialize the block data
val serializedBlock = block match {
case ArrayBufferBlock(arrayBuffer) =>
numRecords = Some(arrayBuffer.size.toLong)
serializerManager.dataSerialize(blockId, arrayBuffer.iterator)
case IteratorBlock(iterator) =>
val countIterator = new CountingIterator(iterator)
val serializedBlock = serializerManager.dataSerialize(blockId, countIterator)
numRecords = countIterator.count
serializedBlock
case ByteBufferBlock(byteBuffer) =>
new ChunkedByteBuffer(byteBuffer.duplicate())
case _ =>
throw new Exception(s"Could not push $blockId to block manager, unexpected block type")
}
// Store the block in block manager
// Save the data into the BlockManager; with the default _SER_2 storage level a replica is kept on another executor for fault tolerance
val storeInBlockManagerFuture = Future {
val putSucceeded = blockManager.putBytes(
blockId,
serializedBlock,
effectiveStorageLevel,
tellMaster = true)
if (!putSucceeded) {
throw new SparkException(
s"Could not store $blockId to block manager with storage level $storageLevel")
}
}
// Store the block in write ahead log
// Write the block to the write-ahead log
val storeInWriteAheadLogFuture = Future {
writeAheadLog.write(serializedBlock.toByteBuffer, clock.getTimeMillis())
}
// Combine the futures, wait for both to complete, and return the write ahead log record handle
val combinedFuture = storeInBlockManagerFuture.zip(storeInWriteAheadLogFuture).map(_._2)
val walRecordHandle = ThreadUtils.awaitResult(combinedFuture, blockStoreTimeout)
WriteAheadLogBasedStoreResult(blockId, numRecords, walRecordHandle)
}
Step 11: the storeBlock() method of BlockManagerBasedBlockHandler
def storeBlock(blockId: StreamBlockId, block: ReceivedBlock): ReceivedBlockStoreResult = {
var numRecords: Option[Long] = None
// Save the data into the BlockManager
val putSucceeded: Boolean = block match {
case ArrayBufferBlock(arrayBuffer) =>
numRecords = Some(arrayBuffer.size.toLong)
blockManager.putIterator(blockId, arrayBuffer.iterator, storageLevel,
tellMaster = true)
case IteratorBlock(iterator) =>
val countIterator = new CountingIterator(iterator)
val putResult = blockManager.putIterator(blockId, countIterator, storageLevel,
tellMaster = true)
numRecords = countIterator.count
putResult
case ByteBufferBlock(byteBuffer) =>
blockManager.putBytes(
blockId, new ChunkedByteBuffer(byteBuffer.duplicate()), storageLevel, tellMaster = true)
case o =>
throw new SparkException(
s"Could not store $blockId to block manager, unexpected block type ${o.getClass.getName}")
}
if (!putSucceeded) {
throw new SparkException(
s"Could not store $blockId to block manager with storage level $storageLevel")
}
BlockManagerBasedStoreResult(blockId, numRecords)
}
BlockManagerBasedBlockHandler saves the data on the receiver's node through the BlockManager interface; according to the replication factor of the configured StorageLevel, replicas are also stored on other executors.
Step 12: the BlockManager replicate() method that stores the replicas
Source: org.apache.spark.storage.BlockManager.scala
private def replicate(
blockId: BlockId,
data: BlockData,
level: StorageLevel,
classTag: ClassTag[_],
existingReplicas: Set[BlockManagerId] = Set.empty): Unit = {
......
var peersForReplication = blockReplicationPolicy.prioritize(
blockManagerId,
initialPeers,
peersReplicatedTo,
blockId,
numPeersToReplicateTo)
......
}
Step 13: blockReplicationPolicy.prioritize(); the default replication policy chooses peers by random sampling
override def prioritize(
blockManagerId: BlockManagerId,
peers: Seq[BlockManagerId],
peersReplicatedTo: mutable.HashSet[BlockManagerId],
blockId: BlockId,
numReplicas: Int): List[BlockManagerId] = {
// Random number generator, seeded with the block id
val random = new Random(blockId.hashCode)
logDebug(s"Input peers : ${peers.mkString(", ")}")
val prioritizedPeers = if (peers.size > numReplicas) {
BlockReplicationUtils.getRandomSample(peers, numReplicas, random)
} else {
if (peers.size < numReplicas) {
logWarning(s"Expecting ${numReplicas} replicas with only ${peers.size} peer/s.")
}
random.shuffle(peers).toList
}
logDebug(s"Prioritized peers : ${prioritizedPeers.mkString(", ")}")
prioritizedPeers
}
Step 14: Once the block and its replicas have been stored, the block is reported to the ReceiverTracker through trackerEndpoint; this is the tail of pushAndReportBlock() from Step 8.
// Wrap the result in a ReceivedBlockInfo
val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
// Send an AddBlock message to the tracker
if (!trackerEndpoint.askSync[Boolean](AddBlock(blockInfo))) {
throw new SparkException("Failed to add block to receiver tracker.")
}
Step 15: the ReceiverTrackerEndpoint receives the AddBlock message
case AddBlock(receivedBlockInfo) =>
if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
walBatchingThreadPool.execute(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
if (active) {
context.reply(addBlock(receivedBlockInfo))
} else {
context.sendFailure(
new IllegalStateException("ReceiverTracker RpcEndpoint already shut down."))
}
}
})
} else {
context.reply(addBlock(receivedBlockInfo))
}
Step 16: the addBlock(receivedBlockInfo) method
private val receivedBlockTracker = new ReceivedBlockTracker(
ssc.sparkContext.conf,
ssc.sparkContext.hadoopConfiguration,
receiverInputStreamIds,
ssc.scheduler.clock,
ssc.isCheckpointPresent,
Option(ssc.checkpointDir)
)
// receivedBlockTracker saves the block info into the streamIdToUnallocatedBlockQueues queue, to be used later for job generation
private def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
receivedBlockTracker.addBlock(receivedBlockInfo)
}
Step 17: the receivedBlockTracker.addBlock(receivedBlockInfo) method
// Maps each streamId to its queue of not-yet-allocated blocks
private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]
// Maps each batch time to the blocks allocated to that batch
private val timeToAllocatedBlocks = new mutable.HashMap[Time, AllocatedBlocks]
// If the write-ahead log is enabled, the ReceivedBlockTracker also writes its events to it
private val writeAheadLogOption = createWriteAheadLog()
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
try {
// Write the event to the WAL first (if enabled)
val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
if (writeResult) {
synchronized {
// Append to the stream's unallocated-block queue
getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
}
logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
s"block ${receivedBlockInfo.blockStoreResult.blockId}")
} else {
logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
}
writeResult
} catch {
case NonFatal(e) =>
logError(s"Error adding block $receivedBlockInfo", e)
false
}
}
Step 18: getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
private def getReceivedBlockQueue(streamId: Int): ReceivedBlockQueue = {
streamIdToUnallocatedBlockQueues.getOrElseUpdate(streamId, new ReceivedBlockQueue)
}
The block information kept in streamIdToUnallocatedBlockQueues is taken out when the next batch's jobs are generated and used to build that batch's RDD; at that point the registered block info is moved into timeToAllocatedBlocks.
Job generation and submission
Step 1: The JobGenerator periodically generates and submits jobs. When the JobScheduler is started, it calls the JobGenerator's start method.
def start(): Unit = synchronized {
if (eventLoop != null) return // generator has already been started
// Call checkpointWriter here to initialize it before eventLoop uses it to avoid a deadlock.
// See SPARK-10125
checkpointWriter
eventLoop = new EventLoop[JobGeneratorEvent]("JobGenerator") {
override protected def onReceive(event: JobGeneratorEvent): Unit = processEvent(event)
override protected def onError(e: Throwable): Unit = {
jobScheduler.reportError("Error in job generator", e)
}
}
// Create and start the eventLoop; events are dispatched to processEvent(event),
// which handles each event according to its type
eventLoop.start()
if (ssc.isCheckpointPresent) {
restart()
} else {
// Start generating jobs periodically
startFirstTime()
}
}
Step 2: the startFirstTime() method
// A timer that fires repeatedly at the configured batch interval
private val timer = new RecurringTimer(clock, ssc.graph.batchDuration.milliseconds,
longTime => eventLoop.post(GenerateJobs(new Time(longTime))), "JobGenerator")
/**
 * Initialize the start time. From then on, at every batch interval the data collected since
 * the previous tick (startTime here) is packaged into one batch.
 */
private def startFirstTime() {
val startTime = new Time(timer.getStartTime())
graph.start(startTime - graph.batchDuration)
timer.start(startTime.milliseconds)
logInfo("Started JobGenerator at " + startTime)
}
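For orientation, the first trigger time returned by timer.getStartTime() is, roughly, the current time rounded up to the next multiple of the batch duration; a small sketch of that rounding (not the actual RecurringTimer code):

// Round the current time up to the next multiple of the batch duration
def nextBatchBoundary(nowMs: Long, batchDurationMs: Long): Long =
  (math.floor(nowMs.toDouble / batchDurationMs) + 1).toLong * batchDurationMs

// e.g. nowMs = 10250, batchDurationMs = 1000  =>  the first GenerateJobs event fires at t = 11000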
Step 3: At every batchDuration the timer posts a GenerateJobs event to the eventLoop. When the eventLoop receives it, processEvent dispatches it accordingly, which in this case means calling generateJobs() to generate the jobs for that batch.
/** Processes all events */
private def processEvent(event: JobGeneratorEvent) {
logDebug("Got event " + event)
event match {
case GenerateJobs(time) => generateJobs(time)
case ClearMetadata(time) => clearMetadata(time)
case DoCheckpoint(time, clearCheckpointDataLater) =>
doCheckpoint(time, clearCheckpointDataLater)
case ClearCheckpointData(time) => clearCheckpointData(time)
}
}
Step 4: generateJobs(time) generates the jobs
/**
* generateJobs is invoked periodically by the timer
* time: the batch time of this interval
*/
private def generateJobs(time: Time) {
// Checkpoint all RDDs marked for checkpointing to ensure their lineages are
// truncated periodically. Otherwise, we may run into stack overflows (SPARK-6847).
ssc.sparkContext.setLocalProperty(RDD.CHECKPOINT_ALL_MARKED_ANCESTORS, "true")
Try {
// Ask the ReceiverTracker, via allocateBlocksToBatch, to assign the blocks received in this
// interval to the batch, from which the batch's RDD will be created
jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch
// Ask the DStreamGraph to generate the jobs for this batch from the defined DStream lineage and operators
graph.generateJobs(time) // generate jobs using allocated block
} match {
// On success, submit the jobs as a JobSet through the JobScheduler
case Success(jobs) =>
val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
case Failure(e) =>
jobScheduler.reportError("Error generating jobs for time " + time, e)
PythonDStream.stopStreamingContextIfPythonProcessIsDead(e)
}
eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
Step 5: jobScheduler.receiverTracker.allocateBlocksToBatch(time)
def allocateBlocksToBatch(batchTime: Time): Unit = {
if (receiverInputStreams.nonEmpty) {
// Take the unallocated block info and assign it to the batch identified by batchTime
receivedBlockTracker.allocateBlocksToBatch(batchTime)
}
}
Step 6: receivedBlockTracker.allocateBlocksToBatch(batchTime)
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
val streamIdToBlocks = streamIds.map { streamId =>
(streamId, getReceivedBlockQueue(streamId).clone())
}.toMap
// Take the unallocated block info out of streamIdToUnallocatedBlockQueues and wrap it as AllocatedBlocks
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
streamIds.foreach(getReceivedBlockQueue(_).clear())
// Save it into timeToAllocatedBlocks, where it will be bound to the job generated for this batchTime
timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
lastAllocatedBatchTime = batchTime
} else {
logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
}
} else {
// This situation occurs when:
// 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
// possibly processed batch job or half-processed batch job need to be processed again,
// so the batchTime will be equal to lastAllocatedBatchTime.
// 2. Slow checkpointing makes recovered batch time older than WAL recovered
// lastAllocatedBatchTime.
// This situation will only occurs in recovery time.
logInfo(s"Possibly processed batch $batchTime needs to be processed again in WAL recovery")
}
}
Step 7: graph.generateJobs(time) from Step 4 turns each output stream of the DStreamGraph into a Job (if the application has several output operations, one batch produces several jobs; see the sketch after this code).
def generateJobs(time: Time): Seq[Job] = {
logDebug("Generating jobs for time " + time)
val jobs = this.synchronized {
outputStreams.flatMap { outputStream =>
// Call generateJob on each output stream to turn it into a Job
val jobOption = outputStream.generateJob(time)
jobOption.foreach(_.setCallSite(outputStream.creationSite))
jobOption
}
}
logDebug("Generated " + jobs.length + " jobs for time " + time)
jobs
}
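In user code this means that every output operation adds one job per batch. A sketch, reusing the counts DStream from the word-count example at the beginning:

// Two output operations => two output streams in the DStreamGraph
// => generateJobs(time) returns two jobs for every batch
counts.print()
counts.foreachRDD { rdd =>
  println(s"batch size = ${rdd.count()}")
}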
Step 8: outputStream.generateJob(time)
/**
* All output operations go through ForEachDStream's generateJob method,
* which is what ultimately leads to job submission
*/
override def generateJob(time: Time): Option[Job] = {
parent.getOrCompute(time) match {
case Some(rdd) =>
val jobFunc = () => createRDDWithLocalProperties(time, displayInnerRDDOps) {
foreachFunc(rdd, time)
}
// Create the streaming Job
Some(new Job(time, jobFunc))
case None => None
}
}
Step 9: parent.getOrCompute(time) calls the parent DStream's getOrCompute method to generate the RDD
(Figure: the DStream transformation chain of the WordCount application)
In that chain the parent of ForEachDStream is ShuffledDStream, which does not override getOrCompute, so the implementation inherited from the DStream base class is used.
/**
* Get the RDD corresponding to the given time; either retrieve it from cache
* or compute-and-cache it.
*/
// getOrCompute (compute is similar) is defined on the DStream base class;
// if a subclass overrides it, the subclass version runs, otherwise this base implementation is used
private[streaming] final def getOrCompute(time: Time): Option[RDD[T]] = {
// If RDD was already generated, then retrieve it from HashMap,
// or else compute the RDD
generatedRDDs.get(time).orElse {
// Compute the RDD if time is valid (e.g. correct time in a sliding window)
// of RDD generation, else generate nothing.
if (isTimeValid(time)) {
val rddOption = createRDDWithLocalProperties(time, displayInnerRDDOps = false) {
// Disable checks for existing output directories in jobs launched by the streaming
// scheduler, since we may need to write output to an existing directory during checkpoint
// recovery; see SPARK-4835 for more details. We need to have this call here because
// compute() might cause Spark jobs to be launched.
SparkHadoopWriterUtils.disableOutputSpecValidation.withValue(true) {
// Recursively compute the RDD for this time
compute(time)
}
}
rddOption.foreach { case newRDD =>
// Register the generated RDD for caching and checkpointing
if (storageLevel != StorageLevel.NONE) {
newRDD.persist(storageLevel)
logDebug(s"Persisting RDD ${newRDD.id} for time $time to $storageLevel")
}
if (checkpointDuration != null && (time - zeroTime).isMultipleOf(checkpointDuration)) {
newRDD.checkpoint()
logInfo(s"Marking RDD ${newRDD.id} for time $time for checkpointing")
}
generatedRDDs.put(time, newRDD)
}
rddOption
} else {
None
}
}
}
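The storageLevel and checkpointDuration consulted in getOrCompute are set through the public DStream API. A sketch, again assuming the counts DStream from the earlier example (the intervals are arbitrary):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds

// getOrCompute will persist every generated RDD at this storage level ...
counts.persist(StorageLevel.MEMORY_ONLY_SER)
// ... and mark for checkpointing the RDD of every batch whose time is a multiple of this interval
// (the interval must be a multiple of the batch duration)
counts.checkpoint(Seconds(10))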
Step 10: ShuffledDStream.compute(time) generates the RDD
override def compute(validTime: Time): Option[RDD[(K, C)]] = {
// Call getOrCompute on the parent DStream; the recursion continues up the chain until the source DStream is reached
parent.getOrCompute(validTime) match {
case Some(rdd) => Some(rdd.combineByKey[C](
createCombiner, mergeValue, mergeCombiner, partitioner, mapSideCombine))
case None => None
}
}
Step 11: the SocketInputDStream.compute(time) method
SocketInputDStream inherits compute from ReceiverInputDStream. This compute generates the source RDD, and the recursion back down the DStream chain produces the RDD graph for the batch.
override def compute(validTime: Time): Option[RDD[T]] = {
val blockRDD = {
if (validTime < graph.startTime) {
// If this is called for any time before the start time of the context,
// then this returns an empty RDD. This may happen when recovering from a
// driver failure without any write ahead log to recover pre-failure data.
new BlockRDD[T](ssc.sc, Array.empty)
} else {
// Otherwise, ask the tracker for all the blocks that have been allocated to this stream
// for this batch
// Fetch the ReceivedBlockInfo that was allocated to this batch (from timeToAllocatedBlocks)
val receiverTracker = ssc.scheduler.receiverTracker
val blockInfos = receiverTracker.getBlocksOfBatch(validTime).getOrElse(id, Seq.empty)
// Register the input blocks information into InputInfoTracker
// Wrap it as StreamInputInfo and report it to the InputInfoTracker
val inputInfo = StreamInputInfo(id, blockInfos.flatMap(_.numRecords).sum)
ssc.scheduler.inputInfoTracker.reportInfo(validTime, inputInfo)
// Create the BlockRDD
createBlockRDD(validTime, blockInfos)
}
}
Some(blockRDD)
}
Step 12: createBlockRDD(validTime, blockInfos)
private[streaming] def createBlockRDD(time: Time, blockInfos: Seq[ReceivedBlockInfo]): RDD[T] = {
// Check whether any block info exists for this batch
if (blockInfos.nonEmpty) {
val blockIds = blockInfos.map { _.blockId.asInstanceOf[BlockId] }.toArray
// Are WAL record handles present with all the blocks
val areWALRecordHandlesPresent = blockInfos.forall { _.walRecordHandleOption.nonEmpty }
if (areWALRecordHandlesPresent) {
// If all the blocks have WAL record handle, then create a WALBackedBlockRDD
val isBlockIdValid = blockInfos.map { _.isBlockIdValid() }.toArray
val walRecordHandles = blockInfos.map { _.walRecordHandleOption.get }.toArray
// Recoverable from the WAL
new WriteAheadLogBackedBlockRDD[T](
ssc.sparkContext, blockIds, walRecordHandles, isBlockIdValid)
} else {
// Else, create a BlockRDD. However, if there are some blocks with WAL info but not
// others then that is unexpected and log a warning accordingly.
if (blockInfos.exists(_.walRecordHandleOption.nonEmpty)) {
if (WriteAheadLogUtils.enableReceiverLog(ssc.conf)) {
logError("Some blocks do not have Write Ahead Log information; " +
"this is unexpected and data may not be recoverable after driver failures")
} else {
logWarning("Some blocks have Write Ahead Log information; this is unexpected")
}
}
// Ask the BlockManagerMaster whether each block still exists and filter out the ones that do not
val validBlockIds = blockIds.filter { id =>
ssc.sparkContext.env.blockManager.master.contains(id)
}
if (validBlockIds.length != blockIds.length) {
logWarning("Some blocks could not be recovered as they were not found in memory. " +
"To prevent such data loss, enable Write Ahead Log (see programming guide " +
"for more details.")
}
// Build a BlockRDD from the valid block ids
new BlockRDD[T](ssc.sc, validBlockIds)
}
} else {
// If no block is ready now, creating WriteAheadLogBackedBlockRDD or BlockRDD
// according to the configuration
// Create an empty WAL-backed RDD
if (WriteAheadLogUtils.enableReceiverLog(ssc.conf)) {
new WriteAheadLogBackedBlockRDD[T](
ssc.sparkContext, Array.empty, Array.empty, Array.empty)
} else {
// Create an empty BlockRDD
new BlockRDD[T](ssc.sc, Array.empty)
}
}
}
Step 13: Back in Step 4, the JobSet is submitted through the JobScheduler
// The jobs were generated successfully
case Success(jobs) =>
// Fetch the input info recorded for this batch interval from the InputInfoTracker
val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
// All jobs of the batch form a JobSet, which is submitted in one go via JobScheduler.submitJobSet
jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
Step 14: jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
private val numConcurrentJobs = ssc.conf.getInt("spark.streaming.concurrentJobs", 1)
// By default the pool has a single thread, so only one job runs at a time
private val jobExecutor =
ThreadUtils.newDaemonFixedThreadPool(numConcurrentJobs, "streaming-job-executor")
def submitJobSet(jobSet: JobSet) {
if (jobSet.jobs.isEmpty) {
logInfo("No jobs added for time " + jobSet.time)
} else {
listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
jobSets.put(jobSet.time, jobSet)
// Wrap each Job in a JobHandler and hand it to the thread pool's execute(); it waits in the
// work queue and is run as soon as a pool thread becomes free
jobSet.jobs.foreach(job => jobExecutor.execute(new JobHandler(job)))
logInfo("Added jobs for time " + jobSet.time)
}
}
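As the code above shows, inter-batch parallelism is governed by the internal, undocumented spark.streaming.concurrentJobs setting. A sketch of raising it, with the usual caveat that batches may then complete out of order:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Allow the jobs of up to 2 batches to run concurrently (default is 1)
  .set("spark.streaming.concurrentJobs", "2")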
Step 15: new JobHandler(job)
/**
 * JobHandler is the Runnable executed by the thread pool; it processes a submitted Job.
 */
private class JobHandler(job: Job) extends Runnable with Logging {
import JobScheduler._
def run() {
val oldProps = ssc.sparkContext.getLocalProperties
try {
ssc.sparkContext.setLocalProperties(SerializationUtils.clone(ssc.savedProperties.get()))
val formattedTime = UIUtils.formatBatchTime(
job.time.milliseconds, ssc.graph.batchDuration.milliseconds, showYYYYMMSS = false)
val batchUrl = s"/streaming/batch/?id=${job.time.milliseconds}"
val batchLinkText = s"[output operation ${job.outputOpId}, batch time ${formattedTime}]"
ssc.sc.setJobDescription(
s"""Streaming job from $batchLinkText""")
ssc.sc.setLocalProperty(BATCH_TIME_PROPERTY_KEY, job.time.milliseconds.toString)
ssc.sc.setLocalProperty(OUTPUT_OP_ID_PROPERTY_KEY, job.outputOpId.toString)
// Checkpoint all RDDs marked for checkpointing to ensure their lineages are
// truncated periodically. Otherwise, we may run into stack overflows (SPARK-6847).
ssc.sparkContext.setLocalProperty(RDD.CHECKPOINT_ALL_MARKED_ANCESTORS, "true")
// We need to assign `eventLoop` to a temp variable. Otherwise, because
// `JobScheduler.stop(false)` may set `eventLoop` to null when this method is running, then
// it's possible that when `post` is called, `eventLoop` happens to null.
var _eventLoop = eventLoop
if (_eventLoop != null) {
_eventLoop.post(JobStarted(job, clock.getTimeMillis()))
// Disable checks for existing output directories in jobs launched by the streaming
// scheduler, since we may need to write output to an existing directory during checkpoint
// recovery; see SPARK-4835 for more details.
// Job state transitions are reported through the EventLoop; calling job.run() actually runs the job
SparkHadoopWriterUtils.disableOutputSpecValidation.withValue(true) {
job.run()
}
_eventLoop = eventLoop
if (_eventLoop != null) {
_eventLoop.post(JobCompleted(job, clock.getTimeMillis()))
}
} else {
// JobScheduler has been stopped.
}
} finally {
ssc.sparkContext.setLocalProperties(oldProps)
}
}
}
Step 16: the job.run() method
def run() {
// func is the jobFunc created by the output DStream
_result = Try(func())
}
Step 17: the DStream's jobFunc function
private[streaming] def generateJob(time: Time): Option[Job] = {
getOrCompute(time) match {
case Some(rdd) =>
val jobFunc = () => {
val emptyFunc = { (iterator: Iterator[T]) => {} }
// This triggers the submission of a real Spark job; from here on the flow is the same as for a Spark batch job
context.sparkContext.runJob(rdd, emptyFunc)
}
Some(new Job(time, jobFunc))
case None => None
}
}
The Job generated above is only Spark Streaming's own abstraction; it is not yet a Spark job (the unit that Spark Core actually schedules and splits into tasks). A real Spark job is submitted only when the jobFunc runs, for example through SparkContext.runJob as shown above.
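Concretely, the real Spark jobs are the actions executed inside the output operation's jobFunc. A sketch, again assuming the counts DStream from the earlier example (the output path is arbitrary):

counts.foreachRDD { (rdd, time) =>
  // Each action below submits a genuine Spark job, which Spark Core schedules as stages and tasks
  val n = rdd.count()
  rdd.saveAsTextFile(s"/tmp/wordcounts/${time.milliseconds}")
  println(s"[$time] wrote $n records")
}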