DT大数据梦工厂 Spark Customization Course Notes (011)

Spark Streaming Source Code Walkthrough: A Thorough Look at the Architecture and Implementation of the Driver-Side ReceiverTracker

The main responsibilities of ReceiverTracker are the following (a condensed sketch of the underlying message protocol follows the list):

1.  Start the Receivers on the Executors.

2.  Stop the Receivers.

3.  Update the rate at which each Receiver ingests data (this is how rate limiting is implemented).

4.  Monitor the Receivers' running state and restart any Receiver that stops; this is the Receiver fault-tolerance mechanism.

5.  Accept registration requests from the Receivers.

6.  Manage the metadata of the data the Receivers ingest, with the help of ReceivedBlockTracker.

7.  Report error information sent back by the Receivers.

 
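Most of these responsibilities are driven by RPC messages handled inside ReceiverTrackerEndpoint. A condensed sketch of the message types involved (in the real source they are split across the ReceiverTrackerMessage and ReceiverTrackerLocalMessage traits and carry more detail, so treat this as orientation, not exact definitions):

// Condensed sketch of the ReceiverTracker message protocol (not the exact definitions)
// receiver -> tracker: register after starting (responsibility 5)
case class RegisterReceiver(streamId: Int, typ: String, host: String,
    executorId: String, endpoint: RpcEndpointRef)
// receiver -> tracker: metadata of a received block, kept by ReceivedBlockTracker (responsibility 6)
case class AddBlock(receivedBlockInfo: ReceivedBlockInfo)
// receiver -> tracker: error report (responsibility 7)
case class ReportError(streamId: Int, message: String, error: String)
// tracker -> itself, locally: launch, throttle, and stop receivers (responsibilities 1-3)
case class StartAllReceivers(receivers: Seq[Receiver[_]])
case class UpdateReceiverRateLimit(streamUID: Int, newRate: Long)
case object StopAllReceivers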

Starting the Receivers

In ReceiverTracker's start method, the ReceiverTrackerEndpoint is instantiated and the Receivers are launched on the Executors:

ReceiverTracker.scala (lines 149-161)

def start(): Unit = synchronized {
  if (isTrackerStarted) {
    throw new SparkException("ReceiverTracker already started")
  }

  if (!receiverInputStreams.isEmpty) {
    // Set up the RPC endpoint through which receivers register and report back
    endpoint = ssc.env.rpcEnv.setupEndpoint(
      "ReceiverTracker", new ReceiverTrackerEndpoint(ssc.env.rpcEnv))
    // Launch the receivers on the executors (skipReceiverLaunch is a testing hook)
    if (!skipReceiverLaunch) launchReceivers()
    logInfo("ReceiverTracker started")
    trackerState = Started
  }
}

ReceiverTracker.scala (lines 413-424)

private def launchReceivers(): Unit = {
  // Pull the Receiver object out of each input stream and tag it with the stream id
  val receivers = receiverInputStreams.map(nis => {
    val rcvr = nis.getReceiver()
    rcvr.setReceiverId(nis.id)
    rcvr
  })

  // Run a trivial job first so that every executor registers with the driver,
  // which lets the receivers be scheduled evenly across the cluster
  runDummySparkJob()

  logInfo("Starting " + receivers.length + " receivers")
  endpoint.send(StartAllReceivers(receivers))
}

Following the call chain into startReceiver, each Receiver is wrapped into a single-element RDD (see Lecture 5; ReceiverTracker.scala, lines 583-589):

val receiverRDD: RDD[Receiver[_]] =
  if (scheduledLocations.isEmpty) {
    // No scheduling information: build a one-partition RDD with no locality preference
    ssc.sc.makeRDD(Seq(receiver), 1)
  } else {
    // Use the scheduled executors as the preferred locations of the single partition
    val preferredLocations = scheduledLocations.map(_.toString).distinct
    ssc.sc.makeRDD(Seq(receiver -> preferredLocations))
  }
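The second branch relies on the makeRDD overload that takes (element, preferredLocations) pairs, so the hosts chosen by the scheduling policy become the preferred locations of the one task that will run the receiver. A minimal self-contained illustration of that overload (the host names are made up for the example):

import org.apache.spark.{SparkConf, SparkContext}

object MakeRDDLocalityDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("makeRDD-demo").setMaster("local[2]"))
    // Each element is paired with the hosts its partition would prefer to run on
    val rdd = sc.makeRDD(Seq("payload" -> Seq("host1", "host2")))  // hypothetical hosts
    // One partition per element, carrying the locality hint
    println(rdd.preferredLocations(rdd.partitions(0)))  // List(host1, host2)
    sc.stop()
  }
}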

The RDD is then submitted as a long-running Spark job; just before submission, the job is given a descriptive name (ReceiverTracker.scala, line 591):

ssc.sparkContext.setJobDescription(s"Streaming job running receiver $receiverId")
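The actual submission happens a couple of lines later in the same file. In Spark 1.6.x it looks roughly like the following, where startReceiverFunc is the closure that starts a ReceiverSupervisorImpl on the executor; the job's result is ignored because the job exists only to keep the receiver running:

// Submit the one-partition receiverRDD as a job; Seq(0) names its only partition
val future = ssc.sparkContext.submitJob[Receiver[_], Unit, Unit](
  receiverRDD, startReceiverFunc, Seq(0), (_, _) => Unit, ())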

 

Receiver Registration at Startup

After a Receiver starts, it registers itself with the ReceiverTracker; only once the registration succeeds is the Receiver considered officially started.
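On the executor side this registration is performed by ReceiverSupervisorImpl, which asks the tracker endpoint for permission before actually starting the receiver; the driver answers through receiveAndReply. A condensed sketch of both sides (names follow Spark 1.6.x, but the bodies are abbreviated):

// Executor side (ReceiverSupervisorImpl.onReceiverStart, condensed):
// block until the driver confirms or rejects the registration
val msg = RegisterReceiver(streamId, receiver.getClass.getSimpleName, host, executorId, endpoint)
val registered: Boolean = trackerEndpoint.askWithRetry[Boolean](msg)

// Driver side (ReceiverTrackerEndpoint.receiveAndReply, condensed):
case RegisterReceiver(streamId, typ, host, executorId, receiverEndpoint) =>
  val successful = registerReceiver(
    streamId, typ, host, executorId, receiverEndpoint, context.senderAddress)
  context.reply(successful)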

(To be continued)
