StreamingListener: A Handy Tool for Monitoring Stream Processing and Updating Broadcast Variables

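Spark Streaming jobs often need to monitor how each batch runs and to report promptly when something goes wrong. The org.apache.spark.streaming.scheduler.StreamingListener trait is the hook for this.
Its source is shown below: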

/**
 * :: DeveloperApi ::
 * A listener interface for receiving information about an ongoing streaming
 * computation.
 */
@DeveloperApi
trait StreamingListener {

  /** Called when the streaming has been started */
  def onStreamingStarted(streamingStarted: StreamingListenerStreamingStarted) { }

  /** Called when a receiver has been started */
  def onReceiverStarted(receiverStarted: StreamingListenerReceiverStarted) { }

  /** Called when a receiver has reported an error */
  def onReceiverError(receiverError: StreamingListenerReceiverError) { }

  /** Called when a receiver has been stopped */
  def onReceiverStopped(receiverStopped: StreamingListenerReceiverStopped) { }

  /** Called when a batch of jobs has been submitted for processing. */
  def onBatchSubmitted(batchSubmitted: StreamingListenerBatchSubmitted) { }

  /** Called when processing of a batch of jobs has started.  */
  def onBatchStarted(batchStarted: StreamingListenerBatchStarted) { }

  /** Called when processing of a batch of jobs has completed. */
  def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted) { }

  /** Called when processing of a job of a batch has started. */
  def onOutputOperationStarted(
      outputOperationStarted: StreamingListenerOutputOperationStarted) { }

  /** Called when processing of a job of a batch has completed. */
  def onOutputOperationCompleted(
      outputOperationCompleted: StreamingListenerOutputOperationCompleted) { }
}
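
To make the callbacks above concrete, here is a minimal sketch of a listener that flags slow batches. The names `SlowBatchListener`, `isSlow`, and `thresholdMs` are illustrative assumptions and not part of Spark; only the `onBatchCompleted` callback and the `BatchInfo` fields come from the API.

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
import org.slf4j.LoggerFactory

object SlowBatchListener {
  /** Pure check (hypothetical helper), kept separate so it can be tested without a cluster. */
  def isSlow(totalDelayMs: Option[Long], thresholdMs: Long): Boolean =
    totalDelayMs.exists(_ > thresholdMs)
}

/** Logs a warning whenever a completed batch exceeds `thresholdMs` of total delay. */
class SlowBatchListener(thresholdMs: Long) extends StreamingListener {
  private val logger = LoggerFactory.getLogger(classOf[SlowBatchListener])

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // totalDelay is an Option[Long] in milliseconds, so the empty case is handled explicitly.
    if (SlowBatchListener.isSlow(info.totalDelay, thresholdMs)) {
      logger.warn(
        s"Batch ${info.batchTime} exceeded ${thresholdMs} ms: " +
          s"totalDelay=${info.totalDelay.getOrElse(-1L)} ms, records=${info.numRecords}")
    }
  }
}
```

In a real job the warning could be replaced by whatever alerting channel is in use; the listener only decides when to raise it.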

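The Spark UI itself obtains the detailed per-batch data of a streaming job through a StreamingListener. The listener below sends the metrics of every Spark Streaming batch to Redis; the same class can also be used to refresh broadcast variables on a timer.

Part of the code follows: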

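import java.util.Date

import org.apache.commons.lang3.time.DateFormatUtils // assumed to be the commons-lang3 class
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.scheduler._
import org.slf4j.LoggerFactory

// JedisUtil and ExportRedisToHDFS are the author's own project helpers and are not shown here.
class ABCStreamingListener(private val appName: String, private val duration: Int, spark: SparkSession)
  extends SparkListener with StreamingListener {

  // Timestamps of the last dimension-data refreshes
  private var refreshUserInfoTime = new Date
  private var refreshSenceInfoTime = new Date
  private val logger = LoggerFactory.getLogger("BoncListener")
  private val jedisCon = JedisUtil()
  private val sc: SparkContext = spark.sparkContext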

  override def onReceiverStarted(receiverStarted: StreamingListenerReceiverStarted): Unit = {
    super.onReceiverStarted(receiverStarted)
  }

  override def onReceiverError(receiverError: StreamingListenerReceiverError): Unit = super.onReceiverError(receiverError)

  override def onReceiverStopped(receiverStopped: StreamingListenerReceiverStopped): Unit = super.onReceiverStopped(receiverStopped)


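  override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit = {
    val batchInfo = batchStarted.batchInfo
    // processingStartTime is an Option[Long] (epoch milliseconds)
    val processingStartTime = batchInfo.processingStartTime
    // SLF4J only prints the argument when the message contains a {} placeholder
    logger.info("BoncListener processingStartTime: {}", processingStartTime)
  }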

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val batchInfo = batchCompleted.batchInfo
    // The delay fields are Option[Long] in milliseconds; fall back to -1 instead of calling .get
    val processingDelay = batchInfo.processingDelay.getOrElse(-1L)
    val totalDelay = batchInfo.totalDelay.getOrElse(-1L)
    val numRecords = batchInfo.numRecords
    val schedulingDelay = batchInfo.schedulingDelay.getOrElse(-1L)
    val batchTime = DateFormatUtils.format(batchInfo.batchTime.milliseconds, "yyyy-MM-dd HH:mm:ss")
    val key = "sparkAPP_" + appName
    // Pipe-separated record: batchTime|numRecords|schedulingDelay|processingDelay|totalDelay
    val value = Seq(batchTime, numRecords, schedulingDelay, processingDelay, totalDelay).mkString("|")
    jedisCon.jedisFlow(jedis => {
      jedis.lpush(key, value)
      // Cap the list: once it reaches `duration` entries, drop the oldest one
      val length = jedis.llen(key)
      if (length >= duration) {
        jedis.rpop(key)
      }
    })

    // Periodic refreshes piggyback on batch completion: user info every 12 hours
    val executeTime = new Date
    if (executeTime.getTime - refreshUserInfoTime.getTime > 12 * 60 * 60 * 1000) {
      ExportRedisToHDFS.conventer050UserInfoToParquet(spark)
      refreshUserInfoTime = executeTime
      logger.info("==============execute the update of userInfoDIM=======================")
    }
    // Scene info every 6 days
    if (executeTime.getTime - refreshSenceInfoTime.getTime > 6 * 24 * 60 * 60 * 1000) {
      ExportRedisToHDFS.conventer050SenceInfoToParquet(spark)
      logger.info("==============execute the update of senceInfoDim=======================")
      refreshSenceInfoTime = executeTime
    }
  }

}
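
The listener above refreshes dimension data on a timer; the same pattern could be applied to a broadcast variable. The following is a minimal sketch under stated assumptions, not the author's implementation: `RefreshableBroadcast`, `BroadcastRefreshListener`, `load`, and `intervalMs` are hypothetical names, while `SparkContext.broadcast` and `Broadcast.unpersist` are the standard Spark calls.

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

/** Hypothetical holder: keeps the current broadcast and swaps in a fresh one on demand. */
class RefreshableBroadcast[T: ClassTag](sc: SparkContext, load: () => T) {
  @volatile private var current: Broadcast[T] = sc.broadcast(load())

  /** Read this on the driver once per batch, e.g. inside foreachRDD or transform. */
  def get: Broadcast[T] = current

  def refresh(): Unit = {
    val old = current
    current = sc.broadcast(load())
    // unpersist only drops the executor copies; the old value is re-sent if a running task still needs it
    old.unpersist(blocking = false)
  }
}

/** Hypothetical listener: re-broadcasts once `intervalMs` has elapsed since the last refresh. */
class BroadcastRefreshListener(holder: RefreshableBroadcast[_], intervalMs: Long)
  extends StreamingListener {

  private var lastRefresh = System.currentTimeMillis()

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val now = System.currentTimeMillis()
    if (now - lastRefresh > intervalMs) {
      holder.refresh()
      lastRefresh = now
    }
  }
}
```

The streaming job has to call `holder.get` inside a per-batch driver-side closure such as `foreachRDD`; a `Broadcast` reference captured once outside the closure would keep pointing at the old value.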

Finally, register your own listener in the Spark Streaming program:

ssc.addStreamingListener(new ABCStreamingListener("cons_cxdr", 50, spark))
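
For context, here is a rough sketch of where the registration call might sit in a driver program. The app name, batch interval, and socket source are placeholders chosen for illustration, and the anonymous listener stands in for `ABCStreamingListener`, whose helper classes are not available here.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

object ListenerRegistrationSketch {

  /** Builds the context, registers the listener, and defines one output operation. */
  def build(): StreamingContext = {
    // local[2]: one core for the socket receiver, one for processing (placeholder settings)
    val conf = new SparkConf().setAppName("listener-demo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Register the listener before the context is started
    ssc.addStreamingListener(new StreamingListener {
      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
        val info = batchCompleted.batchInfo
        println(s"batch=${info.batchTime} records=${info.numRecords}")
      }
    })

    // At least one output operation is required before start()
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    val ssc = build()
    ssc.start()
    ssc.awaitTermination()
  }
}
```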
