Spark Streaming + Kafka: "Failed to get records for ... after polling for 512"

The fix suggested last time for this error was to tune the heartbeat.interval.ms and session.timeout.ms consumer parameters, but in practice that turned out not to work well: the error still appears.
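For context, that earlier attempt amounted to passing tuned session settings in the Kafka params of the direct stream. A minimal sketch, with placeholder broker address, group id, and timing values:

import org.apache.kafka.common.serialization.StringDeserializer

// Placeholder values throughout; the last two keys are the earlier tuning attempt.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"     -> "broker1:9092",
  "key.deserializer"      -> classOf[StringDeserializer],
  "value.deserializer"    -> classOf[StringDeserializer],
  "group.id"              -> "my-consumer-group",
  "heartbeat.interval.ms" -> "3000",
  "session.timeout.ms"    -> "30000"  // must stay well above heartbeat.interval.ms
)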

Digging through the source code referenced by the error log turned up the root cause. The failing frame in the stack trace is:

 at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)

Here is the get method of the CachedKafkaConsumer class:

  /**
   * Get the record for the given offset, waiting up to timeout ms if IO is necessary.
   * Sequential forward access will use buffers, but random access will be horribly inefficient.
   */
  def get(offset: Long, timeout: Long): ConsumerRecord[K, V] = {
    logDebug(s"Get $groupId $topic $partition nextOffset $nextOffset requested $offset")
    if (offset != nextOffset) {
      logInfo(s"Initial fetch for $groupId $topic $partition $offset")
      seek(offset)
      poll(timeout)
    }

    if (!buffer.hasNext()) { poll(timeout) }
    assert(buffer.hasNext(),
      s"Failed to get records for $groupId $topic $partition $offset after polling for $timeout")
    var record = buffer.next()

    if (record.offset != offset) {
      logInfo(s"Buffer miss for $groupId $topic $partition $offset")
      seek(offset)
      poll(timeout)
      assert(buffer.hasNext(),
        s"Failed to get records for $groupId $topic $partition $offset after polling for $timeout")
      record = buffer.next()
      assert(record.offset == offset,
        s"Got wrong record for $groupId $topic $partition even after seeking to offset $offset")
    }

    nextOffset = offset + 1
    record
  }
This method fetches the record at a given offset, waiting up to timeout ms when IO is needed. Sequential forward reads are served out of the pre-fetched buffer and are cheap, effectively batched reads; random access forces a seek plus a fresh poll and is very inefficient. When poll(timeout) brings back nothing within the timeout, the assert fires with exactly the error we are seeing.
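Where does buffer come from? The private seek and poll helpers in the same class (paraphrased here from the Spark 2.x source, so read this as a sketch rather than a verbatim quote) replace buffer with the iterator over whatever consumer.poll(timeout) returned for this partition, so a poll that comes back empty within the timeout leaves buffer empty and trips the assert:

  private def seek(offset: Long): Unit = {
    logDebug(s"Seeking to $topicPartition $offset")
    consumer.seek(topicPartition, offset)
  }

  private def poll(timeout: Long): Unit = {
    // consumer.poll blocks for at most `timeout` ms; if the broker returns no
    // records for this partition in that window, `buffer` ends up empty.
    val p = consumer.poll(timeout)
    val r = p.records(topicPartition)
    logDebug(s"Polled ${p.partitions()}  ${r.size}")
    buffer = r.iterator
  }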

Reading further down the stack trace: at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)

The next method of KafkaRDDIterator calls the get method above, passing pollTimeout as the timeout. pollTimeout defaults to 512 ms, which is the 512 in the error message:

override def next(): ConsumerRecord[K, V] = {
  assert(hasNext(), "Can't call getNext() once untilOffset has been reached")
  val r = consumer.get(requestOffset, pollTimeout)
  requestOffset += 1
  r
}

private val pollTimeout = conf.getLong("spark.streaming.kafka.consumer.poll.ms", 512)
The fix: when initializing the SparkConf, set spark.streaming.kafka.consumer.poll.ms to 10000, and this error no longer appears. The only case left that can trigger it is a Kafka cluster unstable enough that a poll still times out at 10 seconds.
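A minimal sketch of that configuration; the application name and batch interval below are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("my-streaming-app")  // placeholder app name
  // Give each consumer poll up to 10 s instead of the 512 ms default.
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")

val ssc = new StreamingContext(conf, Seconds(5))  // batch interval is illustrative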

 
  
