Kafka Producer Message Sending Flow: Source Code Analysis

Kafka's client provides message sending through the wrapper class kafka.producer.Producer (the legacy Scala producer), so the sending logic is implemented mainly in kafka.producer.Producer. The Producer code is as follows:

class Producer[K,V](val config: ProducerConfig,
                    private val eventHandler: EventHandler[K,V])  // only for unit testing
  extends Logging {

  private val hasShutdown = new AtomicBoolean(false)
  // In async mode this queue buffers the user's messages; the send thread drains it and ships them to the Kafka brokers.
  private val queue = new LinkedBlockingQueue[KeyedMessage[K,V]](config.queueBufferingMaxMessages)

  private var sync: Boolean = true
  // the message-sending thread (used in async mode)
  private var producerSendThread: ProducerSendThread[K,V] = null
  private val lock = new Object()

  config.producerType match {
    case "sync" =>
    case "async" =>
      sync = false
      producerSendThread = new ProducerSendThread[K,V]("ProducerSendThread-" + config.clientId,
                                                       queue,
                                                       eventHandler,
                                                       config.queueBufferingMaxMs,
                                                       config.batchNumMessages,
                                                       config.clientId)
      producerSendThread.start()
  }
 
  // The auxiliary constructor builds a Producer from a ProducerConfig; ProducerPool is the pool of broker connections.
  def this(config: ProducerConfig) =
    this(config,
         new DefaultEventHandler[K,V](config,
                                      Utils.createObject[Partitioner](config.partitionerClass, config.props),
                                      Utils.createObject[Encoder[V]](config.serializerClass, config.props),
                                      Utils.createObject[Encoder[K]](config.keySerializerClass, config.props),
                                      new ProducerPool(config)))

  // the message-sending entry point
  def send(messages: KeyedMessage[K,V]*) {
    lock synchronized {
      if (hasShutdown.get)
        throw new ProducerClosedException
      recordStats(messages)
      sync match {
        // synchronous send
        case true => eventHandler.handle(messages)
        // asynchronous send
        case false => asyncSend(messages)
      }
    }
  }
}
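
The async branch above calls asyncSend, which is not quoted in this excerpt. Condensed from the source (a sketch, slightly simplified), it enqueues each message according to queue.enqueue.timeout.ms: with 0 it fails immediately when the queue is full, with a negative value it blocks until space frees up, and with a positive value it blocks for at most that many milliseconds; a message that cannot be enqueued raises QueueFullException:

  // Sketch of Producer.asyncSend, condensed from the source.
  private def asyncSend(messages: Seq[KeyedMessage[K,V]]) {
    for (message <- messages) {
      val added = config.queueEnqueueTimeoutMs match {
        case 0          => queue.offer(message)                            // queue full: fail at once
        case t if t < 0 => queue.put(message); true                        // block until space is available
        case t          => queue.offer(message, t, TimeUnit.MILLISECONDS)  // block for at most t ms
      }
      if (!added)
        throw new QueueFullException("Event queue is full of unsent messages, could not send event: " + message)
    }
  }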

As the code above shows, the client sends messages mainly through the send function. Internally, Producer is composed of several major modules:

1) ProducerSendThread: the message-sending thread. When messages are sent asynchronously, ProducerSendThread buffers the client's KeyedMessages; once the configured count has accumulated, or a configured interval (queue.buffering.max.ms) passes without a new message arriving, it hands the batch to DefaultEventHandler to send. (A configuration sketch follows this list.)

2) ProducerPool: caches the connections between the client and the individual brokers. DefaultEventHandler fetches the SyncProducer for a given broker from the ProducerPool and uses it to send the KeyedMessages to that broker.

3) DefaultEventHandler: applies the partitioning rules to work out which broker server should receive which subset of the KeyedMessages, then sends each subset to its broker through a SyncProducer. Internally, DefaultEventHandler also provides retry on SyncProducer send failures and a mechanism for smoothly scaling out brokers.
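
Before diving into each module, here is the minimal usage sketch promised above. It configures an async producer; the broker address and topic name are illustrative only, and the property values shown are the 0.8-era defaults discussed in this article:

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object AsyncProducerExample {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092")               // illustrative address
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("producer.type", "async")                             // start ProducerSendThread
    props.put("batch.num.messages", "200")                          // batchSize: flush once this many accumulate
    props.put("queue.buffering.max.ms", "5000")                     // queueTime: flush after this long without a full batch

    val producer = new Producer[String, String](new ProducerConfig(props))
    producer.send(new KeyedMessage[String, String]("test-topic", "key", "value"))
    producer.close()
  }
}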

ProducerSendThread

When producer.type is set to async, messages are sent asynchronously: the Producer client starts the ProducerSendThread, which pulls messages out of the blocking queue (BlockingQueue) and, once a certain count has accumulated or a certain interval has passed without new messages arriving, hands the accumulated batch to the DefaultEventHandler module to send.

class ProducerSendThread[K,V](
      val threadName: String,
      // the queue holding buffered messages
      val queue: BlockingQueue[KeyedMessage[K,V]],
      val handler: EventHandler[K,V],
      val queueTime: Long,   // max time to buffer messages before forcing a send (queue.buffering.max.ms, default 5000 ms)
      val batchSize: Int,    // message count that triggers a send (batch.num.messages, default 200)
      val clientId: String) extends Thread(threadName) with Logging with KafkaMetricsGroup {

  private val shutdownLatch = new CountDownLatch(1)
  private val shutdownCommand = new KeyedMessage[K,V]("shutdown", null.asInstanceOf[K], null.asInstanceOf[V])

  newGauge("ProducerQueueSize",
          new Gauge[Int] {
            def value = queue.size
          },
          Map("clientId" -> clientId))

  override def run {
    try {
      processEvents
    }catch {
      case e: Throwable => error("Error in sending events: ", e)
    }finally {
      shutdownLatch.countDown
    }
  }

  private def processEvents() {
    var lastSend = SystemTime.milliseconds
    var events = new ArrayBuffer[KeyedMessage[K,V]]
    var full: Boolean = false

    // pull messages from the blocking queue, waiting at most until the current queue-time window expires
    Stream.continually(queue.poll(scala.math.max(0, (lastSend + queueTime) - SystemTime.milliseconds), TimeUnit.MILLISECONDS))
                      .takeWhile(item => if(item != null) item ne shutdownCommand else true).foreach {
      currentQueueItem =>
        val elapsed = (SystemTime.milliseconds - lastSend)
        // a null item means the poll timed out, i.e. the queue-time window expired
        val expired = currentQueueItem == null
        if(currentQueueItem != null) {
          // accumulate the message
          events += currentQueueItem
        }

        // check whether the batch has reached the configured size
        full = events.size >= batchSize

        if(full || expired) {
          if(expired)
            debug(elapsed + " ms elapsed. Queue time reached. Sending..")
          if(full)
            debug("Batch full. Sending..")
          // send the accumulated batch
          tryToHandle(events)
          lastSend = SystemTime.milliseconds
          // start a fresh buffer for the next batch
          events = new ArrayBuffer[KeyedMessage[K,V]]
        }
    }
    // send the final batch (after the shutdown command is received)
    tryToHandle(events)
    if(queue.size > 0)
      throw new IllegalQueueStateException("Invalid queue state! After queue shutdown, %d remaining items in the queue"
        .format(queue.size))
  }

  def tryToHandle(events: Seq[KeyedMessage[K,V]]) {
    val size = events.size
    try {
      debug("Handling " + size + " events")
      if(size > 0)
        handler.handle(events)
    }catch {
      case e: Throwable => error("Error in handling batch of " + size + " events", e)
    }
  }

}

ProducerPool

ProducerPool caches the communication links to the various broker servers. Each link is represented by a SyncProducer object, which sends both ProducerRequest and TopicMetadataRequest through its doSend method: ProducerRequest carries the messages the producer pushes to a broker, while TopicMetadataRequest asks for topic metadata. The doSend code is as follows:

class SyncProducer(val config: SyncProducerConfig) extends Logging {
  private def doSend(request: RequestOrResponse, readResponse: Boolean = true): Receive = {
    lock synchronized {
      // validate the request
      verifyRequest(request)
      // if there is no connection to this broker yet, create one
      getOrMakeConnection()

      var response: Receive = null
      try {
        // send the request over the blocking channel
        blockingChannel.send(request)
        if(readResponse)
          // read the response from the blocking channel
          response = blockingChannel.receive()
        else
          trace("Skipping reading response")
      } catch {
        case e: java.io.IOException =>
          // the send failed: drop the connection
          disconnect()
          throw e
        case e: Throwable => throw e
      }
      // return the response
      response
    }
  }
}

ProducerPool maps each BrokerId to its SyncProducer one-to-one, and refreshes this internal SyncProducer pool through updateProducer.

class ProducerPool(val config: ProducerConfig) extends Logging {
  // syncProducers maps BrokerId -> SyncProducer
  private val syncProducers = new HashMap[Int, SyncProducer]
  private val lock = new Object()

  def updateProducer(topicMetadata: Seq[TopicMetadata]) {
    val newBrokers = new collection.mutable.HashSet[Broker]
    // collect the brokers that host the leader replica of any partition of these topics
    topicMetadata.foreach(tmd => {
      tmd.partitionsMetadata.foreach(pmd => {
        if(pmd.leader.isDefined)
          newBrokers+=(pmd.leader.get)
      })
    })
    lock synchronized {
      newBrokers.foreach(b => {
        if(syncProducers.contains(b.id)){
          // a SyncProducer already exists for this broker: close it and recreate the connection
          syncProducers(b.id).close()
          syncProducers.put(b.id, ProducerPool.createSyncProducer(config, b))
        } else
          // no existing connection: create one
          syncProducers.put(b.id, ProducerPool.createSyncProducer(config, b))
      })
    }
  }

  def getProducer(brokerId: Int) : SyncProducer = {
    lock.synchronized {
      val producer = syncProducers.get(brokerId)
      producer match {
        case Some(p) => p
        case None => throw new UnavailableProducerException("Sync producer for broker id %d does not exist".format(brokerId))
      }
    }
  }
}
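
The ProducerPool.createSyncProducer calls above come from ProducerPool's companion object. Condensed from the source, it simply copies the broker's host and port into a SyncProducerConfig:

import java.util.Properties

object ProducerPool {
  // Build a SyncProducer connected to the given broker by merging the
  // producer's own properties with the broker's host and port.
  def createSyncProducer(config: ProducerConfig, broker: Broker): SyncProducer = {
    val props = new Properties()
    props.put("host", broker.host)
    props.put("port", broker.port.toString)
    props.putAll(config.props.props)
    new SyncProducer(new SyncProducerConfig(props))
  }
}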

DefaultEventHandler

DefaultEventHandler implements the actual send logic, which breaks down into the following steps:

1) First, group the KeyedMessages by the partitioning rules, so each message falls into one partition of its topic; then group them by the broker server hosting each partition's leader replica, so that every broker server is matched with the subset of KeyedMessages it should receive.

2) Fetch the SyncProducer object corresponding to each broker server from the ProducerPool and send the messages through it.

First, let's look at DefaultEventHandler's message-handling logic:

class DefaultEventHandler[K,V](
      config: ProducerConfig,
      private val partitioner: Partitioner,
      private val encoder: Encoder[V],
      private val keyEncoder: Encoder[K],
      private val producerPool: ProducerPool,
      private val topicPartitionInfos: HashMap[String, TopicMetadata] = new HashMap[String, TopicMetadata])
  extends EventHandler[K,V] with Logging {
  // whether sends are synchronous
  val isSync = ("sync" == config.producerType)
  val correlationId = new AtomicInteger(0)
   
  val brokerPartitionInfo = new BrokerPartitionInfo(config, producerPool, topicPartitionInfos)

  private val topicMetadataRefreshInterval = config.topicMetadataRefreshIntervalMs
  private var lastTopicMetadataRefreshTime = 0L
  private val topicMetadataToRefresh = Set.empty[String]
  private val sendPartitionPerTopicCache = HashMap.empty[String, Int]

  private val producerStats = ProducerStatsRegistry.getProducerStats(config.clientId)
  private val producerTopicStats = ProducerTopicStatsRegistry.getProducerTopicStats(config.clientId)

  def handle(events: Seq[KeyedMessage[K,V]]) {
    // serialize the events
    val serializedData = serialize(events)
    serializedData.foreach {
      keyed =>
        val dataSize = keyed.message.payloadSize
        producerTopicStats.getProducerTopicStats(keyed.topic).byteRate.mark(dataSize)
        producerTopicStats.getProducerAllTopicsStats.byteRate.mark(dataSize)
    }
    var outstandingProduceRequests = serializedData
    // total attempts: message.send.max.retries (default 3) plus the initial send
    var remainingRetries = config.messageSendMaxRetries + 1
    // the correlationId ties a request to its response; record the starting value
    val correlationIdStart = correlationId.get()
    // keep going while retries remain and there is still data to send
    while (remainingRetries > 0 && outstandingProduceRequests.size > 0) {
      // collect the topics awaiting a send
      topicMetadataToRefresh ++= outstandingProduceRequests.map(_.topic)
      // proactively refresh the topic metadata if it has not been refreshed for a while
      if (topicMetadataRefreshInterval >= 0 &&
          SystemTime.milliseconds - lastTopicMetadataRefreshTime > topicMetadataRefreshInterval) {
        // partitions newly added on the brokers are discovered here and cached in the ProducerPool
        Utils.swallowError(brokerPartitionInfo.updateInfo(topicMetadataToRefresh.toSet, correlationId.getAndIncrement))
        sendPartitionPerTopicCache.clear()
        topicMetadataToRefresh.clear
        lastTopicMetadataRefreshTime = SystemTime.milliseconds
      }
      // dispatch the messages; those that failed to send are returned for retry
      outstandingProduceRequests = dispatchSerializedData(outstandingProduceRequests)
      if (outstandingProduceRequests.size > 0) {
        // some sends failed: back off for a while, then refresh the metadata of the affected topics
        Thread.sleep(config.retryBackoffMs)
        Utils.swallowError(brokerPartitionInfo.updateInfo(outstandingProduceRequests.map(_.topic).toSet, correlationId.getAndIncrement))
        sendPartitionPerTopicCache.clear()
        remainingRetries -= 1
        producerStats.resendRate.mark()
      }
    }
    // still unsent after exhausting the retries: give up and throw
    if(outstandingProduceRequests.size > 0) {
      producerStats.failedSendRate.mark()
      val correlationIdEnd = correlationId.get()
      throw new FailedToSendMessageException("Failed to send messages after " + config.messageSendMaxRetries + " tries.", null)
    }
  }

DefaultEventHandler refreshes topic metadata periodically and updates the ProducerPool accordingly. Cluster metadata changes over time: new broker servers join, topics gain partitions, partitions migrate to other broker servers. The client needs to notice all of these changes, so it periodically sends metadata requests; this is also what lets DefaultEventHandler handle broker scale-out smoothly.

The metadata update flow of DefaultEventHandler is as follows:

class BrokerPartitionInfo(producerConfig: ProducerConfig,
                          producerPool: ProducerPool,
                          topicPartitionInfo: HashMap[String, TopicMetadata])
        extends Logging {
  val brokerList = producerConfig.brokerList
  val brokers = ClientUtils.parseBrokerList(brokerList)
  
  def updateInfo(topics: Set[String], correlationId: Int) {
    var topicsMetadata: Seq[TopicMetadata] = Nil
    // send a TopicMetadataRequest to fetch the metadata
    val topicMetadataResponse = ClientUtils.fetchTopicMetadata(topics, brokers, producerConfig, correlationId)
    topicsMetadata = topicMetadataResponse.topicsMetadata
    // throw partition specific exception
    topicsMetadata.foreach(tmd =>{

      if(tmd.errorCode == ErrorMapping.NoError) {
        // no error: update the topic's metadata
        topicPartitionInfo.put(tmd.topic, tmd)
      } else
        warn("Error while fetching metadata [%s] for topic [%s]: %s ".format(tmd, tmd.topic, ErrorMapping.exceptionFor(tmd.errorCode).getClass))

      tmd.partitionsMetadata.foreach(pmd =>{
        if (pmd.errorCode != ErrorMapping.NoError && pmd.errorCode == ErrorMapping.LeaderNotAvailableCode) {
          warn("Error while fetching metadata %s for topic partition [%s,%d]: [%s]".format(pmd, tmd.topic, pmd.partitionId,
            ErrorMapping.exceptionFor(pmd.errorCode).getClass))
        } // any other error code (e.g. ReplicaNotAvailable) can be ignored since the producer does not need to access the replica and isr metadata
      })
    })
    producerPool.updateProducer(topicsMetadata)
  }
}

As this flow shows, DefaultEventHandler obtains metadata chiefly by sending a TopicMetadataRequest; after the metadata is updated, the communication links in the ProducerPool must be updated as well (the producerPool.updateProducer call).
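
ClientUtils.fetchTopicMetadata itself is not quoted here. Conceptually it builds a TopicMetadataRequest and tries the bootstrap brokers in random order until one of them answers; the sketch below captures that idea (simplified: the real method also records which broker answered and logs each failure):

import kafka.api.{TopicMetadataRequest, TopicMetadataResponse}
import kafka.cluster.Broker
import kafka.common.KafkaException

// Sketch of the fetch loop inside ClientUtils.fetchTopicMetadata (simplified).
def fetchTopicMetadataSketch(topics: Set[String],
                             brokers: Seq[Broker],
                             producerConfig: ProducerConfig,
                             correlationId: Int): TopicMetadataResponse = {
  val request = new TopicMetadataRequest(TopicMetadataRequest.CurrentVersion,
                                         correlationId, producerConfig.clientId, topics.toSeq)
  var response: TopicMetadataResponse = null
  // shuffle so the metadata load is spread across the bootstrap brokers
  val shuffledBrokers = scala.util.Random.shuffle(brokers)
  var i = 0
  while (response == null && i < shuffledBrokers.size) {
    val producer = ProducerPool.createSyncProducer(producerConfig, shuffledBrokers(i))
    try {
      response = producer.send(request)   // SyncProducer has a TopicMetadataRequest overload
    } catch {
      case e: Throwable => ()             // this broker failed; fall through to the next one
    } finally {
      producer.close()
      i += 1
    }
  }
  if (response == null)
    throw new KafkaException("fetching topic metadata for topics [%s] failed".format(topics))
  response
}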

The main flow of DefaultEventHandler then enters dispatchSerializedData, which first groups the KeyedMessages by the broker server they should be sent to, and then sends them over the connections held in the ProducerPool. The implementation is as follows:

  private def dispatchSerializedData(messages: Seq[KeyedMessage[K,Message]]): Seq[KeyedMessage[K, Message]] = {
    /*
     * Group the messages so that those bound for the same broker server form one group.
     * partitionedDataOpt has type Option[Map[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]]]
     */
    val partitionedDataOpt = partitionAndCollate(messages)
    partitionedDataOpt match {
      case Some(partitionedData) =>
        val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]]
        try {
          for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
            if (logger.isTraceEnabled)
              messagesPerBrokerMap.foreach(partitionAndEvent =>
                trace("Handling event for Topic: %s, Broker: %d, Partitions: %s"
                  .format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
            // aggregate: messages bound for the same broker server but different TopicAndPartitions
            // are combined into one message set per partition
            val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap)
            // send the messages; the topic-partitions whose send failed are returned
            val failedTopicPartitions = send(brokerid, messageSetPerBroker)
            failedTopicPartitions.foreach(topicPartition => {
              messagesPerBrokerMap.get(topicPartition) match {
                case Some(data) => failedProduceRequests.appendAll(data)
                case None => // nothing
              }
            })
          }
        } catch {
          case t: Throwable => error("Failed to send messages", t)
        }
        // return the collection of messages that failed to send
        failedProduceRequests
      case None => // all produce requests failed
        messages
    }
  }

So the crux of dispatchSerializedData is grouping the KeyedMessages, locating the broker server for each group, and then sending. The send(brokerid, messageSetPerBroker) call it makes is sketched next.
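
The private send method of DefaultEventHandler is not quoted in full in this article. A condensed sketch of what it does (simplified: the acks == 0 short-circuit and the per-partition error logging are elided):

  // Sketch of DefaultEventHandler.send, condensed from the source. Returns the
  // topic-partitions that failed, so the caller can retry just those messages.
  private def send(brokerId: Int,
                   messagesPerTopic: collection.mutable.Map[TopicAndPartition, ByteBufferMessageSet]): Seq[TopicAndPartition] = {
    if (brokerId < 0 || messagesPerTopic.isEmpty) {
      // no known leader, or nothing to send: report every partition as failed
      messagesPerTopic.keys.toSeq
    } else {
      val currentCorrelationId = correlationId.getAndIncrement
      val producerRequest = new ProducerRequest(currentCorrelationId, config.clientId,
        config.requestRequiredAcks, config.requestTimeoutMs, messagesPerTopic)
      try {
        // fetch the cached connection to this broker and push the request through it
        val syncProducer = producerPool.getProducer(brokerId)
        val response = syncProducer.send(producerRequest)
        if (response != null)
          // keep only the partitions whose status carries an error code
          response.status.filter(_._2.error != ErrorMapping.NoError).keys.toSeq
        else
          Seq.empty[TopicAndPartition]
      } catch {
        case t: Throwable => messagesPerTopic.keys.toSeq   // the whole request failed
      }
    }
  }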

DefaultEventHandler also provides the getPartition function, which takes a topic, a key, and a topicPartitionList and returns a partition index. Its implementation:

private def getPartition(
       topic: String, 
       key: Any,
       topicPartitionList: Seq[PartitionAndLeader]): Int = {
    // the number of partitions of this topic
    val numPartitions = topicPartitionList.size
    if(numPartitions <= 0)
      throw new UnknownTopicOrPartitionException("Topic " + topic + " doesn't exist")
    val partition =
      if(key == null) {
        // no partition key: consult the cache; each topic sticks to a single partition per metadata-refresh period
        val id = sendPartitionPerTopicCache.get(topic)
        id match {
          // cache hit: return the cached partition directly
          case Some(partitionId) =>
            partitionId
          // cache miss: recompute
          case None =>
            // keep only the partitions that currently have a leader
            val availablePartitions = topicPartitionList.filter(_.leaderBrokerIdOpt.isDefined)
            if (availablePartitions.isEmpty)
              throw new LeaderNotAvailableException("No leader for any partition in topic " + topic)
            // pick one of the available leaders at random, cache it, and return it
            val index = Utils.abs(Random.nextInt) % availablePartitions.size
            val partitionId = availablePartitions(index).partitionId
            sendPartitionPerTopicCache.put(topic, partitionId)
            partitionId
        }
      } else
        // a partition key is present: let the partitioner choose the partition
        partitioner.partition(key, numPartitions)
    if(partition < 0 || partition >= numPartitions)
      throw new UnknownTopicOrPartitionException("Invalid partition id: " + partition + " for topic " + topic +
        "; Valid values are in the inclusive range of [0, " + (numPartitions-1) + "]")
    trace("Assigning message of topic %s and key %s to a selected partition %d".format(topic, if (key == null) "[none]" else key.toString, partition))
    partition
  }

As the code shows, when no partition key is passed in, the producer picks one partition at random from those whose leader replica is online, and keeps sending to that same partition for the rest of the period, until the next topic-metadata refresh clears the local cache and its topic-to-partition mapping, after which a new partition is chosen for the following period. From the client's point of view, traffic therefore concentrates on one partition for a while and then shifts to another. When the user does pass a partition key, the partitioner function is applied, so messages with the same key are routed to the same partition.
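
The default partitioner simply hashes the key modulo the partition count. For illustration, a custom Partitioner can be plugged in through partitioner.class; the class below is a hypothetical example (the producer instantiates it reflectively via Utils.createObject, which is why the VerifiableProperties constructor parameter is required):

import kafka.producer.Partitioner
import kafka.utils.{Utils, VerifiableProperties}

// Hypothetical example: route messages by the hash of the key's string form,
// so that messages sharing a key always land in the same partition.
class StringHashPartitioner(props: VerifiableProperties = null) extends Partitioner {
  def partition(key: Any, numPartitions: Int): Int =
    Utils.abs(key.toString.hashCode) % numPartitions
}

It is enabled with props.put("partitioner.class", "com.example.StringHashPartitioner") (package name illustrative).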
