Kafka Source Code Analysis You Can Absolutely Understand: The Sender Class

Contents:

Kafka Producer Design Analysis

KafkaProducer Class Code Analysis

RecordAccumulator Class Code Analysis

Sender Class Code Analysis

NetworkClient Class Code Analysis

-------------------------------------------------------------------

Previous article: RecordAccumulator Class Code Analysis

In the previous article we saw that messages waiting to be sent are buffered in the RecordAccumulator. Once KafkaProducer learns that a batch is full, it wakes up the Sender to get to work. In this article we look at the design and implementation of the Sender.

The Sender runs as a separate thread inside KafkaProducer and ships out the ProducerBatches that the RecordAccumulator has sealed. All network I/O is driven by the Sender.

The Sender groups ProducerBatches by the broker node they are destined for; batches going to the same node are packed together and sent in a single network I/O. For example, if partition 1 of topic A and partition 2 of topic B live on the same node, the Sender packs the messages of both topic partitions into one request object before doing I/O and sends them together, which greatly reduces the network overhead.
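
To make this regrouping concrete, here is a toy sketch in plain Java. It is not Kafka code: the partition-to-leader mapping and the batch labels are invented purely for illustration.

import java.util.*;

public class GroupByNodeSketch {
    public static void main(String[] args) {
        // hypothetical mapping: topic partition -> id of the broker node that leads it
        Map<String, Integer> leaderOf = Map.of(
                "topicA-1", 0,   // topic A partition 1 lives on node 0
                "topicB-2", 0,   // topic B partition 2 lives on node 0 as well
                "topicA-0", 1);

        // node id -> batches destined for that node
        Map<Integer, List<String>> batchesByNode = new HashMap<>();
        for (Map.Entry<String, Integer> e : leaderOf.entrySet())
            batchesByNode.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                         .add("batch(" + e.getKey() + ")");

        // topicA-1 and topicB-2 end up in the same request to node 0
        System.out.println(batchesByNode);
    }
}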

Let me stick with the courier analogy, this time the scene of loading boxes onto trucks:

(Figure 1: loading parcels onto delivery trucks, one truck per regional hub)

The worker in green is the loader, responsible for putting each box onto the right truck. Notice that the trucks are organized by delivery hub: boxes for Chaoyang (Beijing) and for Tangshan (Hebei) both go onto the North China Zone 1 truck. One truck can then deliver the Beijing and Tangshan parcels together, instead of sending two trucks separately.

That worker in green is the Sender: it packs ProducerBatches (boxes) belonging to different topic partitions into the same ClientRequest (the truck's container), provided those topic partitions live on the same broker (all belong to North China Zone 1), and then the NetworkClient (the truck) performs the real network I/O to the broker (the North China Zone 1 hub).

Now let's get into the Sender code.

The Sender Class

From the design above we can see that the Sender holds two important references: one is the warehouse, the RecordAccumulator; the other is the fleet of trucks, the KafkaClient object responsible for network I/O. The KafkaClient finds the route to each destination and carries the goods there. KafkaClient gets its own article later; here we focus on how the Sender regroups messages and loads them onto the trucks.
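
For orientation, here is a trimmed look at those two references as fields of the class. This is an approximate excerpt: the field names follow the Kafka version this series reads, and everything else is omitted.

public class Sender implements Runnable {
    // the client that performs the actual network I/O (the "trucks" in the analogy)
    private final KafkaClient client;
    // the warehouse of sealed ProducerBatches waiting to be shipped
    private final RecordAccumulator accumulator;
    // cluster metadata: which broker node leads which topic partition
    private final Metadata metadata;
    // keeps the main loop in run() spinning until close() is called
    private volatile boolean running;
    // ... constructor and the remaining fields omitted
}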

The run() Method

Sender implements the Runnable interface and overrides run(), which is the entry point of the thread:

public void run() {
    log.debug("Starting Kafka producer I/O thread.");

    // main loop, runs until close is called
    while (running) {
        try {
            run(time.milliseconds());
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

    // okay we stopped accepting requests but there may still be
    // requests in the accumulator or waiting for acknowledgment,
    // wait until these are completed.
    while (!forceClose && (this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0)) {
        try {
            run(time.milliseconds());
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }
    if (forceClose) {
        // We need to fail all the incomplete batches and wake up the threads waiting on
        // the futures.
        log.debug("Aborting incomplete batches due to forced shutdown");
        this.accumulator.abortIncompleteBatches();
    }
    try {
        this.client.close();
    } catch (Exception e) {
        log.error("Failed to close network client", e);
    }

    log.debug("Shutdown of Kafka producer I/O thread has completed.");
}

The main loop of run() calls run(time.milliseconds()); the Sender's real work lives in that method. Once running becomes false the loop exits, and the shutdown state decides whether to finish sending what remains or to close down forcibly. On a forced close, accumulator.abortIncompleteBatches() takes care of the unfinished ProducerBatches held in the RecordAccumulator's incomplete set: each batch is sealed so no new messages can be appended to it, removed from its Deque, and its space in the BufferPool is released. The core of that code is:

void abortBatches(final RuntimeException reason) {
    for (ProducerBatch batch : incomplete.copyAll()) {
        Deque<ProducerBatch> dq = getDeque(batch.topicPartition);
        synchronized (dq) {
            batch.abortRecordAppends();
            dq.remove(batch);
        }
        batch.abort(reason);
        deallocate(batch);
    }
}

At the end, run() closes the NetworkClient that handles the network I/O.

The run(long now) Method

The Sender's main logic really lives in this method. A long stretch at the beginning handles sending messages within a transaction; to keep the walkthrough simple we skip that part and jump straight to the non-transactional send path.

The non-transactional send path boils down to two key lines:

long pollTimeout = sendProducerData(now);
client.poll(pollTimeout, now);

1. Prepare the produce requests to be sent

2. Actually send the prepared requests out

Let's start with sendProducerData(), the method that prepares the requests.

The sendProducerData() Method

This method is the Sender's main flow and is worth understanding carefully. The code comes first, followed immediately by the logic walkthrough.

private long sendProducerData(long now) {

    Cluster cluster = metadata.fetch();
    // get the list of partitions with data ready to send
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

    // if there are any partitions whose leaders are not known yet, force metadata update
    if (!result.unknownLeaderTopics.isEmpty()) {
        // The set of topics with unknown leader contains topics with leader election pending as well as
        // topics which may have expired. Add the topic again to metadata to ensure it is included
        // and request metadata update, since there are messages to send to the topic.
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic);

        log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
            result.unknownLeaderTopics);
        this.metadata.requestUpdate();
    }

    // remove any nodes we aren't ready to send to
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        if (!this.client.ready(node, now)) {
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
        }
    }

    // create produce requests
    Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
    addToInflightBatches(batches);
    if (guaranteeMessageOrder) {
        // Mute all the partitions drained
        for (List<ProducerBatch> batchList : batches.values()) {
            for (ProducerBatch batch : batchList)
                this.accumulator.mutePartition(batch.topicPartition);
        }
    }

    accumulator.resetNextBatchExpiryTime();
    List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
    List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
    expiredBatches.addAll(expiredInflightBatches);

    // Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
    // for expired batches. see the documentation of @TransactionState.resetProducerId to understand why
    // we need to reset the producer id here.
    if (!expiredBatches.isEmpty())
        log.trace("Expired {} batches in accumulator", expiredBatches.size());
    for (ProducerBatch expiredBatch : expiredBatches) {
        String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
            + ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
        failBatch(expiredBatch, -1, NO_TIMESTAMP, new TimeoutException(errorMessage), false);
        if (transactionManager != null && expiredBatch.inRetry()) {
            // This ensures that no new batches are drained until the current in flight batches are fully resolved.
            transactionManager.markSequenceUnresolved(expiredBatch.topicPartition);
        }
    }
    sensors.updateProduceRequestMetrics(batches);

    // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
    // loop and try sending more data. Otherwise, the timeout will be the smaller value between next batch expiry
    // time, and the delay time for checking data availability. Note that the nodes may have data that isn't yet
    // sendable due to lingering, backing off, etc. This specifically does not include nodes with sendable data
    // that aren't ready to send since they would cause busy looping.
    long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
    pollTimeout = Math.max(pollTimeout, 0);
    if (!result.readyNodes.isEmpty()) {
        log.trace("Nodes with data ready to send: {}", result.readyNodes);
        // if some partitions are already ready to be sent, the select time would be 0;
        // otherwise if some partition already has some data accumulated but not ready yet,
        // the select time will be the time difference between now and its linger expiry time;
        // otherwise the select time will be the time difference between now and the metadata expiry time;
        pollTimeout = 0;
    }
    sendProduceRequests(batches, now);
    return pollTimeout;
}

The main logic is as follows:

  1. Based on the topic partitions of the messages waiting in the accumulator, check which nodes in the Kafka cluster are ready to receive data and which are not, producing a ReadyCheckResult.
  2. If unknownLeaderTopics in the ReadyCheckResult is non-empty, request a metadata update for the Kafka cluster.
  3. Iterate over readyNodes and ask the KafkaClient whether each node is fit for network I/O; nodes that are not are removed from the set.
  4. Call accumulator.drain() to regroup the pending messages by node id, returning a Map<Integer, List<ProducerBatch>> (a toy sketch of this regrouping follows the list).
  5. Add the batches about to be sent to the Sender's inFlightBatches, a Map<TopicPartition, List<ProducerBatch>>, i.e. stored per topic partition.
  6. Collect all expired batches and loop over them, failing each one as expired.
  7. Compute the timeout that the outer logic will pass to the NetworkClient's poll() call.
  8. Call sendProduceRequests(), which wraps the pending ProducerBatches into ClientRequests and "sends" them. Note that sending here only means queuing the requests; the real network I/O happens when the NetworkClient later polls.
  9. Return the poll timeout computed in step 7.
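
To give a feel for the regrouping in step 4, here is a toy sketch of the idea behind accumulator.drain(). This is simplified Java, not the real implementation: the Batch record and the per-node queue layout are invented for illustration, and the real method also deals with ordering, muted partitions and retry backoff.

import java.util.*;

public class DrainSketch {
    // a stand-in for ProducerBatch: just a partition label and a size
    record Batch(String partition, int sizeBytes) { }

    // For each ready node, walk that node's per-partition queues and take at most one
    // batch per partition, stopping once the per-request size budget would be exceeded.
    static Map<Integer, List<Batch>> drain(Map<Integer, List<Deque<Batch>>> queuesByNode,
                                           int maxRequestSize) {
        Map<Integer, List<Batch>> drained = new HashMap<>();
        for (Map.Entry<Integer, List<Deque<Batch>>> e : queuesByNode.entrySet()) {
            int budget = maxRequestSize;
            List<Batch> forNode = new ArrayList<>();
            for (Deque<Batch> dq : e.getValue()) {          // one deque per partition on this node
                Batch first = dq.peekFirst();
                if (first != null && first.sizeBytes() <= budget) {
                    forNode.add(dq.pollFirst());
                    budget -= first.sizeBytes();
                }
            }
            if (!forNode.isEmpty())
                drained.put(e.getKey(), forNode);
        }
        return drained;
    }
}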

 

Now let's focus on step 8, the sendProduceRequests() method. It simply iterates over the Map<Integer, List<ProducerBatch>> from step 4 by node id, takes the List<ProducerBatch> for each node, and calls sendProduceRequest() on it. That is the method where the ClientRequest is built.
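
The wrapper itself is only a few lines; roughly as below (reconstructed from the same Kafka version, so treat the acks and requestTimeoutMs fields it passes along as approximations):

private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
    // one produce request per destination node
    for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
        sendProduceRequest(now, entry.getKey(), acks, requestTimeoutMs, entry.getValue());
}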

The sendProduceRequest() Method

The code follows; if you would rather not read it, jump straight to the logic breakdown below.

private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
        if (batches.isEmpty())
            return;

        Map<TopicPartition, MemoryRecords> produceRecordsByPartition = new HashMap<>(batches.size());
        final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());

        // find the minimum magic version used when creating the record sets
        byte minUsedMagic = apiVersions.maxUsableProduceMagic();
        for (ProducerBatch batch : batches) {
            if (batch.magic() < minUsedMagic)
                minUsedMagic = batch.magic();
        }

        for (ProducerBatch batch : batches) {
            TopicPartition tp = batch.topicPartition;
            MemoryRecords records = batch.records();

            // down convert if necessary to the minimum magic used. In general, there can be a delay between the time
            // that the producer starts building the batch and the time that we send the request, and we may have
            // chosen the message format based on out-dated metadata. In the worst case, we optimistically chose to use
            // the new message format, but found that the broker didn't support it, so we need to down-convert on the
            // client before sending. This is intended to handle edge cases around cluster upgrades where brokers may
            // not all support the same message format version. For example, if a partition migrates from a broker
            // which is supporting the new magic version to one which doesn't, then we will need to convert.
            if (!records.hasMatchingMagic(minUsedMagic))
                records = batch.records().downConvert(minUsedMagic, 0, time).records();
            produceRecordsByPartition.put(tp, records);
            recordsByPartition.put(tp, batch);
        }

        String transactionalId = null;
        if (transactionManager != null && transactionManager.isTransactional()) {
            transactionalId = transactionManager.transactionalId();
        }
        ProduceRequest.Builder requestBuilder = ProduceRequest.Builder.forMagic(minUsedMagic, acks, timeout,
                produceRecordsByPartition, transactionalId);
        RequestCompletionHandler callback = new RequestCompletionHandler() {
            public void onComplete(ClientResponse response) {
                handleProduceResponse(response, recordsByPartition, time.milliseconds());
            }
        };

        String nodeId = Integer.toString(destination);
        ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,
                requestTimeoutMs, callback);
        client.send(clientRequest, now);
        log.trace("Sent produce request to {}: {}", nodeId, requestBuilder);
    }

The List<ProducerBatch> batches passed into this method is already grouped by node: the ProducerBatches in the list do not necessarily belong to the same topic partition, but their topic partitions are all hosted on the same node.

1. Loop over batches. MemoryRecords records = batch.records(); retrieves the message payload organized inside each ProducerBatch. The ProducerBatches and their MemoryRecords are grouped by topic partition into two Maps keyed by TopicPartition: produceRecordsByPartition and recordsByPartition.

2. Create a ProduceRequest.Builder object, which holds a reference to produceRecordsByPartition.

3. Build the ClientRequest object.

4. Call NetworkClient's send method.

Inside NetworkClient's send method, the Builder's build() produces the ProduceRequest object; request.toSend(destination, header) then turns it into a NetworkSend object. An InFlightRequest is created and stored, and finally selector.send(send) is called, which queues the send so that a subsequent poll() actually pushes it onto the network.

We are now brushing against NIO, which Kafka wraps in its own layer, so this part can be hard to follow. For this article it is enough to understand that sendProduceRequest() builds ClientRequest objects grouped by Kafka node and hands them to the NetworkClient. How the NetworkClient uses NIO to send a ClientRequest over the wire is covered in a later article.

Summary

The Sender thread mainly does the following:

  1. Turns the ProducerBatches, grouped by topic partition, into a set of ClientRequests grouped by node.
  2. Calls NetworkClient's send method to stage each ClientRequest for sending.
  3. Once everything is staged, calls NetworkClient's poll method, which triggers the network I/O that actually sends the messages.

Next article: NetworkClient Class Code Analysis
