Kafka Source Code Analysis -- The Producer

P.S. All source code in this article is based on kafka-0.10.0.1.

Kafka ships with a Java producer implementation, KafkaProducer. Its API makes it easy to build fairly complex behavior such as synchronous/asynchronous sending, batching, and retries after timeouts. KafkaProducer is thread safe, so multiple threads can share a single KafkaProducer instance.

Let's start with a small usage example:

public static void main(String[] args) {
    boolean isAsync = args.length == 0 || !args[0].trim().equalsIgnoreCase("sync");
    Properties properties = new Properties();
    properties.put("bootstrap.servers", "localhost:9092");
    properties.put("client.id", "DemoProducer");
    properties.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
    properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<Integer, String> producer = new KafkaProducer<>(properties);
    String topic = "test";

    int messageNo = 1;
    while (true) {
        String messageStr = "Message_" + messageNo;
        long startTime = System.currentTimeMillis();
        if (isAsync) { // asynchronous send
            // a ProducerRecord is one message: topic, key and value
            // the asynchronous path registers a callback
            producer.send(new ProducerRecord<>(topic, messageNo, messageStr), new DemoCallBack(startTime, messageNo, messageStr));
        } else { // synchronous send
            try {
                // KafkaProducer.send() returns a Future<RecordMetadata>;
                // Future.get() blocks the current thread until the ACK from the Kafka broker arrives
                producer.send(new ProducerRecord<>(topic, messageNo, messageStr)).get();
                System.out.println("Sent message:(" + messageNo + "," + messageStr + ")");
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        ++messageNo; // the message key is simply an increasing integer
    }
}

static class DemoCallBack implements Callback { // the callback object
    private final long startTime;
    private final int key;
    private final String message;

    public DemoCallBack(long startTime, int key, String message) {
        this.startTime = startTime;
        this.key = key;
        this.message = message;
    }

    /**
     * Invoked after the producer has sent the message and received the ACK from the Kafka broker.
     * @param metadata metadata of the message that was sent; null if an exception occurred during the send
     * @param exception the exception raised during the send; null if the send succeeded
     */
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        long elapsedTime = System.currentTimeMillis() - startTime;
        if (metadata != null) {
            System.out.println("message(" + key + ", " + message + ") sent to partition(" + metadata.partition() + "), " + "offset(" + metadata.offset() + ") in " + elapsedTime + " ms");
        } else {
            exception.printStackTrace();
        }
    }
}

Now let's walk through the whole flow that KafkaProducer goes through when sending a message.

[Figure: the overall flow of a KafkaProducer send; see the image source in the references]

  1. The ProducerInterceptors intercept the message
  2. The Serializer serializes the message key and value
  3. The Partitioner chooses a partition for the message
  4. The RecordAccumulator collects messages so that they can be compressed and sent in batches
  5. The Sender fetches messages from the RecordAccumulator
  6. A ClientRequest is constructed
  7. The ClientRequest is handed over to the NetworkClient, ready to be sent
  8. The NetworkClient puts the request into the KafkaChannel's buffer
  9. Network I/O is performed and the request goes out on the wire
  10. When the response arrives, the ClientRequest's callback is invoked
  11. That in turn invokes the RecordBatch's callback, which finally invokes the callback registered on each individual message

Let's analyze it step by step. The small example above calls KafkaProducer.send(), which looks like this:

@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    return doSend(interceptedRecord, callback);
}

This brings us to step 1: the ProducerInterceptors intercept the message. interceptors is null by default; you can write your own interceptor by implementing the ProducerInterceptor interface and register one or more of them through the "interceptor.classes" configuration. As the code above shows, send() calls ProducerInterceptors.onSend(ProducerRecord). Both the parameter and the return type of this method are ProducerRecord, so inside it we can manipulate the record: for example, return null under certain conditions to filter the message out, or add a timestamp to the message value, and so on.
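As a concrete illustration, here is a minimal sketch of a custom interceptor that prepends a timestamp to every value; the class name TimestampInterceptor and its behavior are invented for this example, only the ProducerInterceptor methods and the "interceptor.classes" key come from Kafka itself:

import java.util.Map;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// hypothetical interceptor: prepend the current timestamp to every message value
public class TimestampInterceptor implements ProducerInterceptor<Integer, String> {

    @Override
    public ProducerRecord<Integer, String> onSend(ProducerRecord<Integer, String> record) {
        // rebuild the record with a modified value; the original topic, partition and key are kept
        return new ProducerRecord<>(record.topic(), record.partition(), record.key(),
                System.currentTimeMillis() + "|" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // invoked when the broker acknowledges the message, or when the send fails
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

In the demo above it would be registered with properties.put("interceptor.classes", TimestampInterceptor.class.getName());.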

Next, doSend() is executed:

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }
        int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        ensureValidRecordSize(serializedSize);
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
        // handling exceptions and record the errors;
        // for API exceptions return them in the future,
        // for other exceptions throw directly
    } // a series of catch blocks omitted
}
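A quick note on the partition(...) call in the code above: if the ProducerRecord itself already carries a partition number, that partition is used directly; otherwise the decision is delegated to the configured Partitioner (the DefaultPartitioner unless another class is set through "partitioner.class"). Below is a minimal sketch of a custom Partitioner; the class name HashPartitioner and the hashing rule are invented for illustration, only the Partitioner interface comes from Kafka:

import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// hypothetical partitioner: spread keyed messages over partitions by a simple hash of the serialized key
public class HashPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        if (keyBytes == null) {
            return 0; // toy rule: messages without a key always go to partition 0
        }
        // mask off the sign bit so the modulo result is always a valid partition index
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % partitions.size();
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

It would be enabled in the demo configuration with properties.put("partitioner.class", HashPartitioner.class.getName());.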

Here it is worth explaining the waitOnMetadata() call at the top of doSend(). Its job is to make sure the metadata for the target topic is available. So what exactly is this metadata?

Let's start from the beginning. A Kafka topic can have multiple partitions, and the leader replicas of those partitions are spread across different brokers on the server side. This distribution changes dynamically (for example, a new leader is elected after a broker goes down, or partitions are added to scale out). When sending a message, the producer only specifies the topic, not a partition number. To append the message to the leader replica of some partition of that topic, KafkaProducer first needs to know how many partitions the topic has so that routing can pick a target partition, and then it needs the address and port of the broker hosting that partition's leader replica in order to open a connection and send the message to the server.

For this purpose, KafkaProducer maintains metadata about the Kafka cluster. This metadata records which partitions a topic has, which node hosts each partition's leader replica, which nodes host the follower replicas, which replicas are in the ISR set, and the IP and port of each of those nodes.
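A slice of this metadata can be observed from user code through KafkaProducer.partitionsFor(). The following short snippet (reusing the producer and the "test" topic from the demo above, plus imports for java.util.Arrays and org.apache.kafka.common.PartitionInfo) prints the leader, replicas and ISR of each partition:

// print the per-partition metadata the producer knows about (leader may be null if currently unavailable)
for (PartitionInfo p : producer.partitionsFor("test")) {
    System.out.println("partition " + p.partition()
            + ", leader=" + p.leader()
            + ", replicas=" + Arrays.toString(p.replicas())
            + ", isr=" + Arrays.toString(p.inSyncReplicas()));
}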

References:

1. The book 《Apache Kafka源码分析》 (Apache Kafka Source Code Analysis), by Xu Junming

2. Image source: https://blog.csdn.net/zhanglh046/article/details/72845477


