P.S. All source code in this article is based on kafka-0.10.0.1.
Kafka provides a Java implementation of the producer, KafkaProducer. With the KafkaProducer API you can easily get fairly sophisticated behavior such as synchronous/asynchronous sending, batching, and retrying on timeout. KafkaProducer is thread-safe, so multiple threads can share a single KafkaProducer instance.
Let's start with a small usage example:
public static void main(String[] args) {
    boolean isAsync = args.length == 0 || !args[0].trim().equalsIgnoreCase("sync");
    Properties properties = new Properties();
    properties.put("bootstrap.servers", "localhost:9092");
    properties.put("client.id", "DemoProducer");
    properties.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
    properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<Integer, String> producer = new KafkaProducer<>(properties);
    String topic = "test";
    int messageNo = 1;
    while (true) {
        String messageStr = "Message_" + messageNo;
        long startTime = System.currentTimeMillis();
        if (isAsync) {
            // A ProducerRecord can be thought of as one message: key, value, and topic
            // An asynchronous send takes a callback
            producer.send(new ProducerRecord<>(topic, messageNo, messageStr),
                    new DemoCallBack(startTime, messageNo, messageStr));
        } else {
            // Synchronous send
            try {
                // KafkaProducer.send() returns a Future<RecordMetadata>; calling Future.get()
                // blocks the current thread until the Kafka broker's ACK response arrives
                producer.send(new ProducerRecord<>(topic, messageNo, messageStr)).get();
                System.out.println("Sent message: (" + messageNo + ", " + messageStr + ")");
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        ++messageNo; // increment the message key
    }
}

static class DemoCallBack implements Callback { // callback object
    private final long startTime;
    private final int key;
    private final String message;

    public DemoCallBack(long startTime, int key, String message) {
        this.startTime = startTime;
        this.key = key;
        this.message = message;
    }

    /**
     * Invoked after the producer receives the ACK from the Kafka broker for a successfully sent message.
     * @param metadata  metadata of the sent message; null if an exception occurred while sending
     * @param exception exception raised while sending; null if the send succeeded
     */
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        long elapsedTime = System.currentTimeMillis() - startTime;
        if (metadata != null) {
            System.out.println("message(" + key + ", " + message + ") sent to partition(" +
                    metadata.partition() + "), offset(" + metadata.offset() + ") in " + elapsedTime + " ms");
        } else {
            exception.printStackTrace();
        }
    }
}
Now let's analyze the whole flow of KafkaProducer sending a message.
We will go through it step by step. In the small example above we called KafkaProducer.send(), which looks like this:
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ?
            record : this.interceptors.onSend(record);
    return doSend(interceptedRecord, callback);
}
This brings us to step 1: ProducerInterceptors intercepting the record. interceptors is null by default; you can write your own interceptor by implementing the ProducerInterceptor interface and registering it through the "interceptor.classes" config (multiple interceptors can be configured). As the code above shows, ProducerInterceptors.onSend(ProducerRecord) is called; both its parameter and its return type are ProducerRecord, so inside this method we can operate on the record, for example returning null under certain conditions to filter the message out, or adding a timestamp to the message value.
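To make this concrete, here is a minimal interceptor sketch matching the demo's Integer key / String value types. The class name TimestampInterceptor and its behavior (prepending a timestamp to the value) are illustrative assumptions of mine, not code from Kafka itself:

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Illustrative interceptor: prepends the current time to every message value.
public class TimestampInterceptor implements ProducerInterceptor<Integer, String> {

    @Override
    public ProducerRecord<Integer, String> onSend(ProducerRecord<Integer, String> record) {
        // Return a new record with a modified value; the original record is left untouched.
        return new ProducerRecord<>(record.topic(), record.partition(), record.timestamp(),
                record.key(), System.currentTimeMillis() + "|" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Called when the broker acknowledges the record or the send fails; no-op here.
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

It would then be registered in the producer configuration, e.g. properties.put("interceptor.classes", "com.example.TimestampInterceptor") (the package name here is made up).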
Next, the doSend() method is executed:
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }
        int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        ensureValidRecordSize(serializedSize);
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
        // handling exceptions and record the errors;
        // for API exceptions return them in the future,
        // for other exceptions throw directly
    }
    // a long list of catch blocks omitted
}
Here it is worth explaining the waitOnMetadata() method. Its job is to make sure the metadata for the target topic is available. So what exactly is this metadata?
Starting from the beginning: a Kafka topic can have multiple partitions, and the leader replicas of those partitions are spread across different brokers on the server side. This distribution changes dynamically (for example, a broker goes down and a new leader is elected, or partitions are added to scale out). When sending a message the producer only specifies the topic, not a partition number. To append the message to the leader replica of some partition of that topic, KafkaProducer first needs to know how many partitions the topic has and route the message to a target partition; it then needs the address and port of the broker hosting that partition's leader replica so it can establish a connection and send the message to the server.
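As a rough illustration of the routing step, here is a simplified sketch of choosing a partition from the cluster metadata. This is not the actual DefaultPartitioner source (the real implementation hashes the serialized key with murmur2 and handles null keys differently); the class and method names are made up:

import java.util.List;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// Illustrative only: route a keyed message to a partition using cluster metadata.
public class RoutingSketch {
    public static int choosePartition(String topic, byte[] keyBytes, Cluster cluster) {
        // the partition count comes from the metadata the producer maintains
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        // map the key onto one of the partitions (the real code uses a murmur2 hash)
        int hash = java.util.Arrays.hashCode(keyBytes) & 0x7fffffff;
        return hash % numPartitions;
    }
}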
KafkaProducer maintains metadata about the Kafka cluster. For each topic, this metadata records which partitions it has, which node hosts each partition's leader replica, which nodes host the follower replicas, which replicas are in the ISR set, and the IP addresses and ports of those nodes.
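We can actually observe part of this metadata from the client side through the public KafkaProducer.partitionsFor() API. A small sketch, reusing the broker address and topic name assumed in the earlier example:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;

public class MetadataPeek {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);
        // partitionsFor() triggers a metadata fetch for the topic if it is not cached yet
        List<PartitionInfo> partitions = producer.partitionsFor("test");
        for (PartitionInfo p : partitions) {
            Node leader = p.leader();
            System.out.println("partition " + p.partition()
                    + " leader=" + leader.host() + ":" + leader.port()
                    + " replicas=" + p.replicas().length
                    + " isr=" + p.inSyncReplicas().length);
        }
        producer.close();
    }
}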
References:
1. The book Apache Kafka源码分析 (Apache Kafka Source Code Analysis), edited by 徐郡明
2. Figure source: https://blog.csdn.net/zhanglh046/article/details/72845477