General steps of sending
Under the hood the producer sends messages entirely asynchronously; via the returned Future it offers both synchronous sending and asynchronous sending with a callback. The
org.apache.kafka.clients.producer.Callback
interface handles the post-send logic. The interface is rather coarse: it has only a single onCompletion method; it would arguably be nicer if it offered an onSuccess and an onFailure method instead. The two parameters of onCompletion, RecordMetadata and Exception, are never both non-null: exactly one of them is null. When the send succeeds, Exception is null; when it fails, RecordMetadata is null. Kafka's errors fall into two main categories: retriable exceptions and non-retriable exceptions.
Retriable exceptions: if a retry count is configured on the producer and the error recovers on its own within that number of retries, it never shows up as the exception in onCompletion. If the send still fails after the retries are exhausted, the exception is delivered to onCompletion and the application must handle it itself.
LeaderNotAvailableException: usually occurs during a leader election, meaning the partition's leader replica is unavailable. Typically a transient error that recovers on its own after retries.
NotControllerException: the Controller is going through a new round of election and is currently unavailable. Usually recovers via the retry mechanism.
NetworkException: caused by a transient network failure; retriable.
All retriable exceptions inherit from org.apache.kafka.common.errors.RetriableException, so any exception that does not inherit from this class is non-retriable, i.e. a problem that retrying cannot fix, such as a message that is too large or a serialization error.
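Since the interface exposes only onCompletion, applications often split success and failure handling themselves. Below is a minimal sketch of such an adapter (not part of the Kafka API; the class name SplitCallback is made up for illustration), relying on the contract that exactly one of the two parameters is null:
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.RetriableException;

// Hypothetical adapter: splits onCompletion into success/failure hooks
public abstract class SplitCallback implements Callback {
    @Override
    public final void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            onSuccess(metadata);
        } else if (exception instanceof RetriableException) {
            // The producer already exhausted its configured retries;
            // the application decides whether to resend
            onRetriableFailure((RetriableException) exception);
        } else {
            onFailure(exception);
        }
    }

    protected abstract void onSuccess(RecordMetadata metadata);

    protected abstract void onRetriableFailure(RetriableException exception);

    protected abstract void onFailure(Exception exception);
}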
Synchronous send example
@Test
public void testSync() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-master:9092,kafka-slave1:9093,kafka-slave2:9094");
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
    props.put(ProducerConfig.RETRIES_CONFIG, "10");
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // The generic types must match the configured serializers (String/String here)
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    // Messages with the same key go to the same partition; with no key they are
    // balanced across partitions. Partition selection happens on the client side.
    String key = "test";
    String topic = "testTopic";
    for (int i = 0; i < 100; i++) {
        try {
            String messageStr = "hello world " + i;
            ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topic, key, messageStr);
            Future<RecordMetadata> future = producer.send(producerRecord);
            List<PartitionInfo> partitionInfos = producer.partitionsFor(topic);
            for (PartitionInfo partitionInfo : partitionInfos) {
                logger.info(partitionInfo.toString());
            }
            // Synchronous send: block on the Future until the broker responds
            RecordMetadata recordMetadata = future.get();
            logger.info(ToStringBuilder.reflectionToString(recordMetadata));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            logger.error(e.getMessage(), e);
        } catch (ExecutionException e) {
            logger.error(e.getMessage(), e);
        }
    }
    producer.close();
}
ProducerRecord
public class ProducerRecord<K, V> {
    // Topic the message is sent to
    private final String topic;
    // Target partition
    private final Integer partition;
    // Message headers, introduced in Kafka 0.11.x; mostly used to carry
    // application-specific metadata
    private final Headers headers;
    // Message key
    private final K key;
    // Message body
    private final V value;
    // Message timestamp
    private final Long timestamp;
    ... omitted ...
}
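For illustration, a record can also be built with all of these fields set explicitly; the constructor overload below exists since headers were added in 0.11.x (the header key "traceId" and its value are just examples):
// Requires org.apache.kafka.common.header.internals.RecordHeaders
// and java.nio.charset.StandardCharsets
ProducerRecord<String, String> record = new ProducerRecord<>(
        "testTopic",                // topic
        0,                          // partition; null lets the partitioner decide
        System.currentTimeMillis(), // timestamp; null uses the producer's current time
        "test",                     // key
        "hello world",              // value
        new RecordHeaders().add("traceId", "abc-123".getBytes(StandardCharsets.UTF_8)));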
Asynchronous send example. Within a single partition, if message 1 is sent before message 2, KafkaProducer guarantees that callback1 is invoked before callback2; in other words, callback invocation preserves per-partition ordering.
@Test
public void testASync() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-master:9092,kafka-slave1:9093,kafka-slave2:9094");
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
    props.put(ProducerConfig.RETRIES_CONFIG, "10");
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    String key = "testAsync";
    String topic = "testTopic";
    CountDownLatch countDownLatch = new CountDownLatch(100);
    for (int i = 100; i < 200; i++) {
        String messageStr = "hello world " + i;
        ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topic, key, messageStr);
        producer.send(producerRecord, new Callback() {
            @Override
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                // exception and recordMetadata are never both non-null: exactly one is null
                if (e != null) {
                    if (e instanceof RetriableException) {
                        // handle retriable (transient) exceptions
                    } else {
                        // handle non-retriable exceptions
                        logger.error(e.getMessage(), e);
                    }
                }
                // the send succeeded
                if (recordMetadata != null) {
                    logger.info(ToStringBuilder.reflectionToString(recordMetadata));
                }
                countDownLatch.countDown();
            }
        });
    }
    try {
        countDownLatch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    producer.close();
}
The key in a ProducerRecord serves two purposes: it can act as additional information attached to the message, and it determines which Partition of a Topic the message is written to. The default partitioning strategy writes messages with the same key to the same Partition.
If the key is null and the default partitioner is used, the message is sent to one of the Topic's **available** Partitions; the default partitioner uses a round-robin algorithm to balance messages across them.
If the key is not null and the default partitioner is used, the Kafka client hashes the key (using Kafka's own algorithm, so upgrading the Java version does not change the hash value) and maps the message to a specific Partition based on that hash. The mapping is computed over all of the Topic's Partitions, not just the available ones, so an error occurs if the target Partition is unavailable.
The mapping between keys and Partitions stays stable only as long as the Topic's Partition count does not change. If you rely on keys to map messages to Partitions, it is best to plan the Partitions when the Topic is created and never add new ones.
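A quick sketch of why adding Partitions breaks the mapping: the hash of the key is fixed, but the modulus changes with the Partition count (Utils here is org.apache.kafka.common.utils.Utils, the same helper DefaultPartitioner uses below):
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyMappingDemo {
    public static void main(String[] args) {
        byte[] keyBytes = "test".getBytes(StandardCharsets.UTF_8);
        int hash = Utils.toPositive(Utils.murmur2(keyBytes));
        System.out.println(hash % 4); // partition chosen while the topic has 4 partitions
        System.out.println(hash % 5); // usually a different partition once a 5th is added
    }
}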
The default partitioning strategy is implemented by org.apache.kafka.clients.producer.internals.DefaultPartitioner
public class DefaultPartitioner implements Partitioner {
    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap<>();

    public void configure(Map<String, ?> configs) {
    }

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            // No key: round-robin over the available partitions
            int nextValue = nextValue(topic);
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    private int nextValue(String topic) {
        // Per-topic counter, lazily initialized with a random starting value
        AtomicInteger counter = topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }
        return counter.getAndIncrement();
    }

    public void close() {
    }
}
Custom partitioning strategy
A custom partitioning strategy is defined by implementing the org.apache.kafka.clients.producer.Partitioner interface.
public class SmsPartition implements Partitioner {
    /**
     * @param topic      topic name
     * @param key        message key, or null
     * @param keyBytes   serialized key bytes, or null
     * @param value      message body, or null
     * @param valueBytes serialized message body bytes, or null
     * @param cluster    cluster metadata
     */
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitionInfos = cluster.partitionsForTopic(topic);
        // This partitioner assumes the key is a String and serialized as such
        if (keyBytes == null || !(key instanceof String)) {
            throw new IllegalArgumentException("key must be non-null and of type String");
        }
        int size = partitionInfos.size();
        if (size <= 1) {
            // At most one partition to choose from, so return index 0
            return 0;
        } else {
            String keyString = (String) key;
            // Messages whose key is "sms" go to the last partition
            if ("sms".equals(keyString)) {
                return size - 1;
            }
            // All other keys are hashed across the remaining partitions
            return Math.abs(Utils.murmur2(keyBytes) % (size - 1));
        }
    }

    @Override
    public void close() {
        // Release any resources initialized when the partitioner was created
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}
Custom partitioner example
/**
 * 1. Create the topic:
 *    bin/kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 3 --partitions 5 --topic testTopic
 * 2. Configure the custom partitioner
 * 3. Message counts per partition after the run:
 *    bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka-master:9092 --topic testTopic --time -1
 *    testTopic:2:0
 *    testTopic:4:100
 *    testTopic:1:0
 *    testTopic:3:0
 *    testTopic:0:200
 */
@Test
public void testPartition() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-master:9092,kafka-slave1:9093,kafka-slave2:9094");
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
    props.put(ProducerConfig.RETRIES_CONFIG, "10");
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "cn.jannal.kafka.partition.SmsPartition");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    String key = "sms";
    String topic = "testTopic";
    for (int i = 200; i < 300; i++) {
        try {
            String messageStr = "hello world " + i;
            ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topic, key, messageStr);
            Future<RecordMetadata> future = producer.send(producerRecord);
            List<PartitionInfo> partitionInfos = producer.partitionsFor(topic);
            for (PartitionInfo partitionInfo : partitionInfos) {
                logger.info(partitionInfo.toString());
            }
            // Synchronous send: block on the Future until the broker responds
            RecordMetadata recordMetadata = future.get();
            logger.info(ToStringBuilder.reflectionToString(recordMetadata));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            logger.error(e.getMessage(), e);
        } catch (ExecutionException e) {
            logger.error(e.getMessage(), e);
        }
    }
    producer.close();
}
Interceptors are mainly used to implement customized control logic on the clients side. For a producer, an interceptor lets the user apply custom logic to a message before it is sent and before the producer's callback logic runs, e.g. modifying the message or collecting statistics. Multiple interceptors form an interceptor chain.
A custom interceptor implements the org.apache.kafka.clients.producer.ProducerInterceptor
interface.
An interceptor may run in multiple threads, so the implementation must ensure its own thread safety. Also, if several interceptors are specified, the producer invokes them in the given order, and exceptions caught inside each interceptor are written to the error log rather than propagated upwards.
public class CounterProducerInterceptor implements ProducerInterceptor<String, String> {
    private static AtomicInteger sendCounter = new AtomicInteger(0);
    private static AtomicInteger successCounter = new AtomicInteger(0);
    private static AtomicInteger failedCounter = new AtomicInteger(0);

    /**
     * 1. Called before the message is serialized and its partition computed. The
     *    method may modify the message in any way, but it is best not to change
     *    the topic or partition the message belongs to.
     * 2. Runs on the sending (user) thread.
     */
    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        sendCounter.incrementAndGet();
        return record;
    }

    /**
     * 1. Called when the message is acknowledged, or when the send fails.
     * 2. Runs on the producer's I/O thread, so avoid putting time-consuming
     *    business logic here.
     */
    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        if (exception == null) {
            successCounter.incrementAndGet();
        } else {
            failedCounter.incrementAndGet();
        }
    }

    /**
     * Performs resource cleanup when the interceptor is closed.
     * Invoked when the producer is closed.
     */
    @Override
    public void close() {
        System.out.println("sent: " + sendCounter.get());
        System.out.println("succeeded: " + successCounter.get());
        System.out.println("failed: " + failedCounter.get());
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}
Interceptor example
@Test
public void testProducerInterceptor() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-master:9092,kafka-slave1:9093,kafka-slave2:9094");
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
    props.put(ProducerConfig.RETRIES_CONFIG, "10");
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // Build the interceptor chain
    List<String> interceptors = new ArrayList<>();
    interceptors.add("cn.jannal.kafka.interceptor.CounterProducerInterceptor");
    props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors);
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    String topic = "testTopic";
    for (int i = 0; i < 100; i++) {
        try {
            String messageStr = "hello world " + i;
            ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topic, null, messageStr);
            Future<RecordMetadata> future = producer.send(producerRecord);
            List<PartitionInfo> partitionInfos = producer.partitionsFor(topic);
            for (PartitionInfo partitionInfo : partitionInfos) {
                logger.info(partitionInfo.toString());
            }
            // Synchronous send: block on the Future until the broker responds
            RecordMetadata recordMetadata = future.get();
            logger.info(ToStringBuilder.reflectionToString(recordMetadata));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            logger.error(e.getMessage(), e);
        } catch (ExecutionException e) {
            logger.error(e.getMessage(), e);
        }
    }
    // The interceptor's close method is only invoked when the producer is closed
    producer.close();
}
Configuration recommendations for reliable (loss-free) delivery
max.block.ms: defaults to 60000 ms
acks=all (equivalently -1)
retries=Integer.MAX_VALUE: retry recoverable exceptions indefinitely
max.in.flight.requests.per.connection=1: prevents message reordering, but may reduce throughput
unclean.leader.election.enable=false: replicas outside the ISR may not be elected leader
replication.factor=3: keep multiple replicas
min.insync.replicas=2: a message must be written to at least this many ISR replicas to count as successful; only meaningful when the producer sets acks=-1
replication.factor > min.insync.replicas: if the two are equal, losing a single replica makes the partition unusable; a common setting is replication.factor = min.insync.replicas + 1
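As a sketch, the producer-side portion of these settings might look like the following (unclean.leader.election.enable, replication.factor and min.insync.replicas are broker/topic-level settings, not producer configs, so they are omitted here):
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-master:9092");
// Wait for all ISR replicas to acknowledge
props.put(ProducerConfig.ACKS_CONFIG, "all");
// Retry recoverable exceptions indefinitely
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
// One in-flight request per connection preserves ordering under retries
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
// How long send()/partitionsFor() may block (default 60000 ms)
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000);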