Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
In short, it is a distributed platform for distributing streams of data.
Download Kafka from the official website and go into its bin directory. Because Kafka relies on ZooKeeper for distributed coordination, ZooKeeper must be started first; a ZooKeeper instance is already bundled in the Kafka package. Taking macOS as an example, cd into the Kafka directory.
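With the default configuration files shipped in the distribution, the two services can be started like this (paths assume you are in the Kafka root directory; adjust them if your layout differs):

```shell
# start the bundled ZooKeeper first (Kafka depends on it for coordination)
sh bin/zookeeper-server-start.sh config/zookeeper.properties

# then, in another terminal, start the Kafka broker itself
sh bin/kafka-server-start.sh config/server.properties
```

Both commands run in the foreground, so each needs its own terminal (or append `-daemon` style backgrounding as your setup allows).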
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties properties = new Properties();
properties.put("bootstrap.servers", "127.0.0.1:9092");
properties.put("client.id", "DemoProducer");
properties.put("acks", "0");
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = null;
try {
    producer = new KafkaProducer<>(properties);
    for (int i = 0; i < 100; i++) {
        // note: the key is null here, which affects how records are spread across partitions
        producer.send(new ProducerRecord<>("Message", null, i + ""));
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (producer != null) {
        producer.close();
    }
}
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties properties = new Properties();
properties.put("bootstrap.servers", "127.0.0.1:9092");
properties.put("enable.auto.commit", "true");
properties.put("auto.commit.interval.ms", "1000");
properties.put("session.timeout.ms", "30000");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("group.id", "DemoProducer");

KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Arrays.asList("Message"));
while (true) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, partition = %s, value = %s%n",
                record.offset(), record.partition(), record.value());
        // kafkaConsumer.commitSync(); // manual offset commit (use with enable.auto.commit=false)
    }
}
client.id An ID string passed to the server with each request, useful for tracing where requests come from
acks How the producer waits for message durability; "0" means do not wait for any broker acknowledgement
enable.auto.commit Whether offsets are committed automatically; once committed, records are not consumed again
auto.commit.interval.ms Interval between automatic offset commits
session.timeout.ms Session/heartbeat timeout used to detect dead consumers
group.id The consumer group ID
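If you want to control offsets yourself instead of relying on auto-commit, disable `enable.auto.commit` and call `commitSync()` after processing, as the commented-out line in the consumer loop above hints. A minimal sketch of the property changes, assuming the same local broker and group as the examples:

```java
import java.util.Properties;

public class ManualCommitConfig {
    // Consumer properties for manual offset management: with
    // enable.auto.commit=false, offsets only advance when the
    // application explicitly calls kafkaConsumer.commitSync().
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "127.0.0.1:9092");
        p.put("group.id", "DemoProducer");
        p.put("enable.auto.commit", "false"); // the only change vs. the example above
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("enable.auto.commit = "
                + consumerProps().getProperty("enable.auto.commit"));
    }
}
```

The trade-off: auto-commit can mark a record as consumed before your code has finished handling it, while manual commit lets you commit only after processing succeeds, at the cost of possible re-delivery on failure.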
1. With one producer and multiple consumers, how is consumption balanced?
By default a topic has only one partition, and within a consumer group each partition is consumed by at most one consumer. So by default, even if the two consumers above are started at the same time, only one of them actually receives data.
Solution: increase the number of partitions; kafka/bin provides a tool for this
sh kafka-topics.sh --alter --zookeeper 127.0.0.1:2181 --topic Message --partitions 4
After the change, inspect the topic:
sh kafka-topics.sh --describe --zookeeper 127.0.0.1:2181 --topic Message
Topic: Message PartitionCount:4 ReplicationFactor:1 Configs:
Topic: Message Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: Message Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: Message Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: Message Partition: 3 Leader: 0 Replicas: 0 Isr: 0
2. After increasing the partitions, why does the data still land only on Partition 0?
Kafka's partitioning rule: if the producer specifies a key, the partition is the key's hash modulo PartitionCount (note that the producer code above passes null as the key in new ProducerRecord).
If the key is null, the rule is round-robin: the producer looks up the partition it wrote to last time; if there is none it writes to Partition 0, otherwise it writes to the next partition, spreading records evenly.
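The keyed case above can be illustrated with a small sketch. Note this is a simplification for illustration only: the real Kafka client hashes the serialized key bytes with murmur2, whereas here we use `String.hashCode` purely to show the hash-modulo-PartitionCount idea.

```java
public class PartitionSketch {
    // Simplified sketch of keyed partitioning: hash the key, mask off the
    // sign bit so the result is non-negative, then take it modulo the
    // partition count. (The real client uses murmur2, not String.hashCode.)
    static int partitionFor(String key, int partitionCount) {
        return (key.hashCode() & 0x7fffffff) % partitionCount;
    }

    public static void main(String[] args) {
        // With 4 partitions (as created by the --alter command above),
        // the same key always maps to the same partition.
        for (String key : new String[]{"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Because the mapping is deterministic, all records with the same key end up in the same partition, which is what preserves per-key ordering.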
https://docs.confluent.io/current/installation/configuration/index.html