Kafka

What is Kafka

Kafka is a distributed streaming platform with three key capabilities:

  1. Publish and subscribe to streams of records, similar to an enterprise message queue or enterprise messaging system
  2. Store streams of records in a fault-tolerant way
  3. Process streams of records as they occur

Typical uses of Kafka:

  1. As a messaging system
  2. As a storage system
  3. As a stream processor

Kafka can be used to build streaming data pipelines that reliably move data between systems or applications.

It can also be used to build streaming applications that transform and react to streams of data.

Kafka as a messaging system

As a messaging system, Kafka has three basic components:

[Figure 1]

  • Producer: the client that publishes messages
  • Broker: a server that receives messages from producers and stores them
  • Consumer: the client that reads messages from brokers

A large system has many subsystems that need to interact with each other, and that interaction also relies on messaging. In such a system you will find source systems (message senders) and destination systems (message receivers), and to move data between them you need suitable data pipelines.

[Figure 2]

Wiring every source directly to every destination quickly becomes messy; with a messaging system in the middle, the architecture becomes much simpler and cleaner.

[Figure 3]

  • Kafka runs as a cluster on one or more servers that can span multiple data centers
  • The Kafka cluster stores streams of records in categories called topics
  • Each record consists of three elements: a key, a value, and a timestamp (illustrated in the sketch below)
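
A small illustrative sketch of that record structure, using the Java client that appears later in this article (the topic name, key, and value here are placeholders, not part of the original examples):

import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordAnatomy {
    public static void main(String[] args) {
        // topic, partition (null lets Kafka choose), timestamp, key, value
        ProducerRecord<String, String> record =
                new ProducerRecord<>("demo-topic", null, System.currentTimeMillis(), "user-42", "hello kafka");
        System.out.println("key = " + record.key());
        System.out.println("value = " + record.value());
        System.out.println("timestamp = " + record.timestamp());
    }
}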

Core APIs

Kafka has four core APIs:

  • Producer API: lets an application publish a stream of records to one or more topics
  • Consumer API: lets an application subscribe to one or more topics and process the stream of records produced to them
  • Streams API: lets an application act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more topics, effectively transforming input streams into output streams (see the sketch below)
  • Connector API: lets you build and run reusable producers and consumers that connect Kafka topics to existing applications or data systems; for example, a connector to a relational database might capture every change to a table

[Figure 4]
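
As an illustration of the Streams API, here is a minimal sketch of a topology that copies records from one topic to another while upper-casing the values. It assumes the org.apache.kafka:kafka-streams dependency (which is not in the pom.xml shown later) and placeholder topic names:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class UpperCaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        // consume from input-topic, transform each value, produce to output-topic
        input.mapValues(v -> v.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // close the topology cleanly when the JVM shuts down
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}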

Kafka basic concepts

Kafka is a highly scalable, fault-tolerant messaging system with a number of concepts of its own. Let's walk through them.

topic

A topic is the category used to classify messages in Kafka. It is a logical concept that works like a label messages are assigned to, comparable to a table in a database or a folder in a file system.

partition

The messages of a topic are split across one or more partitions. A partition is a physical concept: on disk it corresponds to one or more directories, and each partition is a commit log. Messages are written to a partition by appending, and are read back in the order they were written.

[Figure 5]

Note: because a topic usually has multiple partitions, ordering cannot be guaranteed across the whole topic, but it is guaranteed within a single partition. Messages are appended to the tail of each partition. Kafka uses partitions to provide redundancy and scalability.

Partitions can be spread across different servers, so a single topic can span multiple servers and deliver more throughput than any single server could.

segment

A partition is further divided into segments, and each segment file has the same maximum size (controlled by log.segment.bytes in the broker configuration shown below).

broker

A Kafka cluster consists of one or more servers, and each server in the cluster is called a broker. A broker receives messages from producers, assigns offsets to them, and commits them to disk; it also serves consumers by answering fetch requests for partitions and returning the messages that have been committed to disk.

Brokers are the building blocks of the cluster. In every cluster one broker also acts as the cluster controller, elected from the cluster's active members; any broker can take on this role. The controller is responsible for administrative work such as assigning partitions to brokers and monitoring brokers. Each partition has a single leader broker, but the partition can also be assigned to other (non-leader) brokers, in which case the partition is replicated. Replication provides redundancy for the partition's messages: if a broker fails, one of the remaining replicas is elected as the new leader and takes over. A small sketch of how to inspect the brokers and the current controller from code follows.
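
A minimal sketch using the Admin client's describeCluster call, assuming the kafka-clients dependency from the Java API section and the spark01/spark02/spark03 cluster configured later in this article; the class name is just a suggestion:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

import java.util.Properties;

public class ClusterInfo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");
        try (Admin admin = Admin.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            // every broker in the cluster
            for (Node node : cluster.nodes().get()) {
                System.out.println("broker " + node.id() + " -> " + node.host() + ":" + node.port());
            }
            // the broker currently acting as controller
            System.out.println("controller: " + cluster.controller().get().id());
        }
    }
}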

[Figure 6]

producer

A producer publishes messages to a topic, and each message lands in one of the topic's partitions. By default the producer balances messages evenly across all partitions of the topic and does not care which partition a particular message ends up in; in some cases, though, a producer writes messages directly to a specific partition.

consumer

A consumer reads messages. A single consumer can consume messages from multiple topics; within a consumer group, each partition of a topic is consumed by only one consumer of that group (a runnable consumer sketch is included at the end of the Java API section).

[Figure 7]

Installation

ZooKeeper mode
Create the software directory
mkdir /opt/soft
cd /opt/soft
Download
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
Extract
tar -zxvf kafka_2.13-3.6.1.tgz 
Rename the directory
mv kafka_2.13-3.6.1 kafka
Configure environment variables
vim /etc/profile.d/my_env.sh
export KAFKA_HOME=/opt/soft/kafka
export PATH=$PATH:$KAFKA_HOME/bin
Edit the configuration file

The configuration file lives in the kafka/config directory.

vim /opt/soft/kafka/config/server.properties

The main changes are the following three parameters:

  • broker.id=1 (each node must use a different id)

  • change log.dirs=/tmp/kafka-logs to log.dirs=/opt/soft/kafka/kafka-logs

  • change zookeeper.connect=localhost:2181 to

    zookeeper.connect=spark01:2181,spark02:2181,spark03:2181/kafka

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# This configuration file is intended for use in ZK-based mode, where Apache ZooKeeper is required.
# See kafka.server.KafkaConfig for additional details and defaults
#

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. If not configured, the host name will be equal to the value of
# java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/soft/kafka/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
#log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=spark01:2181,spark02:2181,spark03:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=18000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

Distribute to the other nodes
scp -r /opt/soft/kafka root@spark02:/opt/soft
scp -r /opt/soft/kafka root@spark03:/opt/soft
scp /etc/profile.d/my_env.sh root@spark02:/etc/profile.d
scp /etc/profile.d/my_env.sh root@spark03:/etc/profile.d

Refresh the environment variables on every node

source /etc/profile
Start and stop

Start on each node separately

kafka-server-start.sh -daemon /opt/soft/kafka/config/server.properties
kafka-server-stop.sh
Startup script
vim kafka-service.sh
#!/bin/bash

case $1 in
"start"){
        for i in spark01 spark02 spark03
        do
                echo  ------------- kafka $i start ------------
                ssh $i "/opt/soft/kafka/bin/kafka-server-start.sh -daemon /opt/soft/kafka/config/server.properties"
        done
}
;;
"stop"){
        for i in spark01 spark02 spark03
        do
                echo  ------------- kafka $i stop ------------
                ssh $i "/opt/soft/kafka/bin/kafka-server-stop.sh"
        done
}
esac

KRaft mode
Create the software directory
mkdir /opt/soft
cd /opt/soft
Download
wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
Extract
tar -zxvf kafka_2.13-3.6.1.tgz 
Rename the directory
mv kafka_2.13-3.6.1 kafka
Configure environment variables
vim /etc/profile.d/my_env.sh
export KAFKA_HOME=/opt/soft/kafka
export PATH=$PATH:$KAFKA_HOME/bin
Edit the configuration file

The configuration file lives in the kafka/config/kraft directory.

vim /opt/soft/kafka/config/kraft/server.properties

The main changes are the following parameters:

  • process.roles=broker,controller
  • node.id=1 (each node must use a different id)
  • change controller.quorum.voters=1@localhost:9093 to controller.quorum.voters=1@spark01:9093,2@spark02:9093,3@spark03:9093
  • change advertised.listeners=PLAINTEXT://localhost:9092 to advertised.listeners=PLAINTEXT://spark01:9092 (use each node's own host name)
  • change log.dirs=/tmp/kraft-combined-logs to log.dirs=/opt/soft/kafka/kraft-combined-logs

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# This configuration file is intended for use in KRaft mode, where
# Apache ZooKeeper is not present.  See config/kraft/README.md for details.
#

############################# Server Basics #############################

# The role of this server. Setting this puts us in KRaft mode
process.roles=broker,controller

# The node id associated with this instance's roles
node.id=1

# The connect string for the controller quorum
controller.quorum.voters=1@spark01:9093,2@spark02:9093,3@spark03:9093

############################# Socket Server Settings #############################

# The address the socket server listens on.
# Combined nodes (i.e. those with `process.roles=broker,controller`) must list the controller listener here at a minimum.
# If the broker listener is not defined, the default listener will use a host name that is equal to the value of java.net.InetAddress.getCanonicalHostName(),
# with PLAINTEXT listener name, and port 9092.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092,CONTROLLER://:9093

# Name of listener used for communication between brokers.
inter.broker.listener.name=PLAINTEXT

# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
advertised.listeners=PLAINTEXT://spark01:9092

# A comma-separated list of the names of the listeners used by the controller.
# If no explicit mapping set in `listener.security.protocol.map`, default will be using PLAINTEXT protocol
# This is required if running in KRaft mode.
controller.listener.names=CONTROLLER

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/soft/kafka/kraft-combined-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

Distribute to the other nodes
scp -r /opt/soft/kafka root@spark02:/opt/soft
scp -r /opt/soft/kafka root@spark03:/opt/soft
scp /etc/profile.d/my_env.sh root@spark02:/etc/profile.d
scp /etc/profile.d/my_env.sh root@spark03:/etc/profile.d

Refresh the environment variables on every node

source /etc/profile
Initialize the cluster data directories
Generate a unique storage ID
kafka-storage.sh random-uuid

Sample output:

JfRaZDSORA2xK8pMSCa9AQ
Format the Kafka storage directory with that ID

Note: run this once on every node.

kafka-storage.sh format -t JfRaZDSORA2xK8pMSCa9AQ \
-c /opt/soft/kafka/config/kraft/server.properties

Sample output:

Formatting /opt/soft/kafka/kraft-combined-logs with metadata.version 3.6-IV2.
Start and stop

Start on each node separately

kafka-server-start.sh -daemon /opt/soft/kafka/config/kraft/server.properties
kafka-server-stop.sh
Startup script
vim kafka-service.sh
#!/bin/bash

case $1 in
"start"){
        for i in spark01 spark02 spark03
        do
                echo  ------------- kafka $i start ------------
                ssh $i "/opt/soft/kafka/bin/kafka-server-start.sh -daemon /opt/soft/kafka/config/kraft/server.properties"
        done
}
;;
"stop"){
        for i in spark01 spark02 spark03
        do
                echo  ------------- kafka $i stop ------------
                ssh $i "/opt/soft/kafka/bin/kafka-server-stop.sh"
        done
}
esac

Command-line operations

Topic commands
Show the topic command options
kafka-topics.sh

Parameter              Description
--bootstrap-server     Kafka broker host:port to connect to
--topic                Name of the topic to operate on
--create               Create a topic
--delete               Delete a topic
--alter                Alter a topic
--list                 List all topics
--describe             Show detailed information about a topic
--partitions           Set the number of partitions
--replication-factor   Set the replication factor
--config               Override a default configuration
List all topics on the cluster
kafka-topics.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --list
Create the lihaozhe topic

Option notes:

--topic sets the topic name

--partitions sets the number of partitions

--replication-factor sets the replication factor

kafka-topics.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 \
--topic lihaozhe --create --partitions 1 --replication-factor 3 
Describe the topic
kafka-topics.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 \
--describe --topic lihaozhe

Output:

Topic: lihaozhe	TopicId: kJWVrG0xQQSaFcrWGMYEGg	PartitionCount: 1	ReplicationFactor: 3	Configs: 
	Topic: lihaozhe	Partition: 0	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	
Change the number of partitions

Note:

The number of partitions can only be increased, never decreased.

The replication factor cannot be changed from the command line.

kafka-topics.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 \
--alter --topic lihaozhe --partitions 3

Describing the topic again after the change shows:

Topic: lihaozhe	TopicId: kJWVrG0xQQSaFcrWGMYEGg	PartitionCount: 3	ReplicationFactor: 3	Configs: 
	Topic: lihaozhe	Partition: 0	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: lihaozhe	Partition: 1	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: lihaozhe	Partition: 2	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
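
The same operations are also available programmatically through the Admin API in kafka-clients (the dependency used in the Java examples below). A minimal sketch, reusing the topic name and broker addresses from the commands above; the class name is just a suggestion:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.Set;

public class TopicAdmin {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");
        try (Admin admin = Admin.create(props)) {
            // create a topic with 1 partition and replication factor 3, like the CLI example
            NewTopic topic = new NewTopic("lihaozhe", 1, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
            // list all topics on the cluster
            Set<String> names = admin.listTopics().names().get();
            names.forEach(System.out::println);
        }
    }
}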

Producer commands
Show the console producer options
kafka-console-producer.sh

Parameter / property                     Description
--bootstrap-server                       Kafka broker host:port to connect to
--topic                                  Name of the topic to operate on
key.serializer                           Serializer class for the message key (use the fully qualified class name)
value.serializer                         Serializer class for the message value (use the fully qualified class name)
buffer.memory                            Total size of the RecordAccumulator buffer, default 32 MB
batch.size                               Maximum size of one batch in the buffer, default 16 KB.
                                         Increasing it can improve throughput, but too large a value
                                         adds latency to data transfer.
linger.ms                                If a batch has not reached batch.size, the sender sends it anyway
                                         after linger.ms. Default 0 ms (no delay); 5-100 ms is a reasonable
                                         range in production.
acks                                     0: do not wait for any acknowledgement of the data.
                                         1: the leader acknowledges once it has received the data.
                                         -1 (all): the leader acknowledges once the leader and every node in
                                         the ISR have received the data. The default is -1; -1 and all are
                                         equivalent.
max.in.flight.requests.per.connection    Maximum number of unacknowledged requests, default 5.
                                         Must be between 1 and 5 when idempotence is enabled.
retries                                  Number of retries when a send fails. Default is Integer.MAX_VALUE
                                         (2147483647). If retries are enabled and ordering must be preserved,
                                         also set max.in.flight.requests.per.connection=1, otherwise other
                                         messages may be sent successfully while the failed one is retried.
retry.backoff.ms                         Interval between two retries, default 100 ms.
enable.idempotence                       Whether to enable idempotence, default true.
compression.type                         Compression for all data sent by the producer. Default none
                                         (no compression); supported types: none, gzip, snappy, lz4, zstd.

(Apart from the first two flags, these are producer configuration properties; the console producer accepts them via --producer-property key=value.)
Send messages
kafka-console-producer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
Consumer commands
Show the console consumer options
kafka-console-consumer.sh

Parameter            Description
--bootstrap-server   Kafka broker host:port to connect to
--topic              Name of the topic to operate on
--from-beginning     Consume from the beginning of the log
--group              Consumer group name

Consume data from the lihaozhe topic
kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 \
--topic lihaozhe
Read all the data in the topic, including historical data:

kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 \
--topic lihaozhe --from-beginning

Producer

Producer send flow

[Figure 8]

  1. RecordAccumulator: every producer maintains a fixed-size block of memory that is used to merge individual messages into batches for bulk sending, which increases throughput and reduces bandwidth consumption.

  2. The size of the RecordAccumulator is configurable through buffer.memory; the default is 33554432 bytes (32 MB).

  3. The RecordAccumulator's memory is divided into two parts:

    • The first part is memory already in use, which mainly holds queues.

      One queue is created for every partition of every topic, holding the batches waiting to be sent to that partition.

    • The second part is unused memory, split into pooled memory and the remaining non-pooled memory (nonPooledAvailableMemory).

      The pooled memory consists of multiple ByteBuffers of batch.size (default 16 KB) kept in a queue;

      all the remaining space forms the non-pooled free memory. (Both parameters are tuned in the AsyncProducerParameters example below.)

[Figure 9]
[Figure 10]

flume producer

vim file2kafka.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /root/data/app.*
a1.sources.r1.positionFile = /root/flume/taildir_positon.json

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = spark01:9092,spark02:9092,spark03:9092
a1.sinks.k1.kafka.topic = lihaozhe
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume

flume-ng agent -n a1 -c conf -f file2kafka.conf

flume consumer

vim kafka2log.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 50
a1.sources.r1.batchDurationMillis = 200
a1.sources.r1.kafka.bootstrap.servers = spark01:9092,spark02:9092,spark03:9092
a1.sources.r1.kafka.topics = lihaozhe
a1.sources.r1.kafka.consumer.group.id = custom.g.id

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume

flume-ng agent -n a1 -c conf -f kafka2log.conf  -Dflume.root.logger=INFO,console

java api

pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.lihaozhe</groupId>
  <artifactId>kafka-code</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>

  <name>kafka</name>
  <url>http://maven.apache.org</url>

  <properties>
    <jdk.version>8</jdk.version>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <maven.test.failure.ignore>true</maven.test.failure.ignore>
    <maven.test.skip>true</maven.test.skip>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-api</artifactId>
      <version>5.10.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-engine</artifactId>
      <version>5.10.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <version>1.18.20</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.20.0</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba.fastjson2</groupId>
      <artifactId>fastjson2</artifactId>
      <version>2.0.31</version>
    </dependency>
    <dependency>
      <groupId>com.github.binarywang</groupId>
      <artifactId>java-testdata-generator</artifactId>
      <version>1.1.2</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>3.6.1</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>${project.name}</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.11.0</version>
        <configuration>
          <encoding>UTF-8</encoding>
          <source>${jdk.version}</source>
          <target>${jdk.version}</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-clean-plugin</artifactId>
        <version>3.2.0</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <version>3.3.1</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <version>3.3.2</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>3.2.2</version>
        <configuration>
          <skip>true</skip>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Producer
producer sends data to a topic asynchronously, without a callback

com.lihaozhe.producer.AsyncProducer

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * producer sends data to a topic asynchronously, without a callback
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducer {

    public static void main(String[] args) {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        for (int i = 0; i < 5; i++) {
            producer.send(new ProducerRecord<>("lihaozhe", "李昊哲" + i));
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

producer sends data to a topic synchronously, without a callback

com.lihaozhe.producer.SyncProducer

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

/**
 * producer sends data to a topic synchronously, without a callback
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class SyncProducer {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data (get() blocks until the broker acknowledges)
        for (int i = 0; i < 5; i++) {
            producer.send(new ProducerRecord<>("lihaozhe", "李昊哲" + i)).get();
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

producer sends data to a topic asynchronously, with a callback

com.lihaozhe.producer.AsyncProducerCallback

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * producer sends data to a topic asynchronously, with a callback
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerCallback {

    public static void main(String[] args) throws InterruptedException {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        for (int i = 0; i < 500; i++) {
            producer.send(new ProducerRecord<>("lihaozhe", "李昊哲" + i), (metadata, exception) -> {
                if (exception == null){
                    System.out.println("topic: " + metadata.topic() + "\tpartition: " + metadata.partition());
                }
            });
            Thread.sleep(2);
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

producer sends data to a topic asynchronously, specifying the partition number

com.lihaozhe.producer.AsyncProducerCallbackPartitions01

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * producer sends data to a topic asynchronously, with a callback,
 * to an explicitly specified partition
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerCallbackPartitions01 {

    public static void main(String[] args) throws InterruptedException {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        for (int i = 0; i < 500; i++) {
            // topic, partition, key, value
            producer.send(new ProducerRecord<>("lihaozhe", 0, null, "李昊哲" + i), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("topic: " + metadata.topic() + "\tpartition: " + metadata.partition());
                }
            });
            Thread.sleep(2);
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

producer sends data to a topic asynchronously, choosing the partition from the hash of the given key modulo the number of partitions

com.lihaozhe.producer.AsyncProducerCallbackPartitions02

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * producer sends data to a topic asynchronously, with a callback;
 * the partition is chosen from the hash of the given key modulo the number of partitions
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerCallbackPartitions02 {

    public static void main(String[] args) throws InterruptedException {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        for (int i = 0; i < 500; i++) {
            // the hash value of the character 'a' is 97
            producer.send(new ProducerRecord<>("lihaozhe", "a", "李昊哲" + i), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("topic: " + metadata.topic() + "\tpartition: " + metadata.partition());
                }
            });
            Thread.sleep(2);
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

producer sends data to a topic asynchronously, using a custom partitioner

Custom partitioner class

com.lihaozhe.producer.MyPartitioner

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

/**
 * Custom partitioner
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class MyPartitioner implements Partitioner {
    /**
     * @param topic      The topic name
     * @param key        The key to partition on (or null if no key)
     * @param keyBytes   The serialized key to partition on( or null if no key)
     * @param value      The value to partition on or null
     * @param valueBytes The serialized value to partition on or null
     * @param cluster    The current cluster metadata
     * @return partition
     */
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {

        String msg = value.toString();

        if (msg.contains("李哲")) {
            return 0;
        } else if (msg.contains("李昊哲")) {
            return 1;
        } else {
            return 2;
        }
    }

    @Override
    public void close() {

    }

    @Override
    public void configure(Map<String, ?> configs) {

    }
}

com.lihaozhe.producer.AsyncProducerCallbackPartitions03

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * producer sends data to a topic asynchronously, with a callback,
 * using the custom partitioner
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerCallbackPartitions03 {

    public static void main(String[] args) throws InterruptedException {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Register the custom partitioner; the fully qualified class name is required
        properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyPartitioner.class.getName());

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        List<String> names = Arrays.asList("李昊哲", "李哲", "李大宝");

        for (int i = 0; i < 500; i++) {
            // topic, partition, key, value
            producer.send(new ProducerRecord<>("lihaozhe", names.get(i % names.size())), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("topic: " + metadata.topic() + "\tpartition: " + metadata.partition());
                }
            });
            Thread.sleep(2);
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

Tune the producer send parameters

com.lihaozhe.producer.AsyncProducerParameters

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * Tune the producer send parameters
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerParameters {

    public static void main(String[] args) {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Buffer size (buffer.memory)
        properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG,33554432);

        // Batch size (batch.size)
        properties.put(ProducerConfig.BATCH_SIZE_CONFIG,16384);

        // linger.ms
        properties.put(ProducerConfig.LINGER_MS_CONFIG, 1);

        // Compression: none, gzip, snappy, lz4, zstd
        properties.put(ProducerConfig.COMPRESSION_TYPE_CONFIG,"snappy");

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        for (int i = 0; i < 5; i++) {
            producer.send(new ProducerRecord<>("lihaozhe", "李昊哲" + i));
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

Tune the producer send parameters: acks and retries

com.lihaozhe.producer.AsyncProducerAck

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * producer sends data to a topic asynchronously, with a callback,
 * adjusting acks and retries
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerAck {

    public static void main(String[] args) throws InterruptedException {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        
        // acks
        properties.put(ProducerConfig.ACKS_CONFIG, "1");

        // retries: number of retries
        properties.put(ProducerConfig.RETRIES_CONFIG, 3);

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        // 3. Send data
        for (int i = 0; i < 500; i++) {
            producer.send(new ProducerRecord<>("lihaozhe", "李昊哲" + i), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("topic: " + metadata.topic() + "\tpartition: " + metadata.partition());
                }
            });
            Thread.sleep(2);
        }
        // 4. Release resources
        producer.close();
        System.out.println("success");
    }
}

Transactions

com.lihaozhe.producer.AsyncProducerTransactions

package com.lihaozhe.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * producer sends data to a topic inside a transaction
 * before running, start a console consumer to listen, using the command below
 * kafka-console-consumer.sh --bootstrap-server spark01:9092,spark02:9092,spark03:9092 --topic lihaozhe
 *
 * @author 李昊哲
 * @version 1.0.0
 */
public class AsyncProducerTransactions {

    public static void main(String[] args) {
        // 1. Basic configuration
        Properties properties = new Properties();

        // Connect to the cluster (bootstrap.servers)
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");

        // Specify the serializer classes for key and value (key.serializer / value.serializer)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Specify the transactional id
        properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transactional_id_01");

        // 2. Create the Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        producer.initTransactions();

        producer.beginTransaction();

        try {
            // 3. Send data
            for (int i = 0; i < 5; i++) {
                producer.send(new ProducerRecord<>("lihaozhe", "李昊哲" + i));
            }
            // int i = 1 / 0;
            producer.commitTransaction();
            System.out.println("success");
        } catch (Exception e) {
            System.out.println("failed");
            producer.abortTransaction();
        } finally {
            // 4. Release resources
            producer.close();
        }
    }
}
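
Consumer

The examples above only cover the producer side. For completeness, here is a minimal consumer sketch (an addition, not one of the original examples) that subscribes to the same topic and prints the partition, offset, and value of every record it receives; the group id and class name are placeholders:

package com.lihaozhe.consumer;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

/**
 * minimal consumer: subscribe to the lihaozhe topic and print each record
 */
public class SimpleConsumer {

    public static void main(String[] args) {
        // 1. Basic configuration
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "spark01:9092,spark02:9092,spark03:9092");
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "lihaozhe-group");
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // 2. Create the consumer and subscribe
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
            consumer.subscribe(Collections.singleton("lihaozhe"));
            // 3. Poll a few times and print whatever arrives
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("partition: " + record.partition()
                            + "\toffset: " + record.offset()
                            + "\tvalue: " + record.value());
                }
            }
        }
    }
}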

