Characteristics of a cluster
A cluster offers two key capabilities: load balancing and fault recovery.
Both capabilities require that each service entity in the cluster hold resources able to perform the same task, and that, for any given task, all of those resources share an identical view of the information needed to execute it.
Kafka
Kafka is a distributed messaging system characterized by horizontal scalability and high throughput. A Kafka cluster has no notion of a "central master node": all nodes in the cluster are peers.
Broker
Each broker is one Kafka service instance, and multiple brokers form a Kafka cluster. Messages published by producers are stored on the brokers, and consumers pull messages from the brokers for consumption.
Kafka cluster architecture
As the diagram shows, Kafka depends heavily on ZooKeeper to manage its own cluster: maintaining the broker list, the mapping of partitions to brokers and of partitions to consumers, producer and consumer load balancing, recording consumption offsets, consumer registration, and so on. For high availability, ZooKeeper itself must therefore also run as a cluster.
Scenario
A real cluster is deployed across different servers, but launching a dozen virtual machines for testing would exhaust memory. We therefore build a pseudo-cluster: all services run on a single virtual machine and are distinguished by port.
Here we set up a three-node ZooKeeper cluster (a pseudo-cluster).
Install the JDK
Cluster directories
Create a zookeeper-cluster directory and copy the extracted ZooKeeper distribution into the following three directories:
itcast@Server-node:/mnt/d/zookeeper-cluster$ ll
total 0
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 ./
drwxrwxrwx 1 dayuan dayuan 512 Aug 19 18:42 ../
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-1/
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-2/
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 10:02 zookeeper-3/
itcast@Server-node:/mnt/d/zookeeper-cluster$
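The directory layout above can be produced with a few shell commands. This is only a sketch: `apache-zookeeper` stands in for wherever you extracted the ZooKeeper distribution, and `/tmp/zk-demo` stands in for `/mnt/d/zookeeper-cluster`.

```shell
# Sketch of the copy step; "apache-zookeeper" is a placeholder for the
# extracted ZooKeeper directory, /tmp/zk-demo for /mnt/d/zookeeper-cluster.
mkdir -p /tmp/zk-demo/apache-zookeeper/conf
cd /tmp/zk-demo
for i in 1 2 3; do
  cp -r apache-zookeeper zookeeper-$i   # one full copy per pseudo-cluster node
done
ls -d zookeeper-*
```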
clientPort settings
In each ZooKeeper instance's zoo.cfg, set clientPort to 2181, 2182, and 2183 respectively:
# the port at which the clients will connect
clientPort=2181
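Editing the three copies by hand works, but the per-instance change can also be scripted. A sketch, with the template config abbreviated to the one line that differs and `/tmp/zk-demo` standing in for the cluster directory:

```shell
mkdir -p /tmp/zk-demo && cd /tmp/zk-demo
for i in 1 2 3; do
  mkdir -p zookeeper-$i/conf
  echo "clientPort=2181" > zookeeper-$i/conf/zoo.cfg   # template line before editing
  # give each instance its own client port: 2181, 2182, 2183
  sed -i "s/^clientPort=.*/clientPort=$((2180 + i))/" zookeeper-$i/conf/zoo.cfg
done
grep clientPort zookeeper-2/conf/zoo.cfg
```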
myid configuration
In each ZooKeeper instance's data directory, create a myid file containing 0, 1, and 2 respectively. This file records each server's ID:
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$ cat temp/zookeeper/data/myid
0
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$
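All three myid files can be created in one loop. A sketch assuming the data-directory layout shown above, with `/tmp/zk-demo` as a demo stand-in for the cluster directory:

```shell
mkdir -p /tmp/zk-demo && cd /tmp/zk-demo
for i in 1 2 3; do
  mkdir -p zookeeper-$i/temp/zookeeper/data
  echo $((i - 1)) > zookeeper-$i/temp/zookeeper/data/myid   # server IDs 0, 1, 2
done
cat zookeeper-1/temp/zookeeper/data/myid
```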
zoo.cfg
In each instance's zoo.cfg, configure the client port (clientPort) and the list of cluster servers:
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$ cat conf/zoo.cfg
# The number of milliseconds of each tick
# heartbeat interval of the ZooKeeper server
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
#dataDir=/tmp/zookeeper
dataDir=temp/zookeeper/data
dataLogDir=temp/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.0=127.0.0.1:2888:3888
server.1=127.0.0.1:2889:3889
server.2=127.0.0.1:2890:3890
dayuan@MY-20190430BUDR:/mnt/d/zookeeper-cluster/zookeeper-1$
Explanation: server.<server ID>=<server IP address>:<peer-communication port>:<leader-election port>
Starting the cluster
Starting the cluster simply means starting each instance individually. Once they are up, check each instance's status:
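Each instance is started with its own zkServer.sh. The loop below is a dry run that only prints the commands (drop the leading `echo` to actually start the servers); after starting, zkServer.sh status should report one leader and two followers, as shown in the transcript that follows.

```shell
# Dry run: print the start command for each pseudo-cluster node.
# Remove the leading `echo` to actually start the servers.
for i in 1 2 3; do
  echo "/mnt/d/zookeeper-cluster/zookeeper-$i/bin/zkServer.sh start"
done
```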
itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-1$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /mnt/d/zookeeper-cluster/zookeeper-1/bin/../conf/zoo.cfg
Mode: leader
itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-2$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /mnt/d/zookeeper-cluster/zookeeper-2/bin/../conf/zoo.cfg
Mode: follower
itcast@Server-node:/mnt/d/zookeeper-cluster/zookeeper-3$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /mnt/d/zookeeper-cluster/zookeeper-3/bin/../conf/zoo.cfg
Mode: follower
Cluster directories
itcast@Server-node:/mnt/d/kafka-cluster$ ll
total 0
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:15 ./
drwxrwxrwx 1 dayuan dayuan 512 Aug 19 18:42 ../
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:39 kafka-1/
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 14:02 kafka-2/
drwxrwxrwx 1 dayuan dayuan 512 Jul 24 14:02 kafka-3/
drwxrwxrwx 1 dayuan dayuan 512 Aug 28 18:15 kafka-4/
itcast@Server-node:/mnt/d/kafka-cluster$
server.properties
# broker id; must be unique within the cluster
broker.id=1
# host address
host.name=127.0.0.1
# port
port=9092
# directory for message logs
log.dirs=/tmp/kafka/log/cluster/log3
# ZooKeeper addresses, comma-separated
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183
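Each of the other instances gets the same file with only a few lines changed, so that brokers do not collide on id, port, or log directory. The values below are illustrative for a second broker:

```properties
# per-instance overrides (illustrative values for a second broker)
broker.id=2
port=9093
log.dirs=/tmp/kafka/log/cluster/log2
```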
Starting the cluster
Open a terminal in each Kafka instance's directory and run the start command. A successful startup ends with log lines like those below.
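The launch command itself is kafka-server-start.sh, pointed at each instance's server.properties. The loop below is a dry run that only prints the commands (drop the leading `echo` to actually start the brokers; -daemon runs them in the background):

```shell
# Dry run: print the launch command for each broker instance.
for i in 1 2 3; do
  echo "kafka-$i/bin/kafka-server-start.sh -daemon kafka-$i/config/server.properties"
done
```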
...............................
[2019-07-24 06:18:19,793] INFO [Transaction Marker Channel Manager 2]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2019-07-24 06:18:19,793] INFO [TransactionCoordinator id=2] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[2019-07-24 06:18:19,846] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2019-07-24 06:18:19,869] INFO [SocketServer brokerId=2] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
[2019-07-24 06:18:19,879] INFO Kafka version: 2.2.1 (org.apache.kafka.common.utils.AppInfoParser)
[2019-07-24 06:18:19,879] INFO Kafka commitId: 55783d3133a5a49a (org.apache.kafka.common.utils.AppInfoParser)
[2019-07-24 06:18:19,883] INFO [KafkaServer id=2] started (kafka.server.KafkaServer)
MirrorMaker
MirrorMaker exists to synchronize data across Kafka clusters and to create mirror clusters; the diagram below shows how it works. The tool consumes messages from the source cluster and republishes them to the target cluster.
Creating a mirror
Creating a mirror with MirrorMaker is straightforward: after setting up the target Kafka cluster, just start the mirror-maker program. One or more consumer configuration files and one producer configuration file are required; whitelist and blacklist are optional. In the consumer configuration, point at the source Kafka cluster's ZooKeeper; in the producer configuration, point at the target cluster's ZooKeeper (or broker.list).
kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config sourceCluster1Consumer.config \
  --consumer.config sourceCluster2Consumer.config \
  --num.streams 2 \
  --producer.config targetClusterProducer.config \
  --whitelist=".*"
Consumer configuration file:
# format: host1:port1,host2:port2 ...
bootstrap.servers=localhost:9092
# consumer group id
group.id=test-consumer-group
# What to do when there is no initial offset in Kafka or if the current
# offset does not exist any more on the server: latest, earliest, none
#auto.offset.reset=
Producer configuration file:
# format: host1:port1,host2:port2 ...
bootstrap.servers=localhost:9092
# specify the compression codec for all data generated: none, gzip, snappy, lz4, zstd
compression.type=none
How to synchronize data without loss: first, require an acknowledgement from the target cluster for each send (request.required.acks=1); second, send in blocking mode so that data is not discarded when the buffer fills (queue.enqueue.timeout.ms=-1).
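Placed in the MirrorMaker producer configuration, the two settings named above (legacy producer property names, as given in the text) would look like:

```properties
# require acknowledgement from the target cluster for each send
request.required.acks=1
# block instead of dropping data when the send buffer is full
queue.enqueue.timeout.ms=-1
```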
This chapter focused on Kafka clusters: cluster use cases, building multi-node ZooKeeper and Kafka clusters, and synchronizing data across clusters.
Reference: 《Kafka技术手册》 (Kafka Technical Manual)