I. Zookeeper Cluster Preparation
The Kafka service depends on Zookeeper for coordination, so a Zookeeper cluster must be set up before installing Kafka. For detailed installation steps, see the earlier post in this series, Linux 系列(七)——Zookeeper集群搭建 (Zookeeper cluster setup).
II. Kafka Core Configuration Explained
1. Upload the Kafka tarball and extract it.
2. After extracting, cd into /lamp/kafka/config and list all of Kafka's configuration files:
-rw-r--r--. 1 root root  906 May 17 21:26 connect-console-sink.properties
-rw-r--r--. 1 root root  909 May 17 21:26 connect-console-source.properties
-rw-r--r--. 1 root root 2760 May 17 21:26 connect-distributed.properties
-rw-r--r--. 1 root root  883 May 17 21:26 connect-file-sink.properties
-rw-r--r--. 1 root root  881 May 17 21:26 connect-file-source.properties
-rw-r--r--. 1 root root 1074 May 17 21:26 connect-log4j.properties
-rw-r--r--. 1 root root 2061 May 17 21:26 connect-standalone.properties
-rw-r--r--. 1 root root 1199 May 17 21:26 consumer.properties
-rw-r--r--. 1 root root 4369 May 17 21:26 log4j.properties
-rw-r--r--. 1 root root 1900 May 17 21:26 producer.properties
-rw-r--r--. 1 root root 5243 May 17 21:26 server.properties
-rw-r--r--. 1 root root 1032 May 17 21:26 tools-log4j.properties
-rw-r--r--. 1 root root 1041 Jun 18 11:19 zookeeper.properties

zookeeper.properties
# the directory where the snapshot is stored.
dataDir=/usr/local/cloud/zookeeper1/data
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0

server.properties
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
# Each Kafka instance is identified by its unique broker.id.
broker.id=0

############################# Log Basics #############################
# A comma separated list of directories under which to store log files.
# Path where Kafka persists its logs; in production point this to a non-/tmp directory.
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers. (Number of partitions this broker creates per topic.)
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# Zookeeper service address:
zookeeper.connect=192.168.220.128:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
server.properties is the Kafka broker configuration file; it mainly configures the broker's log storage, partitioning, and Zookeeper connection. Besides the main settings shown above, it also contains the Log Retention Policy (configurable by total retention time, segment size, and check interval) and the Log Flush Policy (for example, start flushing once 10,000 messages have accumulated, or flush every 1,000 ms). All of these can be tuned through configuration, which makes the broker very flexible.
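For reference, the shipped server.properties contains entries along the following lines for these two policies (the values are the defaults from the sample file; the flush settings ship commented out):

############################# Log Retention Policy #############################
log.retention.hours=168                    # delete segments older than 7 days
log.segment.bytes=1073741824               # roll a new segment once it reaches 1 GB
log.retention.check.interval.ms=300000     # how often to check for expired segments

############################# Log Flush Policy #############################
#log.flush.interval.messages=10000         # flush after 10000 messages accumulate
#log.flush.interval.ms=1000                # or at most every 1000 ms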
producer.properties
############################# Producer Basics #############################
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=192.168.220.128:9092

# specify the compression codec for all data generated: none, gzip, snappy, lz4
compression.type=none

consumer.properties
# Zookeeper connection string
# comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"
zookeeper.connect=127.0.0.1:2181

# timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

# consumer group id
group.id=test-consumer-group

# consumer timeout
#consumer.timeout.ms=5000

III. Common Kafka Operations
1. Starting Kafka
Start the Zookeeper service before starting Kafka, passing the Zookeeper configuration file as the argument: bin/zookeeper-server-start.sh /usr/local/cloud/zookeeper1/conf/zoo.cfg
Start Kafka in the background: bin/kafka-server-start.sh /lamp/kafka/config/server.properties >/dev/null 2>&1 &
Run jps to check the running processes.
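If both services are up, jps typically lists a QuorumPeerMain process (Zookeeper) and a Kafka process alongside Jps itself; the PIDs below are purely illustrative:

2617 QuorumPeerMain
3298 Kafka
5021 Jps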
2. Creating a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic max
After the topic is created, check the Kafka log persistence path (log.dirs): a log directory for the new topic has already been created there to record the topic's partition data, one log directory per partition, so a single-partition topic yields a single directory.
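A sketch of what to expect, assuming the default log.dirs=/tmp/kafka-logs from server.properties above (Kafka names each directory <topic>-<partition> and creates an empty log and index segment inside):

ls /tmp/kafka-logs
max-0
ls /tmp/kafka-logs/max-0
00000000000000000000.index  00000000000000000000.log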
3. Listing and describing topics
List all topics (returns the names of the topics created so far, e.g. max, linxi): bin/kafka-topics.sh --list --zookeeper localhost:2181
Show the details of a specific topic: bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic max
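The describe output lists the partition count, replication factor, and, for each partition, its leader, replica set, and in-sync replicas (ISR). For the single-partition topic created above it looks roughly like this (the exact layout varies by Kafka version):

Topic:max	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: max	Partition: 0	Leader: 0	Replicas: 0	Isr: 0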
4. Deleting a topic
bin/kafka-run-class.sh kafka.admin.TopicCommand --delete --topic max --zookeeper 192.168.220.128:2181
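Note that, depending on the Kafka version, this may only mark the topic for deletion unless topic deletion is enabled in the brokers' server.properties:

delete.topic.enable=true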
5. Starting a console producer
Produce messages into the topic named max: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic max
6. Starting a console consumer
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic max --from-beginning
Messages move between producer and consumer in real time: as soon as a message is produced on one side, the consumer on the other side starts consuming it.
In addition, Zookeeper records the consumption offsets for each partition; connect with the Zookeeper client to inspect them.
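A minimal sketch of such a session, assuming the test-consumer-group group id from consumer.properties (a console consumer started with --zookeeper registers its own group named console-consumer-NNNNN instead):

bin/zkCli.sh -server 192.168.220.128:2181
ls /brokers/topics                                  # topics registered by the brokers
ls /consumers                                       # consumer groups known to Zookeeper
get /consumers/test-consumer-group/offsets/max/0    # committed offset for partition 0 of topic max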
IV. Kafka Cluster
Building a Kafka cluster is similar to the Zookeeper cluster and very simple. Since the Zookeeper cluster is already in place, just run multiple Kafka instances, give each one a distinct broker.id in the server.properties shown above, and list all three Zookeeper instances in zookeeper.connect, as sketched below.
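A minimal sketch of the per-broker settings, assuming the three-node Zookeeper ensemble from the earlier post runs on 192.168.220.128 ports 2181-2183 (adjust to your own ensemble); when several brokers share one host, port and log.dirs must also be unique per broker:

# server.properties for the first broker; the others use broker.id=1, 2, ...
broker.id=0
port=9092                                # unique per broker when co-located on one host
log.dirs=/usr/local/cloud/kafka-logs-0   # separate data directory per broker instance
zookeeper.connect=192.168.220.128:2181,192.168.220.128:2182,192.168.220.128:2183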
V. Client Implementation
Message producer (the old Scala producer API; the Kafka producer class is fully qualified as kafka.javaapi.producer.Producer to avoid clashing with this class name):

import java.util.Properties;

import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import kafka.serializer.StringEncoder;

/*
 * Message producer.
 */
public class Producer extends Thread {

    private String topic;

    public Producer(String topic) {
        super();
        this.topic = topic;
    }

    @Override
    public void run() {
        kafka.javaapi.producer.Producer<Integer, String> producer = createProducer();
        // Send a single message to the topic
        producer.send(new KeyedMessage<Integer, String>(topic, "xiaowei"));
    }

    private kafka.javaapi.producer.Producer<Integer, String> createProducer() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "192.168.220.128:2181");
        props.setProperty("serializer.class", StringEncoder.class.getName());
        // Broker list; the port matches the one configured in producer.properties
        props.setProperty("metadata.broker.list", "192.168.220.128:9092");
        return new kafka.javaapi.producer.Producer<Integer, String>(new ProducerConfig(props));
    }

    public static void main(String[] args) {
        new Producer("max").start();
    }
}

Message consumer (the old high-level consumer API; kafka.consumer.Consumer is fully qualified for the same reason):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

/*
 * Message consumer.
 */
public class Consumer extends Thread {

    private String topic;

    public Consumer(String topic) {
        super();
        this.topic = topic;
    }

    @Override
    public void run() {
        // Create the consumer connection from the properties
        ConsumerConnector consumer = createConsumer();
        // topicMap maps each topic to consume to the number of streams to open for it
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put(topic, 1);
        // msgStreams holds all streams for the topic; kafkaStream is the single stream requested above
        Map<String, List<KafkaStream<byte[], byte[]>>> msgStreams = consumer.createMessageStreams(topicMap);
        KafkaStream<byte[], byte[]> kafkaStream = msgStreams.get(topic).get(0);
        // Iterate over and process incoming messages
        ConsumerIterator<byte[], byte[]> iterator = kafkaStream.iterator();
        while (iterator.hasNext()) {
            byte[] msg = iterator.next().message();
            System.out.println("message is " + new String(msg));
        }
    }

    private ConsumerConnector createConsumer() {
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "192.168.220.128:2181");
        props.setProperty("group.id", "max");
        return kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        new Consumer("max").start();
    }
}

In real applications, Kafka producers and consumers are part of larger projects or systems. In a typical big-data setup, for example, Flume collects data and its sink is configured to write into Kafka; Kafka receives and buffers that data at high throughput, and Storm pulls from Kafka to process it. In that pipeline, Flume effectively acts as the Kafka message producer and Storm as the message consumer.