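Kafka has the concept of consumer groups: within one consumer group, each partition of a topic can be consumed by only one consumer. The concept seems easy to understand, so let's run a test to see how it behaves.
We will use a topic that has already been created: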
[root@node1 bin]# ./kafka-topics.sh --describe --zookeeper localhost:2181 --topic scripts
Topic:scripts PartitionCount:4 ReplicationFactor:1 Configs:
Topic: scripts Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: scripts Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: scripts Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: scripts Partition: 3 Leader: 0 Replicas: 0 Isr: 0
[root@node1 bin]#
We will develop in Python using the kafka-python package. For installation instructions, see my other blog post: Kafka Study Notes 2: Operating Kafka with Python
consumer.py
from kafka import KafkaConsumer
import datetime

# Topic name
topic_ = "scripts"
# Consumer group
consumer_group_ = "consumer-20171017"

# Print a message prefixed with the current time
def log(message):
    now = datetime.datetime.now()
    print("[%s]%s" % (now, message))

print('start listen %s:%s' % (topic_, consumer_group_))
# Create the consumer and connect to 192.168.120.11:9092
consumer = KafkaConsumer(topic_, group_id=consumer_group_, bootstrap_servers=['192.168.120.11:9092'])
# Receive messages and print them; this loop blocks
for msg in consumer:
    recv = "%s:%s:%d:%d: key=%s value=%s" % (consumer_group_, msg.topic, msg.partition, msg.offset, msg.key, msg.value)
    log(recv)
producer.py: this producer takes a path from the command-line argument and sends the names of the files under that path to the specified topic.
# -*- coding: utf-8 -*-
from kafka import KafkaProducer
import os
import datetime
from sys import argv

# Topic
topic_ = "scripts"
# Create a producer connected to 192.168.120.11:9092
producer = KafkaProducer(bootstrap_servers='192.168.120.11:9092')

# Print a message prefixed with the current time
def log(message):
    now = datetime.datetime.now()
    print("[%s]%s" % (now, message))

# Send the name of every entry (files and directories) under path to the topic,
# printing each name as it is sent and the total count at the end
def list_file(path):
    dir_list = os.listdir(path)
    count = 0
    for f in dir_list:
        # Without a value_serializer the value must be bytes
        producer.send(topic_, f.encode('utf-8'))
        producer.flush()
        log('send[%s]: %s' % (topic_, f))
        count += 1
    log("%d send ok" % count)

list_file(argv[1])
producer.close()
log('done')
As the first step showed, the scripts topic has 4 partitions. Why start 5 consumers, then? This is explained later.
1. Start 5 consumers
consumer.py is in the /opt/app/python_app directory on 192.168.120.11
[root@node1 python_app]# pwd
/opt/app/python_app
[root@node1 python_app]# ll
total 32
-rw-r--r--. 1 root root 1116 Sep 23 15:48 bs4_demo.py
drwxr-xr-x. 4 root root 4096 Sep 23 03:45 consumer
-rw-r--r--. 1 root root 557 Sep 24 05:18 consumer.py
-rw-r--r--. 1 root root 705 Sep 23 15:08 pachong.py
drwxr-xr-x. 4 root root 4096 Sep 23 04:21 producer
-rw-r--r--. 1 root root 441 Sep 23 23:13 script.py
-rw-r--r--. 1 root root 345 Sep 23 21:33 test1.py
-rw-r--r--. 1 root root 602 Sep 23 05:56 thread.py
[root@node1 python_app]#
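2. Start the producer
producer.py is in the /opt/app/python_app directory on 192.168.120.12.
If we pass /opt/package/kafka_2.11-0.11.0.0/libs/ as the argument, the producer sends the names of all files under that path to the specified topic. The output is shown below.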
[root@node2 python_app]# python producer.py /opt/package/kafka_2.11-0.11.0.0/libs/
[2017-11-08 02:21:51.988314]send[scripts]: jetty-security-9.2.15.v20160210.jar
[2017-11-08 02:21:52.027536]send[scripts]: connect-file-0.11.0.0.jar
[2017-11-08 02:21:52.036955]send[scripts]: slf4j-log4j12-1.7.25.jar
[2017-11-08 02:21:52.041581]send[scripts]: kafka-streams-examples-0.11.0.0.jar
[2017-11-08 02:21:52.048101]send[scripts]: kafka-tools-0.11.0.0.jar
[2017-11-08 02:21:52.068020]send[scripts]: zkclient-0.10.jar
[2017-11-08 02:21:52.074602]send[scripts]: connect-transforms-0.11.0.0.jar
[2017-11-08 02:21:52.078697]send[scripts]: jackson-databind-2.8.5.jar
[2017-11-08 02:21:52.083996]send[scripts]: kafka_2.11-0.11.0.0-test.jar.asc
[2017-11-08 02:21:52.090364]send[scripts]: scala-library-2.11.11.jar
[2017-11-08 02:21:52.095241]send[scripts]: jetty-server-9.2.15.v20160210.jar
[2017-11-08 02:21:52.104866]send[scripts]: jetty-http-9.2.15.v20160210.jar
[2017-11-08 02:21:52.110210]send[scripts]: javax.servlet-api-3.1.0.jar
[2017-11-08 02:21:52.119537]send[scripts]: jersey-guava-2.24.jar
[2017-11-08 02:21:52.122763]send[scripts]: jackson-annotations-2.8.5.jar
[2017-11-08 02:21:52.129778]send[scripts]: jetty-servlet-9.2.15.v20160210.jar
[2017-11-08 02:21:52.134385]send[scripts]: jackson-jaxrs-json-provider-2.8.5.jar
[2017-11-08 02:21:52.140605]send[scripts]: jetty-util-9.2.15.v20160210.jar
[2017-11-08 02:21:52.146042]send[scripts]: jersey-server-2.24.jar
[2017-11-08 02:21:52.154295]send[scripts]: kafka_2.11-0.11.0.0.jar
[2017-11-08 02:21:52.159763]send[scripts]: javax.ws.rs-api-2.0.1.jar
[2017-11-08 02:21:52.165220]send[scripts]: validation-api-1.1.0.Final.jar
[2017-11-08 02:21:52.172444]send[scripts]: scala-parser-combinators_2.11-1.0.4.jar
[2017-11-08 02:21:52.179656]send[scripts]: hk2-utils-2.5.0-b05.jar
[2017-11-08 02:21:52.186331]send[scripts]: jersey-client-2.24.jar
[2017-11-08 02:21:52.196561]send[scripts]: hk2-locator-2.5.0-b05.jar
[2017-11-08 02:21:52.204605]send[scripts]: connect-api-0.11.0.0.jar
[2017-11-08 02:21:52.208720]send[scripts]: snappy-java-1.1.2.6.jar
[2017-11-08 02:21:52.215308]send[scripts]: kafka_2.11-0.11.0.0-test.jar
[2017-11-08 02:21:52.220175]send[scripts]: connect-runtime-0.11.0.0.jar
[2017-11-08 02:21:52.225902]send[scripts]: jetty-servlets-9.2.15.v20160210.jar
[2017-11-08 02:21:52.230140]send[scripts]: plexus-utils-3.0.24.jar
[2017-11-08 02:21:52.239122]send[scripts]: kafka-clients-0.11.0.0.jar
[2017-11-08 02:21:52.243209]send[scripts]: jersey-container-servlet-2.24.jar
[2017-11-08 02:21:52.249485]send[scripts]: kafka-streams-0.11.0.0.jar
[2017-11-08 02:21:52.261434]send[scripts]: kafka-log4j-appender-0.11.0.0.jar
[2017-11-08 02:21:52.268488]send[scripts]: jersey-common-2.24.jar
[2017-11-08 02:21:52.272981]send[scripts]: kafka_2.11-0.11.0.0-test-sources.jar
[2017-11-08 02:21:52.282869]send[scripts]: jetty-io-9.2.15.v20160210.jar
[2017-11-08 02:21:52.286566]send[scripts]: hk2-api-2.5.0-b05.jar
[2017-11-08 02:21:52.294364]send[scripts]: javax.annotation-api-1.2.jar
[2017-11-08 02:21:52.300823]send[scripts]: lz4-1.3.0.jar
[2017-11-08 02:21:52.303430]send[scripts]: jackson-core-2.8.5.jar
[2017-11-08 02:21:52.310836]send[scripts]: jackson-jaxrs-base-2.8.5.jar
[2017-11-08 02:21:52.316827]send[scripts]: kafka_2.11-0.11.0.0-sources.jar.asc
[2017-11-08 02:21:52.320950]send[scripts]: jersey-container-servlet-core-2.24.jar
[2017-11-08 02:21:52.328338]send[scripts]: guava-20.0.jar
[2017-11-08 02:21:52.334289]send[scripts]: aopalliance-repackaged-2.5.0-b05.jar
[2017-11-08 02:21:52.341927]send[scripts]: jersey-media-jaxb-2.24.jar
[2017-11-08 02:21:52.349975]send[scripts]: zookeeper-3.4.10.jar
[2017-11-08 02:21:52.355673]send[scripts]: kafka_2.11-0.11.0.0.jar.asc
[2017-11-08 02:21:52.361833]send[scripts]: commons-lang3-3.5.jar
[2017-11-08 02:21:52.370736]send[scripts]: kafka_2.11-0.11.0.0-scaladoc.jar.asc
[2017-11-08 02:21:52.376613]send[scripts]: kafka_2.11-0.11.0.0-scaladoc.jar
[2017-11-08 02:21:52.386769]send[scripts]: kafka_2.11-0.11.0.0-sources.jar
[2017-11-08 02:21:52.391215]send[scripts]: javassist-3.21.0-GA.jar
[2017-11-08 02:21:52.396025]send[scripts]: slf4j-api-1.7.25.jar
[2017-11-08 02:21:52.407430]send[scripts]: rocksdbjni-5.0.1.jar
[2017-11-08 02:21:52.416798]send[scripts]: log4j-1.2.17.jar
[2017-11-08 02:21:52.423280]send[scripts]: kafka_2.11-0.11.0.0-javadoc.jar
[2017-11-08 02:21:52.433237]send[scripts]: jackson-module-jaxb-annotations-2.8.5.jar
[2017-11-08 02:21:52.439345]send[scripts]: metrics-core-2.2.0.jar
[2017-11-08 02:21:52.445700]send[scripts]: kafka_2.11-0.11.0.0-javadoc.jar.asc
[2017-11-08 02:21:52.452517]send[scripts]: javax.inject-1.jar
[2017-11-08 02:21:52.491859]send[scripts]: argparse4j-0.7.0.jar
[2017-11-08 02:21:52.498331]send[scripts]: reflections-0.9.11.jar
[2017-11-08 02:21:52.504047]send[scripts]: kafka_2.11-0.11.0.0-test-sources.jar.asc
[2017-11-08 02:21:52.513288]send[scripts]: maven-artifact-3.5.0.jar
[2017-11-08 02:21:52.519388]send[scripts]: jopt-simple-5.0.3.jar
[2017-11-08 02:21:52.527171]send[scripts]: osgi-resource-locator-1.0.1.jar
[2017-11-08 02:21:52.535946]send[scripts]: connect-json-0.11.0.0.jar
[2017-11-08 02:21:52.540584]send[scripts]: jetty-continuation-9.2.15.v20160210.jar
[2017-11-08 02:21:52.546839]send[scripts]: javax.inject-2.5.0-b05.jar
[2017-11-08 02:21:52.547075]73 send ok
[2017-11-08 02:21:52.552301]done
[root@node2 python_app]#
As you can see, a total of 73 file names were sent to the scripts topic.
Next, let's look at what each consumer received.
Consumer 1:
[root@node1 python_app]# python consumer.py
start listen scripts:consumer-20171017
[2017-09-24 05:26:41.025240]consumer-20171017:scripts:2:117: key=None value=jetty-security-9.2.15.v20160210.jar
[2017-09-24 05:26:41.084800]consumer-20171017:scripts:2:118: key=None value=kafka-streams-examples-0.11.0.0.jar
[2017-09-24 05:26:41.091487]consumer-20171017:scripts:2:119: key=None value=kafka-tools-0.11.0.0.jar
[2017-09-24 05:26:41.115627]consumer-20171017:scripts:2:120: key=None value=connect-transforms-0.11.0.0.jar
[2017-09-24 05:26:41.122231]consumer-20171017:scripts:2:121: key=None value=jackson-databind-2.8.5.jar
[2017-09-24 05:26:41.128516]consumer-20171017:scripts:2:122: key=None value=kafka_2.11-0.11.0.0-test.jar.asc
[2017-09-24 05:26:41.146627]consumer-20171017:scripts:2:123: key=None value=jetty-server-9.2.15.v20160210.jar
[2017-09-24 05:26:41.154155]consumer-20171017:scripts:2:124: key=None value=javax.servlet-api-3.1.0.jar
[2017-09-24 05:26:41.170524]consumer-20171017:scripts:2:125: key=None value=jetty-servlet-9.2.15.v20160210.jar
[2017-09-24 05:26:41.220996]consumer-20171017:scripts:2:126: key=None value=hk2-utils-2.5.0-b05.jar
[2017-09-24 05:26:41.281828]consumer-20171017:scripts:2:127: key=None value=kafka-clients-0.11.0.0.jar
[2017-09-24 05:26:41.341866]consumer-20171017:scripts:2:128: key=None value=lz4-1.3.0.jar
[2017-09-24 05:26:41.351880]consumer-20171017:scripts:2:129: key=None value=jackson-jaxrs-base-2.8.5.jar
[2017-09-24 05:26:41.383141]consumer-20171017:scripts:2:130: key=None value=jersey-media-jaxb-2.24.jar
[2017-09-24 05:26:41.471836]consumer-20171017:scripts:2:131: key=None value=jackson-module-jaxb-annotations-2.8.5.jar
[2017-09-24 05:26:41.483293]consumer-20171017:scripts:2:132: key=None value=metrics-core-2.2.0.jar
[2017-09-24 05:26:41.569093]consumer-20171017:scripts:2:133: key=None value=osgi-resource-locator-1.0.1.jar
Received 17 messages
Consumer 2:
[root@node1 python_app]# python consumer.py
start listen scripts:consumer-20171017
[2017-09-24 05:26:41.068568]consumer-20171017:scripts:1:123: key=None value=connect-file-0.11.0.0.jar
[2017-09-24 05:26:41.095554]consumer-20171017:scripts:1:124: key=None value=zkclient-0.10.jar
[2017-09-24 05:26:41.181753]consumer-20171017:scripts:1:125: key=None value=jetty-util-9.2.15.v20160210.jar
[2017-09-24 05:26:41.213648]consumer-20171017:scripts:1:126: key=None value=scala-parser-combinators_2.11-1.0.4.jar
[2017-09-24 05:26:41.228033]consumer-20171017:scripts:1:127: key=None value=jersey-client-2.24.jar
[2017-09-24 05:26:41.241564]consumer-20171017:scripts:1:128: key=None value=hk2-locator-2.5.0-b05.jar
[2017-09-24 05:26:41.268877]consumer-20171017:scripts:1:129: key=None value=connect-runtime-0.11.0.0.jar
[2017-09-24 05:26:41.274813]consumer-20171017:scripts:1:130: key=None value=jetty-servlets-9.2.15.v20160210.jar
[2017-09-24 05:26:41.286442]consumer-20171017:scripts:1:131: key=None value=plexus-utils-3.0.24.jar
[2017-09-24 05:26:41.295452]consumer-20171017:scripts:1:132: key=None value=jersey-container-servlet-2.24.jar
[2017-09-24 05:26:41.295541]consumer-20171017:scripts:1:133: key=None value=kafka-streams-0.11.0.0.jar
[2017-09-24 05:26:41.322267]consumer-20171017:scripts:1:134: key=None value=jetty-io-9.2.15.v20160210.jar
[2017-09-24 05:26:41.335643]consumer-20171017:scripts:1:135: key=None value=javax.annotation-api-1.2.jar
[2017-09-24 05:26:41.391061]consumer-20171017:scripts:1:136: key=None value=zookeeper-3.4.10.jar
[2017-09-24 05:26:41.426617]consumer-20171017:scripts:1:137: key=None value=kafka_2.11-0.11.0.0-sources.jar
[2017-09-24 05:26:41.451555]consumer-20171017:scripts:1:138: key=None value=slf4j-api-1.7.25.jar
[2017-09-24 05:26:41.461958]consumer-20171017:scripts:1:139: key=None value=log4j-1.2.17.jar
[2017-09-24 05:26:41.488710]consumer-20171017:scripts:1:140: key=None value=kafka_2.11-0.11.0.0-javadoc.jar.asc
[2017-09-24 05:26:41.553666]consumer-20171017:scripts:1:141: key=None value=maven-artifact-3.5.0.jar
[2017-09-24 05:26:41.577693]consumer-20171017:scripts:1:142: key=None value=connect-json-0.11.0.0.jar
[2017-09-24 05:26:41.585841]consumer-20171017:scripts:1:143: key=None value=jetty-continuation-9.2.15.v20160210.jar
Received 21 messages
Consumer 3:
[root@node1 python_app]# python consumer.py
start listen scripts:consumer-20171017
[2017-09-24 05:26:41.143396]consumer-20171017:scripts:3:135: key=None value=jetty-http-9.2.15.v20160210.jar
[2017-09-24 05:26:41.189993]consumer-20171017:scripts:3:136: key=None value=jersey-server-2.24.jar
[2017-09-24 05:26:41.252316]consumer-20171017:scripts:3:137: key=None value=snappy-java-1.1.2.6.jar
[2017-09-24 05:26:41.299705]consumer-20171017:scripts:3:138: key=None value=kafka-log4j-appender-0.11.0.0.jar
[2017-09-24 05:26:41.347477]consumer-20171017:scripts:3:139: key=None value=jackson-core-2.8.5.jar
[2017-09-24 05:26:41.359786]consumer-20171017:scripts:3:140: key=None value=kafka_2.11-0.11.0.0-sources.jar.asc
[2017-09-24 05:26:41.364435]consumer-20171017:scripts:3:141: key=None value=jersey-container-servlet-core-2.24.jar
[2017-09-24 05:26:41.378374]consumer-20171017:scripts:3:142: key=None value=aopalliance-repackaged-2.5.0-b05.jar
[2017-09-24 05:26:41.405218]consumer-20171017:scripts:3:143: key=None value=commons-lang3-3.5.jar
[2017-09-24 05:26:41.421678]consumer-20171017:scripts:3:144: key=None value=kafka_2.11-0.11.0.0-scaladoc.jar.asc
[2017-09-24 05:26:41.442244]consumer-20171017:scripts:3:145: key=None value=javassist-3.21.0-GA.jar
[2017-09-24 05:26:41.496126]consumer-20171017:scripts:3:146: key=None value=javax.inject-1.jar
[2017-09-24 05:26:41.535033]consumer-20171017:scripts:3:147: key=None value=argparse4j-0.7.0.jar
Received 13 messages
Consumer 4:
[root@node1 python_app]# python consumer.py
start listen scripts:consumer-20171017
[2017-09-24 05:26:41.076967]consumer-20171017:scripts:0:124: key=None value=slf4j-log4j12-1.7.25.jar
[2017-09-24 05:26:41.131355]consumer-20171017:scripts:0:125: key=None value=scala-library-2.11.11.jar
[2017-09-24 05:26:41.160009]consumer-20171017:scripts:0:126: key=None value=jersey-guava-2.24.jar
[2017-09-24 05:26:41.166156]consumer-20171017:scripts:0:127: key=None value=jackson-annotations-2.8.5.jar
[2017-09-24 05:26:41.177352]consumer-20171017:scripts:0:128: key=None value=jackson-jaxrs-json-provider-2.8.5.jar
[2017-09-24 05:26:41.207782]consumer-20171017:scripts:0:129: key=None value=kafka_2.11-0.11.0.0.jar
[2017-09-24 05:26:41.230674]consumer-20171017:scripts:0:130: key=None value=javax.ws.rs-api-2.0.1.jar
[2017-09-24 05:26:41.230763]consumer-20171017:scripts:0:131: key=None value=validation-api-1.1.0.Final.jar
[2017-09-24 05:26:41.248229]consumer-20171017:scripts:0:132: key=None value=connect-api-0.11.0.0.jar
[2017-09-24 05:26:41.263827]consumer-20171017:scripts:0:133: key=None value=kafka_2.11-0.11.0.0-test.jar
[2017-09-24 05:26:41.308756]consumer-20171017:scripts:0:134: key=None value=jersey-common-2.24.jar
[2017-09-24 05:26:41.316388]consumer-20171017:scripts:0:135: key=None value=kafka_2.11-0.11.0.0-test-sources.jar
[2017-09-24 05:26:41.330062]consumer-20171017:scripts:0:136: key=None value=hk2-api-2.5.0-b05.jar
[2017-09-24 05:26:41.370872]consumer-20171017:scripts:0:137: key=None value=guava-20.0.jar
[2017-09-24 05:26:41.396702]consumer-20171017:scripts:0:138: key=None value=kafka_2.11-0.11.0.0.jar.asc
[2017-09-24 05:26:41.417372]consumer-20171017:scripts:0:139: key=None value=kafka_2.11-0.11.0.0-scaladoc.jar
[2017-09-24 05:26:41.446264]consumer-20171017:scripts:0:140: key=None value=rocksdbjni-5.0.1.jar
[2017-09-24 05:26:41.464423]consumer-20171017:scripts:0:141: key=None value=kafka_2.11-0.11.0.0-javadoc.jar
[2017-09-24 05:26:41.538334]consumer-20171017:scripts:0:142: key=None value=reflections-0.9.11.jar
[2017-09-24 05:26:41.547427]consumer-20171017:scripts:0:143: key=None value=kafka_2.11-0.11.0.0-test-sources.jar.asc
[2017-09-24 05:26:41.562840]consumer-20171017:scripts:0:144: key=None value=jopt-simple-5.0.3.jar
[2017-09-24 05:26:41.594002]consumer-20171017:scripts:0:145: key=None value=javax.inject-2.5.0-b05.jar
Received 22 messages
Consumer 5:
[root@node1 python_app]# python consumer.py
start listen scripts:consumer-20171017
No messages received
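Counting the lines of each consumer's output by hand is error-prone, so here is a small sketch for tallying messages per partition. It assumes the exact line format printed by consumer.py above; the names `LINE_RE` and `tally_partitions` are my own and not part of kafka-python.

```python
import re
from collections import Counter

# Matches the part of a consumer.py log line that follows the timestamp:
# group:topic:partition:offset: key=... value=...
LINE_RE = re.compile(
    r"\](?P<group>[^:]+):(?P<topic>[^:]+):(?P<partition>\d+):(?P<offset>\d+): key="
)

def tally_partitions(lines):
    """Count messages per partition in lines printed by consumer.py."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines such as "start listen ..." are skipped
            counts[int(match.group("partition"))] += 1
    return counts

sample = [
    "start listen scripts:consumer-20171017",
    "[2017-09-24 05:26:41.025240]consumer-20171017:scripts:2:117: key=None value=jetty-security-9.2.15.v20160210.jar",
    "[2017-09-24 05:26:41.084800]consumer-20171017:scripts:2:118: key=None value=kafka-streams-examples-0.11.0.0.jar",
]
print(dict(tally_partitions(sample)))  # prints {2: 2}
```

Feeding each consumer's saved output through `tally_partitions` gives the per-partition counts directly, and summing them across consumers can be compared with the producer's total.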
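OK, let's summarize. The consumer group contains 5 consumers. Four of them received 73 messages in total, exactly the number the producer sent, while the fifth received nothing. Looking closely at each consumer's output, every consumer received data from exactly one of the topic's 4 partitions: partitions 2, 1, 3 and 0 respectively. Because there are only 4 partitions, the fifth consumer was left without a partition and received no data. In my test, adding a fifth partition at this point and continuing to send messages still left the fifth consumer without messages; it started receiving them only after a restart. A consumer can also be told explicitly which partition to read, which makes the mapping deterministic. Otherwise, which partition a consumer reads is decided by the group's partition assignment and is not under the consumer's own control.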
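As a rough sketch of reading an explicitly chosen partition with kafka-python, the consumer can be created without a topic and then given a partition through `assign()`. The function names `validate_partition` and `consume_partition` are hypothetical helpers of mine; the broker address, topic and partition count would be the ones used in this post. I have not run this against the cluster above, so treat it as an illustration rather than tested code.

```python
def validate_partition(partition, partition_count):
    """Return partition if it is a valid index for the topic, else raise ValueError."""
    if not 0 <= partition < partition_count:
        raise ValueError(
            "partition %d is outside 0..%d" % (partition, partition_count - 1)
        )
    return partition

def consume_partition(topic, partition, servers):
    """Read one explicitly chosen partition instead of relying on group assignment."""
    # Imported here so this module can be loaded without kafka-python installed.
    from kafka import KafkaConsumer, TopicPartition

    # No topic is passed to the constructor: the partition is assigned by hand
    # below rather than through a topic subscription.
    consumer = KafkaConsumer(bootstrap_servers=servers)
    consumer.assign([TopicPartition(topic, partition)])
    for msg in consumer:  # blocks, like the loop in consumer.py
        print("%s:%d:%d: value=%s" % (msg.topic, msg.partition, msg.offset, msg.value))
```

For example, `consume_partition("scripts", validate_partition(2, 4), ["192.168.120.11:9092"])` would read only partition 2 of the scripts topic. With manual assignment, each process has to be told its own partition; no group rebalancing happens on its behalf.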
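Over several runs I found that a consumer does not always consume just one partition; at times it consumes several.
I have not yet confirmed why this happens. A likely cause is a rebalance: while the group has fewer active consumers than partitions (for example, before all consumers have joined), one consumer is assigned more than one partition. Either way, it does not matter for load balancing.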
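To make the relationship between group size and partition ownership concrete, here is a small simulation modelled on Kafka's range assignment strategy for a single topic. It is a simplified sketch: the function name `range_assign` and the consumer names are made up, and the assignor actually in use depends on the client configuration.

```python
def range_assign(partition_count, consumers):
    """Split partitions 0..partition_count-1 into contiguous ranges, one per consumer."""
    members = sorted(consumers)
    base, extra = divmod(partition_count, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        size = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + size))
        start += size
    return assignment

for n in (3, 4, 5):
    members = ["consumer-%d" % i for i in range(1, n + 1)]
    print(n, range_assign(4, members))
```

With 4 partitions, 3 members give one consumer two partitions, 4 members give each consumer exactly one, and 5 members leave the last consumer with an empty list. That matches both the idle fifth consumer and the occasional consumer that reads several partitions.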