超可爱慕之

2020.12.02 Class Notes (Kafka Principles and Environment Setup)

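Message-oriented middleware (MQ):

Message middleware is supporting infrastructure software that uses queues and message passing to give application systems reliable synchronous or asynchronous message delivery over a network.
Its two main purposes are peak shaving and decoupling:
Example: a Cainiao Station (parcel pickup point). The courier drops the parcel off at the station, and the recipient collects it later with a verification code received by phone.
Another example: in a restaurant, the chefs, waiters, and dishwashers all put plates in one shared place, and whoever needs them takes them when ready, instead of standing there holding a plate until the other person is free.
Message middleware plays the same role. During the Double 11 (Singles' Day) peak, data is not written straight from the front end into the database; it is first parked in the message middleware, and the database side takes it out when it has capacity. Delivery comes in two forms: push and subscribe.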
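The buffering idea can be sketched with a tiny file-backed FIFO queue. This only illustrates decoupling and is not how Kafka is implemented; `enqueue`, `dequeue`, and `QUEUE_FILE` are hypothetical names introduced for this example.

```shell
# Hypothetical file-backed FIFO queue; QUEUE_FILE is a temporary file.
QUEUE_FILE="$(mktemp)"

# Producer side: append a message and return immediately.
enqueue() {
  printf '%s\n' "$1" >> "$QUEUE_FILE"
}

# Consumer side: print the oldest message and remove it from the queue.
# Returns 1 when the queue is empty.
dequeue() {
  [ -s "$QUEUE_FILE" ] || return 1
  head -n 1 "$QUEUE_FILE"
  tail -n +2 "$QUEUE_FILE" > "$QUEUE_FILE.tmp" && mv "$QUEUE_FILE.tmp" "$QUEUE_FILE"
}

# A burst of three "orders" arrives at once...
for i in 1 2 3; do
  enqueue "order-$i"
done

# ...and the consumer drains them later, one at a time, at its own pace.
while msg="$(dequeue)"; do
  echo "processed $msg"
done
```

The producer never waits for the consumer: it only appends. The consumer sees the messages in arrival order (order-1, order-2, order-3) whenever it gets around to reading them.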
Common message middleware products:
RabbitMQ, RocketMQ, ActiveMQ, Kafka
How they compare:

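Feature	ActiveMQ	RabbitMQ	RocketMQ	Kafka
Single-node throughput	Tens of thousands of messages per second, an order of magnitude below RocketMQ and Kafka	Same as ActiveMQ	Hundreds of thousands per second, supports high throughput	Hundreds of thousands per second, high throughput; typically paired with big-data systems for real-time computation, log collection, and similar scenarios
Effect of topic count on throughput			With hundreds to thousands of topics, throughput drops only slightly; this is a major RocketMQ advantage, since the same hardware can support a large number of topics	As topics grow from tens to hundreds, throughput drops sharply; on the same hardware, keep the number of Kafka topics small, and add machines if a large number of topics is required
Latency	Millisecond level	Microsecond level, a hallmark of RabbitMQ and the lowest latency of the four	Millisecond level	Within the millisecond range
Availability	High; high availability through a master-slave architecture	Same as ActiveMQ	Very high; distributed architecture	Very high; distributed, with multiple replicas of each piece of data, so losing a minority of machines causes neither data loss nor unavailability
Message reliability	Low probability of data loss	Essentially no loss	Zero loss achievable with tuned configuration	Same as RocketMQ
Feature support	Extremely complete MQ feature set	Built on Erlang; strong concurrency, very good performance, very low latency	Fairly complete MQ features, distributed, scales well	Relatively simple, mainly basic MQ features; used at large scale for real-time computation and log collection in big data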

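To sum up the comparison, the recommendations are as follows:

ActiveMQ:
Pros: 1. Very mature and feature-rich; widely used across companies and projects in earlier years
Cons: 1. Not proven in large-scale, high-throughput scenarios (it is mainly used for decoupling and asynchronous processing, rarely for heavy throughput)
2. The community is not very active
3. Occasionally loses messages with low probability (not recommended)

Later, people moved to RabbitMQ:
Pros: 1. Open source, with fairly stable support and an active community
2. Written in Erlang, with very good performance and very low latency
Cons: 1. Erlang keeps most Java engineers from studying and mastering it in depth, so for a company it is nearly out of its control (the source is hard to read, and you can basically only rely on the open-source community to maintain it and fix bugs quickly)
2. RabbitMQ's throughput is indeed lower, because its implementation mechanism is relatively heavyweight

Kafka:
Cons: Kafka's one real weakness is that messages may be consumed more than once, which has a very slight effect on data accuracy; in big-data and log-collection scenarios this effect is negligible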

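For an ordinary business system introducing an MQ: at first everyone used ActiveMQ, but it is rarely used now, since it has not been proven at large-scale throughput and its community is not very active. People then moved to RabbitMQ; Erlang does keep most Java engineers from studying and mastering it in depth, leaving it nearly out of a company's control, but it is open source, has fairly stable support, and has an active community. More and more companies now choose RocketMQ, which is indeed very good, though you should weigh the risk of the community suddenly folding. So for small and mid-sized companies with average technical strength and modest technical challenges, RabbitMQ is a good choice; for large companies with strong infrastructure R&D, RocketMQ is a very good choice.

For real-time computation, log collection, and similar big-data scenarios, Kafka is the industry standard and a safe choice: its community is very active and will not fold, and it is practically the de facto standard for this field worldwide.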

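What is Kafka

Apache Kafka® is a distributed streaming platform. What exactly does that mean?

A streaming platform has three key capabilities:

1. It lets you publish and subscribe to streams of records, much like a message queue or enterprise messaging system.
2. It stores streams of records in a fault-tolerant way.
3. It lets you process streams of records as they occur.

What scenarios is Kafka suited for?

It is used for two broad classes of applications:
Building real-time streaming data pipelines that reliably move data between systems or applications. (the equivalent of a message queue)
Building real-time streaming applications that transform or react to these streams of data. (that is, stream processing: Kafka Streams transforms data from topic to topic inside Kafka)

To understand how Kafka does these things, we will explore its capabilities in depth, starting below.
First, a few concepts:

Kafka runs as a cluster on one or more servers.
Kafka categorizes the streams of records it stores by topic.
Each record consists of a key, a value, and a timestamp.

Kafka has four core APIs:

The Producer API allows an application to publish a stream of records to one or more Kafka topics.
The Consumer API allows an application to subscribe to one or more topics and process the stream of records published to them.
The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more topics, effectively transforming input streams into output streams.
The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database can capture every change to a table.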
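As a rough console analog of what a stream processor does (this is not the Streams API itself), the console consumer and console producer can be chained with a pipe: read one topic, transform each record, write the result to another topic. A minimal sketch, assuming the broker address used in these notes; `uppercase_pipe` and `BROKER` are hypothetical names, and the topic names are supplied by the caller.

```shell
BROKER="192.168.237.100:9092"   # broker address used in these notes

# uppercase_pipe IN_TOPIC OUT_TOPIC (hypothetical helper)
# Reads IN_TOPIC from the beginning, upper-cases every value, and writes the
# result to OUT_TOPIC. The consumer exits after 5 seconds without new messages.
uppercase_pipe() {
  kafka-console-consumer.sh --bootstrap-server "$BROKER" --topic "$1" \
      --from-beginning --timeout-ms 5000 \
    | tr '[:lower:]' '[:upper:]' \
    | kafka-console-producer.sh --broker-list "$BROKER" --topic "$2"
}
```

It would be called as `uppercase_pipe kb09demo kb09demo_upper`, where the second topic name is only an example. A real Streams application adds what a pipe cannot: state, fault tolerance, and parallelism across partitions.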

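In Kafka, clients and servers communicate over a simple, high-performance, language-agnostic TCP protocol. The protocol is versioned and backward compatible with older versions. Kafka ships a Java client, and clients are available in many other languages.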

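Kafka configuration:

Configuration items to change:
Find the server.properties file in the config directory.
Three parameters must be set before starting the Kafka service: broker.id, log.dirs, zookeeper.connect

[root@hadoop100 config]# vi server.properties
The number at the start of each line is the approximate line number in the file:
22 broker.id=0
# listeners format: <protocol>://<internal IP>:<port>; 9092 is Kafka's default listening port
38 advertised.listeners=PLAINTEXT://192.168.237.100:9092
# Where Kafka stores its log (message data) files; the default is under /tmp, where data is easily lost
62 log.dirs=/opt/kafka211/kafka-logs
125 zookeeper.connect=192.168.237.100:2181
# Topics can only be deleted when this is true (the default was false before Kafka 1.0); use false in real development environments
140 delete.topic.enable=true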
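Before starting the broker, it can help to confirm that the three required settings are actually present and uncommented in the real server.properties (which has no line-number prefixes). A minimal sketch; `check_required_props` is a hypothetical helper written for these notes, not a Kafka tool.

```shell
# check_required_props FILE (hypothetical helper)
# Prints "missing: <key>" for each required setting that has no uncommented
# key=value line in FILE, and returns non-zero if any is missing.
check_required_props() {
  file="$1"
  status=0
  for key in broker.id log.dirs zookeeper.connect; do
    if ! awk -F= -v k="$key" '
          { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
          name == k && $2 != "" { found = 1 }
          END { exit !found }' "$file"; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}
```

For the setup in these notes it would be run as `check_required_props /opt/kafka211/config/server.properties`. A line such as `#zookeeper.connect=...` counts as missing, because the key name then starts with `#`.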

Start Kafka:

[root@hadoop100 opt]# kafka-server-start.sh /opt/kafka211/config/server.properties
[root@hadoop100 logs]# /opt/kafka211/bin/kafka-server-start.sh /opt/server.properties >> /var/kafka.log 2>&1 &

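The environment variables are configured here, which is why kafka-server-start.sh can be called directly.
Add to the environment variables:

vi /etc/profile to edit the environment variable file:
export KAFKA_HOME=/opt/kafka211
export PATH=$PATH:$KAFKA_HOME/bin
Run source afterwards for the change to take effect:
source /etc/profile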
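Sourcing a profile more than once appends the same directory to PATH each time if the entry is added unconditionally. A small guard avoids that. This is a sketch, assuming the install path from these notes; `add_kafka_to_path` is a hypothetical helper name.

```shell
export KAFKA_HOME=/opt/kafka211   # install path used in these notes

# add_kafka_to_path (hypothetical helper): append $KAFKA_HOME/bin to PATH
# only if it is not already one of the PATH entries.
add_kafka_to_path() {
  case ":$PATH:" in
    *":$KAFKA_HOME/bin:"*) ;;                     # already present, do nothing
    *) export PATH="$PATH:$KAFKA_HOME/bin" ;;
  esac
}

add_kafka_to_path
add_kafka_to_path   # the second call leaves PATH unchanged
```

Wrapping PATH in colons before matching makes the check compare whole entries, so a longer path that merely contains `/opt/kafka211/bin` as a prefix is not mistaken for it.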

Once started, Kafka blocks the terminal window. To start it in the background, add -daemon:

[root@hadoop100 opt]# kafka-server-start.sh -daemon /opt/kafka211/config/server.properties

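Typing the full path of the Kafka configuration file every time is tedious; a symbolic link helps:

# Create the symbolic link
ln -s /opt/kafka211/config/server.properties /opt/server.properties
# Start Kafka
[root@hadoop100 opt]# kafka-server-start.sh -daemon /opt/server.properties
# Option 2: run in the background and append stdout and stderr to a log file
[root@hadoop100 logs]# /opt/kafka211/bin/kafka-server-start.sh /opt/server.properties >> /var/kafka.log 2>&1 &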
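When a broker is started in the background with its output sent to a log file, the order of the two redirections decides whether stderr reaches the file. A minimal sketch, using a stand-in function `noisy` (hypothetical, not part of Kafka) and temporary files in place of /var/kafka.log:

```shell
# A stand-in for a server process: one line to stdout, one line to stderr.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

log_a="$(mktemp)"
log_b="$(mktemp)"

# Order 1: stderr is pointed at the current stdout (the terminal) first,
# then stdout alone is redirected to the file. stderr never reaches the file.
noisy 2>&1 >> "$log_a"

# Order 2: stdout goes to the file first, then stderr is pointed at it too.
noisy >> "$log_b" 2>&1

wc -l < "$log_a"   # one line: stdout only
wc -l < "$log_b"   # two lines: stdout and stderr
```

Redirections are processed left to right, so `>> file 2>&1` captures both streams, while `2>&1 >> file` leaves error messages on the terminal.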

List topics:

[root@hadoop100 opt]# kafka-topics.sh --zookeeper 192.168.237.100:2181 --list
Describe a specific topic:
[root@hadoop100 opt]# kafka-topics.sh --zookeeper 192.168.237.100:2181 --topic kb09demo --describe

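Create a topic

# --create creates a topic; the other actions are --delete, --describe, and --alter
# --zookeeper is the ZooKeeper that Kafka connects to, --topic is the topic name
# --partitions sets the number of partitions
# --replication-factor sets the number of replicas and must not exceed the number of brokers
[root@hadoop100 opt]# kafka-topics.sh --create \
  --zookeeper 192.168.237.100:2181 \
  --topic kb09demo \
  --partitions 1 \
  --replication-factor 1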
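The rule that the replication factor must not exceed the number of brokers can be checked before the command is sent. A minimal sketch: `create_topic` and `ZK` are hypothetical names, the broker count is supplied by the caller rather than discovered, and `--if-not-exists` (listed in the script's help output) makes the call safe to repeat.

```shell
ZK="192.168.237.100:2181"   # ZooKeeper address used in these notes

# create_topic NAME PARTITIONS REPLICAS BROKER_COUNT (hypothetical helper)
# Refuses to run when REPLICAS is larger than BROKER_COUNT.
create_topic() {
  name="$1"; partitions="$2"; replicas="$3"; brokers="$4"
  if [ "$replicas" -gt "$brokers" ]; then
    echo "replication factor $replicas exceeds broker count $brokers" >&2
    return 1
  fi
  kafka-topics.sh --create --if-not-exists --zookeeper "$ZK" \
    --topic "$name" --partitions "$partitions" --replication-factor "$replicas"
}
```

On the single-node setup in these notes the call would be `create_topic kb09demo 1 1 1`; asking for 3 replicas with 1 broker fails locally without contacting the cluster.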

Delete a topic:

[root@hadoop100 opt]# kafka-topics.sh --delete --topic kb09demo --zookeeper 192.168.237.100:2181
Topic kb09demo is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
# delete.topic.enable must be set to true in the configuration

Produce messages with the console producer:

[root@hadoop100 opt]# kafka-console-producer.sh --topic kb09demo --broker-list 192.168.237.100:9092

Receive messages with the console consumer

[root@hadoop100 ~]# kafka-console-consumer.sh --topic kb09demo --bootstrap-server 192.168.237.100:9092 --from-beginning
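Since every record carries a key, a value, and a timestamp, the console tools can be asked to send and show all three. A minimal sketch using the `--property` options that appear in the scripts' help output; `produce_keyed`, `consume_keyed`, and `BROKER` are hypothetical names, and the `:` separator is an arbitrary choice.

```shell
BROKER="192.168.237.100:9092"   # broker address used in these notes

# produce_keyed TOPIC (hypothetical helper): read "key:value" lines from stdin
# and send each as a record whose key is the part before the first ":".
produce_keyed() {
  kafka-console-producer.sh --broker-list "$BROKER" --topic "$1" \
    --property parse.key=true --property key.separator=:
}

# consume_keyed TOPIC (hypothetical helper): print timestamp, key, and value
# of every record, starting from the earliest one.
consume_keyed() {
  kafka-console-consumer.sh --bootstrap-server "$BROKER" --topic "$1" \
    --from-beginning \
    --property print.timestamp=true --property print.key=true \
    --property key.separator=:
}
```

For example, `printf 'user1:login\nuser2:logout\n' | produce_keyed kb09demo` sends two keyed records, and `consume_keyed kb09demo` in another terminal prints them with their timestamps and keys.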

Listing, creating, and deleting topics all use the kafka-topics.sh script:

[root@hadoop100 bin]# kafka-topics.sh --help
Command must include exactly one action: --list, --describe, --create, --alter or --delete
Option                                   Description
------                                   -----------
--alter                                  Alter the number of partitions,
                                           replica assignment, and/or
                                           configuration for the topic.
--config <String: name=value>            A topic configuration override for the
                                           topic being created or altered.The
                                           following is a list of valid
                                           configurations:
                                                cleanup.policy                  
                                                compression.type                
                                                delete.retention.ms             
                                                file.delete.delay.ms            
                                                flush.messages                  
                                                flush.ms                        
                                                follower.replication.throttled. 
                                           replicas
                                                index.interval.bytes            
                                                leader.replication.throttled.replicas
                                                max.message.bytes               
                                                message.downconversion.enable   
                                                message.format.version          
                                                message.timestamp.difference.max.ms
                                                message.timestamp.type          
                                                min.cleanable.dirty.ratio       
                                                min.compaction.lag.ms           
                                                min.insync.replicas             
                                                preallocate                     
                                                retention.bytes                 
                                                retention.ms                    
                                                segment.bytes                   
                                                segment.index.bytes             
                                                segment.jitter.ms               
                                                segment.ms                      
                                                unclean.leader.election.enable  
                                         See the Kafka documentation for full
                                           details on the topic configs.
--create                                 Create a new topic.
--delete                                 Delete a topic
--delete-config <String: name>           A topic configuration override to be
                                           removed for an existing topic (see
                                           the list of configurations under the
                                           --config option).
--describe                               List details for the given topics.
--disable-rack-aware                     Disable rack aware replica assignment
--force                                  Suppress console prompts
--help                                   Print usage information.
--if-exists                              if set when altering or deleting
                                           topics, the action will only execute
                                           if the topic exists
--if-not-exists                          if set when creating topics, the
                                           action will only execute if the
                                           topic does not already exist
--list                                   List all available topics.
--partitions <Integer: # of partitions>  The number of partitions for the topic
                                           being created or altered (WARNING:
                                           If partitions are increased for a
                                           topic that has a key, the partition
                                           logic or ordering of the messages
                                           will be affected
--replica-assignment <String:            A list of manual partition-to-broker
  broker_id_for_part1_replica1 :           assignments for the topic being
  broker_id_for_part1_replica2 ,           created or altered.
  broker_id_for_part2_replica1 :
  broker_id_for_part2_replica2 , ...>
--replication-factor <Integer:           The replication factor for each
  replication factor>                      partition in the topic being created.
--topic <String: topic>                  The topic to be create, alter or
                                           describe. Can also accept a regular
                                           expression except for --create option
--topics-with-overrides                  if set when describing topics, only
                                           show topics that have overridden
                                           configs
--unavailable-partitions                 if set when describing topics, only
                                           show partitions whose leader is not
                                           available
--under-replicated-partitions            if set when describing topics, only
                                           show under replicated partitions
--zookeeper <String: hosts>              REQUIRED: The connection string for
                                           the zookeeper connection in the form
                                           host:port. Multiple hosts can be
                                           given to allow fail-over.

Producer script: kafka-console-producer.sh

[root@hadoop100 bin]# kafka-console-producer.sh
Read data from standard input and publish it to Kafka.
Option                                   Description
------                                   -----------
--batch-size <Integer: size>             Number of messages to send in a single
                                           batch if they are not being sent
                                           synchronously. (default: 200)
--broker-list <String: broker-list>      REQUIRED: The broker list string in
                                           the form HOST1:PORT1,HOST2:PORT2.
--compression-codec [String:             The compression codec: either 'none',
  compression-codec]                       'gzip', 'snappy', or 'lz4'.If
                                           specified without value, then it
                                           defaults to 'gzip'
--line-reader <String: reader_class>     The class name of the class to use for
                                           reading lines from standard in. By
                                           default each line is read as a
                                           separate message. (default: kafka.
                                           tools.
                                           ConsoleProducer$LineMessageReader)
--max-block-ms <Long: max block on       The max time that the producer will
  send>                                    block for during a send request
                                           (default: 60000)
--max-memory-bytes <Long: total memory   The total memory used by the producer
  in bytes>                                to buffer records waiting to be sent
                                           to the server. (default: 33554432)
--max-partition-memory-bytes <Long:      The buffer size allocated for a
  memory in bytes per partition>           partition. When records are received
                                           which are smaller than this size the
                                           producer will attempt to
                                           optimistically group them together
                                           until this size is reached.
                                           (default: 16384)
--message-send-max-retries <Integer>     Brokers can fail receiving the message
                                           for multiple reasons, and being
                                           unavailable transiently is just one
                                           of them. This property specifies the
                                           number of retires before the
                                           producer give up and drop this
                                           message. (default: 3)
--metadata-expiry-ms <Long: metadata     The period of time in milliseconds
  expiration interval>                     after which we force a refresh of
                                           metadata even if we haven't seen any
                                           leadership changes. (default: 300000)
--producer-property <String:             A mechanism to pass user-defined
  producer_prop>                           properties in the form key=value to
                                           the producer.
--producer.config <String: config file>  Producer config properties file. Note
                                           that [producer-property] takes
                                           precedence over this config.
--property <String: prop>                A mechanism to pass user-defined
                                           properties in the form key=value to
                                           the message reader. This allows
                                           custom configuration for a user-
                                           defined message reader.
--request-required-acks <String:         The required acks of the producer
  request required acks>                   requests (default: 1)
--request-timeout-ms <Integer: request   The ack timeout of the producer
  timeout ms>                              requests. Value must be non-negative
                                           and non-zero (default: 1500)
--retry-backoff-ms <Integer>             Before each retry, the producer
                                           refreshes the metadata of relevant
                                           topics. Since leader election takes
                                           a bit of time, this property
                                           specifies the amount of time that
                                           the producer waits before refreshing
                                           the metadata. (default: 100)
--socket-buffer-size <Integer: size>     The size of the tcp RECV size.
                                           (default: 102400)
--sync                                   If set message send requests to the
                                           brokers are synchronously, one at a
                                           time as they arrive.
--timeout <Integer: timeout_ms>          If set and the producer is running in
                                           asynchronous mode, this gives the
                                           maximum amount of time a message
                                           will queue awaiting sufficient batch
                                           size. The value is given in ms.
                                           (default: 1000)
--topic <String: topic>                  REQUIRED: The topic id to produce
                                           messages to.

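What each parameter means:

Parameter	Value type	Description	Valid values
--bootstrap-server	String	Servers to connect to (added in kafka_2.12-2.5.0 and later); required unless --broker-list is specified	Form: host1:port1,host2:port2
--topic	String	(Required) Name of the topic that receives the messages
--broker-list	String	(Before kafka_2.12-2.5.0) Servers to connect to	Form: host1:port1,host2:port2
--batch-size	Integer	Number of messages sent in a single batch	200 (default)
--compression-codec	String	Compression codec	none, gzip (default when the option is given without a value), snappy, lz4, zstd
--max-block-ms	Long	Maximum time the producer will block during a send request	60000 (default)
--max-memory-bytes	Long	Total memory the producer uses to buffer records waiting to be sent to the server	33554432 (default)
--max-partition-memory-bytes	Long	Buffer size allocated for a partition	16384
--message-send-max-retries	Integer	Maximum number of send retries	3
--metadata-expiry-ms	Long	Interval after which a metadata refresh is forced (ms)	300000
--producer-property	String	Mechanism for passing user-defined properties to the producer	Form: key=value
--producer.config	String	Producer config properties file; [--producer-property] takes precedence over this config	Full path to the config file
--property	String	User-defined properties for the message reader	parse.key=true|false key.separator= ignore.error=true
--request-required-acks	String	Acknowledgement mode for producer requests	0, 1 (default), all
--request-timeout-ms	Integer	Ack timeout for producer requests	1500 (default)
--retry-backoff-ms	Integer	Time the producer waits before refreshing metadata ahead of a retry	100 (default)
--socket-buffer-size	Integer	TCP receive buffer size	102400 (default)
--timeout	Integer	Maximum time a message queues waiting for a full batch in asynchronous mode	1000 (default)
--sync		Send messages synchronously
--version		Show the Kafka version	Without other arguments, shows the local Kafka version
--help		Print usage information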
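Several of these options are typically combined when delivery matters more than speed. A minimal sketch, assuming the broker address from these notes; `produce_acked` and `BROKER` are hypothetical names, and the retry count of 5 is an arbitrary illustration.

```shell
BROKER="192.168.237.100:9092"   # broker address used in these notes

# produce_acked TOPIC (hypothetical helper): send stdin lines one at a time,
# wait for acknowledgement from all in-sync replicas, and retry up to 5 times.
produce_acked() {
  kafka-console-producer.sh --broker-list "$BROKER" --topic "$1" \
    --sync --request-required-acks all --message-send-max-retries 5
}
```

It would be used as `echo "important event" | produce_acked kb09demo`. With `--request-required-acks 0` the producer would not wait for any acknowledgement, which is faster but can lose messages silently.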

Consumer script: kafka-console-consumer.sh

[root@hadoop100 bin]# kafka-console-consumer.sh
The console consumer is a tool that reads data from Kafka and outputs it to standard output.
Option                                   Description
------                                   -----------
--bootstrap-server <String: server to    REQUIRED: The server(s) to connect to.
  connect to>
--consumer-property <String:             A mechanism to pass user-defined
  consumer_prop>                           properties in the form key=value to
                                           the consumer.
--consumer.config <String: config file>  Consumer config properties file. Note
                                           that [consumer-property] takes
                                           precedence over this config.
--enable-systest-events                  Log lifecycle events of the consumer
                                           in addition to logging consumed
                                           messages. (This is specific for
                                           system tests.)
--formatter <String: class>              The name of a class to use for
                                           formatting kafka messages for
                                           display. (default: kafka.tools.
                                           DefaultMessageFormatter)
--from-beginning                         If the consumer does not already have
                                           an established offset to consume
                                           from, start with the earliest
                                           message present in the log rather
                                           than the latest message.
--group <String: consumer group id>      The consumer group id of the consumer.
--isolation-level <String>               Set to read_committed in order to
                                           filter out transactional messages
                                           which are not committed. Set to
                                           read_uncommittedto read all
                                           messages. (default: read_uncommitted)
--key-deserializer <String:
  deserializer for key>
--max-messages <Integer: num_messages>   The maximum number of messages to
                                           consume before exiting. If not set,
                                           consumption is continual.
--offset <String: consume offset>        The offset id to consume from (a non-
                                           negative number), or 'earliest'
                                           which means from beginning, or
                                           'latest' which means from end
                                           (default: latest)
--partition <Integer: partition>         The partition to consume from.
                                           Consumption starts from the end of
                                           the partition unless '--offset' is
                                           specified.
--property <String: prop>                The properties to initialize the
                                           message formatter. Default
                                           properties include:
                                                print.timestamp=true|false
                                                print.key=true|false
                                                print.value=true|false
                                                key.separator=<key.separator>
                                                line.separator=<line.separator>
                                                key.deserializer=<key.deserializer>
                                                value.deserializer=<value.
                                           deserializer>
                                         Users can also pass in customized
                                           properties for their formatter; more
                                           specifically, users can pass in
                                           properties keyed with 'key.
                                           deserializer.' and 'value.
                                           deserializer.' prefixes to configure
                                           their deserializers.
--skip-message-on-error                  If there is an error when processing a
                                           message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms>       If specified, exit if no message is
                                           available for consumption for the
                                           specified interval.
--topic <String: topic>                  The topic id to consume on.
--value-deserializer <String:
  deserializer for values>
--whitelist <String: whitelist>          Whitelist of topics to include for
                                           consumption.
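The options above can be combined to take a quick look at the start of a topic without leaving a consumer running. A minimal sketch, assuming the broker address from these notes and that partition 0 is the one of interest; `peek_topic` and `BROKER` are hypothetical names.

```shell
BROKER="192.168.237.100:9092"   # broker address used in these notes

# peek_topic TOPIC COUNT (hypothetical helper): print the first COUNT records
# of partition 0, then exit; also exit after 5 seconds without a new message.
peek_topic() {
  kafka-console-consumer.sh --bootstrap-server "$BROKER" --topic "$1" \
    --partition 0 --offset earliest \
    --max-messages "$2" --timeout-ms 5000
}
```

For example, `peek_topic kb09demo 10` shows at most ten records. `--offset` is used together with `--partition` here, as the help text for `--partition` describes.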