Recently we ran into a log file backlog at a customer site. The rough processing flow for the log files is: read the log file -> structure the entries -> send them to Kafka.
Take a 7.57 MB log file (about 58,400 log entries) as an example: the program needed 16.8 s to finish it, which works out to only about 3,500 records per second. After some investigation, the bottleneck turned out to be the step that sends data to Kafka. From experience this is clearly nowhere near Kafka's own limits, so the problem had to be in our own producer code. A quick look at the numbers on the Kafka site confirms it:
> Single producer thread, 3x asynchronous replication
> 786,980 records/sec (75.1 MB/sec)
Compared with those official figures, there was nothing to do but go looking for problems in our own program.
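For reference, the sending part of the original program looks roughly like the sketch below, with every producer setting left at its default. This is a simplified reconstruction, not the actual code: the broker address, topic name and parsing step are placeholders, and it assumes the kafka-python client (which is where the underscore-style parameter names used later come from).

```python
import json
import time

from kafka import KafkaProducer  # assumes the kafka-python client

# Producer with everything at its defaults (batch_size=16384, linger_ms=0,
# acks=1, no compression) -- the "before" configuration being measured.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def parse_line(line):
    # Placeholder for the real structuring step: turn a raw log line
    # into a dict of fields.
    return {"raw": line.rstrip("\n")}

start = time.time()
with open("app.log", encoding="utf-8") as f:  # placeholder log file
    for line in f:
        producer.send("log-topic", value=parse_line(line))  # placeholder topic
producer.flush()  # block until every buffered record has been sent
print(f"processed in {time.time() - start:.1f}s")
```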
From experience, the first thing to try is tuning the client-side parameters. Below I'll walk through the important Kafka producer configuration options, then give the tuning plan and the final results.
First up is batch.size. From the official docs:

> The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes.
>
> No attempt will be made to batch records larger than this size.
>
> Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent.
>
> A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records.
>
> Type: int
> Default: 16384
> Valid Values: [0,...]
> Importance: medium
Kafka isn't naive enough to send every record one at a time; the producer buffers records and sends them in batches, and this parameter controls the upper limit (in bytes) on how much data goes into each batch.
Next, linger.ms, again from the official docs:

> The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay; that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get batch.size worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting linger.ms=5, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load.
>
> Type: long
> Default: 0
> Valid Values: [0,...]
> Importance: medium
The producer buffers the data waiting to be sent, and this setting controls the maximum time a record may sit in that buffer before it is sent.
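Putting the two together: a batch for a given partition is shipped as soon as it reaches batch.size bytes, or once linger.ms milliseconds have elapsed, whichever happens first. A minimal sketch assuming the kafka-python client, with illustrative values and a placeholder broker and topic:

```python
from kafka import KafkaProducer

# A partition batch is sent when EITHER condition is met:
#   - it has accumulated batch_size bytes, or
#   - linger_ms milliseconds have passed since the first record was buffered.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder
    batch_size=64 * 1024,  # let a batch grow to 64 KB ...
    linger_ms=100,         # ... but never hold records longer than 100 ms
)

for i in range(10_000):
    # send() returns immediately; records are buffered and shipped in
    # batches by a background I/O thread.
    producer.send("log-topic", f"message-{i}".encode("utf-8"))

producer.flush()
```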
Then acks:

> The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. The following settings are allowed:
>
> acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
>
> acks=1 This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.
>
> acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting.
>
> Type: string
> Default: 1
> Valid Values: [all, -1, 0, 1]
> Importance: high
This parameter controls whether the producer waits for acknowledgment from the server after sending; it feels a bit like the difference between UDP and TCP.
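In kafka-python (assumed here, as elsewhere) the trade-off is just the acks argument: with acks=0 a send is fire-and-forget, while with acks='all' you can block on the returned future until every in-sync replica has the record. Broker and topic names below are placeholders:

```python
from kafka import KafkaProducer

# acks=0: fire-and-forget, UDP-like -- fastest, but lost records go unnoticed.
fast_producer = KafkaProducer(bootstrap_servers="localhost:9092", acks=0)
fast_producer.send("log-topic", b"a log line we can afford to lose")

# acks='all': wait for all in-sync replicas -- slowest, strongest guarantee.
safe_producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")
future = safe_producer.send("log-topic", b"a record we must not lose")
metadata = future.get(timeout=10)  # blocks until acknowledged, or raises
print(metadata.topic, metadata.partition, metadata.offset)

fast_producer.flush()
safe_producer.flush()
```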
Finally, compression.type:

> Specify the final compression type for a given topic. This configuration accepts the standard compression codecs ('gzip', 'snappy', 'lz4', 'zstd'). It additionally accepts 'uncompressed' which is equivalent to no compression; and 'producer' which means retain the original compression codec set by the producer.
>
> Type: string
> Default: producer
> Valid Values:
> Importance: high
> Update Mode: cluster-wide
This parameter matters a lot! We already know the producer buffers the data it needs to send, so can those batches be compressed before they go out? The answer is yes, and this parameter controls which compression algorithm is used. (Strictly speaking, the snippet above describes the topic/broker-level compression.type; the producer has its own compression.type setting, which defaults to none, and that is the one we tune here.) I'm not sure why Kafka doesn't compress by default; it's a bit of a pitfall, and I'll dig into the reason when I have time.
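Log text is extremely repetitive, so it compresses very well, and since compression is applied per batch, larger batches compress even better. A quick back-of-the-envelope check with Python's standard gzip module (the sample line is made up) gives a feel for the savings:

```python
import gzip

# A made-up but realistically repetitive batch of structured log lines.
line = b'{"ts":"2024-01-01T00:00:00","level":"INFO","msg":"request handled","latency_ms":12}\n'
batch = line * 1000  # pretend this is one producer batch

compressed = gzip.compress(batch)
print(len(batch), "->", len(compressed),
      f"({len(compressed) / len(batch):.1%} of the original size)")
```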
Here is the tuning plan first:

- batch_size = 563840 (default: 16384)
- linger_ms = 30000 (default: 0)
- acks = 0 (default: 1)
- compression_type = "gzip" (default: None)
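In kafka-python terms the tuned producer ends up looking roughly like this; the broker address and serializer are placeholders, and only the four settings above differ from their defaults:

```python
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    batch_size=563840,        # up from 16384: much larger batches
    linger_ms=30000,          # up from 0: allow up to 30 s of buffering
    acks=0,                   # down from 1: don't wait for broker acknowledgment
    compression_type="gzip",  # default is None: gzip each batch before sending
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
```

Keep the trade-off in mind: with linger_ms=30000 a record can sit in the client buffer for up to 30 seconds before it is sent, and with acks=0 the client never learns about records the broker failed to receive. For log data that is an acceptable price for the extra throughput.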
Here is the thinking behind the tuning:

1. Moderately increase batch.size and linger.ms.
   These two parameters are used together; the goal is to buffer more data and reduce the number of requests the client has to make. Adjust them to your actual workload, and don't go overboard.
2. Turn off the send-acknowledgment mechanism.
   The inspiration comes from UDP. This suits scenarios that are not strict about data integrity, such as logs, where losing a few records doesn't matter.
3. Specify a compression algorithm for outgoing data.
   This is the big move of this round of tuning: have Kafka compress the batched data with gzip before sending it.
| Program | File size | Records per file | Number of files | Total time | Avg time per file |
|---|---|---|---|---|---|
| Original | 7.57 MB | 58,400 | 20 | 336 s | 16.8 s |
| Tuned | 7.57 MB | 58,400 | 20 | 207 s | 10.3 s |
The result is quite pleasing: under identical conditions the tuned program finished in 207 s instead of 336 s, cutting the processing time by nearly 40%.
After tuning, the program's throughput went from about 3,500 records/sec to about 5,700 records/sec, an increase of over 60%. That is still a far cry from the figures Kafka publishes (the machine, the Kafka cluster, the network I/O and so on are all different), but it is a very respectable improvement.