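Original article. When reposting, please credit: reposted from 始终不够
Link to this article: flume: an upgraded roll file sink that supports renaming and moving files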
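Flume is a log collection system. There are plenty of articles online about how to use it, and the official user guide is quite detailed, so I won't repeat that here. This post focuses on the shortcomings of Flume's built-in roll file sink and on an upgraded version that addresses them.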
Flume's built-in roll file sink persists events to files on the local filesystem. Flume provides only the following parameters for configuring it:
Property Name | Default | Description
--- | --- | ---
channel | – |
type | – | The component type name, needs to be file_roll.
sink.directory | – | The directory where files will be stored
sink.rollInterval | 30 | Roll the file every 30 seconds. Specifying 0 will disable rolling and cause all events to be written to a single file.
sink.serializer | TEXT | Other possible options include avro_event or the FQCN of an implementation of the EventSerializer.Builder interface.
batchSize | 100 |
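The sketch below makes the rolling behaviour in the table concrete. It is a simplified model that does not depend on Flume. It assumes a mechanism similar to the built-in sink's: a timer only raises a flag every sink.rollInterval seconds, and the actual roll happens on the next write. The class RollingWriterSketch, its requestRoll() hook and the <timestamp>-<index> file names are illustrative assumptions, not Flume's API. Note what happens at a roll: the old file is simply closed. It keeps its name and stays in the same directory as the file still being written.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, Flume-free model of a time-based rolling file writer.
class RollingWriterSketch implements AutoCloseable {
    private final Path directory;                 // plays the role of sink.directory
    private final long seriesTimestamp = System.currentTimeMillis();
    private final ScheduledExecutorService rollService;
    private volatile boolean shouldRotate = false;
    private int fileIndex = 0;
    private BufferedWriter current;               // stream of the file being written
    private Path currentPath;

    RollingWriterSketch(Path directory, long rollIntervalSeconds) {
        this.directory = directory;
        if (rollIntervalSeconds > 0) {
            // The timer only raises a flag; the roll itself happens on the next write.
            rollService = Executors.newSingleThreadScheduledExecutor();
            rollService.scheduleAtFixedRate(() -> shouldRotate = true,
                    rollIntervalSeconds, rollIntervalSeconds, TimeUnit.SECONDS);
        } else {
            rollService = null;                   // 0 disables rolling: a single file
        }
    }

    // Illustrative hook: trigger a roll without waiting for the timer.
    void requestRoll() {
        shouldRotate = true;
    }

    synchronized void write(String line) throws IOException {
        if (shouldRotate) {
            shouldRotate = false;
            if (current != null) {
                // The finished file is closed, but nothing in its name or location
                // marks it as finished.
                current.close();
                current = null;
            }
        }
        if (current == null) {
            fileIndex++;
            currentPath = directory.resolve(seriesTimestamp + "-" + fileIndex);
            current = Files.newBufferedWriter(currentPath, StandardCharsets.UTF_8);
        }
        current.write(line);
        current.newLine();
        current.flush();                          // simplification: flush every line
    }

    synchronized Path currentFile() {
        return currentPath;
    }

    @Override
    public synchronized void close() throws IOException {
        if (rollService != null) {
            rollService.shutdownNow();
        }
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

In the sketch, a consumer scanning the directory has no way to tell from the file name alone which file is still open.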
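In some scenarios we want to process logs as a file queue, with downstream programs picking up and processing the files the sink has written. In such scenarios, the roll file sink exposes the following two problems: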
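Of course, the last file is normally the one still being written. If you can guarantee that your program never processes the last file, you don't need to solve the two problems above.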
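The workaround mentioned above is to never touch the last file. On the consumer side it might look like the following. CompletedFileLister is a hypothetical helper and is not part of Flume. It assumes that the newest file by last-modified time is the one still open. That heuristic breaks down when two files share a timestamp at the filesystem's time resolution, which is one reason it is fragile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical consumer-side helper (not part of Flume).
final class CompletedFileLister {
    private CompletedFileLister() {}

    // Returns the regular files in sinkDirectory, oldest first, minus the newest
    // one, which is assumed to still be open for writing.
    static List<Path> completedFiles(Path sinkDirectory) throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(sinkDirectory)) {
            files = stream.filter(Files::isRegularFile)
                    .collect(Collectors.toCollection(ArrayList::new));
        }
        // Order by last-modified time (assumption: it reflects write order).
        files.sort(Comparator.comparingLong(CompletedFileLister::lastModifiedMillis));
        if (files.isEmpty()) {
            return files;
        }
        return new ArrayList<>(files.subList(0, files.size() - 1));
    }

    private static long lastModifiedMillis(Path file) {
        try {
            return Files.getLastModifiedTime(file).toMillis();
        } catch (IOException e) {
            // Comparators cannot throw checked exceptions.
            throw new UncheckedIOException(e);
        }
    }
}
```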
To address these two problems, I upgraded Flume's roll file sink as follows:
```properties
# Flume client configuration (on the load machine)
# Component names
client.sources = source_client
client.sinks = sink_client
client.channels = channel_client
# Source: spooling directory (watches the directory for new files)
client.sources.source_client.type = spooldir
client.sources.source_client.channels = channel_client
client.sources.source_client.spoolDir = /some_logs_dir/
client.sources.source_client.fileHeader = true
# Sink configuration
client.sinks.sink_client.type = cn.huyanping.flume.sinks.SafeRollingFileSink
client.sinks.sink_client.channel = channel_client
client.sinks.sink_client.sink.directory = /data/source
client.sinks.sink_client.sink.moveFile = true
client.sinks.sink_client.sink.targetDirectory = /data/target
client.sinks.sink_client.sink.rollInterval = 1
client.sinks.sink_client.sink.useFileSuffix = true
client.sinks.sink_client.sink.fileSuffix = .COMPLETED
client.sinks.sink_client.sink.useCopy = true
client.sinks.sink_client.sink.copyDirectory = /data/copy1,/data/copy2
# File channel settings
client.channels.channel_client.type = file
client.channels.channel_client.checkpointDir = /data/tmp/checkpoint
client.channels.channel_client.dataDirs = /data/tmp
```
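The configuration shows only the options. Below is a rough sketch of a post-roll step those options might drive. Once a rolled file is closed, the step copies it into each copyDirectory. It then renames the file with fileSuffix and, if moveFile is set, moves it into targetDirectory. The meaning assigned to each option is inferred from the option names, and RolledFileFinisher and parseDirs are hypothetical. The actual SafeRollingFileSink in the project repository may work differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-roll step; the option semantics are inferred from the
// configuration keys (moveFile, targetDirectory, useFileSuffix, fileSuffix,
// useCopy, copyDirectory), not taken from SafeRollingFileSink's source.
final class RolledFileFinisher {
    private final boolean moveFile;
    private final Path targetDirectory;
    private final boolean useFileSuffix;
    private final String fileSuffix;
    private final boolean useCopy;
    private final List<Path> copyDirectories;

    RolledFileFinisher(boolean moveFile, Path targetDirectory, boolean useFileSuffix,
                       String fileSuffix, boolean useCopy, List<Path> copyDirectories) {
        this.moveFile = moveFile;
        this.targetDirectory = targetDirectory;
        this.useFileSuffix = useFileSuffix;
        this.fileSuffix = fileSuffix;
        this.useCopy = useCopy;
        this.copyDirectories = new ArrayList<>(copyDirectories);
    }

    // Splits a value such as "/data/copy1,/data/copy2" into directories.
    static List<Path> parseDirs(String commaSeparated) {
        List<Path> dirs = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(Paths.get(trimmed));
            }
        }
        return dirs;
    }

    // Call only after the rolled file's stream has been closed.
    // Returns the file's final location.
    Path finish(Path closedFile) throws IOException {
        String finalName = closedFile.getFileName().toString()
                + (useFileSuffix ? fileSuffix : "");
        if (useCopy) {
            for (Path dir : copyDirectories) {
                // Copy under a hidden temporary name, then rename, so readers of the
                // copy directory never see a partially copied file.
                Path tmp = dir.resolve("." + finalName + ".tmp");
                Files.copy(closedFile, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, dir.resolve(finalName), StandardCopyOption.ATOMIC_MOVE);
            }
        }
        Path destination = (moveFile ? targetDirectory : closedFile.getParent())
                .resolve(finalName);
        if (destination.equals(closedFile)) {
            return closedFile;                    // neither renamed nor moved
        }
        // Atomic rename: the file appears under its final name only when complete.
        // Throws AtomicMoveNotSupportedException across filesystems.
        return Files.move(closedFile, destination, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The copies are made before the final rename. As a result, a file only appears under its completed name once every copy exists, and if a copy fails the original is left untouched in sink.directory. ATOMIC_MOVE assumes that sink.directory, targetDirectory and the copy directories are on the same filesystem.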
Project repository: https://github.com/huyanping/flume-sinks-safe-roll-file-sink