Collecting files and directories with Flume Taildir

Use the Taildir Source to collect from individual files and from whole directories at the same time.

Reference: the Taildir Source section of the official Flume documentation.

Flume versions before 1.7 do not ship a Taildir Source. On those versions you have to build the Taildir Source jar yourself and then upload it to Flume's lib directory.

Jar download link:
https://pan.baidu.com/s/1oIPqyvALUyzpd8YpdpNEqA
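
As a rough sketch of the upload step described above, the helper below copies a self-built jar into a Flume installation's lib directory. The function name `install_taildir_jar` and the jar file name in the usage line are hypothetical; the real jar name depends on how it was built or downloaded.

```shell
# Copy a self-built Taildir Source jar into <flume_home>/lib.
# Both arguments are supplied by the caller; nothing is hard-coded.
install_taildir_jar() {
  local jar="$1"         # path to the jar (name is an assumption)
  local flume_home="$2"  # root of the Flume installation

  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$flume_home/lib" ]; then
    echo "no lib directory under: $flume_home" >&2
    return 1
  fi

  # Report the jar name only if the copy succeeded.
  cp "$jar" "$flume_home/lib/" && echo "installed $(basename "$jar")"
}
```

For example: `install_taildir_jar ./flume-taildir-source.jar /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin` (jar name assumed). Jars in lib are loaded when the agent starts, so restart a running agent afterwards.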

Reference configuration:

taildir.properties

# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1
a1.sinks = k1

# For each one of the sources, the type is defined
a1.sources.s1.type = org.apache.flume.source.taildir.TaildirSource
a1.sources.s1.positionFile = /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/taidir/dirsource/taildir_position.json
a1.sources.s1.filegroups = f1 f2
a1.sources.s1.filegroups.f1 = /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/taidir/madman.txt
a1.sources.s1.headers.f1.headerKey1 = value1
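# Note: .* matches every file name in dirsource/, including the
# taildir_position.json configured above; a narrower pattern or a
# positionFile outside this directory keeps it from being tailed.
a1.sources.s1.filegroups.f2 = /opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/taidir/dirsource/.*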
a1.sources.s1.headers.f2.headerKey1 = value2
a1.sources.s1.headers.f2.headerKey2 = value2-2

# Bind the source and the sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

# Each sink's type must be defined
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/event/taildir
a1.sinks.k1.hdfs.filePrefix = hive-log

# Each channel's type is defined.
a1.channels.c1.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
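
In the configuration above, f1 names a single file while f2 ends in `.*`. According to the Flume documentation, a filegroup is an absolute path in which only the file-name part may be a regular expression. The sketch below approximates that split in shell so you can preview which files a filegroup would select. `match_filegroup` is a hypothetical helper; it relies on grep's extended regex instead of Java's regex engine, and the shell glob skips hidden files, so results may differ from Flume in edge cases.

```shell
# Preview which files a filegroup path would select.
# The part before the last slash is taken as a literal directory,
# the part after it as a regex that must match the whole file name.
match_filegroup() {
  local pattern="$1"
  local dir="${pattern%/*}"     # everything before the last slash
  local regex="${pattern##*/}"  # last path component, used as a regex
  local f name

  for f in "$dir"/*; do         # note: the glob does not list hidden files
    [ -f "$f" ] || continue     # keep regular files only, skip directories
    name="${f##*/}"
    # -x forces a whole-line match, so the regex must cover the full name
    if printf '%s\n' "$name" | grep -Eqx -- "$regex"; then
      printf '%s\n' "$name"
    fi
  done
}
```

Calling `match_filegroup '/opt/cdhmoduels/apache-flume-1.5.0-cdh5.3.6-bin/taidir/dirsource/.*'` lists the file names that f2 would pick up on the author's machine.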

Start command

bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/taildir.properties -Dflume.root.logger=INFO,console
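
Because the filegroup paths are long, a misspelled directory is easy to miss. Before running the start command, a small preflight check along these lines may help. It is only a sketch: `check_filegroup_dirs` is a hypothetical helper, and it assumes each filegroup is written on one line as `<prefix>.filegroups.<name> = <path>`, as in taildir.properties above.

```shell
# For every filegroup in a Flume properties file, report whether the
# directory part of its path exists. Returns non-zero if any is missing.
check_filegroup_dirs() {
  local conf="$1" status=0 path dir

  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip the empty line when nothing matched
    dir="${path%/*}"             # strip the file name or regex component
    if [ -d "$dir" ]; then
      echo "ok      $dir"
    else
      echo "missing $dir"
      status=1
    fi
  done <<EOF
$(sed -n 's/^[^#]*\.filegroups\.[^.= ]* *= *//p' "$conf")
EOF

  return "$status"
}
```

For example, `check_filegroup_dirs conf/taildir.properties && bin/flume-ng agent ...` only starts the agent when every filegroup directory is present.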
