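Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and moving large volumes of log data. It supports customizable data senders inside a logging system for collecting data, and it can apply simple processing to that data before writing it to a variety of (customizable) data receivers.
The agent is Flume's smallest independently running unit. Each agent is a single JVM, so a JDK must be installed before Flume.
I: Install Flume: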
wget http://mirrors.cnnic.cn/apache/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
II: Extract:
tar -xvzf apache-flume*.tar.gz
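III: Edit the configuration files:
1: Edit flume-env.sh in the conf folder (copy it from flume-env.sh.template if it does not exist yet) to set JAVA_HOME. If JAVA_HOME is not set, Flume may fail with an error at startup:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64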
JRE_HOME=$JAVA_HOME/jre
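Before starting Flume it can help to confirm that the JAVA_HOME path really points at a Java installation. The following is a minimal sketch; check_java_home is a hypothetical helper name of my own, not something Flume ships, and it only tests that bin/java exists and is executable.

```shell
# Hypothetical helper: succeeds only if the given directory looks like a Java home.
check_java_home() {
    java_home="$1"
    if [ -z "$java_home" ]; then
        echo "JAVA_HOME is empty" >&2
        return 1
    fi
    # A usable Java home is expected to contain an executable bin/java.
    if [ ! -x "$java_home/bin/java" ]; then
        echo "no executable found at $java_home/bin/java" >&2
        return 1
    fi
    return 0
}
```

For example, after sourcing flume-env.sh, run check_java_home "$JAVA_HOME" and look at its exit status.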
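2: Create flume.conf in the conf folder:
This is the Flume configuration for the machine that stores the logs. It receives the data sent by the Flume agents on other machines. The channel selector decides which channel each event goes into based on the value of a header key, and each sink then writes the data from its own channel to a separate set of files.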
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# source configuration
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.80
a1.sources.r1.port = 4444
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=sn
a1.sources.r1.selector.mapping.game1=c1
a1.sources.r1.selector.mapping.game2=c2
# channel configuration
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacity = 800000
a1.channels.c1.keep-alive=3
a1.channels.c2.type = memory
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 10000
a1.channels.c2.byteCapacity = 800000
a1.channels.c2.keep-alive=3
# sink configuration
a1.sinks.k1.type = com.fyx.flume.FileSink
a1.sinks.k1.file.path = /home/ayue/flume/flume-test/game1/
a1.sinks.k1.channel = c1
a1.sinks.k1.file.filePrefix = log-
a1.sinks.k1.file.txnEventMax = 100
a1.sinks.k1.file.maxOpenFiles = 5
a1.sinks.k2.type = com.fyx.flume.FileSink
a1.sinks.k2.file.path = /home/ayue/flume/flume-test/game2/
a1.sinks.k2.channel = c2
a1.sinks.k2.file.filePrefix = log-
a1.sinks.k2.file.txnEventMax = 100
a1.sinks.k2.file.maxOpenFiles = 5
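The routing rule that the multiplexing selector applies in a1 can be pictured as a simple lookup on the sn header. This is only an illustration in shell, not Flume code; route_channel is a hypothetical name. The empty result for unmapped values reflects my understanding that, with no selector.default configured, such events are not written to any channel, so treat that branch as an assumption.

```shell
# Hypothetical helper mirroring a1's selector mapping: prints the channel
# chosen for a given value of the "sn" header.
route_channel() {
    case "$1" in
        game1) echo "c1" ;;   # a1.sources.r1.selector.mapping.game1=c1
        game2) echo "c2" ;;   # a1.sources.r1.selector.mapping.game2=c2
        *)     : ;;           # assumption: no selector.default is set, so nothing is printed
    esac
}
```

Calling route_channel game1 prints c1, which is why events tagged sn=game1 end up in the files written by sink k1.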
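This is the Flume configuration for the machine that collects the logs. After the data is collected, a source interceptor adds a header with a key and value to every event, so that the channel selector downstream can route on it.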
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# source configuration
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /home/ayue/flume/flume-source/serviceTest.log
a2.sources.r2.channels = c2
a2.sources.r2.interceptors = static
a2.sources.r2.interceptors.static.type=static
a2.sources.r2.interceptors.static.key=sn
a2.sources.r2.interceptors.static.value =game1
a2.sources.r2.interceptors.static.preserveExisting=false
# channel configuration
a2.channels.c2.type = memory
a2.channels.c2.capacity = 10000
a2.channels.c2.transactionCapacity = 10000
a2.channels.c2.byteCapacity = 800000
a2.channels.c2.keep-alive=3
# sink configuration
a2.sinks.k2.type = avro
a2.sinks.k2.hostname = 192.168.1.80
a2.sinks.k2.port = 4444
a2.sinks.k2.channel = c2
a2.sinks.k2.connect-timeout=20000
a2.sinks.k2.batch-size = 100
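The two agents only connect if the Avro sink of a2 targets the same address and port that the Avro source of a1 listens on. A rough sketch of a check for that follows. get_prop and check_avro_link are hypothetical helpers, the property names are the ones used in the two configurations above, and the parser only handles plain "key = value" lines.

```shell
# Hypothetical helper: print the value of a "key = value" line from a properties file.
get_prop() {
    # Skip comment lines, split at the first "=", trim blanks, print the first match.
    awk -v want="$2" '
        /^[ \t]*#/ { next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == want) { print v; exit }
        }
    ' "$1"
}

# Succeeds when the a2 Avro sink (second file) points at the a1 Avro source (first file).
check_avro_link() {
    src_host=$(get_prop "$1" a1.sources.r1.bind)
    src_port=$(get_prop "$1" a1.sources.r1.port)
    sink_host=$(get_prop "$2" a2.sinks.k2.hostname)
    sink_port=$(get_prop "$2" a2.sinks.k2.port)
    [ -n "$src_host" ] && [ -n "$src_port" ] &&
        [ "$src_host" = "$sink_host" ] && [ "$src_port" = "$sink_port" ]
}
```

If both agents are defined in one flume.conf, pass the same file for both arguments.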
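The sink type in a1 is com.fyx.flume.FileSink, a sink plugin I wrote myself. Flume's file storage does not support output directories named in the log-2018-04-18-21 format, so I developed this plugin.
Plugin download: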
https://download.csdn.net/download/newayue/10360684
or: https://github.com/ayue123/flume
3: Import the plugin:
Create a plugins.d directory directly under the Flume installation directory, with the following structure:
plugins.d/
plugins.d/FileSink/
plugins.d/FileSink/lib/flume-file-sink.jar
plugins.d/FileSink/libext/
plugins.d/FileSink/native/
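lib holds the plugin JAR, libext holds the JARs the plugin depends on, and native holds any native libraries it uses.
Restart the Flume agent and Flume loads the plugin automatically, so the type property in flume.conf can then be set to the fully qualified class name.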
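The directory layout above can be created with a single mkdir call. This is a small sketch; make_plugin_layout is a hypothetical helper, and the Flume directory you pass in is whatever path you extracted the archive to.

```shell
# Hypothetical helper: create plugins.d/<plugin>/{lib,libext,native} under a Flume directory
# and print the plugin directory that was created.
make_plugin_layout() {
    flume_home="$1"
    plugin_name="$2"
    plugin_dir="$flume_home/plugins.d/$plugin_name"
    mkdir -p "$plugin_dir/lib" "$plugin_dir/libext" "$plugin_dir/native"
    echo "$plugin_dir"
}
```

After running it with FileSink as the plugin name, copy flume-file-sink.jar into the lib directory it created.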
IV: Start:
Start a1 first:
bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a1 -Dflume.root.logger=INFO,console
Then start a2:
bin/flume-ng agent --conf conf --conf-file ./conf/flume.conf --name a2 -Dflume.root.logger=INFO,console
We can write an infinite loop to simulate log writes:
while true
do
    date >> serviceTest.log
    sleep 2
done
V: Pitfalls to watch out for:
1: In a1.sources.r1.channels = c1, the key must be channels, not channel. If it is misspelled, the following error is reported:
2018-04-19 01:37:45,867 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:589)] Could not configure source r1 due to: Failed to configure component!
org.apache.flume.conf.ConfigurationException: Failed to configure component!
at org.apache.flume.conf.source.SourceConfiguration.configure(SourceConfiguration.java:111)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:566)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:346)
at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.access$000(FlumeConfiguration.java:212)
at org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:126)
at org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:108)
at org.apache.flume.node.PropertiesFileConfigurationProvider.getFlumeConfiguration(PropertiesFileConfigurationProvider.java:189)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:93)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.conf.ConfigurationException: No channels set for r1
at org.apache.flume.conf.source.SourceConfiguration.configure(SourceConfiguration.java:69)
... 15 more
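This kind of typo is easy to catch before starting the agent. The sketch below greps a configuration file for source bindings written as channel instead of channels. find_source_channel_typos is a hypothetical helper, and the pattern only covers the simple "agent.sources.name.channel = ..." form.

```shell
# Hypothetical helper: print (with line numbers) source lines that use
# ".channel" where ".channels" is required. Exit status is 0 if any are found.
find_source_channel_typos() {
    grep -nE '^[[:space:]]*[^#[:space:]]+\.sources\.[^.[:space:]]+\.channel[[:space:]]*=' "$1"
}
```

Sink lines such as a1.sinks.k1.channel = c1 are not reported, because a sink takes exactly one channel and the singular key is correct there.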
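2: In a1.sinks.k1.sink.directory = /var/log/flume, the sink segment immediately before directory cannot be omitted (this applies to the built-in file_roll sink, which is the RollingFileSink in the trace). Omitting it causes the following error: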
java.lang.IllegalArgumentException: Directory may not be null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.flume.sink.RollingFileSink.configure(RollingFileSink.java:90)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:411)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)