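Lu Chunli's work notes. Who says programmers can't have an artistic side?
Download page: http://flume.apache.org/
Latest version at the time of writing: 1.6.0
Flume, originally developed by Cloudera and now an Apache project, is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data. Custom data senders can be plugged into a logging system to collect data, and Flume can apply simple processing to that data and write it to a variety of (customizable) receivers.
Flume currently exists in two generations: the 0.9.x releases are collectively called Flume-OG, and the 1.x releases are collectively called Flume-NG.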
Here are a few new feature highlights in Version 1.6.0:
Flume Sink and Source for Apache Kafka
A new channel that uses Kafka
Hive Sink based on the new Hive Streaming support
End to End authentication in Flume
Simple regex search-and-replace interceptor
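Architecture:
Components:
The basic unit of data that Flume transports is the event: a byte array that can also carry header information. The core of a running Flume deployment is the agent, a complete data collection tool (a JVM process) built from three core components: source, channel, and sink. An event is the smallest complete unit of a data flow. It carries data from an external source, through the agent, to an external destination (the next destination).
Source: collects the log data, wraps it into events, and puts them into the channel within transactions.
Channel: acts as a queue that buffers the events supplied by the source until a sink takes them.
Sink: takes events out of the channel and delivers them to a file system, a database, or a remote server.
Through these components, an event can flow from one place to another.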
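The source, channel, sink flow described above can be sketched in a few lines of Python. This is only a toy model and not Flume code: the names `Event`, `MemoryChannel`, `line_source`, and `collecting_sink` are made up for illustration, and transactions are deliberately left out.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a Flume event: a byte-array body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Hypothetical bounded FIFO buffer sitting between a source and a sink."""

    def __init__(self, capacity=1000):
        self._queue = queue.Queue(maxsize=capacity)

    def put(self, event):
        # Raises queue.Full when the channel is already at capacity.
        self._queue.put_nowait(event)

    def take(self):
        # Returns the oldest buffered event, or None when the channel is empty.
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def line_source(lines, channel):
    """Source role: wrap each incoming line in an Event and put it into the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def collecting_sink(channel):
    """Sink role: drain the channel and return the events in arrival order."""
    delivered = []
    while True:
        event = channel.take()
        if event is None:
            break
        delivered.append(event)
    return delivered
```

Calling `line_source(["hello", "world"], channel)` followed by `collecting_sink(channel)` hands back the two events in the order they were received, which is the queue behaviour the channel description refers to.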
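Download Flume
# The closer.lua link returns a mirror selection page rather than the archive, so fetch the tarball from the Apache archive directly
wget https://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
Extract apache-flume-1.6.0-bin.tar.gz
tar -zxvf apache-flume-1.6.0-bin.tar.gz
# After extracting, rename the directory to flume1.6.0
mv apache-flume-1.6.0-bin flume1.6.0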
Configure the Java environment variables
vim /etc/profile
# Append the following three lines at the end
export JAVA_HOME=/usr/local/jdk1.7
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Configure the Flume environment variables
vim ~/.bash_profile
# Add the following
FLUME_HOME=/usr/local/flume1.6.0
FLUME_CONF_DIR=$FLUME_HOME/conf
PATH=$FLUME_HOME/bin:$PATH
export FLUME_HOME
export FLUME_CONF_DIR
export PATH
Edit the configuration file:
cd $FLUME_HOME/conf
cp flume-env.sh.template flume-env.sh
vim flume-env.sh
# Set JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.7
Create the configuration file that Flume will use, saved here as conf/sample.conf (based on the simple example in the official Flume user guide: http://flume.apache.org/FlumeUserGuide.html)
# sample.conf: A single-node Flume configuration
# An agent is a running Flume process on the node where the data originates; a1 is the agent's name, which is chosen by the user
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
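One detail in the configuration above is easy to get wrong: a source is bound with the plural key `channels`, while a sink is bound with the singular key `channel`. The sketch below is a small, hypothetical pre-flight check written for these notes, not something shipped with Flume. It parses the properties text with a simplified reader (only `key = value` lines and `#` comments) and reports sources or sinks that are not bound to a declared channel. The function names and the embedded `SAMPLE_CONF` string are assumptions made for illustration.

```python
SAMPLE_CONF = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_properties(text):
    """Simplified properties reader: 'key = value' lines, '#' comments, nothing else."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of human-readable problems; an empty list means the wiring looks sane."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())

    # Sources use the plural key because one source may feed several channels.
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source} is not bound to any channel")
        for name in bound:
            if name not in channels:
                problems.append(f"source {source} uses undeclared channel {name}")

    # Sinks use the singular key because a sink reads from exactly one channel.
    for sink in props.get(f"{agent}.sinks", "").split():
        name = props.get(f"{agent}.sinks.{sink}.channel")
        if name is None:
            problems.append(f"sink {sink} is not bound to a channel")
        elif name not in channels:
            problems.append(f"sink {sink} uses undeclared channel {name}")
    return problems
```

For the sample configuration, `check_wiring(parse_properties(SAMPLE_CONF), "a1")` returns an empty list. Writing `a1.sinks.k1.channels = c1` by mistake would be reported as a sink with no channel.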
Start the flume-ng agent
[hadoop@nnode flume1.6.0]$ bin/flume-ng agent --conf conf --conf-file conf/sample.conf --name a1 -Dflume.root.logger=INFO,console
# (output omitted)
2015-11-11 22:08:00,921 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:164)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
Open a second terminal window and send some data
[root@nnode ~]# telnet 127.0.0.1 44444
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
hello
OK
world
OK
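The same exchange can be scripted instead of typed into telnet. The snippet below is a minimal sketch, assuming the agent from these notes is listening on localhost:44444 and answers each line with a one-line acknowledgement, as the session above shows. `send_lines` is a helper name invented here.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each line over TCP and collect the one-line reply that follows it."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # One newline-terminated line is treated as one event by the receiver.
                conn.sendall(line.encode("utf-8") + b"\n")
                replies.append(reader.readline().strip())
    return replies


# Example call, assuming the agent above is running:
#   send_lines("localhost", 44444, ["hello", "world"])
```

Unlike telnet, this client terminates lines with a bare line feed, so the event bodies logged by the agent would not carry a trailing carriage return.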
The flume-ng side now receives the data sent through telnet
2015-11-11 22:08:00,921 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:164)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
2015-11-11 22:09:16,841 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
2015-11-11 22:09:16,842 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 77 6F 72 6C 64 0D world. }
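The output above shows the unit of data that Flume works with, the event: a byte-array body plus header information. The source consumes events delivered to it by an external source (a web server, for example, or the telnet session here). The trailing 0D in each body is the carriage return that telnet sends ahead of the line feed: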
Event: { headers:{} body: 77 6F 72 6C 64 0D world. }
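To see where the bytes in that log line come from, the sketch below splits a telnet-style byte stream on line feeds and renders each body as hex followed by printable text. It only approximates the flattened output shown in these notes and is not Flume's own dump routine. The 16-byte limit is an assumption about the logger sink's default, and `split_events` and `dump_body` are names made up here.

```python
def split_events(stream):
    """Split raw bytes on line feeds, keeping whatever precedes each LF as one body."""
    return [part for part in stream.split(b"\n") if part]


def dump_body(body, max_bytes=16):
    """Render a body as upper-case hex bytes followed by its printable characters."""
    shown = body[:max_bytes]  # assumed limit on how many bytes get logged
    hex_part = " ".join(f"{byte:02X}" for byte in shown)
    # Non-printable bytes such as the carriage return are shown as a dot.
    text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in shown)
    return f"{hex_part} {text_part}"


# telnet terminates each line with CR LF, so the CR stays inside the body.
bodies = split_events(b"hello\r\nworld\r\n")
for body in bodies:
    print("Event: { headers:{} body: " + dump_body(body) + " }")
```

Each printed line ends with `0D` and a dot, matching the events in the log: the source splits on the line feed only, and the carriage return remains part of the body.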
Flume can automatically monitor the configuration file and reload it
After the configuration file is edited, Flume detects the change and restarts the components. In the log below the source comes back up bound to 192.168.137.117 instead of 127.0.0.1:
2015-11-11 22:22:50,845 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:conf/sample.conf
# (output omitted)
2015-11-11 22:22:50,922 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel c1
2015-11-11 22:22:50,932 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: CHANNEL, name: c1 started
2015-11-11 22:22:50,932 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink k1
2015-11-11 22:22:50,938 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source r1
2015-11-11 22:22:50,940 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:150)] Source starting
2015-11-11 22:22:50,941 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:164)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.137.117:44444]
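The thread name `conf-file-poller-0` in the log suggests that the change is picked up by polling rather than by a file system notification. The general idea can be sketched as follows: remember the file's last-modified time and report a change whenever it differs on the next check. This is a simplified illustration under that assumption, not Flume's implementation, and `ConfigFileWatcher` is a hypothetical name.

```python
import os


class ConfigFileWatcher:
    """Detect edits to a file by comparing its modification time between checks."""

    def __init__(self, path):
        self.path = path
        self.last_seen = os.stat(path).st_mtime_ns

    def changed(self):
        """Return True once for every observed modification-time change."""
        current = os.stat(self.path).st_mtime_ns
        if current != self.last_seen:
            self.last_seen = current
            return True
        return False


# A poller would call changed() on a fixed interval and reload when it returns True:
#   watcher = ConfigFileWatcher("conf/sample.conf")
#   if watcher.changed():
#       ...  # re-read the file and restart the affected components
```

A consequence of polling is that a change takes effect only at the next check, so a short delay between saving the file and seeing the "Reloading configuration file" log entry is to be expected.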