Environment: CentOS 6.5 (x86_64), hadoop-2.2.0, flume-ng-1.4.0
master.hadoop collects the logs sent from slave1.hadoop and slave2.hadoop and saves them to HDFS.
1. Download and install
wget http://mirror.bit.edu.cn/apache/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
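Unpack the tarball into the hadoop user's home directory (the rest of this walkthrough assumes the install path /home/hadoop/apache-flume-1.4.0-bin):
[hadoop@slave1 ~]$ tar -zxf apache-flume-1.4.0-bin.tar.gz -C /home/hadoop/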
2. Copy all the jars under /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib into /home/hadoop/apache-flume-1.4.0-bin/lib/, then delete the Flume jars that have the same names (regardless of version), keeping the Hadoop copies.
Commands:
[hadoop@slave1 ~]$ cp /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/* /home/hadoop/apache-flume-1.4.0-bin/lib/
[hadoop@slave1 ~]$ cd /home/hadoop/apache-flume-1.4.0-bin/lib/
[hadoop@slave1 ~]$ for i in slf4j-log4j12-1.6.1.jar slf4j-api-1.6.1.jar protobuf-java-2.4.1.jar netty-3.4.0.Final.jar log4j-1.2.16.jar avro-1.7.3.jar asm-3.1.jar jackson-mapper-asl-1.9.3.jar jackson-core-asl-1.9.3.jar guava-10.0.1.jar commons-codec-1.8.jar; do rm -f $i; done
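As an optional sanity check (a sketch, assuming the paths above), list any jar base names that still appear more than once in Flume's lib directory; the output should be empty once all conflicting versions are gone:
[hadoop@slave1 ~]$ ls /home/hadoop/apache-flume-1.4.0-bin/lib/ | sed 's/-[0-9].*\.jar$//' | sort | uniq -d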
3. Edit the flume.conf configuration file on master.hadoop, slave1.hadoop and slave2.hadoop
[hadoop@slave1 ~]$ cd apache-flume-1.4.0-bin/conf
[hadoop@slave1 ~]$ vi flume.conf
# flume.conf on slave1.hadoop
slave1.sources = tailf
slave1.channels = memoryChannel
slave1.sinks = avro
slave1.channels.memoryChannel.type = memory
slave1.channels.memoryChannel.capacity = 1000
slave1.sources.tailf.type = exec
slave1.sources.tailf.command = tail -F /home/hadoop/works/test.log
slave1.sources.tailf.channels = memoryChannel
slave1.sources.tailf.interceptors = i1 i2
slave1.sources.tailf.interceptors.i1.type = host
slave1.sources.tailf.interceptors.i1.hostHeader = hostname
slave1.sources.tailf.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
slave1.sinks.avro.type = avro
slave1.sinks.avro.channel = memoryChannel
slave1.sinks.avro.hostname = master.hadoop
slave1.sinks.avro.port = 44414
flume.conf on master.hadoop (the %{hostname} and %Y%m%d escape sequences in the HDFS sink path are filled from the hostname and timestamp headers that the slaves' host and timestamp interceptors add to each event):
[hadoop@master ~]$ cat apache-flume-1.4.0-bin/conf/flume.conf
master.sources = avro
master.channels = memoryChannel
master.sinks = hdfssink
master.channels.memoryChannel.type = memory
master.channels.memoryChannel.capacity = 1000
master.channels.memoryChannel.keep-alive = 30
#master.sources.avro.interceptors = i1
#master.sources.avro.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
master.sources.avro.type = avro
master.sources.avro.bind = 0.0.0.0
master.sources.avro.port = 44414
master.sources.avro.channels = memoryChannel
master.sinks.hdfssink.type = hdfs
master.sinks.hdfssink.channel = memoryChannel
master.sinks.hdfssink.hdfs.path = /flume/%{hostname}_server/%Y%m%d_date
master.sinks.hdfssink.hdfs.rollInterval = 0
master.sinks.hdfssink.hdfs.rollSize = 40000000
master.sinks.hdfssink.hdfs.rollCount = 0
master.sinks.hdfssink.hdfs.writeFormat = Text
master.sinks.hdfssink.hdfs.fileType = DataStream
#master.sinks.hdfssink.hdfs.round = true
#master.sinks.hdfssink.hdfs.roundValue = 10
#master.sinks.hdfssink.hdfs.roundUnit = minute
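Before starting the agents in step 5, it can be worth confirming from master.hadoop that HDFS is reachable and writable by the hadoop user (optional; the HDFS sink will also create the target path on demand):
[hadoop@master ~]$ hdfs dfs -mkdir -p /flume
[hadoop@master ~]$ hdfs dfs -ls /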
[hadoop@slave1 ~]$ scp apache-flume-1.4.0-bin/conf/flume.conf slave2.hadoop:~/apache-flume-1.4.0-bin/conf/
Then, in the copied file on slave2.hadoop, change every slave1 prefix to slave2 (see the one-liner below).
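One way to do the rename in place (a sketch, assuming the paths used above; run it on slave2.hadoop):
[hadoop@slave2 ~]$ sed -i 's/^slave1\./slave2./' ~/apache-flume-1.4.0-bin/conf/flume.conf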
4. Environment variables:
[hadoop@slave1 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/home/hadoop/jdk1.6.0_45
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export MAVEN_HOME=/home/hadoop/apache-maven-3.1.1
export PATH=/home/hadoop/apache-maven-3.1.1/bin:$PATH
export HADOOP_PREFIX=/home/hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export YARN_HOME=${HADOOP_PREFIX}
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SCALA_HOME=/home/hadoop/scala-2.10.1
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/hadoop/spark-0.9.1-bin-hadoop2
export FLUME_HOME=/home/hadoop/apache-flume-1.4.0-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=.:$PATH:$FLUME_HOME/bin
#set sqoop Environment
export SQOOP_HOME=/home/hadoop/sqoop-1.99.3-bin-hadoop200
export PATH=${SQOOP_HOME}/bin:$PATH
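After editing, reload the profile and confirm the flume-ng script is on the PATH (the which output assumes the FLUME_HOME set above):
[hadoop@slave1 ~]$ source ~/.bash_profile
[hadoop@slave1 ~]$ which flume-ng
/home/hadoop/apache-flume-1.4.0-bin/bin/flume-ng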
5. Start Flume (the agent name passed with -n must match the property prefix used in that host's flume.conf: master, slave1 or slave2)
[hadoop@master ~]$ cd apache-flume-1.4.0-bin/
[hadoop@master apache-flume-1.4.0-bin]$ flume-ng agent -c conf -f conf/flume.conf -n master -Dflume.root.logger=INFO,console
~~~~~
......(output omitted)......
2014-05-20 16:40:07,477 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: avro started
2014-05-20 16:40:07,478 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:217)] Avro source avro started.
~~~~~~
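Optionally, confirm on master.hadoop that the Avro source is actually listening on port 44414 before starting the slave agents:
[hadoop@master ~]$ netstat -lnt | grep 44414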
[hadoop@slave1 ~]$ cd apache-flume-1.4.0-bin
[hadoop@slave1 ~]$ flume-ng agent -c conf -f conf/flume.conf -n slave1 -Dflume.root.logger=INFO,console
~~~~~~~~
......(output omitted)......
2014-05-20 16:49:32,434 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: tailf started
2014-05-20 16:49:35,170 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:300)] Rpc sink avro started.
~~~~~~~
Watch the log output on master.hadoop:
~~~~~
2014-05-20 16:49:33,957 (New I/O server boss #3) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x39acac4f, /172.20.105.133:47559 => /172.20.105.131:44414] OPEN
2014-05-20 16:49:33,958 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x39acac4f, /172.20.105.133:47559 => /172.20.105.131:44414] BOUND: /172.20.105.131:44414
2014-05-20 16:49:33,959 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x39acac4f, /172.20.105.133:47559 => /172.20.105.131:44414] CONNECTED: /172.20.105.133:47559
2014-05-20 16:49:39,171 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:56)] Serializer = TEXT, UseRawLocalFileSystem = false
2014-05-20 16:49:39,311 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172.tmp
~~~~~~~~
[hadoop@slave1 works]$ ls test*
test.log test-r.log test.sh
[hadoop@slave1 works]$ cat test.sh
#!/bin/sh
# Append the sample log to test.log every 5 seconds so that the
# exec source (tail -F /home/hadoop/works/test.log) keeps receiving new events.
while true
do
    cat /home/hadoop/works/test-r.log >> /home/hadoop/works/test.log
    sleep 5
done
[hadoop@slave1 works]$ ./test.sh &
[1] 5520
Watch the master.hadoop log again:
~~~
2014-05-20 16:56:16,909 (hdfs-hdfssink-call-runner-1) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:487)] Renaming /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172.tmp to /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172
2014-05-20 16:56:17,114 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779173.tmp
~~~
Repeat the same steps on slave2.hadoop:
[hadoop@slave2 ~]$ cd apache-flume-1.4.0-bin/
[hadoop@slave2 apache-flume-1.4.0-bin]$ flume-ng agent -c conf -f conf/flume.conf -n slave2 -Dflume.root.logger=INFO,console
[hadoop@slave2 ~]$ ./works/test.sh &
[1] 4953
[hadoop@slave2 works]$ hdfs dfs -ls /flume
Found 6 items
drwxr-xr-x - hadoop supergroup 0 2014-05-20 16:49 /flume/172.20.105.133_server
drwxr-xr-x - hadoop supergroup 0 2014-05-20 16:59 /flume/172.20.105.135_server
[hadoop@slave2 works]$ hdfs dfs -ls /flume/172.20.105.133_server
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2014-05-20 17:03 /flume/172.20.105.133_server/20140520_date
[hadoop@slave2 works]$ hdfs dfs -ls /flume/172.20.105.133_server/20140520_date
Found 12 items
-rw-r--r-- 2 hadoop supergroup 40138292 2014-05-20 16:56 /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172
-rw-r--r-- 2 hadoop supergroup 40138367 2014-05-20 16:56 /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779173
-rw-r--r-- 2 hadoop supergroup 40138292 2014-05-20 16:57 /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779174
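As a final optional check, one of the rolled files can be read back from HDFS to confirm the log lines arrived intact (a sketch; the path is one of the files listed above):
[hadoop@slave2 works]$ hdfs dfs -cat /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172 | tail -5
[hadoop@slave2 works]$ hdfs dfs -cat /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172 | wc -l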