flume-ng 1.4.0


Environment: centos-6.5_X86_64    hadoop-2.2.0    flume-ng-1.4.0

       master.hadoop: collects the logs sent by slave1.hadoop and slave2.hadoop and stores them in HDFS
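Data flow (matching the configurations below):

on slave1.hadoop / slave2.hadoop:  exec source (tail -F test.log) -> memory channel -> avro sink -> master.hadoop:44414
on master.hadoop:                  avro source (0.0.0.0:44414) -> memory channel -> hdfs sink -> /flume/%{hostname}_server/%Y%m%d_date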


  1. Download and install

    wget  http://mirror.bit.edu.cn/apache/flume/1.4.0/apache-flume-1.4.0-bin.tar.gz
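After the download, unpack the tarball (a minimal sketch, assuming it was downloaded to /home/hadoop):

    [hadoop@slave1 ~]$ tar -zxf apache-flume-1.4.0-bin.tar.gz -C /home/hadoop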

2. Copy all the jars under /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib into /home/hadoop/apache-flume-1.4.0-bin/lib/, then delete the Flume jars that share a name with them (regardless of which version is newer).


Related commands:

[hadoop@slave1 ~]$  cp  /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/*  /home/hadoop/apache-flume-1.4.0-bin/lib/

[hadoop@slave1 ~]$  cd /home/hadoop/apache-flume-1.4.0-bin/lib/

[hadoop@slave1 lib]$  for i in slf4j-log4j12-1.6.1.jar slf4j-api-1.6.1.jar protobuf-java-2.4.1.jar netty-3.4.0.Final.jar log4j-1.2.16.jar avro-1.7.3.jar asm-3.1.jar jackson-mapper-asl-1.9.3.jar jackson-core-asl-1.9.3.jar guava-10.0.1.jar commons-codec-1.8.jar; do rm -f $i; done
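If you are unsure which Flume jars collide with the Hadoop ones, the duplicated artifact names can be listed by stripping the version suffix and looking for repeats (a sketch, assuming GNU sed and coreutils):

    [hadoop@slave1 lib]$ ls *.jar | sed 's/-[0-9][^/]*\.jar$//' | sort | uniq -d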


3. Edit the flume.conf configuration file on master.hadoop, slave1.hadoop, and slave2.hadoop

[hadoop@slave1 ~]$ cd  apache-flume-1.4.0-bin/conf

[hadoop@slave1 ~]$ vi flume.conf

# flume.conf on slave1.hadoop (Flume agent properties)

slave1.sources = tailf

slave1.channels = memoryChannel

slave1.sinks = avro  


slave1.channels.memoryChannel.type = memory

slave1.channels.memoryChannel.capacity = 1000


slave1.sources.tailf.type = exec

slave1.sources.tailf.command = tail -F  /home/hadoop/works/test.log

slave1.sources.tailf.channels = memoryChannel


slave1.sources.tailf.interceptors = i1  i2

slave1.sources.tailf.interceptors.i1.type = host

slave1.sources.tailf.interceptors.i1.hostHeader = hostname


slave1.sources.tailf.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder



slave1.sinks.avro.type = avro

slave1.sinks.avro.channel = memoryChannel

slave1.sinks.avro.hostname = master.hadoop

slave1.sinks.avro.port = 44414


flume.conf on master.hadoop:

[hadoop@master ~]$ cat  apache-flume-1.4.0-bin/conf/flume.conf


master.sources = avro

master.channels = memoryChannel

master.sinks = hdfssink


master.channels.memoryChannel.type = memory

master.channels.memoryChannel.capacity = 1000

master.channels.memoryChannel.keep-alive = 30


#master.sources.avro.interceptors = i1 

#master.sources.avro.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder


master.sources.avro.type = avro

master.sources.avro.bind = 0.0.0.0

master.sources.avro.port = 44414

master.sources.avro.channels = memoryChannel


master.sinks.hdfssink.type = hdfs

master.sinks.hdfssink.channel = memoryChannel

master.sinks.hdfssink.hdfs.path = /flume/%{hostname}_server/%Y%m%d_date


master.sinks.hdfssink.hdfs.rollInterval = 0

master.sinks.hdfssink.hdfs.rollSize = 40000000

master.sinks.hdfssink.hdfs.rollCount = 0

master.sinks.hdfssink.hdfs.writeFormat = Text

master.sinks.hdfssink.hdfs.fileType = DataStream


#master.sinks.hdfssink.hdfs.round = true

#master.sinks.hdfssink.hdfs.roundValue = 10

#master.sinks.hdfssink.hdfs.roundUnit = minute
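The escapes in hdfs.path are filled from event headers: %{hostname} comes from the hostname header set by the host interceptor on the slaves (which records the IP address by default), and %Y%m%d is resolved from the timestamp header set by the timestamp interceptor. An event arriving from 172.20.105.133 on 2014-05-20 is therefore written under:

    /flume/172.20.105.133_server/20140520_date/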


[hadoop@slave1 ~]$ scp apache-flume-1.4.0-bin/conf/flume.conf slave2.hadoop:~/apache-flume-1.4.0-bin/conf/  

Then, in the copied file on slave2.hadoop, change every slave1 prefix to slave2.
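One way to make that substitution on slave2.hadoop (a sketch, assuming GNU sed):

    [hadoop@slave2 ~]$ sed -i 's/^slave1\./slave2./' apache-flume-1.4.0-bin/conf/flume.conf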


4. Environment variables:

[hadoop@slave1 ~]$ cat .bash_profile 

# .bash_profile


# Get the aliases and functions

if [ -f ~/.bashrc ]; then

. ~/.bashrc

fi


# User specific environment and startup programs


PATH=$PATH:$HOME/bin


export PATH

export JAVA_HOME=/home/hadoop/jdk1.6.0_45

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

export MAVEN_HOME=/home/hadoop/apache-maven-3.1.1

export PATH=/home/hadoop/apache-maven-3.1.1/bin:$PATH

export HADOOP_PREFIX=/home/hadoop/hadoop-2.2.0

export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

export HADOOP_COMMON_HOME=${HADOOP_PREFIX}

export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}

export HADOOP_YARN_HOME=${HADOOP_PREFIX}

export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop

export HADOOP_HOME=/home/hadoop/hadoop-2.2.0

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

export YARN_HOME=${HADOOP_PREFIX}

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export SCALA_HOME=/home/hadoop/scala-2.10.1

export PATH=$PATH:$SCALA_HOME/bin

export SPARK_HOME=/home/hadoop/spark-0.9.1-bin-hadoop2

export FLUME_HOME=/home/hadoop/apache-flume-1.4.0-bin

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=.:$PATH:$FLUME_HOME/bin

#set sqoop Environment

export SQOOP_HOME=/home/hadoop/sqoop-1.99.3-bin-hadoop200

export PATH=${SQOOP_HOME}/bin:$PATH
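Reload the profile on each node so flume-ng is on the PATH, and confirm it resolves:

    [hadoop@slave1 ~]$ source ~/.bash_profile
    [hadoop@slave1 ~]$ which flume-ng
    /home/hadoop/apache-flume-1.4.0-bin/bin/flume-ng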



5. Start Flume

[hadoop@master ~]$ cd apache-flume-1.4.0-bin/

[hadoop@master apache-flume-1.4.0-bin]$ flume-ng agent -c conf  -f conf/flume.conf -n master  -Dflume.root.logger=INFO,console

~~~~~

...... (output omitted) ......

2014-05-20 16:40:07,477 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: avro started

2014-05-20 16:40:07,478 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:217)] Avro source avro started.

~~~~~~
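With the master agent listening, the avro source can be smoke-tested from any node with Flume's bundled avro-client before the slave agents are started (a sketch; test-r.log is only used as sample input):

    [hadoop@slave1 ~]$ flume-ng avro-client -H master.hadoop -p 44414 -F /home/hadoop/works/test-r.log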


[hadoop@slave1 ~]$ cd apache-flume-1.4.0-bin

[hadoop@slave1 ~]$ flume-ng agent -c conf  -f conf/flume.conf -n slave1 -Dflume.root.logger=INFO,console

~~~~~~~~

...... (output omitted) ......


2014-05-20 16:49:32,434 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: tailf started

2014-05-20 16:49:35,170 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.sink.AbstractRpcSink.start(AbstractRpcSink.java:300)] Rpc sink avro started.

~~~~~~~

Watch the log on master.hadoop:

~~~~~

2014-05-20 16:49:33,957 (New I/O server boss #3) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x39acac4f, /172.20.105.133:47559 => /172.20.105.131:44414] OPEN

2014-05-20 16:49:33,958 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x39acac4f, /172.20.105.133:47559 => /172.20.105.131:44414] BOUND: /172.20.105.131:44414

2014-05-20 16:49:33,959 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x39acac4f, /172.20.105.133:47559 => /172.20.105.131:44414] CONNECTED: /172.20.105.133:47559

2014-05-20 16:49:39,171 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:56)] Serializer = TEXT, UseRawLocalFileSystem = false

2014-05-20 16:49:39,311 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172.tmp

~~~~~~~~


[hadoop@slave1 works]$ ls test*

test.log  test-r.log  test.sh

[hadoop@slave1 works]$ cat  test.sh 

#!/bin/sh

while  true

do

{

cat /home/hadoop/works/test-r.log  >>/home/hadoop/works/test.log

sleep 5

}

done

[hadoop@slave1 works]$ ./test.sh &

[1] 5520
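The generator appends test-r.log to test.log every 5 seconds; once enough data has been pushed, stop the background job:

    [hadoop@slave1 works]$ kill %1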

Watch the log on master.hadoop again:

~~~

2014-05-20 16:56:16,909 (hdfs-hdfssink-call-runner-1) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:487)] Renaming /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172.tmp to /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172

2014-05-20 16:56:17,114 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:219)] Creating /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779173.tmp

~~~

Perform the same steps on slave2.hadoop as on slave1.hadoop:

[hadoop@slave2 ~]$ cd apache-flume-1.4.0-bin/

[hadoop@slave2 apache-flume-1.4.0-bin]$ flume-ng agent -c conf  -f conf/flume.conf -n slave2  -Dflume.root.logger=INFO,console


[hadoop@slave2 ~]$ ./works/test.sh &

[1] 4953


[hadoop@slave2 works]$ hdfs  dfs   -ls /flume

Found 6 items

drwxr-xr-x   - hadoop supergroup          0 2014-05-20 16:49 /flume/172.20.105.133_server

drwxr-xr-x   - hadoop supergroup          0 2014-05-20 16:59 /flume/172.20.105.135_server

[hadoop@slave2 works]$ hdfs  dfs   -ls /flume/172.20.105.133_server

Found 1 items

drwxr-xr-x   - hadoop supergroup          0 2014-05-20 17:03 /flume/172.20.105.133_server/20140520_date

[hadoop@slave2 works]$ hdfs  dfs   -ls /flume/172.20.105.133_server/20140520_date

Found 12 items

-rw-r--r--   2 hadoop supergroup   40138292 2014-05-20 16:56 /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172

-rw-r--r--   2 hadoop supergroup   40138367 2014-05-20 16:56 /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779173

-rw-r--r--   2 hadoop supergroup   40138292 2014-05-20 16:57 /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779174
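To spot-check what was written, cat one of the files listed above (a sketch):

    [hadoop@slave2 works]$ hdfs dfs -cat /flume/172.20.105.133_server/20140520_date/FlumeData.1400629779172 | head -5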


