Collecting Logs with Flume-ng

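I. Introduction

Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transporting large volumes of log data. It supports customizable data senders inside a logging system for collecting data, and it can apply simple processing to that data and write it to a variety of customizable data receivers.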


II. Installation environment

agent:

192.168.7.101

hdfs:

192.168.7.70(namenode) 
192.168.7.71(datanode)
192.168.7.72(datanode)
192.168.7.73(datanode)

Operating system:

CentOS 6.3 x86_64

Required packages:

jdk-1.7.0_65-fcs.x86_64
flume-ng-1.5.0
flume-ng-agent-1.5.0
hadoop-2.3.0+cdh5.1.0

Host name mappings in /etc/hosts:

cat /etc/hosts
192.168.7.70 cdh1
192.168.7.71 cdh2
192.168.7.72 cdh3
192.168.7.73 cdh4
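
The HDFS sink path used later refers to the NameNode by name (cdh1), so these entries need to be in place on the agent host before Flume starts. Here is a minimal sketch of a check, assuming a hosts-format file. The function name missing_hosts is hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Print every host name from the argument list that has no entry
# in the given hosts-format file (hypothetical helper).
#   $1 = hosts file, remaining arguments = host names to look for
missing_hosts() (
    file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for the name in any column after the IP.
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
)
```

Running "missing_hosts /etc/hosts cdh1 cdh2 cdh3 cdh4" prints nothing when all four names are present.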


III. Configuring flume-ng-agent

1. Log files involved:

/home/logs/bizservice/bizservice.log
/home/logs/base/base.log
/home/logs/agent/agent.log
/home/logs/thirdser/thirdser.log


2. Install flume-ng-agent

yum -y install 'flume-ng*' jdk-1.7.0_65-fcs.x86_64


3. Configure /etc/profile.d/java.sh and apply it

echo 'JAVA_HOME=/usr/java/latest' >> /etc/profile.d/java.sh 
echo 'PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile.d/java.sh 
echo 'export JAVA_HOME PATH' >> /etc/profile.d/java.sh 
source /etc/profile.d/java.sh
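
Because the echo lines append with ">>", running them a second time leaves duplicate entries in java.sh. The sketch below writes the same three lines in one step and overwrites the file instead. The function name write_java_profile is hypothetical; the default target is the file from this step.

```shell
#!/bin/sh
# Write the Java environment file in one step. Using ">" instead of ">>"
# means that a second run does not append duplicate lines.
#   $1 = target file (defaults to /etc/profile.d/java.sh)
write_java_profile() {
    target=${1:-/etc/profile.d/java.sh}
    cat > "$target" <<'EOF'
JAVA_HOME=/usr/java/latest
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME PATH
EOF
}
```

Usage: "write_java_profile && source /etc/profile.d/java.sh".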


4. Generate /etc/flume-ng/conf/flume.conf from template.conf and flume.sh

cat /tmp/template.conf

# Name the components on this agent
aa.sources = rr
aa.sinks = kk
aa.channels = cc

# Describe/configure the source
aa.sources.rr.type = exec
aa.sources.rr.command = tail -F AGENT1

# Describe the sink
aa.sinks.kk.type = hdfs
aa.sinks.kk.hdfs.path = hdfs://cdh1:8020/flume/AGENT2/%y-%m-%d/%H%M/%S
aa.sinks.kk.hdfs.filePrefix = AGENT2%{host}
aa.sinks.kk.hdfs.round = true
aa.sinks.kk.hdfs.roundValue = 10
aa.sinks.kk.hdfs.roundUnit = minute
aa.sinks.kk.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
aa.channels.cc.type = memory
aa.channels.cc.capacity = 1000
aa.channels.cc.transactionCapacity = 100

# Bind the source and sink to the channel
aa.sources.rr.channels = cc
aa.sinks.kk.channel = cc
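
The placeholders aa, rr, kk, cc, AGENT1 and AGENT2 in the template are filled in once per log file. To preview the result for a single log without touching /etc/flume-ng/conf, something like the following sketch can be used. The function name render_agent is hypothetical; the substitutions mirror the sed commands in flume.sh.

```shell
#!/bin/sh
# Render the template for one log file and print the result to stdout.
#   $1 = template file
#   $2 = log name without the .log suffix (e.g. bizservice)
#   $3 = full path of the log file
render_agent() {
    sed -e "s/aa/${2}_a1/g" \
        -e "s/rr/${2}_r1/g" \
        -e "s/kk/${2}_k1/g" \
        -e "s/cc/${2}_c1/g" \
        -e "s#AGENT1#${3}#g" \
        -e "s/AGENT2/${2}/g" "$1"
}
```

For example, "render_agent /tmp/template.conf bizservice /home/logs/bizservice/bizservice.log" turns "aa.sources = rr" into "bizservice_a1.sources = bizservice_r1". Note that the placeholders are replaced wherever they appear, so a log name that itself contains "aa", "rr", "kk" or "cc" (a hypothetical access.log, for instance) would be mangled by the later substitutions.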

cat flume.sh

#!/bin/bash
# Generate /etc/flume-ng/conf/flume.conf with one agent per log file.
source /etc/profile
cd /etc/flume-ng/conf/ || exit 1

# Full paths of the log files to collect, one per line
find /home/logs/ -name '*.log' -type f | egrep '/(bizservice|base|agent|thirdser)\.log$' > /tmp/TEMP
# Log names without the .log suffix, e.g. bizservice
OBJECT=`awk -F '/' '{print $NF}' /tmp/TEMP | sed 's/\.log$//'`

rm -f flume.conf
for I in $OBJECT
do
    \cp -f /tmp/template.conf "$I".conf
    # Path of the log file that belongs to this name
    PATHE=`grep "/$I\.log$" /tmp/TEMP`
    sed -i "s/aa/${I}_a1/g" "$I".conf
    sed -i "s/rr/${I}_r1/g" "$I".conf
    sed -i "s/kk/${I}_k1/g" "$I".conf
    sed -i "s/cc/${I}_c1/g" "$I".conf
    sed -i "s#AGENT1#${PATHE}#g" "$I".conf
    sed -i "s/AGENT2/${I}/g" "$I".conf
    cat "$I".conf >> flume.conf
    rm -f "$I".conf
done
rm -f /tmp/TEMP
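
After running flume.sh it is worth checking that every log file ended up with its own agent. The following is a rough sketch of such a check, assuming the generated file keeps the "tail -F" command lines from the template. The function name summarize_conf is hypothetical.

```shell
#!/bin/sh
# Print "agent<TAB>tailed file" for every exec source in a generated config.
#   $1 = Flume properties file, e.g. /etc/flume-ng/conf/flume.conf
summarize_conf() {
    awk '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        /\.command[[:space:]]*=[[:space:]]*tail -F / {
            split($1, part, ".")                  # part[1] is the agent name
            print part[1] "\t" $NF                # $NF is the tailed path
        }
    ' "$1"
}
```

With the four log files from step 1, "summarize_conf /etc/flume-ng/conf/flume.conf" should print four lines, one per agent.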


5. Modify /etc/init.d/flume-ng-agent

Set the default agent name to the list of all agents defined in flume.conf:

DEFAULT_FLUME_AGENT_NAME=`cat /etc/flume-ng/conf/flume.conf|grep a1|awk -F . '{print $1}'|sort -u|tr '\n' ' '`

Then wrap the start command in a loop so that one Flume process is started per agent:

        for FLUME_AGENT in $FLUME_AGENT_NAME
        do
            /bin/su -s /bin/bash -c "/bin/bash -c 'echo \$\$ >${FLUME_PID_FILE} && exec ${EXEC_PATH} agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_FILE --name $FLUME_AGENT >>${FLUME_LOG_DIR}/flume.init.log 2>&1' &" $FLUME_USER
        done
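
The "grep a1" filter above matches any line that contains "a1", including comments. A slightly stricter way to derive the agent names could look like the sketch below, which only reads the "<name>.sources = ..." lines. The function name agent_names is hypothetical and not part of the packaged init script.

```shell
#!/bin/sh
# Print the unique agent names defined in a Flume properties file,
# space-separated on one line. Only "<name>.sources = ..." lines are
# used, so comment lines and deeper property keys are ignored.
#   $1 = Flume properties file
agent_names() {
    awk -F. '/^[A-Za-z0-9_-]+\.sources[[:space:]]*=/ { print $1 }' "$1" \
        | sort -u | tr '\n' ' ' | sed 's/ $//'
}
```

In the init script this would be used as DEFAULT_FLUME_AGENT_NAME=`agent_names /etc/flume-ng/conf/flume.conf`.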


6. Start flume-ng-agent

/etc/init.d/flume-ng-agent start


7. Check the log /var/log/flume-ng/flume.log; it shows that data has been written to HDFS.

03 九月 2014 14:09:39,445 INFO  [hdfs-bizservice_k1-call-runner-9] (org.apache.flume.sink.hdfs.BucketWriter$8.call:673)  - Renaming hdfs://cdh1:8020/flume/bizservice/14-09-03/1400/00/bizservice.1409724409419.tmp to hdfs://cdh1:8020/flume/bizservice/14-09-03/1400/00/bizservice.1409724409419
03 九月 2014 14:09:39,470 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261)  - Creating hdfs://cdh1:8020/flume/bizservice/14-09-03/1400/00/bizservice.1409724409420.tmp
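
The directory 14-09-03/1400/00 in these paths comes from the round settings in the template: with roundValue = 10 and roundUnit = minute, the timestamp is rounded down to a 10-minute boundary before %y-%m-%d/%H%M/%S is filled in. The sketch below reproduces that calculation for a given epoch time. The function name bucket_dir is hypothetical, it relies on GNU date, and it assumes a time zone whose offset is a multiple of 10 minutes.

```shell
#!/bin/sh
# Print the HDFS directory an event would land in for
# hdfs.round = true, hdfs.roundValue = 10, hdfs.roundUnit = minute.
#   $1 = log name (e.g. bizservice), $2 = event time in epoch seconds
bucket_dir() {
    rounded=$(( $2 / 600 * 600 ))   # round down to a 10-minute boundary
    date -d "@$rounded" "+/flume/$1/%y-%m-%d/%H%M/%S"
}
```

Taking the first ten digits of the file name suffix in the log (1409724409, which appears to be the creation time in epoch seconds), "bucket_dir bizservice 1409724409" on a host in UTC+8 gives /flume/bizservice/14-09-03/1400/00, which matches the path in the log.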


This article comes from the “让一切随风” blog. Please do not repost it.
