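Data collection is an important stage in the big data analytics pipeline. Typical data collection tools include ETL tools, log collection tools, and data migration tools.

Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data.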

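1. Install Flume

Download: http://www.apache.org/dist/flume/ (releases that are no longer current, such as 1.7.0, are kept in the Apache archive at http://archive.apache.org/dist/flume/)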

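hadoop@dblab:/usr/local$ sudo wget http://www.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz

hadoop@dblab:/usr/local$ sudo tar -zxvf apache-flume-1.7.0-bin.tar.gz

hadoop@dblab:/usr/local$ sudo mv apache-flume-1.7.0-bin ./flume

hadoop@dblab:/usr/local$ sudo chown -R hadoop ./flume    # give the hadoop user ownership so later steps (e.g. renaming files in flume/conf) work without sudo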
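The extract-and-rename steps above can also be wrapped in a small function. This is a minimal sketch, not part of Flume: the function name unpack_flume is hypothetical, and it assumes the first entry listed in the tarball is its top-level directory (true for apache-flume-1.7.0-bin.tar.gz, but not guaranteed for every archive).

```shell
#!/usr/bin/env bash
# Extract a Flume binary tarball into a prefix directory and rename its
# top-level directory (e.g. apache-flume-1.7.0-bin) to "flume".
# unpack_flume is a hypothetical helper, not a Flume command.
unpack_flume() {
  local tarball="$1" prefix="$2"
  local top
  # Refuse to continue if the target exists: "mv dir existing_dir" would
  # move the new directory *inside* the old one instead of replacing it.
  if [ -e "$prefix/flume" ]; then
    echo "$prefix/flume already exists" >&2
    return 1
  fi
  # Assumption: the archive's first entry is its top-level directory.
  # sed reads the whole listing, so tar is not cut off mid-pipe.
  top=$(tar -tzf "$tarball" | sed -n '1{s#/.*##;p;}')
  [ -n "$top" ] || return 1
  tar -xzf "$tarball" -C "$prefix" && mv "$prefix/$top" "$prefix/flume"
}
```

Usage: run it as a user that can write to the prefix (for /usr/local that usually means root), e.g. unpack_flume apache-flume-1.7.0-bin.tar.gz /usr/local, and then give the hadoop user ownership with sudo chown -R hadoop /usr/local/flume.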

2. Configure environment variables

hadoop@dblab:/usr/local$ vim ~/.bashrc

# Add the following lines to ~/.bashrc:

export FLUME_HOME=/usr/local/flume

export FLUME_CONF_DIR=$FLUME_HOME/conf

export JAVA_HOME=/usr/lib/jvm/default-java

export PATH=$PATH:$FLUME_HOME/bin

hadoop@dblab:/usr/local$ source ~/.bashrc
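Editing ~/.bashrc by hand is easy to repeat by accident, which leaves duplicate export lines. A minimal sketch of an idempotent alternative follows; the function name add_flume_env is hypothetical, and the paths are the same ones used above (/usr/local/flume and /usr/lib/jvm/default-java), which you should adjust for your machine.

```shell
#!/usr/bin/env bash
# Append the Flume environment variables to an rc file (e.g. ~/.bashrc),
# but only once: re-running it leaves the file unchanged.
# add_flume_env is a hypothetical helper name.
add_flume_env() {
  local rc_file="$1"
  # Already configured? Then do nothing.
  if grep -q '^export FLUME_HOME=' "$rc_file" 2>/dev/null; then
    return 0
  fi
  # The quoted 'EOF' keeps $PATH and $FLUME_HOME literal, so they are
  # expanded when the rc file is sourced, not when this function runs.
  cat >> "$rc_file" <<'EOF'
export FLUME_HOME=/usr/local/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export JAVA_HOME=/usr/lib/jvm/default-java
export PATH=$PATH:$FLUME_HOME/bin
EOF
}
```

Usage: add_flume_env ~/.bashrc && source ~/.bashrc. Checking for an existing FLUME_HOME line is a deliberately simple guard; it assumes nobody has set FLUME_HOME in the file some other way.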

hadoop@dblab:/usr/local$ cd /usr/local/flume/conf

hadoop@dblab:/usr/local/flume/conf$ mv flume-env.sh.template flume-env.sh

hadoop@dblab:/usr/local/flume/conf$ sudo vim flume-env.sh

# Add the following line at the beginning of flume-env.sh:

export JAVA_HOME=/usr/lib/jvm/default-java
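The rename-and-edit step for flume-env.sh can be scripted as well. A minimal sketch, assuming the conf directory layout shown above; set_flume_java_home is a hypothetical helper, and it prepends the line with a temporary file rather than an editor so it also works on an empty file.

```shell
#!/usr/bin/env bash
# Create flume-env.sh from its template (if it does not exist yet) and
# put an "export JAVA_HOME=..." line at the very top, only once.
# set_flume_java_home is a hypothetical helper name.
set_flume_java_home() {
  local conf_dir="$1" java_home="$2"
  local env_file="$conf_dir/flume-env.sh"
  if [ ! -f "$env_file" ]; then
    mv "$conf_dir/flume-env.sh.template" "$env_file" || return 1
  fi
  # Commented lines such as "# export JAVA_HOME=..." do not match "^export"
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    return 0
  fi
  # Write the new first line plus the old contents to a temp file, then replace
  { printf 'export JAVA_HOME=%s\n' "$java_home"; cat "$env_file"; } > "$env_file.tmp" &&
    mv "$env_file.tmp" "$env_file"
}
```

Usage: set_flume_java_home /usr/local/flume/conf /usr/lib/jvm/default-java.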

3. Start Flume (check the installed version)


hadoop@dblab:/usr/local/flume$ cd /usr/local/flume

hadoop@dblab:/usr/local/flume$ ./bin/flume-ng version

Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty

Flume 1.7.0

Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git

Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707

Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016

From source with checksum 0d21b3ffdc55a07e1d08875872c00523

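The version is still printed, but the error above shows up because HBase is also installed on this machine: flume-ng appears to run org.apache.flume.tools.GetJavaProperty through the hbase launcher, passing Flume's classpath in HBASE_CLASSPATH, and an HBASE_CLASSPATH exported in hbase-env.sh overwrites that value. To fix it, comment out that line in hbase-env.sh:

hadoop@dblab:/usr/local/flume$ cd /usr/local/hbase/conf

hadoop@dblab:/usr/local/hbase/conf$ sudo vim hbase-env.sh

#export HBASE_CLASSPATH=/usr/local/hadoop/conf    # comment out this line to fix the error above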
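The same change can be made non-interactively with sed. A minimal sketch; comment_out_hbase_classpath is a hypothetical helper, and it keeps a hbase-env.sh.bak backup (the attached -i.bak suffix form is accepted by both GNU and BSD sed).

```shell
#!/usr/bin/env bash
# Prefix "#" to every active "export HBASE_CLASSPATH=" line in a file.
# Lines that are already commented start with "#" and are left alone.
# comment_out_hbase_classpath is a hypothetical helper name.
comment_out_hbase_classpath() {
  local env_file="$1"
  sed -i.bak 's/^[[:space:]]*export HBASE_CLASSPATH=/#&/' "$env_file"
}
```

Usage: comment_out_hbase_classpath /usr/local/hbase/conf/hbase-env.sh, run as a user that can write the file.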

hadoop@dblab:/usr/local/hbase/conf$ cd /usr/local/flume

hadoop@dblab:/usr/local/flume$ ./bin/flume-ng version

Flume 1.7.0

Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git

Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707

Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016

From source with checksum 0d21b3ffdc55a07e1d08875872c00523
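To repeat this check later (for example after changing HBase settings), the version call can be wrapped so that it fails when the GetJavaProperty error reappears. A minimal sketch; check_flume_version is a hypothetical helper, and it matches on the class name rather than the full message because the message text depends on the system locale.

```shell
#!/usr/bin/env bash
# Run "flume-ng version" and fail if the GetJavaProperty error shows up
# or if no "Flume <version>" line is printed.
# check_flume_version is a hypothetical helper name.
check_flume_version() {
  local flume_home="${1:-/usr/local/flume}"
  local out
  # Capture stdout and stderr together; the Java launcher error goes to stderr
  out=$("$flume_home/bin/flume-ng" version 2>&1)
  if printf '%s\n' "$out" | grep -q 'GetJavaProperty'; then
    echo "flume-ng reported a GetJavaProperty error; check hbase-env.sh" >&2
    return 1
  fi
  printf '%s\n' "$out" | grep -q '^Flume [0-9]'
}
```

Usage: check_flume_version /usr/local/flume && echo "Flume OK".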