From: http://www.iteblog.com/archives/911
The problem I hit while running it was resolved as follows.
As people on the Cloudera mailing list suggest, there are a couple of probable causes for this error:
- HDFS safe mode is turned on. Try running
  hadoop dfsadmin -safemode leave
  (safe mode is controlled through dfsadmin, not hadoop fs) and see if the error goes away.
- The Flume and Hadoop versions are mismatched. To check this, replace the hadoop-core.jar in Flume's lib directory with the one found in Hadoop's installation folder.
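A rough sketch of the jar swap from the second point; the jar name and the $HADOOP_HOME/$FLUME_HOME paths are illustrative and depend on your installation:

# compare the Hadoop client jar bundled with Flume against the cluster's own,
# then replace the bundled one with the cluster's version
ls $FLUME_HOME/lib/hadoop-core*.jar
ls $HADOOP_HOME/hadoop-core-*.jar
cp $HADOOP_HOME/hadoop-core-*.jar $FLUME_HOME/lib/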
Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It lets you plug custom data senders into your logging system to collect data, and it can also do simple processing on the data and write it out to a variety of destinations (plain text files, HDFS, HBase, and so on).
Flume has three main kinds of components:
(1) Master: handles configuration and communication management; it is the controller of the cluster, and multiple master nodes are supported;
(2) Agent: collects data; the agent is where data flows originate in Flume, and it forwards the flows it produces to a collector;
(3) Collector: aggregates the data coming from agents, typically producing a larger combined stream, which is then loaded into storage.
Put simply, agents periodically send the data they collect to a collector, and the collector receives that data and writes it to the configured destination (plain text files, HDFS, HBase, and so on).
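To make that agent-to-collector flow concrete: in Flume 0.9.x each logical node is given a source and a sink, and mappings like the ones below are what you configure on the master once the cluster described in this post is up. This is only a sketch; the tailed file path, the collector port 35853, and the HDFS URL are illustrative assumptions, not settings from the original article.

agent     : tail("/var/log/messages") | agentSink("collector", 35853) ;
collector : collectorSource(35853) | collectorSink("hdfs://master:8020/flume/", "weblog") ;

Here the agent tails a local log file and forwards events to the collector, which writes them out to HDFS.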
This post is a short walkthrough of deploying a distributed Flume-0.9.4 setup across three machines, whose hostnames are master, agent, and collector.
1. Download Flume-0.9.4 from the official site and unpack it:
[wyp@master ~]$ wget https://.../repositories/releases/com/cloudera/flume-distribution/0.9.4-cdh4.0.0/ \
                flume-distribution-0.9.4-cdh4.0.0-bin.tar.gz
[wyp@master ~]$ tar -zxvf flume-distribution-0.9.4-cdh4.0.0-bin.tar.gz
[wyp@master ~]$ cd flume-0.9.4-cdh3u3
[wyp@master flume-0.9.4-cdh3u3]$
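As a quick sanity check that the unpacked build works, the launcher's version subcommand can be run (a sketch; its output is not reproduced here):

[wyp@master flume-0.9.4-cdh3u3]$ bin/flume version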
2. Go into the $FLUME_HOME/bin directory, copy flume-env.sh.template to flume-env.sh, and set the following variables in flume-env.sh:
[wyp@master flume-0.9.4-cdh3u3]$ cd bin
[wyp@master bin]$ cp flume-env.sh.template flume-env.sh
[wyp@master bin]$ vim flume-env.sh

export FLUME_HOME=/home/q/flume-0.9.4-cdh3u3
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun
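flume-env.sh is picked up by the flume launcher itself, but the later steps also expand $FLUME_HOME in the interactive shell, so it can help to export the same variables in the shell profile as well. A minimal sketch, assuming bash and the paths above:

echo 'export FLUME_HOME=/home/q/flume-0.9.4-cdh3u3' >> ~/.bashrc
echo 'export PATH=$PATH:$FLUME_HOME/bin' >> ~/.bashrc
source ~/.bashrc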
3. Go into the $FLUME_HOME/conf directory, copy flume-site.xml.template to flume-site.xml, and edit the following properties in the flume-site.xml configuration file:
[wyp@master bin]$ cd ../conf
[wyp@master conf]$ cp flume-site.xml.template flume-site.xml
[wyp@master conf]$ vim flume-site.xml

<property>
  <name>flume.master.servers</name>
  <description>This is the address for the config servers status
  </description>
</property>

<property>
  <name>flume.collector.output.format</name>
  <description>The output format for the data written by a Flume
  collector node. There are several formats available:
    syslog - outputs events in a syslog-like format
    log4j - outputs events in a pattern similar to Hadoop's log4j pattern
    raw - Event body only. This is most similar to copying a file but
          does not preserve any uniqifying metadata like host/timestamp/nanos.
    avro - Avro Native file format. Default currently is uncompressed.
    avrojson - this outputs data as json encoded by avro
    avrodata - this outputs data as a avro binary encoded data
    debug - used only for debugging
  </description>
</property>

<property>
  <name>flume.collector.roll.millis</name>
  <description>The time (in milliseconds)
  between when hdfs files are closed and a new file is opened
  </description>
</property>

<property>
  <name>flume.agent.logdir.maxage</name>
  <description>number of milliseconds before a local log file is
  considered closed and ready to forward.
  </description>
</property>

<property>
  <name>flume.agent.logdir.retransmit</name>
  <description>The time (in milliseconds) before a sent event is
  assumed lost and needs to be retried in end-to-end reliability
  mode again. This should be at least 2x the
  flume.collector.roll.millis.
  </description>
</property>
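The <value> entries for these properties did not survive in the excerpt above, so here is a minimal illustrative flume-site.xml as a sketch. The values (master as the single master host, raw output, a 60-second roll, and the agent log-dir timings) are assumptions for a small test setup, not the original article's settings:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>flume.master.servers</name>
    <value>master</value>
  </property>
  <property>
    <name>flume.collector.output.format</name>
    <value>raw</value>
  </property>
  <property>
    <name>flume.collector.roll.millis</name>
    <value>60000</value>
  </property>
  <property>
    <name>flume.agent.logdir.maxage</name>
    <value>10000</value>
  </property>
  <property>
    <name>flume.agent.logdir.retransmit</name>
    <!-- should be at least 2x flume.collector.roll.millis -->
    <value>120000</value>
  </property>
</configuration>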
4. Pack up the whole configured Flume directory and send it to the agent and collector machines:
[wyp@master ~]$ tar -zcvf flume-0.9.4-cdh3u3.tar.gz flume-0.9.4-cdh3u3
[wyp@master ~]$ scp flume-0.9.4-cdh3u3.tar.gz agent:/home/wyp
[wyp@master ~]$ scp flume-0.9.4-cdh3u3.tar.gz collector:/home/wyp
5. Unpack the archive on the agent and collector machines, then start the corresponding Flume process on each of the master, agent, and collector machines:
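The unpacking itself is not shown; a minimal sketch, assuming the tarball was copied to /home/wyp as in step 4:

[wyp@agent ~]$ tar -zxvf flume-0.9.4-cdh3u3.tar.gz
[wyp@collector ~]$ tar -zxvf flume-0.9.4-cdh3u3.tar.gz

With the archive unpacked on both machines, the processes are started like this: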
[wyp@master ~]$ $FLUME_HOME/bin/flume master

[wyp@agent ~]$ $FLUME_HOME/bin/flume node_nowatch -n agent

[wyp@collector ~]$ $FLUME_HOME/bin/flume node_nowatch -n collector
With that, the master machine takes the master role, the agent machine takes the agent role, and the collector machine takes the collector role.
6. Open http://master:35871 and check that the page loads and that the agent and collector nodes show up as successfully started. If they do, the Flume installation is complete!
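Beyond the web UI, the cluster can be exercised end to end by pushing node configurations through the Flume shell on the master. The commands below are a sketch only; the source, sink, port, and HDFS URL are the same illustrative assumptions used earlier, not settings from the original article:

[wyp@master ~]$ $FLUME_HOME/bin/flume shell -c master
# inside the shell: map a source and sink onto each logical node, then check node status
exec config agent 'tail("/var/log/messages")' 'agentSink("collector",35853)'
exec config collector 'collectorSource(35853)' 'collectorSink("hdfs://master:8020/flume/","weblog")'
getnodestatus
quit

If the mappings take effect, the master web UI should list them under the node configurations, and new files should start appearing under the HDFS path.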