Flume是Cloudera提供的日志收集系统,Flume支持在日志系统中定制各类数据发送方,用于收集数据;同时,Flume提供对数据进行简单处理,并写到各种数据接受方(可定制)的能力。
Flume是一个分布式、可靠、和高可用的海量日志采集、聚合和传输的系统。
2.1 安装
下载: http://archive.cloudera.com/cdh/3/ flume-0.9.0+1.tar.gz
接着解压.暂时用$flume代表解压路径.
用户文档:http://archive.cloudera.com/cdh/3/flume/UserGuide.html
下载: http://archive.cloudera.com/cdh/3/ zookeeper-3.3.1.tar.gz
安装zookeeper
yum install hadoop-zookeeper –y
yum install hadoop-zookeeper-server –y
修改/zookeeper-3.3.1/conf/ zoo_sample.cfg重命名为zoo.cfg
执行如下命令:
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.3.1
export FLUME_HOME=/home/hadoop/flume-0.9.0+1
export PATH=.:$FLUME_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
执行>flume
输出如下:
usage: flume command [args...]
commands include:
dump Takes a specified source and dumps to console
node Start a Flume node/agent (with watchdog)
master Start a Flume Master server (with watchdog)
version Dump flume build version information
node_nowatch Start a flume node/agent (no watchdog)
master_nowatch Start a Flume Master server (no watchdog)
class
ex: flume com.cloudera.flume.agent.FlumeNode
classpath Dump the classpath used by the java executables
shell Start the flume shell
启动flume的master节点执行:bin/flume master
通过flume打开文件
输入命令
$ flume dump 'tail("/home/hadoop/log/bb.txt")'
输出:
可打开http://10.1.27.30:35871/flumemaster.jsp 即可看到整理节点的情况
从上面URL打开的选项卡config,输入节点配置,然后点提交查询内容
如下:
Source为数据源,可有多种输入源,sink为接收器,当启动master节点时,会把文件写入到hdsf里
启动配置好的节点:bin/flume node –n master
通过flume读取syslog-ng
分别启动节点host和collector节点
3.附录:
Flume Event
Sources console
Stdin console
text("filename")
One shot text file source. One line is one event
tail("filename")
Similar to Unix’s tail -F. One line is one event. Stays open for more data and follows filename if file rotated.
multitail("file1"[, "file2"[, …]])
Similar to tail source but follows multiple files.
asciisynth(msg_count,msg_size)
A source that synthetically generates msg_count random messages of size msg_size. This converts all characters into printable ASCII characters.
syslogUdp(port)
Syslog over UDP port, port. This is syslog compatible.
syslogTcp(port)
Syslog over TCP port, port. This is syslog-ng compatible.
Flume Event Sinks
null |
Null sink. Events are dropped. |
console[("format")] |
Console sink. Display to console’s stdout. The "format" argument is optional and defaults to the "debug" output format. |
text("txtfile"[,"format"]) |
Textfile sink. Write the events to text file txtfile using output format "format". The default format is "raw" event bodies with no metadata. |
dfs("dfsfile") |
DFS seqfile sink. Write serialized Flume events to a dfs path such as hdfs://namenode/file or file:///file in Hadoop’s seqfile format. Note that because of the hdfs write semantics, no data for this sink write until the sink is closed. |
syslogTcp("host",port) |
Syslog TCP sink. Forward to events to host on TCP port port in syslog wire format (syslog-ng compatible), or to other Flume nodes setup to listen for syslogTcp. |
默认端口如下:
TCP ports are used in all situations.
node collector port |
flume.collector.port |
35853+ |
node status web server |
flume.node.http.port |
35862+ |
master status web server |
flume.master.http.port |
35871 |
master heartbeat port |
flume.master.heartbeat.port |
35872 |
master admin/shell port |
flume.master.admin.port |
35873 |
master gossip port |
flume.master.gossip.port |
35890 |
master → zk port |
flume.master.zk.client.port |
3181 |
zk → zk quorum port |
flume.master.zk.server.quorum.port |
3182 |
zk → zk election port
|
flume.master.zk.server.election.port |
3183 |