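Note: this post is meant as a reference for readers who are using Flume on CDH for the first time and already have some familiarity with Flume. For details, see the official site:
Apache Flume™
Environment: CentOS 7.3 1708, CDH 5.14.2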
See the screenshots:
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Start Flume at this point:
Figure 7
Figure 8
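The default configuration file is a simple sample: it uses netcat (which listens on a TCP port and turns each line of text it receives into an event) as the source, memory as the channel, and logger as the sink, which writes the events to the agent's log file.
The configuration is as follows (if you are only testing the Flume installation, it needs no changes):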
# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type = netcat
tier1.sources.source1.bind = 127.0.0.1
tier1.sources.source1.port = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1
# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
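The agent is named tier1.
The source is source1.
The channel is channel1.
The sink is sink1.
The type of source1 is netcat (lines of text received over the network).
It listens on 127.0.0.1, the local loopback address.
It listens on port 9999.
source1 writes its events to channel1.
channel1 is of type memory.
sink1 reads its events from channel1.
The type of sink1 is logger (events are written to the log).
The last line sets the capacity of channel1 to 100, that is, the maximum number of events the channel can hold at once.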
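The wiring described above can be checked mechanically. The following is a minimal sketch, not part of Flume: `parse_properties` and `describe_wiring` are hypothetical helper names, and the parser only handles the simple `key = value` lines used in the sample configuration.

```python
# Sample configuration from above, with the comment lines removed.
SAMPLE_CONF = """\
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = netcat
tier1.sources.source1.bind = 127.0.0.1
tier1.sources.source1.port = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1
tier1.channels.channel1.capacity = 100
"""

def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def describe_wiring(props, agent):
    """Return the (source, channel, sink) triples implied by the config."""
    sources = props[f"{agent}.sources"].split()
    sinks = props[f"{agent}.sinks"].split()
    triples = []
    for src in sources:
        # A source may list several channels; a sink reads from exactly one.
        for ch in props[f"{agent}.sources.{src}.channels"].split():
            for sink in sinks:
                if props[f"{agent}.sinks.{sink}.channel"] == ch:
                    triples.append((src, ch, sink))
    return triples

props = parse_properties(SAMPLE_CONF)
for src, ch, sink in describe_wiring(props, "tier1"):
    print(f"{src} -> {ch} -> {sink}")
```

Note that the source property is the plural `channels` while the sink property is the singular `channel`; mixing the two up is an easy mistake when editing the file by hand.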
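At this point everything is ready.
Now run the test:
On machine cdh04 (the machine where Flume was installed and configured above), use telnet to connect to port 9999 on 127.0.0.1 (or localhost), which is the port the source is bound to in the configuration above. [If telnet is not installed, see the telnet installation note further down.]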
telnet localhost 9999
This connects telnet to the local host.
Once the line Escape character is '^]'. appears, the connection is ready.
Send some arbitrary text:
HELLO------------------
and press Enter.
The session looks like this:
telnet localhost 9999
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HELLO------------------
OK
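If telnet is not available, the same test can be done with a few lines of Python. This is a minimal sketch under the assumption that the agent is running with source1 bound to 127.0.0.1:9999 as configured above; `exchange` and `send_line` are hypothetical helper names, not part of Flume.

```python
import socket

def exchange(conn, text):
    """Send one newline-terminated line on an open socket and return the reply."""
    conn.sendall(text.encode("utf-8") + b"\n")
    return conn.recv(1024).decode("utf-8").strip()

def send_line(host, port, text, timeout=5.0):
    """Open a TCP connection, send one line, and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return exchange(conn, text)

# Example call (assumes the tier1 agent is running on this machine):
# print(send_line("127.0.0.1", 9999, "HELLO------------------"))
```

With the agent running, the example call would be expected to print OK, the same acknowledgement that appears in the telnet session above.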
Log location:
In that directory, run tail -100 flume-cmf-flume-AGENT-cdh04.log
and look for:
2018-08-16 14:21:05,100 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 started
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Sink sink1
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Source source1
2018-08-16 14:21:05,601 INFO org.apache.flume.source.NetcatSource: Source starting
2018-08-16 14:21:05,602 INFO org.apache.flume.source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:9999]
2018-08-16 14:21:05,603 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2018-08-16 14:21:05,604 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:41414
2018-08-16 16:03:25,948 INFO org.apache.flume.sink.LoggerSink: Event: { headers:{} body: 48 45 4C 4C 4F 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D HELLO----------- }
This shows that Flume is installed correctly and ready to use.
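The LoggerSink line in the log shows the event body as a hex dump followed by a printable preview, and the preview is shorter than the line that was typed. That is expected: by default the logger sink logs only the first 16 bytes of the body (its `maxBytesToLog` setting). The sketch below decodes the hex dump from that log line to confirm it; `decode_body` is a hypothetical helper written for this post.

```python
import re

LOG_LINE = (
    "2018-08-16 16:03:25,948 INFO org.apache.flume.sink.LoggerSink: "
    "Event: { headers:{} body: 48 45 4C 4C 4F 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D "
    "HELLO----------- }"
)

def decode_body(line):
    """Extract the hex dump after 'body:' and decode it as UTF-8 text."""
    match = re.search(r"body:((?: [0-9A-F]{2})+)", line)
    if match is None:
        raise ValueError("no hex body found in log line")
    hex_digits = match.group(1).replace(" ", "")
    return bytes.fromhex(hex_digits).decode("utf-8")

body = decode_body(LOG_LINE)
print(body, len(body))  # prints: HELLO----------- 16
```

The full event still passes through the channel; only what is written to the log is truncated.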
Installing telnet (if it is missing):
sudo yum -y install telnet-0.17-64.el7.x86_64
To write the events to HDFS instead of the log, change the Flume configuration file as follows:
# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = source1
#tier1.sources = avro-source1
tier1.channels = channel1
tier1.sinks = sink1
# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type = netcat
tier1.sources.source1.bind = 127.0.0.1
tier1.sources.source1.port = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#tier1.sources.avro-source1.channels = ch1
#tier1.sources.avro-source1.type = avro
#tier1.sources.avro-source1.bind = 0.0.0.0
#tier1.sources.avro-source1.port = 41414
#tier1.sources.avro-source1.threads = 5
#define source monitor a file
#tier1.sources.avro-source1.type = exec
#tier1.sources.avro-source1.shell = /bin/bash -c
#tier1.sources.avro-source1.command = tail -n +0 -F cdh03:/home/d2
#tier1.sources.avro-source1.channels = channel1
#tier1.sources.avro-source1.threads = 5
# tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = channel1
# Describe the sink
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /flume/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.filePrefix = test_flume
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.rollInterval = 0
# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
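One detail of the HDFS sink settings above is worth spelling out: setting hdfs.rollCount and hdfs.rollInterval to 0 disables rolling by event count and by time, but hdfs.rollSize is not set, so its documented default of 1024 bytes still applies and files continue to roll by size. The sketch below is a simplified model of that rule, not Flume's actual code; the function names are hypothetical, and the defaults are the ones listed in the Flume user guide.

```python
# Documented defaults of the HDFS sink roll settings (seconds, bytes, events).
DEFAULTS = {"rollInterval": 30, "rollSize": 1024, "rollCount": 10}

def effective_roll_settings(sink_props):
    """Overlay the explicitly configured hdfs.roll* values on the defaults."""
    settings = dict(DEFAULTS)
    for name in DEFAULTS:
        key = "hdfs." + name
        if key in sink_props:
            settings[name] = int(sink_props[key])
    return settings

def active_triggers(settings):
    """A value of 0 disables the corresponding roll trigger."""
    return sorted(name for name, value in settings.items() if value > 0)

# The roll settings of sink1 as configured above.
configured = {"hdfs.rollCount": "0", "hdfs.rollInterval": "0"}
settings = effective_roll_settings(configured)
print(settings)
print(active_triggers(settings))  # prints: ['rollSize']
```

If fewer, larger files are wanted, hdfs.rollSize can be set explicitly in the same way as the other two properties.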