Flume study notes (1): installing, starting, and testing Flume in CDH 5.14.2

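Note: this article is meant as a reference for readers who are using Flume on CDH for the first time and already have some understanding of Flume. For details, see the official site: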
Apache Flume™
Environment: CentOS 7.3 1708, CDH 5.14.2

1. Add the Flume service in CDH

See the screenshots:
[Figure 1]
[Figure 2]
[Figure 3]
[Figure 4]
[Figure 5]
[Figure 6: start Flume here]
[Figure 7]
[Figure 8]

2. Verify that Flume runs correctly with the default configuration

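The default configuration is a simple example that uses netcat (lines of text received over a network socket) as the source, memory as the channel, and logger as the sink, which writes events to the log file.
The configuration is as follows (if you are only testing the Flume installation, there is no need to change it):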

# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory
tier1.sinks.sink1.type         = logger
tier1.sinks.sink1.channel      = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100

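The agent is named tier1.
The source is source1.
The channel is channel1.
The sink is sink1.

source1 is of type netcat (it reads lines of text sent over the network).
It listens on the local address 127.0.0.1.
It listens on port 9999.

source1 writes to channel1.
channel1 is a memory channel.
channel1 feeds sink1.
sink1 is of type logger (it writes events to the log).
The last line sets the capacity of channel1 to 100, i.e. the channel can hold at most 100 events at a time.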
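To make the wiring described above concrete, here is a minimal Python sketch that is not part of Flume; the function names parse_properties and check_wiring are hypothetical. It reads a Flume properties file and checks that every source's channels and every sink's channel are declared in the agent's channels list. Note the asymmetry it relies on: a source lists its targets under the plural key "channels" (a source can write to several channels), while a sink reads from exactly one channel under the singular key "channel".

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments.

    Simplified: Java properties features such as ':' separators and
    backslash line continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return wiring problems for one agent; an empty list means it is consistent."""
    problems = []
    declared = props.get(f"{agent}.channels", "").split()
    # Sources name one or more target channels under the plural key "channels".
    for source in props.get(f"{agent}.sources", "").split():
        targets = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not targets:
            problems.append(f"source {source} has no channels")
        for channel in targets:
            if channel not in declared:
                problems.append(f"source {source} -> undeclared channel {channel}")
    # Sinks read from exactly one channel under the singular key "channel".
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} has no channel")
        elif channel not in declared:
            problems.append(f"sink {sink} -> undeclared channel {channel}")
    return problems
```

Pasting the default configuration into a string and calling check_wiring(parse_properties(text), "tier1") should return an empty list; pointing sink1 at a channel name that is not declared would be reported.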

At this point, everything is ready.

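3. Test Flume with telnet

Now start the test:
On cdh04 (the machine where Flume was installed and configured as described above), use telnet to connect to port 9999 on 127.0.0.1 (or localhost), the listening port the source is bound to in the configuration above. [If telnet is not installed, see the telnet installation instructions in section 5.]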
telnet localhost 9999
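This connects telnet to port 9999 on the local host.
Once Escape character is '^]'. appears, the connection is ready. (In the output below, the first attempt to ::1 is refused because the source is bound only to the IPv4 address 127.0.0.1; telnet then falls back to 127.0.0.1 and connects.)
Send anything you like, for example:
HELLO------------------
and press Enter. The source replies OK for each line it receives.
The session looks like this: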

telnet localhost 9999

Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HELLO------------------
OK
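If telnet is not available, the same test can be scripted. The following is a minimal sketch using only Python's standard socket module, assuming the netcat source configured above is listening on 127.0.0.1:9999; the function name send_line is hypothetical. It sends one newline-terminated line (the netcat source turns each line into one event) and returns the source's reply, which by default is OK.

```python
import socket


def send_line(text, host="127.0.0.1", port=9999, timeout=5.0):
    """Send one line to a Flume netcat source and return its reply (e.g. 'OK')."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Each newline-terminated line becomes one Flume event.
        sock.sendall(text.encode("utf-8") + b"\n")
        reply = b""
        # Read until the acknowledgement line is complete or the peer closes.
        while not reply.endswith(b"\n"):
            chunk = sock.recv(1024)
            if not chunk:
                break
            reply += chunk
    return reply.decode("utf-8").strip()
```

On cdh04, send_line("HELLO------------------") should return "OK", and the event should then appear in the agent log just like the one typed in telnet.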

4. Check what Flume collected into the log

Log location:
[Figure 9: location of the Flume agent log]
Go to this directory and run tail -100 flume-cmf-flume-AGENT-cdh04.log
and look for entries like the following:

2018-08-16 14:21:05,100 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 started
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Sink sink1
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Source source1
2018-08-16 14:21:05,601 INFO org.apache.flume.source.NetcatSource: Source starting
2018-08-16 14:21:05,602 INFO org.apache.flume.source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:9999]
2018-08-16 14:21:05,603 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2018-08-16 14:21:05,604 INFO org.mortbay.log: Started [email protected]:41414
2018-08-16 16:03:25,948 INFO org.apache.flume.sink.LoggerSink: Event: { headers:{} body: 48 45 4C 4C 4F 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D HELLO----------- }
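The logger sink prints the event body as hex bytes followed by the same bytes as text, and by default it logs only the first 16 bytes of the body (the sink's maxBytesToLog property, default 16, in the Flume User Guide). That is why the longer line sent from telnet shows up truncated as HELLO-----------. The sketch below (hypothetical helper decode_logger_body, handling only the single-line format shown above) decodes the hex part back into text.

```python
import re

# Hex byte pairs that follow "body:" in a LoggerSink "Event:" line; the
# lookahead stops before the text rendering that comes after the hex.
_HEX_BODY = re.compile(r"body:((?:\s[0-9A-Fa-f]{2}(?=\s))+)")


def decode_logger_body(log_line):
    """Decode the hex dump in a single-line LoggerSink 'Event:' log entry."""
    match = _HEX_BODY.search(log_line)
    if match is None:
        raise ValueError("no hex body found in log line")
    hex_digits = "".join(match.group(1).split())
    return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")
```

Applied to the Event: line above, it returns the 16 logged bytes as text; to see whole bodies in the log, maxBytesToLog can be raised on the logger sink.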

This confirms that Flume is installed correctly and ready to use.

5. Install telnet

sudo yum -y install telnet
(On the author's CentOS 7 machine this installed telnet-0.17-64.el7.x86_64.)

6. Collect netcat data into HDFS with Flume

Modify the Flume configuration as follows:

# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
#tier1.sources  = avro-source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory
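
# Alternative source (commented out): an Avro source named avro-source1 on agent tier1,
# bound to 0.0.0.0:41414 and connected to channel1.
# Note: the agent log in section 4 shows Jetty already listening on port 41414,
# so pick a free port if you enable this source.
#tier1.sources.avro-source1.channels = channel1
#tier1.sources.avro-source1.type = avro
#tier1.sources.avro-source1.bind = 0.0.0.0
#tier1.sources.avro-source1.port = 41414
#tier1.sources.avro-source1.threads = 5

# Alternative source (commented out): an exec source that monitors a file.
# Enable either this block or the Avro block above, not both, since they share the name avro-source1.
# tail only reads local files, so run the agent on the host that holds the file (cdh03 in the
# original example) and give a local path; "cdh03:/home/d2" would be treated as a local file name.
#tier1.sources.avro-source1.type = exec
#tier1.sources.avro-source1.shell = /bin/bash -c
#tier1.sources.avro-source1.command = tail -n +0 -F /home/d2
#tier1.sources.avro-source1.channels = channel1
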
tier1.sinks.sink1.channel      = channel1

# Describe the sink
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /flume/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.filePrefix=test_flume
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.rollInterval=0


# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
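A note on the roll settings above: hdfs.rollCount = 0 and hdfs.rollInterval = 0 disable rolling by event count and by time, but hdfs.rollSize is not set, so its default (1024 bytes, per the Flume User Guide) still applies and the sink keeps starting new files as data arrives. Setting tier1.sinks.sink1.hdfs.rollSize = 0 as well would turn that trigger off. The sketch below is a simplified model of that interaction, not Flume code: the function count_files is hypothetical, it treats event sizes as body bytes, and it ignores other triggers such as idle timeouts.

```python
def count_files(event_sizes, roll_size=1024, roll_count=0):
    """Approximate how many HDFS files the sink creates for a stream of events.

    Simplified model (assumption): after each event is written, the sink closes
    the current file once the bytes written reach roll_size (if > 0) or the
    events written reach roll_count (if > 0); 0 disables a trigger. Time-based
    rolling is left out because this configuration sets rollInterval = 0.
    """
    files = 0
    bytes_in_file = 0
    events_in_file = 0
    for size in event_sizes:
        if events_in_file == 0:
            files += 1  # the first event after a roll opens a new file
        bytes_in_file += size
        events_in_file += 1
        size_hit = roll_size > 0 and bytes_in_file >= roll_size
        count_hit = roll_count > 0 and events_in_file >= roll_count
        if size_hit or count_hit:
            bytes_in_file = 0
            events_in_file = 0
    return files
```

In this model, 100 events of 100 bytes each produce 10 files with the default rollSize, but a single file once roll_size is 0 as well.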
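  • Tip: tier1.sinks.sink1.hdfs.path = /flume/ specifies where in HDFS the data is stored. The path has no 'hdfs://' scheme because the Flume agent deployed by CDH is given the cluster's HDFS client configuration, so a path without a scheme is resolved against the cluster's default filesystem. Adding the scheme explicitly also works.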
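The tip above comes down to how Hadoop qualifies paths: a path without a scheme takes the scheme and authority of the default filesystem (fs.defaultFS in core-site.xml). The sketch below imitates that rule with Python's urllib.parse; the function name qualify and the address hdfs://namenode-host:8020 are hypothetical placeholders, and relative paths (which Hadoop resolves against the user's working directory) are deliberately rejected rather than modeled.

```python
from urllib.parse import urlsplit, urlunsplit


def qualify(path, default_fs):
    """Qualify an absolute HDFS path against a default filesystem URI."""
    parts = urlsplit(path)
    if parts.scheme:
        return path  # already fully qualified, e.g. hdfs://host:8020/flume/
    if not parts.path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    fs = urlsplit(default_fs)
    # Borrow the scheme and authority (host:port) from the default filesystem.
    return urlunsplit((fs.scheme, fs.netloc, parts.path, parts.query, parts.fragment))
```

With the hypothetical default filesystem above, qualify("/flume/", "hdfs://namenode-host:8020") gives hdfs://namenode-host:8020/flume/, which is why the two spellings of hdfs.path end up in the same place.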
