Course Outline
Business status analysis => Flume overview => Flume architecture and core components => Flume environment deployment => Flume in practice
1. Business Status Analysis
Problems with the traditional approach of shipping data from servers to Hadoop (a shell sketch of that pipeline follows the list):
1. Hard to monitor
2. High I/O read/write overhead
3. Poor fault tolerance and poor load balancing
4. High latency: jobs must be launched periodically in batches
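A minimal sketch of the traditional pipeline these problems describe: a cron-driven script that periodically copies log files into HDFS (paths and schedule are hypothetical, for illustration only):
#!/bin/bash
# upload_logs.sh -- naive server-to-Hadoop shipping:
# nothing monitors it, a failed put is silently lost (poor fault
# tolerance), and data is delayed by the full cron interval
hdfs dfs -put /home/hadoop/logs/access.log /input/access.log.$(date +%s)

# crontab entry: run every 10 minutes
# */10 * * * * /home/hadoop/bin/upload_logs.sh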
2. Flume Overview
Flume official site: http://flume.apache.org/
From the site: "Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data." Flume was originally developed at Cloudera and later became an Apache project.
Design goals:
Reliability
Scalability
Manageability
Comparison with similar products
Flume: Cloudera/Apache, Java
Scribe: Facebook, C/C++, no longer maintained
Chukwa: Yahoo/Apache, Java, no longer maintained
Kafka: LinkedIn/Apache, Scala
Fluentd: Ruby
Logstash: part of the ELK stack (Elasticsearch, Logstash, Kibana)
Flume history
Flume-OG: the original Cloudera version, up to 0.9.2
FLUME-728: the refactoring that produced Flume-NG, donated to Apache
2012.7: 1.0
2015.5: 1.6
current: ~1.7
3. Flume Environment Deployment
Install the JDK
Download it
Extract it to ~/app
Add Java to the system environment variables: vi ~/.bash_profile
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH
Reload the profile so the configuration takes effect: source ~/.bash_profile
Verify: java -version
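If java -version does not report the expected version, confirm the profile edits took effect (plain shell checks; paths per the exports above):
echo $JAVA_HOME     # expect /home/hadoop/app/jdk1.8.0_144
which java          # expect $JAVA_HOME/bin/java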
Install Flume
Download it
Extract it to ~/app
Add Flume to the system environment variables: vi ~/.bash_profile
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
export PATH=$FLUME_HOME/bin:$PATH
Reload the profile so the configuration takes effect: source ~/.bash_profile
Configure flume-env.sh: export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
Verify: flume-ng version
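Note that flume-env.sh does not exist in a fresh unpack; if your distribution ships the template (the Apache tarball does), create it first. A minimal sketch:
cd $FLUME_HOME/conf
cp flume-env.sh.template flume-env.sh
echo 'export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144' >> flume-env.sh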
4. Flume Architecture and Core Components
The key to using Flume is writing the configuration file:
A) Configure the Source
B) Configure the Channel
C) Configure the Sink
D) Wire the three components together
a1: the agent name
r1: the source name
k1: the sink name
c1: the channel name
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Consult the official user guide: http://flume.apache.org/FlumeUserGuide.html
For the netcat source configured above:
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444
type: The component type name, needs to be netcat
bind: The hostname or IP address to bind to
port: The port # to bind to
a1.sinks.k1.type = logger
type:The component type name, needs to be logger
a1.channels.c1.type = memory
type:The component type name, needs to be memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Note: a single source can fan out to multiple channels, hence the plural property name "channels"; a sink consumes from exactly one channel, hence the singular "channel". A fan-out sketch follows below.
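A minimal fan-out sketch (hypothetical agent a1; with Flume's default replicating channel selector every event is copied to both channels):
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# one source writes to both channels (plural: channels)
a1.sources.r1.channels = c1 c2
# each sink reads from exactly one channel (singular: channel)
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2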
Steps:
1. Write the configuration file
In the conf directory: vi example.conf
Paste in the configuration above
2. Start the agent
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console
3. Test with telnet: telnet hadoop000 44444 (a sample session is sketched below)
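What to expect, as a sketch: the hex body is just the ASCII bytes of the typed text (the format matches the logger-sink output shown in the exec example below), and the netcat source acknowledges each line with OK by default.
In the telnet window:
hello
OK
In the agent console:
Event: { headers:{} body: 68 65 6C 6C 6F hello }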
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log
a1.sources.r1.shell = /bin/sh -c
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Exec source properties:
channels: the channel(s) the source writes to (set in the binding lines above)
type: The component type name, needs to be exec
command: The command to execute
shell: A shell invocation used to run the command, e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc.
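The shell property matters whenever the command relies on shell features. A hedged sketch (hypothetical grep filter) that only works when invoked through /bin/sh -c, since without it the exec source runs the command directly and the pipe is not interpreted:
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log | grep ERROR
a1.sources.r1.shell = /bin/sh -c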
Steps:
1. Write the configuration file
In the conf directory: vi exec-memory-logger.conf
Paste in the configuration above
2. Start the agent
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-logger.conf \
-Dflume.root.logger=INFO,console
3. Test:
Open a new terminal window and enter the following:
[hadoop@hadoop001 data]$ echo hello >> data.log
[hadoop@hadoop001 data]$ echo world >> data.log
[hadoop@hadoop001 data]$ echo welcome >> data.log
The agent window prints:
Event: { headers:{} body: 68 65 6C 6C 6F hello }
Event: { headers:{} body: 77 6F 72 6C 64 world }
Event: { headers:{} body: 77 65 6C 63 6F 6D 65 welcome }
Technology selection: chain two agents (data flow sketched below)
Agent 1: exec source + memory channel + avro sink
Agent 2: avro source + memory channel + logger sink
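Data flow for this selection (hostname and port are taken from the two configs that follow):
exec source (tail -F data.log) -> memory channel -> avro sink
    ==(avro RPC, hadoop000:44444)==>
avro source (bound to hadoop000:44444) -> memory channel -> logger sink (console)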
Two configuration files:
exec-memory-avro.conf
# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = hadoop000
exec-memory-avro.sinks.avro-sink.port = 44444
# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory
# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
avro-memory-logger.conf
# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = hadoop000
avro-memory-logger.sources.avro-source.port = 44444
# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger
# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory
# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
Start avro-memory-logger first (the downstream agent must be listening before the avro sink tries to connect):
flume-ng agent \
--name avro-memory-logger \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
-Dflume.root.logger=INFO,console
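Optionally confirm the avro source is listening before starting the upstream agent (assuming netstat is available on the box):
netstat -an | grep 44444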
Then start exec-memory-avro:
flume-ng agent \
--name exec-memory-avro \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
-Dflume.root.logger=INFO,console
Test:
Open a new terminal window and enter the following:
[hadoop@hadoop001 data]$ echo hello spark >> data.log
[hadoop@hadoop001 data]$ echo hello hadoop >> data.log
The avro-memory-logger window prints:
Event: { headers:{} body: 68 65 6C 6C 6F 20 73 70 61 72 6B hello spark }
Event: { headers:{} body: 68 65 6C 6C 6F 20 68 61 64 6F 6F 70 hello hadoop }