hadoop001
Technology selection
…
exec source + memory channel + avro sink
avro source + memory channel + logger sink
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = 192.168.217.130
exec-memory-avro.sinks.avro-sink.port = 44444
exec-memory-avro.channels.memory-channel.type = memory
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
…
hadoop002
avro-memory-logger.conf
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 192.168.217.130
avro-memory-logger.sources.avro-source.port = 44444
avro-memory-logger.sinks.logger-sink.type = logger
avro-memory-logger.channels.memory-channel.type = memory
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
hadoop003
avro-memory-logger.conf
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 192.168.217.131
avro-memory-logger.sources.avro-source.port = 44444
avro-memory-logger.sinks.logger-sink.type = logger
avro-memory-logger.channels.memory-channel.type = memory
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
Start avro-memory-logger first:
flume-ng agent \
--name avro-memory-logger \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
-Dflume.root.logger=INFO,console
Then start exec-memory-avro:
flume-ng agent \
--name exec-memory-avro \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
-Dflume.root.logger=INFO,console
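Once both agents are running, the pipeline can be checked end to end by appending lines to the file the exec source tails; each line should appear as an event on the logger sink console on hadoop002. A minimal sketch (the data.log path is the one from the config above; LOG falls back to a temp file so the snippet can be dry-run anywhere):

```shell
# Append a few test lines to the file tailed by the exec source.
# On hadoop001, set LOG=/home/hadoop/data/data.log (the path from the config above).
LOG="${LOG:-$(mktemp)}"
for i in 1 2 3; do
  echo "flume test event $i" >> "$LOG"
done
# Each appended line should surface as one event at the logger sink.
wc -l < "$LOG"
```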
Exercise 1: the replicating selector. Use case: the selector copies each event and delivers a copy to every downstream node.
1. Flume Agent configuration
vim replicating-selector-agent.conf
# Name the components on this agent
replicating-selector-agent.sources=r1
replicating-selector-agent.sinks = k1 k2
replicating-selector-agent.channels = c1 c2
# http source, with replicating selector
replicating-selector-agent.sources.r1.type = http
replicating-selector-agent.sources.r1.port = 6666
replicating-selector-agent.sources.r1.bind = master
replicating-selector-agent.sources.r1.selector.type = replicating
# Describe the sink
replicating-selector-agent.sinks.k1.type = avro
# avro sink connects to a remote host (RPC)
replicating-selector-agent.sinks.k1.hostname = slave1
replicating-selector-agent.sinks.k1.port = 6666
replicating-selector-agent.sinks.k2.type = avro
# avro sink connects to a remote host (RPC)
replicating-selector-agent.sinks.k2.hostname = slave2
replicating-selector-agent.sinks.k2.port = 6666
# 2 channels in selector test
replicating-selector-agent.channels.c1.type = memory
replicating-selector-agent.channels.c1.capacity = 1000
replicating-selector-agent.channels.c1.transactionCapacity = 100
replicating-selector-agent.channels.c2.type = memory
replicating-selector-agent.channels.c2.capacity = 1000
replicating-selector-agent.channels.c2.transactionCapacity = 100
# bind source ,sink to channels
replicating-selector-agent.sources.r1.channels = c1 c2
replicating-selector-agent.sinks.k1.channel = c1
replicating-selector-agent.sinks.k2.channel = c2
2. Collector1 configuration (slave1)
# 01 specify agent,source,sink,channel
replicating-selector-agent.sources = r1
replicating-selector-agent.sinks = k1
replicating-selector-agent.channels = c1
# 02 avro source,connect to local port 6666
replicating-selector-agent.sources.r1.type = avro
replicating-selector-agent.sources.r1.bind = slave1
replicating-selector-agent.sources.r1.port = 6666
# 03 logger sink
replicating-selector-agent.sinks.k1.type = logger
# 04 channel,memory
replicating-selector-agent.channels.c1.type = memory
replicating-selector-agent.channels.c1.capacity = 1000
replicating-selector-agent.channels.c1.transactionCapacity = 100
# 05 bind source,sink to channel
replicating-selector-agent.sources.r1.channels = c1
replicating-selector-agent.sinks.k1.channel = c1
3. Collector2 configuration (slave2)
# 01 specify agent,source,sink,channel
replicating-selector-agent.sources = r1
replicating-selector-agent.sinks = k1
replicating-selector-agent.channels = c1
# 02 avro source,connect to local port 6666
replicating-selector-agent.sources.r1.type = avro
replicating-selector-agent.sources.r1.bind = slave2
replicating-selector-agent.sources.r1.port = 6666
# 03 logger sink
replicating-selector-agent.sinks.k1.type = logger
# 04 channel,memory
replicating-selector-agent.channels.c1.type = memory
replicating-selector-agent.channels.c1.capacity = 1000
replicating-selector-agent.channels.c1.transactionCapacity = 100
# 05 bind source,sink to channel
replicating-selector-agent.sources.r1.channels = c1
replicating-selector-agent.sinks.k1.channel = c1
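To verify replication, post one event to the HTTP source on master:6666; the same event should then show up at both Collector1 and Collector2. A hedged sketch of the payload in the JSON array format Flume's HTTPSource (JSONHandler) accepts — the curl call is commented out so the snippet only posts when run on a host that can actually reach master:6666:

```shell
# JSON event array in the format accepted by Flume's HTTPSource (JSONHandler).
PAYLOAD='[{"headers":{},"body":"replicate me"}]'
echo "$PAYLOAD"
# Post it to the replicating agent; both collectors should log the same event:
# curl -X POST -d "$PAYLOAD" http://master:6666
```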
The multiplexing selector routes each event to a channel based on the value of a given header key; a default channel can also be configured for unmapped values.
1. Flume Agent configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# http source,with multiplexing selector
a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 6666
# for header with key country
# send to c1 if country's value is china
# send to c2 if country's value is singapore
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = country
a1.sources.r1.selector.mapping.china = c1
a1.sources.r1.selector.mapping.singapore = c2
a1.sources.r1.selector.default = c1
# Describe the sink
a1.sinks.k1.type = avro
# avro sink connects to a remote host (RPC)
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 6666
a1.sinks.k2.type = avro
# avro sink connects to a remote host (RPC)
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 6666
# 2 channels, for selection
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
# bind source,sink to channels
a1.sources.r1.channels= c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
2. Collector1 configuration (slave1)
# 01 specify agent,source,sink,channel
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# 02 avro source,connect to local port 6666
a1.sources.r1.type = avro
a1.sources.r1.bind = slave1
a1.sources.r1.port = 6666
# 03 logger sink
a1.sinks.k1.type = logger
# 04 channel,memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# 05 bind source,sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3. Collector2 configuration (slave2)
# 01 specify agent,source,sink,channel
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# 02 avro source,connect to local port 6666
a1.sources.r1.type = avro
a1.sources.r1.bind = slave2
a1.sources.r1.port = 6666
# 03 logger sink
a1.sinks.k1.type = logger
# 04 channel,memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# 05 bind source,sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3. Verification
Verification approach:
On the agent host, simulate HTTP requests with curl -X POST -d '<json data>'. The agent's HTTP source turns each request into an event and routes it according to the value of the country key in its headers.
If the value is china, the event goes to c1 and ends up at Collector1; if the value is singapore, it goes to c2 and ends up at Collector2; any other value goes to the default channel c1.
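Concretely, the three routing cases described above can be exercised with curl against the HTTP source on master:6666 (hostnames per the configs above; "japan" below is just an arbitrary unmapped value standing in for the default case). The curl calls are commented out so the snippet only posts where the agent is reachable:

```shell
# One event per routing case, in Flume HTTPSource (JSONHandler) format.
CN='[{"headers":{"country":"china"},"body":"china"}]'          # -> c1 -> Collector1
SG='[{"headers":{"country":"singapore"},"body":"singapore"}]'  # -> c2 -> Collector2
OTHER='[{"headers":{"country":"japan"},"body":"japan"}]'       # -> default c1 -> Collector1
echo "$CN"; echo "$SG"; echo "$OTHER"
# curl -X POST -d "$CN" http://master:6666
# curl -X POST -d "$SG" http://master:6666
# curl -X POST -d "$OTHER" http://master:6666
```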
Note: a source instance can fan out to multiple channels, but a sink instance reads from exactly one channel.
Exercise 2: configure a multiplexing channel selector
file-channel-->hdfs-sink
memory-channel-->logger-sink
# Describe the agent
taildir-multiplexing-agent.sources = taildir-source
taildir-multiplexing-agent.channels = file-channel memory-channel
taildir-multiplexing-agent.sinks = hdfs-sink logger-sink
# TODO: the State header should be set via an interceptor; this is not configured yet
# Describe/configure the source
taildir-multiplexing-agent.sources.taildir-source.type=TAILDIR
taildir-multiplexing-agent.sources.taildir-source.filegroups = ruoze xingxing Jzong
taildir-multiplexing-agent.sources.taildir-source.filegroups.ruoze = /home/hadoop/data1/test_spool_source/1.log
taildir-multiplexing-agent.sources.taildir-source.filegroups.xingxing = /home/hadoop/data1/test_spool_source/2.log
taildir-multiplexing-agent.sources.taildir-source.filegroups.Jzong = /home/hadoop/data1/test_spool_source/3.log
taildir-multiplexing-agent.sources.taildir-source.headers.ruoze.name=CA
taildir-multiplexing-agent.sources.taildir-source.headers.xingxing.name=AZ
taildir-multiplexing-agent.sources.taildir-source.headers.Jzong.name=NY
taildir-multiplexing-agent.sources.taildir-source.positionFile=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/taildir_position.json
taildir-multiplexing-agent.sources.taildir-source.selector.type = multiplexing
taildir-multiplexing-agent.sources.taildir-source.selector.header = State
taildir-multiplexing-agent.sources.taildir-source.selector.mapping.CA = memory-channel
taildir-multiplexing-agent.sources.taildir-source.selector.mapping.AZ = file-channel
taildir-multiplexing-agent.sources.taildir-source.selector.mapping.NY = memory-channel file-channel
#taildir-multiplexing-agent.sources.taildir-source.selector.default = memory-channel
# Use a channel which buffers events in file/memory
taildir-multiplexing-agent.channels.file-channel.type=file
taildir-multiplexing-agent.channels.file-channel.checkpointDir=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/checkpoint
taildir-multiplexing-agent.channels.file-channel.dataDirs=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/data
taildir-multiplexing-agent.channels.memory-channel.type=memory
# Describe the sink
taildir-multiplexing-agent.sinks.logger-sink.type = logger
taildir-multiplexing-agent.sinks.hdfs-sink.type = hdfs
taildir-multiplexing-agent.sinks.hdfs-sink.hdfs.path=hdfs://192.168.2.65:9000/flumehomework
taildir-multiplexing-agent.sinks.hdfs-sink.hdfs.fileType=DataStream
# Bind the source and sink to the channel
taildir-multiplexing-agent.sources.taildir-source.channels = file-channel memory-channel
taildir-multiplexing-agent.sinks.logger-sink.channel = memory-channel
taildir-multiplexing-agent.sinks.hdfs-sink.channel = file-channel
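The TODO above points at a real gap: the selector reads header key State, but the filegroup headers above set a key named name, so no mapping would ever match. One way to line them up without an interceptor is to have the TAILDIR source emit the State header directly, using its per-filegroup headers.<filegroupName>.<headerKey> syntax. A hedged, untested sketch with the same agent and filegroup names as above:

```
taildir-multiplexing-agent.sources.taildir-source.headers.ruoze.State = CA
taildir-multiplexing-agent.sources.taildir-source.headers.xingxing.State = AZ
taildir-multiplexing-agent.sources.taildir-source.headers.Jzong.State = NY
```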
hadoop001:
cd $FLUME_HOME/conf
vim multiplexing-selector.conf
# Name the components on this agent,whose name is mul
mul.sources = r1
mul.sinks = k1 k2
mul.channels = c1 c2
# http source,with multiplexing selector
mul.sources.r1.type = http
mul.sources.r1.bind = 192.168.217.129
mul.sources.r1.port = 6666
# for header with key country
# send to c1 if country's value is china
# send to c2 if country's value is singapore
mul.sources.r1.selector.type = multiplexing
mul.sources.r1.selector.header = country
mul.sources.r1.selector.mapping.china = c1
mul.sources.r1.selector.mapping.singapore = c2
mul.sources.r1.selector.default = c1
# Describe the sink
mul.sinks.k1.type = avro
# avro sink connects to a remote host (RPC)
mul.sinks.k1.hostname = 192.168.217.130
mul.sinks.k1.port = 6666
mul.sinks.k2.type = avro
# avro sink connects to a remote host (RPC)
mul.sinks.k2.hostname = 192.168.217.131
mul.sinks.k2.port = 6666
# 2 channels, for selection
mul.channels.c1.type = memory
mul.channels.c1.capacity = 1000
mul.channels.c1.transactionCapacity = 100
mul.channels.c2.type = memory
mul.channels.c2.capacity = 1000
mul.channels.c2.transactionCapacity = 100
# bind source,sink to channels
mul.sources.r1.channels= c1 c2
mul.sinks.k1.channel = c1
mul.sinks.k2.channel = c2
hadoop002
# 01 specify agent,source,sink,channel
mul.sources = r1
mul.sinks = k1
mul.channels = c1
# 02 avro source,connect to local port 6666
mul.sources.r1.type = avro
mul.sources.r1.bind = 192.168.217.130
mul.sources.r1.port = 6666
# 03 logger sink
mul.sinks.k1.type = logger
# 04 channel,memory
mul.channels.c1.type = memory
mul.channels.c1.capacity = 1000
mul.channels.c1.transactionCapacity = 100
# 05 bind source,sink to channel
mul.sources.r1.channels = c1
mul.sinks.k1.channel = c1
hadoop003
# 01 specify agent,source,sink,channel
mul.sources = r1
mul.sinks = k1
mul.channels = c1
# 02 avro source,connect to local port 6666
mul.sources.r1.type = avro
mul.sources.r1.bind = 192.168.217.131
mul.sources.r1.port = 6666
# 03 logger sink
mul.sinks.k1.type = logger
# 04 channel,memory
mul.channels.c1.type = memory
mul.channels.c1.capacity = 1000
mul.channels.c1.transactionCapacity = 100
# 05 bind source,sink to channel
mul.sources.r1.channels = c1
mul.sinks.k1.channel = c1
hadoop002: start avro-memory-logger-mul.conf first
flume-ng agent \
--name mul \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger-mul.conf \
-Dflume.root.logger=INFO,console
hadoop003: start avro-memory-logger-mul first
bin/flume-ng agent \
--name mul \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger-mul.conf \
-Dflume.root.logger=INFO,console
hadoop001: then start exec-memory-avro-mul.conf
bin/flume-ng agent \
--name mul \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-avro-mul.conf \
-Dflume.root.logger=INFO,console
On the agent host, simulate HTTP requests with curl -X POST -d '<json data>'. The agent's HTTP source turns each request into an event and routes it according to the value of the country key in its headers:
china goes to c1 and ends up at Collector1; singapore goes to c2 and ends up at Collector2; any other value goes to the default channel c1.
curl -X POST -d '[{"headers":{"country":"china"},"body":"china"}]' http://192.168.217.129:6666
curl -X POST -d '[{"headers":{"country":"singapore"},"body":"singapore"}]' http://192.168.217.129:6666