FLUME HOMEWORK: USAGE NOTES

hadoop001

exec-memory-avro.conf

Technology selection


exec source + memory channel + avro sink
avro source + memory channel + logger sink

exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = 192.168.217.130
exec-memory-avro.sinks.avro-sink.port = 44444

exec-memory-avro.channels.memory-channel.type = memory
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

hadoop002

avro-memory-logger.conf

avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

avro-memory-logger.sources.avro-source.type = avro

avro-memory-logger.sources.avro-source.bind = 192.168.217.130
avro-memory-logger.sources.avro-source.port = 44444

avro-memory-logger.sinks.logger-sink.type = logger

avro-memory-logger.channels.memory-channel.type = memory

avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel

hadoop003

avro-memory-logger.conf

avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

avro-memory-logger.sources.avro-source.type = avro

avro-memory-logger.sources.avro-source.bind = 192.168.217.131
avro-memory-logger.sources.avro-source.port = 44444

avro-memory-logger.sinks.logger-sink.type = logger

avro-memory-logger.channels.memory-channel.type = memory

avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel

Start avro-memory-logger first:

flume-ng agent \
--name avro-memory-logger \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger.conf \
-Dflume.root.logger=INFO,console

Then start exec-memory-avro:

flume-ng agent \
--name exec-memory-avro \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-avro.conf \
-Dflume.root.logger=INFO,console
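Once both agents are up, appending lines to the tailed file should produce events at the logger sink. A minimal sketch for generating test lines (it writes to /tmp/data.log so the snippet is self-contained; on the cluster the exec source tails /home/hadoop/data/data.log):

```python
import time

# Stand-in for /home/hadoop/data/data.log, the file the exec source
# tails with `tail -F` in the config above.
log_path = "/tmp/data.log"

with open(log_path, "a") as f:
    for i in range(3):
        # Each appended line becomes one Flume event.
        f.write(f"test event {i} {time.time()}\n")

with open(log_path) as f:
    print(f.read().count("test event"))
```

Each line should appear on the hadoop002 console through the avro hop.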

Assignment practice 1: replicating selector. Use case: the selector copies each event and distributes a copy to every downstream node.

1. Flume Agent configuration

vim replicating-selector-agent.conf

# Name the components on this agent

replicating-selector-agent.sources=r1
replicating-selector-agent.sinks = k1 k2
replicating-selector-agent.channels = c1 c2

# http source, with replicating selector 

replicating-selector-agent.sources.r1.type = http
replicating-selector-agent.sources.r1.port = 6666
replicating-selector-agent.sources.r1.bind = master
replicating-selector-agent.sources.r1.selector.type = replicating

# Describe the sink

# avro sinks: bind to the remote collector hosts (RPC)
replicating-selector-agent.sinks.k1.type = avro
replicating-selector-agent.sinks.k1.hostname = slave1
replicating-selector-agent.sinks.k1.port = 6666

replicating-selector-agent.sinks.k2.type = avro
replicating-selector-agent.sinks.k2.hostname = slave2
replicating-selector-agent.sinks.k2.port = 6666

# 2 channels in selector test 

replicating-selector-agent.channels.c1.type = memory
replicating-selector-agent.channels.c1.capacity = 1000
replicating-selector-agent.channels.c1.transactionCapacity = 100

replicating-selector-agent.channels.c2.type = memory
replicating-selector-agent.channels.c2.capacity = 1000
replicating-selector-agent.channels.c2.transactionCapacity = 100

# bind source ,sink to channels

replicating-selector-agent.sources.r1.channels = c1 c2
replicating-selector-agent.sinks.k1.channel = c1
replicating-selector-agent.sinks.k2.channel = c2

2. Collector1 configuration

# 01 specify agent,source,sink,channel

replicating-selector-agent.sources = r1
replicating-selector-agent.sinks = k1
replicating-selector-agent.channels = c1

# 02 avro source,connect to local port 6666

replicating-selector-agent.sources.r1.type = avro
replicating-selector-agent.sources.r1.bind = slave1
replicating-selector-agent.sources.r1.port = 6666

# 03 logger sink

replicating-selector-agent.sinks.k1.type = logger

# 04 channel,memory

replicating-selector-agent.channels.c1.type = memory
replicating-selector-agent.channels.c1.capacity = 1000
replicating-selector-agent.channels.c1.transactionCapacity = 100

# 05 bind source,sink to channel

replicating-selector-agent.sources.r1.channels = c1
replicating-selector-agent.sinks.k1.channel = c1

3. Collector2 configuration

# 01 specify agent,source,sink,channel

replicating-selector-agent.sources = r1
replicating-selector-agent.sinks = k1
replicating-selector-agent.channels = c1

# 02 avro source,connect to local port 6666

replicating-selector-agent.sources.r1.type = avro
replicating-selector-agent.sources.r1.bind = slave2
replicating-selector-agent.sources.r1.port = 6666

# 03 logger sink

replicating-selector-agent.sinks.k1.type = logger

# 04 channel,memory

replicating-selector-agent.channels.c1.type = memory
replicating-selector-agent.channels.c1.capacity = 1000
replicating-selector-agent.channels.c1.transactionCapacity = 100

# 05 bind source,sink to channel

replicating-selector-agent.sources.r1.channels = c1
replicating-selector-agent.sinks.k1.channel = c1

multiplexing selector

A multiplexing selector routes each event to a channel according to the value of a given header key; a default channel can also be configured.
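The routing rule can be sketched in a few lines of Python (a model of the selector's decision, not Flume code; the channel names and mappings match the a1 config below):

```python
def select_channels(headers, mapping, default):
    """Model of a multiplexing channel selector: choose channels by the
    value of one header key, falling back to the default channel."""
    return mapping.get(headers.get("country"), default)

mapping = {"china": ["c1"], "singapore": ["c2"]}
default = ["c1"]

print(select_channels({"country": "china"}, mapping, default))      # ['c1']
print(select_channels({"country": "singapore"}, mapping, default))  # ['c2']
print(select_channels({"country": "japan"}, mapping, default))      # ['c1'] (default)
```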

1. Flume Agent configuration

# Name the components on this agent

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# http source,with multiplexing selector 

a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 6666

# for header with key country
# send to c1 if country's value is china
# send to c2 if country's value is singapore

a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = country
a1.sources.r1.selector.mapping.china = c1
a1.sources.r1.selector.mapping.singapore = c2
a1.sources.r1.selector.default = c1

# Describe the sink  

# avro sinks: bind to the remote collector hosts (RPC)
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 6666

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 6666

# 2 channels, for selection

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# bind source,sink to channels

a1.sources.r1.channels= c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

2. Collector1 configuration

# 01 specify agent,source,sink,channel

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 02 avro source,connect to local port 6666

a1.sources.r1.type = avro
a1.sources.r1.bind = slave1
a1.sources.r1.port = 6666

# 03 logger sink

a1.sinks.k1.type = logger

# 04 channel,memory

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# 05 bind source,sink to channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Collector2 configuration

# 01 specify agent,source,sink,channel

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 02 avro source,connect to local port 6666

a1.sources.r1.type = avro
a1.sources.r1.bind = slave2
a1.sources.r1.port = 6666

# 03 logger sink

a1.sinks.k1.type = logger

# 04 channel,memory

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# 05 bind source,sink to channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Verification

Verification approach:
On the agent side, simulate an HTTP request with curl -X POST -d '<json data>'. The agent's source converts the request into events and routes them according to the value of the country header key.
If the value is china, the event goes to c1 and ends up at Collector1; if the value is singapore, it goes to c2 and ends up at Collector2; any other value goes to the default channel c1.
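The HTTP source's default JSON handler expects the POST body to be a JSON array of events, each with `headers` and `body`. The curl payload can be built and checked like this (a sketch; `master:6666` is the host/port from the agent config above):

```python
import json

# One event: headers drive the selector, body is the payload.
events = [{"headers": {"country": "china"}, "body": "china"}]
payload = json.dumps(events)
print(payload)
```

Then send it with `curl -X POST -d "$payload" http://master:6666`.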

A source instance can specify multiple channels, but a sink instance can only specify one channel.

Assignment 2: configure a multiplexing channel selector

file-channel-->hdfs-sink
memory-channel-->logger-sink

# Describe the agent

taildir-multiplexing-agent.sources = taildir-source
taildir-multiplexing-agent.channels = file-channel memory-channel
taildir-multiplexing-agent.sinks = hdfs-sink logger-sink

# This should be done with an interceptor; not yet configured.

# Describe/configure the source

taildir-multiplexing-agent.sources.taildir-source.type=TAILDIR
taildir-multiplexing-agent.sources.taildir-source.filegroups = ruoze xingxing Jzong
taildir-multiplexing-agent.sources.taildir-source.filegroups.ruoze = /home/hadoop/data1/test_spool_source/1.log
taildir-multiplexing-agent.sources.taildir-source.filegroups.xingxing = /home/hadoop/data1/test_spool_source/2.log
taildir-multiplexing-agent.sources.taildir-source.filegroups.Jzong = /home/hadoop/data1/test_spool_source/3.log
# The header key must match selector.header (State) below
taildir-multiplexing-agent.sources.taildir-source.headers.ruoze.State=CA
taildir-multiplexing-agent.sources.taildir-source.headers.xingxing.State=AZ
taildir-multiplexing-agent.sources.taildir-source.headers.Jzong.State=NY
taildir-multiplexing-agent.sources.taildir-source.positionFile=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/taildir_position.json

taildir-multiplexing-agent.sources.taildir-source.selector.type = multiplexing
taildir-multiplexing-agent.sources.taildir-source.selector.header = State
taildir-multiplexing-agent.sources.taildir-source.selector.mapping.CA = memory-channel
taildir-multiplexing-agent.sources.taildir-source.selector.mapping.AZ = file-channel
taildir-multiplexing-agent.sources.taildir-source.selector.mapping.NY = memory-channel file-channel
#taildir-multiplexing-agent.sources.taildir-source.selector.default = memory-channel
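Note that one header value may fan out to several channels (NY above goes to both). A sketch of how these mapping lines translate to routing (channel names and header key from the config above; this models the selector, it is not Flume code):

```python
# Mirrors the selector.mapping.* lines of taildir-multiplexing-agent.
mapping = {
    "CA": ["memory-channel"],
    "AZ": ["file-channel"],
    # A single header value may map to multiple channels:
    "NY": ["memory-channel", "file-channel"],
}

def route(headers, key="State", default=None):
    """Return the channels an event is sent to, given its headers."""
    return mapping.get(headers.get(key), default or [])

print(route({"State": "NY"}))  # ['memory-channel', 'file-channel']
```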

# Use a channel which buffers events in file/memory

taildir-multiplexing-agent.channels.file-channel.type=file
taildir-multiplexing-agent.channels.file-channel.checkpointDir=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/checkpoint
taildir-multiplexing-agent.channels.file-channel.dataDirs=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin/data

taildir-multiplexing-agent.channels.memory-channel.type=memory

# Describe the sink

taildir-multiplexing-agent.sinks.logger-sink.type = logger

taildir-multiplexing-agent.sinks.hdfs-sink.type = hdfs
taildir-multiplexing-agent.sinks.hdfs-sink.hdfs.path=hdfs://192.168.2.65:9000/flumehomework
taildir-multiplexing-agent.sinks.hdfs-sink.hdfs.fileType=DataStream

# Bind the source and sink to the channel

taildir-multiplexing-agent.sources.taildir-source.channels = file-channel memory-channel
taildir-multiplexing-agent.sinks.logger-sink.channel = memory-channel
taildir-multiplexing-agent.sinks.hdfs-sink.channel = file-channel
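The positionFile configured above is where the TAILDIR source records its read offsets, as a JSON array with one record per tailed file. A sketch of inspecting it (the sample data here is made up; field names follow the Flume TAILDIR source documentation):

```python
import json

# Made-up sample in the layout TAILDIR writes to its positionFile:
# one record per tailed file with inode, byte offset (pos), and path.
sample = ('[{"inode": 123456, "pos": 42, '
          '"file": "/home/hadoop/data1/test_spool_source/1.log"}]')

for rec in json.loads(sample):
    print(rec["file"], rec["pos"])
```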

hadoop001: cd $FLUME_HOME/conf

vim multiplexing-selector.conf


# Name the components on this agent,whose name is mul 

mul.sources = r1
mul.sinks = k1 k2
mul.channels = c1 c2

# http source,with multiplexing selector 

mul.sources.r1.type = http
mul.sources.r1.bind = 192.168.217.129
mul.sources.r1.port = 6666

# for header with key country
# send to c1 if country's value is china
# send to c2 if country's value is singapore

mul.sources.r1.selector.type = multiplexing
mul.sources.r1.selector.header = country
mul.sources.r1.selector.mapping.china = c1
mul.sources.r1.selector.mapping.singapore = c2
mul.sources.r1.selector.default = c1

# Describe the sink

# avro sinks: bind to the remote collector hosts (RPC)
mul.sinks.k1.type = avro
mul.sinks.k1.hostname = 192.168.217.130
mul.sinks.k1.port = 6666

mul.sinks.k2.type = avro
mul.sinks.k2.hostname = 192.168.217.131
mul.sinks.k2.port = 6666

# 2 channels, for selection

mul.channels.c1.type = memory
mul.channels.c1.capacity = 1000
mul.channels.c1.transactionCapacity = 100

mul.channels.c2.type = memory
mul.channels.c2.capacity = 1000
mul.channels.c2.transactionCapacity = 100

# bind source,sink to channels

mul.sources.r1.channels= c1 c2
mul.sinks.k1.channel = c1
mul.sinks.k2.channel = c2

hadoop002

# 01 specify agent,source,sink,channel

mul.sources = r1
mul.sinks = k1
mul.channels = c1

# 02 avro source,connect to local port 6666

mul.sources.r1.type = avro
mul.sources.r1.bind = 192.168.217.130
mul.sources.r1.port = 6666

# 03 logger sink

mul.sinks.k1.type = logger

# 04 channel,memory

mul.channels.c1.type = memory
mul.channels.c1.capacity = 1000
mul.channels.c1.transactionCapacity = 100

# 05 bind source,sink to channel

mul.sources.r1.channels = c1
mul.sinks.k1.channel = c1

hadoop003

# 01 specify agent,source,sink,channel

mul.sources = r1
mul.sinks = k1
mul.channels = c1

# 02 avro source,connect to local port 6666

mul.sources.r1.type = avro
mul.sources.r1.bind = 192.168.217.131
mul.sources.r1.port = 6666

# 03 logger sink

mul.sinks.k1.type = logger

# 04 channel,memory

mul.channels.c1.type = memory
mul.channels.c1.capacity = 1000
mul.channels.c1.transactionCapacity = 100

# 05 bind source,sink to channel

mul.sources.r1.channels = c1
mul.sinks.k1.channel = c1

hadoop002

Start avro-memory-logger-mul.conf first:

flume-ng agent \
--name mul \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger-mul.conf \
-Dflume.root.logger=INFO,console

hadoop003

Start avro-memory-logger-mul first:

bin/flume-ng agent \
--name mul \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-memory-logger-mul.conf \
-Dflume.root.logger=INFO,console

hadoop001

exec-memory-avro-mul.conf

bin/flume-ng agent \
--name mul \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-memory-avro-mul.conf \
-Dflume.root.logger=INFO,console

On the agent side, simulate an HTTP request with curl -X POST -d '<json data>'. The agent's source converts the request into events and routes them by the value of the country header key: china goes to c1 and arrives at Collector1; singapore goes to c2 and arrives at Collector2; any other value goes to the default channel c1.

curl -X POST -d '[{"headers":{"country":"china"},"body":"china"}]' http://192.168.217.129:6666

curl -X POST -d '[{"headers":{"country":"singapore"},"body":"singapore"}]' http://192.168.217.129:6666
