Using the Multiplexing Channel Selector in Flume

1. Requirement: on hadoop102, start agent1 (netcat source → memory channel → avro sink) and agent2 (exec source → memory channel → avro sink).

On hadoop103, start agent3 (avro source → 2 memory channels → 2 sinks: a logger sink and an hdfs sink).

The logger sink must write only the data coming from agent1, and the hdfs sink must write only the data coming from agent2.


2. Required components
2.1 Multiplexing Channel Selector
The Multiplexing Channel Selector classifies events into different channels.

How it classifies: for each event it reads the value of a configured key from the event header and, according to the mapping defined for that value, routes the event to the corresponding channel(s).

Configuration:

Property            Default                Description
selector.type       replicating            must be set to multiplexing
selector.header     flume.selector.header  name of the event header whose value is read for routing
selector.default    –                      channel(s) an event is assigned to when no mapping matches
selector.mapping.*  –                      custom value-to-channel mapping rules

Example:

a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
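
With this configuration, an event whose header contains state=CZ is routed to c1, an event with state=US is written to both c2 and c3, and an event with any other value (or no state header at all) falls through to the default channel c4.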

2.2 Static Interceptor

The Static Interceptor lets the user add a static key-value pair to the header of every event.

Property          Default  Description
type              –        must be static
preserveExisting  true     if the configured header already exists, should it be preserved (true or false)
key               key      name of the header key to set
value             value    value to assign to that key

Example:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels =  c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
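
With this interceptor in place, every event emitted by the seq source carries the configured header. A downstream logger sink would then print each event with the header attached, along the lines of this illustrative output (the body bytes will differ):

Event: { headers:{datacenter=NEW_YORK} body: 30 0 }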

The full configurations are as follows.
agent1:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop102
a1.sources.r1.port = 44444

# configure the interceptors
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = mykey
a1.sources.r1.interceptors.i1.value = agent1
a1.sources.r1.interceptors.i2.type = timestamp

# configure the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=hadoop103
a1.sinks.k1.port=12345

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# bind the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
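
agent1 can then be launched on hadoop102 with the standard flume-ng command. A minimal sketch, assuming the configuration above is saved as agent1.conf (the file name and conf directory are assumptions):

# start agent1 on hadoop102; agent3 on hadoop103 must already be running,
# otherwise the avro sink has nothing to connect to
flume-ng agent --conf conf --conf-file agent1.conf --name a1 -Dflume.root.logger=INFO,console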

agent2:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = exec
a1.sources.r1.command=tail -f /home/atguigu/hello.txt

# configure the interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = mykey
a1.sources.r1.interceptors.i1.value = agent2

# configure the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname=hadoop103
a1.sinks.k1.port=12345

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# bind the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
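
agent2 is started the same way (agent2.conf is again an assumed file name); once it is running, appending lines to the tailed file produces events:

# start agent2 on hadoop102
flume-ng agent --conf conf --conf-file agent2.conf --name a1

# feed data through the exec source
echo "data from agent2" >> /home/atguigu/hello.txt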

agent3:

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 12345

# configure the channel selector
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = mykey
a1.sources.r1.selector.mapping.agent1 = c2
a1.sources.r1.selector.mapping.agent2 = c1

# configure the logger sink
a1.sinks.k2.type = logger

# configure the hdfs sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d/%H%M
# prefix for files uploaded to HDFS
a1.sinks.k1.hdfs.filePrefix = logs-
# round the path timestamp down; roll to a new directory every minute
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute
# use the local time for the path timestamp instead of an event header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# file rolling: by time (30 s) and size (~128 MB); rollCount = 0 disables count-based rolling
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 134217700
a1.sinks.k1.hdfs.rollCount = 0
# store the data as an uncompressed data stream
a1.sinks.k1.hdfs.fileType = DataStream

# configure the channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000

# bind the components together
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
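
Start agent3 first, so that its avro source is already listening on hadoop103:12345 before the avro sinks of agent1 and agent2 try to connect. A sketch of the end-to-end test, assuming the config file names used above:

# on hadoop103: start agent3 first
flume-ng agent --conf conf --conf-file agent3.conf --name a1 -Dflume.root.logger=INFO,console

# on hadoop102: start agent1 and agent2 as shown earlier, then exercise both paths
nc hadoop102 44444                           # typed lines carry mykey=agent1
echo "to hdfs" >> /home/atguigu/hello.txt    # appended lines carry mykey=agent2

# mykey=agent1 -> c2 -> logger sink (visible in agent3's console output)
# mykey=agent2 -> c1 -> hdfs sink; verify with:
hdfs dfs -ls /flume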
