Flume Load Balancing Configuration

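Flume implements load balancing through a sink group: for each delivery, the sink processor picks one sink from the group according to a selection algorithm and sends the events to that sink's destination. When the volume of files to deliver is large, load balancing is well worth having, because spreading the output over several sinks relieves the pressure on any single one.
Flume's built-in load-balancing selector defaults to round_robin (round-robin selection); random is also built in.
In this example, files are transferred from a host to HDFS.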
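To make the round-robin idea concrete, here is a minimal sketch in Python. It is not Flume code; the function name `round_robin` and the sink names are placeholders chosen for illustration.

```python
from itertools import cycle


def round_robin(sinks):
    """Return an iterator that yields the sinks in a fixed rotating order."""
    return cycle(sinks)


# Hypothetical sink names, matching the three sinks used in this post.
selector = round_robin(["sink1", "sink2", "sink3"])

# Six consecutive deliveries visit each sink twice, in order.
picks = [next(selector) for _ in range(6)]
print(picks)  # ['sink1', 'sink2', 'sink3', 'sink1', 'sink2', 'sink3']
```

Each sink receives every third delivery, so the load is spread evenly as long as all three sinks are healthy.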
The cluster is laid out as follows:
The Flume cluster uses 4 hosts
Flumeapp1 load_balance
Flumeapp2 sink1
Flumeapp3 sink2
Flumeapp4 sink3

The load_balance agent is configured as follows (the file uses the default configuration file name, conf/flume-conf.properties):

agent1.sources=source1
agent1.sinks=sink1 sink2 sink3
agent1.channels = channel1

source

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /e3base/spooldir

Record the original file name in a header so that the target file name matches the source file name

agent1.sources.source1.basenameHeader=true
agent1.sources.source1.basenameHeaderKey=fileName

sink group

agent1.sinkgroups=group1
agent1.sinkgroups.group1.sinks=sink1 sink2 sink3
agent1.sinkgroups.group1.processor.type=load_balance
agent1.sinkgroups.group1.processor.backoff=true
agent1.sinkgroups.group1.processor.selector=round_robin

sink1

agent1.sinks.sink1.type=avro
agent1.sinks.sink1.hostname=134.32.50.13
agent1.sinks.sink1.port=21000

sink2

agent1.sinks.sink2.type=avro
agent1.sinks.sink2.hostname=134.32.50.14
agent1.sinks.sink2.port=21000

sink3

agent1.sinks.sink3.type=avro
agent1.sinks.sink3.hostname=134.32.152.49
agent1.sinks.sink3.port=21000

channel

agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity=100

bind

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink2.channel = channel1
agent1.sinks.sink3.channel = channel1
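The sink group above combines processor.selector=round_robin with processor.backoff=true. The sketch below is a simplified Python model of that combination, not Flume's actual implementation: a failed sink is skipped for a penalty period that doubles on each consecutive failure. The class name, the `base_ms` starting penalty, and the doubling rule are assumptions for illustration; the cap is modeled on the processor.selector.maxTimeOut setting described in the Flume User Guide.

```python
import time


class BackoffRoundRobin:
    """Simplified model of round-robin sink selection with backoff."""

    def __init__(self, sinks, base_ms=1000, max_timeout_ms=30000,
                 clock=time.monotonic):
        self.sinks = list(sinks)
        self.base_ms = base_ms                # assumed first penalty
        self.max_timeout_ms = max_timeout_ms  # upper bound on the penalty
        self.clock = clock                    # injectable for testing
        self.next_index = 0
        self.failures = {s: 0 for s in self.sinks}
        self.blocked_until = {s: 0.0 for s in self.sinks}

    def pick(self):
        """Return the next sink that is not backed off, or None if all are."""
        now = self.clock()
        for offset in range(len(self.sinks)):
            i = (self.next_index + offset) % len(self.sinks)
            sink = self.sinks[i]
            if self.blocked_until[sink] <= now:
                self.next_index = (i + 1) % len(self.sinks)
                return sink
        return None

    def report_failure(self, sink):
        """Block the sink; the penalty doubles per consecutive failure."""
        self.failures[sink] += 1
        penalty_ms = min(self.base_ms * 2 ** (self.failures[sink] - 1),
                         self.max_timeout_ms)
        self.blocked_until[sink] = self.clock() + penalty_ms / 1000.0

    def report_success(self, sink):
        """Clear the failure history of a sink that delivered successfully."""
        self.failures[sink] = 0
        self.blocked_until[sink] = 0.0


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
selector = BackoffRoundRobin(["sink1", "sink2", "sink3"], clock=lambda: now[0])
first = selector.pick()                   # sink1
selector.report_failure(first)            # sink1 is blocked for 1 second
after_failure = [selector.pick() for _ in range(4)]
now[0] = 2.0                              # the penalty has expired
after_recovery = selector.pick()
print(first, after_failure, after_recovery)
```

While sink1 is backed off, the rotation continues over sink2 and sink3 only; once the penalty expires, sink1 rejoins the rotation.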

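Flumeapp2 to Flumeapp4 share the following configuration (again in the default file conf/flume-conf.properties). Only the bind address differs per host; the one shown here belongs to the host at 134.32.152.49: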

agent1.sources=source1
agent1.channels=channel1
agent1.sinks = sink1

source

agent1.sources.source1.type=avro
agent1.sources.source1.bind= 134.32.152.49
agent1.sources.source1.port=21000
agent1.sources.source1.basenameHeader=true
agent1.sources.source1.basenameHeaderKey=fileName

channels

agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=1000
agent1.channels.channel1.transactionCapacity=100

sinks

agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://drmcluster/test_bak/flume/
agent1.sinks.sink1.hdfs.filePrefix=%{fileName}
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.rollSize=134217728
agent1.sinks.sink1.hdfs.rollInterval=60
agent1.sinks.sink1.hdfs.writeFormat=Text
agent1.sinks.sink1.hdfs.useLocalTimeStamp=true

bind

agent1.sources.source1.channels=channel1
agent1.sinks.sink1.channel=channel1
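The HDFS sink's hdfs.filePrefix=%{fileName} relies on the fileName header that the spooldir source on Flumeapp1 attaches to each event. Header keys are case-sensitive, so the key in the prefix must be spelled exactly like the basenameHeaderKey. The following Python sketch models the substitution; it is an illustration rather than Flume's code, the function name `expand_prefix` and the sample file name are hypothetical, and expanding a missing key to an empty string is an assumption of this sketch.

```python
import re


def expand_prefix(template, headers):
    """Replace each %{key} in template with headers[key].

    A key that is absent from headers expands to an empty string
    (an assumption made for this sketch).
    """
    return re.sub(r"%\{([^}]+)\}",
                  lambda m: headers.get(m.group(1), ""),
                  template)


# Header as the spooldir source would set it with basenameHeaderKey=fileName.
headers = {"fileName": "access.log"}

matching = expand_prefix("%{fileName}", headers)
mismatched = expand_prefix("%{filename}", headers)  # wrong case
print(repr(matching))    # 'access.log'
print(repr(mismatched))  # ''
```

With the wrong case the lookup finds no header, so in this model the output file would lose the original file name.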

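Start flume-ng on all four hosts in the cluster.
Start command (use absolute paths for the conf directory and the properties file where possible; relative paths can cause unexpected errors):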
./flume-ng agent -c /e3base/flume/conf -f /e3base/flume/conf/flume-conf.properties -n agent1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
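Start order: bring up the sink1 to sink3 agents first, then the load_balance agent; otherwise the load_balance host reports that the ports cannot be found.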
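One way to enforce this start order is to wait until the three downstream Avro ports accept TCP connections before launching the load_balance agent. The helper below is a minimal sketch using only the Python standard library; the function name `wait_for_ports` and the timeout values are assumptions, and the addresses are the ones from the sink configuration in this post.

```python
import socket
import time


def wait_for_ports(endpoints, timeout_s=60.0, interval_s=1.0):
    """Return True once every (host, port) accepts a TCP connection.

    Returns False if some endpoint is still unreachable after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    pending = list(endpoints)
    while pending:
        still_closed = []
        for host, port in pending:
            try:
                # A successful connect means the Avro source is listening.
                with socket.create_connection((host, port), timeout=2.0):
                    pass
            except OSError:
                still_closed.append((host, port))
        pending = still_closed
        if pending:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
    return True


# Downstream Avro sources, taken from the sink1 to sink3 settings.
DOWNSTREAM = [
    ("134.32.50.13", 21000),
    ("134.32.50.14", 21000),
    ("134.32.152.49", 21000),
]

# Example call before starting the load_balance agent:
#     ready = wait_for_ports(DOWNSTREAM)
```

If the call returns True, all three downstream agents are listening and the load_balance agent can be started with the command shown earlier.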
