A detailed guide: building a highly available, load-balanced log collection system with Scribe + Flume into Hadoop and Kafka

I. System Architecture

To improve reliability, the Flume deployment is split into an agent tier and a collector tier.

The agent tier consists of every host whose logs need to be collected; there can be any number of these hosts and they can be scaled out freely. Each agent machine runs a carpenter process that forwards the log entries under the relevant directories to the Flume source on the same machine, and the corresponding Avro sinks push the data on to the two collectors (the pushes are load-balanced; if one collector fails, everything is pushed to the other).

The two collector machines, upon receiving the data, forward it to HDFS and Kafka according to the routing policy:

[Figure 1: system architecture]
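In outline, the data flows like this:

logs -> carpenter/scribe -> [agent] ScribeSource (port 5140) -> memory channel -> Avro sinks k1/k2 (load-balanced)
     -> [collector] Avro source (port 5150) -> multiplexing on the "category" header
            category=flume_hdfs  -> file channel c1   -> HDFS sink
            category=flume_kafka -> memory channel c2 -> Kafka sink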

 

II. Installing Flume

1. Every machine that runs Flume needs a Java environment. The agent-tier machines also need Scribe installed, and the collector-tier machines need Hadoop and Kafka installed and configured (see the corresponding documentation for installation instructions).

 

2. The Flume tarball can be used straight after extraction:

tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /usr/local


3. Replace the flume-ng-kafka-sink-1.6.0.jar in the lib folder of the Flume home directory with the flume-ng-kafka-sink-1.6.0.jar used for this setup.
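For example, assuming the replacement jar sits in the current working directory, something along these lines:

cp flume-ng-kafka-sink-1.6.0.jar /usr/local/apache-flume-1.6.0-bin/lib/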

 

4. Configure the system environment variables:

vi /etc/profile

Add:

export FLUME_HOME=/usr/local/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin


Then make them take effect:

source /etc/profile

 

5. Configure flume-env.sh under the conf folder of the Flume home directory.

Add:

export JAVA_HOME=/usr/java/jdk1.7.0_79

 

6. Run:

flume-ng version

If the output looks something like this:

Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f

then Flume has been installed successfully.

 

7. On the collector machines, create the data-persistence directories for the file channel. The directory names must match the two directories specified for the file channel in the configuration file:

collector.channels.c1.checkpointDir=/usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector.channels.c1.dataDir=/usr/local/apache-flume-1.6.0-bin/fileChannel/data
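For example, with the paths above:

mkdir -p /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
mkdir -p /usr/local/apache-flume-1.6.0-bin/fileChannel/data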

 

III. Flume Configuration

The configuration files live in the conf folder under the Flume home directory.

Agent configuration:

# Declare the components
agent.sources= r1
agent.sinks= k1 k2
agent.channels= c1

# Configure the agent's source. For compatibility with Scribe, the source type is ScribeSource;
# the port is the one the Scribe process sends to.
agent.sources.r1.type= org.apache.flume.source.scribe.ScribeSource
agent.sources.r1.port= 5140
agent.sources.r1.channels= c1
 
# Configure the agent's sink group: k1 and k2 send to the two collectors in a load-balanced fashion
agent.sinkgroups=g1
agent.sinkgroups.g1.sinks=k1 k2
agent.sinkgroups.g1.processor.type=load_balance
agent.sinkgroups.g1.processor.backoff=true
agent.sinkgroups.g1.processor.selector=round_robin
 
# Configure sink1: an Avro sink pointing at one of the collectors
agent.sinks.k1.type=avro
agent.sinks.k1.hostname=10.0.3.82
agent.sinks.k1.port=5150
 
# Configure sink2: an Avro sink pointing at the other collector
agent.sinks.k2.type=avro
agent.sinks.k2.hostname=10.0.3.83
agent.sinks.k2.port=5150
 
# Use the default memory channel: it buffers at most 20,000 events, with up to 10,000 events per transaction
agent.channels.c1.type= memory
agent.channels.c1.capacity= 20000
agent.channels.c1.transactionCapacity= 10000
 
# Bind the channel to the source and the sinks
agent.sources.r1.channels= c1
agent.sinks.k1.channel= c1
agent.sinks.k2.channel=c1
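On the agent side, Scribe (or the carpenter process in front of it) only has to deliver messages to local port 5140, where the ScribeSource listens. A minimal Scribe store configuration might look like the following sketch; the listening port and category here are assumptions to adjust for your own setup:

# scribe.conf on an agent machine (illustrative)
port=1463

<store>
category=default
type=network
remote_host=127.0.0.1
remote_port=5140
</store>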

Collector configuration:

# Declare the components
collector.sources= r1
collector.channels= c1 c2
collector.sinks= k1 k2
 
# Define the collector's source: it listens on local port 5150 and routes events selectively to channels c1 and c2.
# Events whose 'category' header is flume_hdfs go to c1; events whose header is flume_kafka go to c2.
# Events without a matching header default to c1.
collector.sources.r1.type= avro
collector.sources.r1.port= 5150
collector.sources.r1.bind= 0.0.0.0
collector.sources.r1.channels= c1 c2
collector.sources.r1.selector.type= multiplexing
collector.sources.r1.selector.header= category
collector.sources.r1.selector.mapping.flume_hdfs= c1
collector.sources.r1.selector.mapping.flume_kafka= c2
collector.sources.r1.selector.default= c1
 
# Define channels c1 and c2.
# c1 is a file channel: if delivery fails, the data is persisted under the configured directories.
# c2 is a memory channel: data that cannot be delivered in time is dropped.
collector.channels.c1.type= file
collector.channels.c1.checkpointDir= /usr/local/apache-flume-1.6.0-bin/fileChannel/checkpoint
collector.channels.c1.dataDir= /usr/local/apache-flume-1.6.0-bin/fileChannel/data
 
collector.channels.c2.type= memory
collector.channels.c2.capacity= 1000
collector.channels.c2.transactionCapacity= 100
 
# Define sink1: it takes data from c1 and writes it to HDFS; %{category} in the path resolves to the event's category value
collector.sinks.k1.type= hdfs
collector.sinks.k1.channel= c1
collector.sinks.k1.hdfs.path= /quantone/flume/%{category}/10.0.3.82
collector.sinks.k1.hdfs.fileType= DataStream
collector.sinks.k1.hdfs.writeFormat= TEXT
collector.sinks.k1.hdfs.rollInterval= 300
collector.sinks.k1.hdfs.filePrefix= %Y-%m-%d
collector.sinks.k1.hdfs.round= true
collector.sinks.k1.hdfs.roundValue= 5
collector.sinks.k1.hdfs.roundUnit= minute
collector.sinks.k1.hdfs.useLocalTimeStamp= true
#collector.sinks.k1.serializer.appendNewline= false
 
# Define sink2: it takes data from c2 and sends it to Kafka
collector.sinks.k2.type= org.apache.flume.sink.kafka.KafkaSink
collector.sinks.k2.channel= c2
collector.sinks.k2.brokerList= 10.0.3.178:9092,10.0.3.179:9092
collector.sinks.k2.requiredAcks= 1
collector.sinks.k2.batchSize= 20
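Note that no topic is set for the Kafka sink above. The stock Flume 1.6 Kafka sink publishes to its default topic unless a topic property or a per-event topic header is provided; if the replacement jar from step II.3 does not already take care of this, a line along the following lines may be needed (the topic name is purely illustrative):

collector.sinks.k2.topic= flume_kafka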

IV. Starting Flume

1. Start the Scribe program on all agent machines, and start Kafka and Hadoop (a rough sketch of the commands follows).
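The exact commands depend on how Scribe, Kafka, and Hadoop were installed; roughly, and with illustrative paths and config file names:

# on each agent machine, start scribed with its config file
scribed scribe.conf &

# on each Kafka broker
bin/kafka-server-start.sh config/server.properties &

# on the Hadoop NameNode
sbin/start-dfs.sh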

 

2. Start the Flume collector on both collector machines:

bin/flume-ng agent --conf ./conf/ -f conf/collector.conf -Dflume.root.logger=INFO,console -n collector

 

3. Start the Flume agent on all agent machines:

bin/flume-ng agent --conf ./conf/ -f conf/agent.conf -Dflume.root.logger=INFO,console -n agent
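To verify the pipeline end to end, push a test message through Scribe and check both destinations. A rough sketch, assuming Scribe's bundled scribe_cat example client is available on an agent machine; the topic and ZooKeeper address are placeholders for your own values:

# send one test event in category flume_hdfs
echo "hello flume" | scribe_cat flume_hdfs

# a file should appear under sink k1's HDFS path (it may keep a .tmp suffix until it rolls)
hdfs dfs -ls /quantone/flume/flume_hdfs/10.0.3.82

# send one test event in category flume_kafka, then consume the topic the Kafka sink writes to
echo "hello kafka" | scribe_cat flume_kafka
bin/kafka-console-consumer.sh --zookeeper <zookeeper-host:2181> --topic <kafka-sink-topic> --from-beginning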

