Integrating Flume-NG with Kafka

1) Install the flume-ng cluster (four hosts: cdh2, cdh3, cdh4, and 172.17.199.107; 172.17.199.107 is the host where the remote log files live)

2) Install the Kafka cluster (three hosts: cdh1, cdh2, cdh3)

3) Implement a custom Flume Kafka sink, build it into a jar, and put the jar in the lib directory of the Flume installation
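The custom sink in step 3 is a Java class implementing Flume's Sink interface; its source is in the jar linked below, not in this post. As a rough illustration only, the control flow such a sink follows (take an event from the channel, forward it to Kafka, signal READY or BACKOFF) can be sketched in Python, with the channel and the Kafka producer replaced by hypothetical stubs:

```python
# Hypothetical sketch of the control flow a Flume sink implements.
# The real KafkaSink is a Java class; MemoryChannel and StubProducer
# here are stand-ins, not Flume or Kafka APIs.

class MemoryChannel:
    """Stub for a Flume channel: a simple FIFO of events."""
    def __init__(self):
        self.events = []
    def put(self, event):
        self.events.append(event)
    def take(self):
        return self.events.pop(0) if self.events else None

class StubProducer:
    """Stub for a Kafka producer: records what would be sent."""
    def __init__(self):
        self.sent = []
    def send(self, topic, message):
        self.sent.append((topic, message))

def sink_process(channel, producer, topic):
    """One process() call: take an event and forward it to Kafka.
    Returns 'READY' if an event was handled, 'BACKOFF' if the
    channel was empty (Flume then waits before polling again)."""
    event = channel.take()
    if event is None:
        return "BACKOFF"
    producer.send(topic, event)   # a real sink would roll back the
    return "READY"                # channel transaction on failure

channel = MemoryChannel()
channel.put("log line 1")
producer = StubProducer()
status = sink_process(channel, producer, "my_topic")
```

In the real sink, the topic name and broker list come from the `custom.topic.name` and `metadata.broker.list` properties shown in the configuration below.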

4) Write the flume-ng configuration files
The configuration file fm_kfk.conf on cdh2:
producer.sources=avro
producer.sinks=kfk
producer.channels=mem

producer.sources.avro.type=avro
producer.sources.avro.bind=cdh2
producer.sources.avro.port=4141
producer.sources.avro.channels=mem

# KafkaSink is the custom sink class packaged in the jar below
producer.sinks.kfk.type=org.apache.flume.plugins.KafkaSink
producer.sinks.kfk.metadata.broker.list=cdh1:9092,cdh2:9092,cdh3:9092
producer.sinks.kfk.partition.key=0
# SinglePartition is a custom partitioner class from the same jar
producer.sinks.kfk.partitioner.class=org.apache.flume.plugins.SinglePartition
producer.sinks.kfk.serializer.class=kafka.serializer.StringEncoder
producer.sinks.kfk.request.required.acks=0
producer.sinks.kfk.max.message.size=1000000
producer.sinks.kfk.producer.type=async
producer.sinks.kfk.custom.encoding=UTF-8
producer.sinks.kfk.custom.topic.name=my_topic
producer.sinks.kfk.channel=mem

producer.channels.mem.type=memory
producer.channels.mem.capacity=1000
producer.channels.mem.transactionCapacity=100
Jar containing the custom classes:
http://pan.baidu.com/s/15bXLk
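The `partitioner.class` property above points at the custom SinglePartition class, which (as its name and the fixed `partition.key=0` suggest) presumably routes every message to a single partition, preserving event order at the cost of parallelism. The real class implements Kafka 0.8's `kafka.producer.Partitioner` interface in Java; a minimal Python sketch of that contract, with a hash partitioner shown for contrast:

```python
# Sketch of the kafka.producer.Partitioner contract (a Java
# interface) in Python. SinglePartition pins everything to one
# partition; a default-style hash partitioner spreads keys out.

def single_partition(key, num_partitions):
    # Every key lands in partition 0, regardless of num_partitions.
    return 0

def hash_partition(key, num_partitions):
    # Default-style behavior: distribute keys across partitions.
    return hash(key) % num_partitions
```

With `single_partition`, all events from all agents end up in one Kafka partition of `my_topic`, which is why a single console consumer later sees them in order.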
The configuration file fm_kfk.conf on cdh3:
C1.sources=avro1
C1.sinks=avro2
C1.channels=mem
C1.sources.avro1.type=avro
C1.sources.avro1.bind=cdh3
C1.sources.avro1.port=4142
C1.sources.avro1.channels=mem

C1.sinks.avro2.type=avro
C1.sinks.avro2.hostname=cdh2
C1.sinks.avro2.port=4141
C1.sinks.avro2.channel=mem

C1.channels.mem.type=memory
C1.channels.mem.capacity=1000
C1.channels.mem.transactionCapacity=100
The configuration file fm_kfk.conf on cdh4:
C1.sources=avro1
C1.sinks=avro2
C1.channels=mem

C1.sources.avro1.type=avro
C1.sources.avro1.bind=cdh4
C1.sources.avro1.port=4143
C1.sources.avro1.channels=mem

C1.sinks.avro2.type=avro
C1.sinks.avro2.hostname=cdh2
# must match the port of the avro source on cdh2 (4141)
C1.sinks.avro2.port=4141
C1.sinks.avro2.channel=mem

C1.channels.mem.type=memory
C1.channels.mem.capacity=1000
C1.channels.mem.transactionCapacity=100
The configuration file fm_kfk.conf on 172.17.199.107:
A1.sources=dir
A1.sinks=k1 k2
A1.channels=c1

A1.sinkgroups=g1
A1.sinkgroups.g1.sinks=k1 k2
A1.sinkgroups.g1.processor.type=load_balance
A1.sinkgroups.g1.processor.backoff=true
A1.sinkgroups.g1.processor.selector=round_robin

A1.sources.dir.type=spooldir
A1.sources.dir.spoolDir=/Music
A1.sources.dir.fileHeader=false
A1.sources.dir.channels=c1

A1.sinks.k1.type=avro
# 172.17.199.61 is cdh3
A1.sinks.k1.hostname=172.17.199.61
A1.sinks.k1.port=4142
A1.sinks.k1.channel=c1

A1.sinks.k2.type=avro
# 172.17.199.62 is cdh4
A1.sinks.k2.hostname=172.17.199.62
A1.sinks.k2.port=4143
A1.sinks.k2.channel=c1

A1.channels.c1.type=memory
A1.channels.c1.capacity=1000
A1.channels.c1.transactionCapacity=100
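The sink group g1 above load-balances events from the spooldir source across k1 (cdh3) and k2 (cdh4) with the `round_robin` selector, and `backoff=true` temporarily removes a failed sink from rotation. A simplified Python sketch of that selection logic (an assumed simplification of Flume's load-balancing sink processor, not its actual code):

```python
import itertools

class RoundRobinSelector:
    """Simplified sketch of round-robin sink selection with backoff:
    cycle through the sinks in order, skipping any marked failed."""
    def __init__(self, sinks):
        self.sinks = sinks
        self.failed = set()
        self._cycle = itertools.cycle(sinks)

    def mark_failed(self, sink):
        # Real Flume backs off for a growing timeout, then retries;
        # here a sink stays out until mark_ok() is called.
        self.failed.add(sink)

    def mark_ok(self, sink):
        self.failed.discard(sink)

    def next_sink(self):
        for _ in range(len(self.sinks)):
            sink = next(self._cycle)
            if sink not in self.failed:
                return sink
        return None  # every sink is currently backing off
```

With both sinks healthy, events alternate between k1 and k2; if cdh4 goes down, everything flows through k1 until k2 recovers.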
5) Start the Kafka cluster
Run the following commands on each of cdh1, cdh2, and cdh3:
cd /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/kafka_2.9.2-0.8.1.1
bin/kafka-server-start.sh config/server.properties

6) Start the flume-ng agents
Note that the -n value must match the agent name that prefixes every property in that host's config file, and it is case-sensitive.
On cdh2:
flume-ng agent -c /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/flume-ng/conf -f fm_kfk.conf -n producer -Dflume.root.logger=INFO,console
On cdh3:
flume-ng agent -c /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/flume-ng/conf -f fm_kfk.conf -n C1 -Dflume.root.logger=INFO,console
On cdh4:
flume-ng agent -c /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/flume-ng/conf -f fm_kfk.conf -n C1 -Dflume.root.logger=INFO,console
On 172.17.199.107:
flume-ng agent -c /opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/flume-ng/conf -f fm_kfk.conf -n A1 -Dflume.root.logger=INFO,console
7) Start a console consumer on cdh1
/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/kafka_2.9.2-0.8.1.1/bin/kafka-console-consumer.sh --zookeeper cdh1:2181,cdh2:2181,cdh3:2181 --from-beginning --topic my_topic
8) Create a new file in the /Music directory on 172.17.199.107 and write some content into it
9) The console consumer started on cdh1 prints the contents of the newly added file
