Flume: configuration, startup, and shutdown

Flume's three main components
source — the data source; built-in types include http, avro, log tailing, etc. A custom interceptor can be attached to a source to route incoming data.
channel — the buffer that carries events from source to sink; it stores events in memory, in a file-backed WAL, and so on. You can also implement a custom channel.
sink — writes events out to HDFS, Kafka, Hive, HBase, a logger, etc.
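The three components are wired together in a properties-style .conf file. A minimal, self-contained sketch of that wiring (the agent/component names a1, r1, c1, k1 and the netcat source are illustrative, not taken from the example below):

```shell
# Write a minimal Flume config: netcat source -> memory channel -> logger sink.
cat > /tmp/minimal-flume.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = logger

# wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
grep -c '^a1\.' /tmp/minimal-flume.conf   # prints 11
```

Note that a source can feed several channels (`channels`, plural), while a sink drains exactly one (`channel`, singular) — the pingback example below relies on this to fan one source out to two sinks.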

Check the Flume process:
ps aux|grep flume

Below is a configuration example. It defines a custom source interceptor that routes the incoming log events.

1. Flume configuration: from log to HDFS

cat pingback_sdk_app.conf
p1.sources = r1
p1.sinks = k1 k2
p1.channels = c1 c2

p1.sources.r1.type = avro
p1.sources.r1.bind = w133
p1.sources.r1.port = 33333
p1.sources.r1.interceptors = i1
p1.sources.r1.interceptors.i1.type = com.wutong.flume.LogCollInterceptor$Builder

p1.channels.c1.type=memory
p1.channels.c1.capacity=1000000
p1.channels.c1.transactionCapacity=100000

p1.channels.c2.type=memory
p1.channels.c2.capacity=1000000
p1.channels.c2.transactionCapacity=100000
# threads is an avro-source property (max worker threads), not a channel one
p1.sources.r1.threads = 30

# Describe the sinks

p1.sinks.k1.type = hdfs
p1.sinks.k1.hdfs.path = hdfs://w112:9000/logs/pingback_logs/sdk_app/%Y/%m/%d
p1.sinks.k1.hdfs.fileType = DataStream
p1.sinks.k1.hdfs.filePrefix = sdk-app
p1.sinks.k1.hdfs.fileSuffix=.log
p1.sinks.k1.hdfs.rollInterval = 3600
p1.sinks.k1.hdfs.rollSize = 0
p1.sinks.k1.hdfs.rollCount = 0
p1.sinks.k1.hdfs.timeZone = GMT+8
p1.sinks.k1.hdfs.callTimeout = 40000

p1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
p1.sinks.k2.kafka.bootstrap.servers = w112:9092,w133:9092,w189:9092
p1.sinks.k2.kafka.topic =pingback-sdk-app
p1.sinks.k2.kafka.flumeBatchSize = 50
#p1.sinks.k2.requiredAcks=1
p1.sinks.k2.kafka.producer.acks = 1

p1.sources.r1.channels = c1 c2
p1.sinks.k1.channel = c1
p1.sinks.k2.channel = c2

Run in the foreground (logs to the console):
./flume-ng agent --conf ../conf --conf-file ../conf/pingback_sdk_app.conf --name p1 -Dflume.root.logger=INFO,console

Run in the background (note: -n is shorthand for --name, so don't pass both):
nohup ./flume-ng agent --conf ../conf --conf-file ../conf/pingback_sdk_app.conf --name p1 -Dflume.root.logger=INFO,console &
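Once the agent is up, flume-ng's built-in avro-client can push a test event into the avro source. A hedged sketch — it assumes flume-ng is on the PATH and the p1 agent above is listening on w133:33333:

```shell
# Send one test event to the avro source, if flume-ng is available.
if command -v flume-ng >/dev/null 2>&1; then
    echo "hello pingback" > /tmp/test-event.txt
    flume-ng avro-client --host w133 --port 33333 --filename /tmp/test-event.txt
else
    echo "flume-ng not on PATH"
fi
```

If the event went through, it should show up both under the HDFS path and on the Kafka topic, since the source fans out to both channels.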

Find the agent's PID: ps -ef | grep java | grep flume | awk '{print $2}'
nohup ./flume-ng agent --conf ../conf --conf-file ../conf/iotlog.conf --name a1 -Dflume.root.logger=INFO,console &
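To shut an agent down, send SIGTERM to the PID found by the command above; Flume flushes its channels on a clean shutdown. A hedged sketch, assuming a single agent per host (adjust the grep pattern otherwise):

```shell
# Stop the Flume agent gracefully; fall back to SIGKILL after 10 s.
pid=$(ps -ef | grep java | grep flume | grep -v grep | awk '{print $2}')
if [ -n "$pid" ]; then
    kill "$pid"                                   # SIGTERM: clean shutdown
    sleep 10
    kill -0 "$pid" 2>/dev/null && kill -9 "$pid"  # force only if still alive
else
    echo "no flume agent running"
fi
```

Prefer SIGTERM over kill -9 as the first step: with a memory channel, a forced kill loses any events still buffered in the channel.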
