Flume in Action: Monitoring a File and Collecting Newly Appended Data in Real Time

The key to using Flume is writing the configuration file.

First, decide on the agent's component selection: exec source + memory channel + logger sink.

Configuration:

# Name the components on this agent
# a1 is the agent name; r1, k1, c1 name its source, sink, and channel.
# (Comments must be on their own lines: in a properties file, a trailing
# "#..." on a value line becomes part of the value.)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# (see the official docs for the options of each source type)
# Set the type of source r1; an agent may have multiple sources
a1.sources.r1.type = exec
# The Linux command to run
a1.sources.r1.command = tail -F /home/hadoop/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. In Flume's conf directory, create a configuration file named exec-memory-logger.conf with the contents above.

3. Start Flume:

flume-ng agent --name a1 \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/exec-memory-logger.conf \
  -Dflume.root.logger=INFO,console
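Once the agent is running, appending a line to the watched file should make the logger sink print an event on the agent's console. A quick smoke test, assuming the same file path used in the config above:

```shell
# Append a test line to the file that tail -F is watching;
# the logger sink should print a corresponding Event on the console
echo "hello flume" >> /home/hadoop/data/data.log
```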

Note: for offline (batch) processing, data is usually stored on HDFS, so use the HDFS sink;
for real-time processing, data is usually sent to Kafka, so use the Kafka sink.
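As a sketch, the logger sink above could be replaced by one of the following sink definitions (use one or the other, not both). The namenode address, path, broker address, and topic name below are placeholder assumptions, not values from this setup:

```properties
# Option A: HDFS sink (offline/batch) -- placeholder namenode and path
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.hdfs.fileType = DataStream

# Option B: Kafka sink (real-time) -- placeholder broker and topic
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = flume-topic
```

Either way, k1 stays bound to channel c1 by the binding lines in the original config.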



