Collecting data into HDFS with Flume

Environment: Flume 1.5, Hadoop 2.2
1. Configure JAVA_HOME and HADOOP_HOME
Note: HADOOP_HOME is used to locate the JARs and configuration files Flume needs in order to write to HDFS (for example, the hdfs-site.xml that defines the ns1 nameservice referenced in a4.conf below). If it is not set, you can instead copy the JARs and configuration files into Flume's classpath by hand.
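
A minimal sketch of the environment setup, for example in Flume's conf/flume-env.sh or your shell profile; the installation paths below are assumptions and should be adjusted to your machine:

export JAVA_HOME=/usr/local/jdk1.7.0_79      # hypothetical JDK path
export HADOOP_HOME=/usr/local/hadoop-2.2.0   # hypothetical Hadoop path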
2. Unpack Flume and start flume-ng from the bin directory:

flume-ng agent -f /master/env/fc/a4.conf -n a4 -c /master/env/flume/conf -Dflume.root.logger=INFO,console

Breakdown of the command:
agent                  run a Flume agent
global options:
--conf,-c <conf>       use configs in <conf> directory (path to Flume's configuration directory)
-Dproperty=value       sets a Java system property value
agent options:
--conf-file,-f <file>  specify a config file (required) (the agent's startup configuration file)
--name,-n <name>       the name of this agent (must match the agent name used in a4.conf)

Below is the full contents of a4.conf:

# Define the agent name and the names of its source, channel, and sink
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Configure the source: watch a spooling directory for new files
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /home/hadoop/logs

# Configure the channel: an in-memory buffer holding up to 10,000 events, moved in transactions of 100
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Define an interceptor that stamps each event with a timestamp header;
# the HDFS sink uses it to resolve the %Y%m%d escapes in hdfs.path
a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Configure the sink: write events to HDFS, partitioned by date
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://ns1/flume/%Y%m%d
a4.sinks.k1.hdfs.filePrefix = events-
a4.sinks.k1.hdfs.fileType = DataStream
# Do not roll files based on the number of events
a4.sinks.k1.hdfs.rollCount = 0
# Roll the HDFS file once it reaches 128 MB
a4.sinks.k1.hdfs.rollSize = 134217728
# Roll the HDFS file after 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60

# Wire the source and sink to the channel
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
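
A quick smoke test of the pipeline; the sample file name access.log is hypothetical. Note that the spooling directory must exist before the agent is started:

# Create the spooling directory, then start the agent as shown in step 2
mkdir -p /home/hadoop/logs
# Drop a file in; Flume renames it to access.log.COMPLETED once ingested
cp /tmp/access.log /home/hadoop/logs/
# Check that events landed in HDFS under today's date directory
hdfs dfs -ls /flume/$(date +%Y%m%d)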

For detailed usage, see the official documentation: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
