Big Data Cluster Setup (9): Collecting Files into HDFS with Flume

Use Flume to collect files from a local directory and write them into HDFS.

1. Copy Hadoop's core-site.xml and hdfs-site.xml into flume/conf.

(These files live under etc/hadoop in the Hadoop installation directory.)
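A sketch of this step, assuming Hadoop is installed under $HADOOP_HOME and Flume under $FLUME_HOME (adjust the paths to your layout):

cp $HADOOP_HOME/etc/hadoop/core-site.xml $FLUME_HOME/conf/
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $FLUME_HOME/conf/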

2. Copy the following three Hadoop jars into flume/lib.

share/hadoop/common/hadoop-common-2.2.0.jar 

share/hadoop/common/lib/hadoop-auth-2.2.0.jar

share/hadoop/common/lib/commons-configuration-1.6.jar  
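The corresponding copy commands, again assuming $HADOOP_HOME and $FLUME_HOME; the jar versions must match your own Hadoop release:

cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.2.0.jar $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar $FLUME_HOME/lib/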

3. Write the configuration file flume_hdfs.conf.

#Define the agent name and the names of its source, channel and sink

a4.sources = r1  

a4.channels = c1  

a4.sinks = k1  

#Configure the source

a4.sources.r1.type = spooldir

#Create this directory beforehand and make sure it is empty

a4.sources.r1.spoolDir = /logs  

#Configure the channel

a4.channels.c1.type = memory  

a4.channels.c1.capacity = 10000  

a4.channels.c1.transactionCapacity = 100  

#Define an interceptor that adds a timestamp header to each event (needed so the %Y%m%d escape in hdfs.path can be resolved)

a4.sources.r1.interceptors = i1  

a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder  

#Configure the sink

a4.sinks.k1.type = hdfs  

#For an HA cluster, put the HDFS nameservice name in the path

#for a single NameNode, write hdfs://master:9000/xxx directly

a4.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y%m%d  

a4.sinks.k1.hdfs.filePrefix = events-  

a4.sinks.k1.hdfs.fileType = DataStream  

#Do not roll files based on the number of events

a4.sinks.k1.hdfs.rollCount = 0

#Roll a new file when the current file on HDFS reaches 128 MB

a4.sinks.k1.hdfs.rollSize = 134217728  

#Roll a new file on HDFS every 60 seconds

a4.sinks.k1.hdfs.rollInterval = 60  

#Wire the source, channel and sink together

a4.sources.r1.channels = c1  

a4.sinks.k1.channel = c1
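Before starting the agent, create the spool directory on the local filesystem; the spooldir source fails to start if it does not exist. The HDFS sink creates the date-stamped target directories itself. A minimal sketch:

mkdir -p /logs
# optional sanity check that HDFS is reachable from this node
hdfs dfs -ls /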

4. Start the agent.

bin/flume-ng agent -n a4 -c conf -f conf/flume_hdfs.conf -Dflume.root.logger=INFO,console

5. Open another terminal to test, then check the result in a browser.
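A quick test, as a sketch: drop a file into the spool directory and check that it shows up under the HDFS path. The file name below is just an example; spooled files are renamed with a .COMPLETED suffix once Flume has consumed them.

echo "hello flume" > /tmp/access.log
cp /tmp/access.log /logs/
# wait up to 60 seconds (rollInterval), then list the target directory
hdfs dfs -ls -R /flume/

You can also browse the /flume directory from the NameNode web UI (by default on port 50070 in Hadoop 2.x).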
