1. First download the Flume binary tarball (not a jar) from the official website. We use the latest release, Apache Flume 1.8.0 binary (tar.gz).
URL: http://www.apache.org/dyn/closer.lua/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
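Note that the closer.lua link is Apache's mirror selector and may return an HTML page instead of the file. If so, the tarball can be fetched directly from the Apache archive (the URL below follows Apache's standard archive layout, so treat it as an assumption):
wget https://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz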
2. Then extract it on the Hadoop machine:
tar -zxvf apache-flume-1.8.0-bin.tar.gz
Then we can rename the extracted directory to make it easier to work with (optional):
mv apache-flume-1.8.0-bin flume
3. Then edit /etc/profile to set up environment variables (optional, but it lets you run the commands from anywhere):
export ZOOKEEPER_HOME=/home/zookeeper
export FLUME_HOME=/home/flume
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$FLUME_HOME/bin
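After saving, reload the profile so the new PATH takes effect, and sanity-check the install (flume-ng version is part of the standard Flume distribution):
source /etc/profile
flume-ng version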
4. Next create a configuration file; name it example.conf so that it matches the run command in step 5:
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Note: the stock example uses a1.sources.r1.bind = localhost; change it to your own host address (here node1).
5. Run:
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Note: because the source type is netcat, which listens on a TCP port, input must be sent over TCP.
We can use telnet 192.168.56.101 44444 to check that events are coming through.
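A session might look roughly like this (the logger output shown below is typical of the logger sink and is an illustration, not verbatim output):
$ telnet 192.168.56.101 44444
Trying 192.168.56.101...
Connected to 192.168.56.101.
Escape character is '^]'.
hello
OK
and on the agent console:
Event: { headers:{} body: 68 65 6C 6C 6F                hello }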
Finally, because we set a1.sinks.k1.type = logger, the sink writes events to the Flume log, so they are printed on the node1 console.
6. The netcat source above only receives events that are pushed to it; we can also use a source that actively picks logs up:
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# Ingest any file dropped into /opt/flume (processed files are renamed with a .COMPLETED suffix)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/flume
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://192.168.56.101:8020/flume/%Y-%m-%d/%H%M
# Never roll files on event count; roll every 60 s or at 10 KB instead
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 10240
# Close files that sit idle for 3 s
a1.sinks.k1.hdfs.idleTimeout = 3
# Write plain text rather than a SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent's local time to resolve the %Y-%m-%d/%H%M escapes
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Round the timestamp down to 5-minute buckets for the directory path
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 5
a1.sinks.k1.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
7. The spooldir source actively watches a directory for new files.
Drop a file into /opt/flume and you will see the data appear on the Hadoop cluster automatically.
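A minimal end-to-end check could look like this (the file name is just an example; the /flume path matches the hdfs.path configured above):
echo "hello flume" > /tmp/test.log
cp /tmp/test.log /opt/flume/
hdfs dfs -ls /flume/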
If you run into this error:
2017-11-22 21:22:26,047 (pool-3-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:280)] FATAL: Spool Directory source r1: { spoolDir: /opt/flume }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:283)
at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:132)
at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:70)
at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:89)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readDeserializerEvents(ReliableSpoolingFileEventReader.java:343)
at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:318)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:250)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
the cause is that the file you dropped in contains special characters, i.e. bytes that are not valid in the charset the source expects (UTF-8 by default). Fix the file and the error goes away. With that, the single-node Flume setup is complete.
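If you cannot fix the files themselves, the spooldir source also exposes charset settings; a minimal sketch, assuming the incoming files are GBK-encoded (adjust to your actual encoding):
# Tell the source which charset the files use (UTF-8 is the default)
a1.sources.r1.inputCharset = GBK
# Or keep UTF-8 and skip undecodable bytes instead of failing
a1.sources.r1.decodeErrorPolicy = IGNORE
Alternatively, convert a file before dropping it in, e.g. iconv -f GBK -t UTF-8 in.log > out.log.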