Some problems I ran into with Flume

What I want to do is transfer the files under a local directory to HDFS.
1. First, install Flume.
2. Set up Flume's configuration file.
flume]# ls
bin conf docs etc lib tools
Create a configuration file named fk in the conf directory:
flume]# cd conf/
[root@hdp-gp-dk01 conf]# ls
agent flume.conf flume-env.ps1 log4j.properties test2 test4 test6
fk flume-conf.properties.template flume-env.sh.template
3. The contents of the configuration file are as follows:

[root@hdp-gp-dk01 conf]# cat fk
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type =spooldir 
a1.sources.r1.spoolDir=/opt/data/
a1.sources.r1.ignorePattern = ^(.)*\\.tmp$

# Describe the sink
a1.sinks.k1.type=hdfs 
a1.sinks.k1.hdfs.path=hdfs://127.0.0.1:8020/flume/data/%Y-%m-%d-%H
a1.sinks.k1.hdfs.rollSize=10240000
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=5
a1.sinks.k1.hdfs.fileType=DataStream


# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1 

Problems encountered
1.ERROR flume.SinkRunner: Unable to deliver event. Exception follows. org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
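This exception is raised by the HDFS sink. The HDFS path contains time escape sequences (%Y-%m-%d-%H), so the sink reads the timestamp header from each event to fill them in, and it throws when that header is missing. The spooldir source does not set this header on its own. The fix I used:
check whether hdfs.path in conf/fk uses time escape sequences, and remove them.
2. The agent keeps dying with this exception: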
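If the time-bucketed path is still wanted, removing the escape sequences is not the only way out. The fragment below is a sketch of two alternatives, not something tested in this setup: it assumes the Flume version in use supports the HDFS sink's hdfs.useLocalTimeStamp property and the built-in timestamp interceptor (the interceptor name i1 is just a label chosen here). Only one of the two options is needed; check the user guide for your version before relying on either.

```properties
# Option A (assumption: supported by your Flume version):
# let the HDFS sink resolve %Y-%m-%d-%H from the agent's local clock
# instead of the event's timestamp header.
a1.sinks.k1.hdfs.path = hdfs://127.0.0.1:8020/flume/data/%Y-%m-%d-%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Option B: have the source stamp every event with a timestamp header.
# "i1" is an arbitrary interceptor name used for this example.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
```

With Option A the directory reflects the time the event is written; with Option B it reflects the time the source read the event.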
ERROR org.apache.flume.source.SpoolDirectorySource: FATAL: Spool Directory source source1: { spoolDir: /opt/data/ }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
  java.nio.charset.MalformedInputException: Input length = 1
  at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
  at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:195)
  at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
  at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
  at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
  at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:241)
  at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:224)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

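Solution: the inputCharset property of the spooldir source defaults to UTF-8, but the log files being read are encoded in GBK, so the decoder hits byte sequences that are not valid UTF-8. Setting the property to the files' actual charset fixes it. The updated configuration: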

[root@hdp-gp-dk01 conf]# cat fk
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type =spooldir 
a1.sources.r1.spoolDir=/opt/data/
a1.sources.r1.inputCharset = GBK
a1.sources.r1.ignorePattern = ^(.)*\\.tmp$

# Describe the sink
a1.sinks.k1.type=hdfs 
a1.sinks.k1.hdfs.path=hdfs://127.0.0.1:8020/flume/data/test
a1.sinks.k1.hdfs.rollSize=10240000
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=5
a1.sinks.k1.hdfs.fileType=DataStream


# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1 

Line added: a1.sources.r1.inputCharset = GBK
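Setting inputCharset only helps when every file in the spool directory really is GBK. If files of mixed or partly corrupted encoding can show up, a single undecodable byte will still stop the source. As a hedged sketch, assuming your Flume version supports the spooldir source's decodeErrorPolicy property, the source can be told to tolerate such bytes instead of failing:

```properties
# Charset of the files dropped into the spool directory.
a1.sources.r1.inputCharset = GBK

# What to do with bytes that cannot be decoded in that charset
# (assumption: this property is available in your Flume version):
#   FAIL    - throw an exception (the default behaviour seen above)
#   REPLACE - substitute the Unicode replacement character (U+FFFD)
#   IGNORE  - drop the undecodable bytes
a1.sources.r1.decodeErrorPolicy = REPLACE
```

REPLACE and IGNORE both lose the original bytes, so this trades data fidelity for keeping the agent alive; fixing the charset at the source of the files is the cleaner option when it is possible.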
