Official website: http://flume.apache.org/
Apache 1.6.0 download: http://www.apache.org/dyn/closer.cgi/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
Case 1: Avro
Avro can send a given file to Flume; the Avro source uses the Avro RPC mechanism.
a) Create the agent configuration file
[hadoop@h71 conf]$ vi avro.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.8.71
a1.sources.r1.port = 4141
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
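To exercise this case, the agent is started with the config above and a test file is pushed in with the bundled avro-client. A sketch of the commands (host and port come from the config above; the payload file path /tmp/avro_test.log is made up here):

```shell
# Create a test payload file (hypothetical path).
echo "hello world" > /tmp/avro_test.log

# Start the agent with the config above (run on h71; logs go to the console):
#   bin/flume-ng agent -c conf -f conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

# Send the file to the Avro source with the bundled avro-client:
#   bin/flume-ng avro-client -c conf -H 192.168.8.71 -p 4141 -F /tmp/avro_test.log
```

The agent log output below shows the resulting connection and the delivered "hello world" event.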
12/12/13 02:00:34 INFO source.AvroSource: Avro source r1 started.
12/12/13 02:03:42 INFO ipc.NettyServer: [id: 0x060035f0, /192.168.8.71:56184 => /192.168.8.71:4141] OPEN
12/12/13 02:03:42 INFO ipc.NettyServer: [id: 0x060035f0, /192.168.8.71:56184 => /192.168.8.71:4141] BOUND: /192.168.8.71:4141
12/12/13 02:03:42 INFO ipc.NettyServer: [id: 0x060035f0, /192.168.8.71:56184 => /192.168.8.71:4141] CONNECTED: /192.168.8.71:56184
12/12/13 02:03:42 INFO ipc.NettyServer: [id: 0x060035f0, /192.168.8.71:56184 :> /192.168.8.71:4141] DISCONNECTED
12/12/13 02:03:42 INFO ipc.NettyServer: [id: 0x060035f0, /192.168.8.71:56184 :> /192.168.8.71:4141] UNBOUND
12/12/13 02:03:42 INFO ipc.NettyServer: [id: 0x060035f0, /192.168.8.71:56184 :> /192.168.8.71:4141] CLOSED
12/12/13 02:03:42 INFO ipc.NettyServer: Connection to /192.168.8.71:56184 disconnected.
12/12/13 02:03:44 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
Note: when the startup command ends with the -Dflume.root.logger=INFO,console parameter, log messages are printed to the console.
Without that parameter, Flume creates a logs directory under its home directory and writes a flume.log file there. If you open flume.log you will find essentially the same content that would have been printed to the console with the parameter (the format differs slightly, e.g. the timestamp style, which is presumably configured in log4j).
Looking at Flume's default log4j.properties file (in the conf directory of your Flume installation) you will find these lines:
#flume.root.logger=DEBUG,console
flume.root.logger=INFO,LOGFILE
flume.log.dir=./logs
flume.log.file=flume.log
Case 2: Spool
a) Create the agent configuration file
[hadoop@h71 conf]$ vi spool.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/logs
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
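To test, drop a file into the spooled directory; Flume ingests it and renames it with a .COMPLETED suffix, as the log below shows. A sketch, using a stand-in /tmp directory instead of the path in the config:

```shell
# Start the agent first:
#   bin/flume-ng agent -c conf -f conf/spool.conf -n a1 -Dflume.root.logger=INFO,console

# Then create a file in the spooled directory (filename matches the log output below):
SPOOL_DIR=/tmp/spool_demo   # stands in for /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/logs
mkdir -p "$SPOOL_DIR"
echo "spool test1" > "$SPOOL_DIR/spool_text.log"
```

Note that spooling Flume's own logs directory, as the config above does, can conflict with flume.log when console logging is off; a dedicated directory is safer.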
12/12/13 02:19:50 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
12/12/13 02:20:23 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
12/12/13 02:20:23 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/logs/spool_text.log to /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/logs/spool_text.log.COMPLETED
12/12/13 02:20:23 INFO sink.LoggerSink: Event: { headers:{file=/home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/logs/spool_text.log} body: 73 70 6F 6F 6C 20 74 65 73 74 31 spool test1 }
Case 3: Exec
a) Create the agent configuration file
[hadoop@h71 conf]$ vi exec_tail.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /home/hadoop/flume-1.5.0-bin/log_exec_tail
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
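The tail1 through tail100 events in the log below can be generated by appending lines to the tailed file. A sketch, writing to a /tmp path instead of the path in the config:

```shell
# Start the agent first:
#   bin/flume-ng agent -c conf -f conf/exec_tail.conf -n a1 -Dflume.root.logger=INFO,console

# Append 100 test lines to the tailed file (path shortened to /tmp here):
LOG=/tmp/log_exec_tail
: > "$LOG"                 # truncate/create the file
for i in $(seq 1 100); do
  echo "exec tail$i" >> "$LOG"
done
```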
12/12/13 02:33:06 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
12/12/13 02:34:00 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 31 exec tail1 }
12/12/13 02:34:00 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 32 exec tail2 }
12/12/13 02:34:00 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 33 exec tail3 }
12/12/13 02:34:00 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 34 exec tail4 }
12/12/13 02:34:00 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 35 exec tail5 }
....
....
....
12/12/13 02:34:09 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 38 exec tail98 }
12/12/13 02:34:09 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 39 exec tail99 }
12/12/13 02:34:09 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 31 30 30 exec tail100 }
Case 4: Syslogtcp
a) Create the agent configuration file
[hadoop@h71 conf]$ vi syslog_tcp.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.8.71
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
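A test event can be pushed into the syslog source with netcat. Because the plain string lacks valid syslog framing, the agent logs the "Invalid Syslog data" warning seen below but still delivers the body. A sketch:

```shell
# Test message; it has no syslog priority header, hence the
# "Invalid Syslog data" warning, but the body still arrives.
MSG="hello idoall.org"
# Send it to the running agent's syslog TCP source (requires netcat):
#   echo "$MSG" | nc 192.168.8.71 5140
```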
12/12/13 02:40:51 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
12/12/13 02:40:51 INFO node.Application: Starting Sink k1
12/12/13 02:40:51 INFO node.Application: Starting Source r1
12/12/13 02:40:51 INFO source.SyslogTcpSource: Syslog TCP Source starting...
12/12/13 02:42:09 WARN source.SyslogUtils: Event created from Invalid Syslog data.
12/12/13 02:42:09 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
Case 5: JSONHandler (HTTP source)
a) Create the agent configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
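The HTTP source's default JSONHandler accepts a JSON array of events, each with a headers map and a body string. A sketch of posting one event with curl (host and port from the config above; the header name h1 is made up):

```shell
# JSON payload in the format the default JSONHandler expects: a list of events.
PAYLOAD='[{"headers":{"h1":"v1"},"body":"hello idoall.org"}]'
# Post it to the running agent:
#   curl -X POST -d "$PAYLOAD" http://192.168.8.71:8888
```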
Case 6: HDFS sink
a) Create the agent configuration file
[hadoop@h71 conf]$ vi hdfs_sink.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.8.71
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://192.168.8.71:9000/user/flume/syslogtcp
a1.sinks.k1.hdfs.filePrefix = Syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
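The round/roundValue/roundUnit settings above round the event timestamp down to the nearest 10 minutes when it is substituted into the HDFS path; they only take effect when the path contains time escapes like %M, which this particular path does not. The rounding itself is simple integer arithmetic, sketched here:

```shell
# Rounding a minute-of-hour down to a 10-minute bucket,
# as roundValue=10 / roundUnit=minute does.
minute=37
bucket=$(( minute / 10 * 10 ))
echo "$bucket"    # 30
```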
12/12/13 03:00:57 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
12/12/13 03:00:57 INFO node.Application: Starting Source r1
12/12/13 03:00:58 INFO source.SyslogTcpSource: Syslog TCP Source starting...
12/12/13 03:01:01 WARN source.SyslogUtils: Event created from Invalid Syslog data.
12/12/13 03:01:02 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
12/12/13 03:01:02 INFO hdfs.BucketWriter: Creating hdfs://192.168.8.71:9000/user/flume/syslogtcp/Syslog.1355338862051.tmp
12/12/13 03:01:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/13 03:01:33 INFO hdfs.BucketWriter: Closing hdfs://192.168.8.71:9000/user/flume/syslogtcp/Syslog.1355338862051.tmp
12/12/13 03:01:33 INFO hdfs.BucketWriter: Renaming hdfs://192.168.8.71:9000/user/flume/syslogtcp/Syslog.1355338862051.tmp to hdfs://192.168.8.71:9000/user/flume/syslogtcp/Syslog.1355338862051
12/12/13 03:01:33 INFO hdfs.HDFSEventSink: Writer callback called.
Case 7: File Roll Sink
a) Create the agent configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = 192.168.8.71
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/logs
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
12/12/13 03:10:33 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
12/12/13 03:10:33 INFO node.Application: Starting Source r1
12/12/13 03:10:33 INFO sink.RollingFileSink: RollingFileSink k1 started.
12/12/13 03:10:34 INFO source.SyslogTcpSource: Syslog TCP Source starting...
12/12/13 03:11:38 WARN source.SyslogUtils: Event created from Invalid Syslog data.
12/12/13 03:12:44 WARN source.SyslogUtils: Event created from Invalid Syslog data.
-rw-rw-r-- 1 hadoop hadoop 50 Dec 13 03:19 1355339980196-1
(By default it rolls a new file every 30 seconds, which produces many small files. To avoid this, add the parameter a1.sinks.k1.sink.rollInterval = N, where N is the number of seconds between rolls. I set it to 0 here, which disables rolling, so only one file is produced.)
Flume supports fan-out flows from one source to multiple channels. There are two modes: Replicating, which copies events to every channel, and Multiplexing, which routes each event to a subset of channels.
Case 8: Replicating Channel Selector
In replicating mode, the data from the front-end source is duplicated into multiple channels; every channel receives an identical copy.
This case needs two machines, h71 and h72.
a) Create the replicating_Channel_Selector configuration file on h71
[hadoop@h71 conf]$ vi replicating_Channel_Selector.conf
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.8.71
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.8.71
a1.sinks.k1.port = 5555
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.8.72
a1.sinks.k2.port = 5555
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
b) Create the Avro receiver agent configuration file (on h72, change bind to 192.168.8.72)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.8.71
a1.sources.r1.port = 5555
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
12/12/13 06:36:01 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
12/12/13 06:36:01 INFO source.AvroSource: Avro source r1 started.
12/12/13 06:36:46 INFO ipc.NettyServer: [id: 0x54f65bf3, /192.168.8.71:43038 => /192.168.8.71:5555] OPEN
12/12/13 06:36:46 INFO ipc.NettyServer: [id: 0x54f65bf3, /192.168.8.71:43038 => /192.168.8.71:5555] BOUND: /192.168.8.71:5555
12/12/13 06:36:46 INFO ipc.NettyServer: [id: 0x54f65bf3, /192.168.8.71:43038 => /192.168.8.71:5555] CONNECTED: /192.168.8.71:43038
12/12/13 06:36:47 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
On h72:
2012-12-13 06:31:28,547 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: r1 started
2012-12-13 06:31:28,549 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:253)] Avro source r1 started.
2012-12-13 06:31:38,500 (New I/O server boss #3) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x0a5fa6e0, /192.168.8.71:49630 => /192.168.8.72:5555] OPEN
2012-12-13 06:31:38,501 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x0a5fa6e0, /192.168.8.71:49630 => /192.168.8.72:5555] BOUND: /192.168.8.72:5555
2012-12-13 06:31:38,501 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x0a5fa6e0, /192.168.8.71:49630 => /192.168.8.72:5555] CONNECTED: /192.168.8.71:49630
2012-12-13 06:33:18,375 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
Conjecture: I remember previously not setting selector.type at all — just configuring one source, two channels, and two sinks in a single file, each sink bound to its own channel — and it still worked, with replicating behavior. So my guess is that when this property is not set, the default is replicating; if you want multiplexing you must set it to multiplexing explicitly.
Case 9: Multiplexing Channel Selector
In multiplexing mode, the selector routes each event to a channel based on the value of a header.
a) Create the Multiplexing_Channel_Selector configuration file on h71
[hadoop@h71 conf]$ vi Multiplexing_Channel_Selector.conf
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
# Mappings for different values may overlap (share channels); the default may contain any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.8.71
a1.sinks.k1.port = 5555
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.8.72
a1.sinks.k2.port = 5555
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
b) Create the Avro receiver agent configuration file (on h72, change bind to 192.168.8.72)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.8.71
a1.sources.r1.port = 5555
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
12/12/13 08:12:23 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
12/12/13 08:12:23 INFO source.AvroSource: Avro source r1 started.
12/12/13 08:13:08 INFO ipc.NettyServer: [id: 0x7c761258, /192.168.8.71:52767 => /192.168.8.71:5555] OPEN
12/12/13 08:13:08 INFO ipc.NettyServer: [id: 0x7c761258, /192.168.8.71:52767 => /192.168.8.71:5555] BOUND: /192.168.8.71:5555
12/12/13 08:13:08 INFO ipc.NettyServer: [id: 0x7c761258, /192.168.8.71:52767 => /192.168.8.71:5555] CONNECTED: /192.168.8.71:52767
12/12/13 08:15:33 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }
12/12/13 08:15:33 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33 idoall_TEST3 }
2012-12-13 08:09:18,316 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: r1 started
2012-12-13 08:09:18,317 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:253)] Avro source r1 started.
2012-12-13 08:09:40,430 (New I/O server boss #3) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0xcb673fb5, /192.168.8.71:46032 => /192.168.8.72:5555] OPEN
2012-12-13 08:09:40,432 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0xcb673fb5, /192.168.8.71:46032 => /192.168.8.72:5555] BOUND: /192.168.8.72:5555
2012-12-13 08:09:40,432 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0xcb673fb5, /192.168.8.71:46032 => /192.168.8.72:5555] CONNECTED: /192.168.8.71:46032
2012-12-13 08:12:05,774 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }
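The events in the logs above (type=baidu and type=qq delivered to h71 via c1, type=ali delivered to h72 via c2) can be produced by posting to the HTTP source with matching type headers. A sketch:

```shell
# Payloads whose "type" header drives the multiplexing selector:
P1='[{"headers":{"type":"baidu"},"body":"idoall_TEST1"}]'   # mapped to c1 -> h71
P2='[{"headers":{"type":"ali"},"body":"idoall_TEST2"}]'     # mapped to c2 -> h72
P3='[{"headers":{"type":"qq"},"body":"idoall_TEST3"}]'      # unmapped -> default c1 -> h71
# Post them to the running agent:
#   curl -X POST -d "$P1" http://192.168.8.71:5140
#   curl -X POST -d "$P2" http://192.168.8.71:5140
#   curl -X POST -d "$P3" http://192.168.8.71:5140
```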
Case 10: Flume Sink Processors (failover)
a) Create the Flume_Sink_Processors configuration file on h71
[hadoop@h71 conf]$ vi Flume_Sink_Processors.conf
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# The key to configuring failover: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priority: the higher the number, the higher the priority; each sink's priority must be distinct
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds (10000 ms); tune this faster or slower to suit your situation
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.8.71
a1.sinks.k1.port = 5555
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.8.72
a1.sinks.k2.port = 5555
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
b) Create the Avro receiver agent configuration file (on h72, change bind to 192.168.8.72)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.8.71
a1.sources.r1.port = 5555
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2012-12-13 05:51:57,248 (lifecycleSupervisor-1-4) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start EventDrivenSourceRunner: { source:Avro source r1: { bindAddress: 192.168.8.71, port: 5555 } } - Exception follows.
org.jboss.netty.channel.ChannelException: Failed to bind to: /192.168.8.71:5555
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
at org.apache.flume.source.AvroSource.start(AvroSource.java:236)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
... 3 more
Starting the agent on h72 printed only the following:
Info: Including Hive libraries found via () for Hive access
+ exec /usr/jdk1.7.0_25/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin:/home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/lib/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -f conf/avro.conf -n a1
log4j:WARN No appenders could be found for logger (org.apache.flume.lifecycle.LifecycleSupervisor).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Searching online suggested the problem was an incorrect conf location given with -c, so the command was changed:
[hadoop@h72 apache-flume-1.6.0-cdh5.5.2-bin]$ bin/flume-ng agent -c /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/conf/ -f conf/avro.conf -n a1 -Dflume.root.logger=INFO,console
(It turned out the real cause was that Hadoop's environment variables were missing from the .bash_profile file on the h72 machine; after adding them it worked:
HADOOP_HOME=/home/hadoop/hadoop-2.6.0-cdh5.5.2
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH
Then make the profile take effect:
[hadoop@h72 ~]$ source .bash_profile
Question: -c is supposed to take the conf directory, and either a relative or an absolute path should work. But after adding the Hadoop variables to .bash_profile, running from Flume's home directory with -c . is also accepted. Why? Doesn't . mean the current directory, i.e. /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin, rather than /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/conf?
)
2012-12-13 06:04:42,892 (New I/O server boss #3) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0xcdd2cc86, /192.168.8.71:37143 => /192.168.8.72:5555] OPEN
2012-12-13 06:04:42,892 (New I/O worker #2) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0xcdd2cc86, /192.168.8.71:37143 => /192.168.8.72:5555] BOUND: /192.168.8.72:5555
2012-12-13 06:04:42,892 (New I/O worker #2) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0xcdd2cc86, /192.168.8.71:37143 => /192.168.8.72:5555] CONNECTED: /192.168.8.71:37143
2012-12-13 06:04:52,000 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 30 idoall.org test1 }
12/12/13 06:08:10 INFO ipc.NettyServer: [id: 0x45a46286, /192.168.8.71:55655 => /192.168.8.71:5555] OPEN
12/12/13 06:08:10 INFO ipc.NettyServer: [id: 0x45a46286, /192.168.8.71:55655 => /192.168.8.71:5555] BOUND: /192.168.8.71:5555
12/12/13 06:08:10 INFO ipc.NettyServer: [id: 0x45a46286, /192.168.8.71:55655 => /192.168.8.71:5555] CONNECTED: /192.168.8.71:55655
12/12/13 06:16:13 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 30 idoall.org test1 }
12/12/13 06:16:13 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
2012-12-13 06:14:14,191 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: r1 started
2012-12-13 06:14:14,192 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:253)] Avro source r1 started.
2012-12-13 06:14:18,934 (New I/O server boss #3) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x45dd9ffb, /192.168.8.71:57973 => /192.168.8.72:5555] OPEN
2012-12-13 06:14:18,936 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x45dd9ffb, /192.168.8.71:57973 => /192.168.8.72:5555] BOUND: /192.168.8.72:5555
2012-12-13 06:14:18,936 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x45dd9ffb, /192.168.8.71:57973 => /192.168.8.72:5555] CONNECTED: /192.168.8.71:57973
2012-12-13 06:14:22,935 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
2012-12-13 06:16:07,028 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }
2012-12-13 06:16:07,028 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }
Case 11: Load balancing Sink Processor
a) Create the Load_balancing_Sink_Processors configuration file on h71
[hadoop@h71 conf]$ vi Load_balancing_Sink_Processors.conf
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# The key to configuring load balancing: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.8.71
a1.sinks.k1.port = 5555
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = 192.168.8.72
a1.sinks.k2.port = 5555
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
b) Create the Avro receiver agent configuration file (on h72, change bind to 192.168.8.72)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.8.71
a1.sources.r1.port = 5555
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
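With the agents running (the load-balancing agent plus an Avro receiver on each of h71 and h72), a burst of numbered test events sent to the syslog source should alternate between the two receivers, since processor.selector is round_robin. A sketch:

```shell
# Send a few numbered test events to the load-balancing agent's syslog source.
# With round_robin selection, successive events should alternate between
# the receivers on 192.168.8.71 and 192.168.8.72.
for i in 0 1 2 3; do
  msg="idoall.org test$i"
  # echo "$msg" | nc 192.168.8.71 5140   # requires netcat and a running agent
done
```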
Case 12: HBase sink
a) Create the agent configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.8.71
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = test_idoall_org
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = idoall
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
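The HBase sink assumes the target table already exists with the configured column family. A sketch of preparing and then checking the table in the HBase shell (table and family names taken from the config above):

```shell
# Commands to run inside `hbase shell` before starting the agent:
#   create 'test_idoall_org', 'name'
# After sending an event through the agent, verify the write:
#   scan 'test_idoall_org'
TABLE=test_idoall_org
FAMILY=name
echo "create '$TABLE', '$FAMILY'"
```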
Scanning the table in the HBase shell confirms a row was written:
1 row(s) in 0.0160 seconds
Reference article: http://www.jb51.net/article/53542.htm
(This post makes some modifications and improvements on top of that article.)