Because the source, channel, and sink interfaces differ between Flume versions, you need to use the interfaces that match your version.
This article uses Flume 1.6.0 as its example; for reference, see http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0/FlumeUserGuide.html
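Before looking at individual components, it helps to see how they fit together: an agent definition names its sources, channels, and sinks, configures each one, and then wires sources and sinks to channels. The sketch below is adapted from the user guide's quick-start example (the agent name a1 and the netcat/logger components are just placeholders for the real components covered later):

a1.sources = r1
a1.channels = c1
a1.sinks = k1
# a simple source that listens on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# buffer events in memory
a1.channels.c1.type = memory
# log events at INFO level
a1.sinks.k1.type = logger
# wiring: a source writes to one or more channels, a sink reads from exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Assuming the file is saved as example.conf, the agent is started with:
bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console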
(1) Function
An Avro source listens on an Avro port and receives events from external Avro client streams. Typical use: tiered data collection.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be avro |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
(3) Example
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
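To check that the source is listening, Flume's bundled Avro client can send it a file; a hedged example (the path /tmp/test.log is an assumption):

bin/flume-ng avro-client -H localhost -p 4141 -F /tmp/test.log

Each line of the file arrives at the source as one event.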
(1) Function
An exec source runs a given command and ingests its standard output, for example tail -F on a file. Typical use case: monitoring log files.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be exec |
command | – | The command to execute |
(3) Example
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
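The exec source dies silently if the command exits, and it offers no delivery guarantee: data the command has produced but the source has not yet put on the channel is lost if the agent fails. Flume 1.6 provides optional parameters to restart the command and batch its output; a hedged sketch (the values shown are the defaults, except restart):

a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000
a1.sources.r1.batchSize = 20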
(1) Function
A spooling directory source watches a directory and ingests new files as they appear in it.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be spooldir. |
spoolDir | – | The directory from which to read files from. |
(3) Example
a1.channels = ch-1
a1.sources = src-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
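By default the source marks files as done by renaming them rather than deleting them; a hedged sketch of the related Flume 1.6 options (the values shown are the defaults):

a1.sources.src-1.fileSuffix = .COMPLETED
a1.sources.src-1.deletePolicy = never

Note that files dropped into the spool directory must be immutable and uniquely named; writing to a file, or reusing its name, after the source has read it causes the source to fail.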
(1) Function
A memory channel stores events in an in-memory queue with a configurable maximum size. It is ideal for flows that need high throughput and can afford to lose staged data if the agent fails.
Drawback: the memory channel is volatile, holding all events in memory; if the process stops abnormally, the in-memory data cannot be recovered, and capacity is limited by available memory.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
type | – | The component type name, needs to be memory |
(3) Example
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
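Parameter notes (per the Flume 1.6 user guide):
capacity: maximum number of events stored in the channel.
transactionCapacity: maximum number of events the channel will take from a source or give to a sink per transaction.
byteCapacity / byteCapacityBufferPercentage: cap on the total bytes of event bodies in the channel, with the given percentage held back as headroom for event headers.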
(1) Function
A file channel is a durable channel: data is safe across restarts, and it can keep storing events on disk for as long as disk space lasts.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
type | – | The component type name, needs to be file. |
(3) Example
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data
Parameter notes:
checkpointDir: directory where checkpoint data is kept; the checkpoint lets the channel verify data integrity and tell which events have already been taken and which have not.
dataDirs: directory where event data is stored; it may be a comma-separated list of directories,
and spreading them over multiple directories on separate disks improves file channel performance, as sketched below.
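A hedged sketch of that multi-disk layout (the mount points /disk1 and /disk2 are assumptions):

a1.channels.c1.checkpointDir = /disk1/flume/checkpoint
a1.channels.c1.dataDirs = /disk1/flume/data,/disk2/flume/data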
(1) Function
The HDFS sink writes events to the Hadoop Distributed File System (HDFS).
(2) Required parameters
Name | Default | Description |
---|---|---|
channel | – | |
type | – | The component type name, needs to be hdfs |
hdfs.path | – | HDFS directory path (eg hdfs://namenode/flume/webdata/) |
(3) Example
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
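The %y-%m-%d/%H%M/%S escape sequences in hdfs.path are filled in from the event's timestamp header, so events must carry one (or hdfs.useLocalTimeStamp = true must be set), and the round* settings above truncate that timestamp down to 10-minute buckets. A hedged sketch of the rolling options usually tuned alongside this (the values are assumptions, not recommendations):

a1.sinks.k1.hdfs.useLocalTimeStamp = true
# roll the current file after 300 s or 128 MB, whichever comes first; 0 disables a trigger
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# write raw events instead of the default SequenceFile container
a1.sinks.k1.hdfs.fileType = DataStream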
(1) Function
The Hive sink streams events containing delimited text or JSON data directly into a Hive table or partition.
(2) Required parameters
Name | Default | Description |
---|---|---|
channel | – | |
type | – | The component type name, needs to be hive |
hive.metastore | – | Hive metastore URI (eg thrift://a.b.com:9083 ) |
hive.database | – | Hive database name |
hive.table | – | Hive table name |
(3) Example
a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames = id,,msg
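Hive streaming requires the target table to be bucketed and stored as ORC; the Flume user guide pairs this example with a table created roughly as follows (a sketch, not a tested DDL):

create table weblogs ( id int , msg string )
    partitioned by (continent string, country string, time string)
    clustered by (id) into 5 buckets
    stored as orc;

The empty entry in serializer.fieldnames (id,,msg) skips the second input field instead of mapping it to a column.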
(1) Function
The HBase sink writes data to HBase.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
channel | – | |
type | – | The component type name, needs to be hbase |
table | – | The name of the table in Hbase to write to. |
columnFamily | – | The column family in Hbase to write to. |
(3) Example
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel = c1
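RegexHbaseEventSerializer splits the event body with a regular expression and writes each capture group to its own column in the configured column family; a hedged sketch (the regex and column names are illustrative assumptions):

a1.sinks.k1.serializer.regex = (\\d+)\\s+(\\w+)
a1.sinks.k1.serializer.colNames = id,name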
(1) Function
The Avro sink forms one half of Flume's tiered-collection support: Flume events sent to this sink are turned into Avro events and forwarded to the configured hostname/port pair. Events are taken from the configured channel in batches of the configured batch size.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
channel | – | |
type | – | The component type name, needs to be avro. |
hostname | – | The hostname or IP address to bind to. |
port | – | The port # to listen on. |
(3) Example
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
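This sink only forms a tier when paired with an Avro source on the downstream agent listening on the same host/port; a hedged sketch of the receiving side (the agent name a2 is an assumption):

a2.sources = r1
a2.channels = c1
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545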
(1) Function
The Kafka sink writes data to the corresponding Kafka topic.
(2) Required parameters
Property Name | Default | Description |
---|---|---|
type | – | Must be set to org.apache.flume.sink.kafka.KafkaSink |
brokerList | – | List of brokers Kafka-Sink will connect to, to get the list of topic partitions This can be a partial list of brokers, but we recommend at least two for HA. The format is comma separated list of hostname:port |
(3) Example
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = mytopic
a1.sinks.k1.brokerList = localhost:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1
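Two behaviors worth noting from the Flume 1.6 documentation: if topic is not set, events go to default-flume-topic, and an event carrying a "topic" header is sent to that topic, overriding the setting above. For the HA recommendation in brokerList, list more than one broker; a hedged sketch (the hostnames are assumptions):

a1.sinks.k1.brokerList = kafka01:9092,kafka02:9092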