Writing data from Flume to HBase or Kafka


1. Introduction to Flume

Flume is a real-time log collection system developed by Cloudera that has earned broad recognition and adoption in industry. Flume's initial releases are collectively referred to as Flume OG (original generation) and belonged to Cloudera. As Flume's feature set expanded, the shortcomings of Flume OG became apparent: a bloated code base, poorly designed core components, and non-standard core configuration. Unstable log delivery was especially severe in 0.9.4, the last Flume OG release. To solve these problems, Cloudera completed Flume-728 on October 22, 2011, a milestone change that rewrote the core components, core configuration, and code architecture; the refactored versions are collectively known as Flume NG (next generation). Another reason for the change was moving Flume under the Apache umbrella, with Cloudera Flume renamed Apache Flume.

Note: Flume reference material

    Official website: http://flume.apache.org/
    User guide: http://flume.apache.org/FlumeUserGuide.html
    Developer guide: http://flume.apache.org/FlumeDeveloperGuide.html

2. Flume -> Kafka

The agent configuration is as follows. A spooldir source watches /opt/flumelog/oridata/events for daily events_YYYY-MM-DD.csv files, a regex interceptor drops the CSV header line (which starts with user_id), a file channel buffers events on disk, and a Kafka sink publishes them to the events topic on the broker at 192.168.126.166:9092:

events.sources = eventsSource
events.channels = eventsChannel
events.sinks = eventsSink

events.sources.eventsSource.type = spooldir
events.sources.eventsSource.spoolDir = /opt/flumelog/oridata/events
events.sources.eventsSource.deserializer = LINE
events.sources.eventsSource.deserializer.maxLineLength = 32000
events.sources.eventsSource.includePattern = events_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
events.sources.eventsSource.interceptors = head_filter
events.sources.eventsSource.interceptors.head_filter.type = regex_filter
events.sources.eventsSource.interceptors.head_filter.regex = ^user_id*
events.sources.eventsSource.interceptors.head_filter.excludeEvents=true

events.channels.eventsChannel.type = file
events.channels.eventsChannel.checkpointDir = /opt/flumelog/checkpoint/events
events.channels.eventsChannel.dataDirs = /opt/flumelog/data/events

events.sinks.eventsSink.type = org.apache.flume.sink.kafka.KafkaSink
events.sinks.eventsSink.batchSize = 640
events.sinks.eventsSink.brokerList = 192.168.126.166:9092
events.sinks.eventsSink.topic = events

events.sources.eventsSource.channels = eventsChannel
events.sinks.eventsSink.channel = eventsChannel
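
A minimal sketch of starting and checking this agent, assuming the configuration above is saved as /opt/flumeconf/events-kafka.conf and Flume is installed under /opt/flume (both hypothetical paths), with the Kafka command-line tools on the PATH:

# Start the agent; --name must match the property prefix used above (events)
flume-ng agent \
  --conf /opt/flume/conf \
  --conf-file /opt/flumeconf/events-kafka.conf \
  --name events \
  -Dflume.root.logger=INFO,console

# In another terminal, confirm that records are arriving in the events topic
# (consumer flags vary slightly between Kafka versions)
kafka-console-consumer.sh \
  --bootstrap-server 192.168.126.166:9092 \
  --topic events \
  --from-beginning

Once a file has been fully ingested, the spooldir source renames it with a .COMPLETED suffix, which is a quick way to confirm that ingestion finished.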

3. Flume -> HDFS

The agent configuration is as follows. The spooldir source and header-filtering interceptor mirror the previous example, but the sink writes plain text (DataStream) to HDFS under /tmp/users/%Y-%m-%d. A new file is rolled roughly every 100 MB (rollSize = 100000000) or every 30 seconds (rollInterval = 30), and rolling by event count is disabled (rollCount = 0):

users.sources = usersSource
users.channels = usersChannel
users.sinks = usersSink

users.sources.usersSource.type = spooldir
users.sources.usersSource.spoolDir = /opt/flumelog/users
users.sources.usersSource.includePattern = users_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
users.sources.usersSource.deserializer = LINE
users.sources.usersSource.deserializer.maxLineLength = 1280
users.sources.usersSource.interceptors = head_filter
users.sources.usersSource.interceptors.head_filter.type = regex_filter
users.sources.usersSource.interceptors.head_filter.regex = ^user_id*
users.sources.usersSource.interceptors.head_filter.excludeEvents = true

users.channels.usersChannel.type = file
users.channels.usersChannel.checkpointDir = /opt/flumelog/checkpoint/users
users.channels.usersChannel.dataDirs = /opt/flumelog/data/users

users.sinks.usersSink.type = hdfs
users.sinks.usersSink.hdfs.fileType = DataStream
users.sinks.usersSink.hdfs.filePrefix = users
users.sinks.usersSink.hdfs.fileSuffix = .csv
users.sinks.usersSink.hdfs.path = hdfs://192.168.126.166:9000/tmp/users/%Y-%m-%d
users.sinks.usersSink.hdfs.useLocalTimeStamp = true
users.sinks.usersSink.hdfs.batchSize = 640
users.sinks.usersSink.hdfs.rollCount = 0
users.sinks.usersSink.hdfs.rollSize = 100000000
users.sinks.usersSink.hdfs.rollInterval = 30

users.sinks.usersSink.channel = usersChannel
users.sources.usersSource.channels = usersChannel
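
A minimal sketch for running and checking this agent, again assuming a hypothetical config file /opt/flumeconf/users-hdfs.conf and a Flume install under /opt/flume:

# Start the agent; the agent name here is users
flume-ng agent \
  --conf /opt/flume/conf \
  --conf-file /opt/flumeconf/users-hdfs.conf \
  --name users \
  -Dflume.root.logger=INFO,console

# The %Y-%m-%d escape plus useLocalTimeStamp = true creates one directory per day
hdfs dfs -ls /tmp/users/
hdfs dfs -cat /tmp/users/*/users*.csv | head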

4. Flume -> Kafka channel -> HDFS sink

The agent configuration is as follows. Here a Kafka topic (exkfk) serves as the channel itself rather than as a sink, so events are buffered durably in Kafka between the spooldir source and the HDFS sink. Because parseAsFlumeEvent = true, messages in the topic are stored in Flume's internal Avro event format and are meant to be read back by Flume, not by plain Kafka consumers:

event.sources = eventSource
event.channels = kfkChannel
event.sinks = hdfsSink

event.sources.eventSource.type = spooldir
event.sources.eventSource.spoolDir = /opt/flumelog/oridata/events/
event.sources.eventSource.includePattern = event_1.csv
event.sources.eventSource.deserializer = LINE
event.sources.eventSource.deserializer.maxLineLength = 1280
event.sources.eventSource.channels = kfkChannel

event.channels.kfkChannel.type = org.apache.flume.channel.kafka.KafkaChannel
event.channels.kfkChannel.capacity = 10000
event.channels.kfkChannel.transactionCapacity = 1000
event.channels.kfkChannel.brokerList = 192.168.126.166:9092
event.channels.kfkChannel.topic = exkfk
event.channels.kfkChannel.zookeeperConnect = 192.168.126.166:2181
event.channels.kfkChannel.parseAsFlumeEvent = true

event.sinks.hdfsSink.type = hdfs
event.sinks.hdfsSink.hdfs.path = hdfs://192.168.126.166:9000/tmp/kafka/channel
event.sinks.hdfsSink.hdfs.filePrefix = events
event.sinks.hdfsSink.hdfs.fileSuffix = .csv
event.sinks.hdfsSink.hdfs.rollInterval = 5
event.sinks.hdfsSink.hdfs.rollSize = 100000000
event.sinks.hdfsSink.hdfs.rollCount = 0
event.sinks.hdfsSink.hdfs.fileType = DataStream
event.sinks.hdfsSink.channel = kfkChannel
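
A minimal sketch for starting this agent, again with a hypothetical config file /opt/flumeconf/event-kafkachannel-hdfs.conf:

# Start the agent; the agent name here is event
flume-ng agent \
  --conf /opt/flume/conf \
  --conf-file /opt/flumeconf/event-kafkachannel-hdfs.conf \
  --name event \
  -Dflume.root.logger=INFO,console

# The HDFS sink drains the Kafka channel; output appears under this directory
hdfs dfs -ls /tmp/kafka/channel

Note that reading the exkfk topic directly with a console consumer will show Avro-wrapped payloads rather than raw CSV lines, because of parseAsFlumeEvent = true.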

That's all for now; more notes will be added later.
