Flume study notes (continuously updated)

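Introduction: Flume is mainly used in streaming pipelines to collect and move data. The basic flow is source -> channel -> sink: the source is where the data comes from, the sink is the destination, and the channel connects the two. Each agent is described by one .conf configuration file.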

Installation:
1. Install a JDK.
2. Download: http://www.apache.org/dyn/closer.lua/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
3. Unpack the archive and create the environment file from the template:
tar -zxvf apache-flume-1.7.0-bin.tar.gz
cd apache-flume-1.7.0-bin
cp conf/flume-env.sh.template conf/flume-env.sh

Usage

1. Listen on a port: collect the data sent to that port
# Name the components on this agent (source name, sink name, channel name)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

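# Describe/configure the source (source properties)
# Comments must sit on their own line: the properties format treats a trailing "# ..." as part of the value
a1.sources.r1.type = netcat
# IP address to bind to
a1.sources.r1.bind = 127.0.0.1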
a1.sources.r1.port = 44444

# Describe the sink (events are written to the log)
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory 
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel (source_channel = sinks_channel)
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# Test: connect with "telnet 127.0.0.1 44444" and type a line, for example: big data world!
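
For a scripted check instead of an interactive telnet session, a minimal sketch follows. It assumes an nc (netcat) client is installed; send_line is a hypothetical helper name and the 2-second timeout is an arbitrary choice. With the netcat source's default settings, each accepted line should be answered with OK.

```shell
#!/bin/sh
# Hypothetical helper: send one line of text to a Flume netcat source.
# Assumes an nc client is on PATH; -w 2 gives up after 2 idle seconds.
send_line() {
    host=$1   # address the source is bound to
    port=$2   # port the source listens on
    text=$3   # one event body
    printf '%s\n' "$text" | nc -w 2 "$host" "$port"
}
```

Example call against the agent configured above: send_line 127.0.0.1 44444 'big data world!'
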
2. Watch a directory: pick up newly added files and their contents

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
# Directory to watch
a1.sources.r1.spoolDir = /tmp
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
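
The spooling directory source expects a file to be complete and unchanged once it appears in the watched directory, and file names must not be reused. One way to respect that is to write the file somewhere else and then rename it into place. The sketch below assumes a staging directory on the same filesystem as the spool directory (so that mv is a plain rename); spool_drop and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: deliver a finished file to a spooldir source.
# The content is written in a staging directory first and only then moved
# into the spool directory, so Flume never reads a half-written file.
spool_drop() {
    staging_dir=$1   # scratch directory, ideally on the same filesystem
    spool_dir=$2     # directory configured as spoolDir
    name=$3          # final file name
    content=$4       # text to deliver

    # Refuse to replace a file that is already in the spool directory.
    [ -e "$spool_dir/$name" ] && return 1

    tmp_file="$staging_dir/$name.$$"
    printf '%s\n' "$content" > "$tmp_file" || return 1
    mv "$tmp_file" "$spool_dir/$name"
}
```

Example call, with /tmp/staging as an assumed scratch directory: spool_drop /tmp/staging /tmp test1.log 'big data world!'
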
3. Poll a database and save the rows to local files

mysql.sources = r1
mysql.channels = ch1
mysql.sinks = k1

# source 
mysql.sources.r1.type = org.keedio.flume.source.SQLSource
mysql.sources.r1.connection.url = jdbc:mysql://47.106.94.209:3306/test
mysql.sources.r1.user = zwj
mysql.sources.r1.password = xxxxxxx
mysql.sources.r1.table = a_test
mysql.sources.r1.columns.to.select = *
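# Incremental column, used to detect new rows
mysql.sources.r1.incremental.column.name = id
mysql.sources.r1.incremental.value = 0
# Delay between queries in milliseconds (one query every 5 seconds)
mysql.sources.r1.run.query.delay = 5000
# Directory that holds the status file
mysql.sources.r1.status.file.path = /tmp/flume_status
# Name of the status file
mysql.sources.r1.status.file.name = r1.status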


# channel 
mysql.channels.ch1.type = memory
mysql.channels.ch1.capacity = 1000
mysql.channels.ch1.transactionCapacity = 100


# sink

mysql.sinks.k1.type = file_roll
# Output directory for the data files
mysql.sinks.k1.sink.directory = /tmp/flume_data
# 0 disables time-based rolling, so all events are written to a single file
mysql.sinks.k1.sink.rollInterval = 0

# The "sink." prefix before "directory" must not be omitted, otherwise Flume reports a directory error. See https://issues.apache.org/jira/browse/FLUME-1222
# Connecting to the database requires 3 jars: mysql-connector-java-5.1.18-bin.jar, flume-ng-sql-source-1.3.7-sources.jar, flume-ng-sql-source-1.3.7.jar
# http://book2s.com/java/jar/f/flume-ng-sql-source/download-flume-ng-sql-source-1.3.7.html

# Bind the source and sink to the channel
mysql.sources.r1.channels = ch1
mysql.sinks.k1.channel = ch1
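
The jars have to end up on the agent's classpath. The notes above do not say where they were placed. One option is Flume's plugins.d layout, with the plugin jar under lib and its dependencies under libext; copying everything into Flume's own lib directory is another common choice. The sketch below assumes the plugins.d layout; install_sql_source and the plugin directory name sql-source are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the SQL source jar and the JDBC driver jar
# into a plugins.d directory under the Flume installation.
install_sql_source() {
    flume_home=$1   # e.g. the apache-flume-1.7.0-bin directory
    plugin_jar=$2   # e.g. flume-ng-sql-source-1.3.7.jar
    driver_jar=$3   # e.g. mysql-connector-java-5.1.18-bin.jar

    # "sql-source" is an assumed plugin directory name.
    plugin_dir="$flume_home/plugins.d/sql-source"
    mkdir -p "$plugin_dir/lib" "$plugin_dir/libext" || return 1
    cp "$plugin_jar" "$plugin_dir/lib/" || return 1
    cp "$driver_jar" "$plugin_dir/libext/"
}
```

Example call from the Flume directory: install_sql_source . flume-ng-sql-source-1.3.7.jar mysql-connector-java-5.1.18-bin.jar
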





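Start command:
Run from the Flume directory:

bin/flume-ng agent --conf conf --conf-file conf/filedir.conf --name a1 -Dflume.root.logger=INFO,console
# bin/flume-ng          : launcher script, relative to the Flume directory
# --conf conf           : configuration directory
# --conf-file ...       : configuration file of the agent
# --name a1             : agent name, must match the prefix used in the configuration file (a1 or mysql in the examples above)
# -Dflume.root.logger   : log level and output target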
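
Because --name has to match the property prefix in the configuration file, the name can also be read from the file instead of typed by hand. This is only a sketch: agent_name is a hypothetical helper, and it assumes the file contains a single agent declared with a line of the form "<agent>.sources = ...".

```shell
#!/bin/sh
# Hypothetical helper: print the agent name used in a Flume config file,
# taken from the first "<agent>.sources = ..." line.
agent_name() {
    sed -n 's/^[[:space:]]*\([A-Za-z0-9_-]*\)\.sources[[:space:]]*=.*/\1/p' "$1" | head -n 1
}
```

Example, with conf/mysql.conf as an assumed file name for the database configuration:
bin/flume-ng agent --conf conf --conf-file conf/mysql.conf --name "$(agent_name conf/mysql.conf)" -Dflume.root.logger=INFO,console
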










