Flume Log Collection System: Interceptors (1)

Introduction to and Use of Interceptors in Flume

Interceptors in Flume are applied as a Source reads events and passes them toward a Sink: they can add useful information to the event headers or filter the event content, performing a first pass of data cleansing.

Flume NG 1.7.0 currently provides the following interceptors, among others:

Timestamp Interceptor;
Host Interceptor;
Static Interceptor;
Search and Replace Interceptor;
Regex Filtering Interceptor;

Multiple interceptors can be applied to a single Source, as sketched below.
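Chaining is configured by listing the interceptor names in order on the source; they are applied to each event in that order. A minimal sketch (hypothetical agent a1 and source r1):

a1.sources.r1.interceptors = i1 i2
# i1 stamps the event first, then i2 adds the host header
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = host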

 

Step 1: Start Hadoop by running start-all.sh, then write the configuration files in the conf directory of the Flume installation.

 

Timestamp Interceptor

The timestamp interceptor inserts the current timestamp in milliseconds into the event headers, under the key timestamp. It is not used all that often by itself; a typical use is with the HDFS Sink, where the result files are bucketed by the event timestamp, e.g. hdfs.path = hdfs://master:9000/flume/%Y%m%d

# Configuration file: mytime.conf

 

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = syslogtcp

a1.sources.r1.port = 50000

# set the host to your own IP address
a1.sources.r1.host = 192.168.27.174

a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.preserveExisting = false

a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.channel = c1

a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y-%m-%d/%H%M

a1.sinks.k1.hdfs.filePrefix = looklook5.

a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

 

With the environment variables set, switch to the Flume directory and run: flume-ng agent -c conf -f conf/mytime.conf -n a1 -Dflume.root.logger=INFO,console

Now open another terminal and use telnet to send data to port 50000, so that Flume receives log entries and stamps them with a timestamp:

telnet 192.168.27.174 50000

The generated log files can then be inspected on HDFS.
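One way to check from the command line (a sketch, assuming the hdfs.path configured above and an event written today):

# list the minute-level directories Flume created under today's date
hdfs dfs -ls /flume/$(date +%Y-%m-%d)/

# print the contents of the rolled files
hdfs dfs -cat /flume/$(date +%Y-%m-%d)/*/looklook5.*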

 

Host Interceptor

The host interceptor adds the host name or IP address of the machine running the Flume agent to the event headers, under the key host (the key name can be customized). Create the configuration file myhost.conf:

# define agent name, source/sink/channel name

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# source,http,jsonhandler

a1.sources.r1.type = http

a1.sources.r1.bind = master

a1.sources.r1.port = 50000

a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler

 

# timestamp and host interceptors process each event as the source receives it

a1.sources.r1.interceptors = i1 i2
# the two interceptors are chained and applied to each event in the order listed
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = false

 

a1.sources.r1.interceptors.i2.type = host
# the event header gains an entry "hostname": <actual host name>
a1.sources.r1.interceptors.i2.hostHeader = hostname
# hostHeader sets the key; the value is filled with the host name of the node running the Flume agent
a1.sources.r1.interceptors.i2.useIP = false

# 04 hdfs sink

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y-%m-%d/
# the hdfs sink expands the escape sequences using the timestamp in the event header
a1.sinks.k1.hdfs.filePrefix = %{hostname}
# the hdfs sink substitutes the value of the hostname key from the event header

a1.sinks.k1.hdfs.fileType = DataStream

a1.sinks.k1.hdfs.writeFormat = Text

a1.sinks.k1.hdfs.rollInterval = 0

a1.sinks.k1.hdfs.rollCount = 10

a1.sinks.k1.hdfs.rollSize = 1024000

# channel,memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# bind source,sink to channel

a1.sinks.k1.channel = c1

a1.sources.r1.channels = c1

 

This configuration saves the source's events to HDFS under hdfs://master:9000/flume/%Y-%m-%d/, with file names prefixed by the host name taken from the event header.

With the environment variables set, switch to the Flume directory and run: flume-ng agent -c conf -f conf/myhost.conf -n a1 -Dflume.root.logger=INFO,console

The Flume terminal should show that the agent started successfully.

 

Now open another terminal and send a request to port 50000:

telnet master 50000
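Because r1 is an HTTP source using the JSONHandler, it is often easier to post the events as JSON; a minimal sketch (adjust host and port to your environment):

# JSONHandler expects a JSON array of events, each with "headers" and "body"
curl -X POST http://master:50000 -H 'Content-Type: application/json' -d '[{"headers": {}, "body": "hello from curl"}]'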

View the generated log files on HDFS:

 

Static Interceptor

The static interceptor adds a fixed key and value to the event headers. It lets the user attach one static header, whose value is applied to every event that passes through.

## source interceptor

# Configuration file: static_case18.conf

# Name the components on this agent  

a1.sources = r1  

a1.sinks = k1  

a1.channels = c1  

# Describe/configure the source  

a1.sources.r1.type = syslogtcp  

a1.sources.r1.port =  50000

a1.sources.r1.host = 192.168.27.174

a1.sources.r1.channels = c1  

a1.sources.r1.interceptors = i1  

a1.sources.r1.interceptors.i1.type = static  

a1.sources.r1.interceptors.i1.key = static_key

a1.sources.r1.interceptors.i1.value = static_value

# Describe the sink  

a1.sinks.k1.type = logger   

# Use a channel which buffers events in memory

a1.channels.c1.type = memory  

a1.channels.c1.capacity = 1000  

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel  

a1.sources.r1.channels = c1  

a1.sinks.k1.channel = c1

With the environment variables set, switch to the Flume directory and run: flume-ng agent -c conf -f conf/static_case18.conf -n a1 -Dflume.root.logger=INFO,console

 

The Flume terminal should show that the agent started successfully.

     

Now open another terminal and use telnet to send data to port 50000:

   

The Flume terminal then logs events carrying the configured static_key and static_value in their headers:
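For reference, a test line can also be sent with nc, and the logger sink prints each event with the injected header; a sketch (the exact log format varies with the Flume version):

# send a test line to the syslogtcp source (adjust the IP to your environment)
echo "hello static" | nc 192.168.27.174 50000

# the agent console should then log something along the lines of:
# Event: { headers:{static_key=static_value} body: 68 65 6C 6C 6F ... hello static }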

 

Body Text Serializer

Body Text is a sink serializer rather than an interceptor: it writes the body of each event to the output stream without any transformation or modification, and the event headers are simply ignored.

# Create the configuration file body_case15.conf

 

# name the source
a1.sources = r1

# name the sink
a1.sinks = k1

# name the channel
a1.channels = c1

 

# source configuration

a1.sources.r1.type = http

a1.sources.r1.port = 50000

# set the host to your own IP address
a1.sources.r1.host = 192.168.27.174

a1.sources.r1.channels = c1

# sink configuration

 

a1.sinks.k1.type = file_roll

a1.sinks.k1.channel = c1

a1.sinks.k1.sink.directory = /tmp/logs

a1.sinks.k1.sink.serializer = text

a1.sinks.k1.sink.serializer.appendNewline = false

 

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

Start Flume by running: flume-ng agent -c conf -f conf/body_case15.conf -n a1 -Dflume.root.logger=INFO,console

 

Once it has started, open another terminal and send data to the listening port (replace the IP address with your own):

telnet 192.168.27.174 50000
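Since the source here is an HTTP source, the request body can also be posted with curl; a sketch that produces the output shown below (adjust the IP):

# post one event; the text serializer writes its body verbatim under /tmp/logs
curl -X POST http://192.168.27.174:50000 -H 'Content-Type: application/json' -d '[{"headers": {}, "body": "Hellolooklook"}]'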

Then switch to the /tmp/logs directory to view what the serializer wrote:

hadoop@master:/tmp/logs$ ll

total 32

drwxrwxrwx  2 root    root    20480 Jul 31 20:14 ./

drwxrwxrwt 19 root    root     4096 Jul 31 20:14 ../

-rw-rw-r--  1 hadoop hadoop    14 Jul 31 19:48 1533082885173-294

 

hadoop@master:/tmp/logs$ cat 1533082885173-294

Hellolooklook

The logged result here is exactly the content given in the body of the request we sent.

 

Regex Filtering Interceptor

The Regex Filtering Interceptor filters events by matching the event body against a configured regular expression. It can be used either to keep only matching events or to exclude them, and it is commonly used for data cleansing.

# Create the configuration file regex_filter_case19.conf

 

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = syslogtcp

a1.sources.r1.port = 50000

a1.sources.r1.host = 192.168.233.128

a1.sources.r1.channels = c1

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^[0-9]*$
a1.sources.r1.interceptors.i1.excludeEvents = true

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

 

The regular expression defined here is ^[0-9]*$; it matches events whose body consists entirely of digits, and because excludeEvents = true, all matching events are filtered out.
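The matching behavior can be sanity-checked locally with grep; a quick sketch:

# a digits-only body matches the pattern and is therefore excluded
echo "1222" | grep -E '^[0-9]*$'    # matches -> event dropped
echo "a222" | grep -E '^[0-9]*$'    # no match -> event passes through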

 

Now start the Flume agent.

Once it has started, open another terminal and send the data a, 1222, a222 to the listening port:

telnet 192.168.233.128 50000

Check the output in the terminal running the Flume agent.

Of the three lines "a", "1222", and "a222" sent to port 50000, after the interceptor's filtering only "a" and "a222" appear in the output; "1222" matched the digits-only pattern and was dropped as unwanted data.

By rewriting the regular expression to suit our needs, the Regex Filtering Interceptor can be made to match any content we want.

Regex Extractor Interceptor

What it does:

1. Matches the content of the event body against the regular expression given in the configuration.
2. If the body matches, the capture groups are paired with the keys given in the configuration and added to the event headers as key:value entries.
3. The content of the event body itself is left unchanged.

 

# Configuration file: regex_exter_case.conf

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# 02 source,http,jsonhandler

a1.sources.r1.type = http

a1.sources.r1.bind = 127.0.0.1

a1.sources.r1.port = 6666

a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler

# 03 regex extractor interceptor: match the event body to extract a word and a number
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
# the regex matches and captures two groups; note that the whitespace \s must be escaped
a1.sources.r1.interceptors.i1.regex = (^[a-zA-Z]*)\\s([0-9]*$)
# specify a key for each of the two captured groups

a1.sources.r1.interceptors.i1.serializers = s1 s2

# key name

a1.sources.r1.interceptors.i1.serializers.s1.name = word

a1.sources.r1.interceptors.i1.serializers.s2.name = digital

 

# logger sink

a1.sinks.k1.type = logger

 

# channel,memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Now start the Flume agent.

Once it has started, open another terminal and send data to the listening port:

telnet 127.0.0.1 6666

Check the output in the terminal running the Flume agent.
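Since this source is an HTTP source with the JSONHandler, the test data can be posted as JSON; a sketch (a body such as "hello 123" matches the regex above, so the headers word=hello and digital=123 should be added):

# post one event whose body matches (^[a-zA-Z]*)\s([0-9]*$)
curl -X POST http://127.0.0.1:6666 -H 'Content-Type: application/json' -d '[{"headers": {}, "body": "hello 123"}]'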

 
