The three advanced Flume components: Flume Interceptors, Flume Channel Selectors, and Flume Sink Processors

The three advanced Flume components

Flume Interceptors: similar in concept to the interceptors in Spring

Purpose: an interceptor filters or decorates every event that passes through it

Timestamp Interceptor

Adds a key-value pair to the header of every event:

       key: timestamp

       value: the time (in milliseconds) at which the event was processed by the interceptor


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# define sources
# for a source class you compiled yourself, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=timestamp






#define channel  
a1.channels.c1.type=memory

# (checkpoints, which record transfer state such as how many events have been
# taken, apply to the file channel; the memory channel only needs sizing)

# maximum number of events the channel can hold
a1.channels.c1.capacity=1000
# maximum number of events handled in one transaction
a1.channels.c1.transactionCapacity=100


# define sinks
a1.sinks.k1.type=logger

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1




 

Host Interceptor

       Adds a key-value pair to the header of every event:

       key: host

       value: the hostname of the machine on which the event was processed


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# define sources
# for a source class you compiled yourself, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=host
a1.sources.s1.interceptors.i1.hostHeader=hostname




#define channel  
a1.channels.c1.type=memory

# maximum number of events the channel can hold
a1.channels.c1.capacity=1000
# maximum number of events handled in one transaction
a1.channels.c1.transactionCapacity=100


# define sinks
a1.sinks.k1.type=logger

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1




 

 

Static Interceptor: adds a fixed, statically configured key-value pair to the header of every event


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# define sources
# for a source class you compiled yourself, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=static
a1.sources.s1.interceptors.i1.key=tttt
a1.sources.s1.interceptors.i1.value=sgl








#define channel  
a1.channels.c1.type=memory

# maximum number of events the channel can hold
a1.channels.c1.capacity=1000
# maximum number of events handled in one transaction
a1.channels.c1.transactionCapacity=100


# define sinks
a1.sinks.k1.type=logger

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1




 

Regex Filtering Interceptor: filters events against a user-supplied regular expression

       an event is kept only if its body matches the regex

 

Exercise: filter the data below using the timestamp interceptor and the regex filtering interceptor

1,2,3,4

{4,5,6,7}

4,7,8,8

{5,3,2,2}

Collect only the lines wrapped in curly braces.

 


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 
a1.sinks = k1 

# define sources
# for a source class you compiled yourself, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.interceptors=i1 i2
a1.sources.s1.interceptors.i1.type=timestamp
a1.sources.s1.interceptors.i2.type=regex_filter
a1.sources.s1.interceptors.i2.regex=\\{.*\\}




#define channel  
a1.channels.c1.type=memory

# maximum number of events the channel can hold
a1.channels.c1.capacity=1000
# maximum number of events handled in one transaction
a1.channels.c1.transactionCapacity=100


# define sinks
a1.sinks.k1.type=logger

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1




Flume Channel Selectors

selector.type determines which selector is used and therefore how events are routed

Replicating Channel Selector (the default)

       the source sends every event to every configured channel

       i.e. the source sends out multiple copies of the data


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 c2 c3 
a1.sinks = k1 k2 k3

# define sources
# for a source class you compiled yourself, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2 c3
# writes to an optional channel are best-effort: a failure to write to c3 is ignored
a1.sources.s1.selector.optional = c3




#define channel  
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory

# maximum number of events the channel can hold (c2 and c3 keep the defaults)
a1.channels.c1.capacity=1000
# maximum number of events handled in one transaction
a1.channels.c1.transactionCapacity=100


#define sinks
a1.sinks.k1.type = logger
a1.sinks.k2.type = logger
a1.sinks.k3.type = logger


# bind each sink to its channel

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3




Multiplexing Channel Selector

       the source routes each event to a channel selectively, based on the value of a header


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'

a1.sources = s1
a1.channels = c1 c2 c3 
a1.sinks = k1 k2 k3

# define sources
# for a source class you compiled yourself, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c

a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.selector.header = state
a1.sources.s1.selector.mapping.CZ = c1
a1.sources.s1.selector.mapping.US = c2
a1.sources.s1.selector.default = c3




#define channel  
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory

# maximum number of events the channel can hold (c2 and c3 keep the defaults)
a1.channels.c1.capacity=1000
# maximum number of events handled in one transaction
a1.channels.c1.transactionCapacity=100


#define sinks

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=/flume/selector1/hhh1
a1.sinks.k1.hdfs.useLocalTimeStamp=true

# set the file type and the write format
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text

a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path=/flume/selector1/hhh2
a1.sinks.k2.hdfs.useLocalTimeStamp=true

# set the file type and the write format
a1.sinks.k2.hdfs.fileType=DataStream
a1.sinks.k2.hdfs.writeFormat=Text


a1.sinks.k3.type = hdfs
a1.sinks.k3.hdfs.path=/flume/selector1/hhh3
a1.sinks.k3.hdfs.useLocalTimeStamp=true

# set the file type and the write format
a1.sinks.k3.hdfs.fileType=DataStream
a1.sinks.k3.hdfs.writeFormat=Text


# bind each sink to its channel

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
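
The multiplexing selector above routes on the value of the `state` header, but nothing in this config sets that header. As a sketch of one way to supply it (the value CZ is assumed purely for illustration), a static interceptor on the source could stamp it:

# stamp every event from this source with the header state=CZ,
# so the multiplexing selector routes it to c1
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = static
a1.sources.s1.interceptors.i1.key = state
a1.sources.s1.interceptors.i1.value = CZ

With this in place every event carries state=CZ and lands in c1; events whose state matches no mapping fall through to the default channel c3.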




Flume Sink Processors

Failover Sink Processor

processor.type = failover

Several sinks are started, but only one is working at a time; only when the active sink dies can another one take over.

With multiple sinks, which one works first is decided by priority: the sink with the highest priority works first.

For failover the sinks are usually of different types (e.g. an HDFS sink paired with a file sink):

       for example, when writing to HDFS and HDFS goes down, the data is not lost; it is written to a local file instead
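
As a sketch of the failover setup described above (the group name g1, the sinks k1/k2, and the priority values are assumed for illustration), a sink group could be configured like this:

# put the two sinks into one group managed by the failover processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# the sink with the highest priority is active; the other is the standby
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# upper bound (in ms) on how long a failed sink is penalized before retry
a1.sinkgroups.g1.processor.maxpenalty = 10000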

Load balancing Sink Processor

processor.type = load_balance

processor.selector = round_robin (take turns) | random
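
A minimal load-balancing sketch (again assuming a group g1 over sinks k1 and k2):

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# temporarily blacklist a sink that fails instead of retrying it immediately
a1.sinkgroups.g1.processor.backoff = true
# hand events to the sinks in turn; 'random' is the other option
a1.sinkgroups.g1.processor.selector = round_robin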

 

 

A sink group provides either load balancing or failover, not both at the same time; in practice load balancing is usually chosen.

