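Purpose: use interceptors to filter or wrap each event
Timestamp interceptor: adds a key-value pair to the header of every event
key: timestamp
value: the time (in milliseconds) at which the event was wrapped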
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define sources
# For a custom-built source class, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=timestamp
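# define channel
a1.channels.c1.type=memory
# Note: checkpoints (which record transfer state, such as how many events have been taken) belong to the file channel; the memory channel has none
# capacity: maximum number of events held in the channel
a1.channels.c1.capacity=1000
# transactionCapacity: maximum number of events per transaction (the "bottleneck" width)
a1.channels.c1.transactionCapacity=100
# define sinks
a1.sinks.k1.type=logger
# bind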
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
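Host interceptor: adds a key-value pair to the header of every event
key: host by default; the configuration below sets hostHeader = hostname, so the key becomes hostname
value: the machine on which the event is wrapped (its IP address by default, its hostname when useIP = false)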
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define sources
# For a custom-built source class, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type = host
a1.sources.s1.interceptors.i1.hostHeader = hostname
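# define channel
a1.channels.c1.type=memory
# Note: checkpoints (which record transfer state, such as how many events have been taken) belong to the file channel; the memory channel has none
# capacity: maximum number of events held in the channel
a1.channels.c1.capacity=1000
# transactionCapacity: maximum number of events per transaction (the "bottleneck" width)
a1.channels.c1.transactionCapacity=100
# define sinks
a1.sinks.k1.type=logger
# bind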
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
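The host interceptor writes the IP address by default (useIP = true). If the hostname is wanted in the header instead, that property can be turned off. A minimal sketch, assuming the same agent a1, source s1 and interceptor i1 as in the configuration above; only the last property is new:

```properties
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type = host
a1.sources.s1.interceptors.i1.hostHeader = hostname
# useIP defaults to true (IP address); false writes the hostname instead
a1.sources.s1.interceptors.i1.useIP = false
```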
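Static interceptor: adds a fixed key-value pair to the header of every event
key: tttt
value: sgl
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'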
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define sources
# For a custom-built source class, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=static
a1.sources.s1.interceptors.i1.key=tttt
a1.sources.s1.interceptors.i1.value=sgl
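# define channel
a1.channels.c1.type=memory
# Note: checkpoints (which record transfer state, such as how many events have been taken) belong to the file channel; the memory channel has none
# capacity: maximum number of events held in the channel
a1.channels.c1.capacity=1000
# transactionCapacity: maximum number of events per transaction (the "bottleneck" width)
a1.channels.c1.transactionCapacity=100
# define sinks
a1.sinks.k1.type=logger
# bind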
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
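Regex filter interceptor: an event is kept only if it matches the regex
Exercise: use the timestamp interceptor and the regex filter interceptor to filter data
1,2,3,4
{4,5,6,7}
4,7,8,8
{5,3,2,2}
Collect only the lines wrapped in braces.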
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# define sources
# For a custom-built source class, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c
a1.sources.s1.interceptors=i1 i2
a1.sources.s1.interceptors.i1.type=timestamp
a1.sources.s1.interceptors.i2.type=regex_filter
a1.sources.s1.interceptors.i2.regex=\\{.*\\}
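# define channel
a1.channels.c1.type=memory
# Note: checkpoints (which record transfer state, such as how many events have been taken) belong to the file channel; the memory channel has none
# capacity: maximum number of events held in the channel
a1.channels.c1.capacity=1000
# transactionCapacity: maximum number of events per transaction (the "bottleneck" width)
a1.channels.c1.transactionCapacity=100
# define sinks
a1.sinks.k1.type=logger
# bind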
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
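The regex_filter interceptor can also work in the opposite direction. A sketch, assuming the same source s1 and interceptor i2 as above: with excludeEvents set to true, events that match the regex are dropped and the rest are kept, so here the lines without braces would be collected.

```properties
a1.sources.s1.interceptors.i2.type=regex_filter
a1.sources.s1.interceptors.i2.regex=\\{.*\\}
# excludeEvents defaults to false (keep matches); true drops matching events instead
a1.sources.s1.interceptors.i2.excludeEvents=true
```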
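Channel selector: the value of selector.type determines how the source distributes events
replicating: the source sends every event to every channel,
so each event is delivered as multiple copies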
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1 c2 c3
a1.sinks = k1 k2 k3
# define sources
# For a custom-built source class, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.optional = c3
#define channel
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory
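# Note: checkpoints (which record transfer state, such as how many events have been taken) belong to the file channel; the memory channel has none
# capacity: maximum number of events held in the channel
a1.channels.c1.capacity=1000
# transactionCapacity: maximum number of events per transaction (the "bottleneck" width)
a1.channels.c1.transactionCapacity=100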
#define sinks
a1.sinks.k1.type = logger
a1.sinks.k2.type = logger
a1.sinks.k3.type = logger
# bind
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
multiplexing: the source routes each event to specific channels according to the value of a header
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1 c2 c3
a1.sinks = k1 k2 k3
# define sources
# For a custom-built source class, put the fully qualified class name here
a1.sources.s1.type=exec
a1.sources.s1.command= tail -F /opt/datas/wordcount
a1.sources.s1.shell=/bin/sh -c
a1.sources.s1.channels = c1 c2 c3
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.selector.header = state
a1.sources.s1.selector.mapping.CZ = c1
a1.sources.s1.selector.mapping.US = c2
a1.sources.s1.selector.default = c3
#define channel
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.channels.c3.type=memory
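# Note: checkpoints (which record transfer state, such as how many events have been taken) belong to the file channel; the memory channel has none
# capacity: maximum number of events held in the channel
a1.channels.c1.capacity=1000
# transactionCapacity: maximum number of events per transaction (the "bottleneck" width)
a1.channels.c1.transactionCapacity=100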
#define sinks
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=/flume/selector1/hhh1
a1.sinks.k1.hdfs.useLocalTimeStamp=true
# set the file type and the write format
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path=/flume/selector1/hhh2
a1.sinks.k2.hdfs.useLocalTimeStamp=true
# set the file type and the write format
a1.sinks.k2.hdfs.fileType=DataStream
a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k3.type = hdfs
a1.sinks.k3.hdfs.path=/flume/selector1/hhh3
a1.sinks.k3.hdfs.useLocalTimeStamp=true
# set the file type and the write format
a1.sinks.k3.hdfs.fileType=DataStream
a1.sinks.k3.hdfs.writeFormat=Text
# bind
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
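The multiplexing selector above routes on the state header, but an exec source does not set that header by itself, so without it every event would fall through to the default channel c3. One way to supply the header, as a sketch, is a static interceptor on the same source; the interceptor name i1 and the value US are illustrative assumptions:

```properties
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = static
a1.sources.s1.interceptors.i1.key = state
# illustrative value; with the mapping above, US sends events to c2
a1.sources.s1.interceptors.i1.value = US
```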
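processor.type = failover
Several sinks are started, but only one works at a time; another sink can take over only after the active one fails.
Which sink works first is decided by priority: the sink with the highest priority value goes first.
In a failover setup the two sinks are usually of different types (for example an HDFS sink and a file sink).
For example, when writing to HDFS and HDFS goes down, no data is lost because it is written to a file instead.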
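A failover processor is configured on a sink group. A minimal sketch, assuming two sinks k1 and k2 are already defined on agent a1; the group name g1 and the priority values are illustrative:

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# the sink with the larger priority value is tried first
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# upper limit of the backoff period for a failed sink, in milliseconds
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

Both sinks would normally be bound to the same channel, so that k2 can drain it when k1 is unavailable.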
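processor.type = load_balance
processor.selector = round_robin | random
A sink group has a single processor type, so load balancing and failover cannot be combined in one group; load balancing is the usual choice.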
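A sketch of a load-balancing sink group, assuming two sinks k1 and k2 are already defined on agent a1 and bound to the same channel; the group name g1 is illustrative:

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# round_robin (the default) or random
a1.sinkgroups.g1.processor.selector = round_robin
# temporarily skip sinks that have failed
a1.sinkgroups.g1.processor.backoff = true
```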