flume 拦截器简单实现

1.创建maven工程  自定义拦截器类

        1.导入依赖

        


        
            org.apache.flume
            flume-ng-core
            1.9.0
        

    

        

package myInterceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

/**
 * @author jsh
 * @Description:
 * @create 2021-11-14-21:40
 */
public class CustomInterceptor implements Interceptor {
    @Override
    public void initialize() {

    }
    // 单个event拦截规则
    @Override
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        if (body[0] < 'z' && body[0] > 'a'){
            event.getHeaders().put("type","letter");
        }else if (body[0] > '0' && body[0] < '9'){
            event.getHeaders().put("type","number");
        }
        return event;
    }
    // 批量处理event 其他两个方法可以不用实现
    @Override
    public List intercept(List list) {
        for (Event event : list) {
            intercept(event);
        }
        return list;
    }
    public static class Builder implements Interceptor.Builder{

        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }

    @Override
    public void close() {

    }
}

2.将该类打成jar包

flume 拦截器简单实现_第1张图片

3.将jar包放在flume的lib目录下

4.书写自定义配置文件

H.案例需求:多路
  hadoop101 的flume1 监控 8888端口的数据 并将已字母开头的发给Hadoop104,数字开头的发送给Hadoop103
  flume2,flume3打印到控制台
        flume   sources   channels  sinks
  分析: flume1  netcat    memory    avro
        flume2   avro     memory    logger
        flume3   avro     memory    logger
  flume3.conf        
  #name
  a3.sources = r1
  a3.channels = c1
  a3.sinks = k1

  #sources
  a3.sources.r1.type = avro
  a3.sources.r1.bind = hadoop104
  a3.sources.r1.port = 8888

  #channels
  a3.channels.c1.type =memory
  a3.channels.c1.capacity = 10000
  a3.channels.c1.transactionCapacity = 100

  #sinks
  a3.sinks.k1.type = logger

  #bind
  a3.sources.r1.channels = c1
  a3.sinks.k1.channel = c1

  flume2.conf
  #name
  a2.sources = r1
  a2.channels = c1
  a2.sinks = k1

  #sources
  a2.sources.r1.type = avro
  a2.sources.r1.bind = hadoop103
  a2.sources.r1.port = 7777

  #channels
  a2.channels.c1.type =memory
  a2.channels.c1.capacity = 10000
  a2.channels.c1.transactionCapacity = 100

  #sinks
  a2.sinks.k1.type = logger

  #bind
  a2.sources.r1.channels = c1
  a2.sinks.k1.channel = c1

  flume1.conf
  #name
  a1.sources = r1
  a1.channels = c1 c2
  a1.sinks = k1 k2

  #sources
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r1.port = 6666

  # Interceptor
  a1.sources.r1.interceptors = i1
  # myInterceptor.CustomInterceptor 为类的引用路径 Builder 自定义拦截类的内部类
  a1.sources.r1.interceptors.i1.type = myInterceptor.CustomInterceptor$Builder

  #channel selector
  a1.sources.r1.selector.type = multiplexing
  a1.sources.r1.selector.header = type
  a1.sources.r1.selector.mapping.letter = c1
  a1.sources.r1.selector.mapping.number = c2

  #channels
  a1.channels.c1.type = memory
  a1.channels.c1.capacity = 10000
  a1.channels.c1.transactionCapacity = 100 

  a1.channels.c2.type = memory
  a1.channels.c2.capacity = 10000
  a1.channels.c2.transactionCapacity = 100

  #sink
  a1.sinks.k1.type = avro
  a1.sinks.k1.hostname = hadoop104
  a1.sinks.k1.port = 8888

  a1.sinks.k2.type = avro
  a1.sinks.k2.hostname = hadoop103
  a1.sinks.k2.port = 7777

  #bind
  a1.sources.r1.channels = c1 c2
  a1.sinks.k1.channel = c1
  a1.sinks.k2.channel = c2      

  启动3:
  flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume3.conf -n a3 -Dflume.root.logger=INFO,console
  启动2:
  flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume2.conf -n a2 -Dflume.root.logger=INFO,console
  启动1:
  flume-ng agent -c $FLUME_HOME/conf -f $FLUME_HOME/jobs/multiplexing/flume1.conf -n a1 -Dflume.root.logger=INFO,console

你可能感兴趣的:(大数据,flume,maven,interceptor)