Flume: Custom Sink


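1) Introduction

A Sink continuously polls the Channel for events, removes them in batches, and either writes each batch to a storage or indexing system or sends it on to another Flume Agent.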

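Sinks are fully transactional. Before removing a batch of events from the Channel, a Sink starts a transaction with that Channel. Once the batch has been written successfully to the storage system or to the next Flume Agent, the Sink commits the transaction through the Channel. Only after the commit does the Channel delete those events from its internal buffer.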
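The take, commit, and rollback behaviour described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: ToyChannel and its methods are hypothetical and are not Flume's Channel or Transaction classes, and events are plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of channel transactions; NOT Flume's real Channel/Transaction API. */
public class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    // Events that were taken but whose transaction is not yet committed
    private final List<String> inFlight = new ArrayList<>();

    public void put(String event) {
        queue.addLast(event);
    }

    public int size() {
        return queue.size();
    }

    /** Takes up to batchSize events; they stay "in flight" until commit or rollback. */
    public List<String> takeBatch(int batchSize) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !queue.isEmpty()) {
            batch.add(queue.pollFirst());
        }
        inFlight.addAll(batch);
        return batch;
    }

    /** Commit: the taken events are dropped for good. */
    public void commit() {
        inFlight.clear();
    }

    /** Rollback: the taken events go back to the head of the queue in their original order. */
    public void rollback() {
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            queue.addFirst(inFlight.get(i));
        }
        inFlight.clear();
    }
}
```

If writing a batch to the destination fails, calling rollback() makes the same events available to the next take, which is why a failed delivery does not lose data in this model.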

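Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom implementations. Flume already ships with many Sink types, but they do not always meet the needs of a real project. In that case we have to write a custom Sink that fits the requirement.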

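The official documentation also describes the interface for custom sinks:

https://flume.apache.org/FlumeDeveloperGuide.html#sink

According to that guide, a custom MySink needs to extend the AbstractSink class and implement the Configurable interface.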

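Methods to implement:

configure(Context context) // initialize from the Context (read values from the agent configuration file)

process() // take data (events) from the Channel; Flume calls this method in a loop

Typical use case: reading data from a Channel and writing it to MySQL or to a file system.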

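2) Requirements

Use Flume to receive data, add a prefix and a suffix to each record at the Sink, and print the result to the console. The prefix and suffix should be configurable in the Flume job configuration file.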

Flow analysis:

[Figure 1: flow diagram for the custom Sink]

3) Implementation

package com.xiaoxiao.flume.sink;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.TimeUnit;


/**
 * A custom Sink must extend Flume's AbstractSink class and implement the Configurable interface.
 * @author xiaohu
 * @create 2020-08-25 22:36
 */
public class MySink extends AbstractSink implements Configurable {

    // Logger used to print events to the console
    private static final Logger logger = LoggerFactory.getLogger(MySink.class);

    /**
     * Core processing method of the Sink.
     *
     * Flume calls process() repeatedly in a loop.
     * @return READY if an event was delivered, BACKOFF otherwise
     * @throws EventDeliveryException
     */
    @Override
    public Status process() throws EventDeliveryException {
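        Status status = null;
        // Get the Channel this Sink is bound to
        Channel channel = getChannel();
        // Get a transaction from the Channel
        Transaction transaction = channel.getTransaction();
        // Begin the transaction
        transaction.begin();
        try {
            // Take one event; take() returns null when the Channel is empty
            Event event = channel.take();
            if (event != null) {
                // Handle the event
                processEvent(event);
                status = Status.READY;
            } else {
                // Nothing to deliver: let the sink runner back off before the next call
                // instead of sleeping inside an open transaction
                status = Status.BACKOFF;
            }
            // Commit the transaction
            transaction.commit();
        } catch (Throwable t) {
            // Roll back so the event stays in the Channel
            logger.error("Failed to process event, rolling back", t);
            transaction.rollback();
            status = Status.BACKOFF;
            // Errors (for example OutOfMemoryError) should not be swallowed
            if (t instanceof Error) {
                throw (Error) t;
            }
        } finally {
            transaction.close();
        }
        return status;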
    }

    /**
     * Handles a single event.
     *
     * Requirement: print the event body to the console through the Logger.
     * @param event the event taken from the Channel
     */
    public void processEvent(Event event) {
        logger.info(new String(event.getBody()));
    }

    @Override
    public void configure(Context context) {

    }
}
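
The listing above leaves configure() empty, so the prefix and suffix from the requirement are not yet applied in the Sink. Below is a minimal sketch of that step, kept free of Flume classes so it can run on its own. The class name EventDecorator and its decorate method are hypothetical, and the sketch assumes event bodies are UTF-8 text. In a real sink the two values would typically be read in configure(Context) with context.getString(key, defaultValue), using property names of your own choosing (for example prefix and suffix under a1.sinks.k1, which the configuration file in this article does not define).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: wraps an event body with a configurable prefix and suffix. */
public class EventDecorator {
    private final String prefix;
    private final String suffix;

    public EventDecorator(String prefix, String suffix) {
        // Treat a missing setting as an empty string so the body passes through unchanged
        this.prefix = prefix == null ? "" : prefix;
        this.suffix = suffix == null ? "" : suffix;
    }

    /** Assumes the body is UTF-8 text; returns prefix + body + suffix. */
    public String decorate(byte[] body) {
        return prefix + new String(body, StandardCharsets.UTF_8) + suffix;
    }
}
```

With such a helper, processEvent could log decorator.decorate(event.getBody()) instead of the raw body.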

4) Testing

(1) Packaging

Package the finished code into a jar and place it in the lib directory of the Flume installation (/opt/module/flume).

(2) Configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.xiaoxiao.flume.source.MySource
a1.sources.r1.prefix = LOG

# Describe the sink
a1.sinks.k1.type = com.xiaoxiao.flume.sink.MySink

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(3) Start the job

[xiao@hadoop102 flume-1.9.0]$ bin/flume-ng agent -c conf -f datas/mysource-flume-mysink.conf -n a1 -Dflume.root.logger=INFO,console

(4) Results

2020-08-27 18:57:52,395 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.xiaoxiao.flume.sink.MySink.processEvent(MySink.java:75)] LOG--d1662360-6cd7-4f14-9e42-a9ef7db8e426
2020-08-27 18:57:53,397 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.xiaoxiao.flume.sink.MySink.processEvent(MySink.java:75)] LOG--49665808-3047-4f0f-855a-66442e74cef7
2020-08-27 18:57:54,397 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.xiaoxiao.flume.sink.MySink.processEvent(MySink.java:75)] LOG--0d54bb41-ab17-499c-96de-5a74d660bd73
2020-08-27 18:57:55,398 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.xiaoxiao.flume.sink.MySink.processEvent(MySink.java:75)] LOG--58570043-2e20-4717-b9db-18d341edc970
2020-08-27 18:57:56,399 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.xiaoxiao.flume.sink.MySink.processEvent(MySink.java:75)] LOG--c0d5ce4e-50e1-41cb-a690-09bba38711d9
2020-08-27 18:57:57,404 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.xiaoxiao.flume.sink.MySink.processEvent(MySink.java:75)] LOG--cd20e263-282a-4d4b-9212-8e98efda0246
