Flink Bulk Writes to HBase: Efficiency Issues

HBase Bulk Writes: Sizing a Single Commit

Experience from writing to HBase in production:
If the bottleneck is the per-connection overhead of talking to HBase, keyBy(field).countWindow(50000).apply(new …) beats any elaborate custom time window. A hard-won lesson.
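The idea behind countWindow(N) batching can be sketched in plain Java, without the Flink runtime: buffer records and hand them off in one bulk call every N elements. The class and method names here are illustrative, not part of any Flink or HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of count-based batching, analogous to countWindow(N):
// accumulate elements and flush one batch per `batchSize` records,
// so the expensive sink call happens once per batch instead of once per record.
class CountBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush;
    private final List<T> buffer = new ArrayList<>();

    CountBatcher(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush.accept(new ArrayList<>(buffer)); // one bulk write per batch
            buffer.clear();
        }
    }
}
```

In the real job the flush callback would be the bulk HBase write; here it can be any consumer of the batch.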

DataStream<List<Put>> sourceList = dataStream.timeWindowAll(Time.seconds(2)).apply(new OrderToHBaseFunctionOutPutFormat()).name("format");
        sourceList.writeUsingOutputFormat(new BulkPutHBaseOutputFormat() {
            @Override
            public String getTableName() {
                return "ORDER_DOC_SN_TEST";
            }
            @Override
            public String getColumnFamily() {
                return "CFD";
            }
            @Override
            public List<Put> writeList(List<Put> list) {
                return list;
            }
        }).name("write");

When the consumed data volume is large, writes to HBase can block, e.g. "waiting for 2001 actions to finish on table: ORDER_DOC_SN_TEST", which in turn degrades the throughput of the real-time computation.
At the code level this is governed by the HBase setting hbase.client.write.buffer. The default is 2097152 bytes (2 MB); raising it increases how much data is submitted to HBase per flush. A value between 2 MB and 6 MB is generally recommended.


The default, from hbase-default.xml:

    hbase.client.write.buffer = 2097152
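As a rough sizing aid, the number of mutations the client buffers before an auto-flush is approximately the buffer size divided by the average mutation size. This is back-of-the-envelope arithmetic, not an HBase API; the 1 KB average below is an assumed example value.

```java
// Rough arithmetic: how many mutations fit in the client write buffer
// before an auto-flush, given an (assumed) average mutation size.
class WriteBufferMath {
    static long mutationsPerFlush(long writeBufferBytes, long avgMutationBytes) {
        return writeBufferBytes / avgMutationBytes;
    }
}
```

With the 2 MB default and ~1 KB mutations, the client flushes roughly every 2048 mutations; a 6 MB buffer triples that.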

You can implement a custom Trigger that fires the downstream computation either when the window reaches a given element count or when the window time elapses.
The Trigger interface has five methods:
1. onElement(): called for every element added to a window
2. onEventTime(): called when a registered event-time timer fires
3. onProcessingTime(): called when a registered processing-time timer fires
4. onMerge(): relevant for stateful triggers; when two windows merge (e.g. session windows), it merges the state of their triggers
5. clear(): performs any cleanup the window needs

The actions a TriggerResult can return:
1. CONTINUE: do nothing
2. FIRE: trigger the window computation
3. PURGE: clear the window's elements
4. FIRE_AND_PURGE: trigger the computation and clear the elements
Requirements:
1. Within a window, write to HBase once every 500 records
2. If the window holds fewer than 500 records, write to HBase when the window time fires
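Before looking at the Flink job, the count-or-time decision itself can be sketched in plain Java: fire when the counter hits the threshold, and let the timer flush any partial batch. The class and method names are illustrative stand-ins for the trigger callbacks.

```java
// Plain-Java sketch of the "fire on N records OR on window deadline" decision
// that the Flink trigger below implements. Returns true when a batch should fire.
class CountOrTimeDecider {
    private final int threshold;
    private int count = 0;

    CountOrTimeDecider(int threshold) {
        this.threshold = threshold;
    }

    /** Called per element; fires (and resets) once the threshold is reached. */
    boolean onElement() {
        count++;
        if (count >= threshold) {
            count = 0;
            return true;
        }
        return false;
    }

    /** Called when the window timer fires; flushes any partial batch. */
    boolean onTimer() {
        if (count > 0) {
            count = 0;
            return true;
        }
        return false;
    }
}
```

The Flink trigger adds timer registration and window purging on top of exactly this logic.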

public class SinkOrderToHBaseTestJob {


    public static void main(String[] args) throws Exception{

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(1000*60*10, CheckpointingMode.EXACTLY_ONCE);
        
        Properties prop = new Properties();
        prop.load(SinkOrderToHBaseTestJob.class.getClassLoader().getResourceAsStream("config.properties"));

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", prop.getProperty("kafka.bootstrap.servers"));
        properties.setProperty("group.id", "SinkOrderToHBaseTestJob");

        // use ingestion time as the time characteristic (watermarks are generated automatically)
        env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
        // connect Flink to Kafka
        FlinkKafkaConsumer consumer = new FlinkKafkaConsumer("order_correct", new CustomOrderSourceYundaDeSerializationSchema(), properties);
        consumer.setStartFromEarliest();

        DataStream dataStream = env.addSource(consumer);

        // fire once for every 500 elements within the window
        DataStream<List<Put>> sourceList = dataStream.timeWindowAll(Time.seconds(5)).trigger(OrderTrigger.create(500)).apply(new OrderToHBaseFunctionOutPutFormat()).name("format");
        sourceList.writeUsingOutputFormat(new BulkPutHBaseOutputFormat() {

            @Override
            public String getTableName() {
                return "ORDER_DOC_SN_TEST";
            }

            @Override
            public String getColumnFamily() {
                return "CFD";
            }

            @Override
            public List<Put> writeList(List<Put> list) {
                // System.out.println("number of records written to HBase: " + list.size());
                return list;
            }
        }).name("write");

        env.execute("SinkOrderToHBaseTestJob");
    }
}
class OrderTrigger extends Trigger<Object, TimeWindow> {

    private static final long serialVersionUID = 1L;

    private OrderTrigger() {}

    // Caveat: these fields are static, so the count and threshold are shared by
    // every instance of this trigger. That is tolerable here only because
    // timeWindowAll runs with parallelism 1; per-instance or partitioned state
    // would be the safer choice in general.
    private static int flag = 0;

    private static int threshold = 0;

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
        // (re-)register an end-of-window timer so a partial batch still flushes
        ctx.registerEventTimeTimer(window.maxTimestamp());
        flag++;
        if (flag >= threshold) {
            flag = 0;
            // a full batch is about to fire, so the pending end-of-window timer
            // is no longer needed (it is re-registered by the next element)
            ctx.deleteEventTimeTimer(window.maxTimestamp());
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        // no processing-time timer is registered above; implemented for completeness
        if (flag > 0) {
            System.out.println("window time reached, firing with buffered elements: " + flag);
            flag = 0;
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        if (time >= window.maxTimestamp() && flag > 0) {
            System.out.println("window time reached with pending data, firing!");
            flag = 0;
            return TriggerResult.FIRE_AND_PURGE;
        } else if (time >= window.maxTimestamp() && flag == 0) {
            // purge the window without firing
            return TriggerResult.PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteProcessingTimeTimer(window.maxTimestamp());
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }

    public static OrderTrigger create(int value) {
        threshold = value;
        return new OrderTrigger();
    }
}
