Flink SQL UDF for Computing TP90

Table of Contents

    • Requirement:
    • Approach:
    • Optimization:

Requirement:

We want to compute the TP90 of a site's response times within each one-minute window. The standard approach is to sort all response times in the minute in ascending order, take the total number of records count, and pick the value at the 90% rank, i.e. position ceil(count * 0.9); that value is the TP90.
The same logic computes TP50, TP90, TP99, and TP999 (a short worked example follows).
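
As a quick worked example of this definition, here is a standalone sketch with made-up numbers (the class name Tp90Definition is illustrative, not from the original code):

import java.util.Arrays;

public class Tp90Definition {
    public static void main(String[] args) {
        int[] rt = {120, 80, 300, 50, 200, 90, 150, 110, 70, 260}; // 10 samples
        Arrays.sort(rt); // 50, 70, 80, 90, 110, 120, 150, 200, 260, 300
        int pos = (int) Math.ceil(rt.length * 0.9); // ceil(10 * 0.9) = 9 (1-based rank)
        System.out.println("TP90 = " + rt[pos - 1]); // 9th smallest value -> 260
    }
}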

Approach:

Create a List in the UDF's accumulator and append every incoming record to it; when the window fires, sort the list inside getValue and return the value at the TP position (a sketch of this naive version follows below).
Drawbacks: with a long window or a large data volume, the list holds every single record, so checkpoints become large and slow, and there is a risk of an OOM failing the job.
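
For contrast, a minimal sketch of this naive version (illustrative only; ListAccu and NaiveTpFunc are not names from the original code, and state serialization details are glossed over):

import org.apache.flink.table.functions.AggregateFunction;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NaiveTp {

    // Accumulator: keeps every single record of the window in memory
    public static class ListAccu {
        public Integer tp;
        public List<Integer> values = new ArrayList<>();
    }

    public static class NaiveTpFunc extends AggregateFunction<Integer, ListAccu> {
        @Override
        public ListAccu createAccumulator() {
            return new ListAccu();
        }

        public void accumulate(ListAccu acc, Integer value, Integer tp) {
            acc.tp = tp;
            acc.values.add(value); // one list entry per record -> state grows with the window
        }

        @Override
        public Integer getValue(ListAccu acc) {
            if (acc.values.isEmpty()) {
                return null;
            }
            Collections.sort(acc.values); // sort the entire window's data at emit time
            int pos = (int) Math.ceil(acc.values.size() * (acc.tp / 100D)); // 1-based rank
            return acc.values.get(pos - 1);
        }
    }
}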

Optimization:

The sorted List can be replaced with a TreeMap: the key stores the metric (here the response time) and the value stores its occurrence count, so identical response times are aggregated into a single entry. In getValue, summing the values gives the total count; compute the target rank pos = ceil(count * tp / 100); then walk the keys in ascending order while accumulating the counts, and the first key at which the running total reaches pos is the TP value (see the standalone walkthrough after this paragraph; the full Flink job follows under Code).
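
To see the optimized lookup in isolation, here is a standalone sketch with a made-up histogram (not part of the Flink job):

import java.util.Map;
import java.util.TreeMap;

public class TpLookupDemo {
    public static void main(String[] args) {
        // response time -> occurrence count; TreeMap iterates keys in ascending order
        TreeMap<Integer, Integer> hist = new TreeMap<>(Map.of(100, 3, 200, 5, 300, 2));
        int count = hist.values().stream().mapToInt(Integer::intValue).sum(); // 10 records
        int pos = (int) Math.ceil(count * (90 / 100D)); // TP90 -> rank 9
        int running = 0;
        for (Map.Entry<Integer, Integer> e : hist.entrySet()) {
            running += e.getValue(); // running totals: 3, 8, 10
            if (running >= pos) {
                System.out.println("TP90 = " + e.getKey()); // prints: TP90 = 300
                break;
            }
        }
    }
}

Expanded, the histogram is the sorted list [100, 100, 100, 200, 200, 200, 200, 200, 300, 300]; its 9th smallest value is 300, so the histogram walk agrees with the sort-based definition.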

  • Code:
package flink_sql.udf;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.functions.AggregateFunction;

import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * @author lzx
 * @date 2023/6/16 15:23
 * @description: TP90 test case, based on Flink 1.14
 */
public class TP90 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInteger("rest.port", 8085); // port for the local Flink web UI

        // A StreamTableEnvironment created from an existing StreamExecutionEnvironment can convert to and from the DataStream API
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Use the China Standard Time zone
        tEnv.getConfig().setLocalTimeZone(ZoneId.of("Asia/Shanghai"));

        // Create the input table (datagen source)
        String sql = " CREATE TABLE source ( " +
                "  response_time INT, " +
                "  ts AS localtimestamp, " +
                "  WATERMARK FOR ts AS ts," +
                "  proctime as proctime() " +
                " ) WITH ( " +
                "  'connector' = 'datagen', " +
                "  'rows-per-second'='1000', " +
                "  'fields.response_time.min'='1', " +
                "  'fields.response_time.max'='1000' " +
                " ) ";

        tEnv.executeSql(sql);

        // tEnv.executeSql("select * from source").print();

        // Register the aggregate function under the name "mytp"
        tEnv.createTemporaryFunction("mytp", CustomTpFunc.class);

        String selectSql = "   select    " +
                "           TUMBLE_START(proctime,INTERVAL '1' MINUTE)  as starttime, " +
                "           mytp(response_time,90) as tp90    " +
                "   from source   " +
                "   group by TUMBLE(proctime,INTERVAL '1' MINUTE) ";

        tEnv.executeSql(selectSql).print();
        // env.execute() is not needed: executeSql(...).print() already submits the job and blocks

    }

    // Accumulator: the requested percentile plus a histogram (value -> occurrence count)
    public static class TpAccu {
        public Integer tp;
        public Map<Integer, Integer> map = new HashMap<>();
    }

    // Aggregate function: result type Integer (the TP value), accumulator type TpAccu
    public static class CustomTpFunc extends AggregateFunction<Integer, TpAccu> {

        @Override
        public TpAccu createAccumulator() {
            return new TpAccu();
        }

        @Override
        public Integer getValue(TpAccu tpAccu) {
            if (tpAccu.map.isEmpty()) {
                return null;
            } else {
                TreeMap<Integer, Integer> treeMap = new TreeMap<>(tpAccu.map); // sort keys ascending
                Integer sum = treeMap.values().stream().reduce(0, Integer::sum); // total record count
                int tp = tpAccu.tp;
                int responseTime = 0;
                int p = 0; // running count of records walked so far
                int pos = (int) Math.ceil(sum * (tp / 100D)); // 1-based target rank
                for (Map.Entry<Integer, Integer> entry : treeMap.entrySet()) {
                    p += entry.getValue();
                    if (p >= pos) {
                        responseTime = entry.getKey();
                        break;
                    }
                }
                return responseTime;
            }
        }

        public void accumulate(TpAccu acc, Integer iValue, Integer tp) {
            acc.tp = tp;
            // aggregate identical values: value -> occurrence count
            acc.map.merge(iValue, 1, Integer::sum);
        }
    }
}
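
Because the percentile is passed as an argument, one query can emit several TP values at once. An illustrative variant of selectSql, assuming it runs in the same main method after mytp is registered (note that an Integer percentile cannot express TP999; that would need a fractional parameter, e.g. a DOUBLE):

        String multiTpSql = "   select " +
                "   TUMBLE_START(proctime, INTERVAL '1' MINUTE) as starttime, " +
                "   mytp(response_time, 50) as tp50, " +
                "   mytp(response_time, 90) as tp90, " +
                "   mytp(response_time, 99) as tp99 " +
                "   from source " +
                "   group by TUMBLE(proctime, INTERVAL '1' MINUTE) ";
        tEnv.executeSql(multiTpSql).print();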
