Getting Started with Flink 1.10: Implementing WordCount with a Custom Table API UDF

WeChat public account: 大数据开发运维架构



I. Overview

    This article is an introductory example of Flink's Table API & SQL. We implement a WordCount by writing a custom UDF that extends TableFunction. The implementation is deliberately minimal, just enough to give you a first impression of how the Table API & SQL work.

II. Code Walkthrough

1. Maven dependencies (only the newly added ones are shown):


<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.11</artifactId>
  <version>1.10.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
  <version>1.10.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-common</artifactId>
  <version>1.10.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner_2.11</artifactId>
  <version>1.10.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner-blink_2.11</artifactId>
  <version>1.10.0</version>
</dependency>
2. The custom UDF:

package com.hadoop.ljs.flink110.tableApi;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;
/**
 * @author: Created By lujisen
 * @company ChinaUnicom Software JiNan
 * @date: 2020-05-06 16:10
 * @version: v1.0
 * @description: com.hadoop.ljs.flink110.tableApi
 */
public class UDFWordCount extends TableFunction<Row> {
    /* Declare the result type: one Row of (word, count) per emitted record */
    @Override
    public TypeInformation<Row> getResultType() {
        return Types.ROW(Types.STRING, Types.INT);
    }
    /* Process one input line: split on commas and emit (word, 1) for each word */
    public void eval(String line) {
        String[] wordSplit = line.split(",");
        for (int i = 0; i < wordSplit.length; i++) {
            collect(Row.of(wordSplit[i], 1));
        }
    }
}
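The body of eval boils down to splitting on commas and emitting one (word, 1) pair per token. A minimal standalone sketch of that logic (plain Java, no Flink dependency; the class and method names here are illustrative, not part of the article's code):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;

public class EvalSketch {
    // Mirrors UDFWordCount.eval: split a line on commas, pair each word with 1.
    static List<SimpleEntry<String, Integer>> explode(String line) {
        List<SimpleEntry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(",")) {
            out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three tokens in, three (word, 1) pairs out
        System.out.println(explode("hello,flink,hello"));
        // prints: [hello=1, flink=1, hello=1]
    }
}
```

In the real UDF each pair is passed to collect(...) instead of being returned, which is what lets one input row expand into many output rows.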

3. The main class:

package com.hadoop.ljs.flink110.tableApi;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
/**
 * @author: Created By lujisen
 * @company ChinaUnicom Software JiNan
 * @date: 2020-05-06 15:55
 * @version: v1.0
 * @description: com.hadoop.ljs.flink110.tableApi
 */
public class WordCountByTableAPI {
    public static void main(String[] args) throws Exception {
        // Initialize the Table API execution environment (legacy planner, streaming mode)
        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
        StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);

        DataStream<String> sourceDS = fsEnv.socketTextStream("localhost", 9000);
        Table table = fsTableEnv.fromDataStream(sourceDS, "inputLine");
        // Register the custom UDF under the name "wordCountUDF"
        fsTableEnv.registerFunction("wordCountUDF", new UDFWordCount());

        Table wordCount= table.joinLateral("wordCountUDF(inputLine) as (word, countOne)")
                .groupBy("word")
                .select("word, countOne.sum as countN");
        /* Convert the result Table into a retract DataStream */
        DataStream<Tuple2<Boolean, Row>> result = fsTableEnv.toRetractStream(wordCount, Types.ROW(Types.STRING, Types.INT));

        /* Each record carries a Boolean flag: true = addition, false = retraction
           of an earlier result. Keep only additions; comment the filter out to compare. */
        result.filter(new FilterFunction<Tuple2<Boolean, Row>>() {
            @Override
            public boolean filter(Tuple2<Boolean, Row> value) throws Exception {
                return value.f0;
            }
        }).print();
        fsEnv.execute();
    }
}
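Why the filter on value.f0 matters: because the aggregation updates continuously, toRetractStream emits a retraction (false, old row) followed by an addition (true, new row) every time a word's count changes. A standalone sketch that simulates this emission pattern with a plain HashMap (an illustration of the semantics only, not Flink's actual runtime):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RetractSketch {
    // Simulate the (flag, word, count) records a retract stream emits
    // as each word arrives and its running count is updated.
    static List<String> retractStream(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String w : words) {
            Integer old = counts.get(w);
            if (old != null) {
                emitted.add("(false," + w + "," + old + ")"); // retract the stale count
            }
            int now = (old == null) ? 1 : old + 1;
            counts.put(w, now);
            emitted.add("(true," + w + "," + now + ")");      // add the updated count
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(retractStream(java.util.Arrays.asList("hello", "hello")));
        // prints: [(true,hello,1), (false,hello,1), (true,hello,2)]
    }
}
```

Dropping the records with a false flag leaves only the latest count per word, which is exactly what the filter in the main class does.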

4. Testing the job:

(Screenshot of the test run omitted.)

     In follow-up posts we will introduce the other common abstract classes and interfaces for user-defined functions; this article is only meant as a first, minimal example.


 
