Apache Flink Study Notes (3)

This post demonstrates how to implement the functionality of demo3 from the previous post using the Table API. Previous post: Apache Flink Study Notes (2).
In Flink, both DataSet and DataStream can be converted to and from a Table, and each kind of operation has a corresponding API.
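
The rest of this post only uses DataStream, so here is a minimal sketch of the batch-side round trip with DataSet, assuming a hard-coded Tuple2 input and made-up field names (module, cnt):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class DataSetTableSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // hard-coded input, just to show the conversion
        DataSet<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("pay", 1), Tuple2.of("login", 1), Tuple2.of("pay", 1));

        // DataSet -> Table, naming the tuple fields module and cnt
        Table table = tableEnv.fromDataSet(input, "module, cnt");

        // aggregate with the Table API, then Table -> DataSet
        Table result = table.groupBy("module").select("module, cnt.sum as total");
        DataSet<Row> out = tableEnv.toDataSet(result, Row.class);
        out.print();
    }
}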

Note: to use the Table API (and the SQL covered in the next post), add the following dependencies:



<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table_2.11</artifactId>
  <version>1.6.0</version>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-scala_2.11</artifactId>
  <version>1.6.0</version>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.11</artifactId>
  <version>1.6.0</version>
</dependency>

First, I extracted the POJO Bean3 into a shared class. Keep these four rules in mind when using POJOs:

  • The POJO class must be declared public; if it is an inner class, it must also be static.
  • The POJO must have a public no-argument constructor.
  • The POJO's fields must be public, or have public getters and setters.
  • The fields must use data types supported by Flink.

import java.io.Serializable;

/**
 * pojo
 */
public class Bean3 implements Serializable {
    public Long timestamp;
    public String appId;
    public String module;

    public Bean3() {
    }

    public Bean3(Long timestamp, String appId, String module) {
        this.timestamp = timestamp;
        this.appId = appId;
        this.module = module;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    public String getAppId() {
        return appId;
    }

    public void setAppId(String appId) {
        this.appId = appId;
    }

    public String getModule() {
        return module;
    }

    public void setModule(String module) {
        this.module = module;
    }

    @Override
    public String toString() {
        return "Bean3{" +
                "timestamp=" + timestamp +
                ", appId='" + appId + '\'' +
                ", module='" + module + '\'' +
                '}';
    }
}

Demo5 code:

import com.alibaba.fastjson.JSONObject;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.api.java.Tumble;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

import java.util.Date;
import java.util.Properties;

/**
 * Table API
 */
public class Demo5 {
    private static final String APP_NAME = "app_name";

    public static void main(String[] args) {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().enableSysoutLogging();
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime); // use processing time as the window time characteristic
        env.setParallelism(1); // global parallelism

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "kafka bootstrap.servers");
        // set the topic and the app name
        // see Study Notes (2) for the FlinkKafkaManager source
        FlinkKafkaManager manager = new FlinkKafkaManager("kafka.topic", APP_NAME, properties);
        FlinkKafkaConsumer09<JSONObject> consumer = manager.build(JSONObject.class);
        consumer.setStartFromLatest();

        // get the DataStream and convert each message to Bean3
        DataStream<Bean3> stream = env.addSource(consumer).map(new FlatMap());

        final StreamTableEnvironment tableEnvironment = StreamTableEnvironment.getTableEnvironment(env);
        // timestamp, appId and module are the POJO field names; the trailing tt is an arbitrarily named extra field, and .proctime marks it as the processing-time attribute
        Table table = tableEnvironment.fromDataStream(stream, "timestamp,appId,module,tt.proctime");
        tableEnvironment.registerTable("common", table); // register the table name

        // alternatively, use registerDataStream to register the table in one step
        //tableEnvironment.registerDataStream("common", stream, "timestamp,appId,module,tt.proctime");

        Table query =
                tableEnvironment
                        .scan("common") //等价from
                        .window(Tumble.over("10.seconds").on("tt").as("dd"))// 每10s执行一次,必须要取别名,且不能和tt相同,这里还没有搞清楚原理
                        .groupBy("dd,appId")//必须要用window找那个指定的dd别名聚合
                        .select("appId,COUNT(module) as totals") //COUNT(module)也可以写成 module.count
                        .where("appId == '100007336' || appId == '100013668'"); //等价于 filter(); 用or 报错。奇葩的是用=,==,=== 都能通过

        DataStream<Row> result = tableEnvironment.toAppendStream(query, Row.class);
        result.process(new ProcessFunction<Row, Row>() {
            @Override
            public void processElement(Row value, Context ctx, Collector<Row> out) throws Exception {
                System.out.println(String.format("AppId:%s, Module Count:%s", value.getField(0).toString(), value.getField(1).toString()));
            }
        });

        try {
            env.execute(APP_NAME);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static class FlatMap implements MapFunction<JSONObject, Bean3> {
        @Override
        public Bean3 map(JSONObject jsonObject) throws Exception {
            return new Bean3(new Date().getTime(), jsonObject.getString("appId"), jsonObject.getString("module"));
        }
    }
}

To use the Table API, first create a StreamTableEnvironment, then call fromDataStream (for stream processing) to create a Table. Alternatively, call registerDataStream directly, which registers the table name and specifies the field mapping in one step.

In this example the window is defined on processing time rather than event time, so the timestamp field in the message is not used. Instead, an extra custom field tt is appended to act as the processing-time attribute; this field can only appear in the last position, and it must carry the .proctime suffix.

Likewise, processing time can be switched to event time with the following changes:

 env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // use event time as the window time characteristic

Assign the event timestamps, the same change as in demo3:

 DataStream<Bean3> bean3DataStreamWithAssignTime
        = stream.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Bean3>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(Bean3 element) {
                return element.getTimestamp();
            }
        });

Use timestamp as the field that drives the event time; in this case the .rowtime suffix must be added:

Table table = tableEnvironment.fromDataStream(bean3DataStreamWithAssignTime, "timestamp.rowtime,appId,module");

Then in the window definition, simply reference timestamp:

  .window(Tumble.over("10.seconds").on("timestamp").as("dd"))
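Putting the event-time changes together, the modified pipeline looks roughly like the sketch below. It only covers the parts that differ from Demo5; variable names follow the snippets above, and the window size and appId values are the ones used earlier in this post:

 // event time instead of processing time
 env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

 // assign event timestamps from Bean3.timestamp (no out-of-orderness allowed here)
 DataStream<Bean3> bean3DataStreamWithAssignTime
        = stream.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Bean3>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(Bean3 element) {
                return element.getTimestamp();
            }
        });

 // declare timestamp as the rowtime attribute instead of appending a proctime field
 Table table = tableEnvironment.fromDataStream(bean3DataStreamWithAssignTime, "timestamp.rowtime,appId,module");
 tableEnvironment.registerTable("common", table);

 // the same query as in Demo5, but the tumbling window is now driven by the rowtime attribute
 Table query = tableEnvironment
        .scan("common")
        .window(Tumble.over("10.seconds").on("timestamp").as("dd"))
        .groupBy("dd,appId")
        .select("appId,COUNT(module) as totals")
        .where("appId == '100007336' || appId == '100013668'");

The downstream handling (toAppendStream and the ProcessFunction) stays the same as in Demo5.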
