This article will be updated continuously...
For details, see the official documentation: Planner comparison.
// create a TableEnvironment for specific planner batch or streaming
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section
// create an input Table
tableEnv.executeSql("CREATE TEMPORARY TABLE table1 ... WITH ( 'connector' = ... )");
// register an output Table
tableEnv.executeSql("CREATE TEMPORARY TABLE outputTable ... WITH ( 'connector' = ... )");
// create a Table object from a Table API query
Table table2 = tableEnv.from("table1").select(...);
// create a Table object from a SQL query
Table table3 = tableEnv.sqlQuery("SELECT ... FROM table1 ... ");
// emit a Table API result Table to a TableSink, same for SQL result
TableResult tableResult = table2.executeInsert("outputTable");
tableResult...
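Calling executeInsert submits a Flink job right away; the returned TableResult can then be used to wait for that job or to inspect it. A minimal sketch of common follow-ups (await() was added to TableResult in Flink 1.12 and throws checked exceptions the caller must handle):
// block until the insert job finishes (propagates job failures as exceptions)
tableResult.await();
// or inspect the submitted job through its JobClient
tableResult.getJobClient().ifPresent(client -> System.out.println(client.getJobID()));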
TableEnvironment is the core concept of the Table API and SQL integration. It is responsible for:
- registering Tables in the internal catalog and registering external catalogs;
- loading pluggable modules;
- executing SQL queries;
- registering user-defined (scalar, table, or aggregation) functions;
- converting a DataStream or DataSet into a Table;
- holding a reference to the ExecutionEnvironment or StreamExecutionEnvironment.
A Table is always bound to a specific TableEnvironment. Tables from different TableEnvironments cannot be combined in the same query, e.g. joined or unioned.
A TableEnvironment is created from a StreamExecutionEnvironment or an ExecutionEnvironment via the static methods StreamTableEnvironment.create() or BatchTableEnvironment.create(); the TableConfig is optional. The TableConfig can be used to configure the TableEnvironment or to customize the query optimization and translation process (see Query Optimization).
// **********************
// FLINK STREAMING QUERY
// **********************
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
// or TableEnvironment fsTableEnv = TableEnvironment.create(fsSettings);
// ******************
// FLINK BATCH QUERY
// ******************
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.BatchTableEnvironment;
ExecutionEnvironment fbEnv = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment fbTableEnv = BatchTableEnvironment.create(fbEnv);
// **********************
// BLINK STREAMING QUERY -- the commonly used way to create a streaming environment
// **********************
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);
// or TableEnvironment bsTableEnv = TableEnvironment.create(bsSettings);
// ******************
// BLINK BATCH QUERY
// ******************
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
EnvironmentSettings bbSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment bbTableEnv = TableEnvironment.create(bbSettings);
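Each of these environments exposes the TableConfig mentioned earlier via getConfig(). A minimal sketch of tuning the Blink planner through it, using the standard mini-batch configuration keys:
import org.apache.flink.configuration.Configuration;

Configuration conf = bbTableEnv.getConfig().getConfiguration();
// enable mini-batch aggregation to reduce state accesses
conf.setString("table.exec.mini-batch.enabled", "true");
conf.setString("table.exec.mini-batch.allow-latency", "5 s");
conf.setString("table.exec.mini-batch.size", "5000");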
Temporary tables are tied to the lifecycle of a single Flink session: they are typically held in memory and exist only for the duration of the Flink session that created them. They are not visible to other sessions. They are not bound to any catalog or database, but they can be created in a namespace. Temporary tables are not dropped even if their corresponding database is dropped.
// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section
// create a Table object from a projection query
Table projTable = tableEnv.from("X").select(...);
// register the Table as a temporary view
tableEnv.createTemporaryView("projectedTable", projTable);
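When the view is no longer needed within the session, it can also be dropped explicitly; a minimal sketch:
// returns true if the temporary view existed and was removed
boolean dropped = tableEnv.dropTemporaryView("projectedTable");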
Permanent tables require a catalog (such as the Hive Metastore) to maintain the table's metadata. Once created, a permanent table is visible to any Flink session connected to that catalog and continues to exist until it is explicitly dropped.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;
public class HiveCatalogDemo {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .inStreamingMode()
                .useBlinkPlanner()
                .build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        String name = "myhive";
        String defaultDatabase = "default";
        String hiveConfDir = "/xxx/hive-conf"; // path to the Hive configuration files

        // create the HiveCatalog
        HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
        // register the HiveCatalog
        tableEnv.registerCatalog("myhive", hive);
        // use the HiveCatalog
        tableEnv.useCatalog("myhive");
        // set the current database
        tableEnv.useDatabase(defaultDatabase);

        // Hive dialect: statements are parsed with Hive SQL syntax
        tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        // default dialect (used here): statements are parsed with Flink SQL syntax
        tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);

        // create a table
        tableEnv.executeSql("create table table_name (name string, age int) with ('connector' = 'print')");
    }
}
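To verify that the DDL landed in the catalog, the session can list the tables it currently sees; a short sketch appended to the main method above:
// list the tables in the current catalog and database
tableEnv.executeSql("SHOW TABLES").print();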
Table API queries and SQL queries are interchangeable over the same registered tables. A Table API query:
// scan registered Orders table
Table orders = tableEnv.from("Orders");
// compute revenue for all customers from France
Table revenue = orders
.filter($("cCountry").isEqual("FRANCE"))
.groupBy($("cID"), $("cName"))
.select($("cID"), $("cName"), $("revenue").sum().as("revSum"));
The same query expressed in SQL:
Table revenue = tableEnv.sqlQuery(
"SELECT cID, cName, SUM(revenue) AS revSum " +
"FROM Orders " +
"WHERE cCountry = 'FRANCE' " +
"GROUP BY cID, cName"
);
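During development, either variant can be executed and printed directly, without registering a sink; Table.execute() collects the result to the client:
// run the query and print the rows to stdout
revenue.execute().print();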
A Table is emitted by writing it to a TableSink. TableSink is a generic interface that supports a variety of file formats (e.g. CSV, Apache Parquet, Apache Avro), storage systems (e.g. JDBC, Apache HBase, Apache Cassandra, Elasticsearch), and messaging systems (e.g. Apache Kafka, RabbitMQ).
A batch Table can only be written to a BatchTableSink, while a streaming Table must be written to an AppendStreamTableSink, a RetractStreamTableSink, or an UpsertStreamTableSink.
The method Table.executeInsert(String tableName) emits the Table to a registered TableSink. It looks up the TableSink in the catalog by name and validates that the schema of the Table matches the schema of the TableSink.
// get a TableEnvironment
TableEnvironment tableEnv = ...; // see "Create a TableEnvironment" section
// create an output Table
final Schema schema = new Schema()
.field("a", DataTypes.INT())
.field("b", DataTypes.STRING())
.field("c", DataTypes.BIGINT());
tableEnv.connect(new FileSystem().path("/path/to/file"))
.withFormat(new Csv().fieldDelimiter('|').deriveSchema())
.withSchema(schema)
.createTemporaryTable("CsvSinkTable");
// compute a result Table using Table API operators and/or SQL queries
Table result = ...
// emit the result Table to the registered TableSink
result.executeInsert("CsvSinkTable");
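The same emission can also be written as an INSERT INTO statement executed through executeSql. A minimal sketch, where sourceTable is a hypothetical registered table whose columns match the sink schema:
// equivalent SQL form; sourceTable is a placeholder for a registered source table
tableEnv.executeSql("INSERT INTO CsvSinkTable SELECT a, b, c FROM sourceTable");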
The Maven dependencies for the demo that follows (Flink 1.13.5; scala.binary.version is 2.11 here):
<!-- Flink core dependencies -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.13.5</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
    <version>1.13.5</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_${scala.binary.version}</artifactId>
    <version>1.13.5</version>
</dependency>
<!-- Flink Table API dependencies -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
    <version>1.13.5</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
    <version>1.13.5</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>1.13.5</version>
</dependency>
<!-- other dependencies -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.79</version>
</dependency>
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.14.1</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.14.1</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.14.1</version>
    <scope>runtime</scope>
</dependency>
A complete demo: a generated JSON stream is parsed into Event objects, converted to a Table, queried with SQL, and printed:
package com.ali.flink.demo.driver;

import com.ali.flink.demo.bean.Event;
import com.ali.flink.demo.utils.DataGeneratorImpl003;
import com.ali.flink.demo.utils.FlinkEnv;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.datagen.DataGeneratorSource;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

import static org.apache.flink.table.api.Expressions.$;

public class FlinkTableApiDemo001 {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = FlinkEnv.FlinkDataStreamRunEnv();
        env.setParallelism(1);
        StreamTableEnvironment tableEnv = FlinkEnv.getStreamTableEnv(env);

        DataGeneratorSource<String> dataGeneratorSource = new DataGeneratorSource<>(new DataGeneratorImpl003());
        DataStream<String> sourceStream = env.addSource(dataGeneratorSource).returns(String.class);
        // sourceStream.print("source");

        // parse the JSON source records into Event objects
        DataStream<Event> mapStream = sourceStream.map(new MapFunction<String, Event>() {
            @Override
            public Event map(String s) throws Exception {
                JSONObject jsonObject = JSON.parseObject(s);
                String name = jsonObject.getString("name");
                JSONObject title = jsonObject.getJSONObject("title");
                String title_name = title.getString("title_name");
                int title_number = title.getIntValue("title_number");
                JSONArray user_info = jsonObject.getJSONArray("user_info");
                String address = user_info.getJSONObject(0).getString("address");
                JSONObject time_info = jsonObject.getJSONObject("time_info");
                long timestamp = time_info.getLongValue("timestamp");
                return new Event(name, title.toJSONString(), title_name, title_number,
                        user_info.toJSONString(), address, time_info.toJSONString(), timestamp);
            }
        }).returns(Event.class);
        mapStream.print("map source");

        // convert the DataStream into a Table, renaming the 'name' field to 'username'
        Table sourceTable = tableEnv.fromDataStream(mapStream,
                $("name").as("username"), $("title"), $("title_name"), $("title_number"),
                $("user_info"), $("address"), $("time_info"), $("timestamp"));

        // register the Table as a temporary view
        tableEnv.createTemporaryView("source_table", sourceTable);

        // the SQL statement ('timestamp' is a reserved keyword and must be escaped with backticks)
        String sql = "select\n" +
                "username\n" +
                ", title\n" +
                ", title_name\n" +
                ", title_number\n" +
                ", user_info\n" +
                ", address\n" +
                ", time_info\n" +
                ", `timestamp`\n" +
                "from source_table";

        // run the query
        Table result = tableEnv.sqlQuery(sql);

        // convert the result Table back into a DataStream and print it
        tableEnv.toDataStream(result).print("out");

        env.execute("job start");
    }
}
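The query above is insert-only, so toDataStream is sufficient. For a query that updates previously emitted results, such as a GROUP BY aggregation over the same view, toChangelogStream would be needed instead; a minimal sketch reusing the registered source_table:
// an updating query: counts change as new events arrive
Table agg = tableEnv.sqlQuery(
        "SELECT username, COUNT(*) AS cnt FROM source_table GROUP BY username");
// toDataStream would throw for updating results; toChangelogStream emits +I/-U/+U rows
tableEnv.toChangelogStream(agg).print("agg");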
The Event POJO used above:
package com.ali.flink.demo.bean;

public class Event {

    // If these fields are declared private, getters and setters must be added;
    // otherwise Flink fails with: "Too many fields referenced from an atomic type"
    public String name;
    public String title;
    public String title_name;
    public int title_number;
    public String user_info;
    public String address;
    public String time_info;
    public Long timestamp;

    // A no-argument constructor is required; otherwise Flink fails with the same
    // "Too many fields referenced from an atomic type" error
    public Event() {
    }

    public Event(String name, String title, String title_name, int title_number,
                 String user_info, String address, String time_info, Long timestamp) {
        this.name = name;
        this.title = title;
        this.title_name = title_name;
        this.title_number = title_number;
        this.user_info = user_info;
        this.address = address;
        this.time_info = time_info;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return "Event{" +
                "name='" + name + '\'' +
                ", title=" + title +
                ", title_name='" + title_name + '\'' +
                ", title_number=" + title_number +
                ", user_info=" + user_info +
                ", address='" + address + '\'' +
                ", time_info=" + time_info +
                ", timestamp='" + timestamp + '\'' +
                '}';
    }
}

Sample output:
map source> Event{name='Tom3', title={"title_number":3,"title_name":"表情包"}, title_name='表情包', title_number=3, user_info=[{"address":"北京市","city":"beijing"},{"address":"上海市","city":"shanghai"}], address='北京市', time_info={"timestamp":1657332118000}, timestamp='1657332118000'}
out> +I[Tom3, {"title_number":3,"title_name":"表情包"}, 表情包, 3, [{"address":"北京市","city":"beijing"},{"address":"上海市","city":"shanghai"}], 北京市, {"timestamp":1657332118000}, 1657332118000]
map source> Event{name='Tom4', title={"title_number":3,"title_name":"表情包"}, title_name='表情包', title_number=3, user_info=[{"address":"北京市","city":"beijing"},{"address":"上海市","city":"shanghai"}], address='北京市', time_info={"timestamp":1657332118000}, timestamp='1657332118000'}
out> +I[Tom4, {"title_number":3,"title_name":"表情包"}, 表情包, 3, [{"address":"北京市","city":"beijing"},{"address":"上海市","city":"shanghai"}], 北京市, {"timestamp":1657332118000}, 1657332118000]
If Event does not meet Flink's POJO requirements (public fields or getter/setter pairs, plus a public no-argument constructor), fromDataStream treats it as an atomic type, and the field-renaming expressions fail with:
Exception in thread "main" org.apache.flink.table.api.ValidationException: Field reference expression expected.
at org.apache.flink.table.typeutils.FieldInfoUtils.extractFieldInfoFromAtomicType(FieldInfoUtils.java:487)
at org.apache.flink.table.typeutils.FieldInfoUtils.extractFieldInformation(FieldInfoUtils.java:296)
at org.apache.flink.table.typeutils.FieldInfoUtils.getFieldsInfo(FieldInfoUtils.java:260)
at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.lambda$asQueryOperation$1(StreamTableEnvironmentImpl.java:596)
at java.util.Optional.map(Optional.java:215)
at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.asQueryOperation(StreamTableEnvironmentImpl.java:593)
at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.fromDataStream(StreamTableEnvironmentImpl.java:456)
at com.ali.flink.demo.driver.FlinkTableApiDemo001.main(FlinkTableApiDemo001.java:50)
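A minimal sketch of the contrast, assuming the Event class above:
// Event is a valid Flink POJO (public fields + public no-arg constructor),
// so each field is individually addressable and can be renamed:
Table ok = tableEnv.fromDataStream(mapStream, $("name").as("username"), $("title"));
// Had Event been treated as an atomic type (e.g. private fields without
// getters/setters), the same call would fail with the ValidationException above.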