Preface: Today is day 15 of my Flink studies! I worked through the basics of Flink SQL, which tackles data processing in the big-data world with tables instead of complex hand-written code. I learned how to initialize the environment, how to convert stream data into table data, and how to query table data. Combining my own experiments, guesses, and code practice, I've collected a lot of my own understanding and ideas here, and I hope to exchange thoughts with everyone!
Tips:"分享是快乐的源泉,在我的博客里,不仅有知识的海洋,还有满满的正能量加持,快来和我一起分享这份快乐吧!
喜欢我的博客的话,记得点个红心❤️和小关小注哦!您的支持是我创作的动力!"
For this stage of learning Flink SQL, the required pom.xml is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cn.itcast</groupId>
<artifactId>flinksql_pro</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
<properties>
<flink.version>1.13.1</flink.version>
<java.version>1.8</java.version>
<scala.binary.version>2.11</scala.binary.version>
<hadoop.version>2.7.5</hadoop.version>
<hbase.version>2.0.0</hbase.version>
<zkclient.version>0.8</zkclient.version>
<hive.version>2.1.1</hive.version>
<mysql.version>5.1.47</mysql.version>
</properties>
<!--仓库配置-->
<repositories>
<repository>
<id>nexus-aliyun</id>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</repository>
<repository>
<id>central_maven</id>
<name>central maven</name>
<url>https://repo1.maven.org/maven2</url>
</repository>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<!--<scope>provided</scope>-->
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<!--<scope>compile</scope>-->
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-runtime-blink_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-scala-bridge_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<!--kafka-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-sql-connector-kafka_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.3.0</version>
</dependency>
<!--es6-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-elasticsearch6_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!--flink-jdbc-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-jdbc_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<!--flink-hbase-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-hbase-2.2_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<!--hadoop-->
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-compatibility_2.11</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-shaded-hadoop-2-uber</artifactId>
<version>2.7.5-10.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-csv</artifactId>
<version>${flink.version}</version>
</dependency>
<!--flink-json-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-json</artifactId>
<version>${flink.version}</version>
<!--<scope>test</scope>-->
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-runtime_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<!--<scope>test</scope>-->
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.3</version>
<!--<scope>test</scope>-->
</dependency>
<!-- https://mvnrepository.com/artifact/ch.qos.logback/logback-core -->
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
<version>1.2.3</version>
</dependency>
<!-- json -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.5</version>
</dependency>
<!-- On hive -->
<!-- Flink Dependency -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- Hive Dependency -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.2</version>
<scope>provided</scope>
</dependency>
<!-- mysql 连接驱动 -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>RELEASE</version>
<scope>compile</scope>
</dependency>
</dependencies>
</project>
Overview: TableEnvironment is the core concept of the Table API and SQL.
Purpose: it registers tables in the catalog, executes SQL queries, registers user-defined functions, and converts between DataStream and Table. The typical ways to create one are shown below.
// **********************
// FLINK STREAMING QUERY
// **********************
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings fsSettings = EnvironmentSettings.newInstance()
.useOldPlanner() // use the legacy planner
.inStreamingMode() // streaming mode
.build();
StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
// or TableEnvironment fsTableEnv = TableEnvironment.create(fsSettings);
// ******************
// FLINK BATCH QUERY
// ******************
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.BatchTableEnvironment;
ExecutionEnvironment fbEnv = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment fbTableEnv = BatchTableEnvironment.create(fbEnv);
The difference from the legacy version is the call to useBlinkPlanner():
// **********************
// BLINK STREAMING QUERY
// **********************
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance()
.useBlinkPlanner()// use the new (Blink) planner
.inStreamingMode()// streaming mode
.build();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);
// or TableEnvironment bsTableEnv = TableEnvironment.create(bsSettings);
The difference from the legacy version:
the legacy version uses BatchTableEnvironment and is passed fbEnv,
while the new version uses TableEnvironment and is passed bbSettings.
// ******************
// BLINK BATCH QUERY
// ******************
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
EnvironmentSettings bbSettings = EnvironmentSettings.newInstance()
.useBlinkPlanner()// use the new (Blink) planner
.inBatchMode()// batch mode
.build();
TableEnvironment bbTableEnv = TableEnvironment.create(bbSettings);
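As a side note, from Flink 1.11 on the Blink planner is already the default, so with the 1.13.1 dependency declared in the pom above the settings object can usually be omitted. A minimal sketch of this shorter, equivalent setup for the streaming case:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Blink planner + streaming mode are the defaults here, so no EnvironmentSettings is needed
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);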
import static org.apache.flink.table.api.Expressions.*;
// Use the table environment to get the source table
Table orderTable = tableEnv.from("inputTable");
// Query with the Table API:
// in select(), fields are referenced as $("fieldName")
// in filter(), filter conditions can be applied
Table resultTable = orderTable
.select($("id"), $("timestamp"), $("category"), $("areaName"), $("money"))
.filter($("areaName").isEqual("北京"));
// For grouped aggregation, groupBy() must be called before select()
Table aggResultSqlTable = orderTable
.groupBy($("areaName"))
.select($("areaName"), $("id").count().as("cnt"));
// Use the table environment to get the source table
Table orderTable = tableEnv.from("inputTable");
// Run a SQL query through the sqlQuery() method
Table resultTable2 = tableEnv.sqlQuery("select id,`timestamp`,category,areaName,money from inputTable where areaName='北京'");
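For the name inputTable to be resolvable, either in from("inputTable") or inside the SQL string, the table must first be registered in the table environment. A minimal sketch, assuming dataTable is a Table obtained earlier (for example from a DataStream, as in the case below):
// register the Table under a name that SQL statements can reference
tableEnv.createTemporaryView("inputTable", dataTable);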
// The data file can be placed under the resources directory
// (YourClass and "order.csv" below stand in for your own class and file name)
String filePath = YourClass.class.getClassLoader().getResource("order.csv").getPath();
// Read the data
DataStreamSource<String> inputStream = env.readTextFile(filePath);
// Use a map function to convert each line into the target type
// Convert the DataStream into a Table
Table dataTable = tableEnv.fromDataStream(dataStream);
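fromDataStream can also take field expressions to pick or reorder columns during the conversion; a small sketch, assuming a dataStream of the OrderInfo type used in the case below:
// explicitly list (and thereby order) the fields exposed by the resulting table
Table dataTable = tableEnv.fromDataStream(dataStream,
$("id"), $("timestamp"), $("money"), $("category"));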
Example: converting a DataStream into a Table
Data source: order.csv, placed under the resources directory
user_001,1621718199,10.1,电脑
user_001,1621718201,14.1,手机
user_002,1621718202,82.5,手机
user_001,1621718205,15.6,电脑
user_004,1621718207,10.2,家电
user_001,1621718208,15.8,电脑
user_005,1621718212,56.1,电脑
user_002,1621718260,40.3,家电
user_001,1621718580,11.5,家居
user_001,1621718860,61.6,家居
Code:
package cn.itcast.day01;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import static org.apache.flink.table.api.Expressions.$;
/**
* @author lql
* @time 2024-03-12 14:34:40
* @description TODO: convert a DataStream into a Table
*/
public class DataStreamToTable {
public static void main(String[] args) throws Exception {
// todo 1) initialize the table environment
// 1.1 streaming execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// 1.2 environment settings
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance()
.useBlinkPlanner()
.inStreamingMode()
.build();
// 1.3 table environment
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, bsSettings);
// todo 2) read the data source with the streaming environment
String filePath = DataStreamToTable.class.getClassLoader().getResource("order.csv").getPath();
DataStreamSource<String> inputStream = env.readTextFile(filePath);
// todo 3) map each line into the sample data type (POJO)
SingleOutputStreamOperator<OrderInfo> dataStream = inputStream.map(new MapFunction<String, OrderInfo>() {
@Override
public OrderInfo map(String data) throws Exception {
String[] dataArray = data.split(",");
return new OrderInfo(
dataArray[0],
dataArray[1],
Double.parseDouble(dataArray[2]),
dataArray[3]);
}
});
// todo 4) convert the DataStream into a Table
Table dataTable = bsTableEnv.fromDataStream(dataStream);
// todo 5) read the table data
// method 1: query via the Table API
Table resultTable = dataTable
.select($("id"), $("timestamp"), $("money"),$("category"))
.filter($("category").isEqual("电脑"));
// convert the table into a stream and print it
bsTableEnv.toAppendStream(resultTable, Row.class).print("方法一:调用api的结果");
// method 2: register a temporary view and query it with SQL
bsTableEnv.createTemporaryView("inputTable",dataTable);
Table resultTable1 = bsTableEnv.sqlQuery("SELECT * FROM inputTable WHERE category = '电脑'");
bsTableEnv.toAppendStream(resultTable1, Row.class).print("方法二:调用sql查询的结果");
env.execute();
}
@Data
@AllArgsConstructor
@NoArgsConstructor
public static class OrderInfo {
private String id;
private String timestamp;
private Double money;
private String category;
}
}
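As a variation on step 5, the result can also be converted back into a typed stream of OrderInfo objects rather than generic Row objects, as long as the column names match the POJO fields; a minimal sketch under that assumption:
// convert the filtered table back into a stream of OrderInfo and print it
bsTableEnv.toAppendStream(resultTable, OrderInfo.class).print("typed result");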
Result:
方法一:调用api的结果:5> +I[user_001, 1621718205, 15.6, 电脑]
方法二:调用sql查询的结果:5> +I[user_001, 1621718205, 15.6, 电脑]
方法一:调用api的结果:3> +I[user_001, 1621718199, 10.1, 电脑]
方法一:调用api的结果:7> +I[user_001, 1621718208, 15.8, 电脑]
方法二:调用sql查询的结果:3> +I[user_001, 1621718199, 10.1, 电脑]
方法二:调用sql查询的结果:7> +I[user_001, 1621718208, 15.8, 电脑]
方法一:调用api的结果:7> +I[user_005, 1621718212, 56.1, 电脑]
方法二:调用sql查询的结果:7> +I[user_005, 1621718212, 56.1, 电脑]
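The number before each > is the index of the parallel subtask that emitted the record, and +I marks an insert-only row, which is why the two methods print interleaved but identical results.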
Summary: