The Table API is a unified relational API for stream and batch processing: Table API queries can run on batch or streaming input without modification. The Table API is a superset of SQL designed for use with Apache Flink, and it is a language-integrated API for Scala and Java. Instead of specifying queries as SQL strings, Table API queries are defined in an embedded-language style in Java or Scala, with IDE support such as auto-completion and syntax validation.
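To make the difference between the embedded style and plain SQL strings concrete, here is a minimal, self-contained batch sketch (the object, table, and field names are illustrative and not taken from the code below) that writes the same word-count query both ways:
import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

// Illustrative sketch only; names such as StyleSketch and Words are made up.
object StyleSketch {
  case class Word(word: String, frequency: Long)

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val words = env
      .fromElements(Word("hello", 1L), Word("hello", 1L), Word("ciao", 1L))
      .toTable(tEnv, 'word, 'frequency)
    tEnv.registerTable("Words", words)

    // Table API style: queries are embedded Scala expressions,
    // so the IDE can offer completion and the compiler catches typos.
    val apiResult = words
      .groupBy('word)
      .select('word, 'frequency.sum as 'cnt)

    // SQL style: the same query expressed as a string.
    val sqlResult = tEnv.sqlQuery(
      "SELECT word, SUM(frequency) AS cnt FROM Words GROUP BY word")

    apiResult.toDataSet[(String, Long)].print()
    sqlResult.toDataSet[(String, Long)].print()
  }
}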
Here is the complete code.
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-scala_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-0.8_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
The example consumes data from Kafka, converts it into a table, and then runs SQL queries on it.
When developing in Scala, make sure none of the following imports are missing, otherwise you will run into problems.
import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
package com.ddxygq.bigdata.flink.sql
import java.util.Properties
import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
/**
* @ Author: keguang
* @ Date: 2019/2/22 16:13
* @ version: v1.0.0
* @ description:
*/
object TableDemo {
  // case class used by the batch example; frequency must be numeric for sum()
  case class WC(word: String, frequency: Long)

  def main(args: Array[String]): Unit = {
    demo
  }

  // batch example: join two DataSets via the Table API
  def demo2(): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val input: DataSet[WC] = env.fromElements(WC("hello", 1), WC("hello", 1), WC("ciao", 1))
    val input2: DataSet[WC] = env.fromElements(WC("hello", 1), WC("hello", 1))
    val table = input.toTable(tEnv, 'word, 'frequency)
    val table2 = input2.toTable(tEnv, 'word2, 'frequency2)

    // note: Table API expressions use ===, not ==
    val result = table.join(table2).where('word === 'word2).select('word, 'frequency)
    result.toDataSet[(String, Long)].print()
  }

  // streaming example: consume from Kafka, register a table, aggregate and query it
  def demo: Unit = {
    val sEnv = StreamExecutionEnvironment.getExecutionEnvironment
    val sTableEnv = TableEnvironment.getTableEnvironment(sEnv)

    // connect to Kafka
    val ZOOKEEPER_HOST = "qcloud-test-hadoop01:2181"
    val KAFKA_BROKERS = "qcloud-test-hadoop01:9092,qcloud-test-hadoop02:9092,qcloud-test-hadoop03:9092"
    val TRANSACTION_GROUP = "transaction"
    val kafkaProps = new Properties()
    kafkaProps.setProperty("zookeeper.connect", ZOOKEEPER_HOST)
    kafkaProps.setProperty("bootstrap.servers", KAFKA_BROKERS)
    kafkaProps.setProperty("group.id", TRANSACTION_GROUP)

    val input = sEnv.addSource(
        new FlinkKafkaConsumer08[String]("flink-test", new SimpleStringSchema(), kafkaProps)
      )
      .flatMap(x => x.split(" "))
      .map(x => (x, 1L))

    // registerDataStream returns Unit; the table is referenced by name afterwards
    sTableEnv.registerDataStream("Words", input, 'word, 'frequency)

    val result = sTableEnv
      .scan("Words")
      .groupBy("word")
      .select('word, 'frequency.sum as 'cnt)

    sTableEnv.toRetractStream[(String, Long)](result).print()
    sTableEnv.sqlQuery("select * from Words").toAppendStream[(String, Long)].print()

    sEnv.execute("TableDemo")
  }
}
Two things to note here:
1. The example uses both Table API operators and standard SQL query syntax, in order to demonstrate the basic usage of tables.
val result = sTableEnv
  .scan("Words")
  .groupBy("word")
  .select('word, 'frequency.sum as 'cnt)
This group-by aggregation can equivalently be replaced with:
val result = sTableEnv.sqlQuery("select word,sum(frequency) as cnt from Words group by word")
// print to the console
sTableEnv.toRetractStream[(String, Long)](result).print()
2. How does this differ from the result of the following query?
sTableEnv.sqlQuery("select * from Words").toAppendStream[(String, Long)].print()
The difference is clear. Since we are consuming real-time data from Kafka, the Words table is a dynamic stream table to which rows are continuously appended. The first query is a group-by aggregation, so its result has to be continuously updated: if the current result is (hello,4) and another "hello" arrives, the result must be updated to (hello,5), and if a brand-new word arrives, a new row must be inserted. The second query, select * from Words, simply appends new rows to the result. That is why the two are printed to the console differently: the former calls toRetractStream, while the latter calls toAppendStream.
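To make this concrete, here is a minimal sketch of handling the retract stream. It assumes the result table and sTableEnv from the demo above (with its imports in scope); the word counts in the comments are made up for illustration:
// toRetractStream yields (Boolean, row) pairs:
//   true  -> this row is added (an insert, or the new value of an update)
//   false -> this row is retracted (the old value of an update, or a delete)
val retractStream: DataStream[(Boolean, (String, Long))] =
  sTableEnv.toRetractStream[(String, Long)](result)

// For example, when a 5th "hello" arrives the stream would carry
//   (false, ("hello", 4))   -- retract the old count
//   (true,  ("hello", 5))   -- add the updated count
// If downstream only needs the latest values, drop the retractions:
retractStream
  .filter(_._1)   // keep only the "add" messages
  .map(_._2)      // unwrap to (word, count)
  .print()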
There are two modes for converting a Table into a DataStream:
// get TableEnvironment
// registration of a DataSet is equivalent
val tableEnv = TableEnvironment.getTableEnvironment(env)
// Table with two fields (String name, Integer age)
val table: Table = ...
// convert the Table into an append DataStream of Row
val dsRow: DataStream[Row] = tableEnv.toAppendStream[Row](table)
// convert the Table into an append DataStream of Tuple2[String, Int]
val dsTuple: DataStream[(String, Int)] =
tableEnv.toAppendStream[(String, Int)](table)
// convert the Table into a retract DataStream of Row.
// A retract stream of type X is a DataStream[(Boolean, X)].
// The boolean field indicates the type of the change.
// True is INSERT, false is DELETE.
val retractStream: DataStream[(Boolean, Row)] = tableEnv.toRetractStream[Row](table)
// get TableEnvironment
// registration of a DataSet is equivalent
val tableEnv = TableEnvironment.getTableEnvironment(env)
// Table with two fields (String name, Integer age)
val table: Table = ...
// convert the Table into a DataSet of Row
val dsRow: DataSet[Row] = tableEnv.toDataSet[Row](table)
// convert the Table into a DataSet of Tuple2[String, Int]
val dsTuple: DataSet[(String, Int)] = tableEnv.toDataSet[(String, Int)](table)