An article from Alibaba is worth reading first; it gives a rough intuition for what a dynamic table is.
A dynamic table is a table that changes continuously as a stream evolves. In Alibaba's example, whenever the Stream table changes, the Keyed Table derived from it changes as well. The Stream table is append-only: records can only be added, never retracted. The Keyed Table, by contrast, updates its count values as data arrives in the Stream; each change of a count value in turn changes the second table, which is keyed by count. That second table is the result we actually want. The first table is only an intermediate step, but without it the second table could not be produced.
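The two chained aggregations described above (word → count, then count → frequency) can be sketched in plain Java, with no Flink involved. This is only an illustration of the update/retract bookkeeping a dynamic table performs; the class and method names here are mine, not Flink API:

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicTableSketch {
    // first "table": word -> count
    final Map<String, Integer> counts = new HashMap<>();
    // second "table": count -> how many words currently have that count
    final Map<Integer, Integer> frequencies = new HashMap<>();

    // process one incoming (word, 1) record, updating both tables
    public void add(String word) {
        int oldCount = counts.getOrDefault(word, 0);
        int newCount = oldCount + 1;
        counts.put(word, newCount);
        // retract the word's contribution to its old count bucket
        if (oldCount > 0) {
            frequencies.merge(oldCount, -1, Integer::sum);
            if (frequencies.get(oldCount) == 0) {
                frequencies.remove(oldCount);
            }
        }
        // insert its contribution to the new count bucket
        frequencies.merge(newCount, 1, Integer::sum);
    }
}
```

Feeding in `hello, word, hello, flink` leaves `counts = {hello=2, word=1, flink=1}` and `frequencies = {1=2, 2=1}`: exactly the retract-then-insert behavior the second table needs.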
Alibaba's first example, implemented in Java:
package com.yjp.flink.retraction;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class RetractionITCase {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.getTableEnvironment(env);
        env.getConfig().disableSysoutLogging();

        DataStream<Tuple2<String, Integer>> dataStream =
                env.fromElements(
                        new Tuple2<>("hello", 1),
                        new Tuple2<>("word", 1),
                        new Tuple2<>("hello", 1),
                        new Tuple2<>("bark", 1),
                        new Tuple2<>("bark", 1),
                        new Tuple2<>("bark", 1),
                        new Tuple2<>("bark", 1),
                        new Tuple2<>("bark", 1),
                        new Tuple2<>("bark", 1),
                        new Tuple2<>("flink", 1)
                );

        tEnv.registerDataStream("demo1", dataStream, "word, num");

        // first aggregation: count per word; second: how many words share each count
        Table table = tEnv.sqlQuery("select * from demo1")
                .groupBy("word")
                .select("word AS word, num.sum AS count")
                .groupBy("count")
                .select("count, word.count AS frequency");

        tEnv.toRetractStream(table, Word.class).print();
        env.execute("demo");
    }
}
package com.yjp.flink.retraction;

public class Word {

    private Integer count;
    private Long frequency;

    public Word() {
    }

    public Integer getCount() {
        return count;
    }

    public void setCount(Integer count) {
        this.count = count;
    }

    public Long getFrequency() {
        return frequency;
    }

    public void setFrequency(Long frequency) {
        this.frequency = frequency;
    }

    @Override
    public String toString() {
        return "Word{" +
                "count=" + count +
                ", frequency=" + frequency +
                '}';
    }
}
Result:
2> (true,Word{count=1, frequency=1})
2> (false,Word{count=1, frequency=1})
2> (true,Word{count=1, frequency=2})
4> (true,Word{count=3, frequency=1})
4> (false,Word{count=3, frequency=1})
4> (true,Word{count=4, frequency=1})
4> (false,Word{count=4, frequency=1})
2> (false,Word{count=1, frequency=2})
2> (true,Word{count=1, frequency=3})
2> (false,Word{count=1, frequency=3})
3> (true,Word{count=6, frequency=1})
1> (true,Word{count=2, frequency=1})
1> (false,Word{count=2, frequency=1})
1> (true,Word{count=5, frequency=1})
1> (false,Word{count=5, frequency=1})
1> (true,Word{count=2, frequency=1})
2> (true,Word{count=1, frequency=2})
2> (false,Word{count=1, frequency=2})
2> (true,Word{count=1, frequency=3})
2> (false,Word{count=1, frequency=3})
2> (true,Word{count=1, frequency=2})
Reading the output against the expected result: we want (6,1) for the six "bark" records, (2,1) for the two "hello" records, and (1,2) for "word" and "flink".
Lines with the same leading number come from the same parallel subtask and form one group of operations. `true` marks an insert and `false` a retraction; a matching true/false pair cancels out. After cancellation, the remaining rows are exactly the result we expected. Alibaba's article already explains what goes wrong when there is no retraction.
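The cancellation just described can be made concrete: a retract stream is materialized back into a table by applying each (flag, row) message in order, where `true` inserts the row and `false` removes one earlier occurrence of it. A minimal plain-Java sketch (the class name is mine, not Flink API):

```java
import java.util.ArrayList;
import java.util.List;

public class RetractApplier {
    private final List<String> table = new ArrayList<>();

    // apply one message from a retract stream:
    // insert == true adds the row, insert == false cancels one earlier insert of the same row
    public void apply(boolean insert, String row) {
        if (insert) {
            table.add(row);
        } else {
            table.remove(row);
        }
    }

    public List<String> rows() {
        return table;
    }
}
```

Replaying a prefix of the printed messages, e.g. `(true, count=1 frequency=1)`, `(false, count=1 frequency=1)`, `(true, count=1 frequency=2)`, leaves only `count=1 frequency=2` in the table, which is how the true/false pairs cancel.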
Now look at Alibaba's second example. The natural question there is how the StringLast function should be implemented; a Java version follows.
package com.yjp.flink.retract;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class ALiTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.getTableEnvironment(env);
        env.getConfig().disableSysoutLogging();

        DataStreamSource<Tuple3<String, String, Long>> dataStream = env.fromElements(
                new Tuple3<>("0001", "中通", 1L),
                new Tuple3<>("0002", "中通", 2L),
                new Tuple3<>("0003", "圆通", 3L),
                new Tuple3<>("0001", "圆通", 4L)
        );

        tEnv.registerDataStream("Ali", dataStream, "order_id, company, timestamp");
        tEnv.registerFunction("agg", new AliAggrete());

        // first: latest company per order; second: order count per company
        Table table = tEnv.sqlQuery("select * from Ali")
                .groupBy("order_id")
                .select("order_id, agg(company, timestamp) AS company")
                .groupBy("company")
                .select("company, order_id.count AS order_cnt");

        tEnv.toRetractStream(table, ALi.class).print();
        env.execute("ALi");
    }
}
package com.yjp.flink.retract;

import org.apache.flink.table.functions.AggregateFunction;

public class AliAggrete extends AggregateFunction<String, ALiAccum> {

    @Override
    public ALiAccum createAccumulator() {
        return new ALiAccum();
    }

    @Override
    public String getValue(ALiAccum aLiAccum) {
        return aLiAccum.company;
    }

    // update the accumulator: keep only the record with the largest timestamp
    public void accumulate(ALiAccum aLiAccum, String company, Long time) {
        if (time > aLiAccum.timeStamp) {
            aLiAccum.company = company;
            aLiAccum.timeStamp = time;
        }
    }

    // public void retract(ALiAccum aLiAccum, String company, Long time) {
    //     aLiAccum.company = company;
    //     aLiAccum.timeStamp = time;
    // }

    // public void resetAccumulator(ALiAccum aLiAccum) {
    //     aLiAccum.company = null;
    //     aLiAccum.timeStamp = 0L;
    // }

    // public void merge(ALiAccum acc, Iterable<ALiAccum> it) {
    //     for (ALiAccum aLiAccum : it) {
    //         if (aLiAccum.timeStamp > acc.timeStamp) {
    //             acc.company = aLiAccum.company;
    //             acc.timeStamp = aLiAccum.timeStamp;
    //         }
    //     }
    // }
}
package com.yjp.flink.retract;

public class ALiAccum {
    public String company = null;
    public Long timeStamp = 0L;
}
package com.yjp.flink.retract;

public class ALi {

    private String company;
    private Long order_cnt;

    public ALi() {
    }

    public String getCompany() {
        return company;
    }

    public void setCompany(String company) {
        this.company = company;
    }

    public Long getOrder_cnt() {
        return order_cnt;
    }

    public void setOrder_cnt(Long order_cnt) {
        this.order_cnt = order_cnt;
    }

    @Override
    public String toString() {
        return "ALi{" +
                "company='" + company + '\'' +
                ", order_cnt=" + order_cnt +
                '}';
    }
}
That is the complete Java implementation of Alibaba's second example. The explicit timestamp field is not strictly necessary, because every record already carries a timestamp when it enters the stream. But if out-of-order data is a concern, you need your own event-time field; if ordering does not matter, the built-in timestamp is enough.
Walking through the logic of the code:

tEnv.registerFunction("agg", new AliAggrete());

registers our custom aggregate function. Table table = tEnv.sqlQuery("select * from Ali") turns the stream into the first Stream table. .groupBy("order_id").select("order_id, agg(company, timestamp) AS company") groups by order id, so records with the same order id land in the same group, and our custom aggregate function then emits only the record with the largest timestamp. As for how that works: the ALiAccum class holds the mapping between the company and timestamp fields, and then AggregateFunction
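The core of the accumulator logic above, keeping the company with the largest timestamp, can be exercised in plain Java without a Flink runtime. This is a sketch only; the inner Accum class mirrors ALiAccum, and the method mirrors AliAggrete.accumulate:

```java
public class LastValueSketch {

    // mirrors the ALiAccum accumulator class
    static class Accum {
        String company = null;
        long timeStamp = 0L;
    }

    // mirrors AliAggrete.accumulate: keep only the record with the largest timestamp
    static void accumulate(Accum acc, String company, long time) {
        if (time > acc.timeStamp) {
            acc.company = company;
            acc.timeStamp = time; // must advance, or later comparisons use a stale value
        }
    }
}
```

Note that the timestamp in the accumulator must be advanced along with the company; if only the company were updated, an out-of-order record with a smaller timestamp arriving later would still overwrite the newer value.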