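Flink is a very popular stream processing framework. What are its characteristics, and how is it used in production? This post introduces Flink along the following lines:
First, Flink's application scenarios
Second, Flink's key characteristics
Third, how to quickly run a job, and what it looks like at runtime
Fourth, how Flink is implemented internally
With these questions in mind, let's start learning Flink.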
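In practice there are many stream processing frameworks to choose from. Each has its strengths, but each also has clear drawbacks compared with Flink.
Flink is a stream processing framework. A similar system is Spark, which also runs on the JVM and was the popular choice before Flink. Spark is built on the RDD computation model and is, at its core, a batch processing system.
Another option is an in-memory stream processing system.
There is also a stream processing framework written in Go, which makes it easy to put an architecture together.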
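- Master-worker architecture
Flink is split into the JobManager (JM) and the TaskManagers (TM). The JM accepts jobs submitted by the client, distributes the tasks, and coordinates checkpoints across the TMs so that distributed state stays consistent.
With event time and watermarks, Flink tracks how far event time has progressed in the stream, so records that arrive out of order are still assigned to the correct window.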
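To make the watermark idea concrete, here is a minimal sketch in plain Java. It is a simplified model, not Flink's own classes: the class name `WatermarkSketch` and its methods are made up for illustration. It assumes a bounded out-of-orderness strategy, where the watermark trails the largest event time seen so far by a fixed delay, and a window [start, end) may fire once the watermark reaches end - 1.

```java
/** Simplified model of a bounded-out-of-orderness watermark (hypothetical class, not Flink API). */
final class WatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Record the event time of an incoming element; late elements do not move the maximum back. */
    void onEvent(long eventTimeMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimeMs);
    }

    /** The watermark says: no element with a timestamp at or below this value is expected any more. */
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return maxTimestampSeen - maxOutOfOrdernessMs - 1;
    }

    /** A window [start, end) may fire once the watermark has reached its last timestamp, end - 1. */
    boolean canFire(long windowEndMs) {
        return currentWatermark() >= windowEndMs - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(2_000);
        long[] eventTimes = {1_000, 4_000, 3_000, 7_500}; // 3_000 arrives out of order
        for (long t : eventTimes) {
            wm.onEvent(t);
            System.out.println("event=" + t
                    + " watermark=" + wm.currentWatermark()
                    + " window[0,5000) can fire=" + wm.canFire(5_000));
        }
    }
}
```

The out-of-order element at 3000 does not hold the window open by itself; what matters is that the window [0, 5000) stays open until the watermark passes 4999, so the element still lands in it.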
We'll start with a WordCount example.
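<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>flinktest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.13.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>1.13.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>1.13.1</version>
        </dependency>
    </dependencies>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>
</project>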
package com.demo;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Port of the socket to read from
        int port;
        try {
            // ParameterTool expects key-value arguments, e.g. --port 9000
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            // The message reads: "no port argument given, using the default 9000"
            System.err.println("没有指定port参数,使用默认值9000");
            port = 9000;
        }

        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Marked deprecated in Flink 1.13.1; kept here as in the original example
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

        // Connect to the socket and read the input, one record per line
        DataStreamSource<String> text = env.socketTextStream("127.0.0.1", port, "\n");

        // Compute the counts
        DataStream<WordWithCount> windowCount = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                        String[] splits = value.split("\\s");
                        for (String word : splits) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                }) // Flatten: turn the words of each line into <word, count> records
                .keyBy("word") // Group records with the same word (string-based keyBy is deprecated in 1.13.1)
                .timeWindow(Time.seconds(2), Time.seconds(1)) // Window size 2s, slide interval 1s (deprecated in 1.13.1)
                .sum("count");

        // Print the results to the console
        windowCount.print()
                .setParallelism(1); // Use a parallelism of 1 for the sink

        // Note: Flink builds the job lazily, so nothing above runs until execute() is called
        env.execute("streaming word count");
    }

    /**
     * Holds a word and the number of times it occurred.
     */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
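The sliding window in this job (size 2s, slide 1s) means every record belongs to two overlapping windows, so a word typed once contributes to two results. The following plain-Java sketch shows how a timestamp maps to sliding windows. It is an illustration under simplifying assumptions (no window offset, millisecond timestamps); `SlidingWindowSketch` and `windowStarts` are hypothetical names, not part of Flink.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sliding-window assignment (hypothetical helper, not Flink API). */
final class SlidingWindowSketch {

    /** Returns the start times of all windows [start, start + sizeMs) that contain timestampMs. */
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is at or before the timestamp, aligned to the slide interval
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = 2_000;  // matches Time.seconds(2)
        long slide = 1_000; // matches Time.seconds(1)
        for (long ts : new long[] {2_500, 3_000}) {
            System.out.println("timestamp " + ts + " -> window starts " + windowStarts(ts, size, slide));
        }
    }
}
```

With these settings each timestamp falls into size / slide = 2 windows. For example, 2500 ms belongs to [2000, 4000) and [1000, 3000).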
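The web UI offers a rich set of features,
including the logs of each subtask in a TaskManager, log lists grouped by log type, backpressure monitoring, and metric charts. For details see http://www.54tianzhisheng.cn/2019/02/28/blink/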
(base) /usr/local/Cellar/apache-flink/1.13.1/libexec/bin flink --version
Version: 1.13.1, Commit ID: a7f3192
(base) /usr/local/Cellar/apache-flink/1.13.1/libexec/bin /usr/local/Cellar/apache-flink/1.13.1/libexec/bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host localhost.
Starting taskexecutor daemon on host localhost.
(base) ~/go/src/codes/javacodes/flinktest mvn clean package -Dmaven.test.skip=true
[INFO] Scanning for projects...
[INFO]
[INFO] -----------------------< org.example:flinktest >------------------------
[INFO] Building flinktest 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ flinktest ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ flinktest ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 0 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ flinktest ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Compiling 2 source files to /Users/sunyuchuan/go/src/codes/javacodes/flinktest/target/classes
[WARNING] /Users/sunyuchuan/go/src/codes/javacodes/flinktest/src/main/java/com/demo/WordCount.java: /Users/sunyuchuan/go/src/codes/javacodes/flinktest/src/main/java/com/demo/WordCount.java使用或覆盖了已过时的 API。
[WARNING] /Users/sunyuchuan/go/src/codes/javacodes/flinktest/src/main/java/com/demo/WordCount.java: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ flinktest ---
[INFO] Not copying test resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ flinktest ---
[INFO] Not compiling test sources
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ flinktest ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ flinktest ---
[INFO] Building jar: /Users/sunyuchuan/go/src/codes/javacodes/flinktest/target/flinktest-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.350 s
[INFO] Finished at: 2021-08-01T17:07:26+08:00
[INFO] ------------------------------------------------------------------------
Next, we open a listening port:
nc -l 9000
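Then we submit the job. Note that the program reads the port from a --port argument, so the positional arguments 127.0.0.1 9000 in the command below are not picked up. The job falls back to the default port 9000, which happens to be the port we are listening on.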
/usr/local/Cellar/apache-flink/1.13.1/libexec/bin/flink run -c com.demo.WordCount /Users/sunyuchuan/go/src/codes/javacodes/flinktest/target/flinktest-1.0-SNAPSHOT.jar 127.0.0.1 9000
(base) ~/go/src/codes/javacodes/flinktest/src/main/java/com/demo /usr/local/Cellar/apache-flink/1.13.1/libexec/bin/flink run -c com.demo.WordCount /Users/sunyuchuan/go/src/codes/javacodes/flinktest/target/flinktest-1.0-SNAPSHOT.jar 127.0.0.1 9000
没有指定port参数,使用默认值9000
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/usr/local/Cellar/apache-flink/1.13.1/libexec/lib/flink-dist_2.12-1.13.1.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Job has been submitted with JobID e452e3e05a3d646a2c19e1bad54d4d41
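The "no port argument" message at the top of this output appears because the arguments were passed positionally rather than as a `--port 9000` pair. The sketch below mimics that behavior in plain Java. It is a deliberately simplified stand-in for ParameterTool (the real class accepts more forms, such as flags without values), and `ArgsSketch`, `parse`, and `portOrDefault` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified key-value argument parsing with a default fallback (hypothetical, not Flink's ParameterTool). */
final class ArgsSketch {

    /** Parses "--key value" or "-key value" pairs and rejects anything else. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String key;
            if (args[i].startsWith("--")) {
                key = args[i].substring(2);
            } else if (args[i].startsWith("-")) {
                key = args[i].substring(1);
            } else {
                throw new IllegalArgumentException("Keys must be prefixed with -- or -: " + args[i]);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            params.put(key, args[i + 1]);
        }
        return params;
    }

    /** Mirrors the try/catch in WordCount: any parsing problem falls back to the default port. */
    static int portOrDefault(String[] args, int defaultPort) {
        try {
            return Integer.parseInt(parse(args).get("port"));
        } catch (RuntimeException e) {
            return defaultPort;
        }
    }

    public static void main(String[] args) {
        System.out.println(portOrDefault(new String[] {"127.0.0.1", "9001"}, 9000)); // positional: default is used
        System.out.println(portOrDefault(new String[] {"--port", "9001"}, 9000));    // key-value: 9001 is used
    }
}
```

To run the job on a different port, the arguments after the jar path would therefore need to be written as `--port <number>`.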
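At this point a running job appears.
The messages are then printed to stdout.
Flink's log directory on the local machine looks like this. It contains separate logs for the client, the JM, and the TM: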
(base) ~ ll /usr/local/Cellar/apache-flink/1.13.1/libexec/log
total 152
-rw-r--r-- 1 sunyuchuan admin 5.4K 8 1 01:28 flink-sunyuchuan-client-DG.local.log
-rw-r--r-- 1 sunyuchuan admin 6.7K 8 1 17:13 flink-sunyuchuan-client-localhost.log
-rw-r--r-- 1 sunyuchuan admin 23K 8 1 17:13 flink-sunyuchuan-standalonesession-0-localhost.log
-rw-r--r-- 1 sunyuchuan admin 615B 8 1 17:00 flink-sunyuchuan-standalonesession-0-localhost.out
-rw-r--r-- 1 sunyuchuan admin 25K 8 1 17:13 flink-sunyuchuan-taskexecutor-0-localhost.log
-rw-r--r-- 1 sunyuchuan admin 897B 8 1 17:14 flink-sunyuchuan-taskexecutor-0-localhost.out
Now let's look at the output:
(base) ✘ ~ tailf /usr/local/Cellar/apache-flink/1.13.1/libexec/log/flink-sunyuchuan-taskexecutor-0-localhost.out
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
WordWithCount{word='', count=1}
WordWithCount{word='', count=1}
WordWithCount{word='hello', count=1}
WordWithCount{word='world', count=1}
WordWithCount{word='abc', count=1}
WordWithCount{word='hello', count=1}
WordWithCount{word='abc', count=1}
WordWithCount{word='world', count=1}
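The records with an empty word in this output most likely come from how the job tokenizes its input: splitting on a single whitespace character turns an empty line, or two spaces in a row, into an empty token. The sketch below compares the job's split with a variant that drops empty tokens. The class and method names are hypothetical, and the variant is one possible fix rather than the original author's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Compares the tokenization used in WordCount with a variant that skips empty tokens. */
final class TokenizeSketch {

    /** Same split as the job: every single whitespace character is a delimiter. */
    static List<String> splitLikeJob(String line) {
        return Arrays.asList(line.split("\\s"));
    }

    /** Variant: trim the line, split on runs of whitespace, and drop empty tokens. */
    static List<String> splitSkippingEmpty(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"", "hello  world"}) {
            System.out.println("job split:     " + splitLikeJob(line));
            System.out.println("skipping empty: " + splitSkippingEmpty(line));
        }
    }
}
```

Using the second variant inside flatMap would keep empty words out of the counts.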
Reference: https://blog.csdn.net/daska110/article/details/119357742
mvn clean package