Prerequisites:
Maven 3.0.4 (or higher)
Java 11
Go to the Flink download page:
https://flink.apache.org/zh/downloads.html
The version used in this article is 1.15.1.
If you don't want to open the page, you can use the direct download link:
https://dlcdn.apache.org/flink/flink-1.15.1/flink-1.15.1-bin-scala_2.12.tgz
The file is about 435.6 MB, so the download takes a while…
Choose Apache Flink 1.15.1 for Scala 2.12.
Note: at the time of writing, the latest release is Apache Flink 1.15.1.
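If you prefer to stay in the terminal, you can fetch the archive with the link above directly (assuming curl is installed; wget works just as well):
$ curl -LO https://dlcdn.apache.org/flink/flink-1.15.1/flink-1.15.1-bin-scala_2.12.tgz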
Extract the archive:
$ tar -xzf flink-1.15.1-bin-scala_2.12.tgz
$ cd flink-1.15.1
Start a local cluster:
$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host.
Starting taskexecutor daemon on host.
Check that Flink is running by opening the web UI:
http://localhost:8081/
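The same information is also exposed through Flink's monitoring REST API (assuming the default REST port 8081), which is handy for a quick scripted check:
$ curl http://localhost:8081/overview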
To stop the local cluster when you are done:
$ ./bin/stop-cluster.sh
Note: Flink must be running when you execute the programs below.
Create the Maven project directly from the command line (recommended):
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.15.0 \
  -DgroupId=flink-project \
  -DartifactId=flink-project \
  -Dversion=0.1 \
  -Dpackage=myflink \
  -DinteractiveMode=false
This generates a flink-project/ directory.
The generated pom.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>flink-project</groupId>
  <artifactId>flink-project</artifactId>
  <version>0.1</version>
  <packaging>jar</packaging>

  <name>Flink Quickstart Job</name>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <flink.version>1.15.0</flink.version>
    <target.java.version>1.8</target.java.version>
    <scala.binary.version>2.12</scala.binary.version>
    <maven.compiler.source>${target.java.version}</maven.compiler.source>
    <maven.compiler.target>${target.java.version}</maven.compiler.target>
    <log4j.version>2.17.1</log4j.version>
  </properties>

  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>${log4j.version}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>${log4j.version}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>${log4j.version}</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Java compiler -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>${target.java.version}</source>
          <target>${target.java.version}</target>
        </configuration>
      </plugin>

      <!-- Build a fat jar containing the job and its non-provided dependencies -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
              <artifactSet>
                <excludes>
                  <exclude>org.apache.flink:flink-shaded-force-shading</exclude>
                  <exclude>com.google.code.findbugs:jsr305</exclude>
                  <exclude>org.slf4j:*</exclude>
                  <exclude>org.apache.logging.log4j:*</exclude>
                </excludes>
              </artifactSet>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>myflink.DataStreamJob</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>

    <pluginManagement>
      <plugins>
        <!-- Keep Eclipse m2e from complaining about the shade/compile goals -->
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <versionRange>[3.1.1,)</versionRange>
                    <goals><goal>shade</goal></goals>
                  </pluginExecutionFilter>
                  <action><ignore/></action>
                </pluginExecution>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <versionRange>[3.1,)</versionRange>
                    <goals>
                      <goal>testCompile</goal>
                      <goal>compile</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action><ignore/></action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
Next, add a simple word-count program to the project.
WordCount.java:
package myflink;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
/**
* @author ximu
* @date 2022/7/24
* @description
*/
public class WordCount {
//
// Program
//
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// get input data
DataSet<String> text = env.fromElements(
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,"
);
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new LineSplitter())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0) //(i,1) (am,1) (chinese,1)
.sum(1);
// execute and print result
counts.print();
}
//
// User Functions
//
/**
* Implements the string tokenizer that splits sentences into words as a user-defined
* FlatMapFunction. The function takes a line (String) and splits it into
* multiple pairs in the form of "(word,1)" (Tuple2<String, Integer>).
*/
public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<>(token, 1));
}
}
}
}
}
Run it: the job prints the (word, count) tuples to the console.
If running it fails with
Error: Could not initialize main class myflink.WordCount
this is because the Flink dependencies are declared with provided scope, so they are missing from the classpath when the class is launched directly. Edit pom.xml and find:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>
and change the scope to compile:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients</artifactId>
  <version>${flink.version}</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java</artifactId>
  <version>${flink.version}</version>
  <scope>compile</scope>
</dependency>
With that change, the program runs.
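A side note on the API: WordCount above uses the DataSet API, which has been soft-deprecated since Flink 1.12 in favor of the unified DataStream API. Purely as an illustration (the class name DataStreamWordCount and the batch runtime-mode switch are my own choices, not part of the generated project), the same word count could be written roughly like this with the DataStream API:

package myflink;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class DataStreamWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The input is bounded, so run in batch mode and emit one final count per word
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        DataStream<Tuple2<String, Integer>> counts = env
                .fromElements(
                        "To be, or not to be,--that is the question:--",
                        "Whether 'tis nobler in the mind to suffer")
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                        // normalize, split into words, and emit (word, 1) pairs
                        for (String token : line.toLowerCase().split("\\W+")) {
                            if (token.length() > 0) {
                                out.collect(new Tuple2<>(token, 1));
                            }
                        }
                    }
                })
                .keyBy(value -> value.f0)
                .sum(1);
        counts.print();
        env.execute("DataStream WordCount");
    }
}

In BATCH runtime mode the bounded input produces one final count per word instead of a stream of incremental updates.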
Next, add a windowed word-count program.
WindowWordCount.java:
package myflink;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
public class WindowWordCount {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Tuple2<String, Integer>> dataStream = env
.socketTextStream("localhost", 9999)
.flatMap(new Splitter())
.keyBy(value -> value.f0)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.sum(1);
dataStream.print();
env.execute("Window WordCount");
}
public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) throws Exception {
for (String word: sentence.split(" ")) {
out.collect(new Tuple2<>(word, 1));
}
}
}
}
Don't run it right away: the program reads whatever is typed into local port 9999, so you first need to open port 9999.
In a terminal, run:
nc -lk 9999
nc is now waiting for input; at this point, start the WindowWordCount program.
Once the program is up, type some text into the terminal,
and the program prints each word together with the number of times it appeared.
The example uses a 5-second tumbling processing-time window; the window size can be adjusted via the parameter:
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
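TumblingProcessingTimeWindows produces non-overlapping (tumbling) windows. If you want windows that actually overlap, say a 10-second window evaluated every 5 seconds, you could swap in SlidingProcessingTimeWindows; a small sketch of the change (the sizes are just examples):

import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;

// 10-second window, sliding every 5 seconds
.window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))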
To submit the job to the local cluster, first package the project into a jar:
mvn clean package -Dmaven.test.skip=true
Then, from the Flink distribution directory, run:
bin/flink run -c <fully qualified main class> <absolute path to the jar>
For example:
bin/flink run -c myflink.WordCount /Users/ximu/Project/Java/flink-project/target/flink-project-0.1.jar
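After submitting, the job shows up in the web UI at http://localhost:8081/; as a quick check you can also list jobs from the CLI:
bin/flink list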
That wraps up this first exploration of Flink. Feel free to leave me a comment~