1. Flink Single-Node Deployment
1) Download the installation package
[root@localhost ~]# wget http://us.mirrors.quenda.co/apache/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz
2) Extract
[root@localhost ~]# tar -zxvf flink-1.9.1-bin-scala_2.11.tgz -C /usr/local
3) Start
# Change into the Flink directory
[root@localhost ~]# cd /usr/local/flink-1.9.1/
# Start the cluster
[root@localhost flink-1.9.1]# ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host localhost.localdomain.
Starting taskexecutor daemon on host localhost.localdomain.
4) Stop
[root@localhost flink-1.9.1]# ./bin/stop-cluster.sh
Stopping taskexecutor daemon (pid: 72657) on host localhost.localdomain.
Stopping standalonesession daemon (pid: 72220) on host localhost.localdomain.
5) Access the web UI
http://192.168.169.128:8081/
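The same port also serves Flink's REST API, so the cluster can be checked from the shell as well as the browser. A quick sketch (the IP is the one used above; adjust it to your own host, and note this only works while the cluster is running):

```shell
# Ask the REST API for a cluster overview (TaskManager count, slots, jobs)
curl http://192.168.169.128:8081/overview
```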
2. Flink Cluster Deployment
Cluster plan:

Hostname | IP address | Role
---|---|---
master | 192.168.56.101 | JobManager
slave1 | 192.168.56.102 | TaskManager
slave2 | 192.168.56.103 | TaskManager

- Installation: same as the single-node setup.
- Configuration
Map the hostnames in /etc/hosts (on all hosts):
192.168.56.101 master
192.168.56.102 slave1
192.168.56.103 slave2
On master:
1) Edit the configuration files
[root@master ~]# vi /usr/local/flink-1.9.1/conf/flink-conf.yaml
# line 33
jobmanager.rpc.address: master
[root@master ~]# vi /usr/local/flink-1.9.1/conf/masters
# edit the masters file
master:8081
2) Edit the slaves file
[root@master ~]# vi /usr/local/flink-1.9.1/conf/slaves
slave1
slave2
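start-cluster.sh reaches the workers over SSH, so without key-based authentication it prompts for each worker's password (as the startup transcript further down shows). A sketch of setting up passwordless SSH from master, assuming the OpenSSH defaults:

```shell
# On master: generate a key pair (no passphrase, default path)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Copy the public key to each worker (prompts for the password once per host)
ssh-copy-id root@slave1
ssh-copy-id root@slave2
```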
- Distribute Flink to the other machines
[root@localhost ~]# scp -r /usr/local/flink-1.9.1/ [email protected]:/usr/local/
The authenticity of host '192.168.56.102 (192.168.56.102)' can't be established.
ECDSA key fingerprint is SHA256:LlxhKBx6zi06K0chAZjVk+ybYoHCn6yi45RGMn6zGPY.
ECDSA key fingerprint is MD5:64:09:e3:c3:5f:3b:b6:f5:01:73:a8:83:6f:e6:bf:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.102' (ECDSA) to the list of known hosts.
[email protected]'s password:
LICENSE
[root@localhost ~]# scp -r /usr/local/flink-1.9.1/ [email protected]:/usr/local/
The authenticity of host '192.168.56.103 (192.168.56.103)' can't be established.
ECDSA key fingerprint is SHA256:PsNlQXJhfQm/DIC0DsYoXpwInVowEwBeUKmVeuJ5RXg.
ECDSA key fingerprint is MD5:ba:8b:33:39:d6:44:79:b0:e2:99:bc:fc:89:57:44:83.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.103' (ECDSA) to the list of known hosts.
[email protected]'s password:
LICENSE
- Start the cluster
[root@master ~]# cd /usr/local/flink-1.9.1/
[root@master flink-1.9.1]# ./bin/start-cluster.sh
Starting cluster.
[INFO] 1 instance(s) of standalonesession are already running on master.
Starting standalonesession daemon on host master.
root@slave1's password:
Starting taskexecutor daemon on host slave1.
root@slave2's password:
Starting taskexecutor daemon on host slave2.
- Stop the cluster
[root@master ~]# cd /usr/local/flink-1.9.1/
[root@master flink-1.9.1]# ./bin/stop-cluster.sh
- Access the web UI
http://192.168.56.101:8081/
3. Running the Bundled Examples
Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/batch/examples.html#running-an-example
Run an example:
./bin/flink run ./examples/batch/WordCount.jar
- Batch example
Source: https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount
[root@master flink-1.9.1]# ./bin/flink run examples/batch/WordCount.jar
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)
(bare,1)
Program execution finished
Job with JobID 47566b035440a9b1789e4cc41652f3f2 has finished.
Job Runtime: 5289 ms
Accumulator Results:
- a20407c3976a9447effcaeb4c8f99b4a (java.util.ArrayList) [170 elements]
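As the log above notes, --input and --output switch the job from the built-in sample data to real files. A sketch (the file paths are examples; any readable text file works):

```shell
# Count words in a file shipped with Flink and write the result to a file
./bin/flink run examples/batch/WordCount.jar \
  --input /usr/local/flink-1.9.1/README.txt \
  --output /tmp/wordcount-result.txt
```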
- Streaming example
Source: https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples
# Install netcat
[root@master flink-1.9.1]# yum install -y nc
# Start an nc server
[root@master ~]# nc -l 9000
# Submit the Flink streaming example program
[root@master flink-1.9.1]# ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname 192.168.56.101 --port 9000
Starting execution of program
# This streaming example from Flink's examples directory reads data from a socket and counts words.
# Type words into the nc session:
[root@master ~]# nc -l 9000
hello world
how are you
are you ok
Check the result:
[root@master flink-1.9.1]# cat log/flink-root-taskexecutor-0-slave2.out
hello : 1
world : 1
how : 1
you : 1
are : 1
are : 1
ok : 1
you : 1
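The totals above can be sanity-checked without Flink: a plain shell pipeline over the same three input lines yields the same per-word counts (the streaming job may split a word's total across several window emissions, as seen for are and you above):

```shell
# Tally words the same way the streaming job does, but in one batch
printf 'hello world\nhow are you\nare you ok\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn
```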
4. A Simple WordCount Implementation
Requirement: real-time word count.
Send data to a port and compute over it in real time.
1. Create a Maven project
2. Maven dependencies
<properties>
    <flink.version>1.9.1</flink.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.10</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-wikiedits_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-nop</artifactId>
        <version>1.7.29</version>
    </dependency>
</dependencies>
Plugin for building an executable jar (optional; not used to build the jar here):
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2.1</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.xtsz.SocketWordCount</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
3. The CustomWordCount.java class
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
/**
 * Requirement: real-time word count.
 * Send data to a port and compute over it in real time.
 */
public class CustomWordCount {
    public static void main(String[] args) throws Exception {
        // 1. Determine the socket port
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No --port argument given, using default 9000");
            port = 9000;
        }
        // 2. Create the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 3. Connect to the socket (host, port, line delimiter) to read the input data
        DataStreamSource<String> dataStreamSource = env.socketTextStream("192.168.56.101", port, "\n");
        // 4. Parse the data and count words, e.g. "hello lz hello world"
        DataStream<Tuple2<String, Integer>> dataStream = dataStreamSource
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                        String[] words = s.split(" ");
                        for (String word : words) {
                            collector.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                // Each line has been turned into (word, 1) tuples; group them by the word field
                .keyBy(0) // identical words go to the same group
                // Window length 5 seconds, evaluated every 1 second
                .timeWindow(Time.seconds(5), Time.seconds(1))
                .sum(1);
        // 5. Print; the parallelism of the print sink can be set here
        dataStream.print().setParallelism(1);
        // 6. Execute. Flink builds the job lazily, so execute() must be called.
        env.execute("streaming word count");
    }
}
- Install netcat on the server
# Install netcat
[root@master flink-1.9.1]# yum install -y nc
# Start nc; 9000 is the port defined in the CustomWordCount class
[root@master flink-1.9.1]# nc -lk 9000
Open a terminal on the VM at 192.168.56.101 and run the command above. It opens a listener on port 9000; the cursor then waits for input.
- Run the main method of the CustomWordCount class
- With the job running, any word typed into nc on the server is picked up in real time by the main method of CustomWordCount and counted.
At the waiting nc prompt in the VM terminal, type:
hello hello world world world world
Then press Enter. The words and their frequencies appear in the IDEA console, which means it worked.
5. Packaging and Running on the Cluster
- Package the project
- Upload it to the server and run it
Start the listener:
# nc: 9000 is the port defined in the CustomWordCount class
[root@master flink-1.9.1]# nc -lk 9000
- Submit the job
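The job is submitted with bin/flink run. A sketch; the jar path and main class below are examples, so substitute whatever your Maven build produced and the fully qualified name of your class:

```shell
# Submit the uploaded jar to the cluster; -c names the entry class
./bin/flink run -c com.xtsz.CustomWordCount \
  /root/flink-wordcount-1.0-jar-with-dependencies.jar --port 9000
```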
Send input data:
[root@master flink-1.9.1]# nc -lk -p 9000
hello
world
hello
world
hello
hello
hello
- Check the result
Note: because this is a distributed cluster, the output does not necessarily appear on the master node; in this run it landed on slave1.
[root@slave1 flink-1.9.1]# tail -f log/*-taskexecutor-*.out
(world,1)
(hello,1)
(hello,1)
(hello,2)
(hello,2)
(hello,3)
(hello,2)
(hello,2)
(hello,1)
(hello,1)
6. Common Problems
- Fixing "-bash: netstat: command not found"
[root@master flink-1.9.1]# yum install net-tools