Big Data - Flink - Stream Processing (II)

I. Flink single-node installation and deployment

1) Download the installation package

[root@localhost ~]# wget http://us.mirrors.quenda.co/apache/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz

2) Extract the archive

[root@localhost ~]# tar -zxvf flink-1.9.1-bin-scala_2.11.tgz -C /usr/local

3) Start

# change into the install directory
[root@localhost ~]# cd /usr/local/flink-1.9.1/
# start the cluster
[root@localhost flink-1.9.1]# ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host localhost.localdomain.
Starting taskexecutor daemon on host localhost.localdomain.

4) Stop

[root@localhost flink-1.9.1]# ./bin/stop-cluster.sh
Stopping taskexecutor daemon (pid: 72657) on host localhost.localdomain.
Stopping standalonesession daemon (pid: 72220) on host localhost.localdomain.

5) Access the web UI
http://192.168.169.128:8081/

(Screenshot: web UI)

II. Flink cluster installation and deployment

Cluster plan:

Hostname   IP address       Role
master     192.168.56.101   JobManager
slave1     192.168.56.102   TaskManager
slave2     192.168.56.103   TaskManager
  1. Install the same way as the single-node setup, on every machine.
  2. Configure name resolution: on all hosts, add the following hostname mappings (typically in /etc/hosts):
192.168.56.101 master
192.168.56.102 slave1
192.168.56.103 slave2

On the master node:

1) Edit the configuration files
[root@master ~]# vi /usr/local/flink-1.9.1/conf/flink-conf.yaml 
# line 33
jobmanager.rpc.address: master

[root@master ~]# vi /usr/local/flink-1.9.1/conf/masters 
# the masters file
master:8081

2) Edit the slaves file

[root@master ~]# vi /usr/local/flink-1.9.1/conf/slaves 
slave1
slave2

  3. Distribute Flink to the other machines
[root@localhost ~]# scp -r /usr/local/flink-1.9.1/ [email protected]:/usr/local/
The authenticity of host '192.168.56.102 (192.168.56.102)' can't be established.
ECDSA key fingerprint is SHA256:LlxhKBx6zi06K0chAZjVk+ybYoHCn6yi45RGMn6zGPY.
ECDSA key fingerprint is MD5:64:09:e3:c3:5f:3b:b6:f5:01:73:a8:83:6f:e6:bf:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.102' (ECDSA) to the list of known hosts.
[email protected]'s password: 
LICENSE   

[root@localhost ~]# scp -r /usr/local/flink-1.9.1/ [email protected]:/usr/local/
The authenticity of host '192.168.56.103 (192.168.56.103)' can't be established.
ECDSA key fingerprint is SHA256:PsNlQXJhfQm/DIC0DsYoXpwInVowEwBeUKmVeuJ5RXg.
ECDSA key fingerprint is MD5:ba:8b:33:39:d6:44:79:b0:e2:99:bc:fc:89:57:44:83.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.103' (ECDSA) to the list of known hosts.
[email protected]'s password: 
LICENSE 

  4. Start the cluster
[root@master ~]# cd /usr/local/flink-1.9.1/
[root@master flink-1.9.1]# ./bin/start-cluster.sh
Starting cluster.
[INFO] 1 instance(s) of standalonesession are already running on master.
Starting standalonesession daemon on host master.
root@slave1's password: 
Starting taskexecutor daemon on host slave1.
root@slave2's password: 
Starting taskexecutor daemon on host slave2.

  5. Stop the cluster
[root@master ~]# cd /usr/local/flink-1.9.1/
[root@master flink-1.9.1]# ./bin/stop-cluster.sh

  6. Access the web UI
    http://192.168.56.101:8081/

(Screenshots: node overview, job management, theme toggle)

III. Running the bundled examples

Documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/batch/examples.html#running-an-example
Run an example:

./bin/flink run ./examples/batch/WordCount.jar
  1. Batch example
    Source: https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount
[root@master flink-1.9.1]# ./bin/flink run examples/batch/WordCount.jar
Starting execution of program
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)
(bare,1)
Program execution finished
Job with JobID 47566b035440a9b1789e4cc41652f3f2 has finished.
Job Runtime: 5289 ms
Accumulator Results: 
- a20407c3976a9447effcaeb4c8f99b4a (java.util.ArrayList) [170 elements]

(Screenshot: job finished)
  2. Streaming example
    Source: https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples
# install netcat
[root@master flink-1.9.1]# yum install -y nc
# start an nc listener
[root@master ~]# nc -l 9000

# submit Flink's streaming example program
[root@master flink-1.9.1]# ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname 192.168.56.101 --port 9000
Starting execution of program

# This is the streaming example shipped under Flink's examples directory:
# it receives text over the socket and counts words.
# Type words on the nc side:
[root@master ~]# nc -l 9000
hello world
how are you
are you ok

Check the result:

[root@master flink-1.9.1]# cat log/flink-root-taskexecutor-0-slave2.out 
hello : 1
world : 1
how : 1
you : 1
are : 1
are : 1
ok : 1
you : 1

IV. A simple WordCount implementation

Requirement: a real-time word count.
Send data to a port and have it computed as it arrives.

  1. Create a Maven project

(Screenshot: creating the Maven project)

  2. Maven dependencies

<properties>
    <flink.version>1.9.1</flink.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.10</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-wikiedits_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-nop</artifactId>
        <version>1.7.29</version>
    </dependency>
</dependencies>
Plugin for building an executable jar (not used here; optional):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.2.1</version>
    <configuration>
        <archive>
            <manifest>
                <mainClass>com.xtsz.SocketWordCount</mainClass>
            </manifest>
        </archive>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
  3. The CustomWordCount.java class
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Requirement: a real-time word count.
 * Send data to a port and have it computed as it arrives.
 */
public class CustomWordCount {

    public static void main(String[] args) throws Exception {
        // 1. Determine the socket port
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No --port argument given; using default 9000");
            port = 9000;
        }
        // 2. Create the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 3. Connect to the socket (host, port, line delimiter) to read the input
        DataStreamSource<String> dataStreamSource = env.socketTextStream("192.168.56.101", port, "\n");

        // 4. Parse the data and count words, e.g. "hello lz hello world"
        DataStream<Tuple2<String, Integer>> dataStream = dataStreamSource
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                        String[] words = s.split(" ");
                        for (String word : words) {
                            collector.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                // group by the word: each line has become (word, 1) tuples
                .keyBy(0)
                // sliding window: 5-second window size, evaluated every 1 second
                .timeWindow(Time.seconds(5), Time.seconds(1))
                .sum(1);

        // 5. Print the result; the parallelism of the sink can be set here
        dataStream.print().setParallelism(1);

        // 6. Run the job: Flink builds the plan lazily, so execute() must be called
        env.execute("streaming word count");
    }
}
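The FlatMapFunction above just splits each line on spaces and emits (word, 1) pairs that sum(1) adds up per key. As a rough sketch, the same tokenize-and-count logic can be checked in isolation without starting a cluster; the WordCountLogic class and its count method are hypothetical helpers written for this illustration, not part of the project:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper mirroring the flatMap + sum logic of CustomWordCount,
// so the per-line tokenization can be verified without a Flink cluster.
public class WordCountLogic {

    // Split a line on single spaces and accumulate a count per word,
    // just as the FlatMapFunction emits (word, 1) and sum(1) adds them up.
    static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("hello hello world");
        System.out.println(c.get("hello") + " " + c.get("world")); // prints "2 1"
    }
}
```

Note that splitting on a single space means consecutive spaces produce empty-string "words", which the Flink job above would count as well.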

  4. Install netcat on the server

# install netcat
[root@master flink-1.9.1]# yum install -y nc

# use nc; 9000 is the port defined in CustomWordCount
[root@master flink-1.9.1]# nc -lk -p 9000

Open a terminal on the VM at 192.168.56.101 and run the command above. It opens a listener on port 9000, and the cursor then waits for input as shown below.

(Screenshot: port listening)
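socketTextStream reads plain newline-delimited text over TCP, which is exactly what nc sends. As an illustration only (the NcLikeDemo class and its roundTrip method are made up for this sketch), here is a self-contained stand-in: a tiny server writes lines terminated by '\n' and a client reads them back, the same wire format the Flink job consumes:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stand-in for nc: a server writes newline-terminated lines, a client reads
// them back - the format socketTextStream(host, port, "\n") expects.
public class NcLikeDemo {

    // Send the given lines through a local TCP socket and read them back.
    static List<String> roundTrip(String... lines) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            Thread writer = new Thread(() -> {
                try (Socket s = server.accept();
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8))) {
                    for (String line : lines) {
                        out.print(line + "\n"); // explicit '\n' delimiter, as nc sends
                    }
                    out.flush();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            writer.start();
            List<String> received = new ArrayList<>();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) { // read until server closes
                    received.add(line);
                }
            }
            writer.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello world", "how are you")); // prints [hello world, how are you]
    }
}
```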
  5. Run the main method of CustomWordCount

(Screenshot: running the CustomWordCount class)
  6. Now type words into nc on the server; the running main method of CustomWordCount picks them up and processes them in real time.
    At the waiting cursor in the VM terminal, type:

hello hello world world world world

Then press Enter. The IDEA console prints the words with their counts, which confirms it works.

(Screenshots: input and result)

V. Packaging, uploading, and running on the cluster

  1. Build the jar

(Screenshots: packaging steps, generated jar, build result)
  2. Upload the jar to the server

(Screenshots: upload and upload success)
  3. Start the listener

# use nc; 9000 is the port defined in CustomWordCount
[root@master flink-1.9.1]# nc -lk -p 9000
  4. Submit the job

(Screenshots: submitting the job, job list, job running)
  5. Send input

[root@master flink-1.9.1]#  nc -lk -p 9000
hello
world
hello
world
hello
hello
hello

  6. Check the result

(Screenshot: output)

    Note: since this is a distributed cluster, checking the master node is not enough; the output may land on another node, and here it is on slave1.

[root@slave1 flink-1.9.1]# tail -f  log/*-taskexecutor-*.out
(world,1)
(hello,1)
(hello,1)
(hello,2)
(hello,2)
(hello,3)
(hello,2)
(hello,2)
(hello,1)
(hello,1)
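The counts above rise and fall rather than grow monotonically because timeWindow(Time.seconds(5), Time.seconds(1)) is a sliding window: each element is counted in size/slide = 5 overlapping windows, and it drops out as those windows expire. The sketch below models how a sliding assigner picks window start times; it is an assumption based on how sliding windows behave, not a call into the Flink API, and the SlidingWindowSketch class is invented for this illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Model of sliding-window assignment: which windows contain an element
// that arrives at a given timestamp, for a window of sizeSec sliding
// every slideSec.
public class SlidingWindowSketch {

    // Return the start times (in seconds) of every window containing the element.
    static List<Long> windowStarts(long timestampSec, long sizeSec, long slideSec) {
        List<Long> starts = new ArrayList<>();
        // latest window start at or before the timestamp
        long lastStart = timestampSec - (timestampSec % slideSec);
        // walk backwards while the window [start, start + size) still covers it
        for (long start = lastStart; start > timestampSec - sizeSec; start -= slideSec) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An element at t=7s with a 5s window sliding every 1s lands in
        // five windows: [7,12) [6,11) [5,10) [4,9) [3,8)
        System.out.println(windowStarts(7, 5, 1)); // prints "[7, 6, 5, 4, 3]"
    }
}
```

A word typed at second 7 thus contributes to the counts printed for five consecutive evaluations, after which it no longer appears, producing the sawtooth in the output above.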

VI. Common problems

  1. Fixing "-bash: netstat: command not found":
[root@master flink-1.9.1]# yum install net-tools
