Packaging and Running a Flink Job

A simple demo of packaging a Flink job and running it.

Step 1: Prepare the environment

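First you need a Flink runtime environment. Mine is a standalone (single-node) setup running in a virtual machine. Start Flink and open port 8081 to see the Flink UI, where you can manage Flink jobs:
[Figure 1: the Flink web UI]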

Step 2: Write the Flink code

I wrote the job in Flink SQL. The code is simple: it uses SQL to read messages from one Kafka topic and write them to another Kafka topic:

package com.ms.flinksql;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class Kafka2Kafka {
    public static void main(String[] args) {

        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        String ddlSource = "CREATE TABLE user_behavior (\n" +
                "    user_id BIGINT,\n" +
                "    item_id BIGINT,\n" +
                "    category_id BIGINT,\n" +
                "    behavior STRING,\n" +
                "    ts TIMESTAMP(3)\n" +
                ") WITH (\n" +
                "    'connector.type' = 'kafka',\n" +
                "    'connector.version' = 'universal',\n" +
                "    'connector.topic' = 'user_behavior',\n" +
                "    'connector.startup-mode' = 'latest-offset',\n" +
                "    'connector.properties.zookeeper.connect' = '192.168.126.128:2181',\n" +
                "    'connector.properties.bootstrap.servers' = '192.168.126.128:9092',\n" +
                "    'format.type' = 'json'\n" +
                ")";

        String ddlSink = "CREATE TABLE user_behavior_sink (\n" +
                "    user_id BIGINT,\n" +
                "    item_id BIGINT\n" +
                ") WITH (\n" +
                "    'connector.type' = 'kafka',\n" +
                "    'connector.version' = 'universal',\n" +
                "    'connector.topic' = 'user_behavior_sink',\n" +
                "    'connector.properties.zookeeper.connect' = '192.168.126.128:2181',\n" +
                "    'connector.properties.bootstrap.servers' = '192.168.126.128:9092',\n" +
                "    'format.type' = 'json',\n" +
                "    'update-mode' = 'append'\n" +
                ")";

        // Take the records that were read, keep only two fields, and send them to a new Kafka topic
        String sql = "insert into user_behavior_sink select user_id, item_id from user_behavior";

        tableEnv.executeSql(ddlSource);
        tableEnv.executeSql(ddlSink);
        tableEnv.executeSql(sql);
    }
}
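
The DDL above uses the legacy connector.* property keys. Flink 1.11 also introduced a shorter set of Kafka connector option names, and the sketch below builds the same source table with those keys, as an alternative rather than a correction. The class name NewStyleDdl and the consumer group id demo-group are assumptions for illustration and do not come from the original code.

```java
final class NewStyleDdl {

    // Builds a CREATE TABLE statement for the source topic using the
    // option names introduced in Flink 1.11 ('connector', 'topic', ...).
    // The group id is a hypothetical value supplied by the caller.
    public static String sourceDdl(String bootstrapServers, String groupId) {
        return "CREATE TABLE user_behavior (\n"
                + "    user_id BIGINT,\n"
                + "    item_id BIGINT,\n"
                + "    category_id BIGINT,\n"
                + "    behavior STRING,\n"
                + "    ts TIMESTAMP(3)\n"
                + ") WITH (\n"
                + "    'connector' = 'kafka',\n"
                + "    'topic' = 'user_behavior',\n"
                + "    'properties.bootstrap.servers' = '" + bootstrapServers + "',\n"
                + "    'properties.group.id' = '" + groupId + "',\n"
                + "    'scan.startup.mode' = 'latest-offset',\n"
                + "    'format' = 'json'\n"
                + ")";
    }

    public static void main(String[] args) {
        // Prints the statement so it can be compared with the legacy DDL.
        System.out.println(sourceDdl("192.168.126.128:9092", "demo-group"));
    }
}
```

The returned string would be passed to tableEnv.executeSql(...) in place of ddlSource. Note that the newer json format may expect a different timestamp pattern than the legacy one, so test with your own data.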

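The Maven dependencies are listed below. Some of them have scope provided. What does that mean? A dependency marked provided is on the classpath at compile time, so your code compiles without missing-dependency errors and can be packaged, but the dependency itself is not bundled into the jar. Why do it this way? Because packages such as flink-table-api-java-bridge and flink-streaming-scala already ship with the Flink environment, so there is no need to put them in the project jar. Bundling them only makes the jar bigger for no benefit, so why bother? The Kafka connector and the JSON format (flink-connector-kafka and flink-json) are not marked provided here, so they do end up in the jar.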

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-api-java-bridge_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-common</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_${scala.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-json</artifactId>
        <version>${flink.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.version}</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>

</dependencies>
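
One consequence of provided scope is that those classes are missing whenever the runtime does not supply them, for example if the packaged jar is started with plain java -jar instead of being submitted to Flink. Below is a minimal sketch for checking whether a class is visible at runtime. The class name ClasspathCheck is hypothetical; the default class probed is taken from the imports in the job code above.

```java
final class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    // initialize=false avoids running static initializers just to probe.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe the class names given on the command line, or fall back to
        // a Table API class that the job code imports.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.flink.table.api.TableEnvironment"};
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```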

Step 3: Package

To bundle the required dependencies (the ones not marked provided) into the jar, configure the packaging plugin in Maven's pom.xml:

<build>
   <plugins>
        <plugin>
            <!--<groupId>org.apache.maven.plugins</groupId>-->
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>

        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <!-- Suffix appended to the name of the packaged jar -->
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <!-- Entry point of the program (the class with the main method) -->
                        <mainClass>com.ms.flinksql.Kafka2Kafka</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
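
To confirm that the packaging did what the configuration intends, the jar can be inspected after the build. The sketch below reads the Main-Class entry from the manifest and checks which packages were bundled. The class name JarInspector is hypothetical, and the two package prefixes are my assumptions about where the flink-json and Table API classes live.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

final class JarInspector {

    // Reads the Main-Class attribute from the jar manifest, or null if absent.
    public static String mainClassOf(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if any entry in the jar starts with the given path prefix,
    // e.g. "org/apache/flink/formats/json/".
    public static boolean containsPrefix(String jarPath, String prefix) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().anyMatch(entry -> entry.getName().startsWith(prefix));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // args[0] is the path to the jar produced by the build.
        String jarPath = args[0];
        System.out.println("Main-Class: " + mainClassOf(jarPath));
        System.out.println("flink-json bundled: "
                + containsPrefix(jarPath, "org/apache/flink/formats/json/"));
        System.out.println("Table API bundled: "
                + containsPrefix(jarPath, "org/apache/flink/table/api/"));
    }
}
```

With the pom above, one would expect the main class to be com.ms.flinksql.Kafka2Kafka, the JSON format to be bundled, and the provided Table API classes to be absent.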

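Package with IntelliJ IDEA. The Maven package phase triggers the assembly plugin configured above, and the resulting jar in target/ carries the suffix -jar-with-dependencies:
[Figure 2: packaging in IDEA]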

Step 4: Upload the jar to the Flink environment and run it

[Figure 3: uploading the jar to Flink]
After that, our job shows up in the job list:
[Figure 4: the job in the Flink UI]
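
The same job list is also available from the REST API served on the web UI port, which is handy for a quick check without a browser. A minimal sketch, assuming Flink runs on the same host as Kafka in this demo (192.168.126.128, which the post does not actually state); the class name JobOverview is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

final class JobOverview {

    // Builds the URL of the REST endpoint that lists jobs and their status.
    public static String overviewUrl(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return trimmed + "/jobs/overview";
    }

    // Performs a GET request and returns the raw JSON response body.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default host is an assumption; pass your own base URL as args[0].
        String base = args.length > 0 ? args[0] : "http://192.168.126.128:8081";
        System.out.println(fetch(overviewUrl(base)));
    }
}
```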

Step 5: Test the job

Open Kafka and write data to the source topic; the sink topic outputs the results in real time:
[Figure 5: Kafka source and sink topic output]
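
For test input, each message needs to be a JSON object whose keys match the columns of the user_behavior table. The sketch below formats one such record. The class name SampleEvent and the sample values are hypothetical, and writing ts as an ISO-8601 UTC timestamp is an assumption about what the JSON format accepts for TIMESTAMP(3); adjust it if your Flink version expects a different pattern.

```java
import java.time.Instant;
import java.util.Locale;

final class SampleEvent {

    // Formats one record as a JSON object whose keys match the columns
    // of the user_behavior table. The behavior string is assumed to
    // contain no characters that need JSON escaping.
    public static String toJson(long userId, long itemId, long categoryId,
                                String behavior, Instant ts) {
        return String.format(Locale.ROOT,
                "{\"user_id\": %d, \"item_id\": %d, \"category_id\": %d, "
                        + "\"behavior\": \"%s\", \"ts\": \"%s\"}",
                userId, itemId, categoryId, behavior, ts);
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the original post.
        System.out.println(
                toJson(1L, 1001L, 10L, "pv", Instant.parse("2020-01-01T00:00:00Z")));
    }
}
```

The printed line can be pasted into a Kafka console producer attached to the user_behavior topic; the sink topic should then receive a record containing only user_id and item_id.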
