Flink 1.10.0 reading from and inserting into Hive 1.2.1

A pitfall guide for reading and writing Hive data with Flink

Starting with Flink 1.9 you can use the HiveCatalog to read Hive data, but 1.9's Hive version support is limited: only 2.3.4 and 1.2.1 are supported. The Hive version I use is the fairly old 1.2.1, and my Flink is 1.10.0. Below is a walkthrough of the problems I ran into while reading from and inserting into Hive.

Local environment: Windows 10, Flink 1.10.0

Goal: run a Flink program from IDEA on my local machine to read Hive table data from the test cluster

First, follow the official Flink documentation to add the dependencies the job needs: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/hive/

The Flink Chinese community's Flink-Hive article: https://ververica.cn/developers/flink1-9-hive/

These are the dependencies the official docs list:



<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.11</artifactId>
  <version>1.10.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.11</artifactId>
  <version>1.10.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <scope>provided</scope>
</dependency>
Then I wrote the code in the main program, and on running it I hit an error:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: tesla-cluster

This means the HDFS address cannot be resolved. We need to manually copy hdfs-site.xml from the test environment into the local resources directory; that solves the problem.
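For reference, the UnknownHostException means the HDFS client only sees the logical HA nameservice name and has no definition for it. The entries that matter in hdfs-site.xml look roughly like the sketch below; tesla-cluster is the nameservice taken from the error message, while the namenode IDs and hostnames are hypothetical placeholders. In practice, copy the real file from your cluster rather than writing it by hand:

<!-- hdfs-site.xml: HA nameservice definition (hostnames below are hypothetical) -->
<property>
  <name>dfs.nameservices</name>
  <value>tesla-cluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.tesla-cluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.tesla-cluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.tesla-cluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.tesla-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>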

Running the code again, another error came up:

Exception in thread "main" java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Client.sendBaseOneway(Ljava/lang/String;Lorg/apache/thrift/TBase;)V

After a lot of searching online, it turned out to be a jar conflict. The cause: I had migrated from Flink 1.9.1, and the dependencies Flink 1.9 listed were these:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.11</artifactId>
  <version>1.9.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-hadoop-compatibility_2.11</artifactId>
  <version>1.9.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-shaded-hadoop-2-uber</artifactId>
  <version>2.6.5-8.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>1.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libfb303</artifactId>
  <version>0.9.3</version>
</dependency>

That is a few more jars than Flink 1.10 needs, and the main culprit is:

<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libfb303</artifactId>
  <version>0.9.3</version>
</dependency>

It conflicts with the version bundled inside the hive-exec jar; removing this dependency fixes the error. Your program may hit conflicts caused by other dependencies; find the conflicting jar, remove the version you don't need, and you're done.
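If you are not sure which dependency drags in the conflicting jar, Maven's dependency tree is the quickest tool: running mvn dependency:tree -Dincludes=org.apache.thrift prints every path that pulls in a Thrift artifact. And when the jar arrives transitively instead of being declared directly, an exclusion does the same job as deleting the dependency. A minimal sketch, assuming libfb303 were pulled in through hive-metastore:

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>1.2.1</version>
  <exclusions>
    <!-- keep the fb303 classes shipped inside hive-exec instead -->
    <exclusion>
      <groupId>org.apache.thrift</groupId>
      <artifactId>libfb303</artifactId>
    </exclusion>
  </exclusions>
</dependency>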

Finally, here is the most basic code for reading Hive data with Flink:

Main program:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

/**
 * @author:wy
 * @date:2020/3/31
 * @version:1.0.0
 * @description: read a Hive table and insert a row into it
 */
public class IntelligenceAlter {
    public static void main(String[] args) throws Exception {
        // The Hive connector needs the Blink planner; batch mode for a bounded job
        EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        String name            = "hive";          // catalog name
        String defaultDatabase = "tmp";           // default database in the catalog
        String hiveConfDir     = "E:\\Hadoop\\";  // directory containing hive-site.xml
        String version         = "1.2.1";         // Hive version

        // Register the Hive catalog once and switch to it
        // (registering the same name twice throws a CatalogException)
        HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
        tableEnv.registerCatalog(name, hive);
        tableEnv.useCatalog(name);

        // Read: query the Hive table and project one column
        tableEnv.sqlQuery("select * from tmp.tmp_flink_test_2").select("product_id");
        // Write: append a row to the same table
        tableEnv.sqlUpdate("insert into tmp.tmp_flink_test_2 values ('newKey')");
        // Submit the job
        tableEnv.execute("insert into tmp");
    }
}
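One detail that is easy to miss: hiveConfDir must point to a directory containing hive-site.xml, since that is how the HiveCatalog locates the metastore. A minimal sketch of that file, with a hypothetical metastore address:

<!-- hive-site.xml: only the metastore address is strictly needed here -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- hypothetical host/port; use your cluster's metastore address -->
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>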

pom.xml

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.11</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-compatibility_2.11</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-shaded-hadoop-2-uber</artifactId>
            <version>2.6.5-8.0</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
        </dependency>
