Flink 1.9.1 Integration for Reading and Writing Hive (a record of a failed integration attempt on CDH 6.1.0)

Foreword

This post records an attempt to integrate Flink 1.9.1 with Hive for reading and writing on CDH 6.1.0. It covers connecting to Hive through Flink's bundled sql-client as well as a small Java demo, but because of version mismatches neither approach ultimately succeeded. I am now looking at other ways to connect to Hive, so treat the steps below as reference only.

Note that Flink only added Hive read/write integration in 1.9.x, and it is still a beta feature. The Flink project states that Hive integration currently supports only versions 2.3.4 and 1.2.1. On my CDH 6.1.0 cluster (Hadoop 3.0.0, Hive 2.1.1), configuring either 2.3.4 or 1.2.1 produced errors. The practical advice is to use a matching or at least close Hive version (the major version should match at minimum), otherwise you run into NoSuchMethod-style errors.

Configuring Flink 1.9.1 with Hive 2.3.4

First, edit flink-1.9.1/conf/sql-client-defaults.yaml and add the catalog parameters for Hive. On CDH the Hive conf directory is /etc/hive/conf.cloudera.hive.

[root@node01 lib]# vi /opt/flink-1.9.1/conf/sql-client-defaults.yaml
...
#==============================================================================
# Catalogs
#==============================================================================

# Define catalogs here.

#catalogs: [] # empty list
# A typical catalog definition looks like:
#  - name: myhive
#    type: hive
#    hive-conf-dir: /opt/hive_conf/
#    default-database: ...

catalogs:
   - name: myhive
     type: hive
     property-version: 1
     hive-conf-dir: /etc/hive/conf.cloudera.hive
     hive-version: 2.3.4

Start the SQL client with bin/sql-client.sh embedded. The first error:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...

Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
        at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
        ... 2 more
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.CatalogFactory' in
the classpath.

Reason: No context matches.

The following properties are requested:
hive-conf-dir=/etc/hive/conf.cloudera.hive
hive-version=2.3.4
property-version=1
type=hive

The following factories have been considered:
org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.table.planner.StreamPlannerFactory
org.apache.flink.table.executor.StreamExecutorFactory
org.apache.flink.table.planner.delegation.BlinkPlannerFactory
org.apache.flink.table.planner.delegation.BlinkExecutorFactory
        at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:283)
        at org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:191)
        at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:144)
        at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:114)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:258)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
        at java.util.HashMap.forEach(HashMap.java:1289)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.&lt;init&gt;(ExecutionContext.java:135)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
        ... 4 more

Add the following jars to {flink-home}/lib (the paths below are from a local build of the Flink source tree):

{flink-home}/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar
{flink-home}/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar

Run it again; a new error appears:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...

Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
        at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
        ... 2 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/common/util/HiveVersionInfo
        at org.apache.flink.table.catalog.hive.client.HiveShimLoader.getHiveVersion(HiveShimLoader.java:58)
        at org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory.createCatalog(HiveCatalogFactory.java:82)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:259)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
        at java.util.HashMap.forEach(HashMap.java:1289)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.&lt;init&gt;(ExecutionContext.java:135)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
        ... 4 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.common.util.HiveVersionInfo
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 11 more

The error shows that the Hive-related jars are missing. The sql-client picks up its jars directly from {flink-home}/lib, and the supported Hive versions are 2.3.4 and 1.2.1. My Hive version is Hive 2.1.1-cdh6.1.0, so I chose the closer version, 2.3.4, and downloaded the Hive 2.3.4 distribution: http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz (Hive 1.2.1: http://archive.apache.org/dist/hive/hive-1.2.1/)
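As a quick sanity check, the exact class named in the stack trace can be probed directly; a minimal sketch (the class name below is mine, and it assumes a hive-exec or hive-common jar is on the classpath when it is compiled and run):

import org.apache.hive.common.util.HiveVersionInfo;

// Prints the Hive version reported by whichever hive-common/hive-exec jar is on
// the classpath (e.g. "2.3.4"); if none is present, it fails with the same
// NoClassDefFoundError for HiveVersionInfo that the sql-client reported above.
public class HiveClasspathProbe {
    public static void main(String[] args) {
        System.out.println(HiveVersionInfo.getVersion());
    }
}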

Copy the relevant jars from {hive-home}/lib of the unpacked Hive 2.3.4 distribution into {flink-home}/lib:

{hive-home}/lib/hive-exec-2.3.4.jar
{hive-home}/lib/hive-common-2.3.4.jar
{hive-home}/lib/hive-metastore-2.3.4.jar
{hive-home}/lib/hive-shims-common-2.3.4.jar
{hive-home}/lib/antlr-runtime-3.5.2.jar
{hive-home}/lib/datanucleus-api-jdo-4.2.4.jar
{hive-home}/lib/datanucleus-core-4.1.17.jar
{hive-home}/lib/datanucleus-rdbms-4.1.19.jar
{hive-home}/lib/javax.jdo-3.2.0-m3.jar
{hive-home}/lib/libfb303-0.9.3.jar
{hive-home}/lib/commons-cli-1.2.jar
{hive-home}/lib/mysql-connector-java-5.1.34.jar

This produces another error:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
        at org.apache.flink.table.client.cli.CliOptionsParser.&lt;clinit&gt;(CliOptionsParser.java:43)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:188)

Replace the previously copied {hive-home}/lib/commons-cli-1.2.jar with commons-cli-1.3.1.jar; the client then starts successfully:

[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
No default environment specified.
Searching for '/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...done.

                                   ▒▓██▓██▒
                               ▓████▒▒█▓▒▓███▓▒
                            ▓███▓░░        ▒▒▒▓██▒  ▒
                          ░██▒   ▒▒▓▓█▓▓▒░      ▒████
                          ██▒         ░▒▓███▒    ▒█▒█▒
                            ░▓█            ███   ▓░▒██
                              ▓█       ▒▒▒▒▒▓██▓░▒░▓▓█
                            █░ █   ▒▒░       ███▓▓█ ▒█▒▒▒
                            ████░   ▒▓█▓      ██▒▒▒ ▓███▒
                         ░▒█▓▓██       ▓█▒    ▓█▒▓██▓ ░█░
                   ▓░▒▓████▒ ██         ▒█    █▓░▒█▒░▒█▒
                  ███▓░██▓  ▓█           █   █▓ ▒▓█▓▓█▒
                ░██▓  ░█░            █  █▒ ▒█████▓▒ ██▓░▒
               ███░ ░ █░          ▓ ░█ █████▒░░    ░█░▓  ▓░
              ██▓█ ▒▒▓▒          ▓███████▓░       ▒█▒ ▒▓ ▓██▓
           ▒██▓ ▓█ █▓█       ░▒█████▓▓▒░         ██▒▒  █ ▒  ▓█▒
           ▓█▓  ▓█ ██▓ ░▓▓▓▓▓▓▓▒              ▒██▓           ░█▒
           ▓█    █ ▓███▓▒░              ░▓▓▓███▓          ░▒░ ▓█
           ██▓    ██▒    ░▒▓▓███▓▓▓▓▓██████▓▒            ▓███  █
          ▓███▒ ███   ░▓▓▒░░   ░▓████▓░                  ░▒▓▒  █▓
          █▓▒▒▓▓██  ░▒▒░░░▒▒▒▒▓██▓░                            █▓
          ██ ▓░▒█   ▓▓▓▓▒░░  ▒█▓       ▒▓▓██▓    ▓▒          ▒▒▓
          ▓█▓ ▓▒█  █▓░  ░▒▓▓██▒            ░▓█▒   ▒▒▒░▒▒▓█████▒
           ██░ ▓█▒█▒  ▒▓▓▒  ▓█                █░      ░░░░   ░█▒
           ▓█   ▒█▓   ░     █░                ▒█              █▓
            █▓   ██         █░                 ▓▓        ▒█▓▓▓▒█░
             █▓ ░▓██░       ▓▒                  ▓█▓▒░░░▒▓█░    ▒█
              ██   ▓█▓░      ▒                    ░▒█▒██▒      ▓▓
               ▓█▒   ▒█▓▒░                         ▒▒ █▒█▓▒▒░░▒██
                ░██▒    ▒▓▓▒                     ▓██▓▒█▒ ░▓▓▓▓▒█▓
                  ░▓██▒                          ▓░  ▒█▓█  ░░▒▒▒
                      ▒▓▓▓▓▓▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░▓▓  ▓░▒█░
          
    ______ _ _       _       _____  ____  _         _____ _ _            _  BETA   
   |  ____| (_)     | |     / ____|/ __ \| |       / ____| (_)          | |  
   | |__  | |_ _ __ | | __ | (___ | |  | | |      | |    | |_  ___ _ __ | |_ 
   |  __| | | | '_ \| |/ /  \___ \| |  | | |      | |    | | |/ _ \ '_ \| __|
   | |    | | | | | |   <   ____) | |__| | |____  | |____| | |  __/ | | | |_ 
   |_|    |_|_|_| |_|_|\_\ |_____/ \___\_\______|  \_____|_|_|\___|_| |_|\__|
          
        Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.


Flink SQL> 

The setup above drew on this blog post: https://blog.csdn.net/h335146502/article/details/100689010

The author's notes on the pitfalls were very valuable and saved a lot of time; in practice you have to track down the right Hive jar based on the class named in each error.

I thought that was the end of it, but testing still produced errors, essentially all caused by version differences. Following the official documentation, I created a mytable table in Hive:

CREATE TABLE mytable(name string, value double);

The test session looked like this:

Flink SQL> show catalogs;
default_catalog
myhive

Flink SQL> use catalog myhive;

Flink SQL> show databases;
2020-01-08 15:14:48,019 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-08 15:14:48,020 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-08 15:14:48,020 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-08 15:14:48,020 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.input.format.excludes does not exist
default
test_myq

Flink SQL> use test_myq;

Flink SQL> show tables;
2020-01-08 15:14:56,783 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-08 15:14:56,783 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-08 15:14:56,783 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-08 15:14:56,784 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.input.format.excludes does not exist
mytable
mytest

Flink SQL> select * from mytable;
[ERROR] Could not execute SQL statement. Reason:
org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'

I tried many of the fixes suggested online for this error; in the end I concluded it is caused by the version gap.

Along the way I also tried driving the same query from code:

package com.mort.flink.hive;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class ReadHive {
    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);

        String catalogName     = "myhive";
        String defaultDatabase = "default";
        String hiveConfDir     = "/etc/hive/conf.cloudera.hive/";
        String version         = "2.3.4"; // or 1.2.1

        HiveCatalog hive = new HiveCatalog(catalogName, defaultDatabase, hiveConfDir, version);
        tableEnv.registerCatalog(catalogName, hive);

        try {
            tableEnv.useCatalog(catalogName);
            tableEnv.useDatabase("test_myq");

            Table mytable = tableEnv.sqlQuery("select * from mytable");

            // Convert the Table into a DataSet of Tuple2 via a TypeInformation.
            // mytable was created as (name string, value double), so the tuple
            // element types are String and Double.
            TupleTypeInfo<Tuple2<String, Double>> tupleType = new TupleTypeInfo<>(
                    Types.STRING,
                    Types.DOUBLE);
            DataSet<Tuple2<String, Double>> dsTuple = tableEnv.toDataSet(mytable, tupleType);

            // print() triggers execution of the batch job, so no separate
            // execute() call is needed afterwards.
            dsTuple.print();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The pom.xml is as follows:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.wonders.flink.hive</groupId>
    <artifactId>flink-hive</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>1.9.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.11</artifactId>
            <version>1.9.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-compatibility_2.11</artifactId>
            <version>1.9.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-shaded-hadoop-2-uber</artifactId>
            <version>2.7.5-8.0</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.4</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Package it and submit it with the command below (the Hive connector and Hadoop compatibility dependencies are marked provided, so the jars already placed in {flink-home}/lib are used at runtime). The error is still the same missing-method problem:

[root@node01 flink-1.9.1]# bin/flink run --class com.mort.flink.hive.ReadHive ~/maojars/flink-hive-1.0-SNAPSHOT-jar-with-dependencies.jar 
Starting execution of program
2020-01-07 17:24:49,166 INFO  org.apache.hadoop.hive.conf.HiveConf                          - Found configuration file null
2020-01-07 17:24:49,387 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-07 17:24:49,387 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-07 17:24:49,387 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-07 17:24:49,388 WARN  org.apache.hadoop.hive.conf.HiveConf                          - HiveConf of name hive.vectorized.input.format.excludes does not exist
org.apache.flink.table.api.ValidationException: SQL validation failed. A failure occurred when accessing table. Table path [myhive, test_myq, mytable]
        at org.apache.flink.table.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:128)
        at org.apache.flink.table.api.internal.TableEnvImpl.sqlQuery(TableEnvImpl.scala:431)
        at com.wonders.flink.hive.ReadHive.main(ReadHive.java:32)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
        at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
        at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
        at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.flink.table.api.TableException: A failure occurred when accessing table. Table path [myhive, test_myq, mytable]
        at org.apache.flink.table.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:95)
        at org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
        at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:289)
        at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
        at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
        at org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
        at org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
        at org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
        at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:997)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:957)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3093)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3365)
        at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
        at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:997)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:957)
        at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:932)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:639)
        at org.apache.flink.table.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:124)
        ... 19 more
Caused by: org.apache.flink.table.catalog.exceptions.CatalogException: Failed to check whether table test_myq.mytable exists or not.
        at org.apache.flink.table.catalog.hive.HiveCatalog.tableExists(HiveCatalog.java:481)
        at org.apache.flink.table.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:77)
        ... 40 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1458)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
        at com.sun.proxy.$Proxy4.tableExists(Unknown Source)
        at org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper.tableExists(HiveMetastoreClientWrapper.java:98)
        at org.apache.flink.table.catalog.hive.HiveCatalog.tableExists(HiveCatalog.java:476)
        ... 41 more

The program didn't contain a Flink job. Perhaps you forgot to call execute() on the execution environment.

To summarize: the error occurs because the Hive 2.3.4 client that Flink 1.9.1 supports calls methods such as get_table_req that do not exist in the locally installed Hive 2.1.1 metastore.

Looking at the source, before running a SELECT Flink 1.9.1 checks whether the table exists by calling the following method (in HiveMetastoreClientWrapper):

public boolean tableExists(String databaseName, String tableName)
	throws MetaException, TException, UnknownDBException {
	return client.tableExists(databaseName, tableName);
}

Here client.tableExists is a method of the HiveMetaStoreClient class in hive-metastore-2.3.4.jar, implemented as follows:

public boolean tableExists(String databaseName, String tableName) throws MetaException,
    TException, UnknownDBException {
  try {
    GetTableRequest req = new GetTableRequest(databaseName, tableName);
    req.setCapabilities(version);
    return filterHook.filterTable(client.get_table_req(req).getTable()) != null;
  } catch (NoSuchObjectException e) {
    return false;
  }
}

You can see that it ultimately calls the get_table_req method of the thrift interface. The same method in hive-metastore-2.1.1.jar is implemented like this:

public boolean tableExists(String databaseName, String tableName) throws MetaException,
  TException, UnknownDBException {
  try {
    return filterHook.filterTable(client.get_table(databaseName, tableName)) != null;
  } catch (NoSuchObjectException e) {
    return false;
  }
}

Here the implementation uses the older get_table call instead, which is why Flink fails at runtime complaining that the thrift interface has no get_table_req method: the CDH 2.1.1 metastore simply does not expose it.
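To confirm that the mismatch really is in the thrift call rather than in Flink itself, the same check can be reproduced with a bare HiveMetaStoreClient. A minimal sketch (the class name is mine; it assumes the Hive client jars and a hive-site.xml pointing at the CDH metastore, e.g. the one under /etc/hive/conf.cloudera.hive, are on the classpath):

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class MetastoreProbe {
    public static void main(String[] args) throws Exception {
        // HiveConf picks up hive-site.xml from the classpath.
        HiveConf conf = new HiveConf();
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        // With hive-metastore-2.1.1 on the classpath this goes through the old
        // get_table thrift call and works against the CDH metastore; with
        // hive-metastore-2.3.4 it issues get_table_req and fails with the same
        // "Invalid method name: 'get_table_req'" error that Flink hits.
        System.out.println(client.tableExists("test_myq", "mytable"));
        client.close();
    }
}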

References:

Hive 2.3.6 API documentation

Hive 2.1.1 API documentation

Configuring Flink 1.9.1 with Hive 1.2.1

So I fell back to the other supported version and replaced the Hive jars under lib with their 1.2.1 counterparts:

{hive-home}/lib/hive-exec-1.2.1.jar
{hive-home}/lib/hive-common-1.2.1.jar
{hive-home}/lib/hive-metastore-1.2.1.jar
{hive-home}/lib/hive-shims-common-1.2.1.jar
{hive-home}/lib/antlr-runtime-3.4.jar
{hive-home}/lib/datanucleus-api-jdo-3.2.6.jar
{hive-home}/lib/datanucleus-core-3.2.10.jar
{hive-home}/lib/datanucleus-rdbms-3.2.9.jar
{hive-home}/lib/javax.jdo-3.2.0-m3.jar
{hive-home}/lib/libfb303-0.9.3.jar
{hive-home}/lib/commons-cli-1.3.1.jar
{hive-home}/lib/mysql-connector-java-5.1.34.jar
{hive-home}/lib/libthrift-0.9.2.jar
{hive-home}/lib/hive-serde-1.2.1.jar

Running again still fails:

No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...

Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
        at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
        at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
        ... 2 more
Caused by: java.lang.ExceptionInInitializerError
        at org.apache.hadoop.hive.conf.HiveConf.&lt;clinit&gt;(HiveConf.java:105)
        at org.apache.flink.table.catalog.hive.HiveCatalog.createHiveConf(HiveCatalog.java:151)
        at org.apache.flink.table.catalog.hive.HiveCatalog.&lt;init&gt;(HiveCatalog.java:130)
        at org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory.createCatalog(HiveCatalogFactory.java:84)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:259)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
        at java.util.HashMap.forEach(HashMap.java:1289)
        at org.apache.flink.table.client.gateway.local.ExecutionContext.&lt;init&gt;(ExecutionContext.java:135)
        at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
        ... 4 more
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-cdh6.1.0
        at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
        at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
        at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
        at org.apache.hadoop.hive.conf.HiveConf$ConfVars.&lt;clinit&gt;(HiveConf.java:368)
        ... 13 more

This error means that the Hive 1.2.1 jars do not support this cluster's Hadoop version (Hadoop 3.0.0-cdh6.1.0).

Hive 1.2.1 supports the following Hadoop versions:
Hadoop 1.x.y, 2.x.y
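The exception comes from the shim version check in Hive 1.2.1's ShimLoader. Roughly paraphrased (this is only an illustration of the logic, not the actual Hive source), it amounts to the following, which is why any Hadoop 3.x version string is rejected outright:

import org.apache.hadoop.util.VersionInfo;

public class HadoopMajorVersionCheck {
    public static void main(String[] args) {
        // On this cluster VersionInfo.getVersion() returns "3.0.0-cdh6.1.0".
        String version = VersionInfo.getVersion();
        String major = version.split("\\.")[0];
        // Hive 1.2.1 only ships shims for Hadoop 1.x and 2.x.
        if (!"1".equals(major) && !"2".equals(major)) {
            throw new IllegalArgumentException(
                    "Unrecognized Hadoop major version number: " + version);
        }
        System.out.println("Hadoop " + major + ".x is supported by this Hive version");
    }
}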

Summary

The Flink project states that Hive integration currently supports only versions 2.3.4 and 1.2.1. On my CDH 6.1.0 cluster (Hadoop 3.0.0, Hive 2.1.1), configuring either 2.3.4 or 1.2.1 led to errors. The takeaway is to use a matching or at least close Hive version (the major version should match at minimum); otherwise mismatched methods will cause failures.
