This post documents an attempt to integrate Flink 1.9.1 with Hive for reading and writing on CDH 6.1.0. It covers connecting to Hive through Flink's bundled sql-client as well as a small Java demo, but neither attempt succeeded because of version mismatches; other approaches to Hive integration are still being considered, so treat the steps below as reference only.
Note that Flink only added Hive read/write integration in the 1.9.x line, and it is still a beta feature. The Flink project states that the integration currently supports only Hive 2.3.4 and 1.2.1. On a CDH 6.1.0 cluster (Hadoop 3.0.0, Hive 2.1.1) I hit errors with both 2.3.4 and 1.2.1. The practical takeaway is to run a matching or at least closely matching Hive version (the major version should line up at minimum); otherwise you run into missing-method errors and the like.
First, edit flink-1.9.1/conf/sql-client-defaults.yaml and add the catalog configuration for Hive. On CDH the Hive conf directory is /etc/hive/conf.cloudera.hive.
[root@node01 lib]# vi /opt/flink-1.9.1/conf/sql-client-defaults.yaml
...
#==============================================================================
# Catalogs
#==============================================================================
# Define catalogs here.
#catalogs: [] # empty list
# A typical catalog definition looks like:
#   - name: myhive
#     type: hive
#     hive-conf-dir: /opt/hive_conf/
#     default-database: ...

catalogs:
  - name: myhive
    type: hive
    property-version: 1
    hive-conf-dir: /etc/hive/conf.cloudera.hive
    hive-version: 2.3.4
Run bin/sql-client.sh embedded to start the SQL Client; the first attempt fails:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...
Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
... 2 more
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.CatalogFactory' in
the classpath.
Reason: No context matches.
The following properties are requested:
hive-conf-dir=/etc/hive/conf.cloudera.hive
hive-version=2.3.4
property-version=1
type=hive
The following factories have been considered:
org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.table.planner.StreamPlannerFactory
org.apache.flink.table.executor.StreamExecutorFactory
org.apache.flink.table.planner.delegation.BlinkPlannerFactory
org.apache.flink.table.planner.delegation.BlinkExecutorFactory
at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:283)
at org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:191)
at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:144)
at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:114)
at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:258)
at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
at java.util.HashMap.forEach(HashMap.java:1289)
at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
... 4 more
Add the following jars to {flink-home}/lib (a copy sketch follows the list):
{flink-home}/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar
{flink-home}/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar
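The target paths above suggest the jars were taken from a locally built Flink 1.9.1 source tree (they can also be fetched from Maven Central). A minimal copy sketch, assuming the source was built under /opt/flink-1.9.1-src and the distribution lives at /opt/flink-1.9.1 — both paths are my assumptions:
# Copy the Hive connector and Hadoop compatibility jars onto the SQL Client classpath.
# /opt/flink-1.9.1-src is an assumed path to a built Flink source tree; adjust to your layout.
cp /opt/flink-1.9.1-src/flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.9.1.jar /opt/flink-1.9.1/lib/
cp /opt/flink-1.9.1-src/flink-connectors/flink-hadoop-compatibility/target/flink-hadoop-compatibility_2.11-1.9.1.jar /opt/flink-1.9.1/lib/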
Running again produces a different error:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...
Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
... 2 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hive/common/util/HiveVersionInfo
at org.apache.flink.table.catalog.hive.client.HiveShimLoader.getHiveVersion(HiveShimLoader.java:58)
at org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory.createCatalog(HiveCatalogFactory.java:82)
at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:259)
at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
at java.util.HashMap.forEach(HashMap.java:1289)
at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
... 4 more
Caused by: java.lang.ClassNotFoundException: org.apache.hive.common.util.HiveVersionInfo
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
The error indicates that Hive-related jars are missing. The SQL Client picks up jars directly from {flink-home}/lib, and the supported Hive versions are 2.3.4 and 1.2.1. My cluster runs Hive 2.1.1-cdh6.1.0, so I chose the closer version, 2.3.4, and downloaded the Hive 2.3.4 release: http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz (Hive 1.2.1 is at http://archive.apache.org/dist/hive/hive-1.2.1/).
Copy the following jars from {hive-home}/lib into {flink-home}/lib (a download-and-copy sketch follows the list):
{hive-home}/lib/hive-exec-2.3.4.jar
{hive-home}/lib/hive-common-2.3.4.jar
{hive-home}/lib/hive-metastore-2.3.4.jar
{hive-home}/lib/hive-shims-common-2.3.4.jar
{hive-home}/lib/antlr-runtime-3.5.2.jar
{hive-home}/lib/datanucleus-api-jdo-4.2.4.jar
{hive-home}/lib/datanucleus-core-4.1.17.jar
{hive-home}/lib/datanucleus-rdbms-4.1.19.jar
{hive-home}/lib/javax.jdo-3.2.0-m3.jar
{hive-home}/lib/libfb303-0.9.3.jar
{hive-home}/lib/commons-cli-1.2.jar
{hive-home}/lib/mysql-connector-java-5.1.34.jar
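A rough sketch of pulling the 2.3.4 release apart and copying the jars listed above into Flink's lib directory. The paths under /opt are my assumptions, and the MySQL driver is not part of the Apache Hive tarball, so take it from wherever your metastore's JDBC driver already lives:
# Download and unpack Hive 2.3.4, then copy the needed jars into Flink's lib directory.
cd /opt
wget http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
tar -zxf apache-hive-2.3.4-bin.tar.gz
HIVE_HOME=/opt/apache-hive-2.3.4-bin
for j in hive-exec-2.3.4.jar hive-common-2.3.4.jar hive-metastore-2.3.4.jar \
         hive-shims-common-2.3.4.jar antlr-runtime-3.5.2.jar datanucleus-api-jdo-4.2.4.jar \
         datanucleus-core-4.1.17.jar datanucleus-rdbms-4.1.19.jar javax.jdo-3.2.0-m3.jar \
         libfb303-0.9.3.jar commons-cli-1.2.jar; do
  cp "$HIVE_HOME/lib/$j" /opt/flink-1.9.1/lib/
done
# The MySQL JDBC driver is not shipped with the Apache tarball; its location depends on your setup.
cp /path/to/mysql-connector-java-5.1.34.jar /opt/flink-1.9.1/lib/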
This time the error is:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
at org.apache.flink.table.client.cli.CliOptionsParser.<clinit>(CliOptionsParser.java:43)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:188)
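The NoSuchMethodError comes from the older commons-cli-1.2.jar just copied over from the Hive lib, which lacks the Option.builder API the SQL Client expects. A minimal swap sketch (the Maven Central URL is my assumption; any copy of commons-cli-1.3.1.jar will do):
# Remove the conflicting jar and drop in commons-cli 1.3.1 instead.
rm /opt/flink-1.9.1/lib/commons-cli-1.2.jar
wget -P /opt/flink-1.9.1/lib/ https://repo1.maven.org/maven2/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar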
After replacing the previously copied {hive-home}/lib/commons-cli-1.2.jar with commons-cli-1.3.1.jar, the SQL Client finally starts:
[root@node01 flink-1.9.1]# bin/sql-client.sh embedded
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
No default environment specified.
Searching for '/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/module/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...done.
▒▓██▓██▒
▓████▒▒█▓▒▓███▓▒
▓███▓░░ ▒▒▒▓██▒ ▒
░██▒ ▒▒▓▓█▓▓▒░ ▒████
██▒ ░▒▓███▒ ▒█▒█▒
░▓█ ███ ▓░▒██
▓█ ▒▒▒▒▒▓██▓░▒░▓▓█
█░ █ ▒▒░ ███▓▓█ ▒█▒▒▒
████░ ▒▓█▓ ██▒▒▒ ▓███▒
░▒█▓▓██ ▓█▒ ▓█▒▓██▓ ░█░
▓░▒▓████▒ ██ ▒█ █▓░▒█▒░▒█▒
███▓░██▓ ▓█ █ █▓ ▒▓█▓▓█▒
░██▓ ░█░ █ █▒ ▒█████▓▒ ██▓░▒
███░ ░ █░ ▓ ░█ █████▒░░ ░█░▓ ▓░
██▓█ ▒▒▓▒ ▓███████▓░ ▒█▒ ▒▓ ▓██▓
▒██▓ ▓█ █▓█ ░▒█████▓▓▒░ ██▒▒ █ ▒ ▓█▒
▓█▓ ▓█ ██▓ ░▓▓▓▓▓▓▓▒ ▒██▓ ░█▒
▓█ █ ▓███▓▒░ ░▓▓▓███▓ ░▒░ ▓█
██▓ ██▒ ░▒▓▓███▓▓▓▓▓██████▓▒ ▓███ █
▓███▒ ███ ░▓▓▒░░ ░▓████▓░ ░▒▓▒ █▓
█▓▒▒▓▓██ ░▒▒░░░▒▒▒▒▓██▓░ █▓
██ ▓░▒█ ▓▓▓▓▒░░ ▒█▓ ▒▓▓██▓ ▓▒ ▒▒▓
▓█▓ ▓▒█ █▓░ ░▒▓▓██▒ ░▓█▒ ▒▒▒░▒▒▓█████▒
██░ ▓█▒█▒ ▒▓▓▒ ▓█ █░ ░░░░ ░█▒
▓█ ▒█▓ ░ █░ ▒█ █▓
█▓ ██ █░ ▓▓ ▒█▓▓▓▒█░
█▓ ░▓██░ ▓▒ ▓█▓▒░░░▒▓█░ ▒█
██ ▓█▓░ ▒ ░▒█▒██▒ ▓▓
▓█▒ ▒█▓▒░ ▒▒ █▒█▓▒▒░░▒██
░██▒ ▒▓▓▒ ▓██▓▒█▒ ░▓▓▓▓▒█▓
░▓██▒ ▓░ ▒█▓█ ░░▒▒▒
▒▓▓▓▓▓▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░▓▓ ▓░▒█░
______ _ _ _ _____ ____ _ _____ _ _ _ BETA
| ____| (_) | | / ____|/ __ \| | / ____| (_) | |
| |__ | |_ _ __ | | __ | (___ | | | | | | | | |_ ___ _ __ | |_
| __| | | | '_ \| |/ / \___ \| | | | | | | | | |/ _ \ '_ \| __|
| | | | | | | | < ____) | |__| | |____ | |____| | | __/ | | | |_
|_| |_|_|_| |_|_|\_\ |_____/ \___\_\______| \_____|_|_|\___|_| |_|\__|
Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.
Flink SQL>
The setup above drew on this blog post: https://blog.csdn.net/h335146502/article/details/100689010
The author's notes on the pitfalls were invaluable and saved a lot of time; in general you have to track down the matching Hive jar based on the class named in each error.
I thought that was the end of it, but testing still produced errors, essentially all caused by the version gap. Following the official docs, create a mytable table in Hive:
CREATE TABLE mytable(name string, value double);
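If you also want a row or two to query, a sketch of loading sample data through the Hive CLI (the values are made up, and the statement assumes your current database holds mytable):
# Insert a couple of sample rows via the Hive CLI.
hive -e "INSERT INTO mytable VALUES ('tom', 4.72), ('mary', 3.14);"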
The test session looks like this:
Flink SQL> show catalogs;
default_catalog
myhive
Flink SQL> use catalog myhive;
Flink SQL> show databases;
2020-01-08 15:14:48,019 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-08 15:14:48,020 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-08 15:14:48,020 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-08 15:14:48,020 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.input.format.excludes does not exist
default
test_myq
Flink SQL> use test_myq;
Flink SQL> show tables;
2020-01-08 15:14:56,783 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-08 15:14:56,783 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-08 15:14:56,783 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-08 15:14:56,784 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.input.format.excludes does not exist
mytable
mytest
Flink SQL> select * from mytable;
[ERROR] Could not execute SQL statement. Reason:
org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
I tried many of the fixes suggested online for this error; in the end I concluded it is caused by the version mismatch.
Along the way I also tried driving the integration from code:
package com.mort.flink.hive;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class ReadHive {

    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);

        // Register a HiveCatalog pointing at the CDH Hive configuration directory.
        String catalogName = "myhive";
        String defaultDatabase = "default";
        String hiveConfDir = "/etc/hive/conf.cloudera.hive/";
        String version = "2.3.4"; // or 1.2.1
        HiveCatalog hive = new HiveCatalog(catalogName, defaultDatabase, hiveConfDir, version);
        tableEnv.registerCatalog(catalogName, hive);

        try {
            tableEnv.useCatalog(catalogName);
            tableEnv.useDatabase("test_myq");
            Table mytable = tableEnv.sqlQuery("select * from mytable");

            // Convert the Table into a DataSet of Tuple2 via a TypeInformation
            // that matches the schema (name string, value double).
            TupleTypeInfo<Tuple2<String, Double>> tupleType = new TupleTypeInfo<>(
                    Types.STRING,
                    Types.DOUBLE);
            DataSet<Tuple2<String, Double>> dsTuple = tableEnv.toDataSet(mytable, tupleType);

            // print() already triggers execution of the batch job, so no extra execute() call is needed.
            dsTuple.print();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The pom.xml is as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.wonders.flink.hive</groupId>
    <artifactId>flink-hive</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala_2.11</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.11</artifactId>
            <version>1.9.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-compatibility_2.11</artifactId>
            <version>1.9.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-shaded-hadoop-2-uber</artifactId>
            <version>2.7.5-8.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.4</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
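Building the fat jar is a standard assembly run; a minimal sketch, assuming Maven is installed and you run it from the project root:
# Build the jar-with-dependencies artifact; the output lands under target/.
mvn clean package -DskipTests
# Produces target/flink-hive-1.0-SNAPSHOT-jar-with-dependencies.jar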
After packaging, submit the jar with the command below; it fails with the same missing-method problem:
[root@node01 flink-1.9.1]# bin/flink run --class com.mort.flink.hive.ReadHive ~/maojars/flink-hive-1.0-SNAPSHOT-jar-with-dependencies.jar
Starting execution of program
2020-01-07 17:24:49,166 INFO org.apache.hadoop.hive.conf.HiveConf - Found configuration file null
2020-01-07 17:24:49,387 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2020-01-07 17:24:49,387 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2020-01-07 17:24:49,387 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2020-01-07 17:24:49,388 WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.vectorized.input.format.excludes does not exist
org.apache.flink.table.api.ValidationException: SQL validation failed. A failure occurred when accessing table. Table path [myhive, test_myq, mytable]
at org.apache.flink.table.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:128)
at org.apache.flink.table.api.internal.TableEnvImpl.sqlQuery(TableEnvImpl.scala:431)
at com.wonders.flink.hive.ReadHive.main(ReadHive.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:576)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:438)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.flink.table.api.TableException: A failure occurred when accessing table. Table path [myhive, test_myq, mytable]
at org.apache.flink.table.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:95)
at org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:289)
at org.apache.calcite.sql.validate.EmptyScope.resolve_(EmptyScope.java:143)
at org.apache.calcite.sql.validate.EmptyScope.resolveTable(EmptyScope.java:99)
at org.apache.calcite.sql.validate.DelegatingScope.resolveTable(DelegatingScope.java:203)
at org.apache.calcite.sql.validate.IdentifierNamespace.resolveImpl(IdentifierNamespace.java:105)
at org.apache.calcite.sql.validate.IdentifierNamespace.validateImpl(IdentifierNamespace.java:177)
at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:997)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:957)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3093)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3365)
at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:997)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:957)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:932)
at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:639)
at org.apache.flink.table.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:124)
... 19 more
Caused by: org.apache.flink.table.catalog.exceptions.CatalogException: Failed to check whether table test_myq.mytable exists or not.
at org.apache.flink.table.catalog.hive.HiveCatalog.tableExists(HiveCatalog.java:481)
at org.apache.flink.table.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:77)
... 40 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1458)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
at com.sun.proxy.$Proxy4.tableExists(Unknown Source)
at org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper.tableExists(HiveMetastoreClientWrapper.java:98)
at org.apache.flink.table.catalog.hive.HiveCatalog.tableExists(HiveCatalog.java:476)
... 41 more
The program didn't contain a Flink job. Perhaps you forgot to call execute() on the execution environment.
To sum up: the error appears because Flink 1.9.1 talks to Hive through the 2.3.4 client, and some of the methods that client relies on, such as get_table_req, simply do not exist in the locally installed Hive 2.1.1.
Reading the source shows that in Flink 1.9.1, before a SELECT is executed the table's existence is checked through the following method:
public boolean tableExists(String databaseName, String tableName)
        throws MetaException, TException, UnknownDBException {
    return client.tableExists(databaseName, tableName);
}
client.tableExists here is a method of the HiveMetaStoreClient class in hive-metastore-2.3.4.jar, implemented as follows:
public boolean tableExists(String databaseName, String tableName) throws MetaException,
        TException, UnknownDBException {
    try {
        GetTableRequest req = new GetTableRequest(databaseName, tableName);
        req.setCapabilities(version);
        return filterHook.filterTable(client.get_table_req(req).getTable()) != null;
    } catch (NoSuchObjectException e) {
        return false;
    }
}
As you can see, it ultimately calls the get_table_req Thrift method. The same step in hive-metastore-2.1.1.jar is implemented like this:
public boolean tableExists(String databaseName, String tableName) throws MetaException,
        TException, UnknownDBException {
    try {
        return filterHook.filterTable(client.get_table(databaseName, tableName)) != null;
    } catch (NoSuchObjectException e) {
        return false;
    }
}
The 2.1.1 implementation uses get_table instead, which is why Flink fails at runtime complaining that the get_table_req Thrift method cannot be found.
References:
Hive 2.3.6 API documentation
Hive 2.1.1 API documentation
So I fell back to replacing the Hive jars under lib with the 1.2.1 versions (a swap sketch follows the list):
{hive-home}/lib/hive-exec-1.2.1.jar
{hive-home}/lib/hive-common-1.2.1.jar
{hive-home}/lib/hive-metastore-1.2.1.jar
{hive-home}/lib/hive-shims-common-1.2.1.jar
{hive-home}/lib/antlr-runtime-3.4.jar
{hive-home}/lib/datanucleus-api-jdo-3.2.6.jar
{hive-home}/lib/datanucleus-core-3.2.10.jar
{hive-home}/lib/datanucleus-rdbms-3.2.9.jar
{hive-home}/lib/javax.jdo-3.2.0-m3.jar
{hive-home}/lib/libfb303-0.9.3.jar
{hive-home}/lib/commons-cli-1.3.1.jar
{hive-home}/lib/mysql-connector-java-5.1.34.jar
{hive-home}/lib/libthrift-0.9.2.jar
{hive-home}/lib/hive-serde-1.2.1.jar
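A sketch of the swap, assuming the 1.2.1 tarball was extracted to /opt/apache-hive-1.2.1-bin (an assumed path). Jars whose versions did not change in the list above, such as javax.jdo, libfb303, commons-cli and the MySQL driver, can stay as they are:
# Remove the 2.3.4-era jars and copy in the 1.2.1 equivalents listed above.
cd /opt/flink-1.9.1/lib
rm -f hive-exec-2.3.4.jar hive-common-2.3.4.jar hive-metastore-2.3.4.jar hive-shims-common-2.3.4.jar \
      antlr-runtime-3.5.2.jar datanucleus-api-jdo-4.2.4.jar datanucleus-core-4.1.17.jar datanucleus-rdbms-4.1.19.jar
for j in hive-exec-1.2.1.jar hive-common-1.2.1.jar hive-metastore-1.2.1.jar hive-shims-common-1.2.1.jar \
         hive-serde-1.2.1.jar antlr-runtime-3.4.jar datanucleus-api-jdo-3.2.6.jar \
         datanucleus-core-3.2.10.jar datanucleus-rdbms-3.2.9.jar libthrift-0.9.2.jar; do
  cp "/opt/apache-hive-1.2.1-bin/lib/$j" /opt/flink-1.9.1/lib/
done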
Running again, it still fails:
No default environment specified.
Searching for '/opt/flink-1.9.1/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/flink-1.9.1/conf/sql-client-defaults.yaml
No session environment specified.
Validating current environment...
Exception in thread "main" org.apache.flink.table.client.SqlClientException: The configured environment is invalid. Please check your environment files again.
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:147)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:99)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:194)
Caused by: org.apache.flink.table.client.gateway.SqlExecutionException: Could not create execution context.
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:562)
at org.apache.flink.table.client.gateway.local.LocalExecutor.validateSession(LocalExecutor.java:382)
at org.apache.flink.table.client.SqlClient.validateEnvironment(SqlClient.java:144)
... 2 more
Caused by: java.lang.ExceptionInInitializerError
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
at org.apache.flink.table.catalog.hive.HiveCatalog.createHiveConf(HiveCatalog.java:151)
at org.apache.flink.table.catalog.hive.HiveCatalog.<init>(HiveCatalog.java:130)
at org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory.createCatalog(HiveCatalogFactory.java:84)
at org.apache.flink.table.client.gateway.local.ExecutionContext.createCatalog(ExecutionContext.java:259)
at org.apache.flink.table.client.gateway.local.ExecutionContext.lambda$new$0(ExecutionContext.java:136)
at java.util.HashMap.forEach(HashMap.java:1289)
at org.apache.flink.table.client.gateway.local.ExecutionContext.<init>(ExecutionContext.java:135)
at org.apache.flink.table.client.gateway.local.LocalExecutor.getOrCreateExecutionContext(LocalExecutor.java:558)
... 4 more
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-cdh6.1.0
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
... 13 more
This error means that the Hive jars of this version do not support the cluster's Hadoop version (Hadoop 3.0.0-cdh6.1.0).
Hive 1.2.1 supports Hadoop 1.x.y and 2.x.y only.
To restate the conclusion: Flink's Hive integration currently supports only Hive 2.3.4 and 1.2.1, and on a CDH 6.1.0 cluster (Hadoop 3.0.0, Hive 2.1.1) both configurations fail. The recommendation remains to run a matching or closely matching Hive version (at minimum the major version should line up); otherwise mismatched methods lead to errors.