A Pitfall Guide to Reading and Inserting Hive Data with Flink
Flink 1.9 and later can use HiveCatalog to read Hive data, but 1.9's Hive support is limited: only versions 2.3.4 and 1.2.1 are supported. I am on the fairly old Hive 1.2.1, with Flink 1.10.0. Below are the problems I ran into while reading from and inserting into Hive.
Local environment: Windows 10, Flink 1.10.0
Goal: run a Flink program from IDEA on my local machine that reads a Hive table on the test-environment cluster.
First, add the dependencies the job needs, following the official Flink documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/hive/
There is also a Flink-Hive article from the Flink Chinese community: https://ververica.cn/developers/flink1-9-hive/
These are the dependencies listed in the official docs:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_2.11</artifactId>
    <version>1.10.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.11</artifactId>
    <version>1.10.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
    <scope>provided</scope>
</dependency>
Then I wrote the code in the main program. The first run failed with:
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: tesla-cluster
This means the HDFS nameservice address cannot be resolved. The fix is to copy hdfs-site.xml from the test environment to the local resources directory, after which this error goes away.
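A quick way to sanity-check the fix is to confirm that hdfs-site.xml actually ended up on the classpath before launching the job. This is a small standalone sketch; the helper class name is my own, not part of Flink:

```java
// Hypothetical helper: verify that a configuration file copied into
// src/main/resources is visible on the runtime classpath.
public class ConfCheck {
    static boolean onClasspath(String name) {
        // getResource returns null when the resource is not on the classpath
        return Thread.currentThread().getContextClassLoader().getResource(name) != null;
    }

    public static void main(String[] args) {
        System.out.println("hdfs-site.xml on classpath: " + onClasspath("hdfs-site.xml"));
    }
}
```

If this prints `false`, the file was not copied into the build output (e.g. it sits outside the resources directory), and Flink will fail with the UnknownHostException above.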
Running the code again produced another error:
Exception in thread "main" java.lang.NoSuchMethodError: com.facebook.fb303.FacebookService$Client.sendBaseOneway(Ljava/lang/String;Lorg/apache/thrift/TBase;)V
After a lot of searching online, this turned out to be a jar conflict. I had migrated the project from Flink 1.9.1, whose suggested dependencies were:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_2.11</artifactId>
    <version>1.9.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.9.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop-2-uber</artifactId>
    <version>2.6.5-8.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libfb303</artifactId>
    <version>0.9.3</version>
</dependency>
That is several more jars than Flink 1.10 needs, and the culprit was:
<dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libfb303</artifactId>
    <version>0.9.3</version>
</dependency>
Its version conflicts with the copy of the same classes bundled inside hive-exec; deleting this dependency fixes the error. Your program may hit conflicts from other dependencies; find the conflicting jar and remove the version you don't need.
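When a NoSuchMethodError like this appears, the first question is which jar a class was actually loaded from. A small standalone helper (my own sketch, not part of Flink or Hive) can print the code source of any class, which usually points straight at the conflicting jar:

```java
// Hypothetical helper: print the jar (code source) a class was loaded from.
// Useful for diagnosing NoSuchMethodError caused by duplicate classes on the classpath.
public class WhichJar {
    public static String locate(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK core classes have no code source and come from the bootstrap classpath
        return src == null ? "bootstrap classpath" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In the real project you would pass the conflicting class, e.g.
        // WhichJar.locate(com.facebook.fb303.FacebookService.class)
        System.out.println(locate(String.class));
    }
}
```

Running `locate` on `FacebookService.Client` would reveal whether the class came from libfb303 or from hive-exec, confirming which dependency to drop.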
Finally, here is the most basic Flink code for reading Hive data:
Main program:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

/**
 * @author:wy
 * @date:2020/3/31
 * @version:1.0.0
 * @description: minimal Flink batch job that reads from and inserts into a Hive table
 */
public class IntelligenceAlter {
    public static void main(String[] args) throws Exception {
        // The Hive connector requires the Blink planner; use batch mode here.
        EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        String name = "hive";                 // catalog name
        String defaultDatabase = "tmp";       // default Hive database
        String hiveConfDir = "E:\\Hadoop\\";  // directory containing hive-site.xml
        String version = "1.2.1";             // Hive version

        HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
        tableEnv.registerCatalog(name, hive);
        tableEnv.useCatalog(name);

        // Read from Hive
        tableEnv.sqlQuery("select * from tmp.tmp_flink_test_2").select("product_id");
        // Insert into Hive; sqlUpdate only buffers the statement,
        // execute() actually submits the job.
        tableEnv.sqlUpdate("insert into tmp.tmp_flink_test_2 values ('newKey')");
        tableEnv.execute("insert into tmp");
    }
}
pom.xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop-2-uber</artifactId>
    <version>2.6.5-8.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>${flink.version}</version>
</dependency>