Common Problems and Solutions When Setting Up Hive on Spark on YARN

Table of Contents

  • 1. Inserting records into a table from the Hive CLI fails
    • Error a
    • Error b
  • 2. Hive UPDATE and DELETE fail
  • 3. beeline fails to connect to HiveServer2 over JDBC

1. Inserting records into a table from the Hive CLI fails with the following errors

Error a

Unrecognized Hadoop major version number: 3.2.0

Environment versions at the time:
Hadoop: 3.2.0
Spark: 2.4.0
Hive: 3.1.1
Solution

This is a version-compatibility problem. Inspecting pom.xml in the root of the Hive source tree shows that:
1. the Hadoop version this Hive release is built against is 3.1.0;
2. the compatible Spark version is 2.3.0.
Switching Hadoop from 3.2.0 to 3.1.0 and Spark from 2.4.0 to 2.3.3 resolved the problem.
Note: only the major version needs to match; an exact match is unnecessary, and a stable release should be preferred. The compiled-against versions can be read straight from the source tree, as the sketch below shows.
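
A minimal sketch, assuming the apache-hive-3.1.1 source tarball is unpacked and you are in its root directory:

# Print the Hadoop and Spark versions this Hive release was built against
grep -m1 '<hadoop.version>' pom.xml    # expect: <hadoop.version>3.1.0</hadoop.version>
grep -m1 '<spark.version>' pom.xml     # expect: <spark.version>2.3.0</spark.version>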

Error b

Failed to monitor Job[-1] with exception 'java.lang.IllegalStateException(Connection to remote Spark driver was lost)' Last known state = SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. RPC channel is closed.

Solution
Checking the application log with yarn logs -applicationId xxx reveals the following error:

2019-03-02 12:24:17,549 ERROR yarn.ApplicationMaster: User class threw exception: java.io.FileNotFoundException: File file:/user/hive/tmp/sparkeventlog does not exist

This is a hive-site.xml misconfiguration: the sparkeventlog path is being resolved against the local filesystem (note the file: scheme in the log), so the file is reported missing. Whenever a parameter takes an HDFS path, prefix it with the {hostname}:port from fs.defaultFS in core-site.xml. For example, the sparkeventlog parameter was previously configured as:
/user/hive/tmp/sparkeventlog
Change it to:
hdfs://hadoopSvr1:8020/user/hive/tmp/sparkeventlog
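
In hive-site.xml this looks roughly as follows (a sketch; the property name spark.eventLog.dir is an assumption inferred from the path in the log, so check which parameter in your hive-site.xml actually carries this value):

    <property>
        <!-- Assumed property name; the point is the fully qualified HDFS URI, not a bare path -->
        <name>spark.eventLog.dir</name>
        <value>hdfs://hadoopSvr1:8020/user/hive/tmp/sparkeventlog</value>
    </property>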

Solution (another cause)
Checking the ResourceManager log shows that a NodeManager failed to start, with the following error:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [hadoopSvr3:8031] java.net.BindException: Cannot assign requested address; For more details see:  http://wiki.apache.org/hadoop/BindException

Restart the YARN cluster. Whenever possible, start the cluster via the start-yarn.sh script on the node where the ResourceManager runs; this reduces the chance of YARN startup failures such as the bind error above. A minimal restart sequence is sketched below.
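
A sketch using the stock Hadoop sbin scripts, run on the ResourceManager node:

# On the node where the ResourceManager runs:
stop-yarn.sh
start-yarn.sh
# Verify that every NodeManager has registered:
yarn node -list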

2. Hive UPDATE and DELETE fail

The error is:

hive> delete from alarm where eid = 8;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Solution
Hive does not support row-level INSERT, UPDATE, and DELETE out of the box; ACID transaction support has to be enabled explicitly. Turn it on by setting the following properties in hive-site.xml (an equivalent hive-site.xml snippet follows the property list):

hive.support.concurrency = true
hive.enforce.bucketing = true
hive.exec.dynamic.partition.mode = nonstrict
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1
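
As hive-site.xml entries, two of the properties above look like this (a sketch; the remaining four follow the same <property> pattern):

    <property>
        <name>hive.support.concurrency</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.txn.manager</name>
        <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>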

Restart the Hive service. In addition, new tables must be created with the table property transactional=true; quoting the official documentation (a DDL sketch follows the quote):
If a table is to be used in ACID writes (insert, update, delete) then the table property “transactional=true” must be set on that table
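
A DDL sketch for an ACID table, using the alarm table from the failing DELETE above (the eid column comes from the error message; the msg column is made up for the example; full ACID tables must be stored as ORC):

-- Bucketed ORC table with ACID enabled
CREATE TABLE alarm (
    eid INT,
    msg STRING
)
CLUSTERED BY (eid) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- With DbTxnManager configured, this now succeeds:
DELETE FROM alarm WHERE eid = 8;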

3. beeline fails to connect to HiveServer2 over JDBC

The error is:

Connecting to jdbc:hive2://localhost:10000
19/03/06 13:55:31 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate anonymous (state=08S01,code=0)
Beeline version 3.1.1 by Apache Hive

Solution
The error means the root user (which HiveServer2 runs as) is not allowed to impersonate the connecting user. HiveServer2 operates on HDFS by wrapping requests in the connecting user's identity, the same proxy-user mechanism HttpFS uses for its REST interface, so root must be declared a proxy user. Add the following to ${HADOOP_HOME}/etc/hadoop/core-site.xml:

    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>

After applying this configuration on every node, restart HDFS; a sketch follows.
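
For example (stop-dfs.sh and start-dfs.sh are the stock Hadoop scripts; the refresh command is an alternative that usually applies proxy-user changes without a full restart):

# Restart HDFS after editing core-site.xml on every node:
stop-dfs.sh
start-dfs.sh
# Alternatively, reload proxy-user settings in place:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
# Retry the connection:
beeline -u jdbc:hive2://localhost:10000 -n root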
