Contents
1. Prerequisites
2. Installing Hive 3.1.1
3. Configuring apache-tez-0.9.2-bin
4. Problems encountered while configuring Tez
Host  centos301        centos302        centos303
IP    192.168.130.150  192.168.130.151  192.168.130.152
Package directory: /opt/software/
Extraction directory: /opt/module/
Hive: apache-hive-3.1.1-bin.tar.gz
Tez: apache-tez-0.9.2-bin.tar.gz
tar -zxvf /opt/software/apache-hive-3.1.1-bin.tar.gz -C /opt/module/
Edit conf/hive-env.sh (create it from hive-env.sh.template if it does not exist yet) and point the variables at the actual install paths:
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-3.2.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=${HIVE_HOME}/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=${HIVE_HOME}/lib
Edit conf/hive-site.xml (create it if it does not exist) and add the metastore connection and basic options:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://centos301:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>Password0!</value>
</property>
<property>
  <name>datanucleus.schema.autoCreateAll</name>
  <value>true</value>
</property>
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://centos301:9000/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://centos301:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
  <name>hive.server2.webui.host</name>
  <value>192.168.130.150</value>
</property>
<property>
  <name>hive.server2.webui.port</name>
  <value>10002</value>
</property>
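Note that Hive does not ship com.mysql.jdbc.Driver; the MySQL Connector/J jar has to be copied into Hive's lib directory before the metastore is initialized. A minimal sketch, assuming the connector jar (the exact file name and version below are only an example) has been downloaded to /opt/software/:
# copy the MySQL JDBC driver onto Hive's classpath
cp /opt/software/mysql-connector-java-5.1.47.jar /opt/module/apache-hive-3.1.1-bin/lib/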
1. Change to the Hadoop directory:
cd /opt/module/hadoop-3.2.0
2. Create the warehouse directory configured above:
bin/hdfs dfs -mkdir -p /user/hive/warehouse
I won't cover the Spark cluster configuration here; if you run into Spark-related problems, search online or leave a comment below and I'll reply as soon as I can so we can learn together.
1. Append the following to the end of /etc/profile (vi /etc/profile):
#HIVE_HOME
export HIVE_HOME=/opt/module/apache-hive-3.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
2. Make the environment variables take effect:
source /etc/profile
Initialize the metastore schema in MySQL:
schematool -dbType mysql -initSchema
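If the initialization succeeded, schematool should be able to report the schema version (optional check; assumes the same MySQL settings as above):
schematool -dbType mysql -info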
1. Start the Hive metastore service (from the Hive directory):
bin/hive --service metastore &
2. Check that the /user/hive/warehouse directory has been created on the cluster; if not, go back to the warehouse-creation step above.
3. Start the Hive CLI:
bin/hive
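An optional smoke test from the shell instead of the interactive CLI:
bin/hive -e "show databases;"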
1. Change to the Hadoop directory:
cd /opt/module/hadoop-3.2.0
2. Create the Tez directory on HDFS:
bin/hdfs dfs -mkdir -p /user/tez
1. Change to the Hadoop directory:
cd /opt/module/hadoop-3.2.0
2. Upload the extracted Tez directory:
bin/hdfs dfs -put /opt/module/apache-tez-0.9.2-bin/ /user/tez/
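Optionally verify that the upload landed where tez.lib.uris (configured below) will look for it:
bin/hdfs dfs -ls /user/tez/apache-tez-0.9.2-bin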
1. Change to the conf directory:
cd /opt/module/apache-hive-3.1.1-bin/conf
2. Edit hive-site.xml (vi hive-site.xml) and append the following:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
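The engine can also be switched for a single session instead of globally; a sketch, where some_table is only a hypothetical table used to exercise Tez:
bin/hive -e "set hive.execution.engine=tez; select count(*) from some_table;"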
1. Change to the conf directory:
cd /opt/module/apache-hive-3.1.1-bin/conf
2. Create tez-site.xml:
touch tez-site.xml
3. Add the following content:
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin,${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin/lib</value>
  </property>
  <property>
    <name>tez.lib.uris.classpath</name>
    <value>${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin,${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin/lib</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>
</configuration>
1. Change to the conf directory:
cd /opt/module/apache-hive-3.1.1-bin/conf
2. Edit hive-env.sh (vi hive-env.sh) and change HIVE_AUX_JARS_PATH:
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/apache-tez-0.9.2-bin
export TEZ_JARS=""
# collect every jar directly under $TEZ_HOME ...
for jar in `ls $TEZ_HOME | grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
# ... and every jar under $TEZ_HOME/lib
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
# $TEZ_JARS already starts with ":", so the result is ${HIVE_HOME}/lib followed by the Tez jars
export HIVE_AUX_JARS_PATH=${HIVE_HOME}/lib$TEZ_JARS
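A rough way to sanity-check the resulting jar list (assumes HIVE_HOME is already exported via /etc/profile as above):
source /opt/module/apache-hive-3.1.1-bin/conf/hive-env.sh
echo $HIVE_AUX_JARS_PATH | tr ':' '\n' | head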
1. Change to the conf directory:
cd /opt/module/hadoop-3.2.0/etc/hadoop
2. Edit mapred-site.xml (vi mapred-site.xml) and add the following:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
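If jobs may also be submitted from the other nodes, keep their mapred-site.xml in sync; a minimal sketch, assuming passwordless ssh and the same install path on centos302/centos303:
scp mapred-site.xml centos302:/opt/module/hadoop-3.2.0/etc/hadoop/
scp mapred-site.xml centos303:/opt/module/hadoop-3.2.0/etc/hadoop/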
2019-11-19T09:21:22,547 ERROR [6d819beb-6055-4d76-8347-6c1347f11af7 main] tez.DagUtils: Could not find the jar that was being uploaded
2019-11-19T09:21:22,548 ERROR [6d819beb-6055-4d76-8347-6c1347f11af7 main] exec.Task: Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://centos301:9000/tmp/hive/centos100/_tez_session_dir/6d819beb-6055-4d76-8347-6c1347f11af7-resources/lib. Failing because I am unlikely to write too.
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:1191) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:1042) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:931) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:610) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:371) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:195) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) ~[hive-exec-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.1.jar:3.1.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_211]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_211]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_211]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_211]
at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.0.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.0.jar:?]
2019-11-19T09:21:22,549 INFO [6d819beb-6055-4d76-8347-6c1347f11af7 main] reexec.ReOptimizePlugin: ReOptimization: retryPossible: false
2019-11-19T09:21:22,594 ERROR [6d819beb-6055-4d76-8347-6c1347f11af7 main] ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
The key information here is the java.io.IOException above: Hive could not write to hdfs://centos301:9000/tmp/hive/centos100/_tez_session_dir/... ("Failing because I am unlikely to write too"); the trailing "ReOptimization: retryPossible: false" line only means Hive will not retry the query.
The permissions on that directory are too restrictive, so grant access on /tmp/hive/centos100/_tez_session_dir. 1. Change to the Hadoop directory:
cd /opt/module/hadoop-3.2.0
2. As the centos100 user, add the permissions:
bin/hdfs dfs -chmod -R a+x /tmp/hive/centos100/_tez_session_dir
bin/hdfs dfs -chmod -R a+x /tmp/hive/centos100
bin/hdfs dfs -chmod -R a+x /tmp/hive
bin/hdfs dfs -chmod -R a+x /tmp
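To confirm the new permissions took effect (optional check):
bin/hdfs dfs -ls -d /tmp /tmp/hive /tmp/hive/centos100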
2019-11-19T10:55:23,800 ERROR [9f771f3a-ba56-4603-8896-b7cbaa754dc8 main] ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1574125079882_0008_1_03, diagnostics=[Task failed, taskId=task_1574125079882_0008_1_03_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1574125079882_0008_01_000005 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1574125079882_0008_1_03_000000_1:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:488)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException:
hive.vectorized.execution.enabled controls whether vectorized query execution is used; the default is true. I haven't investigated the root cause of this error in depth yet; as a workaround it can be disabled temporarily for a query with set hive.vectorized.execution.enabled=false;
It can also be set permanently in hive-site.xml:
<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>false</value>
</property>
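A per-query workaround that leaves hive-site.xml untouched (some_table is again only a hypothetical table name):
bin/hive -e "set hive.vectorized.execution.enabled=false; select count(*) from some_table;"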
1. ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Most of these are HDFS file-permission problems; check the logs for your specific case and grant permissions on the directories involved.
2. ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
(1) The HQL produces a huge Cartesian product that overwhelms the maximum load Hive can execute.
Optimize the HQL.
(2) Hive fails partway through execution with YarnException: Unauthorized request to start container.
Check whether the clocks on all cluster nodes are synchronized (see the sketch after this list).
3. ql.Driver: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.tez.TezTask
This mainly points to data skew.
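A quick way to compare the clocks on the three nodes, as mentioned in (2) above; a sketch assuming passwordless ssh from centos301 (setting up NTP/chrony synchronization itself is not covered here):
for h in centos301 centos302 centos303; do ssh $h date; done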
That wraps up this post. If you have any questions, leave a comment below and we can discuss them together.