apache-hive-3.1.1-bin and apache-tez-0.9.2-bin setup, plus some problems encountered

Contents

I. Prerequisites

II. Installing hive 3.1.1

III. Configuring apache-tez-0.9.2-bin

IV. Problems encountered while configuring tez


I. Prerequisites

  1. Before diving in, a quick note on my laptop environment and the cluster and directory layout. The working user is centos100 and the superuser is root; adjust the configuration for your own environment.
    Host      centos301          centos302          centos303
    IP        192.168.130.150    192.168.130.151    192.168.130.152
  2. Directory layout
    Package directory: /opt/software/
    Extraction directory: /opt/module/
    
  3. Installation packages (download them yourself)
    hive: apache-hive-3.1.1-bin.tar.gz
    tez: apache-tez-0.9.2-bin.tar.gz
    
  4. I hope this post is free of mistakes; if you find one, please leave a comment below and I will reply as soon as I can.

II. Installing hive 3.1.1

  1. Extract the package to /opt/module
    tar -zxvf /opt/software/apache-hive-3.1.1-bin.tar.gz -C /opt/module/
    
  2. Change to /opt/module and configure the files under /opt/module/apache-hive-3.1.1-bin/conf
  3. Configure hive-env.sh
    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=${HADOOP_HOME}
    
    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=${HIVE_HOME}/conf
    
    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
    export HIVE_AUX_JARS_PATH=${HIVE_HOME}/lib
    
    
  4. Configure hive-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
    	<property>
    		<name>javax.jdo.option.ConnectionURL</name>
    		<value>jdbc:mysql://centos301:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    	</property>
    	<property>
    		<name>javax.jdo.option.ConnectionDriverName</name>
    		<value>com.mysql.jdbc.Driver</value>
    	</property>
    	<property>
    		<name>javax.jdo.option.ConnectionUserName</name>
    		<value>root</value>
    	</property>
    	<property>
    		<name>javax.jdo.option.ConnectionPassword</name>
    		<value>Password0!</value>
    	</property>
    	<property>
    		<name>datanucleus.schema.autoCreateAll</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>hive.cli.print.header</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>hive.cli.print.current.db</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>hive.metastore.warehouse.dir</name>
    		<value>hdfs://centos301:9000/user/hive/warehouse</value>
    		<description>location of default database for the warehouse</description>
    	</property>
    	<property>
    		<name>hive.metastore.uris</name>
    		<value></value>
    		<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore, thrift://centos301:9083</description>
    	</property>
    	<property>
    		<name>hive.server2.thrift.port</name>
    		<value>10000</value>
    		<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
    	</property>
    	<property>
    		<name>hive.server2.webui.host</name>
    		<value>192.168.130.150</value>
    	</property>
    	<property>
    		<name>hive.server2.webui.port</name>
    		<value>10002</value>
    	</property>
    </configuration>
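    One easy mistake in the file above: inside XML, every `&` in the JDBC URL must be written as `&amp;`. Here is a self-contained sketch (assuming `python3` is on the PATH) that parses a miniature hive-site.xml-style snippet to prove it is well-formed; a raw `&` would make the parse fail.

    ```shell
    # Write a minimal hive-site.xml-style snippet to a temp file and parse it.
    # Note the '&amp;' escape in the value: that is what XML requires.
    f=$(mktemp)
    cat > "$f" <<'EOF'
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://centos301:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
      </property>
    </configuration>
    EOF
    ok=$(python3 -c "import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1]); print('well-formed')" "$f")
    echo "$ok"
    rm -f "$f"
    ```

    Running the same one-line parse against your real /opt/module/apache-hive-3.1.1-bin/conf/hive-site.xml catches escaping mistakes before hive does.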
  5. Setting hive.metastore.warehouse.dir puts the databases on hdfs so they can be shared with spark-sql. Before running spark-sql queries, start the hive metastore from the hive directory:
    bin/hive --service metastore &
    Then open a spark-sql shell from the spark directory:
    bin/spark-sql --conf spark.sql.warehouse.dir=hdfs://centos301:9000/user/hive/warehouse
    The directory hdfs://centos301:9000/user/hive/warehouse must also be created on the hdfs cluster:
    1、Change to the hadoop directory
    cd /opt/module/hadoop-3.2.0
    2、Create the directory (note the -p flag goes after -mkdir)
    bin/hdfs dfs -mkdir -p /user/hive/warehouse
    
    I won't cover the spark cluster configuration here; if you run into spark-related problems, search online or leave a comment below and I will reply as soon as I can so we can learn together.
  6. Set the hive environment variables
    1、Append the following to the end of /etc/profile (vi /etc/profile)
    #HIVE_HOME
    export HIVE_HOME=/opt/module/apache-hive-3.1.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    2、Apply the change
    source /etc/profile
    
  7. Upload the MySQL driver (mysql-connector-java-5.1.47-bin.jar) to /opt/module/apache-hive-3.1.1-bin/lib and rename it to mysql-connector-java.jar; it is needed both to connect to MySQL and to initialize the hive metastore
  8. Initialize the hive metastore schema in MySQL
    schematool -dbType mysql -initSchema
    
  9. That completes the hive configuration; to verify it:
    1、Start the hive metastore from the hive directory
    bin/hive --service metastore &
    2、Check that /user/hive/warehouse exists on the cluster; if it does not, see step 5
    3、Start hive
    bin/hive

III. Configuring apache-tez-0.9.2-bin

  1. Download the package and extract it to /opt/module
  2. Create the /user/tez/ directory on the hadoop cluster
    1、Change to the hadoop directory
    cd /opt/module/hadoop-3.2.0
    2、Create the directory
    bin/hdfs dfs -mkdir -p /user/tez
  3. Upload everything under apache-tez-0.9.2-bin/
    1、Change to the hadoop directory
    cd /opt/module/hadoop-3.2.0
    2、Upload
    bin/hdfs dfs -put /opt/module/apache-tez-0.9.2-bin/ /user/tez/
  4. Configure hive-site.xml to make tez the default execution engine
    1、Change to the conf directory
    cd /opt/module/apache-hive-3.1.1-bin/conf
    2、Edit hive-site.xml (vi hive-site.xml) and append the following
    <property>
    	<name>hive.execution.engine</name>
    	<value>tez</value>
    </property>
    
  5. Create tez-site.xml under /opt/module/apache-hive-3.1.1-bin/conf
    1、Change to the conf directory
    cd /opt/module/apache-hive-3.1.1-bin/conf
    2、Create tez-site.xml
    touch tez-site.xml
    3、Append the following
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
    	<property>
    		<name>tez.lib.uris</name>
    		<value>${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin,${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin/lib</value>
    	</property>
    	<property>
    		<name>tez.lib.uris.classpath</name>
    		<value>${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin,${fs.defaultFS}/user/tez/apache-tez-0.9.2-bin/lib</value>
    	</property>
    	<property>
    		<name>tez.use.cluster.hadoop-libs</name>
    		<value>true</value>
    	</property>
    	<property>
    		<name>tez.history.logging.service.class</name>
    		<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    	</property>
    </configuration>
    
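    Hadoop substitutes ${fs.defaultFS} with the value from core-site.xml, so tez.lib.uris must resolve to exactly the HDFS paths uploaded in step 3. A quick sketch of the expansion (hdfs://centos301:9000 is this guide's assumed fs.defaultFS value; substitute your own):

    ```shell
    # Expand ${fs.defaultFS} by hand to see the final URIs Tez will fetch.
    fs_defaultFS="hdfs://centos301:9000"   # assumed fs.defaultFS from core-site.xml
    tez_lib_uris="${fs_defaultFS}/user/tez/apache-tez-0.9.2-bin,${fs_defaultFS}/user/tez/apache-tez-0.9.2-bin/lib"
    echo "$tez_lib_uris"
    # -> hdfs://centos301:9000/user/tez/apache-tez-0.9.2-bin,hdfs://centos301:9000/user/tez/apache-tez-0.9.2-bin/lib
    ```

    If these URIs don't match where you ran `hdfs dfs -put`, Tez sessions will fail to localize their jars at startup.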
  6. Edit hive-env.sh and change HIVE_AUX_JARS_PATH
    1、Change to the conf directory
    cd /opt/module/apache-hive-3.1.1-bin/conf
    2、Edit hive-env.sh (vi hive-env.sh) and set HIVE_AUX_JARS_PATH as follows
    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
    export TEZ_HOME=/opt/module/apache-tez-0.9.2-bin/
    export TEZ_JARS=""
    for jar in `ls $TEZ_HOME |grep jar`; do
        export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
    done
    for jar in `ls $TEZ_HOME/lib`; do
        export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
    done
    
    export HIVE_AUX_JARS_PATH=${HIVE_HOME}/lib$TEZ_JARS
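    To see what those two loops actually produce, here is a self-contained dry run against a throwaway directory (the two jar names are made up for illustration; a real tez install has many more):

    ```shell
    # Simulate the hive-env.sh loops with a temporary TEZ_HOME holding fake jars.
    TEZ_HOME=$(mktemp -d)
    mkdir -p "$TEZ_HOME/lib"
    touch "$TEZ_HOME/tez-api-0.9.2.jar" "$TEZ_HOME/lib/commons-io-2.4.jar"

    TEZ_JARS=""
    for jar in $(ls "$TEZ_HOME" | grep jar); do    # top-level tez jars
        TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
    done
    for jar in $(ls "$TEZ_HOME/lib"); do           # everything under lib/
        TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
    done

    # hive sees its own lib dir first, then every tez jar, colon-separated.
    echo "HIVE_AUX_JARS_PATH=\${HIVE_HOME}/lib$TEZ_JARS"
    rm -rf "$TEZ_HOME"
    ```

    The leading colon added by the first loop iteration is harmless because the final value starts with ${HIVE_HOME}/lib, so the result is a valid colon-separated path list.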
  7. Configure mapred-site.xml
    1、Change to the hadoop configuration directory
    cd /opt/module/hadoop-3.2.0/etc/hadoop
    2、Edit mapred-site.xml (vi mapred-site.xml) and add the following
    	<property>
    		<name>mapreduce.framework.name</name>
    		<value>yarn-tez</value>
    	</property>
  8. The execution engine is now switched to tez. If you want background on tez, there is plenty of material to read. hive supports three execution engines (MapReduce, tez, and spark), and MapReduce is gradually being abandoned simply because it is too slow

IV. Problems encountered while configuring tez

  1. java.io.IOException: Previous writer likely failed to write hdfs://centos301:9000/tmp/hive/centos
    2019-11-19T09:21:22,547 ERROR [6d819beb-6055-4d76-8347-6c1347f11af7 main] tez.DagUtils: Could not find the jar that was being uploaded
    2019-11-19T09:21:22,548 ERROR [6d819beb-6055-4d76-8347-6c1347f11af7 main] exec.Task: Failed to execute tez graph.
    java.io.IOException: Previous writer likely failed to write hdfs://centos301:9000/tmp/hive/centos100/_tez_session_dir/6d819beb-6055-4d76-8347-6c1347f11af7-resources/lib. Failing because I am unlikely to write too.
    	at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:1191) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:1042) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:931) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.ensureLocalResources(TezSessionState.java:610) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:371) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:195) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) ~[hive-exec-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.1.jar:3.1.1]
    	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.1.jar:3.1.1]
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_211]
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_211]
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_211]
    	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_211]
    	at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.0.jar:?]
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.0.jar:?]
    2019-11-19T09:21:22,549  INFO [6d819beb-6055-4d76-8347-6c1347f11af7 main] reexec.ReOptimizePlugin: ReOptimization: retryPossible: false
    2019-11-19T09:21:22,594 ERROR [6d819beb-6055-4d76-8347-6c1347f11af7 main] ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
    
    The key line in this log is:
    java.io.IOException: Previous writer likely failed to write hdfs://centos301:9000/tmp/hive/centos100/_tez_session_dir/... Failing because I am unlikely to write too.
    This is a permissions problem: grant access to the /tmp/hive/centos100/_tez_session_dir directory
    1、Change to the hadoop directory
    cd /opt/module/hadoop-3.2.0
    2、As the centos100 user, grant the permissions
    bin/hdfs dfs -chmod -R a+x /tmp/hive/centos100/_tez_session_dir
    bin/hdfs dfs -chmod -R a+x /tmp/hive/centos100
    bin/hdfs dfs -chmod -R a+x /tmp/hive
    bin/hdfs dfs -chmod -R a+x /tmp
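    Why the chmod has to walk all the way up to /tmp: on HDFS, as on a local filesystem, listing or entering a directory requires the execute (traversal) bit on every parent along the path. A local, non-root sketch of the same failure mode (the paths mimic the HDFS layout but live in a hypothetical temp dir):

    ```shell
    # Recreate the layout locally and drop the execute bit on one parent.
    base=$(mktemp -d)
    mkdir -p "$base/tmp/hive/centos100/_tez_session_dir"
    chmod a-x "$base/tmp/hive"             # simulate the broken permission
    before=$(ls "$base/tmp/hive/centos100" 2>/dev/null || echo BLOCKED)  # fails for non-root users
    chmod a+x "$base/tmp/hive"             # the fix: restore traversal
    after=$(ls "$base/tmp/hive/centos100" 2>/dev/null && echo OK)
    echo "before: $before"
    echo "after:  $after"
    rm -rf "$base"
    ```

    The same logic explains the order of the four hdfs commands above: fixing only the innermost directory changes nothing while a parent still blocks traversal.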
  2. java.lang.RuntimeException: Hive Runtime Error while closing operators
    2019-11-19T10:55:23,800 ERROR [9f771f3a-ba56-4603-8896-b7cbaa754dc8 main] ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1574125079882_0008_1_03, diagnostics=[Task failed, taskId=task_1574125079882_0008_1_03_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1574125079882_0008_01_000005 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1574125079882_0008_1_03_000000_1:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
    	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:422)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
    	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:488)
    	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
    	... 16 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: 
    hive.vectorized.execution.enabled controls whether vectorized query execution is used (the default is true). I have not dug into the root cause of this failure; you can disable it for the current session:
    set hive.vectorized.execution.enabled=false;
    or disable it permanently in hive-site.xml:
    <property>
    	<name>hive.vectorized.execution.enabled</name>
    	<value>false</value>
    </property>
    
    
  3. hive queries failing with return code 1, 2, or 3. These are common, run-of-the-mill problems; here is a summary, but always read the logs and analyze the specific failure
    1、ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
        Most often an hdfs file-permission problem; check the logs for your situation and grant permissions on the relevant directories
    2、ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
        (1) The HQL produces a large Cartesian product that exhausts hive's execution capacity
            Optimize the HQL
        (2) The job fails partway through with YarnException: Unauthorized request to start container
            Check whether the clocks on the cluster nodes are synchronized
    3、ql.Driver: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.tez.TezTask
        Mainly consider data skew
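    As a memory aid only (this mapping is a rule of thumb drawn from this section, not an official one), the checks above can be condensed into a tiny helper:

    ```shell
    # First thing to check for each TezTask return code seen in this section.
    diagnose() {
        case "$1" in
            1) echo "check hdfs file and directory permissions in the logs" ;;
            2) echo "check for Cartesian products in the HQL and for cluster clock skew" ;;
            3) echo "check for data skew" ;;
            *) echo "read the full tez task logs" ;;
        esac
    }
    diagnose 1
    ```

    Treat it as a starting point: the return code narrows the search, but the real cause is always in the task logs.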

That is the whole post. If you have any questions, leave a comment below and we can work through them together.
