Tez Installation and Deployment

1. Download the Tez release from http://tez.apache.org:
apache-tez-0.9.1-bin.tar.gz

2. Upload the Tez tarball to the Linux machine

3. Upload apache-tez-0.9.1-bin.tar.gz to the /tez directory on HDFS

hadoop fs -mkdir /tez
hadoop fs -put /opt/software/apache-tez-0.9.1-bin.tar.gz /tez

4. Extract apache-tez-0.9.1-bin.tar.gz

tar -zxvf /opt/software/apache-tez-0.9.1-bin.tar.gz -C /opt

5. Rename the extracted directory

mv /opt/apache-tez-0.9.1-bin/ /opt/tez-0.9.1

6. Go to Hive's configuration directory: /opt/hive-2.3.7/conf

cd /opt/hive-2.3.7/conf
pwd
/opt/hive-2.3.7/conf

7. Create a tez-site.xml file under Hive's /opt/hive-2.3.7/conf directory

vim tez-site.xml

Add the following content:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>tez.lib.uris</name>
		<value>${fs.defaultFS}/tez/apache-tez-0.9.1-bin.tar.gz</value>
	</property>
	<property>
		<name>tez.use.cluster.hadoop-libs</name>
		<value>true</value>
	</property>
	<property>
		<name>tez.history.logging.service.class</name>
		<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
	</property>
</configuration>

8. Add the Tez home variable and dependency jar paths to hive-env.sh

vim hive-env.sh

Add the following configuration:

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/tez-0.9.1    # your Tez install directory
export TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
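The two loops above simply append every jar path, each prefixed with ":", onto TEZ_JARS. A minimal local sketch of the same accumulation pattern, using a throwaway directory with fake jar names (both are assumptions for illustration, not real Tez files):

```shell
# Demonstrates the TEZ_JARS accumulation pattern with a temporary directory.
# DEMO_HOME and the jar names are illustrative assumptions only.
DEMO_HOME=$(mktemp -d)
touch "$DEMO_HOME/tez-api-0.9.1.jar" "$DEMO_HOME/tez-dag-0.9.1.jar"

DEMO_JARS=""
for jar in $(ls "$DEMO_HOME" | grep jar); do
    DEMO_JARS=$DEMO_JARS:$DEMO_HOME/$jar
done
echo "$DEMO_JARS"    # one ':'-separated path per jar found

rm -rf "$DEMO_HOME"
```

Note the result starts with a ":", which is why hive-env.sh can write `...hadoop-lzo-0.4.20.jar$TEZ_JARS` with no separator in between.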

9. Add the following to hive-site.xml to switch Hive's execution engine to Tez:

<property>
	<name>hive.metastore.schema.verification</name>
	<value>false</value>
</property>

<property>
	<name>hive.execution.engine</name>
	<value>tez</value>
</property>

10. Test
Start Hive

bin/hive

Create a table

create table student(id int,name string);

Insert data into the table

insert into student values(1,"zhangsan");

If this runs without errors, the setup succeeded

select * from student;

1       zhangsan

Notes

When running Tez, the job's container may be killed by the NodeManager for using too much memory:

Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown.
Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking page: http://hadoop103:8088/cluster/app/application_1546781144082_0005 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

This means a Container running on a worker node tried to use more memory than allowed and was killed by the NodeManager.

[Excerpt] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
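The "2.1 GB virtual memory" figure in the log above is the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1 in YARN. A quick check of that arithmetic for the 1 GB container in the log:

```shell
# vmem limit = container physical memory (GB) * yarn.nodemanager.vmem-pmem-ratio (default 2.1)
awk 'BEGIN { printf "%.1f GB\n", 1 * 2.1 }'
```

So the 2.6 GB of virtual memory actually used exceeded the 2.1 GB ceiling, which is exactly what triggered the kill.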

Solution:
Disable the virtual-memory check in yarn-site.xml:

<property>
	<name>yarn.nodemanager.vmem-check-enabled</name>
	<value>false</value>
</property>
After the change, be sure to distribute the file to every node and restart the Hadoop cluster

rsync -r yarn-site.xml node2:`pwd`
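With more than one worker node, the copy can be scripted. A dry-run sketch, where the hostnames (node2, node3) and the Hadoop config path are assumptions for your cluster; remove the `echo` to actually copy:

```shell
# Dry run: print the rsync command for each worker (drop `echo` to execute).
# Hostnames and the target path are assumptions; adjust for your cluster.
for host in node2 node3; do
    echo rsync -av yarn-site.xml "$host:/opt/hadoop-2.7.2/etc/hadoop/"
done
```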
