Installing and Configuring Tez for Hive

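To run jobs that depend on one another (such as the MapReduce jobs generated by Pig and Hive) more efficiently and to cut disk and network I/O, Hortonworks developed Tez, a DAG computing framework.

Tez is a general-purpose DAG computing framework that evolved from MapReduce. It can serve as the underlying data-processing engine for MapReduce, Pig, Hive, and similar systems. It runs natively on YARN, the resource management platform of Hadoop 2.0, and was built by core Hadoop 2.0 developers, so it is well placed to become a strong newcomer among computing frameworks.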

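The build needs several libraries and tools: gcc, make, gcc-c++, and openssl. Two more, phantomjs-2.1.1-linux-x86_64 and nodejs, take a while to install.

Download the Tez source from the official site, unpack it, and build it.

Note: change hadoop.version in the pom, or set your own Hadoop version on the mvn command line: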

mvn package -Dhadoop.version=2.7.2 -DskipTests -Dmaven.javadoc.skip=true
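1. Take tez-0.8.4-minimal.tar.gz from tez-dist/target/ and unpack it locally into /opt/single/tez.

Create a conf directory under $TEZ_HOME and create tez-site.xml in it:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
	<property>
		<name>tez.lib.uris</name>
		<value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
	</property>
	<property>
		<name>tez.use.cluster.hadoop-libs</name>
		<value>true</value>
	</property>
</configuration>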
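The file above can also be generated from a script, which helps when the same setup is repeated on several machines. The following is a minimal sketch, not part of Tez itself: write_tez_site is a hypothetical helper name, and the directory and HDFS URI passed to it are simply the values used in this article.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal tez-site.xml into a conf directory.
# write_tez_site is a hypothetical helper, not something Tez ships.
write_tez_site() {
  local conf_dir="$1"   # e.g. /opt/single/tez/conf
  local lib_uri="$2"    # value for tez.lib.uris
  mkdir -p "$conf_dir"
  # Unquoted heredoc delimiter so ${lib_uri} is expanded.
  cat > "$conf_dir/tez-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${lib_uri}</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>
EOF
}

# Example call with the paths from this article:
# write_tez_site /opt/single/tez/conf \
#   hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
```

Keeping the URI as a parameter means a change of Tez version only touches the call, not the template.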
	
2. Set the Linux environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
3. Add the following to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
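After editing hadoop-env.sh it is worth checking that the Tez entries really reach the Hadoop classpath. The sketch below assumes only that the `hadoop classpath` command is available; has_tez_entries is a hypothetical helper name introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: succeed if any colon-separated classpath entry mentions "tez".
# has_tez_entries is a hypothetical helper, not a Hadoop or Tez command.
has_tez_entries() {
  local classpath="$1"
  # Split on ':' so each entry is matched on its own line.
  printf '%s\n' "$classpath" | tr ':' '\n' | grep -qi 'tez'
}

# Example call on a machine with Hadoop installed:
# has_tez_entries "$(hadoop classpath)" && echo "Tez is on the Hadoop classpath"
```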
Settings in mapred-site.xml:
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn-tez</value>
	</property>
4. Start Hadoop and upload the built tez-0.8.4-minimal.tar.gz to the hdfs://hadoop:9000/apps/tez-0.8.4/ directory.
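The upload in step 4 can be sketched with the standard `hdfs dfs` commands. upload_tez_tarball and the DRY_RUN switch are hypothetical names introduced here so the commands can be previewed before they touch HDFS; the paths in the example call are the ones used in this article.

```shell
#!/usr/bin/env bash
# Sketch: create the target directory on HDFS and upload the Tez tarball.
# upload_tez_tarball and DRY_RUN are hypothetical, for illustration only.
upload_tez_tarball() {
  local tarball="$1"    # local path to tez-0.8.4-minimal.tar.gz
  local hdfs_dir="$2"   # e.g. /apps/tez-0.8.4
  local run=""
  # With DRY_RUN=1 the commands are printed instead of executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  $run hdfs dfs -mkdir -p "$hdfs_dir" &&
    $run hdfs dfs -put -f "$tarball" "$hdfs_dir/"
}

# Example call:
# upload_tez_tarball tez-dist/target/tez-0.8.4-minimal.tar.gz /apps/tez-0.8.4
```

The resulting HDFS path has to match the tez.lib.uris value in tez-site.xml, otherwise the Tez client cannot find its libraries.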

5. Configure the Tez UI as follows:

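Add to yarn-site.xml:

	<property>
		<name>yarn.timeline-service.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.timeline-service.hostname</name>
		<value>hadoop</value>
	</property>
	<property>
		<name>yarn.timeline-service.http-cross-origin.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.timeline-service.webapp.address</name>
		<value>${yarn.timeline-service.hostname}:8188</value>
	</property>
	<property>
		<name>yarn.timeline-service.webapp.https.address</name>
		<value>${yarn.timeline-service.hostname}:2191</value>
	</property>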
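Add to tez-site.xml:

	<property>
		<description>Enable Tez to use the Timeline Server for History Logging</description>
		<name>tez.history.logging.service.class</name>
		<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
	</property>

	<property>
		<name>tez.tez-ui.history-url.base</name>
		<value>http://hadoop:8008/tez-ui/</value>
	</property>

	<property>
		<name>tez.runtime.convert.user-payload.to.history-text</name>
		<value>true</value>
	</property>

	<property>
		<name>tez.task.generate.counters.per.io</name>
		<value>true</value>
	</property>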
6. Tomcat configuration:

Installing Tomcat is not covered here; there are plenty of guides online.

Then unpack tez-ui-0.8.4.war and tez-ui2-0.8.4.war under Tomcat's webapps/ directory:

mkdir -pv /opt/modules/tomcat-7.0.69/webapps/tez-ui  /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cp /opt/single/tez/tez-ui-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui
cp /opt/single/tez/tez-ui2-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui && jar xvf tez-ui-0.8.4.war
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui2 && jar xvf tez-ui2-0.8.4.war
Edit webapps/tez-ui/scripts/config.js:
timelineBaseUrl: 'http://hadoop:8188',
RMWebUrl: 'http://hadoop:8088',
Set the Tomcat port to 8008 by changing the port attribute of the HTTP Connector in:

/opt/modules/tomcat-7.0.69/conf/server.xml
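The port change can be scripted with sed. This is a sketch under one assumption: the HTTP Connector in server.xml has its port and protocol="HTTP/1.1" attributes on the same line, as in the stock Tomcat file. set_tomcat_http_port is a hypothetical helper name; a .bak copy of the file is kept next to the original.

```shell
#!/usr/bin/env bash
# Sketch: change the port of the HTTP/1.1 Connector in Tomcat's server.xml.
# set_tomcat_http_port is a hypothetical helper, not a Tomcat tool.
# Assumes port="..." and protocol="HTTP/1.1" sit on the same line.
set_tomcat_http_port() {
  local server_xml="$1"   # e.g. /opt/modules/tomcat-7.0.69/conf/server.xml
  local port="$2"         # e.g. 8008
  # Only lines mentioning the HTTP/1.1 protocol are edited, so the
  # shutdown port and the AJP connector keep their values.
  sed -i.bak -E \
    "/protocol=\"HTTP\/1\.1\"/ s/ port=\"[0-9]+\"/ port=\"${port}\"/" \
    "$server_xml"
}

# Example call:
# set_tomcat_http_port /opt/modules/tomcat-7.0.69/conf/server.xml 8008
```

The port chosen here has to agree with tez.tez-ui.history-url.base in tez-site.xml, which points at http://hadoop:8008/tez-ui/.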
     
7. Testing:

Start the services and run the example job:

start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
startup.sh
hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:28 INFO client.TezClientUtils: Permissions on staging directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
16/08/27 00:33:28 INFO examples.TestOrderedWordCount: Creating Tez Session
16/08/27 00:33:28 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.8.4, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-25T08:17:01Z ]
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:28 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.TezClient: Session mode. Starting session.
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/27 00:33:29 INFO client.TezClient: Tez system stage directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999/.tez/application_1472222203999_0005 doesn't exist and is created
16/08/27 00:33:29 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1472222203999_0005
16/08/27 00:33:29 INFO impl.YarnClientImpl: Submitted application application_1472222203999_0005
16/08/27 00:33:29 INFO client.TezClient: The url to track the Tez Session: http://hadoop:8088/proxy/application_1472222203999_0005/
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/data/data1, outputPath=/output2
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state
16/08/27 00:33:32 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1
16/08/27 00:33:32 INFO client.TezClient: Submitting dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1, callerContext={ context=Tez, callerType=TestOrderedWordCount, callerId=application_1472222203999_0005_1 }
16/08/27 00:33:33 INFO client.TezClient: Submitted dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1
16/08/27 00:33:33 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:33 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1
(several hundred lines omitted)
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: Shutting down session
16/08/27 00:33:37 INFO client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005
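With several hundred log lines, the one that matters is the FinalState line. A small sketch for checking it from a saved log follows; dag_succeeded and the log file path are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a saved job log reports FinalState=SUCCEEDED.
# dag_succeeded is a hypothetical helper, not part of Tez.
dag_succeeded() {
  local log_file="$1"
  grep -q 'FinalState=SUCCEEDED' "$log_file"
}

# Example: capture the job output (including stderr), then check it.
# hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount \
#   /data/data1 /output2 2>&1 | tee /tmp/tez-test.log
# dag_succeeded /tmp/tez-test.log && echo "Tez test DAG succeeded"
```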
This confirms that Tez can run. Next, check the status of the Tez job in the YARN UI:

http://hadoop:8088/cluster

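Once everything checks out, test Hive.

Optional configuration: add the following to hive-site.xml:

	<property>
		<name>hive.execution.engine</name>
		<value>tez</value>
	</property>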
Or add this to ~/.hiverc:

set hive.execution.engine=tez;

Or start hive and run the set command above directly at the prompt.
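For the ~/.hiverc option, the line can be added from a script without creating duplicates on repeated runs. This is a minimal sketch; enable_tez_in_hiverc is a hypothetical helper name, and the file defaults to ~/.hiverc as described above.

```shell
#!/usr/bin/env bash
# Sketch: add the Tez engine setting to a .hiverc file exactly once.
# enable_tez_in_hiverc is a hypothetical helper, not a Hive command.
enable_tez_in_hiverc() {
  local rc_file="${1:-$HOME/.hiverc}"
  local line='set hive.execution.engine=tez;'
  touch "$rc_file"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example call:
# enable_tez_in_hiverc
```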

Then run a Hive query.

For example:

hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select data1,data2 from test1 order by data1;
Query ID = hadoop_20160827004201_cb9e3165-4fd9-4b91-a68e-0ca4155be511
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1472222203999_0006)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1              SUCCEEDED      0          0        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.66 s     
--------------------------------------------------------------------------------
OK
data1	data2
Time taken: 6.346 seconds
hive (default)> 
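Output like the above means the configuration works. You can also view detailed DAG information in the UI.

[Image 1]

Click the ApplicationMaster link to open the Tez UI, as shown below:

[Image 2]

Select the corresponding Dag Name link to see the details, as shown below:

[Image 3]

[Image 4]

The same information is also available at hadoop:8008/tez-ui2/.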
