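Tez is a general-purpose DAG computation framework that evolved from the MapReduce framework. It can serve as the underlying data processing engine for systems such as MapReduce, Pig and Hive. It is built natively on YARN, the resource management platform of Hadoop 2.0, and is developed by core Hadoop 2.0 contributors, which makes it a strong candidate among the newer computation frameworks.
Required libraries and tools: gcc, make, gcc-c++ and openssl. The build also needs phantomjs-2.1.1-linux-x86_64 and nodejs, and installing those two takes some time.
Download the Tez source from the official site, unpack it and build it.
Note: either change the Hadoop version in the pom or pass your own Hadoop version to mvn: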
mvn package -Dhadoop.version=2.7.2 -DskipTests -Dmaven.javadoc.skip=true
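1. Take tez-0.8.4-minimal.tar.gz from tez-dist/target/ and unpack it locally into /opt/single/tez.
Create a conf directory under $TEZ_HOME and create tez-site.xml in it:
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>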
2. Set the Linux environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
3. Add the following to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
Then set the following in mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
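Before starting Hadoop, it can help to confirm that the directories referenced by HADOOP_CLASSPATH actually exist. The following is a minimal sketch, assuming the layout used in this walkthrough (conf/tez-site.xml, jars in the top-level directory, and a lib/ subdirectory); the function name `check_tez_layout` is hypothetical and not part of Tez.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the layout that the HADOOP_CLASSPATH entries rely on.
# Returns 0 when everything is present, 1 otherwise.
check_tez_layout() {
  local home="$1" status=0
  # $TEZ_CONF_DIR is expected to contain tez-site.xml
  [ -f "$home/conf/tez-site.xml" ] || { echo "missing $home/conf/tez-site.xml"; status=1; }
  # $TEZ_JARS/* is expected to match at least one jar
  ls "$home"/*.jar >/dev/null 2>&1 || { echo "no jars found in $home"; status=1; }
  # $TEZ_JARS/lib/* is expected to exist as a directory
  [ -d "$home/lib" ] || { echo "missing $home/lib"; status=1; }
  return "$status"
}

if check_tez_layout "${TEZ_HOME:-/opt/single/tez}"; then
  echo "Tez layout looks complete"
else
  echo "Tez layout is incomplete, see messages above"
fi
```

The check only looks at the file system; it does not validate the contents of tez-site.xml.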
4. Start Hadoop and upload the tez-0.8.4-minimal.tar.gz produced by the build to the hdfs://hadoop:9000/apps/tez-0.8.4/ directory.
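The upload in step 4 can be sketched with the standard `hdfs dfs` commands. This is a sketch under stated assumptions: the tarball is taken from tez-dist/target/ of the source tree and fs.defaultFS points at hdfs://hadoop:9000. The `run` helper and the `DRY_RUN`, `TEZ_TARBALL` and `TEZ_HDFS_DIR` variables are hypothetical, added only so the commands can be previewed before they touch the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumed local path of the tarball built earlier; adjust to your build directory.
TEZ_TARBALL="${TEZ_TARBALL:-tez-dist/target/tez-0.8.4-minimal.tar.gz}"
# Target directory in HDFS; it must match the path used in tez.lib.uris.
TEZ_HDFS_DIR="${TEZ_HDFS_DIR:-/apps/tez-0.8.4}"

upload_tez() {
  run hdfs dfs -mkdir -p "$TEZ_HDFS_DIR"               # create the target directory
  run hdfs dfs -put -f "$TEZ_TARBALL" "$TEZ_HDFS_DIR/" # upload, overwriting an old copy
  run hdfs dfs -ls "$TEZ_HDFS_DIR"                     # confirm the file is there
}

upload_tez
```

Run it with `DRY_RUN=0` once the printed commands look right.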
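5. The Tez UI is configured as follows.
Add to yarn-site.xml:
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.hostname</name>
  <value>hadoop</value>
</property>
<property>
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.webapp.address</name>
  <value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
  <name>yarn.timeline-service.webapp.https.address</name>
  <value>${yarn.timeline-service.hostname}:2191</value>
</property>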
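Add to tez-site.xml:
<property>
  <description>Enable Tez to use the Timeline Server for History Logging</description>
  <name>tez.history.logging.service.class</name>
  <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
  <name>tez.tez-ui.history-url.base</name>
  <value>http://hadoop:8008/tez-ui/</value>
</property>
<property>
  <name>tez.runtime.convert.user-payload.to.history-text</name>
  <value>true</value>
</property>
<property>
  <name>tez.task.generate.counters.per.io</name>
  <value>true</value>
</property>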
6. Tomcat configuration:
Installing Tomcat is not covered here; plenty of guides are available online.
Unpack tez-ui-0.8.4.war and tez-ui2-0.8.4.war into Tomcat's webapps/ directory:
mkdir -pv /opt/modules/tomcat-7.0.69/webapps/tez-ui /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cp /opt/single/tez/tez-ui-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui
cp /opt/single/tez/tez-ui2-0.8.4.war /opt/modules/tomcat-7.0.69/webapps/tez-ui2
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui && jar xvf tez-ui-0.8.4.war
cd /opt/modules/tomcat-7.0.69/webapps/tez-ui2 && jar xvf tez-ui2-0.8.4.war
Edit the webapps/tez-ui/scripts/config.js file:
timelineBaseUrl: 'http://hadoop:8188',
RMWebUrl: 'http://hadoop:8088',
Set the Tomcat port to 8008 in:
/opt/modules/tomcat-7.0.69/conf/server.xml
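The port change can also be scripted. The following is a minimal sketch, assuming server.xml still contains Tomcat's stock HTTP connector on port 8080; the function name `set_tomcat_port` is hypothetical, and the demonstration runs against a throwaway file rather than the real server.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the HTTP connector port in a Tomcat server.xml.
# Assumes the connector still uses the stock port 8080.
set_tomcat_port() {
  local file="$1" port="$2"
  cp "$file" "$file.bak"   # keep a backup of the original file
  sed "s/<Connector port=\"8080\"/<Connector port=\"$port\"/" "$file.bak" > "$file"
}

# Demonstration on a temporary file that mimics the stock connector entry.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
EOF
set_tomcat_port "$demo" 8008
grep 'Connector port' "$demo"
```

For the installation described here the call would be `set_tomcat_port /opt/modules/tomcat-7.0.69/conf/server.xml 8008`, followed by a Tomcat restart.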
7. Test:
Start HDFS, YARN, the Timeline Server and Tomcat, then run the bundled ordered word count example:
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start timelineserver
startup.sh
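Before submitting a job it may be worth checking that the Timeline Server answers, since the Tez UI depends on it. This is a minimal sketch, assuming the host name hadoop and port 8188 configured in yarn-site.xml and the /ws/v1/timeline/ REST path; the helper names `timeline_url` and `check_timeline` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Timeline Server REST URL for a host and port.
timeline_url() {
  local host="${1:-hadoop}" port="${2:-8188}"
  echo "http://${host}:${port}/ws/v1/timeline/"
}

# Hypothetical helper: succeed only if the endpoint answers without an HTTP error.
check_timeline() {
  curl --silent --fail --max-time 5 --output /dev/null "$(timeline_url "$@")"
}

if check_timeline hadoop 8188; then
  echo "timeline server is reachable"
else
  echo "timeline server is NOT reachable at $(timeline_url hadoop 8188)"
fi
```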
hadoop jar /opt/single/tez/tez-tests-0.8.4.jar testorderedwordcount /data/data1 /output2
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:27 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim26, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.2, majorVersion=2, minorVersion=7
16/08/27 00:33:28 INFO client.TezClientUtils: Permissions on staging directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
16/08/27 00:33:28 INFO examples.TestOrderedWordCount: Creating Tez Session
16/08/27 00:33:28 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.8.4, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-25T08:17:01Z ]
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:28 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/08/27 00:33:28 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:28 INFO client.TezClient: Session mode. Starting session.
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
16/08/27 00:33:28 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
16/08/27 00:33:29 INFO client.TezClient: Tez system stage directory hdfs://hadoop:9000/tmp/hadoop/tez/staging/1472229207999/.tez/application_1472222203999_0005 doesn't exist and is created
16/08/27 00:33:29 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1472222203999_0005
16/08/27 00:33:29 INFO impl.YarnClientImpl: Submitted application application_1472222203999_0005
16/08/27 00:33:29 INFO client.TezClient: The url to track the Tez Session: http://hadoop:8088/proxy/application_1472222203999_0005/
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=1, inputPath=/data/data1, outputPath=/output2
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS
16/08/27 00:33:29 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state
16/08/27 00:33:32 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1
16/08/27 00:33:32 INFO client.TezClient: Submitting dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1, callerContext={ context=Tez, callerType=TestOrderedWordCount, callerId=application_1472222203999_0005_1 }
16/08/27 00:33:33 INFO client.TezClient: Submitted dag to TezSession, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005, dagName=OrderedWordCount1
16/08/27 00:33:33 INFO impl.TimelineClientImpl: Timeline service address: http://localhost:8188/ws/v1/timeline/
16/08/27 00:33:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.0.3:8032
16/08/27 00:33:33 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1
(several hundred lines omitted)
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED
16/08/27 00:33:37 INFO examples.TestOrderedWordCount: Shutting down session
16/08/27 00:33:37 INFO client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1472222203999_0005
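This confirms that Tez can run jobs. The job's status can then be checked in the YARN UI:
http://hadoop:8088/cluster
Once that works, Hive can be tested.
Optional configuration: add the following to hive-site.xml:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
or add this line to ~/.hiverc:
set hive.execution.engine=tez;
or start hive and issue the set command above at the prompt.
Then run a Hive query, for example: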
hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)> select data1,data2 from test1 order by data1;
Query ID = hadoop_20160827004201_cb9e3165-4fd9-4b91-a68e-0ca4155be511
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1472222203999_0006)
--------------------------------------------------------------------------------
VERTICES          STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1             SUCCEEDED      0          0        0        0       0       0
Reducer 2 ......  SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 3.66 s
--------------------------------------------------------------------------------
OK
data1 data2
Time taken: 6.346 seconds
hive (default)>
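Output like the above means the configuration works. The detailed DAG information can also be viewed in the UI.
Clicking the ApplicationMaster link opens the Tez UI.
Selecting the corresponding Dag Name link there shows the details of that DAG.
The same information is also available at hadoop:8008/tez-ui2/.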
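The Hive check can also be run non-interactively, which is convenient for repeating the test after configuration changes. This is a minimal sketch, assuming the `hive` CLI is on the PATH and the table test1 from the session above exists; the `run` helper, the `DRY_RUN` variable and the `hive_on_tez` function are hypothetical. In dry-run mode the command is printed without shell quoting, so the printed line is for inspection only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: run one query with Tez selected as the execution engine
# for this invocation only, without editing hive-site.xml or ~/.hiverc.
hive_on_tez() {
  run hive --hiveconf hive.execution.engine=tez -e "$1"
}

hive_on_tez "select data1, data2 from test1 order by data1;"
```

Set `DRY_RUN=0` to actually submit the query.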