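I. Introduction to Tez
Tez website: http://tez.apache.org
Before using Tez as a compute engine, a word about tez-ui. tez-ui is the web interface for viewing Tez job execution logs, and it depends on YARN's timeline service. Tez 0.8.3 also adds tez-ui2.
According to this article, the timeline service joined YARN as a sub-service after Apache Hadoop 2.6.0. The JobHistoryServer can only store MapReduce history logs and does not support access to the history logs of other compute engines such as Tez or Spark, which is why the timeline service was added. The timeline server supports history log access for MapReduce, Tez, Spark on YARN, and other engines when jobs run in non-local mode. The JobHistoryServer can still be used alongside it.
Apache Hadoop 2.6.4+ or Apache Hadoop 2.7.2+ is recommended; earlier versions have a fair number of timeline service bugs.
For the detailed list of improvements and bug fixes per release, see http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-yarn/CHANGES.txt.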
II. Build Process
1. Install JDK 1.7, Maven 3.3.*, and protobuf 2.5.0.
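Before starting the build it can help to confirm that the three tools are actually on the PATH and roughly at the expected versions. The following is a minimal sketch, not part of the original article: the helper name `version_ge` is hypothetical, and it assumes GNU `sort -V` is available on the build host.

```shell
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2.
# Assumption: GNU coreutils `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether each required tool can be found on PATH.
for tool in java mvn protoc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT found on PATH"
  fi
done

# Maven prints "Apache Maven 3.3.9 (...)" on its first line.
if command -v mvn >/dev/null 2>&1; then
  mvn_version="$(mvn -version 2>/dev/null | awk 'NR==1 {print $3}')"
  if version_ge "$mvn_version" "3.3"; then
    echo "maven $mvn_version is at least 3.3"
  else
    echo "maven $mvn_version is older than 3.3"
  fi
fi

# protoc prints "libprotoc 2.5.0"; the article installs exactly 2.5.0.
if command -v protoc >/dev/null 2>&1; then
  protoc_version="$(protoc --version | awk '{print $2}')"
  if [ "$protoc_version" = "2.5.0" ]; then
    echo "protoc version OK"
  else
    echo "protoc is $protoc_version, expected 2.5.0"
  fi
fi
```

The JDK version is left for manual inspection with `java -version`, since its output format differs between vendors.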
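2. Download the source from https://mirrors.tuna.tsinghua.edu.cn/apache/tez/ (tez-ui is supported from 0.6.* onward, so 0.7.* or 0.8.* is recommended; from 0.8.4 onward the prebuilt bin package can be downloaded directly).
Extract it to the following directory: ${project_home}/apache-tez-0.8.3-src
3. Edit the parameters in pom.xml
Specify the Hadoop version:
<hadoop.version>2.6.4</hadoop.version>
Location of the protoc command after protobuf is installed:
<protoc.path>/usr/local/protobuf-2.5.0/bin/protoc</protoc.path>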
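To double-check the two values after editing, a small helper can read them back from the pom. This is only a sketch: it assumes the values live in simple one-line `<hadoop.version>` and `<protoc.path>` elements, the helper name `pom_prop` is hypothetical, and the demo runs against a throwaway file rather than the real Tez pom.xml.

```shell
#!/usr/bin/env bash
# Print the text of the first one-line <name>value</name> element in a pom file.
# The dots in the element name act as regex wildcards here, which is harmless.
pom_prop() {
  sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" "$2" | head -n 1
}

# Demo on a temporary file that mimics the relevant part of pom.xml.
demo_pom="$(mktemp)"
cat > "$demo_pom" <<'EOF'
<project>
  <properties>
    <hadoop.version>2.6.4</hadoop.version>
    <protoc.path>/usr/local/protobuf-2.5.0/bin/protoc</protoc.path>
  </properties>
</project>
EOF

echo "hadoop.version = $(pom_prop hadoop.version "$demo_pom")"
echo "protoc.path    = $(pom_prop protoc.path "$demo_pom")"
rm -f "$demo_pom"
```

Against the real source tree the call would be `pom_prop hadoop.version ${project_home}/apache-tez-0.8.3-src/pom.xml`. As an alternative to editing the file, Maven user properties given on the command line (for example `-Dhadoop.version=2.6.4`) generally take precedence over values declared in the pom's properties section.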
4. With the configuration changed, run the build command from the source directory (you can also build from Eclipse or IntelliJ IDEA):
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
The output then scrolls by for a while and ends with a run of SUCCESS lines (or possibly a FAILURE):
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] tez ............................................... SUCCESS [1.626s]
[INFO] hadoop-shim ....................................... SUCCESS [1.720s]
[INFO] tez-api ........................................... SUCCESS [5.598s]
[INFO] tez-common ........................................ SUCCESS [0.425s]
[INFO] tez-runtime-internals ............................. SUCCESS [0.612s]
[INFO] tez-runtime-library ............................... SUCCESS [1.688s]
[INFO] tez-mapreduce ..................................... SUCCESS [0.988s]
[INFO] tez-examples ...................................... SUCCESS [0.202s]
[INFO] tez-dag ........................................... SUCCESS [2.407s]
[INFO] tez-tests ......................................... SUCCESS [0.572s]
[INFO] tez-ext-service-tests ............................. SUCCESS [0.487s]
[INFO] tez-ui ............................................ SUCCESS [10.163s]
[INFO] tez-ui2 ........................................... SUCCESS [1:51.654s]
[INFO] tez-plugins ....................................... SUCCESS [0.023s]
[INFO] tez-yarn-timeline-history ......................... SUCCESS [0.383s]
[INFO] tez-yarn-timeline-history-with-acls ............... SUCCESS [0.254s]
[INFO] tez-history-parser ................................ SUCCESS [7.432s]
[INFO] tez-tools ......................................... SUCCESS [0.022s]
[INFO] tez-perf-analyzer ................................. SUCCESS [0.022s]
[INFO] tez-job-analyzer .................................. SUCCESS [0.272s]
[INFO] tez-javadoc-tools ................................. SUCCESS [0.095s]
[INFO] hadoop-shim-impls ................................. SUCCESS [0.021s]
[INFO] hadoop-shim-2.6 ................................... SUCCESS [0.118s]
[INFO] tez-dist .......................................... SUCCESS [9.088s]
[INFO] Tez ............................................... SUCCESS [0.052s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:36.974s
[INFO] Finished at: Sun Apr 24 19:10:56 CST 2016
[INFO] Final Memory: 94M/1298M
[INFO] ------------------------------------------------------------------------
Process finished with exit code 0
The build produces the following packages:
${project_home}/apache-tez-0.8.3-src/tez-dist/target/tez-0.8.3-minimal.tar.gz
${project_home}/apache-tez-0.8.3-src/tez-dist/target/tez-0.8.3.tar.gz
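If PhantomJS is not installed, the source build fails. Install it and add it to the PATH environment variable.
For other tez-ui build problems, see the official documentation at https://cwiki.apache.org//confluence/display/TEZ/Build+errors+and+solutions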
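III. Using the Engine
1. Configuration changes
1.1. tez-site.xml
Add a tez-site.xml file under $HADOOP_HOME/etc/hadoop with the following content (there are many more performance parameters; add them as your environment requires):
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://beh/engine/tez/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>
  <property>
    <description>Publish configuration information to Timeline server.</description>
    <name>tez.runtime.convert.user-payload.to.history-text</name>
    <value>true</value>
  </property>
  <property>
    <description>URL for where the Tez UI is hosted</description>
    <name>tez.tez-ui.history-url.base</name>
    <value>http://hadoop001:8280/tez-ui/</value>
  </property>
  <property>
    <name>tez.allow.disabled.timeline-domains</name>
    <value>true</value>
  </property>
</configuration>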
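Notes:
# This parameter points to the compiled Tez package. Upload the tarball directly to HDFS rather than keeping it on local storage. Either the minimal package or the full package can be used here.
<property>
  <name>tez.lib.uris</name>
  <value>hdfs://beh/engine/tez/tez.tar.gz</value>
</property>
# This parameter is the address of the web service that hosts tez-ui. A hostname or an IP address can be used, and the port is your choice. Because tez-ui is a web app, it needs a web server; Tomcat is used here, and its setup is covered later.
<property>
  <description>URL for where the Tez UI is hosted</description>
  <name>tez.tez-ui.history-url.base</name>
  <value>http://hadoop001:8280/tez-ui/</value>
</property>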
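The note on tez.lib.uris implies an upload step that the article does not spell out. A minimal sketch follows, assuming the full tarball from the build is the one being published and that the `hdfs` client is configured for the `beh` nameservice. The function name `upload_tez_tarball` and the `DRY_RUN` switch are hypothetical; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Upload a built Tez tarball to the HDFS location named by tez.lib.uris.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
upload_tez_tarball() {
  local tarball="$1" target="$2"
  local run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  # Create the parent directory, copy the tarball (overwriting), then list it.
  $run hdfs dfs -mkdir -p "$(dirname "$target")"
  $run hdfs dfs -put -f "$tarball" "$target"
  $run hdfs dfs -ls "$target"
}

upload_tez_tarball \
  "${project_home:-.}/apache-tez-0.8.3-src/tez-dist/target/tez-0.8.3.tar.gz" \
  "hdfs://beh/engine/tez/tez.tar.gz"
```

Running it with `DRY_RUN=0` in the environment executes the commands instead of printing them. The target path must match the value of tez.lib.uris exactly.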
1.2. hadoop-env.sh
Add the Tez environment variables to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
##tez
export BEH_HOME=/opt/beh
export TEZ_HOME=${BEH_HOME}/core/tez
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TEZ_CONF_DIR}:${TEZ_HOME}/*:${TEZ_HOME}/lib/*
TEZ_HOME is the directory where you extracted the Tez package.
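One way to confirm that the hadoop-env.sh change took effect is to look for Tez entries in the output of `hadoop classpath`. The sketch below is an illustration rather than part of the original setup; the helper name `tez_classpath_entries` is hypothetical, and a sample string built from the paths above is used when the `hadoop` command is not available.

```shell
#!/usr/bin/env bash
# Print the classpath entries that mention Tez, one per line.
tez_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'tez' || true
}

if command -v hadoop >/dev/null 2>&1; then
  # Real check against the running installation.
  tez_classpath_entries "$(hadoop classpath)"
else
  # Fallback demo using the paths from hadoop-env.sh above.
  tez_classpath_entries '/opt/beh/core/hadoop/etc/hadoop:/opt/beh/core/tez/*:/opt/beh/core/tez/lib/*'
fi
```

If nothing is printed on a real cluster, the HADOOP_CLASSPATH export was probably not picked up by the shell that runs the Hadoop client.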
1.3. mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
Optional: If running existing MapReduce jobs on Tez. Modify mapred-site.xml to change “mapreduce.framework.name” property from its default value of “yarn” to “yarn-tez”
1.4. yarn-site.xml
Configure the timeline service in $HADOOP_HOME/etc/hadoop/yarn-site.xml.
For the relevant settings, refer to the YARN and Tez documentation.
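Since the article leaves the timeline settings to the official documentation, the sketch below prints a starting-point fragment for yarn-site.xml. The four property names are taken from the YARN and Tez timeline documentation as best understood here; the hostname `hadoop001` follows the rest of this article, and the function name `timeline_fragment` is hypothetical. Treat it as a checklist to compare against the documentation for your Hadoop version, not as a complete configuration.

```shell
#!/usr/bin/env bash
# Print a yarn-site.xml fragment that enables the timeline service on a host.
timeline_fragment() {
  local host="$1"
  cat <<EOF
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.hostname</name>
  <value>${host}</value>
</property>
<property>
  <name>yarn.timeline-service.http-cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
EOF
}

timeline_fragment hadoop001

# After merging the fragment into yarn-site.xml, on Hadoop 2.x the daemon is
# started with:  $HADOOP_HOME/sbin/yarn-daemon.sh start timelineserver
# A quick probe of the REST root:  curl -s http://hadoop001:8188/ws/v1/timeline/
```

Port 8188 is the default timeline web port and matches the timelineBaseUrl used for tez-ui later in this article.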
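1.5. hive-site.xml
Edit $HIVE_HOME/conf/hive-site.xml and add the following settings:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>4096</value>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <value>-server -Xmx4096m -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseParallelGC</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>default</value>
</property>
<property>
  <name>hive.tez.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>1</value>
</property>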
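The settings above give the JVM a heap (-Xmx4096m) as large as the whole container (hive.tez.container.size of 4096 MB). A JVM also needs memory outside the heap, so a container sized this way may be killed by YARN for exceeding its limit. A common rule of thumb, which is an assumption here and not something the original article states, is to keep the heap at roughly 80% of the container size. The helper name `heap_mb_for_container` is hypothetical.

```shell
#!/usr/bin/env bash
# Suggest a heap size in MB for a container size in MB.
# The 80% default is a rule of thumb, not a value from the original article.
heap_mb_for_container() {
  local container_mb="$1" percent="${2:-80}"
  echo $(( container_mb * percent / 100 ))
}

container_mb=4096
echo "hive.tez.container.size = ${container_mb}"
echo "suggested heap option   = -Xmx$(heap_mb_for_container "$container_mb")m"
```

For a 4096 MB container this suggests -Xmx3276m (integer division rounds down). The right ratio depends on the workload, so treat the result as a starting point for tuning.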
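2. tez-ui service setup
Extracting the Tez package yields tez-ui-0.8.3.war (or another version, depending on what you built). In configs.js under the scripts directory inside this war, set the address and port of the ResourceManager service and of the timeline service.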
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
App.setConfigs({
  /* Environment configurations */
  envDefaults: {
    version: "0.8.3",
    /*
     * By default TEZ UI looks for timeline server at http://localhost:8188, uncomment and change
     * the following value for pointing to a different domain.
     */
    // timelineBaseUrl: 'http://localhost:8188',
    timelineBaseUrl: 'http://hadoop001:8188',
    /*
     * By default RM web interface is expected to be at http://localhost:8088, uncomment and change
     * the following value to point to a different domain.
     */
    // RMWebUrl: 'http://localhost:8088',
    RMWebUrl: 'http://hadoop001:23188',
    /*
     * Ensures that some of the UI features work with old versions of Tez
     */
    compatibilityMode: false,
    /*
     * Default time zone for UI display. Set to undefined for local timezone
     * For configuration see http://momentjs.com/timezone/docs/
     */
    //timezone: "UTC",
  },
  /*
   * Visibility of table columns can be controlled using the column selector. Also an optional set of
   * file system counters can be enabled as columns for most of the tables. For adding more counters
   * as columns edit the following 'tables' object. Counters must be added as configuration objects
   * of the following format.
   * {
   *   counterName: '',
   *   counterGroupName: '',
   * }
   *
   * Note: Till 0.6.0 the properties were counterId and groupId, their use is deprecated now.
   */
  tables: {
    /*
     * Entity specific columns must be added into the respective array.
     */
    entity: {
      dag: [
        // { // Following is a sample configuration object.
        //   counterName: 'FILE_BYTES_READ',
        //   counterGroupName: 'org.apache.tez.common.counters.FileSystemCounter',
        // }
      ],
      vertex: [],
      task: [],
      taskAttempt: [],
      tezApp: [],
    },
    /*
     * User sharedColumns to add counters that must be displayed in all tables.
     */
    sharedColumns:[]
  }
});
Then copy the war into the webapps directory of the Tomcat installation. After that you can start Tomcat and open the tez-ui URL.
The URL used in this article is:
http://hadoop001:8280/tez-ui/
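The deployment can be sketched as a few shell commands. Several things here are assumptions rather than facts from the article: the Tomcat location (`/opt/tomcat` unless CATALINA_HOME is set), the choice to unpack the war into a `tez-ui` directory so that the context path matches the URL above and configs.js stays editable in place, and the helper name `deploy_tez_ui`. Tomcat listens on 8080 by default, so port 8280 would also have to be set on the Connector in its conf/server.xml. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Unpack the tez-ui war under Tomcat's webapps directory and start Tomcat.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
deploy_tez_ui() {
  local war="$1" catalina_home="$2"
  local app_dir="$catalina_home/webapps/tez-ui"
  local run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  $run mkdir -p "$app_dir"
  # Unpacking into webapps/tez-ui gives the /tez-ui/ context path.
  $run unzip -o -q "$war" -d "$app_dir"
  # configs.js can now be edited at $app_dir/scripts/configs.js.
  $run "$catalina_home/bin/startup.sh"
  # Probe the UI once Tomcat is up.
  $run curl -s -o /dev/null -w '%{http_code}\n' http://hadoop001:8280/tez-ui/
}

deploy_tez_ui \
  "${TEZ_HOME:-/opt/beh/core/tez}/tez-ui-0.8.3.war" \
  "${CATALINA_HOME:-/opt/tomcat}"
```

Copying the war as-is, as described above, also works, but Tomcat derives the context path from the file name, so the file would need to be named tez-ui.war for the URL in this article to match.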