Compile
- home page : https://tez.apache.org/
- Wiki : https://cwiki.apache.org/confluence/display/TEZ/Index
- download
- https://github.com/wankunde/tez
- https://github.com/apache/tez
- cloudera version
- hadoop 2.5.0-cdh5.2.0: add a new method to JobContext.java and implement it in JobContextImpl.java.
// Add the declaration to JobContext.java:
// ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobContext.java
/**
 * Get the boolean value for the property that specifies which classpath
 * takes precedence when tasks are launched. True - user's classes takes
 * precedence. False - system's classes takes precedence.
 * @return true if user's classes should take precedence
 */
public boolean userClassesTakesPrecedence();

// Add the overriding implementation to JobContextImpl.java:
/**
 * Get the boolean value for the property that specifies which classpath
 * takes precedence when tasks are launched. True - user's classes takes
 * precedence. False - system's classes takes precedence.
 * @return true if user's classes should take precedence
 */
public boolean userClassesTakesPrecedence() {
  return conf.userClassesTakesPrecedence();
}
- update hadoop.version in pom.xml and add the Cloudera repositories.
  - hadoop.version : 2.5.0-cdh5.2.0
  - repository cloudera-repo-releases
    - enabled : true
    - url : https://repository.cloudera.com/artifactory/repo/
  - repository maven2-repository.cloudera
    - name : Cloudera Maven Repository
    - url : https://repository.cloudera.com/artifactory/repo/
    - layout : default
- compile
- mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
- mvn -e clean package -DskipTests=true -Pcdh5.5.1
- result
- tez-0.7.0-minimal.tar.gz : contains tez jars
- tez-0.7.0.tar.gz : contains tez jars and hadoop jars
- tez-dist-0.7.0-tests.jar
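To confirm that the build produced the archives listed above, a small check can be scripted. This is a minimal sketch: the function name `check_artifacts` is hypothetical, and `tez-dist/target` is an assumption about where the build writes its tarballs.

```shell
#!/usr/bin/env bash
# Sketch: report which expected Tez build archives are missing.
# check_artifacts is a hypothetical helper, not part of Tez.

# check_artifacts DIR VERSION
# Prints one "missing: <file>" line per absent archive and
# returns non-zero if any archive is missing.
check_artifacts() {
  local dir="$1"
  local version="$2"
  local missing=0
  local name
  for name in "tez-${version}-minimal.tar.gz" "tez-${version}.tar.gz"; do
    if [ ! -f "${dir}/${name}" ]; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "${missing}"
}

# Example call (assumed output directory):
#   check_artifacts tez-dist/target 0.7.0
```

The function only defines the check; call it after `mvn clean package` finishes, with the version you actually built.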
- Tips
- If you hit a compile error while building tez-ui, try upgrading the node, npm and bower versions by updating pom.xml, bower.json and package.json. Older versions of these tools may cause problems.
pom.xml:
node : v0.10.18
npm : 1.3.8
package.json:
"bower": "1.4.1",
Install
Official site
https://tez.apache.org/install.html
hdp documents
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html
mkdir tez-0.8.5 && cd tez-0.8.5 && tar -zxvf ../tez-0.8.5.tar.gz
cd .. && mv tez-0.8.5 /usr/lib && ln -s /usr/lib/tez-0.8.5 /usr/lib/tez
su - hdfs -c 'hadoop dfs -put -f /opt/app/tez-0.8.5-minimal.tar.gz /metadata/libs/tez/tez-0.8.5-minimal.tar.gz'
hadoop dfs -ls /metadata/libs/tez
- Unzip tez-0.7.0.tar.gz.
- upload tez libs to hdfs
hadoop dfs -rm -r -f /tmp/wankun/jars/tez/
hadoop dfs -mkdir /tmp/wankun/jars/tez/
hadoop dfs -put lib/ /tmp/wankun/jars/tez/
hadoop dfs -put tez-api-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-common-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-dag-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-examples-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-mapreduce-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-mbeans-resource-calculator-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-runtime-internals-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-runtime-library-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-tests-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -put tez-yarn-timeline-history-0.5.4.jar /tmp/wankun/jars/tez/
hadoop dfs -chmod -R 777 /tmp/wankun/jars/tez/
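The repeated `-put` lines above can be collapsed into a loop. The following is a sketch only: `upload_tez_jars` and the `HADOOP_CMD` override are hypothetical names introduced here so the function can be dry-run with `echo` before touching HDFS.

```shell
#!/usr/bin/env bash
# Sketch: upload lib/ and every tez-*.jar from a local directory to HDFS.
# upload_tez_jars and HADOOP_CMD are hypothetical names used for illustration.

# upload_tez_jars LOCAL_DIR HDFS_DIR
upload_tez_jars() {
  local src="$1"
  local dest="$2"
  local hadoop="${HADOOP_CMD:-hadoop}"
  local jar

  # Recreate the target directory, as in the manual steps.
  ${hadoop} dfs -rm -r -f "${dest}"
  ${hadoop} dfs -mkdir "${dest}"

  # Upload the dependency directory, then each Tez jar.
  ${hadoop} dfs -put "${src}/lib/" "${dest}"
  for jar in "${src}"/tez-*.jar; do
    [ -e "${jar}" ] || continue   # skip when the glob matches nothing
    ${hadoop} dfs -put "${jar}" "${dest}"
  done

  ${hadoop} dfs -chmod -R 777 "${dest}"
}

# Dry run that only prints the commands:
#   HADOOP_CMD=echo upload_tez_jars . /tmp/wankun/jars/tez/
```

Because the loop globs on `tez-*.jar`, it does not need editing when the Tez version changes. On recent Hadoop releases `hadoop dfs` is deprecated in favor of `hdfs dfs`; the sketch keeps the form used in these notes.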
- set tez environment variable
export TEZ_HOME=/home/wankun/tez
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TEZ_HOME}/conf:${TEZ_HOME}/*:${TEZ_HOME}/lib/*
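If the export above lives in a profile script that gets sourced more than once, the same entries pile up in `HADOOP_CLASSPATH`. A possible guard is sketched below; `append_tez_classpath` is a hypothetical helper, not something shipped with Tez or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: append the Tez entries to HADOOP_CLASSPATH only once.
# append_tez_classpath is a hypothetical helper name.

# append_tez_classpath TEZ_HOME
append_tez_classpath() {
  local home="$1"
  # The "*" characters stay literal; the JVM launcher expands them later.
  local entry="${home}/conf:${home}/*:${home}/lib/*"
  case ":${HADOOP_CLASSPATH:-}:" in
    *":${entry}:"*)
      ;;  # already present, nothing to do
    *)
      HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${entry}"
      ;;
  esac
  export HADOOP_CLASSPATH
}

# Example:
#   export TEZ_HOME=/home/wankun/tez
#   append_tez_classpath "${TEZ_HOME}"
```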
- set tez-site.xml
Upload the tez jars and libs to the HDFS file system first, then point tez.lib.uris in the configuration file to that HDFS directory.
  - tez.am.java.opts : -server -Xmx1535m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC
  - tez.am.env : LD_LIBRARY_PATH=/var/bh/hadoop/lib/native:/usr/lib/hadoop/lib/native/`$JAVA_HOME/bin/java -d32 -version &> /dev/null;if [ $? -eq 0 ]; then echo Linux-i386-32; else echo Linux-amd64-64;fi`
  - tez.am.shuffle-vertex-manager.max-src-fraction : 0.4
  - tez.task.get-task.sleep.interval-ms.max : 200
  - tez.staging-dir : /tmp/${user.name}/staging
  - tez.am.grouping.min-size : 16777216
  - tez.runtime.intermediate-input.compress.codec : org.apache.hadoop.io.compress.SnappyCodec
  - tez.am.container.reuse.enabled : true
  - tez.yarn.ats.enabled : true
  - tez.am.log.level : INFO
  - tez.session.am.dag.submit.timeout.secs : 300
  - tez.am.grouping.split-waves : 1.4
  - tez.session.client.timeout.secs : 180
  - tez.runtime.intermediate-output.compress.codec : org.apache.hadoop.io.compress.SnappyCodec
  - tez.am.shuffle-vertex-manager.min-src-fraction : 0.2
  - tez.runtime.intermediate-output.should-compress : true
  - tez.am.am-rm.heartbeat.interval-ms.max : 250
  - tez.lib.uris : hdfs:///bh/warehouse/dmp/jars/tez/,hdfs:///bh/warehouse/dmp/jars/tez/lib/
  - tez.am.container.reuse.non-local-fallback.enabled : true
  - tez.am.container.reuse.rack-fallback.enabled : true
  - tez.am.grouping.max-size : 1073741824
  - tez.am.container.reuse.locality.delay-allocation-millis : 250
  - tez.runtime.intermediate-input.is-compressed : true
  - tez.am.resource.memory.mb : 2048
  - tez.am.container.session.delay-allocation-millis : 30000
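Each name and value above goes into tez-site.xml as one `<property>` element in the usual Hadoop configuration layout. The sketch below generates a minimal file with only two of the properties; the helper names `xml_escape`, `tez_property` and `write_tez_site` are hypothetical, and the choice of properties is just an illustration.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal tez-site.xml on stdout.
# All function names here are hypothetical helpers.

# Escape the characters that are not allowed verbatim in XML text.
# "&" is handled first so the entities added later are not re-escaped.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# tez_property NAME VALUE -> one <property> element
tez_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
    "$(xml_escape "$1")" "$(xml_escape "$2")"
}

# write_tez_site LIB_URIS QUEUE_NAME
write_tez_site() {
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  tez_property tez.lib.uris "$1"
  tez_property tez.queue.name "$2"
  echo '</configuration>'
}

# Example:
#   write_tez_site 'hdfs:///bh/warehouse/dmp/jars/tez/' dmp_job > tez-site.xml
```

Escaping matters for values such as tez.am.env, which contains `&>`.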
Test
MapReduce
- Put a test file to hdfs system.
hadoop dfs -rm -r -f /bh/warehouse/dmp/tmp/output/
hadoop jar tez-mapreduce-examples-0.4.1-incubating.jar orderedwordcount /bh/warehouse/dmp/tmp/input/47675.log /bh/warehouse/dmp/tmp/output/
- You can't set the queue name with -Dmapreduce.job.queuename=dmp_job on the command line. You must set it with tez.queue.name in tez-site.xml.
Hive
Because I don't have permissions on the production environment, the Hive deployment test failed.
I only record the deployment process here.
Way 1
Update mapreduce.framework.name=yarn-tez in mapred-site.xml. Not recommended.
Way 2
Copy tez-site.xml and all tez libs to the hive install directory. Then use **set hive.execution.engine=tez;** to enable tez.
Tips
Our hadoop version is 2.5.0-cdh5.2.0 and our hive version is 0.13.0.
tez-0.7.0 is too new and conflicts with hadoop.
tez-0.4.1-incubating-full is too old and does not even produce application logs.
tez-0.5.4 produces few application logs and conflicts with hive 0.13.0.
This drives me crazy. It seems that Cloudera does not want to support tez (judging by Cloudera blog posts from August 2014).
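The copy step of Way 2 could be scripted roughly as follows. This is an untested sketch in the same spirit as the notes above: `install_tez_into_hive` is a hypothetical name, and the `conf/` and `lib/` layouts under both the tez and hive directories are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: copy tez-site.xml and the Tez jars into a Hive install directory.
# install_tez_into_hive is a hypothetical helper; directory layout is assumed.

# install_tez_into_hive TEZ_HOME HIVE_HOME
install_tez_into_hive() {
  local tez_home="$1"
  local hive_home="$2"
  # Configuration goes next to hive-site.xml.
  cp "${tez_home}/conf/tez-site.xml" "${hive_home}/conf/" || return 1
  # Tez jars and their dependencies go on Hive's classpath.
  cp "${tez_home}"/*.jar "${tez_home}"/lib/*.jar "${hive_home}/lib/" || return 1
}

# Example (paths are placeholders):
#   install_tez_into_hive /home/wankun/tez /usr/lib/hive
```

After copying, the engine is still selected per session with `set hive.execution.engine=tez;`.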
Tips
Check out hadoop source branch
git branch -va // view remote branches
git checkout remotes/origin/branch-2.5.2
./dev-support/create-release.sh