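Introduction to Tez
Tez is an open-source Apache compute engine for DAG jobs, proposed as a way to reduce the latency of Hive jobs. Hortonworks has used Tez to optimize the Hive engine, with a reported performance improvement of roughly 100x in testing. Hive on Tez still builds on the MapReduce computation model, but it prunes the dependencies in the DAG and merges multiple small jobs into one large job, which reduces both the number of jobs and the number of writes to HDFS.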
1) Features of Tez
2) Tez compared with MapReduce:
Hive on Tez can be installed and configured in two ways:
● Configure it in Hadoop
● Configure it in Hive
I. Configuration in Hadoop:
1) Download the source and build it, then upload the resulting tez-0.5.4.tar.gz to a directory on HDFS;
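2) On the Hadoop master node, create tez-site.xml in $HADOOP_HOME/etc/hadoop/ with the following property inside its <configuration> element:
<property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/apps/tez-0.5.4.tar.gz</value>
</property>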
3) Add the Tez jars to $HADOOP_CLASSPATH:
export TEZ_HOME=/oneapm/local/tez-0.5.3 # the directory where you extracted Tez
# Append every jar under $TEZ_HOME and $TEZ_HOME/lib to the classpath.
for jar in "$TEZ_HOME"/*.jar "$TEZ_HOME"/lib/*.jar; do
  export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$jar"
done
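The classpath step above can also be wrapped in a small helper that ignores directories containing no jars. This is a minimal sketch rather than part of the original setup: the function name tez_classpath is hypothetical, and it assumes a Tez directory laid out as described above, with jars at the top level and under lib/.

```shell
# Hypothetical helper: print a colon-separated list of the jars found in
# the given Tez directory and in its lib/ subdirectory.
tez_classpath() {
  local tez_home="$1" cp="" jar
  for jar in "$tez_home"/*.jar "$tez_home"/lib/*.jar; do
    # An unmatched glob stays literal, so keep only entries that exist.
    [ -f "$jar" ] || continue
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}

# Example use (where to put this line is an assumption, e.g. hadoop-env.sh):
# export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$(tez_classpath "$TEZ_HOME")"
```

Building the list in a function keeps a stray literal such as "*.jar" out of HADOOP_CLASSPATH when one of the directories is empty.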
4) Modify mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
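II. Configuration in Hive:
1. Download the source package (no binary package is available for this Tez release, so download the source and build it with Maven):
1) URL: http://archive.apache.org/dist/tez/0.5.3/
2) Download it to /usr/local and extract it: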
tar -xvzf apache-tez-0.5.4-src.tar.gz
2. Build Tez:
cd apache-tez-0.5.4-src
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
When the build finishes, the binary package (tez-0.5.4.tar.gz) is in /usr/local/apache-tez-0.5.4-src/tez-dist/target/.
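Before uploading anything, it can help to confirm that the build really produced the tarball. The following is a minimal sketch under stated assumptions: the function name tez_dist_tarball is hypothetical, and the path layout is the one described above (tez-dist/target/ under the source directory).

```shell
# Hypothetical helper: print the path of the built Tez tarball, or fail
# with a message on stderr if it is missing or empty.
tez_dist_tarball() {
  local src_dir="$1" version="$2"
  local tarball="$src_dir/tez-dist/target/tez-$version.tar.gz"
  if [ -s "$tarball" ]; then
    printf '%s\n' "$tarball"
  else
    echo "missing or empty: $tarball" >&2
    return 1
  fi
}

# Example use:
# tez_dist_tarball /usr/local/apache-tez-0.5.4-src 0.5.4
```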
3. Usage:
1) Upload apache-tez-0.5.4-src/tez-dist/target/tez-0.5.4.tar.gz to HDFS:
hadoop fs -mkdir /apps
hadoop fs -put tez-0.5.4.tar.gz /apps
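2) Create tez-site.xml with the following property inside its <configuration> element:
<property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/apps/tez-0.5.4.tar.gz</value>
</property>
Place this file in the Hadoop configuration directory (/usr/local/hadoop-2.5.2/etc/hadoop) on the node where the Hive client runs.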
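The tez-site.xml described above can also be generated by a script, which avoids typing the XML by hand on each client node. This is a minimal sketch: the function name write_tez_site is hypothetical, and the file contains only the tez.lib.uris property mentioned in this article.

```shell
# Hypothetical helper: write a tez-site.xml containing only tez.lib.uris
# into the given Hadoop configuration directory.
write_tez_site() {
  local conf_dir="$1" lib_uri="$2"
  cat > "$conf_dir/tez-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>$lib_uri</value>
  </property>
</configuration>
EOF
}

# Example use. The single quotes keep the shell from expanding the
# Hadoop-side variable reference:
# write_tez_site /usr/local/hadoop-2.5.2/etc/hadoop '${fs.defaultFS}/apps/tez-0.5.4.tar.gz'
```

In Hadoop 2, fs.defaultFS is the current name of the deprecated fs.default.name property, so either reference resolves to the same file system URI.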
3) Copy the jars:
Copy the jars under the Tez directory and under its lib directory into Hive's $HIVE_HOME/lib directory:
cp -r /usr/local/apache-tez-0.5.4-src/tez-dist/target/tez-0.5.4/* $HIVE_HOME/lib
4) Run:
Start the Hive client, then execute:
hive> set hive.execution.engine=tez;
5) Error:
hive> select id,name,money from test distribute by id sort by money;
Query ID = root_20170424202118_99a0b9fd-dcac-4cc2-b764-19bdff409a7e
Total jobs = 1
Launching Job 1 out of 1
java.lang.NoClassDefFoundError: org/apache/commons/collections4/BidiMap
at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:283)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:157)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.collections4.BidiMap
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 20 more
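A NoClassDefFoundError like the one above means the named class was not found on the classpath of the Hive client. One way to investigate is to look for a jar that contains the class. The sketch below is a heuristic, not a fix taken from the original article: the function name find_class_in_jars is hypothetical, and it relies on jar (zip) archives storing entry names uncompressed, so a plain grep can match them.

```shell
# Hypothetical helper: list jars under a directory whose archive index
# mentions the given fully qualified class name.
find_class_in_jars() {
  local dir="$1" class="$2"
  local entry
  # Convert a.b.C into the entry path a/b/C.class used inside jars.
  entry="$(printf '%s' "$class" | tr '.' '/').class"
  find "$dir" -name '*.jar' -type f -exec grep -lF -- "$entry" {} +
}

# Example use:
# find_class_in_jars "$HIVE_HOME/lib" org.apache.commons.collections4.BidiMap
```

The search is recursive, so it also reports matching jars that sit in a subdirectory of the directory being checked.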
Solution:
References: