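This article describes how to install and deploy Azkaban integrated with Hadoop 2.5.1 and Hive 0.13.1. Until now I had been using Oozie and Hadoop 1. This is my first time working with Azkaban and Hadoop 2, and my understanding of them is still shallow, so please point out any mistakes. The article is mainly meant to get the discussion started and to keep others from falling into the same pits I did.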
Download the source from https://github.com/azkaban/azkaban-plugins and switch to the release-2.5 branch. For convenience, assume the full path of the extracted source is ${AZKABAN_PLUGINS_SOURCE}.
The build machine needs JDK 1.6+ and Ant, so install them in advance. To save time, build each plugin from within its own directory, for example ${AZKABAN_PLUGINS_SOURCE}/plugins/reportal/.
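It is also best to run sudo ant once in the root directory first to build part of the project, for example to generate dist/package.version and the basic dependencies. Errors may appear during this step, such as a missing dustc executable, but they do not directly affect the builds described in this article.
This article deploys the Azkaban web server and the Azkaban executor server on separate machines: the web server runs on 192.168.20.221 and the executor server runs on 192.168.20.222. All configuration shown here meets my current needs. For other options, consult the online documentation or the source code.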
CREATE DATABASE `azkaban` /*!40100 DEFAULT CHARACTER SET utf8 */;
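Extract azkaban-web-server-2.5.0.tar.gz. For convenience, ${AZKABAN_WEB_SERVER} denotes the installation directory below. This version of Azkaban ships mysql-connector-java-5.1.28.jar in its lib directory. If that version does not match your MySQL setup, replace it yourself.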
keytool -keystore keystore -alias azkaban -genkey -keyalg RSA
The prompts and answers are shown below. All passwords are assumed to be password.
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]: azkaban.test.com
What is the name of your organizational unit?
[Unknown]: azkaban
What is the name of your organization?
[Unknown]: test
What is the name of your City or Locality?
[Unknown]: beijing
What is the name of your State or Province?
[Unknown]: beijing
What is the two-letter country code for this unit?
[Unknown]: CN
Is CN=azkaban.test.com, OU=azkaban, O=test, L=beijing, ST=beijing, C=CN correct?
[no]: yes
Enter key password for <azkaban>
(RETURN if same as keystore password):
When the command succeeds, it creates a file named keystore in the current directory.
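Before pointing Jetty at this file, you can check that it opens with the password you plan to put in jetty.password and that it holds a key entry under the alias passed to keytool. The following is a minimal sketch that assumes a JKS keystore, which is the keytool default on the JDK 6 to 8 toolchains this article targets. The class name KeystoreCheck is hypothetical, and the path, password and alias are this article's example values.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

/**
 * Hypothetical helper: verifies that a keystore produced by keytool opens with
 * the given store password and contains a private-key entry under the alias.
 * Assumes the JKS format (keytool's default on JDK 6-8).
 */
final class KeystoreCheck {
    private KeystoreCheck() {}

    static boolean hasKeyEntry(String path, char[] storePassword, String alias)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(path)) {
            // load() throws an IOException when the store password is wrong
            ks.load(in, storePassword);
        }
        // keytool -genkey creates a key entry (private key + self-signed cert)
        return ks.containsAlias(alias) && ks.isKeyEntry(alias);
    }
}
```

Usage: KeystoreCheck.hasKeyEntry("conf/keystore", "password".toCharArray(), "azkaban") should return true for the keystore generated above. An IOException usually means that the password does not match.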
default.timezone.id=Asia/Shanghai
jetty.keystore=conf/keystore
jetty.password=password
jetty.keypassword=password
jetty.truststore=conf/keystore
jetty.trustpassword=password
jetty.hostname=192.168.20.221
executor.host=192.168.20.222
executor.port=12321
mail.sender=<sender email address>
mail.host=mail.xxxx.com
mail.user=mailuser
mail.password=password
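A typo in one of these keys only shows up when the web server starts. The sketch below loads a properties file in this format with java.util.Properties and lists required keys that are missing or empty. The REQUIRED list and the class name WebServerPropsCheck are assumptions based on the keys used in this article. This is not Azkaban's own validation logic.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical checker for an azkaban.properties-style file. The set of
 * required keys mirrors this article's configuration, not Azkaban's rules.
 */
final class WebServerPropsCheck {
    static final String[] REQUIRED = {
        "jetty.keystore", "jetty.password", "jetty.keypassword",
        "jetty.truststore", "jetty.trustpassword",
        "executor.host", "executor.port"
    };

    /** Returns the required keys that are absent or blank. */
    static List<String> missingKeys(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) missing.add(key);
        }
        return missing;
    }

    /** executor.port must parse as a TCP port number (1-65535). */
    static boolean validPort(String value) {
        try {
            int port = Integer.parseInt(value.trim());
            return port > 0 && port < 65536;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Usage: pass new FileReader("conf/azkaban.properties") to missingKeys. An empty result means every key listed above is present.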
[root@mn extlib]# ll
total 10448
-rw-r--r-- 1 root root 41123 Oct 29 18:09 commons-cli-1.2.jar
-rw-r--r-- 1 root root 52418 Oct 29 18:09 hadoop-auth-2.5.1.jar
-rw-r--r-- 1 root root 2962475 Oct 29 18:09 hadoop-common-2.5.1.jar
-rw-r--r-- 1 root root 7095356 Oct 29 18:09 hadoop-hdfs-2.5.1.jar
-rw-r--r-- 1 root root 533455 Oct 29 18:09 protobuf-java-2.5.0.jar
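These jars are copies pinned to Hadoop 2.5.1, so they go stale when the cluster is upgraded. The sketch below is a hypothetical helper that flags hadoop-*.jar files whose version differs from the cluster version. It assumes the hadoop-<module>-<version>.jar naming shown in the listing and ignores non-Hadoop jars such as commons-cli and protobuf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: given file names from extlib, returns the Hadoop jars
 * whose embedded version does not equal the cluster's Hadoop version.
 */
final class ExtlibVersionCheck {
    // module may contain hyphens (e.g. mapreduce-client-core); version is dotted digits
    private static final Pattern HADOOP_JAR =
        Pattern.compile("hadoop-([a-z-]+)-(\\d+(?:\\.\\d+)*)\\.jar");

    static List<String> staleJars(List<String> fileNames, String clusterVersion) {
        List<String> stale = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = HADOOP_JAR.matcher(name);
            if (m.matches() && !m.group(2).equals(clusterVersion)) {
                stale.add(name);
            }
        }
        return stale;
    }
}
```

Usage: feed it Arrays.asList(new File("extlib").list()) together with the output of hadoop version after an upgrade. A non-empty result means those jars need replacing.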
The downside of this approach is that if Hadoop is upgraded later, you must remember to update these jars!
reportal.output.filesystem=hdfs
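Extract azkaban-executor-server-2.5.0.tar.gz into the installation directory. For convenience, assume this path is ${AZKABAN_EXECUTOR_SERVER}. This version of Azkaban ships mysql-connector-java-5.1.28.jar in its lib directory. If that version does not match your MySQL setup, replace it yourself.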
hadoop.home=/usr/local/hadoop
hive.home=/opt/hive
Edit ${AZKABAN_EXECUTOR_SERVER}/plugins/jobtypes/commonprivate.properties. You must set hadoop.home and hive.home here as well. Also modify jobtype.global.classpath, for example:
hadoop.home=/usr/local/hadoop
hive.home=/opt/hive
jobtype.global.classpath=${hadoop.home}/etc/hadoop,${hadoop.home}/share/hadoop/common/*,${hadoop.home}/share/hadoop/common/lib/*,${hadoop.home}/share/hadoop/hdfs/*,${hadoop.home}/share/hadoop/hdfs/lib/*,${hadoop.home}/share/hadoop/yarn/*,${hadoop.home}/share/hadoop/yarn/lib/*,${hadoop.home}/share/hadoop/mapreduce/*,${hadoop.home}/share/hadoop/mapreduce/lib/*
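To see what this classpath turns into once ${hadoop.home} is filled in, the sketch below substitutes ${name} references from a map and splits the result on commas. ClasspathExpander is a hypothetical helper for illustration only, because Azkaban performs its own property resolution internally. This version does a single pass, so a value that itself contains ${...} is not expanded again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of ${var} substitution in a comma-separated
 * classpath property. Not Azkaban's resolver; single pass, no nesting.
 */
final class ClasspathExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with props.get(name); fails on undefined names. */
    static String resolve(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.get(m.group(1));
            if (replacement == null) {
                throw new IllegalArgumentException("undefined property: " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in paths from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Resolves placeholders, then splits on ',' and drops blank entries. */
    static List<String> entries(String value, Map<String, String> props) {
        List<String> result = new ArrayList<>();
        for (String part : resolve(value, props).split(",")) {
            if (!part.trim().isEmpty()) result.add(part.trim());
        }
        return result;
    }
}
```

With hadoop.home=/usr/local/hadoop, the first entry above resolves to /usr/local/hadoop/etc/hadoop. Trailing * entries are left for the JVM to expand.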
The hive plugin was already installed in the previous step. This part covers how to configure it.
jobtype.classpath=${hive.home}/conf,${hive.home}/lib/*
hive.aux.jar.path=file://${hive.home}/aux/lib
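jobtype.classpath is combined with jobtype.global.classpath from ${AZKABAN_EXECUTOR_SERVER}/plugins/jobtypes/commonprivate.properties to form the classpath of a hive job. You can therefore split entries between these two properties however you like, as long as the resulting classpath is what you need.
hive.aux.jars.path=file://${hive.home}/aux/lib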
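The combination of jobtype.classpath and jobtype.global.classpath described above can be pictured as concatenating two comma-separated lists and removing duplicates. In the sketch below, placing the jobtype's own entries first is an assumption, so check the Azkaban source for the real order. ClasspathMerger is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of merging a jobtype classpath with the global one.
 * Order (jobtype first, then global) is assumed; duplicates keep first position.
 */
final class ClasspathMerger {
    static List<String> merge(String jobtypeClasspath, String globalClasspath) {
        Set<String> ordered = new LinkedHashSet<>(); // preserves insertion order
        addAll(ordered, jobtypeClasspath);
        addAll(ordered, globalClasspath);
        return new ArrayList<>(ordered);
    }

    private static void addAll(Set<String> target, String commaSeparated) {
        if (commaSeparated == null) return; // property not set
        for (String part : commaSeparated.split(",")) {
            String entry = part.trim();
            if (!entry.isEmpty()) target.add(entry);
        }
    }

    /** Joins entries with the platform path separator (':' on Linux) for java -cp. */
    static String toJvmClasspath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }
}
```

The point is that an entry listed in both properties appears only once. A jar missing from both is missing from the job.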
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.usesJobShell()Z
at azkaban.jobtype.HadoopSecureHiveWrapper.runHive(HadoopSecureHiveWrapper.java:148)
at azkaban.jobtype.HadoopSecureHiveWrapper.main(HadoopSecureHiveWrapper.java:115)
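A NoSuchMethodError means the plugin was compiled against a HadoopShims that had usesJobShell(), but the class loaded from the Hive 0.13.1 jars at runtime does not have it. The ()Z descriptor denotes a no-argument method returning boolean. A minimal reflection probe can confirm this. MethodProbe is a hypothetical name, and the probe assumes the Hive jars are on the classpath of the JVM that runs it.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical probe: does the class visible on the current classpath expose a
 * public method with this name? getMethods() also covers inherited and
 * interface methods.
 */
final class MethodProbe {
    static boolean hasPublicMethod(String className, String methodName)
            throws ClassNotFoundException {
        Class<?> cls = Class.forName(className);
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) return true;
        }
        return false;
    }
}
```

Usage: MethodProbe.hasPublicMethod("org.apache.hadoop.hive.shims.HadoopShims", "usesJobShell") returning false against the Hive 0.13.1 lib directory would match the error above.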
The fix is to edit ${AZKABAN_PLUGINS_SOURCE}/plugins/jobtype/src/azkaban/jobtype/HadoopSecureHiveWrapper.java and find the following code fragment:
if (!ShimLoader.getHadoopShims().usesJobShell()) {
...
...
}
Remove the if condition, that is, delete two lines: the if line and its closing brace. Then go into the ${AZKABAN_PLUGINS_SOURCE}/plugins/hadoopsecuritymanager/ directory and run:
Exception in thread "main" java.lang.ClassNotFoundException: azkaban.jobtype.ReportalHiveRunner
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at azkaban.jobtype.HadoopJavaJobRunnerMain.getObject(HadoopJavaJobRunnerMain.java:299)
at azkaban.jobtype.HadoopJavaJobRunnerMain.(HadoopJavaJobRunnerMain.java:146)
at azkaban.jobtype.HadoopJavaJobRunnerMain.main(HadoopJavaJobRunnerMain.java:76)
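A ClassNotFoundException like this one means that no entry on the job's classpath provides the class. The sketch below checks whether a given jar, such as azkaban-reportal-2.5.jar, contains azkaban.jobtype.ReportalHiveRunner. Finding the class in the jar only helps if the jar's directory is also on the job's classpath. JarClassFinder is a hypothetical name.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: checks whether a jar contains the .class entry for a
 * fully qualified class name (package dots become path slashes).
 */
final class JarClassFinder {
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }
}
```

Usage: JarClassFinder.containsClass("plugins/jobtypes/reportalhive/lib/azkaban-reportal-2.5.jar", "azkaban.jobtype.ReportalHiveRunner"), run from ${AZKABAN_EXECUTOR_SERVER}.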
${AZKABAN_EXECUTOR_SERVER}/plugins/jobtypes/reportalhive/lib/azkaban-reportal-2.5.jar has the same bug. Edit ${AZKABAN_PLUGINS_SOURCE}/plugins/reportal/src/azkaban/jobtype/ReportalHiveRunner.java and find the following code fragment:
if (!ShimLoader.getHadoopShims().usesJobShell()) {
...
...
}
Remove the if condition, then go into ${AZKABAN_PLUGINS_SOURCE}/plugins/reportal and run sudo ant to produce ${AZKABAN_PLUGINS_SOURCE}/dist/reportal/jars/azkaban-reportal-2.5.jar. Use this jar to replace ${AZKABAN_EXECUTOR_SERVER}/plugins/jobtypes/reportalhive/lib/azkaban-reportal-2.5.jar. Otherwise, running a reportal job fails with the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.usesJobShell()Z
at azkaban.jobtype.HadoopSecureHiveWrapper.runHive(HadoopSecureHiveWrapper.java:148)
at azkaban.jobtype.HadoopSecureHiveWrapper.main(HadoopSecureHiveWrapper.java:115)
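After copying the rebuilt jar over, it is easy to confirm that the file under reportalhive/lib really is the new build and not the old one. The sketch below compares the SHA-256 digests of the two files with java.security.MessageDigest. JarDigest is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper: compares two files by SHA-256 digest, e.g. the jar in
 * dist/reportal/jars against the one deployed under reportalhive/lib.
 */
final class JarDigest {
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // stream the file instead of loading it whole
            }
        }
        return md.digest();
    }

    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }
}
```

If sameContent returns false for the dist jar and the deployed jar, the old jar is still in place, and the NoSuchMethodError above will persist.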
hive.aux.jars.path=file://${hive.home}/aux/lib
hadoop.dir.conf=${hadoop.home}/etc/hadoop
hive.aux.jars.path - uses the local Hive aux lib directory. To use HDFS instead, change file to hdfs.
jobtype.classpath=${hadoop.home}/conf,${hadoop.home}/lib/*,${hive.home}/lib/*,./lib/*
hive.aux.jars.path=file://${hive.home}/aux/lib
hadoop.dir.conf=${hadoop.home}/etc/hadoop
#jobtype.global.classpath=
#hive.classpath.items=
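jobtype.classpath - unlike the hive plugin configuration, this must include the plugin's own lib directory so that azkaban-reportal-2.5.jar is on the classpath. Otherwise, the job fails with an error.
With that, the deployment is complete and you can run many kinds of jobs.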
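The ./lib/* entry in the reportal jobtype.classpath relies on the JVM's classpath wildcard rule. A trailing * matches every .jar or .JAR file directly in that directory, but not in subdirectories. The sketch below mimics that rule and resolves a relative entry against a base directory. Using the reportalhive plugin directory as the base is an assumption about the job's working directory that this article does not confirm. WildcardEntry is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of JVM classpath wildcard expansion for one entry.
 * "dir/*" -> .jar/.JAR files directly inside dir; other entries pass through.
 * Relative paths are resolved against baseDir (an assumed working directory).
 */
final class WildcardEntry {
    static List<File> expand(File baseDir, String entry) {
        List<File> result = new ArrayList<>();
        if (!entry.endsWith("*")) {
            result.add(resolve(baseDir, entry)); // plain dir or jar entry
            return result;
        }
        File dir = resolve(baseDir, entry.substring(0, entry.length() - 1));
        File[] children = dir.listFiles();
        if (children == null) return result; // directory missing or unreadable
        Arrays.sort(children); // deterministic order for display
        for (File f : children) {
            String name = f.getName();
            if (f.isFile() && (name.endsWith(".jar") || name.endsWith(".JAR"))) {
                result.add(f);
            }
        }
        return result;
    }

    private static File resolve(File baseDir, String path) {
        File f = new File(path);
        return f.isAbsolute() ? f : new File(baseDir, path);
    }
}
```

Usage: WildcardEntry.expand(new File("plugins/jobtypes/reportalhive"), "./lib/*") should list azkaban-reportal-2.5.jar if the assumed base directory is right.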