Tags (space-separated): Azkaban
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
Put simply, it is a scheduling framework that resolves the dependencies between jobs on a Hadoop platform.
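Azkaban consists of three main components:
Relational Database (MySQL)
AzkabanWebServer
AzkabanExecutorServer
Azkaban uses MySQL to store much of its state; both AzkabanWebServer and AzkabanExecutorServer need to connect to this database.
Specifically:
AzkabanWebServer uses the DB to keep track of the following:
SLA
AzkabanExecutorServer likewise uses the DB for its own state.
AzkabanWebServer is Azkaban's main management service. It handles project management, authorization, scheduling, and monitoring of executions, and it provides the web service interface.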
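Azkaban uses *.job files in key-value format to define the individual tasks of a workflow, and the dependencies property to define the dependencies between the jobs in a chain. Job files and code can be packaged into a zip archive and uploaded to Azkaban through the AzkabanWebServer web UI or with curl.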
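To make the job-file format concrete, here is a minimal sketch of a two-job flow. The job names (extract, report) and the temporary directory are hypothetical; the only Azkaban-specific parts are the key-value layout and the dependencies property, which lists the jobs that must finish first.

```shell
#!/bin/sh
# Sketch only: job names and paths are made up for illustration.
set -eu

flow_dir=$(mktemp -d)

# First job: no dependencies, so it can run as soon as the flow starts.
cat > "$flow_dir/extract.job" <<'EOF'
type=command
command=echo "extracting"
EOF

# Second job: runs only after the job named "extract" has finished.
cat > "$flow_dir/report.job" <<'EOF'
type=command
command=echo "reporting"
dependencies=extract
EOF

# Show the files that would go into the project zip.
ls "$flow_dir"
```

The job name used in dependencies is the file name without the .job suffix.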
Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban’s usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became ‘peak’.
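Azkaban 3.0 provides three modes: solo-server mode, two-server mode, and multiple-executor mode.
Solo-server mode is meant for trying things out and only suits small use cases. It uses the embedded H2 database, runs the web server and the executor server in a single process, is easy to start, and includes all of Azkaban's features.
Installation steps:
1. Download the source from git and build it; Gradle and Java 8 are required.
// Clone the repository
git clone https://github.com/azkaban/azkaban.git
// Build and install (this must be done on Linux)
cd azkaban; ./gradlew build installDist
cd azkaban-solo-server/build/install/azkaban-solo-server;
// Start and stop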
bin/azkaban-solo-start.sh
bin/azkaban-solo-shutdown.sh
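Two-server mode is the more commonly used setup: the database is MySQL, and the management (web) server and the executor server run in separate processes, so they do not affect each other.
Installation steps: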
mysql> CREATE DATABASE azkaban;
mysql> CREATE USER 'azkaban'@'%' IDENTIFIED BY '123456';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON azkaban.* to 'azkaban'@'%' WITH GRANT OPTION;
// Set the maximum allowed packet size, max_allowed_packet
// To configure this on Linux, open /etc/my.cnf. Somewhere after mysqld, add the following:
[mysqld]
...
max_allowed_packet=1024M
// Restart the MySQL service
$ sudo /sbin/service mysqld restart
1. Download the source from git and build it; Gradle and Java 8 are required.
https://github.com/azkaban/azkaban/releases?after=3.24.0
git clone https://github.com/azkaban/azkaban.git
cd azkaban/
// Build the tarballs
./gradlew distTar
// Copy the built tarballs out
cp azkaban/azkaban-*/build/distributions/*.tar.gz /opt/soft/azkabanTar/
tar zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz -C /opt/modules/azkaban/
tar zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz -C /opt/modules/azkaban/
tar zxvf azkaban-sql-0.1.0-SNAPSHOT.tar.gz -C /opt/modules/azkaban/
tar zxvf azkaban-hadoop-security-plugin-0.1.0-SNAPSHOT.tar.gz -C /opt/modules/azkaban/
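3. Create the tables Azkaban needs in MySQL
mysql> use azkaban;
mysql> source /opt/modules/azkaban/azkaban-sql-0.1.0-SNAPSHOT/create-all-sql-0.1.0-SNAPSHOT.sql
4. Copy the JDBC driver jar into the extlib directory of the web server and the executor server
Check whether the lib directories of the web and executor packages already contain the MySQL driver; if not, copy one in.
After extracting the tarballs built in the earlier step, the web-server directory should contain the following directories: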
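bin - scripts for starting the service
conf - configuration files
lib - dependency jars
extlib - external dependency jars
plugins - directory where external plugins are installed
web - files for the web service
The conf directory contains the following files: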
azkaban.properties - Used by Azkaban for runtime parameters
global.properties - Global static properties that are passed as shared properties to every workflow and job.
azkaban-users.xml - Used to add users and roles for authentication. This file is not used if the XmlUserManager is not set up to use it.
The azkaban.properties file is the main configuration file needed to set up Azkaban.
Azkaban serves its web UI over SSL, so the initial installation requires generating a keystore that holds the server certificate.
Generate the keystore as follows (the passwords entered here must match jetty.password and jetty.keypassword in azkaban.properties):
[vin@vin01 azkaban-web-server-0.1.0-SNAPSHOT]$ keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]: vin
What is the name of your organizational unit?
[Unknown]: hypers
What is the name of your organization?
[Unknown]: hypers
What is the name of your City or Locality?
[Unknown]: shanghai
What is the name of your State or Province?
[Unknown]: shanghai
What is the two-letter country code for this unit?
[Unknown]: CN
Is CN=vin, OU=hypers, O=hypers, L=shanghai, ST=shanghai, C=CN correct?
[no]: Y
Enter key password for <jetty>
(RETURN if same as keystore password):
Re-enter new password:
[vin@vin01 azkaban-web-server-0.1.0-SNAPSHOT]$
Once this completes, a keystore file is created in the current directory. The configuration below refers to it.
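For scripted installs, the same keystore can probably be generated without the interactive prompts. The sketch below assumes a JDK's keytool is on the PATH and reuses the alias, distinguished name, and password (123456) from this walkthrough. STOREPASS and KEYSTORE_DIR are hypothetical environment variables introduced only for this example, and the key size and validity are arbitrary choices.

```shell
#!/bin/sh
# Sketch: non-interactive equivalent of the keytool prompts shown above.
set -eu

STOREPASS=${STOREPASS:-123456}                 # must match jetty.password / jetty.keypassword
keystore_dir=${KEYSTORE_DIR:-$(mktemp -d)}     # hypothetical: where the keystore file is written

if command -v keytool >/dev/null 2>&1; then
  cd "$keystore_dir"
  # -dname supplies the answers to the name/organization/locality questions.
  keytool -genkeypair -keystore keystore -alias jetty -keyalg RSA -keysize 2048 \
    -validity 365 \
    -dname "CN=vin, OU=hypers, O=hypers, L=shanghai, ST=shanghai, C=CN" \
    -storepass "$STOREPASS" -keypass "$STOREPASS"
  # Confirm that the jetty entry exists in the new keystore.
  keytool -list -keystore keystore -storepass "$STOREPASS" -alias jetty
else
  echo "keytool not found; install a JDK first" >&2
fi
```

Pointing KEYSTORE_DIR at the web server's install directory puts the file where jetty.keystore=keystore expects it.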
If the Azkaban web server directory has no conf directory, copy the conf directory over from azkaban-solo-server.
Configure it as follows:
[vin@vin01 conf]$ cat azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Vin
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
project.temp.dir=temp
trigger.plugin.dir=plugins/triggers
database.type=mysql
# h2.path=./h2
# h2.create.tables=true
mysql.port=3306
mysql.host=192.168.73.6
mysql.database=azkaban
mysql.user=azkaban
mysql.password=123456
mysql.numconnections=100
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
# jetty.use.ssl=false
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
jetty.keystore=keystore
jetty.password=123456
jetty.keypassword=123456
jetty.truststore=keystore
jetty.trustpassword=123456
# Azkaban Executor settings
executor.port=12321
# mail settings
mail.sender=
mail.host=
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
# azkaban.jobtype.plugin.dir=plugins/jobtypes
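Because the web server and the executor each carry their own azkaban.properties, it can help to read individual values back out before starting anything. The following is a rough sketch, not part of Azkaban: get_prop is a hypothetical helper, and the sample file is a cut-down copy of the listing above written to a temporary path so the snippet runs on its own.

```shell
#!/bin/sh
# Sketch: read single keys from a Java-style properties file.
set -eu

# get_prop FILE KEY: print the value of KEY (last assignment wins).
# Commented-out lines do not match because their first field starts with "#".
get_prop() {
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$1"
}

# Cut-down sample of the configuration shown above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
database.type=mysql
mysql.host=192.168.73.6
# jetty.use.ssl=false
jetty.ssl.port=8443
jetty.keystore=keystore
EOF

db_type=$(get_prop "$conf" database.type)
ssl_port=$(get_prop "$conf" jetty.ssl.port)
use_ssl=$(get_prop "$conf" jetty.use.ssl)   # empty: the line is commented out

echo "database.type=$db_type, jetty.ssl.port=$ssl_port"
```

Running the same helper against both conf/azkaban.properties files is a quick way to spot values such as mysql.host or executor.port that differ between the two servers.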
In the conf directory, edit azkaban-users.xml to add an administrator user:
<azkaban-users>
<user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
<user username="metrics" password="metrics" roles="metrics"/>
<user username="admin" password="admin" roles="admin,metrics"/>
<role name="admin" permissions="ADMIN" />
<role name="metrics" permissions="METRICS"/>
</azkaban-users>
Start command:
cd /opt/modules/azkaban/azkaban-web-server-0.1.0-SNAPSHOT
bin/azkaban-web-start.sh
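This fails with an error.
It looks like a JDK version problem.
After switching to JDK 1.8 and running again, a different error appears.
Fix: create a log4j.properties file in the conf directory with the following configuration: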
log4j.rootLogger=INFO,C
log4j.appender.C=org.apache.log4j.ConsoleAppender
log4j.appender.C.Target=System.err
log4j.appender.C.layout=org.apache.log4j.PatternLayout
log4j.appender.C.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
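Running again produces another error message.
Fix: download slf4j-nop-1.6.1.jar and put it in the web server's lib directory.
Running again still fails.
Reference article:
http://www.cnblogs.com/tannerBG/archive/2014/07/10/3835952.html
Starting once more succeeds.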
Note that because SSL is enabled, you must connect over https.
Verify by visiting https://192.168.73.6:8443/
The Azkaban web UI should load.
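The same check can be done from a terminal. The sketch below is based on my understanding of Azkaban's AJAX API, where a POST with action=login returns JSON containing a session.id; treat the exact response shape as an assumption. azkaban_login is a hypothetical helper, AZKABAN_URL is a hypothetical environment variable, and -k is needed only because the keystore generated earlier is self-signed.

```shell
#!/bin/sh
# Sketch: log in to the web server over HTTPS with curl.
set -eu

# azkaban_login URL USER PASSWORD: POST the login action and print the raw reply.
azkaban_login() {
  curl -k -s \
    --data-urlencode "action=login" \
    --data-urlencode "username=$2" \
    --data-urlencode "password=$3" \
    "$1"
}

# Only contact a server when AZKABAN_URL is set, e.g.
#   AZKABAN_URL=https://192.168.73.6:8443 sh login.sh
if [ -n "${AZKABAN_URL:-}" ]; then
  azkaban_login "$AZKABAN_URL" azkaban azkaban
fi
```

The user name and password here are the admin account defined in azkaban-users.xml above.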
Go to the azkaban-exec-server directory, copy the conf directory from azkaban-web-server into it, and configure the executor's azkaban.properties as follows:
[vin@vin01 azkaban-exec-server-0.1.0-SNAPSHOT]$ cat conf/azkaban.properties
# Azkaban
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban
mysql.user=azkaban
mysql.password=123456
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# uncomment to enable inmemory stats for azkaban
#executor.metric.reports=true
#executor.metric.milisecinterval.default=60000
Start the executor server:
cd /opt/modules/azkaban/azkaban-exec-server-0.1.0-SNAPSHOT
bin/azkaban-executor-start.sh
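Azkaban supports a number of plugins; some are installed on the web server side and some on the executor side.
The plugin source can be cloned from git; check out the release that matches your Azkaban version.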
git clone https://github.com/azkaban/azkaban-plugins.git
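To use a plugin, run ant in its directory; this produces a tarball.
Note: with azkaban-plugins 3.0.0, building the jobtype plugin failed, so azkaban-plugins 2.7.0 was used instead.
Steps:
1. Download the source package
Go into each plugin's directory and run ant to build it; when the build finishes, the corresponding jar is produced.
The build succeeds.
The HDFS viewer plugin is installed on the web server side.
(1) Edit the configuration file
In /opt/modules/azkaban/azkaban-web-server-0.1.0-SNAPSHOT/conf, edit azkaban.properties and add one line: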
viewer.plugins=hdfs
This setting tells Azkaban to look for the plugin under plugins/viewer/hdfs. So create a viewer directory under plugins/, extract the built plugin tarball there, and rename the extracted directory to hdfs.
(2) Set up the dependency jars
Delete /opt/modules/azkaban/azkaban-web-server-0.1.0-SNAPSHOT/plugins/viewer/hdfs/extlib/hadoop-core-1.0.4.jar
and copy the following Hadoop jars to /opt/modules/azkaban/azkaban-web-server-0.1.0-SNAPSHOT/extlib
[vin@vin01 hdfs]$ ll extlib/
total 11348
-rwxr-xr-x. 1 vin vin 41123 Jun 27 12:25 commons-cli-1.2.jar
-rwxr-xr-x. 1 vin vin 73105 Jun 27 12:25 hadoop-auth-2.5.0-cdh5.3.6.jar
-rwxr-xr-x. 1 vin vin 3172844 Jun 27 12:25 hadoop-common-2.5.0-cdh5.3.6.jar
-rwxr-xr-x. 1 vin vin 7754144 Jun 27 12:25 hadoop-hdfs-2.5.0-cdh5.3.6.jar
-rwxr-xr-x. 1 vin vin 31212 Jun 27 12:25 htrace-core-3.0.4.jar
-rwxr-xr-x. 1 vin vin 533455 Jun 27 12:25 protobuf-java-2.5.0.jar
Add Hadoop's core-site.xml file to hadoop-common-2.5.0-cdh5.3.6.jar:
[vin@vin01 extlib]$ jar -uf hadoop-common-2.5.0-cdh5.3.6.jar core-site.xml
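Starting the web server keeps failing, probably because of a HADOOP_HOME problem; this is unresolved for now.
Azkaban can schedule the following job types:
Create a file with the .job suffix; type is the job type. Running this job prints Hello World.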
vim hello.job
type=command
command=echo "Hello World"
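Before it can be uploaded through the web UI, the job file has to be packaged as a zip. A minimal sketch, assuming the zip command is installed; the directory and the archive name hello.zip are arbitrary choices for illustration:

```shell
#!/bin/sh
# Sketch: write hello.job and package it for upload.
set -eu

project_dir=$(mktemp -d)

cat > "$project_dir/hello.job" <<'EOF'
type=command
command=echo "Hello World"
EOF

# Zip from inside the directory so hello.job sits at the root of the archive.
if command -v zip >/dev/null 2>&1; then
  (cd "$project_dir" && zip -q hello.zip hello.job)
  echo "created $project_dir/hello.zip"
else
  echo "zip not found; package $project_dir/hello.job with another tool" >&2
fi
```

The resulting hello.zip is the file to upload to an Azkaban project.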
Reference: http://blog.csdn.net/djd1234567/article/details/51438385