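Azkaban is a simple job-scheduling service made up of three parts: webserver, dbserver, and executorserver.
Azkaban is an open-source Java project from LinkedIn: a batch workflow job scheduler that runs a set of jobs and processes in a specified order within a workflow.
Azkaban defines a KV (key-value) file format for declaring dependencies between jobs, and it provides an easy-to-use web UI for maintaining and tracking workflows.
Project website: https://azkaban.github.io/
Azkaban features
1. Web UI
2. Easy upload of workflows
3. Easy setup of dependencies between jobs
4. Workflow scheduling
5. Authentication/authorization
6. Ability to kill and restart workflows
7. Modular, pluggable plugin mechanism
8. Project workspaces
9. Logging and auditing of workflows and jobs
Installation requires three packages: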
azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz
azkaban-web-server-2.5.0.tar.gz
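Download link (Baidu Netdisk): https://pan.baidu.com/s/1mMuIuVv9Ji6yO2A2b8Ibrg
Extraction code: seld
[Note] Deploy MySQL beforehand; installing MySQL is not covered here.
# Upload the installation packages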
wangting@ops01:/opt/software/azkaban >ll
total 22612
-rw-r--r-- 1 root root 11157302 May 16 10:45 azkaban-executor-server-2.5.0.tar.gz
-rw-r--r-- 1 root root 1928 May 16 10:45 azkaban-sql-script-2.5.0.tar.gz
-rw-r--r-- 1 root root 11989669 May 16 10:45 azkaban-web-server-2.5.0.tar.gz
# Create an application directory so that all components are extracted under one managed location
wangting@ops01:/opt/software/azkaban >mkdir /opt/module/azkaban
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >tar -xf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
wangting@ops01:/opt/software/azkaban >ls /opt/module/azkaban/
azkaban-2.5.0 azkaban-executor-2.5.0 azkaban-web-2.5.0
wangting@ops01:/opt/software/azkaban >
wangting@ops01:/opt/software/azkaban >cd /opt/module/azkaban/
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 azkaban-executor-2.5.0
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 azkaban-web-2.5.0
# Rename the directories to make them easier to manage and navigate
wangting@ops01:/opt/module/azkaban >mv azkaban-executor-2.5.0 executor
wangting@ops01:/opt/module/azkaban >mv azkaban-web-2.5.0 server
wangting@ops01:/opt/module/azkaban >ll
total 12
drwxrwxr-x 2 wangting wangting 4096 May 16 10:48 azkaban-2.5.0
drwxrwxr-x 7 wangting wangting 4096 May 16 10:47 executor
drwxrwxr-x 8 wangting wangting 4096 May 16 10:48 server
wangting@ops01:/opt/module/azkaban >
# The SQL files under azkaban-2.5.0 are used later to initialize the Azkaban database
wangting@ops01:/opt/module/azkaban >ls azkaban-2.5.0/
create.active_executing_flows.sql create.execution_flows.sql create.project_events.sql create.project_permissions.sql create.project_versions.sql create.triggers.sql update-all-sql-2.2.sql
create.active_sla.sql create.execution_jobs.sql create.project_files.sql create.project_properties.sql create.properties.sql database.properties update.execution_logs.2.1.sql
create-all-sql-2.5.0.sql create.execution_logs.sql create.project_flows.sql create.projects.sql create.schedules.sql update-all-sql-2.1.sql update.project_properties.2.1.sql
# Check the local IP address and whether the MySQL service is running
wangting@ops01:/opt/module/azkaban >ifconfig eth0 |grep "inet "
inet 11.8.37.50 netmask 255.255.255.0 broadcast 11.8.37.255
wangting@ops01:/opt/module/azkaban >netstat -tnlpu|grep 3306
tcp6 0 0 :::3306 :::* LISTEN -
# Log in to MySQL
wangting@ops01:/opt/module/azkaban >mysql -uroot -pwangting
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 37069
Server version: 5.7.26 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Create the azkaban database
mysql> create database azkaban;
Query OK, 1 row affected (0.00 sec)
mysql> use azkaban;
Database changed
mysql> show tables;
Empty set (0.00 sec)
# Initialize the schema
mysql> source /opt/module/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql
mysql> show tables;
+------------------------+
| Tables_in_azkaban |
+------------------------+
| active_executing_flows |
| active_sla |
| execution_flows |
| execution_jobs |
| execution_logs |
| project_events |
| project_files |
| project_flows |
| project_permissions |
| project_properties |
| project_versions |
| projects |
| properties |
| schedules |
| triggers |
+------------------------+
15 rows in set (0.00 sec)
# Done; exit
mysql> exit
Bye
wangting@ops01:/opt/module/azkaban >
wangting@ops01:/opt/module/azkaban >cd server
wangting@ops01:/opt/module/azkaban/server >pwd
/opt/module/azkaban/server
# Generate the keystore; the names keystore and jetty are the ones referenced in the configuration file
wangting@ops01:/opt/module/azkaban/server >keytool -keystore keystore -alias jetty -genkey -keyalg RSA
Enter keystore password: # wangting (any password of your choice)
Re-enter new password: # wangting (repeat the password)
What is your first and last name? # press Enter
[Unknown]:
What is the name of your organizational unit? # press Enter
[Unknown]:
What is the name of your organization? # press Enter
[Unknown]:
What is the name of your City or Locality? # press Enter
[Unknown]:
What is the name of your State or Province? # press Enter
[Unknown]:
What is the two-letter country code for this unit? # press Enter
[Unknown]:
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct? # y
[no]: y
Enter key password for <jetty>
(RETURN if same as keystore password): # press Enter
Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore keystore -destkeystore keystore -deststoretype pkcs12".
wangting@ops01:/opt/module/azkaban/server >
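The interactive prompts above can also be answered on the command line. The following is a minimal sketch, assuming a JDK whose keytool is on the PATH. The password value and the distinguished name are copied from the transcript, while the temporary directory is only a stand-in for /opt/module/azkaban/server. It uses -genkeypair, the current name of the -genkey option used above.

```shell
#!/bin/sh
# Sketch: generate the keystore without interactive prompts.
# KEYSTORE_PASS is the example password from the walkthrough; choose your own.
KEYSTORE_PASS="wangting"
workdir=$(mktemp -d)              # stand-in for /opt/module/azkaban/server
trap 'rm -rf "$workdir"' EXIT

if command -v keytool >/dev/null 2>&1; then
    keytool -genkeypair \
        -keystore "$workdir/keystore" \
        -storetype JKS \
        -alias jetty \
        -keyalg RSA \
        -storepass "$KEYSTORE_PASS" \
        -keypass "$KEYSTORE_PASS" \
        -dname "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown"
    # List the entry to confirm that the jetty alias was created.
    keytool -list -keystore "$workdir/keystore" \
        -storepass "$KEYSTORE_PASS" -alias jetty
else
    echo "keytool not found; install a JDK first" >&2
fi
```

Passing passwords as arguments exposes them in the shell history and process list, so this form is better suited to a throwaway test setup than to a shared host.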
# Check the time zone
wangting@ops01:/opt/module/azkaban/server >cat /etc/localtime
(binary TZif data omitted)
CST-8
# The last line should be CST-8 (UTC+8); if it is not, set the system time zone to UTC+8
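Reading the binary zone file works, but the UTC offset can be checked more directly with date. This is a small sketch; the helper name utc_offset is made up for illustration, and the expected value +0800 is the offset of UTC+8 (Asia/Shanghai).

```shell
#!/bin/sh
# Sketch: report the UTC offset in effect instead of reading the binary zone file.
# utc_offset prints the offset (for example +0800) for the zone given in $1,
# or for the system default when no argument is given.
utc_offset() {
    if [ -n "$1" ]; then
        TZ="$1" date +%z
    else
        date +%z
    fi
}

current=$(utc_offset)
if [ "$current" = "+0800" ]; then
    echo "system offset is $current (UTC+8)"
else
    echo "system offset is $current, expected +0800"
fi
```

Calling utc_offset CST-8 evaluates the same POSIX zone string that appears on the last line of /etc/localtime.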
wangting@ops01:/opt/module/azkaban/server >cd conf/
# Edit the web server configuration
wangting@ops01:/opt/module/azkaban/server/conf >ls
azkaban.properties azkaban-users.xml
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban.properties
# Change to Asia/Shanghai
default.timezone.id=Asia/Shanghai
database.type=mysql
mysql.port=3306
# Change to the IP address of the MySQL host
mysql.host=11.8.37.50
# The azkaban database created earlier
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100
# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
# The keystore file generated by keytool
jetty.keystore=keystore
# Set all passwords to the one chosen when running keytool
jetty.password=wangting
jetty.keypassword=wangting
jetty.truststore=keystore
jetty.trustpassword=wangting
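# Add a user (equivalent to registering an account)
wangting@ops01:/opt/module/azkaban/server/conf >vim azkaban-users.xml
<azkaban-users>
<user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
<user username="metrics" password="metrics" roles="metrics"/>
<!-- The username and password can be customized; they are used to log in to the web UI -->
<user username="…" password="wangting" roles="admin, metrics"/>
<role name="admin" permissions="ADMIN" />
<role name="metrics" permissions="METRICS"/>
</azkaban-users>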
# Edit the executor configuration
wangting@ops01:/opt/module/azkaban/server/conf >cd /opt/module/azkaban/executor/conf/
wangting@ops01:/opt/module/azkaban/executor/conf >ls
azkaban.private.properties azkaban.properties global.properties
wangting@ops01:/opt/module/azkaban/executor/conf >vim azkaban.properties
#Azkaban
# Change to Asia/Shanghai
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes
#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Database settings
database.type=mysql
mysql.port=3306
mysql.host=11.8.37.50
mysql.database=azkaban
mysql.user=root
mysql.password=wangting
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
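Both azkaban.properties files must point at the same database, so a quick comparison can catch a typo before the services are started. The sketch below is only an illustration: get_prop is a made-up helper, and the two temporary files stand in for server/conf/azkaban.properties and executor/conf/azkaban.properties with the values used in this walkthrough.

```shell
#!/bin/sh
# Sketch: check that the web server and the executor use the same database settings.
# get_prop FILE KEY prints the value of KEY from a Java-style .properties file.
get_prop() {
    key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Stand-in files; in a real check, point these at the two azkaban.properties files.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
web_conf="$work/web.properties"
exec_conf="$work/executor.properties"
cat > "$web_conf" <<'EOF'
database.type=mysql
mysql.port=3306
mysql.host=11.8.37.50
mysql.database=azkaban
EOF
cp "$web_conf" "$exec_conf"

status=0
for k in database.type mysql.host mysql.port mysql.database; do
    w=$(get_prop "$web_conf" "$k")
    e=$(get_prop "$exec_conf" "$k")
    if [ "$w" = "$e" ]; then
        echo "OK   $k=$w"
    else
        echo "DIFF $k: web=$w executor=$e"
        status=1
    fi
done
```

The helper returns everything after the equals sign, so a comment left on the same line as a value would show up in the output and be easy to spot.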
wangting@ops01:/opt/module/azkaban/executor/conf >cd /opt/module/azkaban/server/
wangting@ops01:/opt/module/azkaban/server >bin/azkaban-web-start.sh
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
2021/05/16 11:26:42.425 +0800 INFO [log] [Azkaban] Started SslSocketConnector@0.0.0.0:8443
2021/05/16 11:26:42.425 +0800 INFO [AzkabanWebServer] [Azkaban] Server running on ssl port 8443.
wangting@ops01:/opt/module/azkaban/server >cd /opt/module/azkaban/executor/
wangting@ops01:/opt/module/azkaban/executor >bin/azkaban-executor-start.sh
Using Hadoop from /opt/module/hadoop-3.1.3
Using Hive from /opt/module/hive
bin/..
Starting AzkabanExecutorServer on port 12321 ...
2021/05/16 11:29:20.076 +0800 INFO [log] [Azkaban] Started SocketConnector@0.0.0.0:12321
2021/05/16 11:29:20.076 +0800 INFO [AzkabanExecutorServer] [Azkaban] Azkaban Executor Server started on port 12321
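Besides reading the startup logs, it can help to confirm that both ports accept connections. The sketch below assumes bash, because it relies on bash's /dev/tcp pseudo-device, and assumes it runs on the Azkaban host itself. The helper name port_open is made up for illustration, and the ports 8443 and 12321 come from the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the web server and the executor accept TCP connections.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_open() {
    # $1 = host, $2 = port; succeeds only if a TCP connection can be opened.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

HOST="127.0.0.1"   # assumption: the check runs on the Azkaban host itself
for port in 8443 12321; do
    if port_open "$HOST" "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not reachable"
    fi
done
```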
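Once the web UI login succeeds, the deployment is complete.
projects: the most important part; create a project here, and all flows run inside a project.
scheduling: shows scheduled jobs
executing: shows jobs that are currently running
history: shows past job runs
Create a project
project_1
How a job runs and what it does are defined in a .job file.
# Create a local file named command.job; no line in the file may end with trailing spaces. Its contents: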
# command.job
type=command
command=mkdir /opt/module/ztdata_0516
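After editing command.job, package it into a zip file with any archiving tool, for example command.zip.
After the upload, the Job Command view shows the parsed contents of the job.
Under Flows, click the command job to open its detail page; Execute Flow runs it.
[Note] Because these steps go through the web UI, the files can be edited, created, and zipped directly on a local Windows machine.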
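For readers who prefer a terminal to a desktop archiver, the job file can be written and packaged from a shell. This is a sketch under a few assumptions: a Unix-like shell, a scratch directory in place of a real working folder, and the zip command being installed (the packaging step is skipped otherwise). It also enforces the rule that no line ends in whitespace.

```shell
#!/bin/sh
# Sketch: build command.job and package it as command.zip.
work=$(mktemp -d)                 # scratch directory; any local folder works
trap 'rm -rf "$work"' EXIT

# Write the job definition exactly as shown in the example.
printf '%s\n' \
    '# command.job' \
    'type=command' \
    'command=mkdir /opt/module/ztdata_0516' > "$work/command.job"

# Strip trailing spaces and tabs so that no line ends in whitespace.
sed 's/[[:space:]]*$//' "$work/command.job" > "$work/command.job.tmp"
mv "$work/command.job.tmp" "$work/command.job"

# Stop if any line still ends in whitespace.
if grep -q '[[:space:]]$' "$work/command.job"; then
    echo "trailing whitespace found in command.job" >&2
    exit 1
fi

# Package the job; zip is only used when it is installed.
if command -v zip >/dev/null 2>&1; then
    (cd "$work" && zip -q command.zip command.job)
else
    echo "zip not installed; package command.job with any archiver" >&2
fi
```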
wangting@ops01:/opt/module >ll
total 52
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >
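After the job finishes, verify the result: the new directory ztdata_0516 exists under /opt/module/, which shows that the job was submitted and executed successfully.
[Note] The following examples omit step-by-step screenshots; the procedure is the same as in example 1.
Create a project
project_2
Description
Create two job files locally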
one.job
# one.job
type=command
command=mkdir /opt/module/one
two.job
# two.job
type=command
dependencies=one
command=touch /opt/module/one/two.txt
[Note] dependencies=one means that the job two depends on the job one. With this parameter set, the jobs run in sequence: two starts only after one has finished.
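The ordering implied by dependencies= can be illustrated with a topological sort. The sketch below is not how Azkaban resolves flows internally; it only shows the idea, using the standard tsort utility. The helper name job_order is made up, and the sketch assumes that multiple dependencies are written as a comma-separated list.

```shell
#!/bin/sh
# Sketch: derive a valid run order from the dependencies= lines of .job files.
# This illustrates the ordering idea only; it is not Azkaban's own implementation.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf '%s\n' 'type=command' 'command=mkdir /opt/module/one' > "$work/one.job"
printf '%s\n' 'type=command' 'dependencies=one' \
    'command=touch /opt/module/one/two.txt' > "$work/two.job"

# job_order DIR: print job names so that every job comes after its dependencies.
job_order() {
    for f in "$1"/*.job; do
        job=$(basename "$f" .job)
        # A self-pair tells tsort that the job exists even without dependencies.
        echo "$job $job"
        # Emit "dependency job" pairs, one per comma-separated dependency.
        sed -n 's/^dependencies=//p' "$f" | tr ',' '\n' | while read -r dep; do
            if [ -n "$dep" ]; then
                echo "$dep $job"
            fi
        done
    done | tsort
}

job_order "$work"
```

For the two files above the function lists one before two, which matches the execution order described in the note.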
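On the home page, click the Projects tab at the top, open project_2, click Upload in the upper right, then upload the zip file.
Click Flows, click the main flow two, and execute it from its page.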
wangting@ops01:/opt/module >ll
total 56
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x 2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
wangting@ops01:/opt/module >cd one/
wangting@ops01:/opt/module/one >ls
two.txt
wangting@ops01:/opt/module/one >
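After the flow finishes, verify the result: the new directory one exists under /opt/module/, which shows that job one was submitted and executed successfully.
The file two.txt exists inside the one directory, which shows that job two was submitted and executed successfully.
On the server, write a script that simulates a more complex business job, for example one that calls Hive, HDFS, and so on: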
wangting@ops01:/opt/module/test >vim test_azkaban.sh
#!/bin/bash
echo "123"
echo "123123"
echo "123123123"
ls -l /opt/module/ >> /opt/module/test/shell_log_0516.log
hdfs dfs -ls / >> /opt/module/test/shell_log_0516.log
# Use an explicit format rather than parsing the locale-dependent default output of date
NOW=$(date +%H:%M:%S)
echo "Current time: $NOW"
wangting@ops01:/opt/module/test >chmod +x test_azkaban.sh
Create a project
project_3
Description
# run_bash.job
type=command
command=bash /opt/module/test/test_azkaban.sh
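Proceed as in the previous examples:
On the home page, click the Projects tab at the top, open project_3, click Upload in the upper right, then upload the zip file.
Click Flows, click the run_bash flow, and execute it from its page.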
wangting@ops01:/opt/module/test >ll
total 8
-rw-rw-r-- 1 wangting wangting 1801 May 16 12:49 shell_log_0516.log
-rwxrwxr-x 1 wangting wangting 226 May 16 12:44 test_azkaban.sh
wangting@ops01:/opt/module/test >
# Check that the log contains the directory listing and the HDFS root listing
wangting@ops01:/opt/module/test >cat shell_log_0516.log
total 60
drwxrwxr-x 5 wangting wangting 4096 May 16 10:51 azkaban
drwxrwxr-x 2 wangting wangting 4096 Apr 4 11:01 datas
drwxr-xr-x 12 wangting wangting 4096 Apr 24 16:37 flume
-rw-rw-r-- 1 wangting wangting 30 Apr 25 11:33 group.log
drwxr-xr-x 12 wangting wangting 4096 Mar 12 11:38 hadoop-3.1.3
drwxrwxr-x 8 wangting wangting 4096 May 10 11:55 hbase
drwxrwxr-x 11 wangting wangting 4096 Apr 2 15:14 hive
drwxr-xr-x 7 wangting wangting 4096 Apr 29 11:07 kafka
drwxrwxr-x 2 wangting wangting 4096 May 16 12:34 one
drwxr-xr-x 5 wangting wangting 4096 Jun 27 2018 phoenix
drwxrwxr-x 2 wangting wangting 4096 May 16 12:49 test
drwxrwxr-x 3 wangting wangting 4096 Apr 10 16:25 tez
drwxrwxr-x 5 wangting wangting 4096 Apr 2 15:03 tez-0.9.2_bak0410
drwxr-xr-x 8 wangting wangting 4096 Mar 25 11:02 zookeeper-3.5.7
drwxrwxr-x 2 wangting wangting 4096 May 16 11:51 ztdata_0516
2021-05-16 12:49:16,801 INFO [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
Found 10 items
drwxr-xr-x - wangting supergroup 0 2021-03-17 11:44 /20210317
drwxr-xr-x - wangting supergroup 0 2021-03-19 10:51 /20210319
drwxr-xr-x - wangting supergroup 0 2021-04-24 17:05 /flume
-rw-r--r-- 3 wangting supergroup 338075860 2021-03-12 11:50 /hadoop-3.1.3.tar.gz
drwxr-xr-x - wangting supergroup 0 2021-05-13 15:31 /hbase
drwxr-xr-x - wangting supergroup 0 2021-04-04 11:07 /test.db
drwxr-xr-x - wangting supergroup 0 2021-03-19 11:14 /testgetmerge
drwxr-xr-x - wangting supergroup 0 2021-04-10 16:23 /tez
drwx------ - wangting supergroup 0 2021-04-02 15:14 /tmp
drwxr-xr-x - wangting supergroup 0 2021-04-02 15:25 /user
wangting@ops01:/opt/module/test >
After the job finishes, verify the result: the file shell_log_0516.log exists under /opt/module/test, which shows that the run_bash job was submitted and executed successfully.