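1 Introduction to Oozie
Oozie is an open-source framework built on a workflow engine. It schedules and coordinates Hadoop MapReduce and Pig jobs, is mainly used to run jobs on a timer, and can run multiple jobs in the order their logic requires.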
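2 Functional modules
2.1 Modules
1. Workflow
Runs flow nodes in sequence and supports fork (split into several parallel nodes) and join (merge several nodes back into one).
2. Coordinator
Triggers workflows on a schedule.
3. Bundle
Groups multiple Coordinators together.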
2.2 Common nodes
- Control flow nodes
Control flow nodes mark where a workflow begins and ends (start, end, kill) and determine its execution path (decision, fork, join).
- Action nodes
Action nodes do the actual work, for example copying a file or running a shell script.
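To make the two node types concrete, here is a minimal sketch of a workflow that uses start, fork, join, kill and end around two action nodes. It is an illustration only and not taken from this article: the directory oozie-apps/fork-join-demo, every node name (split, branch-a, branch-b, merge, fail) and the HDFS paths are hypothetical. The file is written with a quoted heredoc so the ${...} expressions stay literal for Oozie to resolve.

```shell
# Hypothetical application directory; adjust to your own layout.
app_dir="oozie-apps/fork-join-demo"
mkdir -p "$app_dir"

# 'EOF' is quoted, so the shell does not expand ${nameNode} and friends.
cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fork-join-demo">
    <!-- Control flow: entry point -->
    <start to="split"/>
    <!-- Control flow: start both branches in parallel -->
    <fork name="split">
        <path start="branch-a"/>
        <path start="branch-b"/>
    </fork>
    <!-- Action nodes: each creates a (hypothetical) HDFS directory -->
    <action name="branch-a">
        <fs><mkdir path="${nameNode}/tmp/fork-join-demo/a"/></fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="branch-b">
        <fs><mkdir path="${nameNode}/tmp/fork-join-demo/b"/></fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <!-- Control flow: wait for both branches, then continue -->
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

echo "wrote $app_dir/workflow.xml"
```

Both branches point at the same join, which is what lets Oozie wait for all forked paths before moving on.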
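3 Installation and deployment
3.1 Hadoop
Configure core-site.xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
</property>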
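Configure hadoop-env.sh
# Set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configure hdfs-site.xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop104:50090</value>
</property>
Configure yarn-env.sh
# Set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144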
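Configure yarn-site.xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop103</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
Configure mapred-env.sh
# Set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144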
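Configure mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop104:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop104:19888</value>
</property>
Configure slaves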
hadoop102
hadoop103
hadoop104
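Distribute the finished Hadoop configuration files to the cluster
[djm@hadoop102 ~]$ xsync /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/
Start the cluster. start-yarn.sh starts the ResourceManager on the machine where it is run, so run it on hadoop103, the host set in yarn.resourcemanager.hostname.
[djm@hadoop102 hadoop-2.5.0-cdh5.3.6]$ sbin/start-dfs.sh
[djm@hadoop103 hadoop-2.5.0-cdh5.3.6]$ sbin/start-yarn.sh
[djm@hadoop102 hadoop-2.5.0-cdh5.3.6]$ sbin/mr-jobhistory-daemon.sh start historyserver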
3.2 Oozie
Unpack Oozie
[djm@hadoop102 software]$ tar -zxvf /opt/software/cdh/oozie-4.0.0-cdh5.3.6.tar.gz -C /opt/module
From the Oozie root directory, unpack oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../
Create a libext directory under the Oozie directory
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ mkdir libext/
Copy the required jar (the MySQL JDBC driver) into libext
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -a /opt/software/mysql-connector-java-5.1.27-bin.jar ./libext/
Copy ext-2.2.zip into libext
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -a /opt/software/cdh/ext-2.2.zip libext/
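Edit oozie-site.xml
<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://hadoop102:3306/oozie</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>123456</value>
</property>
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop</value>
</property>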
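Initialize Oozie
Log in to MySQL and create the oozie database:
create database oozie;
Upload the yarn sharelib archive (oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz) from the Oozie directory to HDFS:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie-setup.sh sharelib create -fs hdfs://hadoop102:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
Create the oozie.sql file and run it against the database:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/ooziedb.sh create -sqlfile oozie.sql -run
Package the project into a WAR file:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie-setup.sh prepare-war
Starting and stopping Oozie
Start:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh start
Stop:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh stop
Open the web console
http://hadoop102:11000/oozie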
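4 Hands-on examples
4.1 Single-node workflow
1. Create the working directory
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ mkdir -p oozie-apps/shell
2. Create workflow.xml and job.properties in oozie-apps/shell
[djm@hadoop102 shell]$ touch workflow.xml job.properties
3. Edit workflow.xml
The workflow defines one shell action. Its name-node is ${nameNode}, its configuration sets the property mapred.job.queue.name, and the kill node reports a failure with this message:
Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]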
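Only fragments of this workflow.xml are given in step 3, so the following is a hedged reconstruction rather than the author's file. It is a minimal sketch of a single shell action based on the Oozie shell-action schema. The workflow name, the node names (shell-node, fail, end), the command (echo) and its argument are placeholders chosen for illustration. The script writes the file with a quoted heredoc so the ${...} expressions reach Oozie unexpanded.

```shell
# Application directory from step 1, relative to the Oozie root.
app_dir="oozie-apps/shell"
mkdir -p "$app_dir"

# 'EOF' is quoted so the shell leaves ${jobTracker} etc. for Oozie to resolve.
cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- Placeholder command and argument (assumption, not from the article) -->
            <exec>echo</exec>
            <argument>hello-oozie</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

echo "wrote $app_dir/workflow.xml"
```

The ok and error transitions decide whether the run finishes at the end node or at the kill node, which is where the failure message is emitted.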
4. Edit job.properties
# HDFS address
nameNode=hdfs://hadoop102:8020
# ResourceManager address
jobTracker=hadoop103:8032
# Queue name
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
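The application path is composed from the other properties, and it has to match the HDFS location the application is uploaded to. The sketch below repeats that composition with plain shell variables to show the resulting URI. It is only an illustration: Oozie resolves these expressions itself, and the user name djm is an assumption taken from the login used in this article's commands (a shell variable cannot contain a dot, hence user_name).

```shell
# Values copied from job.properties.
nameNode="hdfs://hadoop102:8020"
examplesRoot="oozie-apps"
# Oozie substitutes ${user.name} with the submitting user; "djm" is assumed here.
user_name="djm"

# Same composition as oozie.wf.application.path, done with shell expansion.
app_path="${nameNode}/user/${user_name}/${examplesRoot}/shell"
echo "$app_path"
```

The directory part, /user/djm/oozie-apps/shell, is where the upload step (hadoop fs -put oozie-apps/ /user/djm) places the workflow.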
5. Upload the configuration
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm
6. Run the job
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/shell/job.properties -run
4.2 Multi-node workflow
1. Edit workflow.xml
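The root element is <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">. It contains two shell actions. Each one sets job-tracker to ${jobTracker}, name-node to ${nameNode}, and the property mapred.job.queue.name to ${queueName}. The first action runs mkdir with the argument /opt/module/d2, and the second runs mkdir with the argument /opt/module/d4. The kill node reports a failure with this message:
Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]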
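The markup of this workflow.xml did not survive, so what follows is a hedged sketch built from the values that did: the workflow name, the two mkdir commands and their arguments, and the kill message. The node names (mkdir-d2, mkdir-d4, fail, end) are hypothetical, and chaining the two actions one after the other is an assumption; the original may have wired them differently.

```shell
# Same application directory as the single-node example.
app_dir="oozie-apps/shell"
mkdir -p "$app_dir"

# Quoted heredoc: ${...} is left for Oozie, not expanded by the shell.
cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="mkdir-d2"/>
    <!-- First action: on success, hand over to the second one -->
    <action name="mkdir-d2">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d2</argument>
        </shell>
        <ok to="mkdir-d4"/>
        <error to="fail"/>
    </action>
    <!-- Second action: on success, finish the workflow -->
    <action name="mkdir-d4">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d4</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

echo "wrote $app_dir/workflow.xml"
```

A shell action runs on whichever NodeManager YARN picks, so the directories would appear on that node's local disk rather than necessarily on hadoop102.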
2. Edit job.properties
nameNode=hdfs://hadoop102:8020
jobTracker=hadoop103:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
3. Delete the previously uploaded configuration
[djm@hadoop102 shell]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r -f /user/djm/oozie-apps/
4. Upload the configuration
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm
5. Run the job
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/shell/job.properties -run
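4.3 Scheduling MapReduce with Oozie
1. Copy the official template into oozie-apps
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -r /opt/module/cdh/oozie-4.0.0-cdh5.3.6/examples/apps/map-reduce/ oozie-apps/
2. Edit workflow.xml
The map-reduce action sets name-node to ${nameNode}, and the kill node reports a failure with this message:
Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]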
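The body of the map-reduce action is not shown, so here is a hedged sketch of what it could look like if the job is the WordCount program from hadoop-mapreduce-examples. Everything beyond name-node and the kill message is an assumption: the workflow and node names, the input and output directories (/input, /output), and the choice of WordCount and its mapper and reducer classes.

```shell
# Application directory created by copying the map-reduce template.
app_dir="oozie-apps/map-reduce"
mkdir -p "$app_dir"

# Quoted heredoc keeps ${...} and the $ in the inner class names literal.
cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Remove a previous (hypothetical) output directory before each run -->
            <prepare>
                <delete path="${nameNode}/output"/>
            </prepare>
            <configuration>
                <property><name>mapred.job.queue.name</name><value>${queueName}</value></property>
                <!-- WordCount is written against the new MapReduce API -->
                <property><name>mapred.mapper.new-api</name><value>true</value></property>
                <property><name>mapred.reducer.new-api</name><value>true</value></property>
                <property><name>mapreduce.job.map.class</name><value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value></property>
                <property><name>mapreduce.job.reduce.class</name><value>org.apache.hadoop.examples.WordCount$IntSumReducer</value></property>
                <property><name>mapreduce.job.output.key.class</name><value>org.apache.hadoop.io.Text</value></property>
                <property><name>mapreduce.job.output.value.class</name><value>org.apache.hadoop.io.IntWritable</value></property>
                <!-- Hypothetical HDFS input and output locations -->
                <property><name>mapred.input.dir</name><value>/input</value></property>
                <property><name>mapred.output.dir</name><value>/output</value></property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF

echo "wrote $app_dir/workflow.xml"
```

Oozie adds every jar in the application's lib directory to the job classpath, which is why the examples jar is copied into oozie-apps/map-reduce/lib.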
3. Edit job.properties
nameNode=hdfs://hadoop102:8020
jobTracker=hadoop103:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
4. Copy the jar to be run into the lib directory of map-reduce
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -a /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar oozie-apps/map-reduce/lib
5. Delete the previously uploaded configuration
[djm@hadoop102 shell]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r -f /user/djm/oozie-apps/
6. Upload the configuration
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm
7. Run the job
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/map-reduce/job.properties -run
4.4 Scheduled jobs
1. Check whether the ntp service is installed
[root@hadoop102 ~]# rpm -qa | grep ntp
2. Edit /etc/ntp.conf
Change
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
to
restrict 192.168.10.0 mask 255.255.255.0 nomodify notrap
Comment out the public pool servers, changing
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
to
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
Add the local clock as a fallback time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
3. Edit /etc/sysconfig/ntpd
# Keep the hardware clock in sync as well
SYNC_HWCLOCK=yes
4. Restart the ntpd service
[root@hadoop102 ~]# systemctl restart ntpd
5. Enable ntpd at boot
[root@hadoop102 ~]# chkconfig ntpd on
6. On each of the other machines, schedule a sync with the time server every 10 minutes
[root@hadoop103 ~]# crontab -e
Add
*/10 * * * * /usr/sbin/ntpdate hadoop102
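7. Edit oozie-site.xml
<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>
8. Restart Oozie
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh stop
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh start
9. Copy the official template into oozie-apps
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -r /opt/module/cdh/oozie-4.0.0-cdh5.3.6/examples/apps/cron/ oozie-apps/
10. Edit workflow.xml
The workflow defines one shell action. Its name-node is ${nameNode}, its configuration sets the property mapred.job.queue.name, and the kill node reports a failure with this message:
Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]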
11. Edit coordinator.xml so that it triggers the workflow on a schedule
12. Edit job.properties
nameNode=hdfs://hadoop102:8020
jobTracker=hadoop103:8032
queueName=default
examplesRoot=oozie-apps
oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
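The coordinator.xml from step 11 is not reproduced in this article. As a rough guide, a coordinator that triggers a workflow at a fixed interval could look like the sketch below. The coordinator name, the 5-minute frequency and the properties start, end and workflowAppUri are assumptions: they do not appear in the job.properties listed here and would have to be added to it, with start and end written in the GMT+0800 format configured in oozie-site.xml.

```shell
# Application directory created by copying the cron template.
app_dir="oozie-apps/cron"
mkdir -p "$app_dir"

# Quoted heredoc: ${...} is resolved by Oozie when the coordinator is submitted.
cat > "$app_dir/coordinator.xml" <<'EOF'
<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}"
                 start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- HDFS directory that holds the workflow.xml to trigger -->
            <app-path>${workflowAppUri}</app-path>
            <!-- Pass the job.properties values through to the workflow -->
            <configuration>
                <property><name>jobTracker</name><value>${jobTracker}</value></property>
                <property><name>nameNode</name><value>${nameNode}</value></property>
                <property><name>queueName</name><value>${queueName}</value></property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
EOF

echo "wrote $app_dir/coordinator.xml"
```

The frequency is counted from the start time, so each run materializes at start plus a multiple of the interval until end is reached.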
13. Create and edit p1.sh
[djm@hadoop102 cron]$ vim p1.sh
date >> /opt/module/p1.log
14. Delete the previously uploaded configuration
[djm@hadoop102 shell]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r -f /user/djm/oozie-apps/
15. Upload the configuration
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm
16. Run the job
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/cron/job.properties -run