Getting Started with Oozie

1 Introduction to Oozie

Oozie is an open-source, workflow-engine-based framework that provides scheduling and coordination for Hadoop MapReduce and Pig jobs. It is mainly used to run jobs on a schedule, and multiple jobs can be scheduled according to their logical execution order.

2 Functional Modules

2.1 Modules

1. Workflow

Executes flow nodes in sequence; supports fork (branching into multiple parallel nodes) and join (merging multiple nodes back into one).

2. Coordinator

Triggers workflows on a schedule.

3. Bundle

Binds multiple Coordinators together.
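To make the Bundle level concrete: a bundle definition is just a list of coordinator applications to submit and manage together. A minimal sketch (the bundle name and HDFS paths below are hypothetical, not part of this setup):

<bundle-app name="demo-bundle" xmlns="uri:oozie:bundle:0.2">
    <!-- each entry points at a coordinator application directory on HDFS -->
    <coordinator name="coord-a">
        <app-path>${nameNode}/user/djm/oozie-apps/coord-a</app-path>
    </coordinator>
    <coordinator name="coord-b">
        <app-path>${nameNode}/user/djm/oozie-apps/coord-b</app-path>
    </coordinator>
</bundle-app>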

2.2 Common Node Types

1. Control Flow Nodes

Control flow nodes are generally defined at the beginning or end of a workflow, such as start, end, and kill; they also provide the workflow's routing mechanisms, such as decision, fork, and join.

2. Action Nodes

Nodes that perform the actual work, such as copying a file or running a shell script.
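The two node types fit together as in the following skeleton, where everything other than the two shell actions is a control flow node (node names, the 'skip' flag, and the echo commands are illustrative only):

<workflow-app xmlns="uri:oozie:workflow:0.4" name="skeleton-wf">
    <start to="decide"/>
    <!-- decision: route on an EL predicate -->
    <decision name="decide">
        <switch>
            <case to="end">${wf:conf('skip') eq 'true'}</case>
            <default to="forking"/>
        </switch>
    </decision>
    <!-- fork: run two branches in parallel -->
    <fork name="forking">
        <path start="action-a"/>
        <path start="action-b"/>
    </fork>
    <action name="action-a">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>branch-a</argument>
        </shell>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="action-b">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>branch-b</argument>
        </shell>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <!-- join: both branches must finish before moving on -->
    <join name="joining" to="end"/>
    <kill name="fail">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>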

3 Installation and Deployment

3.1 Hadoop

Configure core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
</property>
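Oozie submits jobs to Hadoop by impersonating the submitting user, so core-site.xml generally also needs Hadoop proxy-user entries for that account; a sketch assuming the djm account used throughout this post:

<!-- allow the Oozie server (running as djm, an assumption here) to impersonate users from any host and group -->
<property>
    <name>hadoop.proxyuser.djm.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.djm.groups</name>
    <value>*</value>
</property>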

Configure hadoop-env.sh

# set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure hdfs-site.xml

 


<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop104:50090</value>
</property>

Configure yarn-env.sh

# set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure yarn-site.xml

 


<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop103</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>

Configure mapred-env.sh

# set JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure mapred-site.xml

 


<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop104:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop104:19888</value>
</property>

Configure slaves

 hadoop102
hadoop103
hadoop104

Distribute the configured Hadoop files to the whole cluster

[djm@hadoop102 ~]$ xsync /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/

Start the cluster

[djm@hadoop102 hadoop-2.5.0-cdh5.3.6]$ sbin/start-dfs.sh
# start YARN on hadoop103, where the ResourceManager is configured to run
[djm@hadoop103 hadoop-2.5.0-cdh5.3.6]$ sbin/start-yarn.sh
[djm@hadoop102 hadoop-2.5.0-cdh5.3.6]$ sbin/mr-jobhistory-daemon.sh start historyserver

3.2 Oozie

Extract Oozie

 [djm@hadoop102 software]$ tar -zxvf /opt/software/cdh/oozie-4.0.0-cdh5.3.6.tar.gz -C /opt/module

Extract oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz inside the Oozie root directory

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ tar -zxvf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C ../

Create a libext directory under the Oozie directory

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ mkdir libext/

Copy the dependency jars into libext

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -a /opt/software/mysql-connector-java-5.1.27-bin.jar ./libext/
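The Hadoop client jars extracted from oozie-hadooplibs also need to end up in libext; a sketch assuming the default CDH layout (verify the exact directory name with ls hadooplibs/ and use the YARN variant, not the mr1 one):

# directory name below is assumed from the CDH 5.3.6 layout; check it with: ls hadooplibs/
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -ra hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/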

Copy ext-2.2.zip into the libext directory

 [djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -a /opt/software/cdh/ext-2.2.zip libext/

Edit oozie-site.xml

 

<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://hadoop102:3306/oozie</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>123456</value>
</property>
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/opt/module/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop</value>
</property>

Initialize Oozie

# log in to MySQL and create the oozie database:
create database oozie;

Upload the yarn.tar.gz file from the Oozie directory to HDFS (this creates the Oozie sharelib):

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie-setup.sh sharelib create -fs hdfs://hadoop102:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz

Generate oozie.sql and create the Oozie database tables

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/ooziedb.sh create -sqlfile oozie.sql -run

Build the project and generate the WAR file

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie-setup.sh prepare-war

Starting and stopping Oozie

Start:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh start

Stop:
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh stop

Access the web UI

http://hadoop102:11000/oozie
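Besides the web UI, the CLI can confirm that the server is healthy; the admin status command should print something like the following:

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie admin -oozie http://hadoop102:11000/oozie -status
System mode: NORMAL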

4 Hands-On Examples

4.1 Single-Node Workflow

1. Create a working directory

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ mkdir -p oozie-apps/shell

2. Create workflow.xml and job.properties in the oozie-apps/shell directory

[djm@hadoop102 shell]$ touch workflow.xml job.properties

3. Edit workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d</argument>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

4. Edit job.properties

# HDFS (NameNode) address
nameNode=hdfs://hadoop102:8020
# ResourceManager address
jobTracker=hadoop103:8032
# queue name
queueName=default
examplesRoot=oozie-apps
# HDFS path of the workflow application
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell

5. Upload the configuration to HDFS

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm

6. Run the job

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/shell/job.properties -run
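Submission prints a workflow job ID; the same CLI can then poll its status with -info (the ID below is a placeholder, substitute the one returned by the -run command):

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -info 0000000-000000000000000-oozie-djm-W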

4.2 Multi-Node Workflow

1. Edit workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node-1"/>
    <action name="shell-node-1">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d1</argument>
            <capture-output/>
        </shell>
        <ok to="shell-node-2"/>
        <error to="fail"/>
    </action>
    <action name="shell-node-2">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d2</argument>
            <capture-output/>
        </shell>
        <ok to="shell-node-3"/>
        <error to="fail"/>
    </action>
    <action name="shell-node-3">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d3</argument>
            <capture-output/>
        </shell>
        <ok to="shell-node-4"/>
        <error to="fail"/>
    </action>
    <action name="shell-node-4">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d4</argument>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

2. Edit job.properties

nameNode=hdfs://hadoop102:8020
jobTracker=hadoop103:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell

3. Remove the old configuration from HDFS

[djm@hadoop102 shell]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r -f /user/djm/oozie-apps/

4. Upload the configuration to HDFS

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm

5. Run the job

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/shell/job.properties -run

4.3 Scheduling a MapReduce Job with Oozie

1. Copy the official map-reduce template into oozie-apps

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -r /opt/module/cdh/oozie-4.0.0-cdh5.3.6/examples/apps/map-reduce/ oozie-apps/

2. Edit workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- the queue plus the job-specific properties (mapper/reducer classes, input and output dirs) go here -->
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

3. Edit job.properties

nameNode=hdfs://hadoop102:8020
jobTracker=hadoop103:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml

4. Copy the jar to be executed into the map-reduce/lib directory

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -a /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar oozie-apps/map-reduce/lib

5. Remove the old configuration from HDFS

[djm@hadoop102 shell]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r -f /user/djm/oozie-apps/

6. Upload the configuration to HDFS

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm

7. Run the job

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/map-reduce/job.properties -run

4.4 Scheduled Jobs

1. Check whether the ntp service is installed

[root@hadoop102 ~]# rpm -qa | grep ntp

2. Edit /etc/ntp.conf

Change

restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

to (authorize the hosts on your cluster subnet to sync with this machine):

restrict 192.168.10.0 mask 255.255.255.0 nomodify notrap

Comment out the public time servers, changing

server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst

to

#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

Then add (use the local clock as the time source):

server 127.127.1.0
fudge 127.127.1.0 stratum 10

3. Edit /etc/sysconfig/ntpd

# sync the hardware clock with the system clock
SYNC_HWCLOCK=yes

4. Restart the ntpd service

[root@hadoop102 ~]# systemctl restart ntpd

5. Enable the ntpd service at boot

[root@hadoop102 ~]# chkconfig ntpd on

6. On the other machines, add a cron entry that syncs with the time server every 10 minutes

[root@hadoop103 ~]# crontab -e

Add:

*/10 * * * * /usr/sbin/ntpdate hadoop102

7. Edit oozie-site.xml


<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>

8. Restart Oozie

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh stop
[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozied.sh start

9. Copy the official cron template into oozie-apps

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ cp -r /opt/module/cdh/oozie-4.0.0-cdh5.3.6/examples/apps/cron/ oozie-apps/

10. Edit workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="cron-shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>p1.sh</exec>
            <file>/user/djm/oozie-apps/cron/p1.sh</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

11. Edit coordinator.xml and job.properties

job.properties:

nameNode=hdfs://hadoop102:8020
jobTracker=hadoop103:8032
queueName=default
examplesRoot=oozie-apps
oozie.coord.application.path=${nameNode}/user/${user.name}/${examplesRoot}/cron
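The coordinator.xml is what actually puts the workflow on a schedule. A minimal sketch, assuming a 5-minute frequency (the smallest Oozie accepts by default) and assuming job.properties also defines start, end (in the GMT+0800 format, e.g. 2019-01-01T00:00+0800) and workflowAppUri (e.g. workflowAppUri=${nameNode}/user/${user.name}/${examplesRoot}/cron); the coordinator name is a placeholder:

<coordinator-app name="cron-coord" frequency="${coord:minutes(5)}"
                 start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- HDFS directory holding the workflow.xml edited above -->
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <!-- pass the cluster addresses and queue through to the workflow -->
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>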

12. Create and edit p1.sh

[djm@hadoop102 cron]$ vim p1.sh
#!/bin/bash
date >> /opt/module/p1.log

13. Remove the old configuration from HDFS

[djm@hadoop102 shell]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -rm -r -f /user/djm/oozie-apps/

14. Upload the configuration to HDFS

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/bin/hadoop fs -put oozie-apps/ /user/djm

15. Run the job

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -config oozie-apps/cron/job.properties -run
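A coordinator job keeps triggering the workflow until its end time; to stop it earlier, kill it by its coordinator job ID (the ID below is a placeholder, use the one printed at submission):

[djm@hadoop102 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop102:11000/oozie -kill 0000000-000000000000000-oozie-djm-C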
