Oozie overview: what Oozie can do
Oozie format: how to write an Oozie job
Oozie execution: how to run an Oozie job
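Oozie overview:
Oozie is a scheduler built on Hadoop. Scheduling flows are written in XML, and it can schedule MapReduce, Pig, Hive, shell, jar jobs, and so on.
Its main features are:
Workflow: runs the flow nodes in order; supports fork (branch into several nodes) and join (merge several nodes into one)
Coordinator: triggers a workflow on a schedule
Bundle Job: groups multiple coordinators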
Oozie format:
An Oozie job needs two files: job.properties and workflow.xml (or coordinator.xml / bundle.xml)
I. job.properties defines the environment variables
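| Property | Example value | Description |
| --- | --- | --- |
| nameNode | hdfs://xxx5:8020 | HDFS address |
| jobTracker | xxx5:8034 | JobTracker address |
| queueName | default | queue the job runs in |
| examplesRoot | examples | global directory |
| oozie.use.system.libpath | true | whether to load the Oozie system share lib |
| oozie.libpath | share/lib/user | user lib directory |
| oozie.wf.application.path | ${nameNode}/user/${user.name}/... | HDFS path of the Oozie workflow |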
Note: the application path property depends on the job type:
workflow: oozie.wf.application.path
coordinator: oozie.coord.application.path
bundle: oozie.bundle.application.path
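A quick way to see which job type a job.properties file declares is to look for these three keys. The following is a minimal sketch in Python, not part of Oozie: the helper names `parse_properties` and `declared_job_types` are hypothetical, and the parser only handles plain `key=value` lines.

```python
# Sketch: report which Oozie job types a job.properties text declares,
# based on the three application-path keys listed above.
# parse_properties and declared_job_types are hypothetical helpers.

APP_PATH_KEYS = {
    "workflow": "oozie.wf.application.path",
    "coordinator": "oozie.coord.application.path",
    "bundle": "oozie.bundle.application.path",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # / ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def declared_job_types(text):
    """Return the job types whose application-path key appears in the text."""
    props = parse_properties(text)
    return [job_type for job_type, key in APP_PATH_KEYS.items() if key in props]


sample = """
nameNode=hdfs://xxx5:8020
jobTracker=xxx5:8034
queueName=default
oozie.coord.application.path=${nameNode}/user/${user.name}/...
"""
print(declared_job_types(sample))
```

An empty result means none of the three keys is present, which is worth checking before submitting the job.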
II. XML
1. workflow:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="wf-example1">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="hdfs://xxx5/user/hadoop/appresult"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
                    <value>false</value>
                </property>
            </configuration>
            <script>test.pig</script>
            <param>filepath=${filepath}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
2. coordinator
<coordinator-app name="cron-coord" frequency="${coord:hours(6)}" start="${start}" end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/wpath</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
Note: the coordinator's start and end times are in UTC, which is 8 hours behind Beijing time, so subtract 8 hours from the execution time you want in Beijing time.
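The 8-hour shift is easy to get wrong around midnight, when the date changes as well. A minimal Python sketch of the conversion follows; the function name `beijing_to_oozie_utc` and the input format `YYYY-MM-DD HH:MM` are assumptions made for illustration, while the output uses the `yyyy-MM-ddTHH:mmZ` form that coordinator start and end times take.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is a fixed UTC+8 offset (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))


def beijing_to_oozie_utc(local_text):
    """Convert 'YYYY-MM-DD HH:MM' Beijing time to a UTC timestamp string."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M").replace(tzinfo=BEIJING)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


print(beijing_to_oozie_utc("2024-01-01 08:00"))  # 2024-01-01T00:00Z
print(beijing_to_oozie_utc("2024-01-01 02:30"))  # 2023-12-31T18:30Z
```

The second call shows the date rolling back a day: a job meant to run at 02:30 Beijing time starts at 18:30 UTC on the previous day.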
3. bundle
<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
    <controls>
        <kick-off-time>${kickOffTime}</kick-off-time>
    </controls>
    <coordinator name='coordJobFromBundle1'>
        <app-path>${appPath}</app-path>
        <configuration>
            <property>
                <name>startTime1</name>
                <value>${START_TIME}</value>
            </property>
            <property>
                <name>endTime1</name>
                <value>${END_TIME}</value>
            </property>
        </configuration>
    </coordinator>
    <coordinator name='coordJobFromBundle2'>
        <app-path>${appPath2}</app-path>
        <configuration>
            <property>
                <name>startTime2</name>
                <value>${START_TIME2}</value>
            </property>
            <property>
                <name>endTime2</name>
                <value>${END_TIME2}</value>
            </property>
        </configuration>
    </coordinator>
</bundle-app>
Oozie Hive
<action name="hive-app">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml>
        <script>hivescript.q</script>
        <param>yyyymmdd=${yyyymmdd}</param>
        <param>yesterday=${yesterday}</param>
        <param>lastmonth=${lastmonth}</param>
    </hive>
    <ok to="result-stat-join"/>
    <error to="fail"/>
</action>
Oozie execution
Start a job:
oozie job -oozie http://xxx5:11000/oozie -config job.properties -run
Stop a job:
oozie job -oozie http://localhost:8080/oozie -kill 14-20090525161321-oozie-joe
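When these commands are called from a script, it can help to build the argument list in one place. The sketch below is only an illustration: `oozie_job_command` and `run_oozie` are hypothetical helper names, it uses only the `-oozie`, `-config`, `-run` and `-kill` options shown above, and it assumes the `oozie` client is on the PATH.

```python
import subprocess


def oozie_job_command(oozie_url, *, config=None, kill_id=None):
    """Build the argv list for `oozie job`, either to run or to kill a job."""
    cmd = ["oozie", "job", "-oozie", oozie_url]
    if kill_id is not None:
        cmd += ["-kill", kill_id]
    elif config is not None:
        cmd += ["-config", config, "-run"]
    else:
        raise ValueError("pass either config (to run) or kill_id (to kill)")
    return cmd


def run_oozie(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


print(oozie_job_command("http://xxx5:11000/oozie", config="job.properties"))
```

Passing the arguments as a list avoids shell quoting problems with the server URL and the job ID.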
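Note: stopping a job sometimes fails with a permission error. In that case, set the following proxy-user properties in oozie-site.xml (the hadoop.proxyuser.* entries normally belong in Hadoop's core-site.xml):
hadoop.proxyuser.oozie.groups *
hadoop.proxyuser.oozie.hosts *
oozie.service.ProxyUserService.proxyuser.hadoop.hosts *
oozie.service.ProxyUserService.proxyuser.hadoop.groups *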
Everything above has been used in practice, but the content was typed by hand, so please excuse any typos.