Our Oozie Tutorials will cover most of the available workflow actions with and without Kerberos authentication.
Let’s have a look at some basic concepts of Oozie.
Oozie is an open source workflow management system. We can schedule Hadoop jobs via Oozie, including Hive, Pig, and Sqoop actions. Oozie also provides features to trigger workflows based on data availability, job dependency, scheduled time, etc.
More information is available in the official Apache Oozie documentation.
An Oozie workflow is a DAG (directed acyclic graph) containing a collection of actions. The DAG has two types of nodes: action nodes and control nodes. Action nodes are responsible for executing tasks such as MapReduce, Pig, and Hive jobs; we can also execute shell scripts from an action node. Control nodes (start, end, kill, decision, fork/join) define the execution order of the actions.
In production systems it is often necessary to run Oozie workflows at a regular time interval, trigger them when input data becomes available, or execute them after a dependent job completes. This can be achieved with an Oozie coordinator job.
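As a sketch of what a coordinator definition looks like, the fragment below runs a workflow once a day. The app name, start/end dates, and frequency are placeholder values I have assumed for illustration, and the app-path is assumed to point at the workflow directory used later in this tutorial.

```xml
<coordinator-app name="sample-coord" frequency="${coord:days(1)}"
                 start="2016-04-03T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <!-- Runs the referenced workflow once per day between start and end -->
  <action>
    <workflow>
      <app-path>${nameNode}/user/${user.name}</app-path>
    </workflow>
  </action>
</coordinator-app>
```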
A bundle is a set of Oozie coordinators, which gives us better control by letting us start, stop, suspend, and resume multiple coordinators together.
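A bundle definition is a thin wrapper that lists coordinators. This is a sketch with a placeholder bundle name, coordinator name, and an assumed coordinator.xml location:

```xml
<bundle-app name="sample-bundle" xmlns="uri:oozie:bundle:0.2">
  <!-- Each <coordinator> entry points at one coordinator application -->
  <coordinator name="daily-coord">
    <app-path>${nameNode}/user/${user.name}/coordinator.xml</app-path>
  </coordinator>
</bundle-app>
```

Once submitted, the whole bundle (and every coordinator in it) can be suspended or resumed with `oozie job -suspend <bundle-id>` and `oozie job -resume <bundle-id>`.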
The Oozie launcher is a map-only job which runs on the Hadoop cluster; for example, when a workflow runs a Hive script, it is the launcher task that executes the "hive -f" command on one of the cluster's nodes.
[root@sandbox shell]# cat ~/sample.sh
#!/bin/bash
echo "`date` hi" > /tmp/output
hadoop fs -put sample.sh /user/root/
[root@sandbox shell]# cat job.properties
nameNode=hdfs://:8020
jobTracker=:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}
The workflow.xml defines a single shell action:

<workflow-app name="shell-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="shell-node"/>
  <action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>sample.sh</exec>
      <file>/user/root/sample.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
hadoop fs -copyFromLocal -f workflow.xml /user/root/
oozie job -oozie http://:11000/oozie -config job.properties -run
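The -run command prints a job id. That id can be used to check progress and fetch logs from the same CLI; the <job-id> placeholder below stands for whatever id your submission printed:

```
oozie job -oozie http://:11000/oozie -info <job-id>
oozie job -oozie http://:11000/oozie -log <job-id>
```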
The job status can also be checked from the Oozie Web UI at http://:11000/oozie.
[root@sandbox shell]# cat /tmp/output
Sun Apr 3 19:44:52 UTC 2016 hi