Oozie Overview

[TOC]

Scheduling frameworks: Linux Crontab, Azkaban, Oozie, Zeus

A comparison of three task scheduling systems

Introduction

Oozie is a workflow scheduling system.

  • Workflows are scheduled as DAGs (directed acyclic graphs)
  • Scalable: Oozie launches each action as a MapReduce launcher job that consists of a map task only, with no reduce phase
  • Reliable: failed tasks are retried
  • Integrates the other workloads of the Hadoop ecosystem, such as MapReduce, Pig, Hive, Sqoop, and Spark

Main components

  • Tomcat (a servlet handles requests and displays jobs in the web UI)
  • A database (stores the jobs)
  • Bundle, Coordinator, and Workflow

Architecture diagram

(figure: Oozie architecture)

Three service modules

  • Oozie V3: a server-based Bundle engine. Wraps multiple coordinators, so that a group of coordinator jobs can be started, stopped, suspended, closed, or restarted together.
  • Oozie V2: a server-based Coordinator engine. Runs multiple workflows; structure: start -> workflows -> end.
  • Oozie V1: a server-based Workflow engine; structure: start -> mr -> pig -> fork -> mr/hive -> join -> end (a sketch follows this list).
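
Below is a minimal sketch of that V1 fork/join shape (the pig step is omitted; the action names and the elided action bodies are hypothetical placeholders, not taken from a real deployment):

<workflow-app xmlns="uri:oozie:workflow:0.5" name="fork-join-sketch">
    <start to="first-mr"/>
    <action name="first-mr">
        <map-reduce><!-- job config omitted in this sketch --></map-reduce>
        <ok to="parallel-fork"/>
        <error to="fail"/>
    </action>
    <!-- fork: both branches start concurrently -->
    <fork name="parallel-fork">
        <path start="parallel-mr"/>
        <path start="parallel-hive"/>
    </fork>
    <action name="parallel-mr">
        <map-reduce><!-- job config omitted in this sketch --></map-reduce>
        <ok to="parallel-join"/>
        <error to="fail"/>
    </action>
    <action name="parallel-hive">
        <hive xmlns="uri:oozie:hive-action:0.2"><!-- script config omitted --></hive>
        <ok to="parallel-join"/>
        <error to="fail"/>
    </action>
    <!-- join: waits until every forked branch has succeeded -->
    <join name="parallel-join" to="end"/>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>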

Workflow

(figures: workflow structure)

Coordinator

A pitfall I ran into:

Error: E0505 : E0505: App definition [hdfs://localhost:8020/tmp/oozie-app/coordinator/] does not exist

This message is misleading: it turned out the directory was not the problem; the coordinator.xml file was misnamed.


Preparation: unify the time zone

It is recommended to use GMT+0800 (China Standard Time).

On the server, run date -R. If it prints something like the line below, the machine is already on GMT+0800; otherwise set the time zone, usually to Beijing or Shanghai: ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

Sat, 30 Sep 2017 10:26:58 +0800

Next, edit oozie-site.xml; if the following property is missing, add it:

<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>

This also makes the times shown in the web UI correct.

Examples

Spark Action

Workflow spark on yarn

For reference, see the official documentation on the Spark action.

Directory layout

├── ooziespark
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-wordcount-wf">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- clear the output directory so that reruns do not fail -->
            <prepare>
                <delete path="${nameNode}${outputdir}"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-Wordcount</name>
            <class>WordCount</class>
            <jar>${nameNode}/user/LJK/ooziecoor/lib/spark-1.6.2-1.0-SNAPSHOT.jar</jar>
            <spark-opts>--driver-memory 512M --executor-memory 512M</spark-opts>
            <arg>${inputdir}</arg>
            <arg>${outputdir}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

job.properties

nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
oozie.wf.application.path=/user/LJK/ooziespark
#oozie.coord.application.path=${nameNode}/user/LJK/ooziespark
#start=2017-09-28T17:00+0800
#end=2017-09-30T17:00+0800
#workflowAppUri=${nameNode}/user/LJK/ooziespark/

Package the program and copy the jar into the app's lib directory. The test source code:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
//      .setJars(List("/Users/LJK/Documents/code/github/study-spark1.6.2/target/spark-1.6.2-1.0-SNAPSHOT.jar"))
//      .set("spark.yarn.historyServer.address", "rm:18080")
//      .set("spark.eventLog.enabled", "true")
//      .set("spark.eventLog.dir", "hdfs://nn1:8020/spark-history")
      .set("spark.testing.memory", "1073741824")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    rdd.saveAsTextFile(args(1))
    sc.stop()
  }
}

Upload the directory to HDFS: hdfs dfs -put ooziespark /user/LJK/
Note: job.properties does not need to be uploaded to HDFS, because the command below reads the local copy, not the one on HDFS.

Start the job with:
oozie job -oozie http://rm:11000/oozie -config /usr/local/share/applications/ooziespark/job.properties -run
or:
oozie job -config /usr/local/share/applications/ooziespark/job.properties -run
The short form works only if the environment variable OOZIE_URL is set; it is used as the default value for the -oozie option. See oozie help for details.

The job's progress can be viewed in the Oozie web UI.

Coordinator spark on yarn

A simple schedule: run WordCount every five minutes.

Directory layout

├── ooziecoor
│   ├── coordinator.xml
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml

coordinator.xml

<coordinator-app name="spark-wordcount-coord" frequency="${coord:minutes(5)}"
                 start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
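
A coordinator-app can also open with an optional controls block that bounds how its actions are materialized and run; the values below are illustrative, not part of the configuration above:

<controls>
    <!-- minutes a materialized action may wait before timing out -->
    <timeout>10</timeout>
    <!-- maximum number of actions allowed to run at the same time -->
    <concurrency>1</concurrency>
    <!-- order in which queued actions are executed -->
    <execution>FIFO</execution>
</controls>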

Modify the earlier job.properties to:

nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
#oozie.wf.application.path=/user/LJK/ooziespark
oozie.coord.application.path=${nameNode}/user/LJK/ooziecoor
start=2017-09-30T09:30+0800
end=2017-09-30T17:00+0800
workflowAppUri=${nameNode}/user/LJK/ooziecoor

The previous workflow.xml can be kept as-is, even without changing the jar location; but to keep each application self-contained, just point the jar path at this app's own lib directory.

Upload to HDFS and run:
oozie job -config /usr/local/share/applications/ooziecoor/job.properties -run

The job can be viewed in the web UI.

Bundle spark on yarn

Directory layout

├── ooziebundle
│   ├── bundle.xml
│   ├── coordinator.xml
│   ├── job.properties
│   ├── lib
│   │   └── spark-1.6.2-1.0-SNAPSHOT.jar
│   └── workflow.xml

Add bundle.xml:

<bundle-app name="spark-wordcount-bundle" xmlns="uri:oozie:bundle:0.2">
    <coordinator name="spark-wordcount-coord">
        <app-path>${nameNode}/user/LJK/ooziebundle/coordinator.xml</app-path>
        <configuration>
            <property>
                <name>start</name>
                <value>${start}</value>
            </property>
            <property>
                <name>end</name>
                <value>${end}</value>
            </property>
        </configuration>
    </coordinator>
</bundle-app>
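
A bundle-app can likewise start with an optional controls block; kick-off-time delays the submission of all its coordinators until the given time (the timestamp here is only illustrative):

<controls>
    <!-- the bundle's coordinators are not submitted before this time -->
    <kick-off-time>2017-09-30T09:30+0800</kick-off-time>
</controls>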

Modify job.properties:

nameNode=hdfs://nn1:8020
jobTracker=rm:8050
master=yarn-cluster
queueName=default
inputdir=/user/LJK/hello-spark
outputdir=/user/LJK/output
oozie.use.system.libpath=true
#oozie.wf.application.path=/user/LJK/ooziespark
#oozie.coord.application.path=${nameNode}/user/LJK/ooziecoor
oozie.bundle.application.path=${nameNode}/user/LJK/ooziebundle
start=2017-09-30T09:30+0800
end=2017-09-30T17:00+0800
workflowAppUri=${nameNode}/user/LJK/ooziebundle

Upload to HDFS and run:
oozie job -config /usr/local/share/applications/ooziebundle/job.properties -run

The job can be viewed in the web UI.

Java Action

Directory layout. The lib directory holds the individual dependency jars rather than a single fat jar, so its contents are not listed; you may choose to build one jar instead.

javaExample/
├── job.properties
├── lib
└── workflow.xml

Note: if you use the Spring Boot framework, you need to add an exclusion in the pom; otherwise the conflicting logging jars will make Oozie report an error.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <exclusions>
        <exclusion>
            <artifactId>spring-boot-starter-logging</artifactId>
            <groupId>org.springframework.boot</groupId>
        </exclusion>
    </exclusions>
</dependency>

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="java-example-wf">
    <start to="java-node"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.sharing.App</main-class>
            <arg>hello</arg>
            <arg>springboot</arg>
        </java>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
job.properties

oozie.use.system.libpath=false
queueName=default
jobTracker=rm.ambari:8050
nameNode=hdfs://nn1.ambari:8020
oozie.wf.application.path=${nameNode}/user/LJK/javaExample

Java program source:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {

    public static void main(String[] args) {
        SpringApplication.run(App.class,args);
        System.out.println(args[0] + " " + args[1]);
    }
}

Shell Action

Directory layout

shell
├── job.properties
└── workflow.xml

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="shell-example-wf">
    <start to="shell-node"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>hello shell</argument>
            <!-- capture stdout so later nodes can read it -->
            <capture-output/>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
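
Because the action above declares capture-output, any key=value lines the command prints to stdout are captured and can be read by later nodes through the wf:actionData EL function. A sketch of a decision node consuming such output, assuming a hypothetical key named result (the echo above does not actually emit one):

<decision name="check-result">
    <switch>
        <!-- read the captured stdout of shell-node as key=value pairs -->
        <case to="End">${wf:actionData('shell-node')['result'] eq 'ok'}</case>
        <default to="Kill"/>
    </switch>
</decision>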

job.properties

hue-id-w=50057
jobTracker=rm.ambari:8050
mapreduce.job.user.name=admin
nameNode=hdfs://nn1.ambari:8020
oozie.use.system.libpath=True
oozie.wf.application.path=hdfs://nn1.ambari:8020/user/LJK/shell
user.name=admin

Hive Action

Directory layout

hiveExample/
├── hive-site.xml
├── input
│   └── inputdata
├── job.properties
├── output
├── script.q
└── workflow.xml

Hive script: write a Hive script (the file name is up to you).
Contents of script.q:

DROP TABLE IF EXISTS test;
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive-example-wf">
    <start to="hive-node"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- clear the output directory so reruns do not fail -->
            <prepare>
                <delete path="${nameNode}/user/LJK/hiveExample/output"/>
            </prepare>
            <job-xml>/user/LJK/hiveExample/hive-site.xml</job-xml>
            <script>script.q</script>
            <param>INPUT=/user/LJK/hiveExample/input</param>
            <param>OUTPUT=/user/LJK/hiveExample/output</param>
        </hive>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

job.properties

hue-id-w=50059
jobTracker=rm.ambari:8050
mapreduce.job.user.name=admin
nameNode=hdfs://nn1.ambari:8020
oozie.use.system.libpath=True
oozie.wf.application.path=hdfs://nn1.ambari:8020/user/LJK/hiveExample
user.name=admin

Place a file (any name will do) under hdfs://nn1.ambari:8020/user/LJK/hiveExample/input.
Contents of inputdata:

1
2
3
4
6
7
8
9

After the job succeeds, the output directory contains a file 000000_0 whose contents match inputdata.

Hive2 Action

Almost identical to the Hive action; only workflow.xml needs to change:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-example-wf">
    <start to="hive2-node"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="hive2-node">
        <hive2 xmlns="uri:oozie:hive2-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- clear the output directory so reruns do not fail -->
            <prepare>
                <delete path="${nameNode}/user/LJK/hiveExample/output"/>
            </prepare>
            <job-xml>/user/LJK/hiveExample/hive-site.xml</job-xml>
            <jdbc-url>jdbc:hive2://rm.ambari:10000/default</jdbc-url>
            <script>script.q</script>
            <param>INPUT=/user/LJK/hiveExample/input</param>
            <param>OUTPUT=/user/LJK/hiveExample/output</param>
        </hive2>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
