Oozie serial and parallel (bundle) scheduled jobs: usage notes (Sqoop + shell)

This article covers:

1. Using Sqoop from Oozie to import an Oracle table into HDFS

2. Oozie serial scheduled jobs

3. Oozie parallel scheduled jobs

4. Problems encountered

 

1. Using Sqoop from Oozie to import an Oracle table into HDFS

The Oracle OJDBC driver jar must be present in Oozie's Sqoop lib directory, otherwise the job fails with an error.
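One way to install the driver is to upload the jar into the Sqoop sharelib on HDFS (the path matching oozie.libpath in the job.properties below) and refresh the sharelib. A sketch only; the jar name ojdbc6.jar and the sharelib layout are examples and may differ on your cluster (newer Oozie versions use a timestamped lib_<timestamp> directory):

```shell
# Upload the Oracle JDBC driver into the Sqoop sharelib (jar name is an example)
hdfs dfs -put ojdbc6.jar /user/oozie/share/lib/sqoop/
# Tell Oozie to pick up the new sharelib contents
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate
```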

workflow.xml:


    
    
        
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oracle2hdfs-wf">
    <start to="sqoop-node"/>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.sqoop.log.level</name>
                    <value>WARN</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.**.***:1521:orcl --username ** --password ** --table ** --delete-target-dir --target-dir /yss/guzhi/**/** --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

job.properties:

nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=wmz_test
oozie.libpath=hdfs://bj-rack001-hadoop002:8020/user/oozie/share/lib/sqoop
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/tmp/oracle2hdfs

2. Oozie serial scheduled jobs

When you need to import or export multiple tables, or run several operations, add multiple actions. Putting several commands into one command element, or several command elements into one action, both cause errors.

workflow.xml: a shell script first obtains the current date, which is then substituted into the Sqoop commands so that a directory is created per day.




	
		
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oracle2hdfs-wf">
    <start to="shell-node"/>

    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${shell}</exec>
            <file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
            <capture-output/>
        </shell>
        <ok to="sqoop-node1"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/guzhi/**/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="sqoop-node2"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="sqoop-node3"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node3">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSQSXW --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
    
   

coordinator.xml (configured here to run every 12 hours):



    
<coordinator-app name="oracle2hdfs-coord" frequency="${coord:hours(12)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${nameNode}/tmp/oracle2hdfs/workflow.xml</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>




getDate.sh (shell script to get the current date):

#!/bin/sh
day=`date '+%Y%m%d'`
echo "day:$day"
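When the shell action is declared with Oozie's capture-output element, the script's stdout is parsed in Java properties format, so values emitted as key=value pairs become readable via wf:actionData. A minimal sketch of what the action above emits (the key=value form is the usual convention; Java properties also accept the ":" separator used in the original script):

```shell
# Emit the current date as a property that Oozie captures;
# later read in the workflow as ${wf:actionData('shell-node')['day']}
day=$(date '+%Y%m%d')   # e.g. 20180911
echo "day=$day"
```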

job.properties

nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=examples


oozie.service.coord.check.maximum.frequency=false
oozie.coord.application.path=${nameNode}/tmp/oozietest/
start=2018-09-11T16:00Z
end=2018-09-11T16:00Z
workflowAppUri=${oozie.coord.application.path}

Because the times are interpreted as GMT, the start/end values must be Beijing time minus 8 hours.
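One worked conversion, assuming GNU date (the -d option is not in POSIX): midnight Beijing time on 2018-09-12 corresponds to 16:00Z on the previous day, which is the form Oozie expects in start= and end=.

```shell
# Convert a Beijing-time (+0800) timestamp to the UTC form Oozie expects
date -u -d '2018-09-12 00:00 +0800' '+%Y-%m-%dT%H:%MZ'
# prints 2018-09-11T16:00Z
```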

3. Oozie parallel jobs

When there are too many serial actions the overall run becomes slow; in that case the actions can be executed in parallel.

Parallel execution here uses the bundle component.

workflow1.xml 




        
                
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oracle2hdfs-wf1">
    <start to="shell-node"/>

    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${shell}</exec>
            <file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
            <capture-output/>
        </shell>
        <ok to="sqoop-node"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username *** --password *** --table LSETLIST --target-dir /yss/guzhi/***/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
    
   


workflow2.xml




        
                
<workflow-app xmlns="uri:oozie:workflow:0.4" name="oracle2hdfs-wf2">
    <start to="shell-node"/>

    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${shell}</exec>
            <file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
            <capture-output/>
        </shell>
        <ok to="sqoop-node"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username *** --password *** --table CSGDZH --target-dir /yss/guzhi/***/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
    
   


workflow3.xml and so on follow the same pattern.

 

coordinator1.xml




<coordinator-app name="coord1" frequency="${coord:hours(12)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri1}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>





coordinator2.xml




<coordinator-app name="coord2" frequency="${coord:hours(12)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${workflowAppUri2}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>





coordinator3.xml and so on follow the same pattern.

bundle.xml



          
<bundle-app name="oracle2hdfs-bundle" xmlns="uri:oozie:bundle:0.2">
    <coordinator name="coord1">
        <app-path>${coordinator1}</app-path>
    </coordinator>
    <coordinator name="coord2">
        <app-path>${coordinator2}</app-path>
    </coordinator>
</bundle-app>
          

job.properties

nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=wmz_test
oozie.libpath=hdfs://bj-rack001-hadoop002:8020/user/oozie/share/lib/sqoop
oozie.use.system.libpath=true
#oozie.wf.application.path=${nameNode}/tmp/oracle2hdfs
shell=getDate.sh

oozie.bundle.application.path=${nameNode}/tmp/oracle2hdfs/bundle.xml

oozie.service.coord.check.maximum.frequency=false
#oozie.coord.application.path=${nameNode}/tmp/bundleTest
start=2018-09-10T16:00Z
end=2028-09-10T16:00Z

workflowAppUri1=${nameNode}/tmp/oracle2hdfs/workflow1.xml
workflowAppUri2=${nameNode}/tmp/oracle2hdfs/workflow2.xml

coordinator1=${nameNode}/tmp/oracle2hdfs/coordinator1.xml
coordinator2=${nameNode}/tmp/oracle2hdfs/coordinator2.xml

Submit the job with:

oozie job -oozie http://***.***.***.***:11000/oozie -config /data/temp/wmz/shelltest/job.properties -run
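Once submitted, the job can be monitored from the same CLI. A sketch using standard oozie subcommands (replace <job-id> with the ID that -run prints):

```shell
# Check job status
oozie job -oozie http://***.***.***.***:11000/oozie -info <job-id>
# Fetch the job log when debugging failures
oozie job -oozie http://***.***.***.***:11000/oozie -log <job-id>
```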

4. Problems encountered

1. If the script fails with an error when it starts with #!/bin/bash, writing #!/bin/sh instead can fix it.

2. I also tried putting the Sqoop commands inside a shell script and having Oozie run that script. But Sqoop invoked from the shell action could only perform list-tables and list-databases; every import command failed, and I still don't know why. The script runs fine when executed on its own, and Oozie runs a shell action or a Sqoop import action fine separately, but combining them fails, which is puzzling.
