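This article covers:
I. Importing an Oracle table into HDFS with Oozie and Sqoop
II. Serial scheduled jobs in Oozie
III. Parallel scheduled jobs in Oozie
IV. Problems encountered
I. Importing an Oracle table into HDFS with Oozie and Sqoop
The Oracle OJDBC driver jar must be present in Oozie's lib folder; otherwise the job fails with an error.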
workflow.xml file (element values, in order of appearance):
sqoop action:
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    property oozie.sqoop.log.level: WARN
    command: import --connect jdbc:oracle:thin:@***.***.**.***:1521:orcl --username ** --password ** --table ** --delete-target-dir --target-dir /yss/guzhi/**/** --m 1
kill message: Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
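Put together as XML, a workflow definition for this single import might look like the sketch below. The application and node names (sqoop-wf, sqoop-node, fail, end) and the schema versions are assumptions chosen for illustration and are not taken from the original file; the masked ** values are left as they appear above.

```xml
<!-- Sketch only: node names and schema versions are assumed, not from the original file. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">
    <start to="sqoop-node"/>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.sqoop.log.level</name>
                    <value>WARN</value>
                </property>
            </configuration>
            <!-- Oozie's documented examples start the command with the tool name (import), with no leading "sqoop". -->
            <command>import --connect jdbc:oracle:thin:@***.***.**.***:1521:orcl --username ** --password ** --table ** --delete-target-dir --target-dir /yss/guzhi/**/** --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <!-- Any failure ends the workflow with this message. -->
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```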
job.properties file
nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=wmz_test
oozie.libpath=hdfs://bj-rack001-hadoop002:8020/user/oozie/share/lib/sqoop
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/tmp/oracle2hdfs
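II. Serial scheduled jobs in Oozie
When several tables have to be imported or exported, or several operations are needed, add one action per operation. Putting several commands into a single command element, or several command elements into a single action, both fail with errors.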
workflow.xml file. A shell script first obtains the current date; its value is then passed to the Sqoop commands, so each import lands in a directory named after the current day. Element values, in order of appearance:
shell action (shell-node):
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    exec: ${shell}
    file: ${nameNode}/tmp/oracle2hdfs/${shell}#${shell}
sqoop action 1:
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    command: import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/guzhi/**/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1
sqoop action 2:
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    command: import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1
sqoop action 3:
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    command: import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSQSXW --delete-target-dir --m 1
kill message: Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
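As a sketch of how such a serial workflow could be laid out in XML: the shell action captures the script output, and each Sqoop action reads the day value from it. Only the name shell-node comes from the original (it appears in the wf:actionData call); the other node names (sqoop-lsetlist, sqoop-csgdzh, sqoop-csqsxw, fail, end), the application name and the schema versions are assumptions made for illustration.

```xml
<!-- Sketch only: names other than shell-node, and all schema versions, are assumed. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="oracle2hdfs-serial-wf">
    <start to="shell-node"/>

    <!-- Step 1: run the date script; capture-output exposes its key/value output to wf:actionData. -->
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property><name>mapred.job.queue.name</name><value>${queueName}</value></property>
            </configuration>
            <exec>${shell}</exec>
            <file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
            <capture-output/>
        </shell>
        <ok to="sqoop-lsetlist"/>
        <error to="fail"/>
    </action>

    <!-- Step 2: first import; on success, control passes to the next import. -->
    <action name="sqoop-lsetlist">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property><name>mapred.job.queue.name</name><value>${queueName}</value></property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/guzhi/**/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="sqoop-csgdzh"/>
        <error to="fail"/>
    </action>

    <!-- Step 3: second import. -->
    <action name="sqoop-csgdzh">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property><name>mapred.job.queue.name</name><value>${queueName}</value></property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="sqoop-csqsxw"/>
        <error to="fail"/>
    </action>

    <!-- Step 4: third import; the workflow ends after it succeeds. -->
    <action name="sqoop-csqsxw">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property><name>mapred.job.queue.name</name><value>${queueName}</value></property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSQSXW --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action's ok transition names the next action, which is what makes the imports run one after another; every error transition leads to the kill node.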
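coordinator.xml file, set here to run once every 12 hours. Element values, in order of appearance:
workflow app-path: ${nameNode}/tmp/oracle2hdfs/workflow.xml
property jobTracker: ${jobTracker}
property nameNode: ${nameNode}
property queueName: ${queueName}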
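A coordinator definition carrying these values might look like the following sketch. The application name, the schema version and the way the 12-hour interval is written (the coord:hours EL function) are assumptions; start and end are taken from job.properties and are interpreted as UTC.

```xml
<!-- Sketch only: application name, schema version and frequency expression are assumed. -->
<coordinator-app xmlns="uri:oozie:coordinator:0.2" name="oracle2hdfs-coord"
                 frequency="${coord:hours(12)}"
                 start="${start}" end="${end}" timezone="UTC">
    <action>
        <workflow>
            <!-- HDFS location of the workflow definition to run on each tick. -->
            <app-path>${nameNode}/tmp/oracle2hdfs/workflow.xml</app-path>
            <!-- Properties handed down to the workflow. -->
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```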
Shell script that obtains the current date
#!/bin/sh
day=`date '+%Y%m%d'`
echo "day:$day"
job.properties
nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=examples
oozie.service.coord.check.maximum.frequency=false
oozie.coord.application.path=${nameNode}/tmp/oozietest/
start=2018-09-11T16:00Z
end=2028-09-11T16:00Z
workflowAppUri=${oozie.coord.application.path}
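The times are given in GMT (UTC), so subtract 8 hours from Beijing time; for example, 2018-09-11T16:00Z corresponds to 00:00 on 2018-09-12 in Beijing.
III. Parallel scheduled jobs in Oozie
When too many actions run in series, the job becomes slow; in that case they can be set up to run in parallel.
Parallel execution here uses Oozie's bundle component.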
workflow1.xml (element values, in order of appearance):
shell action (shell-node):
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    exec: ${shell}
    file: ${nameNode}/tmp/oracle2hdfs/${shell}#${shell}
sqoop action:
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    command: import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username *** --password *** --table LSETLIST --target-dir /yss/guzhi/***/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1
kill message: Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
workflow2.xml (element values, in order of appearance):
shell action (shell-node):
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    exec: ${shell}
    file: ${nameNode}/tmp/oracle2hdfs/${shell}#${shell}
sqoop action:
    job-tracker: ${jobTracker}
    name-node: ${nameNode}
    property mapred.job.queue.name: ${queueName}
    command: import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username *** --password *** --table CSGDZH --target-dir /yss/guzhi/***/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1
kill message: Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
workflow3.xml and any further workflows follow the same pattern.
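coordinator1.xml (element values, in order of appearance):
workflow app-path: ${workflowAppUri1}
property jobTracker: ${jobTracker}
property nameNode: ${nameNode}
property queueName: ${queueName}
coordinator2.xml (element values, in order of appearance):
workflow app-path: ${workflowAppUri2}
property jobTracker: ${jobTracker}
property nameNode: ${nameNode}
property queueName: ${queueName}
coordinator3.xml and any further coordinators follow the same pattern.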
bundle.xml (element values, in order of appearance):
coordinator app-path: ${coordinator1}
coordinator app-path: ${coordinator2}
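A bundle definition that groups the two coordinators could be sketched as follows. The bundle name, the coordinator names (coord-1, coord-2) and the schema version are assumptions; the two app-path values are the properties defined in job.properties.

```xml
<!-- Sketch only: bundle name, coordinator names and schema version are assumed. -->
<bundle-app xmlns="uri:oozie:bundle:0.1" name="oracle2hdfs-bundle">
    <!-- Each coordinator entry points at one coordinator definition on HDFS. -->
    <coordinator name="coord-1">
        <app-path>${coordinator1}</app-path>
    </coordinator>
    <coordinator name="coord-2">
        <app-path>${coordinator2}</app-path>
    </coordinator>
</bundle-app>
```

Submitting the bundle starts both coordinators, so the two workflows are scheduled independently of each other instead of one after the other.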
job.properties
nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=wmz_test
oozie.libpath=hdfs://bj-rack001-hadoop002:8020/user/oozie/share/lib/sqoop
oozie.use.system.libpath=true
#oozie.wf.application.path=${nameNode}/tmp/oracle2hdfs
shell=getDate.sh
oozie.bundle.application.path=${nameNode}/tmp/oracle2hdfs/bundle.xml
oozie.service.coord.check.maximum.frequency=false
#oozie.coord.application.path=${nameNode}/tmp/bundleTest
start=2018-09-10T16:00Z
end=2028-09-10T16:00Z
workflowAppUri1=${nameNode}/tmp/oracle2hdfs/workflow1.xml
workflowAppUri2=${nameNode}/tmp/oracle2hdfs/workflow2.xml
coordinator1=${nameNode}/tmp/oracle2hdfs/coordinator1.xml
coordinator2=${nameNode}/tmp/oracle2hdfs/coordinator2.xml
Run the job:
oozie job -oozie http://***.***.***.***:11000/oozie -config /data/temp/wmz/shelltest/job.properties -run
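IV. Problems encountered
1. Script shebang line: if the script fails to run with #!/bin/bash, write #!/bin/sh instead.
2. I previously tried putting the Sqoop operations into a shell script and having Oozie run that script. Inside the shell action, Sqoop could only run list-tables and list-databases; none of the import commands would run, and I still do not know why. The script runs fine when executed on its own, and running a shell action alone or a Sqoop import action alone through Oozie both work, but the combination does not, which is puzzling.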