Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store the following: workflow definitions, and currently running workflow instances, including their states and variables.
(4) Oozie in CDH 5.7.0
In CDH 5.7.0, the Oozie version is 4.1.0, and MySQL is used as the metadata store. For the Oozie properties in CDH 5.7.0, refer to the following link:
The following two YARN memory parameters must be set large enough (2000 MB here):
yarn.nodemanager.resource.memory-mb = 2000
yarn.scheduler.maximum-allocation-mb = 2000
Otherwise, an error similar to the following is reported when a workflow job executes:
The specific approach is to start the shared Sqoop metastore service in the background:
sqoop metastore > /tmp/sqoop_metastore.log 2>&1 &
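The sqoop metastore command starts the shared HSQLDB-backed metastore, listening on port 16000 by default. The port, storage location, and client auto-connect URL can be set in sqoop-site.xml; a minimal sketch (the storage path below is an assumed example, not taken from this setup):

```xml
<!-- sqoop-site.xml: shared metastore settings; the location path is an assumed example -->
<property>
  <name>sqoop.metastore.server.location</name>
  <value>/tmp/sqoop-metastore/shared.db</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>
<!-- lets sqoop clients reach the shared metastore without passing --meta-connect -->
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://cdh2:16000/sqoop</value>
</property>
```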
For the problem of Oozie being unable to run a Sqoop job, refer to the following link: http://www.lamborryan.com/oozie-sqoop-fail/
last_value=`sqoop job --meta-connect jdbc:hsqldb:hsql://cdh2:16000/sqoop --show myjob_incremental_import | grep incremental.last.value | awk '{print $3}'`
sqoop job --meta-connect jdbc:hsqldb:hsql://cdh2:16000/sqoop --delete myjob_incremental_import
sqoop job \
--meta-connect jdbc:hsqldb:hsql://cdh2:16000/sqoop \
--create myjob_incremental_import \
-- \
import \
--connect "jdbc:mysql://cdh1:3306/source?useSSL=false&user=root&password=mypassword" \
--table sales_order \
--columns "order_number, customer_number, product_code, order_date, entry_date, order_amount" \
--hive-import \
--hive-table rds.sales_order \
--incremental append \
--check-column order_number \
--last-value $last_value
Here $last_value is the value recorded after the previous ETL execution.
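The grep/awk pipeline above relies on `sqoop job --show` printing a line of the form `incremental.last.value = <value>`. The extraction can be sanity-checked without touching the metastore by feeding it a simulated output line (the order number 116 is a made-up sample value):

```shell
# Simulate one line of `sqoop job --show` output and extract field 3,
# exactly as the last_value=`...` command does.
sample='incremental.last.value = 116'
last_value=$(printf '%s\n' "$sample" | grep incremental.last.value | awk '{print $3}')
echo "$last_value"    # prints 116
```

Since awk splits on whitespace, field 3 is the value to the right of the equals sign, which is what gets passed to --last-value when the job is recreated.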
(5) Create the workflow
Create a workflow.xml file with the following content:
<workflow-app xmlns="uri:oozie:workflow:0.1" name="regular_etl">
    <!-- NOTE: the original markup was lost in extraction; node names and the
         fork/join structure are reconstructed around the surviving element
         bodies and should be adjusted if your file differs. -->
    <start to="fork-node"/>
    <fork name="fork-node">
        <path start="sqoop-customer"/>
        <path start="sqoop-product"/>
        <path start="sqoop-sales_order"/>
    </fork>
    <action name="sqoop-customer">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:mysql://cdh1:3306/source?useSSL=false</arg>
            <arg>--username</arg>
            <arg>root</arg>
            <arg>--password</arg>
            <arg>mypassword</arg>
            <arg>--table</arg>
            <arg>customer</arg>
            <arg>--hive-import</arg>
            <arg>--hive-table</arg>
            <arg>rds.customer</arg>
            <arg>--hive-overwrite</arg>
            <file>/tmp/hive-site.xml#hive-site.xml</file>
            <file>/tmp/mysql-connector-java-5.1.38-bin.jar#mysql-connector-java-5.1.38-bin.jar</file>
        </sqoop>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="sqoop-product">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:mysql://cdh1:3306/source?useSSL=false</arg>
            <arg>--username</arg>
            <arg>root</arg>
            <arg>--password</arg>
            <arg>mypassword</arg>
            <arg>--table</arg>
            <arg>product</arg>
            <arg>--hive-import</arg>
            <arg>--hive-table</arg>
            <arg>rds.product</arg>
            <arg>--hive-overwrite</arg>
            <file>/tmp/hive-site.xml#hive-site.xml</file>
            <file>/tmp/mysql-connector-java-5.1.38-bin.jar#mysql-connector-java-5.1.38-bin.jar</file>
        </sqoop>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="sqoop-sales_order">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>job --exec myjob_incremental_import --meta-connect jdbc:hsqldb:hsql://cdh2:16000/sqoop</command>
            <file>/tmp/hive-site.xml#hive-site.xml</file>
            <file>/tmp/mysql-connector-java-5.1.38-bin.jar#mysql-connector-java-5.1.38-bin.jar</file>
        </sqoop>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <join name="joining" to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/tmp/hive-site.xml</job-xml>
            <!-- the script element body was lost in extraction; regular_etl.sql
                 is inferred from the deployment step below -->
            <script>/tmp/regular_etl.sql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Its DAG is shown in the figure below.
(6) Deploy the workflow
hdfs dfs -put -f workflow.xml /user/root/
hdfs dfs -put /etc/hive/conf.cloudera.hive/hive-site.xml /tmp/
hdfs dfs -put /root/mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /tmp/
hdfs dfs -put /root/regular_etl.sql /tmp/
(7) Create the job properties file
nameNode=hdfs://cdh2:8020
jobTracker=cdh2:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}
(8) Run the workflow
oozie job -oozie http://cdh2:11000/oozie -config /root/job.properties -run
At this point, the running job can be seen in the Oozie Web Console, as shown in the figure below.
(1) Create the coordinator job properties file
nameNode=hdfs://cdh2:8020
jobTracker=cdh2:8032
queueName=default
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/${user.name}
timezone=UTC
start=2016-07-11T06:00Z
end=2020-12-31T07:15Z
workflowAppUri=${nameNode}/user/${user.name}
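Note that the start and end values above are in UTC (the trailing Z), while the timezone property mainly affects daylight-saving handling of the coordinator frequency. A run time expressed in local time must therefore be converted to UTC first; for example, assuming the intended daily run time is 14:00 Beijing time (UTC+8), GNU date can produce the 06:00Z form used in start:

```shell
# Convert 2016-07-11 14:00 Asia/Shanghai (UTC+8) to the UTC timestamp
# format that Oozie coordinator start/end properties expect.
date -u -d 'TZ="Asia/Shanghai" 2016-07-11 14:00' '+%Y-%m-%dT%H:%MZ'
# prints 2016-07-11T06:00Z
```

The `TZ="..."` prefix inside the date string is a GNU date feature; on other systems the conversion has to be done differently.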
(2) Create the coordinator job configuration file
<coordinator-app name="regular_etl-coord" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="${timezone}"
                 xmlns="uri:oozie:coordinator:0.1">
    <!-- NOTE: the app-level markup was lost in extraction and is reconstructed;
         the daily frequency is an assumption consistent with a regular ETL load. -->
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
(3) Deploy the coordinator job
hdfs dfs -put -f coordinator.xml /user/root/
(4) Run the coordinator job
oozie job -oozie http://cdh2:11000/oozie -config /root/job-coord.properties -run
At this point, the coordinator job waiting to run can be seen in the Oozie Web Console with its status shown as PREP, as in the figure below.