Scheduling Spark SQL with Oozie

Note: Oozie currently has no native Spark SQL action comparable to its Hive action. It does provide a Spark action, so you can decide, based on your needs, between spark-submit and the approach described here. The essence of this approach is to have Oozie call a shell action, and have that shell script invoke the Spark SQL shell, so that Oozie effectively runs Spark SQL directly. (If Kerberos authentication is required, only two places need to change; both are covered below.)

Directory structure

sparksql/
├── job.properties
├── runspark.hql
├── runspark.sh
└── workflow.xml

job.properties

# HDFS NameNode address
nameNode=hdfs://hadoop01:8020
# ResourceManager address
jobTracker=hadoop03:8032
# queue name
queueName=default
examplesRoot=oozie-apps
# use the Oozie system lib path
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/sparksql

runspark.hql

CREATE TABLE oozie.sparksqltest(id int,name string);
INSERT INTO TABLE oozie.sparksqltest VALUES(1,'Daniel');

runspark.sh

#!/bin/bash
# --total-executor-cores only applies to standalone/Mesos mode;
# on YARN use --num-executors instead
/home/hadoop/apps/spark/bin/spark-sql \
--master yarn \
--executor-memory 600M \
--executor-cores 1 \
--num-executors 2 \
--files /home/hadoop/apps/apache-hive-2.3.2-bin/conf/hive-site.xml \
-f "$1"

Adjust the parameters to your environment. The script takes the HQL file as a shell argument, so multiple workflows can reuse it. If Kerberos authentication is required, simply add --principal hadoop and --keytab /etc/security/keytabs/hadoop.keytab to the submit command.
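With Kerberos enabled, runspark.sh would look roughly like this. This is a sketch only: the principal name and keytab path are the example values mentioned above and will differ per cluster.

```shell
#!/bin/bash
# Kerberos variant of runspark.sh; principal/keytab are example values,
# adjust them to your cluster's Kerberos setup
/home/hadoop/apps/spark/bin/spark-sql \
--master yarn \
--executor-memory 600M \
--executor-cores 1 \
--principal hadoop \
--keytab /etc/security/keytabs/hadoop.keytab \
--files /home/hadoop/apps/apache-hive-2.3.2-bin/conf/hive-site.xml \
-f "$1"
```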

workflow.xml


<workflow-app xmlns="uri:oozie:workflow:0.4" name="sparksql-wf">
	<start to="shell-node"/>
	<action name="shell-node">
		<shell xmlns="uri:oozie:shell-action:0.2">
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			<configuration>
				<property>
					<name>mapred.job.queue.name</name>
					<value>${queueName}</value>
				</property>
			</configuration>
			<exec>runspark.sh</exec>
			<!-- the HQL file is passed to runspark.sh as $1 -->
			<argument>${nameNode}/user/hadoop/oozie-apps/sparksql/runspark.hql</argument>
			<!-- ship the script to the action's working directory -->
			<file>${nameNode}/user/hadoop/oozie-apps/sparksql/runspark.sh</file>
		</shell>
		<ok to="end"/>
		<error to="fail"/>
	</action>
	<kill name="fail">
		<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
	</kill>
	<end name="end"/>
</workflow-app>

Submit commands (if Kerberos authentication is required, append -auth KERBEROS to the last command):

hdfs dfs -rm -r -f /user/hadoop/oozie-apps/sparksql
hdfs dfs -put oozie-apps/sparksql /user/hadoop/oozie-apps/
oozie job -oozie http://hadoop03:11000/oozie -config /home/hadoop/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps/sparksql/job.properties -run
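The run command prints the new workflow id as a line like "job: 0000018-...-oozie-hado-W". A small helper to capture it for the status check below is sketched here; parse_job_id is a hypothetical name, and the actual cluster call is replaced by an echo of sample output so the parsing logic stands alone.

```shell
#!/bin/bash
# Hypothetical helper: pull the workflow id out of `oozie job ... -run` output,
# which is a single line of the form "job: <workflow-id>"
parse_job_id() {
  awk -F': ' '/^job:/ {print $2}'
}

# In real use:
#   jobid=$(oozie job -oozie http://hadoop03:11000/oozie \
#             -config job.properties -run | parse_job_id)
# Simulated here with sample output:
jobid=$(echo "job: 0000018-200512104107645-oozie-hado-W" | parse_job_id)
echo "$jobid"
```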

Check the job status

oozie job -oozie http://hadoop03:11000/oozie -info 0000018-200512104107645-Oozie-hado-w

After the workflow finishes, verify the result in Hive:

select * from oozie.sparksqltest;
