Submitting Spark-SQL Jobs with Oozie in a Kerberos Environment

Oozie does not yet ship a native action for spark-sql the way the Hive action wraps HiveCLI; the Spark action is built around spark-submit with a user-provided JAR, and driving Spark-SQL through it involves quite a few extra considerations, so I will try that later.
This article instead uses a Shell action to invoke the spark-sql command. The Shell action is flexible, but it raises the bar for the hosts in the cluster: the action can be scheduled on any compute node, so every one of them must have a correctly configured spark-sql client (a quick check is sketched below).
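
A minimal sanity check, assuming passwordless SSH from the submitting host; the host list is only a placeholder and should be replaced with your actual NodeManager nodes.

# Hypothetical host list -- substitute the real NodeManager hosts
for host in hadoop01.bonc.com hadoop02.bonc.com; do
  ssh "$host" 'command -v spark-sql >/dev/null || echo "spark-sql missing on $(hostname)"'
done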

Directory layout

spark-sql/
├── hive-site.xml
├── job.properties
├── oozie.keytab
├── script.q
└── workflow.xml
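
The oozie.keytab bundled here must hold the key for the principal that is later passed to --principal; assuming a standard MIT Kerberos client, its contents can be checked before anything is uploaded:

[oozie@hadoop01 spark-sql]$ klist -kt oozie.keytab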

job.properties definition

[oozie@hadoop01 spark-sql]$ more job.properties 
nameNode=hdfs://beh001
jobTracker=hadoop02.bonc.com:8032
queueName=default
examplesRoot=jobs

oozie.use.system.libpath=true
oozie.credentials.skip=false
oozie.wf.application.path=${nameNode}/${examplesRoot}/spark-sql

workflow.xml definition


    
    
        
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>spark-sql</exec>
            <argument>--master</argument>
            <argument>yarn</argument>
            <argument>--files</argument>
            <argument>hive-site.xml</argument>
            <argument>--principal</argument>
            <argument>oozie</argument>
            <argument>--keytab</argument>
            <argument>oozie.keytab</argument>
            <argument>-f</argument>
            <argument>script.q</argument>
            <file>script.q</file>
            <file>hive-site.xml</file>
            <file>oozie.keytab</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fail name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </fail>
    <end name="end"/>
</workflow-app>

Test SQL script

[oozie@hadoop01 spark-sql]$ more script.q 
use tpcds_parquet_2;
DROP TABLE IF EXISTS oozie.demo;
create table oozie.demo as 
select  dt.d_year 
       ,item.i_brand_id brand_id 
       ,item.i_brand brand
       ,sum(ss_ext_sales_price) sum_agg
 from  date_dim dt 
      ,store_sales
      ,item
 where dt.d_date_sk = store_sales.ss_sold_date_sk
   and store_sales.ss_item_sk = item.i_item_sk
   and item.i_manufact_id = 436
   and dt.d_moy=12
 group by dt.d_year
      ,item.i_brand
      ,item.i_brand_id
 order by dt.d_year
         ,sum_agg desc
         ,brand_id
 limit 100;
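
The three file entries in workflow.xml localize script.q, hive-site.xml, and oozie.keytab into the YARN container's working directory, which is why the spark-sql arguments can refer to them by bare name. Before wiring the script into Oozie it can be smoke-tested by hand from any node with the spark-sql client; this is only a sketch of the equivalent command, using the same principal and keytab shown above:

[oozie@hadoop01 spark-sql]$ kinit -kt oozie.keytab oozie
[oozie@hadoop01 spark-sql]$ spark-sql --master yarn --files hive-site.xml --principal oozie --keytab oozie.keytab -f script.q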

Submit and test

# Upload to HDFS
[oozie@hadoop01 spark-sql]$ hdfs dfs -put ../spark-sql/ /jobs/
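
Optionally confirm that the workflow files landed under the path that oozie.wf.application.path points at, for example:

[oozie@hadoop01 spark-sql]$ hdfs dfs -ls /jobs/spark-sql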

# Submit the job
[oozie@hadoop01 spark-sql]$ oozie job -run -oozie http://hadoop01.bonc.com:11000/oozie -config job.properties -verbose -debug -auth Kerberos
 Auth type : Kerberos
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/beh/core/oozie/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/beh/core/oozie/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/beh/core/oozie/libext/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
POST http://hadoop01.bonc.com:11000/oozie/v2/jobs?action=start

<configuration>
<property><name>user.name</name><value>oozie</value></property>
<property><name>oozie.use.system.libpath</name><value>true</value></property>
<property><name>oozie.credentials.skip</name><value>false</value></property>
<property><name>oozie.wf.application.path</name><value>${nameNode}/${examplesRoot}/spark-sql</value></property>
<property><name>queueName</name><value>default</value></property>
<property><name>nameNode</name><value>hdfs://beh001</value></property>
<property><name>jobTracker</name><value>hadoop02.bonc.com:8032</value></property>
<property><name>examplesRoot</name><value>jobs</value></property>
</configuration>

job: 0000019-191015105059109-oozie-oozi-W

# Check job status
[oozie@hadoop01 spark-sql]$ oozie job -info    0000019-191015105059109-oozie-oozi-W
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/beh/core/oozie/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/beh/core/oozie/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/beh/core/oozie/libext/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job ID : 0000019-191015105059109-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : shell-wf
App Path      : hdfs://beh001/jobs/spark-sql
Status        : SUCCEEDED
Run           : 0
User          : oozie
Group         : -
Created       : 2019-10-15 11:17 GMT
Started       : 2019-10-15 11:17 GMT
Last Modified : 2019-10-15 11:18 GMT
Ended         : 2019-10-15 11:18 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code  
------------------------------------------------------------------------------------------------------------------------------------
0000019-191015105059109-oozie-oozi-W@:start:                                  OK        -                      OK         -         
------------------------------------------------------------------------------------------------------------------------------------
0000019-191015105059109-oozie-oozi-W@shell-node                               OK        job_1569636947511_0086 SUCCEEDED  -         
------------------------------------------------------------------------------------------------------------------------------------
0000019-191015105059109-oozie-oozi-W@end                                      OK        -                      OK         -         
------------------------------------------------------------------------------------------------------------------------------------
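
After the action reports SUCCEEDED, the launcher logs can be pulled through the Oozie CLI and the result table can be spot-checked; the count query below is only an illustration and assumes a valid Kerberos ticket:

[oozie@hadoop01 spark-sql]$ oozie job -log 0000019-191015105059109-oozie-oozi-W -oozie http://hadoop01.bonc.com:11000/oozie -auth Kerberos
[oozie@hadoop01 spark-sql]$ spark-sql --master yarn --files hive-site.xml -e "select count(*) from oozie.demo"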
