Submitting Spark Jobs via a Java Web Application

Software versions:

Spark 1.4.1, Hadoop 2.6, Scala 2.10.5, MyEclipse 2014, IntelliJ IDEA 14, JDK 1.8, Tomcat 7

Machines:

Windows 7 (with JDK 1.8, MyEclipse 2014, IntelliJ IDEA 14, Tomcat 7);

CentOS 6.6 VM (Hadoop pseudo-distributed cluster, Spark standalone cluster, JDK 1.8);

CentOS 7 VM (Tomcat, JDK 1.8);

1. Scenarios:
1. A plain Java program on Windows calls Spark to run a Spark job written in Scala, in either of two modes:

    1> submit the job to the Spark cluster and run it in standalone mode;

    2> submit the job to the YARN cluster in yarn-client mode;

2. A Java web application developed on Windows calls Spark to run the Scala Spark job, again in the two modes of item 1.

3. The Java web application runs on Linux and calls Spark to run the Scala Spark job, again in the two modes of item 1.



2. Implementation:
1. A simple Scala program. It reads a log file from HDFS, keeps only the records containing WARN or ERROR, and writes the filtered records back to HDFS. The code is as follows:


[Scala]
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Administrator on 2015/8/23.
 */
object Scala_Test {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: Scala_Test <input> <output>")
      System.exit(1) // exit instead of running on with missing arguments
    }
    // initialize the SparkConf and SparkContext
    val conf = new SparkConf().setAppName("Scala filter")
    val sc = new SparkContext(conf)

    // read the input
    val lines = sc.textFile(args(0))

    // transformations: keep only the ERROR and WARN records
    val errorsRDD = lines.filter(line => line.contains("ERROR"))
    val warningsRDD = lines.filter(line => line.contains("WARN"))
    val badLinesRDD = errorsRDD.union(warningsRDD)

    // write the result
    badLinesRDD.saveAsTextFile(args(1))

    // stop the SparkContext
    sc.stop()
  }
}



Build it in IntelliJ IDEA and package it as a jar for later use (named spark_filter.jar here);

2. Java calls Scala_Test in spark_filter.jar, using Spark standalone mode.

The Java code is as follows:

[Java]
package test;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.spark.deploy.SparkSubmit;
/**
 * @author fansy
 *
 */
public class SubmitScalaJobToSpark {

    public static void main(String[] args) {
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-hh-mm-ss");
        String filename = dateFormat.format(new Date());
        // getResource("") resolves to .../WEB-INF/classes/; stripping the trailing
        // "classes/" (8 characters) leaves .../WEB-INF/, so tmp + "lib/..." points
        // at the jar under WEB-INF/lib
        String tmp = Thread.currentThread().getContextClassLoader().getResource("").getPath();
        tmp = tmp.substring(0, tmp.length() - 8);
        String[] arg0 = new String[]{
                "--master", "spark://node101:7077",
                "--deploy-mode", "client",
                "--name", "test java submit job to spark",
                "--class", "Scala_Test",
                "--executor-memory", "1G",
//              "spark_filter.jar",
                tmp + "lib/spark_filter.jar",
                "hdfs://node101:8020/user/root/log.txt",
                "hdfs://node101:8020/user/root/badLines_spark_" + filename
        };

        SparkSubmit.main(arg0);
    }
}


Concretely: create a Java web project in MyEclipse and copy spark_filter.jar and spark-assembly-1.4.1-hadoop2.6.0.jar (found in the lib directory of the Spark distribution; the file is large, so copying takes a while) into WebRoot/WEB-INF/lib. (Note: a single Java web project covers both cases. To test the plain Java call, just run the Java class directly; to test the web application, just start Tomcat.)
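
For the Tomcat test, the servlet (the SparkServlet shown further below) also has to be mapped to a URL. A minimal sketch of the entries inside the <web-app> element of WebRoot/WEB-INF/web.xml, assuming a /servlet/SparkServlet URL pattern (MyEclipse generates an equivalent entry when it creates the servlet):

[XML]
<servlet>
  <servlet-name>SparkServlet</servlet-name>
  <servlet-class>servlet.SparkServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>SparkServlet</servlet-name>
  <url-pattern>/servlet/SparkServlet</url-pattern>
</servlet-mapping>
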
3. Java calls Scala_Test in spark_filter.jar, using YARN mode. In YARN mode it is not enough to simply set the master to "yarn-client" or "yarn-cluster": that works with spark-shell or spark-submit when HADOOP_CONF_DIR is configured, but here the Hadoop configuration cannot be picked up that way. So a different approach is taken, submitting through yarn.Client. The Java code is as follows:


[Java]
package test;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class SubmitScalaJobToYarn {

    public static void main(String[] args) {
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-hh-mm-ss");
        String filename = dateFormat.format(new Date());
        String tmp = Thread.currentThread().getContextClassLoader().getResource("").getPath();
        tmp = tmp.substring(0, tmp.length() - 8);
        String[] arg0 = new String[]{
                "--name", "test java submit job to yarn",
                "--class", "Scala_Test",
                "--executor-memory", "1G",
//              "WebRoot/WEB-INF/lib/spark_filter.jar",
                "--jar", tmp + "lib/spark_filter.jar",

                "--arg", "hdfs://node101:8020/user/root/log.txt",
                "--arg", "hdfs://node101:8020/user/root/badLines_yarn_" + filename,
                "--addJars", "hdfs://node101:8020/user/root/servlet-api.jar",
                "--archives", "hdfs://node101:8020/user/root/servlet-api.jar"
        };

//      SparkSubmit.main(arg0);
        Configuration conf = new Configuration();
        String os = System.getProperty("os.name");
        boolean cross_platform = false;
        if (os.contains("Windows")) {
            cross_platform = true;
        }
        conf.setBoolean("mapreduce.app-submission.cross-platform", cross_platform); // allow submission from Windows to the Linux cluster
        conf.set("fs.defaultFS", "hdfs://node101:8020");                    // namenode
        conf.set("mapreduce.framework.name", "yarn");                       // use the YARN framework
        conf.set("yarn.resourcemanager.address", "node101:8032");           // resourcemanager
        conf.set("yarn.resourcemanager.scheduler.address", "node101:8030"); // resource scheduler
        conf.set("mapreduce.jobhistory.address", "node101:10020");

        System.setProperty("SPARK_YARN_MODE", "true");

        SparkConf sparkConf = new SparkConf();
        ClientArguments cArgs = new ClientArguments(arg0, sparkConf);

        new Client(cArgs, conf, sparkConf).run();
    }
}
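
The hard-coded addresses above can instead be read from the cluster's own configuration files. A minimal sketch, assuming copies of the cluster's XML files have been placed at a local path reachable by the web application (the /opt/hadoop-conf location is an assumption):

[Java]
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class YarnConfFromFiles {
    // Build a Configuration from assumed local copies of the cluster files;
    // they supply fs.defaultFS, yarn.resourcemanager.address, etc., making
    // the individual conf.set(...) calls above unnecessary.
    public static Configuration load() {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/opt/hadoop-conf/core-site.xml"));
        conf.addResource(new Path("/opt/hadoop-conf/yarn-site.xml"));
        conf.addResource(new Path("/opt/hadoop-conf/mapred-site.xml"));
        return conf;
    }
}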


SparkServlet is as follows:
[Java]
package servlet;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import test.SubmitScalaJobToSpark;

public class SparkServlet extends HttpServlet {

    /**
     * Constructor of the object.
     */
    public SparkServlet() {
        super();
    }

    /**
     * Destruction of the servlet. <br>
     */
    public void destroy() {
        super.destroy(); // Just puts "destroy" string in log
        // Put your code here
    }

    /**
     * The doGet method of the servlet. <br>
     *
     * This method is called when a form has its tag value method equal to get.
     *
     * @param request the request sent by the client to the server
     * @param response the response sent by the server to the client
     * @throws ServletException if an error occurred
     * @throws IOException if an error occurred
     */
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        this.doPost(request, response);
    }

    /**
     * The doPost method of the servlet. <br>
     *
     * This method is called when a form has its tag value method equal to post.
     *
     * @param request the request sent by the client to the server
     * @param response the response sent by the server to the client
     * @throws ServletException if an error occurred
     * @throws IOException if an error occurred
     */
    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        System.out.println("Starting the SubmitScalaJobToSpark call......");
        SubmitScalaJobToSpark.main(null);
        // YarnServlet differs only here: it calls SubmitScalaJobToYarn.main(null)
        System.out.println("Finished the SubmitScalaJobToSpark call!");
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">");
        out.println("<HTML>");
        out.println("  <HEAD><TITLE>A Servlet</TITLE></HEAD>");
        out.println("  <BODY>");
        out.print("    This is ");
        out.print(this.getClass());
        out.println(", using the POST method");
        out.println("  </BODY>");
        out.println("</HTML>");
        out.flush();
        out.close();
    }
}
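
Note that SparkSubmit.main (and likewise Client.run in the YARN version) only returns after the Spark application finishes, so the HTTP request above stays open for the whole life of the job. A minimal sketch of handing the submission to a background thread instead (the helper class below is illustrative, not part of the original project):

[Java]
package servlet;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import test.SubmitScalaJobToSpark;

// Illustrative helper: runs the submission off the Tomcat request thread
// so doPost can respond immediately instead of blocking until the job ends.
public class AsyncSparkSubmitter {
    private static final ExecutorService pool = Executors.newSingleThreadExecutor();

    public static void submit() {
        pool.submit(() -> SubmitScalaJobToSpark.main(null));
    }
}

The trade-off is that the returned page no longer tells the caller whether the job succeeded; the job status then has to be checked in the Spark or YARN web UI.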
