SpringBoot + SparkSQL: Working with JSON Strings

When using Maven for dependency management in a Spring Boot project, a few points need attention: the conflicts between packages must be resolved first, otherwise the application will fail at runtime:

(1) Exclude two packages from the spark-sql dependency:

		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-sql_2.11</artifactId>
			<version>${spark.version}</version>
			<exclusions>
				<exclusion>
					<groupId>org.codehaus.janino</groupId>
					<artifactId>janino</artifactId>
				</exclusion>
				<exclusion>
					<groupId>org.codehaus.janino</groupId>
					<artifactId>commons-compiler</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

(2) Then re-introduce them explicitly:

		<dependency>
			<groupId>org.codehaus.janino</groupId>
			<artifactId>commons-compiler</artifactId>
			<version>2.7.8</version>
		</dependency>

		<dependency>
			<groupId>org.codehaus.janino</groupId>
			<artifactId>janino</artifactId>
			<version>2.7.8</version>
		</dependency>
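
Note that the ${spark.version} placeholder referenced above has to be defined in the POM's properties. A minimal sketch, assuming Spark 2.3.x built for Scala 2.11 (adjust the value to whatever Spark version your project actually targets):

		<properties>
			<!-- assumed version; keep this in sync with the Spark runtime you deploy against -->
			<spark.version>2.3.2</spark.version>
		</properties>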


OK, with the preparation done, let's move on to the code.

The scenario demonstrated here is converting JSON strings into temporary tables and then querying them with Spark SQL, which is simple and convenient:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class SparkJsonSQL {

    public void Exec() {
        // Local Spark context with two worker threads
        SparkConf conf = new SparkConf();
        conf.setMaster("local[2]").setAppName("jsonRDD");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Each element is a JSON string describing one record
        JavaRDD<String> nameRDD = sc.parallelize(Arrays.asList(
                "{\"name\":\"zhangsan\",\"age\":\"18\"}",
                "{\"name\":\"lisi\",\"age\":\"19\"}",
                "{\"name\":\"wangwu\",\"age\":\"20\"}"
        ));
        JavaRDD<String> scoreRDD = sc.parallelize(Arrays.asList(
                "{\"name\":\"zhangsan\",\"score\":\"100\"}",
                "{\"name\":\"lisi\",\"score\":\"200\"}",
                "{\"name\":\"wangwu\",\"score\":\"300\"}"
        ));

        // Let Spark infer the schema from the JSON strings
        Dataset<Row> namedf = sqlContext.read().json(nameRDD);
        Dataset<Row> scoredf = sqlContext.read().json(scoreRDD);
        // Register temporary tables so they can be queried with SQL
        // (in newer Spark versions, createOrReplaceTempView is the preferred call)
        namedf.registerTempTable("name");
        scoredf.registerTempTable("score");

        // Join the two temporary tables on the name column
        Dataset<Row> result = sqlContext.sql(
                "select name.name, name.age, score.score from name, score where name.name = score.name");
        //Dataset<Row> result = sqlContext.sql("select * from name");
        result.show();
        result.foreach(x -> System.out.print(x));
        sc.stop();
    }
}
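
The class above is plain Java, so it still has to be invoked from the Spring Boot side. Below is a minimal sketch of one way to wire it in, using a CommandLineRunner; the class and bean names here are illustrative, not taken from the original project:

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class SparkJsonApplication {

    public static void main(String[] args) {
        SpringApplication.run(SparkJsonApplication.class, args);
    }

    // Run the Spark SQL demo once the Spring context is up (hypothetical wiring)
    @Bean
    public CommandLineRunner sparkJsonRunner() {
        return args -> new SparkJsonSQL().Exec();
    }
}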


Let's run the program and see the result:

[Screenshot: console output of the join query]
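
For reference, with the sample data hard-coded above, the join returns three rows and result.show() prints output along these lines (row order may vary):

+--------+---+-----+
|    name|age|score|
+--------+---+-----+
|zhangsan| 18|  100|
|    lisi| 19|  200|
|  wangwu| 20|  300|
+--------+---+-----+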
