1. Prepare the data (JSON format as the example)
student table
{"sid":"S001","sanme":"zhangsan","age":"12","gender":"female"}
{"sid":"S002","sanme":"lisi","age":"13","gender":"male"}
{"sid":"S003","sanme":"wangwu","age":"14","gender":"male"}{"sid":"S004","sanme":"zhaoliu","age":"15","gender":"female"}
course table
{"cid":"C001","cname":"football"}
{"cid":"C002","cname":"music"}
{"cid":"C003","cname":"art"}
student_course table
{"id":"1","sid":"S001","cid":"C001"}
{"id":"2","sid":"S002","cid":"C001"}
{"id":"3","sid":"S002","cid":"C002"}
{"id":"4","sid":"S003","cid":"C003"}
{"id":"5","sid":"S003","cid":"C001"}
{"id":"6","sid":"S004","cid":"C003"}
{"id":"7","sid":"S004","cid":"C002"}
2. Open the Spark shell
Command: ./spark-shell --master local
The shell pre-creates a sqlContext that can run SQL statements. The tables created this way are permanent tables. Each statement is wrapped in triple double quotes because it contains double-quoted strings; Scala's triple-quoted strings need no escaping. The table definitions are stored in Spark's built-in metastore (an embedded Derby database), while the data itself stays on HDFS.
sqlContext.sql("""create table student_course using org.apache.spark.sql.json options(path "hdfs://192.168.1.110:9000/data/sparksql/student_course.json")""")
sqlContext.sql("""create table course using org.apache.spark.sql.json options(path "hdfs://192.168.1.110:9000/data/sparksql/course.json")""")
sqlContext.sql("""create table student using org.apache.spark.sql.json options(path "hdfs://192.168.1.110:9000/data/sparksql/student.json")""")
List the tables:
sqlContext.sql("show tables").show()
Run a join query across the three tables:
sqlContext.sql("select * from course c,student s,student_course sc where c.cid=sc.cid and s.sid=sc.sid").show()
The result. (The S004/zhaoliu rows, ids 6 and 7, are missing below: in the original data file the S004 record shared a line with S003, so the line-oriented JSON source never loaded it. With the records split one per line as in step 1, those two rows appear as well.)
+----+--------+---+------+--------+----+----+---+----+
| cid|   cname|age|gender|   sname| sid| cid| id| sid|
+----+--------+---+------+--------+----+----+---+----+
|C001|football| 12|female|zhangsan|S001|C001|  1|S001|
|C001|football| 13|  male|    lisi|S002|C001|  2|S002|
|C001|football| 14|  male|  wangwu|S003|C001|  5|S003|
|C002|   music| 13|  male|    lisi|S002|C002|  3|S002|
|C003|     art| 14|  male|  wangwu|S003|C003|  4|S003|
+----+--------+---+------+--------+----+----+---+----+
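The comma-separated FROM clause above is the old implicit-join style, and select * repeats the join keys (cid and sid each appear twice). The same query in explicit JOIN ... ON form with a projected column list, as a cleaner alternative:
sqlContext.sql("""select s.sid, s.sname, s.age, s.gender, c.cname from student s join student_course sc on s.sid = sc.sid join course c on sc.cid = c.cid""").show()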