Spark can read data from a JDBC source (MySQL as the example here) into a DataFrame in two ways: the spark.read.jdbc() shortcut, or spark.read.format("jdbc") with connection options.
import java.util.Properties

import org.apache.spark.sql.{DataFrame, DataFrameReader, SaveMode, SparkSession}

// Shuffle stages (aggregations, joins) default to 200 partitions; override that here
val spark = SparkSession.builder().master("local").appName("schema")
  .config("spark.sql.shuffle.partitions", 1).getOrCreate()
val properties = new Properties()
// Set the MySQL username and password
properties.setProperty("user", "root")
properties.setProperty("password", "123")
// Read the person table from MySQL
val personDF = spark.read.jdbc("jdbc:mysql://192.168.16.11:3306/spark", "person", properties)
// To push a multi-table join down to MySQL, pass a subquery as the table; the subquery must be given an alias (T here)
// val personDF = spark.read.jdbc("jdbc:mysql://192.168.16.11:3306/spark",
//   "(select person.id,person.name,person.age,score.score from person,score where person.id=score.id) T", properties)
The second way gathers everything the connection needs into a Map: the MySQL URL, the JDBC driver class, the username and password, and which table to read. The Map is then handed to options():
val map = Map[String, String](
"url" -> "jdbc:mysql://192.168.16.11:3306/spark",
"driver" -> "com.mysql.jdbc.Driver",
"user" -> "root",
"password" -> "123",
"dbtable" -> "score"
)
val scoreDF = spark.read.format("jdbc").options(map).load()
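Other Spark JDBC options can ride along in the same Map; for example "fetchsize", a standard JDBC source option, controls how many rows the MySQL driver pulls per round trip. A sketch with an illustrative value:

// Sketch: add a tuning option to the same Map; 1000 is an assumed value
val tunedMap = map + ("fetchsize" -> "1000")
val scoreTuned = spark.read.format("jdbc").options(tunedMap).load()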
Equivalently, each property can be supplied through its own .option() call:
val reader: DataFrameReader = spark.read.format("jdbc")
.option("url", "jdbc:mysql://192.168.16.11:3306/spark")
.option("driver", "com.mysql.jdbc.Driver")
.option("user", "root")
.option("password", "123")
.option("dbtable", "score")
val score2: DataFrame = reader.load()
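Either form produces the same DataFrame; a quick, illustrative sanity check:

score2.printSchema()   // column names and types inferred from the MySQL table
score2.show(5)         // preview the first 5 rows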
Register both DataFrames as temporary views and join them with SQL:
personDF.createOrReplaceTempView("person")
score2.createOrReplaceTempView("score")
val result = spark.sql("select person.id, person.name, person.age, score.score from person, score where person.id = score.id")
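The same join can also be written with the DataFrame API instead of SQL; a minimal sketch, assuming id is the shared key column:

// Sketch: equivalent inner join via the DataFrame API
val result2 = personDF.join(score2, "id")
  .select("id", "name", "age", "score")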
Save the join result back into a MySQL table (SaveMode.Append adds rows to the result table rather than replacing it):
result.write.mode(SaveMode.Append).jdbc("jdbc:mysql://192.168.16.11:3306/spark", "result", properties)
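The writer can also be configured in the format("jdbc")/option style, mirroring the read side; a sketch of the same write expressed that way:

// Sketch: same write via format("jdbc") options
result.write
  .format("jdbc")
  .option("url", "jdbc:mysql://192.168.16.11:3306/spark")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("user", "root")
  .option("password", "123")
  .option("dbtable", "result")
  .mode(SaveMode.Append)
  .save()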