MySQL经典50题
目录
连接hive数据库
1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数
2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数
3、查询平均成绩大于等于60分的同学的学生编号和学生姓名和平均成绩
4、查询平均成绩小于60分的同学的学生编号和学生姓名和平均成绩
5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩
6、查询"李"姓老师的数量
7、查询学过"李四"老师授课的同学的信息
8、查询没有学过"李四"老师授课的同学的信息
9、查询学过编号为"01"并且也学过编号为"02"的课程的同学的信息
10、查询学过编号为"01"并且没有学过编号为"02"的课程的同学的信息
11、查询没有学全所有课程的同学的信息
12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息
13、查询和"01"号的同学学习的课程完全相同的其他同学的信息
14、查询没学过"张三"老师讲授的任一门课程的学生姓名
15、查询两门及其以上不及格课程的同学的学号,姓名及其平均成绩
16、检索"01"课程分数小于60,按分数降序排列的学生信息
17、按平均成绩从高到低显示所有学生的所有课程的成绩以及平均成绩
18、查询各科成绩最高分、最低分和平均分:以如下形式显示:课程ID,课程name,最高分,最低分,平均分,及格率,中等率,优良率,优秀率 及格为>=60,中等为:70-80,优良为:80-90,优秀为:>=90
19、按各科成绩进行排序,并显示排名
20、查询学生的总成绩并进行排名
21、查询不同老师所教不同课程平均分从高到低显示
22、查询所有课程的成绩第2名到第3名的学生信息及该课程成绩
23、统计各科成绩各分数段人数:课程编号,课程名称, 100-85 , 85-70 , 70-60 , 0-60 及所占百分比
24、查询学生平均成绩及其名次
25、查询各科成绩前三名的记录
26、查询每门课程被选修的学生数
27、查询出只有两门课程的全部学生的学号和姓名
28、查询男生、女生人数
29、查询名字中含有"风"字的学生信息
30、查询同名同性学生名单,并统计同名人数
31、查询1990年出生的学生名单(注:Student表中Sage列的类型是datetime)
32、查询每门课程的平均成绩,结果按平均成绩降序排列,平均成绩相同时,按课程编号升序排列
33、查询平均成绩大于等于85的所有学生的学号、姓名和平均成绩
34、查询课程名称为"数学",且分数低于60的学生姓名和分数
35、查询所有学生的课程及分数情况
36、查询任何一门课程成绩在70分以上的姓名、课程名称和分数
37、查询课程不及格的学生
38、查询课程编号为01且课程成绩在80分以上的学生的学号和姓名
39、求每门课程的学生人数
40、查询选修"张三"老师所授课程的学生中,成绩最高的学生信息及其成绩
41、查询不同课程成绩相同的学生的学生编号、课程编号、学生成绩
42、查询每门功课成绩最好的前两名
43、统计每门课程的学生选修人数(超过5人的课程才统计)。要求输出课程号和选修人数, 查询结果按人数降序排列,若人数相同,按课程号升序排列
44、检索至少选修两门课程的学生学号
45、查询选修了全部课程的学生信息
46、查询各学生的年龄
47、查询本周过生日的学生
48、查询下周过生日的学生
49、查询本月过生日的学生
50、查询下月过生日的学生
写在前面:例行佘赞~
import java.util.Properties
import org.apache.spark.sql.SparkSession
object SparkSQL50 {
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder().appName("SQL50").master("local")
.getOrCreate()
import spark.implicits._
import org.apache.spark.sql.functions._
val driver = "com.mysql.cj.jdbc.Driver"
val user = "root"
val pwd = "123123"
val url = "jdbc:mysql://192.168.153.134:3306/school"
val properties = new Properties()
properties.setProperty("driver",driver)
properties.setProperty("user",user)
properties.setProperty("password",pwd)
val course = spark.read.jdbc(url,"course",properties)
val sc = spark.read.jdbc(url,"sc",properties)
val student = spark.read.jdbc(url,"student",properties)
val teacher = spark.read.jdbc(url,"teacher",properties)
course.printSchema()
sc.printSchema()
student.printSchema()
teacher.printSchema()
spark.stop()
}
}
sc.as("s1").join(sc.as("s2"),"SID")
.where("s1.CID=01 and s2.CID=02 and s1.score>s2.score")
.join(student,"SID")
.show()
sc.as("s1").join(sc.as("s2"),"SID")
.where("s1.CID=01 and s2.CID=02 and s1.score
sc.as("s1")
.groupBy("sid")
.avg("s1.score")
.join(student,"sid")
.where($"avg(score)">60)
.show()
需要注意的时,条件判断的时候,$"avg(score)" isNull这种写法时错误的,返回值类型为布尔类型,返回条件不匹配,会报错。要用$"avg(score).isNull"的方法。
// 第一步:求出所有学生的平均成绩
sc.as("s1").groupBy("sid").avg("s1.score")
// 第二步:平均成绩join学生信息表,主表为学生表,输出所有学生信息
student.as("stu")
.join(sc.as("s1").groupBy("sid").avg("s1.score"),student("sid")===sc("sid"),"left_outer")
.show()
// 第三步:加上条件判断
student.as("stu")
.join(sc.as("s1").groupBy("sid").avg("s1.score"),student("sid")===sc("sid"),"left_outer")
.where($"avg(score)" <60 || $"avg(score)".isNull)
.show()
student.as("stu")
.join(sc.as("s1").groupBy("sid").sum("score"),Seq("sid"),"left_outer")
.join(sc.as("s1").groupBy("sid").count(),Seq("sid"),"left_outer")
.show()
val i = teacher.as("t")
.where($"Tname".like("李%"))
.count()
print(i)
teacher.join(course,"TID")
.where($"Tname"==="李四")
.join(sc,"CID")
.join(student,"SID")
.show()
student.as("s1").join(
student.join(sc,"sid")
.join(course,"cid")
.join(teacher,"tid")
.where($"Tname".equalTo("李四")).as("s2")
,Seq("sid"),"left_outer")
.where($"score" isNull)
.show()
student.join(sc,"sid")
.where($"cid".equalTo("01"))
.join(student.join(sc,"sid")
.where($"cid".equalTo("02")),"sid")
.show()
student.as("s1").join(sc,"sid")
.where($"cid".equalTo("01"))
.join(student.as("s2").join(sc,"sid")
.where($"cid".equalTo("02")),Seq("sid"),"left_outer")
.where($"s2.Sname".isNull)
.show()
student.join(
student.join(sc,"sid")
.join(course.as("c"),"cid")
.groupBy("sid")
.count()
,Seq("sid"),"left_outer")
.where($"count"
student.join(sc,"sid").as("s1").join(
student.as("s2")
.join(sc,"sid")
.where($"s2.sid"==="01")
,"cid")
.select("s1.SID","s1.Sname","s1.Sage","s1.Ssex")
.distinct()
.show()
student.as("s1").join(sc,"sid")
.join(student.as("s2"),Seq("sid"),"left_outer").where($"s1.sid".notEqual("01"))
.groupBy($"s1.sid").count().where($"count".equalTo(3))
.join(student,"sid")
.show()
student.as("s1").join(
student.join(sc,"sid").join(course,"cid").join(teacher,"tid")
.where($"tname".equalTo("张三")).as("s2")
,Seq("sid"),"left_outer")
.where($"s2.tid".isNull)
.select("s1.sid","s1.sname","s1.sage","s1.ssex")
.show()
sc.where($"score"<60).groupBy("sid").count().where($"count">=2).as("s1")
.join(
sc.groupBy("sid").avg("score"),"sid"
)
.join(student,"sid")
.show()
orderBy函数和sort函数都可以实现对固定列的排序,在只有一个reduce的时候,sort by和order by发挥的作用是一致的。
sc.where($"cid".equalTo("01")).where($"score"<60)
.join(student,"sid")
.orderBy($"score".desc)
.show()
sc.as("s1")
.join(sc.groupBy("sid").avg("score").as("s2"),"sid")
.join(student,"sid")
.orderBy($"s2.avg(score)".desc)
.show()
1)根据题目要求,课程ID和课程name可以通过sc表和course表联表得到
2)最高/低/平均分比较简单,分组后求值就可以
3)及格率需要先count各分组总人数all,接着count分组中符合条件pass的个数
4)将pass列的值 / all 列的值。两列操作可以使用withColumn
// 第一步:按照课程分组,求及格的人数
val dfcount = sc.groupBy("cid").count()
val dfpas = sc.where($"score">=60).groupBy("cid").count().withColumnRenamed("count","pas")
val dfmid = sc.where($"score">=70 && $"score"<80).groupBy("cid").count().withColumnRenamed("count","mid")
val dfpre = sc.where($"score">=80 && $"score"<90).groupBy("cid").count().withColumnRenamed("count","pre")
val dfsup = sc.where($"score">=90).groupBy("cid").count().withColumnRenamed("count","sup")
// 第二步:两列合并,及格人数/总人数,求及格率
val pas = dfcount.join(dfpas,"cid").withColumn("pas",col("pas")/col("count")).select("cid","pas")
val mid = dfcount.join(dfmid,"cid").withColumn("mid",col("mid")/col("count")).select("cid","mid")
val pre = dfcount.join(dfpre,"cid").withColumn("pre",col("pre")/col("count")).select("cid","pre")
val sup = dfcount.join(dfsup,"cid").withColumn("sup",col("sup")/col("count")).select("cid","sup")
// 第三步:将得到的数据进行汇总
sc.join(course,"cid")
.join(course,"cid")
.groupBy("cid").agg(
max("score").as("max"),
min("score").as("min"),
avg("score").as("avg")
)
.join(pas,Seq("cid"),"left_outer")
.join(mid,Seq("cid"),"left_outer")
.join(pre,Seq("cid"),"left_outer")
.join(sup,Seq("cid"),"left_outer")
.orderBy($"avg".desc)
.show()
sc.selectExpr("*","rank() over(partition by cid order by score desc)")
.withColumnRenamed("rank() OVER (PARTITION BY cid ORDER BY score DESC NULLS LAST unspecifiedframe$())","rank").show()
val sum = sc.groupBy("sid").sum("score").withColumnRenamed("sum(score)","score")
sum.selectExpr("*","rank() over(order by score desc)").withColumnRenamed("rank() OVER (ORDER BY score DESC NULLS LAST unspecifiedframe$())","rank")show()
teacher.join(course,"tid").join(sc,"cid")
.groupBy("cid","tid")
.agg(
avg("score")
)
.orderBy($"avg(score)".desc)
.show()
sc.selectExpr("*","row_number() over(partition by cid order by score desc)")
.withColumnRenamed("row_number() OVER (PARTITION BY cid ORDER BY score DESC NULLS LAST unspecifiedframe$())","rank")
.where($"rank" between(2,3))
.join(student,"sid").show()
这题和18题类似,在写的时候,一直纠结分组后的count不能进行条件判断(因为没有办法加where过滤),后来发现可以在count ( when ( '判断条件' , '值') ) 的方式来进行条件过滤。
如果有更好的或者更简洁的方法,可以一起交流讨论下~~
百分比怎么用我还在研究
sc.groupBy("cid").agg(
count("score"),
count(when($"score".between(85,100),1)),
count(when($"score".between(70,85),1)),
count(when($"score".between(60,70),1)),
count(when($"score".between(0,60),1))
)
.withColumnRenamed("count(score)","count")
.withColumnRenamed("count(CASE WHEN ((score >= 85) AND (score <= 100)) THEN 1 END)","min60")
.withColumnRenamed("count(CASE WHEN ((score >= 70) AND (score <= 85)) THEN 1 END)","60-70")
.withColumnRenamed("count(CASE WHEN ((score >= 60) AND (score <= 70)) THEN 1 END)","70-85")
.withColumnRenamed("count(CASE WHEN ((score >= 0) AND (score <= 60)) THEN 1 END)","85-100")
.withColumn("min60",$"min60"/$"count")
.withColumn("60-70",$"60-70"/$"count")
.withColumn("70-85",$"70-85"/$"count")
.withColumn("85-100",$"85-100"/$"count")
.join(course,"cid")
.select("cid","cname","min60","60-70","70-85","85-100")
.show()
注意:selectExpr中,partition by 或 order by 的列如果是自定义的,需要加上引号。否则无法识别。
sc.groupBy("sid").avg("score")
.selectExpr("*","row_number() over(order by 'avg(score)' desc)")
.withColumnRenamed("row_number() OVER (ORDER BY avg(score) DESC NULLS LAST unspecifiedframe$())","rank")
.show()
sc.selectExpr("*","row_number() over(partition by cid order by score desc)")
.withColumnRenamed("row_number() OVER (PARTITION BY cid ORDER BY score DESC NULLS LAST unspecifiedframe$())","rank")
.where("rank<=3").show()
sc.groupBy("cid").count().show()
sc.groupBy("sid")
.count().where($"count"===2).join(student,"sid").show()
student.groupBy("ssex").count().show()
student.where($"sname".like("%风%")).show()
student.groupBy("sname","ssex").count().where("count>1").show()
student.where(year($"sage")===1990).show()
sc.groupBy("cid").avg("score").orderBy($"avg(score)".desc,$"cid".asc).show()
sc.groupBy("sid").avg("score").where($"avg(score)">=85).join(student,"sid").show()
sc.join(course,"cid").join(student,"sid").where($"cname".equalTo("数学") &&$"score"<60 ).show()
student.join(sc,Seq("sid"),"left_outer").show()
sc.where($"score">70).join(student,"sid").join(course,"cid").show()
sc.where("score<60").join(student,"sid").show()
sc.where($"cid"===1 && $"score">=80).join(student,"sid").show()
sc.groupBy("cid").count().show()
teacher.where($"tname".equalTo("张三"))
.join(course,"tid")
.join(sc,"cid").limit(1)
.show()
sc.groupBy("score").count().where($"count">1)
.join(sc,"score")
.join(student,"sid").show()
group by 和limit不能同时使用,所以还是要用到开窗函数
sc.selectExpr("*","row_number() over(partition by cid order by score desc)")
.withColumnRenamed("row_number() OVER (PARTITION BY cid ORDER BY score DESC NULLS LAST unspecifiedframe$())","rank")
.where($"rank"<=2).show()
sc.groupBy("cid").count().orderBy($"count".desc,$"cid".asc).show()
sc.groupBy("sid").count().where("count>=2").show()
sc.groupBy("sid").count().where($"count".equalTo(course.count())).join(student,"sid")show()
student.withColumn("sage",year(current_date())-year($"sage")).show()
student.where(weekofyear(current_date())===weekofyear($"sage")).show()
student.where(weekofyear(current_date()+1)===weekofyear($"sage")).show()
student.where(month(current_date())===month($"sage")).show()
student.where(month(current_date()+1)===month($"sage")).show()