SELECT SUM() OVER(PARTITION BY ___ ORDER BY___) FROM Table
SELECT
first_name,
last_name,
salary,
AVG(salary) OVER()
FROM employee;
-- `AVG(salary)` computes the average salary; adding an empty `OVER()` applies it to all rows, so every row shows the average salary of all employees.
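`OVER()` can be used to compare the current row with an aggregate value.
-- Requirement: build a report showing each employee's years worked and the difference from the average years worked.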
SELECT
first_name,
last_name,
years_worked,
AVG(years_worked) over() as `avg`,
years_worked - AVG(years_worked) over() as `difference`
FROM employee;
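-- Requirement: report the purchases of the Human Resources department (`department_id = 3`) with these columns:
-- `id`, `department_id`, `item`, `price`, the highest purchase price, and the difference between the highest price and each purchase's price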
SELECT
id,
department_id,
item,
price,
MAX(price) over() as 'max_price',
MAX(price) over() - price as 'difference'
FROM purchase
WHERE department_id=3;
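Window functions are evaluated after the `WHERE` clause!
-- Requirement: for employees in departments 1, 2 and 3, return their names, salary, and the average salary across those three departments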
SELECT
first_name,
last_name,
salary,
AVG(salary) OVER() as avg
FROM employee
WHERE department_id IN (1, 2, 3);
OVER (PARTITION BY column1, column2 ... column_n)
`PARTITION BY` works like `GROUP BY`: it groups the rows by the listed columns. Unlike `GROUP BY`, `PARTITION BY` does not change the number of rows in the result.
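The row-count difference between `GROUP BY` and `PARTITION BY` can be checked with a small sketch. It assumes Python's built-in `sqlite3` module (with an SQLite build that supports window functions) as a stand-in for the MySQL database used in these notes; the three `train` rows are hypothetical sample data.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (id INTEGER, model TEXT)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B")],  # hypothetical sample rows
)

# GROUP BY collapses each model into a single row.
grouped = conn.execute(
    "SELECT model, COUNT(id) FROM train GROUP BY model ORDER BY model"
).fetchall()

# PARTITION BY keeps every row and attaches the per-model count to it.
partitioned = conn.execute(
    "SELECT id, model, COUNT(id) OVER (PARTITION BY model) AS cnt "
    "FROM train ORDER BY id"
).fetchall()

print(grouped)      # [('A', 2), ('B', 1)]
print(partitioned)  # [(1, 'A', 2), (2, 'A', 2), (3, 'B', 1)]
```

Three rows go in; `GROUP BY` returns two rows, while the windowed query still returns three.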
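-- Requirement: among trains with more than 30 first-class seats and more than 180 second-class seats, count the matching rows for each model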
SELECT
id,
model,
first_class_places,
second_class_places,
count(id) over(PARTITION BY model) as 'count'
FROM train
WHERE first_class_places>30
AND second_class_places>180;
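-- Requirement: for each journey in the timetable, return its ID, the production year (`production_year`) of the train used, the number of journeys made by the same train, and the number of journeys on the same route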
SELECT
journey.id,
production_year,
COUNT(journey.id) OVER(PARTITION BY train_id) as count_train,
COUNT(journey.id) OVER (PARTITION BY route_id) as count_journey
FROM journey
JOIN train
ON journey.train_id = train.id;
OVER (ORDER BY )
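`RANK()` returns the rank (ordinal position) of each row.
`ORDER BY` sorts the rows in ascending or descending order.
`RANK() OVER(ORDER BY ...)` is a window function that uses the `ORDER BY` inside `OVER` to assign the rank.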
-- Requirement: for each game, return its name, genre, update date, and the rank of its update date
SELECT
`name`,
genre,
updated,
RANK() over(ORDER BY updated) as 'date_rank'
FROM game;
-- Rank games by install-package size using `DENSE_RANK()`; return the game name, package size, and rank.
SELECT
`name`,
size,
DENSE_RANK() over(ORDER BY size) as 'dense_rank'
FROM game;
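`ROW_NUMBER()`, combined with `ORDER BY`, returns consecutive, non-repeating numbers.
-- Requirement: sort games by release date and return a unique number for each
-- Requirement: compare `RANK()`, `DENSE_RANK()` and `ROW_NUMBER()` by applying all three to the case above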
SELECT
`name`,
genre,
released,
RANK() over(ORDER BY released) as 'rank',
DENSE_RANK() over(ORDER BY released) as 'dense_rank',
ROW_NUMBER() over(ORDER BY released) as 'row_number'
FROM game;
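The three functions only differ when rows tie, so a tiny data set with one tie makes the contrast visible. This is a sketch under assumptions: it runs on SQLite through Python's `sqlite3` module rather than MySQL, the four `game` rows are hypothetical, and `name` is added to the `ROW_NUMBER()` ordering purely to make the tie-break deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (name TEXT, released TEXT)")
conn.executemany(
    "INSERT INTO game VALUES (?, ?)",
    [  # hypothetical rows; games b and c share a release date
        ("a", "2015-01-01"),
        ("b", "2015-03-01"),
        ("c", "2015-03-01"),
        ("d", "2015-07-01"),
    ],
)

rows = conn.execute(
    "SELECT name, "
    "RANK() OVER (ORDER BY released) AS rnk, "
    "DENSE_RANK() OVER (ORDER BY released) AS dense_rnk, "
    "ROW_NUMBER() OVER (ORDER BY released, name) AS row_num "
    "FROM game ORDER BY released, name"
).fetchall()

for row in rows:
    print(row)
# ('a', 1, 1, 1)
# ('b', 2, 2, 2)
# ('c', 2, 2, 3)   tie: RANK and DENSE_RANK repeat, ROW_NUMBER does not
# ('d', 4, 3, 4)   RANK skips 3 after the tie, DENSE_RANK does not
```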
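`ROW_NUMBER()`, combined with `ORDER BY`, returns consecutive, non-repeating numbers.
-- Requirement: sort all games by update date in descending order, split them into 4 groups, and return the game name, genre, update date, and group number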
SELECT
`name`,
genre,
updated,
NTILE(4) over(ORDER BY updated DESC) as 'ntile'
FROM game;
-- Requirement: find the game with the second most recent update; return its name, platform, and update date
WITH ranking AS(
SELECT
`name`,
platform,
updated,
RANK() over(ORDER BY updated DESC) as 'rank'
FROM game
)
SELECT
name,
platform,
updated
FROM ranking
WHERE `rank`=2;
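Summary
The most basic ranking function: RANK() OVER(ORDER BY column1, column2...)
Three functions that derive a number from the ordering were introduced:
RANK(): tied rows share a rank, and the ranks after a tie are skipped
DENSE_RANK(): tied rows share a rank, with no gaps afterwards
ROW_NUMBER(): combined with ORDER BY, returns consecutive, non-repeating numbers
NTILE(x): splits the rows into x groups and gives every row in a group the same number
To fetch the row at a given position after ranking (first, second, ...), use WITH as follows: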
WITH ranking AS
(SELECT
RANK() OVER (ORDER BY col2) AS `rank`,
col1
FROM table_name)
SELECT col1
FROM ranking
WHERE `rank` = place1;
ROWS BETWEEN lower_bound AND upper_bound
-- Requirement: for each product, return its launch date and the number of products introduced up to and including that date
SELECT
id,
name,
introduced,
COUNT(id) OVER(
ORDER BY introduced
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM product;
`ROWS UNBOUNDED PRECEDING` is equivalent to `BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`
`ROWS n PRECEDING` is equivalent to `BETWEEN n PRECEDING AND CURRENT ROW`
`ROWS CURRENT ROW` is equivalent to `BETWEEN CURRENT ROW AND CURRENT ROW`
As with `ROWS`, a `RANGE` window can be customized with `BETWEEN ... AND ...`.
With `RANGE`, we typically use:
`RANGE UNBOUNDED PRECEDING`
`RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING`
`RANGE CURRENT ROW`
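`ROWS` and `RANGE` accept the same bounds but treat ties in the `ORDER BY` column differently: `ROWS` counts physical rows, while `RANGE` treats all rows with the same ordering value as peers of the current row. The sketch below illustrates this with a running count; it assumes SQLite via Python's `sqlite3` module as a stand-in for MySQL, and the `product` rows are hypothetical. The extra `id` in the `ROWS` ordering is only there to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, introduced TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [  # hypothetical rows; products 2 and 3 share a launch date
        (1, "2016-01-01"),
        (2, "2016-01-02"),
        (3, "2016-01-02"),
        (4, "2016-01-03"),
    ],
)

rows = conn.execute(
    "SELECT id, "
    # ROWS: the frame ends at the current physical row.
    "COUNT(id) OVER (ORDER BY introduced, id "
    "  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_cnt, "
    # RANGE: the frame also includes every row tied with the current one.
    "COUNT(id) OVER (ORDER BY introduced "
    "  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_cnt "
    "FROM product ORDER BY id"
).fetchall()

print(rows)  # [(1, 1, 1), (2, 2, 3), (3, 3, 3), (4, 4, 4)]
```

For product 2 the `RANGE` count is already 3, because product 3 has the same `introduced` value and therefore falls inside the frame.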
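Summary
A window frame can be defined inside OVER(...). The syntax is:
[ROWS | RANGE] <window frame definition>
The window frame definition takes the form:
BETWEEN <lower bound> AND <upper bound>
where each bound is one of: UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING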
OVER (...)
Unlike an aggregate function, an analytic function refers to only a single row within the window.
SELECT
name,
opened,
LEAD(name) OVER(ORDER BY opened)
FROM website;
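In the SQL above, the analytic function is LEAD(name). LEAD takes the `name` column as its argument and, following the order set by `ORDER BY`, returns the `name` value from the row after the current one, shown in a new column. The last row has no following row, so it gets NULL.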
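-- Requirement: for the website with id 1, return each day's number of users and the next day's number of users
-- Columns: `day` (date), `users` (number of users), `lead` (next day's number of users)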
SELECT
day,
users,
LEAD(users) OVER(ORDER BY day) AS `lead`
FROM statistics
WHERE website_id = 1;
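-- Requirement: for the website with id 2, between May 1 and May 14, 2016, return each day's number of users and the number of users 7 days later
-- Note: the last column is NULL in the final 7 rows, because those rows have no data 7 days ahead.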
SELECT
day,
users,
LEAD(users, 7) OVER(ORDER BY day) AS `lead`
FROM statistics
WHERE website_id = 2
AND day BETWEEN '2016-05-01' AND '2016-05-14';
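LAG(x) works like LEAD(x). The difference is that LEAD returns a value from a row after the current row, while LAG returns a value from a row before it.
LEAD(...) and LAG(...) are interchangeable: reversing the sort direction with DESC in ORDER BY makes LEAD(...) and LAG(...) return the same result.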
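The interchangeability of the two functions can be verified directly. The following is a minimal sketch that assumes SQLite through Python's `sqlite3` module instead of MySQL, with three hypothetical `statistics` rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics (day TEXT, users INTEGER)")
conn.executemany(
    "INSERT INTO statistics VALUES (?, ?)",
    [("2016-05-01", 10), ("2016-05-02", 20), ("2016-05-03", 30)],  # hypothetical
)

rows = conn.execute(
    "SELECT day, "
    # Next row in ascending day order ...
    "LEAD(users) OVER (ORDER BY day) AS via_lead, "
    # ... is the previous row in descending day order.
    "LAG(users) OVER (ORDER BY day DESC) AS via_lag "
    "FROM statistics ORDER BY day"
).fetchall()

print(rows)
# [('2016-05-01', 20, 20), ('2016-05-02', 30, 30), ('2016-05-03', None, None)]
```

Both columns hold the next day's value, and both are NULL (`None` in Python) for the last day.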
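-- Like LEAD(x, y, z), LAG(x, y, z) takes a default value as its last argument, returned in place of NULL when the offset row does not exist
-- For the website with id = 3, return each day's ad revenue and the ad revenue from three days earlier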
SELECT
day,
revenue,
LAG(revenue, 3, -1.00) OVER(ORDER BY day)
FROM statistics
WHERE website_id = 3;
-- Requirement: for the website with id 2, return each day's number of users together with the lowest daily number of users.
SELECT
day,
users,
FIRST_VALUE(users) OVER(ORDER BY users) as `first_value`
FROM statistics
WHERE website_id = 2;
-- Requirement: for the website with id 1, return each date, its number of ad impressions, and the number of impressions on the day with the most users
SELECT
day,
impressions,
LAST_VALUE(impressions) OVER(
ORDER BY users
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS `last_value`
FROM statistics
WHERE website_id = 1;
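The explicit frame in the query above matters. When `OVER` contains an `ORDER BY` but no frame clause, the default frame ends at the current row, so `LAST_VALUE` would simply return the current row's own value. The sketch below contrasts the two; it assumes SQLite via Python's `sqlite3` module as a stand-in for MySQL, and the three `statistics` rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics (day TEXT, users INTEGER, impressions INTEGER)")
conn.executemany(
    "INSERT INTO statistics VALUES (?, ?, ?)",
    [  # hypothetical rows; the busiest day (30 users) has 300 impressions
        ("2016-05-01", 10, 100),
        ("2016-05-02", 30, 300),
        ("2016-05-03", 20, 200),
    ],
)

rows = conn.execute(
    "SELECT day, "
    # Default frame: from the first row up to the current row.
    "LAST_VALUE(impressions) OVER (ORDER BY users) AS default_frame, "
    # Explicit frame covering the whole partition.
    "LAST_VALUE(impressions) OVER (ORDER BY users "
    "  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame "
    "FROM statistics ORDER BY day"
).fetchall()

print(rows)
# [('2016-05-01', 100, 300), ('2016-05-02', 300, 300), ('2016-05-03', 200, 300)]
```

Only the `full_frame` column gives the impressions of the day with the most users on every row.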
-- Requirement: for the website with id 2, between May 15 and May 31, return each day's revenue and the third-highest daily revenue in that period
SELECT
day,
revenue,
NTH_VALUE(revenue,3) OVER (
ORDER BY revenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) `3rd_highest`
FROM statistics
WHERE website_id = 2
AND day BETWEEN '2016-05-15' AND '2016-05-31';
-- Requirement: for sales between August 10 and August 14, 2016, return `store_id`, `day`, the number of customers `customers`, and each day's rank within its store by daily customers (descending)
SELECT
store_id,
day,
customers,
RANK() OVER (PARTITION BY store_id ORDER BY customers DESC) AS `rank`
FROM sales
WHERE day BETWEEN '2016-08-10' AND '2016-08-14';
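-- Requirement: for sales between August 1 and August 7, 2016, find each store's highest single-day revenue up to the current date
-- Columns returned: store id `store_id`, date `day`, revenue `revenue`, and best revenue so far `best_revenue`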
SELECT
store_id,
day,
revenue,
MAX(revenue) OVER(
PARTITION BY store_id
ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as best_revenue
FROM sales
WHERE day BETWEEN '2016-08-01' AND '2016-08-07';
Order of execution
FROM
WHERE
GROUP BY
Aggregate functions
HAVING
Window functions
SELECT
DISTINCT
UNION
ORDER BY
OFFSET
LIMIT
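One consequence of the order above is that a window function cannot appear in `WHERE`, because `WHERE` runs before the window is computed; filtering on a rank therefore needs a subquery or a `WITH` clause. The sketch below shows both the rejected query and the working form. It assumes SQLite through Python's `sqlite3` module as a stand-in for MySQL (the exact error message differs between engines), and the `auction` rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auction (id INTEGER, views INTEGER)")
conn.executemany(
    "INSERT INTO auction VALUES (?, ?)",
    [(1, 50), (2, 90), (3, 70)],  # hypothetical sample rows
)

# A window function directly in WHERE is rejected by the engine.
try:
    conn.execute(
        "SELECT id FROM auction WHERE RANK() OVER (ORDER BY views DESC) = 1"
    ).fetchall()
    rejected = False
except sqlite3.Error:
    rejected = True

# Computing the rank in a WITH clause first, then filtering, works.
top = conn.execute(
    "WITH ranking AS ("
    "  SELECT id, RANK() OVER (ORDER BY views DESC) AS rnk FROM auction"
    ") SELECT id FROM ranking WHERE rnk = 1"
).fetchall()

print(rejected)  # True
print(top)       # [(2,)]
```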
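-- Requirement: sort all auctions by views in descending order, split them evenly into 4 groups, and order the final result by group number ascending. Columns returned: `id`, `views`, and the group (`quartile`)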
SELECT
id,
views,
NTILE(4) OVER(ORDER BY views DESC) AS quartile
FROM auction
ORDER BY NTILE(4) OVER(ORDER BY views DESC);
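-- Requirement: group the auction data by country and return these columns:
-- - country `country`
-- - the minimum number of participants in each group `min`
-- - the average of those minimums across all groups `avg`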
SELECT
country,
MIN(participants) AS `min`,
AVG(MIN(participants)) OVER() AS `avg`
FROM auction
GROUP BY country;
-- Requirement: group by category `category_id`, sum the final price `final_price` as `sum`, and rank all categories by that total as `rank`
-- Columns returned: `category_id`, `sum`, `rank`
SELECT
category_id,
SUM(final_price) AS `sum`,
RANK() OVER(ORDER BY SUM(final_price) DESC) AS `rank`
FROM auction
GROUP BY category_id;
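-- Requirement: group all auctions by end date `ended` and analyze their views `views`, returning these columns:
-- the end date of each group `ended`
-- the total views of each group `sum`
-- the total views of the previous group `previous_day`
-- the difference in views between consecutive end dates `delta`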
SELECT
ended,
SUM(views) AS `sum`,
LAG(SUM(views)) OVER(ORDER BY ended) AS previous_day,
SUM(views) - LAG(SUM(views)) OVER(ORDER BY ended) AS delta
FROM auction
GROUP BY ended
ORDER BY ended;