窗口函数的名字是over()函数,常用的有两个属性partition by和order by,partition by类似于group by,我们通常将group by叫做分组,而partition by称作分区。
一般结构为:
Function(arg1 , arg2 ……) over(partition by clause order by clause windowing clause )
Windowing clause : rows | range between start_expr and end_expr
Start_expr is unbounded preceding | current row | n preceding | n following
End_expr is unbounded following | current row | n preceding | n following
Function可以是下面函数,后面有星号 (*) 的函数允许完整的语法,包括windowing_clause。
AVG *
CLUSTER_DETAILS
CLUSTER_DISTANCE
CLUSTER_ID
CLUSTER_PROBABILITY
CLUSTER_SET
CORR *
COUNT *
COVAR_POP *
COVAR_SAMP *
CUME_DIST
DENSE_RANK
FEATURE_DETAILS
FEATURE_ID
FEATURE_SET
FEATURE_VALUE
FIRST
FIRST_VALUE *
LAG
LAST
LAST_VALUE *
LEAD
LISTAGG
MAX *
MIN *
NTH_VALUE *
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
PREDICTION
PREDICTION_COST
PREDICTION_DETAILS
PREDICTION_PROBABILITY
PREDICTION_SET
RANK
RATIO_TO_REPORT
REGR_ (Linear Regression) Functions *
ROW_NUMBER
STDDEV *
STDDEV_POP *
STDDEV_SAMP *
SUM *
VAR_POP *
VAR_SAMP *
VARIANCE *
Windowing clause 指定分析函数的对象物理或逻辑行集( ROWS | RANGE )。
举个例子。
conn test/test@localhost:1521/pdb
create table test_tab (student_id number, subject_id number, score number);
insert into test_tab values(1,1,90);
insert into test_tab values(1,2,98);
insert into test_tab values(1,3,99);
insert into test_tab values(1,4,95);
insert into test_tab values(2,1,98);
insert into test_tab values(2,2,95);
insert into test_tab values(2,3,98);
insert into test_tab values(2,4,97);
insert into test_tab values(3,1,93);
insert into test_tab values(3,2,94);
insert into test_tab values(3,3,94);
insert into test_tab values(3,4,91);
commit;
--以“subject_id”分区,找出每个人和相同“subject_id”的平均“score”的偏离值。
SQL> set autot on
SQL> select t.*,(t.score-avg(t.score) over( partition by t.subject_id)) as gaps
from test_tab t
order by student_id,subject_id; 2 3
STUDENT_ID SUBJECT_ID SCORE GAPS
---------- ---------- ---------- ----------
1 1 90 -3.6666667
1 2 98 2.33333333
1 3 99 2
1 4 95 .666666667
2 1 98 4.33333333
2 2 95 -.66666667
2 3 98 1
2 4 97 2.66666667
3 1 93 -.66666667
3 2 94 -1.6666667
3 3 94 -3
3 4 91 -3.3333333
12行が選択されました。
実行計画
----------------------------------------------------------
Plan hash value: 2491645504
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12 | 108 | 5 (40)| 00:00:01 |
| 1 | SORT ORDER BY | | 12 | 108 | 5 (40)| 00:00:01 |
| 2 | WINDOW SORT | | 12 | 108 | 5 (40)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_TAB | 12 | 108 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
統計
----------------------------------------------------------
59 recursive calls
23 db block gets
99 consistent gets
1 physical reads
4080 redo size
1158 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
12 rows processed
--不使用窗口函数取得上面的结果。
select t1.*, (t1.score - t3.avgs) as gaps
from test_student_score t1,
(select t2.subject_id, avg(t2.score) as avgs
from test_student_score t2
group by t2.subject_id) t3
where t1.subject_id = t3.subject_id
order by t1.student_id,t1.subject_id;
STUDENT_ID SUBJECT_ID SCORE GAPS
---------- ---------- ---------- ----------
1 1 90 -3.6666667
1 2 98 2.33333333
1 3 99 2
1 4 95 .666666667
2 1 98 4.33333333
2 2 95 -.66666667
2 3 98 1
2 4 97 2.66666667
3 1 93 -.66666667
3 2 94 -1.6666667
3 3 94 -3
3 4 91 -3.3333333
12行が選択されました。
実行計画
----------------------------------------------------------
Plan hash value: 1945508744
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36 | 2772 | 7 (15)| 00:00:01 |
| 1 | SORT GROUP BY | | 36 | 2772 | 7 (15)| 00:00:01 |
|* 2 | HASH JOIN | | 36 | 2772 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST_STUDENT_SCORE | 12 | 612 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST_STUDENT_SCORE | 12 | 312 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."SUBJECT_ID"="T2"."SUBJECT_ID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
統計
----------------------------------------------------------
374 recursive calls
0 db block gets
363 consistent gets
0 physical reads
0 redo size
1158 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
28 sorts (memory)
0 sorts (disk)
12 rows processed
简单比较一下,似乎使用窗口函数时的COST更小。
2021/04/15 @ Dalian