与聚集函数一样,窗口函数也针对定义的行集(组)执行聚集,但它不像聚集函数那样每组之返回一个值,窗口函数可以为每组返回多个值。实际上,DB2中称这种函数为联机分析处理OLAP函数,而Oracle把它们称为解析函数,但ISO SQL标准把它们称为窗口函数。窗口函数一般在OLAP分析、制作报表过程中会使用到。
窗口函数:
聚合函数 over()
聚合函数 over(partition by 字段)—分区
聚合函数 over(order by 字段)--框架字句
本文以Oracle11g中HR模式下的Employees表为例子来试着了解窗口函数,
Employees表结构如下:
SQL> desc employees 名称 是否为空? 类型 ----------------------------------------- -------- ---------------------------- EMPLOYEE_ID NOT NULL NUMBER(6) FIRST_NAME VARCHAR2(20) LAST_NAME NOT NULL VARCHAR2(25) EMAIL NOT NULL VARCHAR2(25) PHONE_NUMBER VARCHAR2(20) HIRE_DATE NOT NULL DATE JOB_ID NOT NULL VARCHAR2(10) SALARY NUMBER(8,2) COMMISSION_PCT NUMBER(2,2) MANAGER_ID NUMBER(6) DEPARTMENT_ID NUMBER(4)
计算部门号位20的员工总数:
SQL> edit 1 select first_name,department_id,count(*) over() 2 from employees 3* where department_id=20 SQL> / FIRST_NAME DEPARTMENT_ID COUNT(*)OVER() -------------------- ------------- -------------- Michael 20 2 Pat 20 2
窗口 ,函数 count(*) over() 对于查询返回的每一行,它返回了表中所有行的计数。
在深入研究Over字句之前,一定要注意:在SQL处理中,窗口函数都是最后一步执行,而且仅位于Order by字句之前。
使用Partiton by字句定义行的分区或组,可以用paritition by对定义的行组计算聚集(当遇到新组的时候复位),并返回每个值(每个组中的每个成员),而不是一个用一个组表示表中的这个值的所有实例。如:
SQL> edit 1 select first_name,department_id,count(*) over(partition by department_id) as cnt 2 from employees 3* order by 2 SQL> / FIRST_NAME DEPARTMENT_ID CNT -------------------- ------------- ---------- Jennifer 10 1 Michael 20 2 Pat 20 2 Den 30 6 Alexander 30 6 Shelli 30 6 Sigal 30 6 Guy 30 6 Karen 30 6 Susan 40 1 Matthew 50 45
。。。。。。。。。。
如上结果所示:对于同一个部门(同一个分区)的每个员工的cnt值相同,这是由于在遇到新部门之前不会重置聚集。
另外partition by字句的优点是:在同一个select语句中,一个窗口函数的计算独立于按其他列分区的其他窗口函数的计算。例如下面的查询,返回每个员工、他的部门、他的部门中的员工数、他的职位以及跟他相同职位的员工数:
1 select first_name,department_id,count(*) over (partition by department_id) as dept_cnt, 2 job_id, 3 count(*) over(partition by job_id) as job_cnt 4 from employees 5* order by 2 SQL> / FIRST_NAME DEPARTMENT_ID DEPT_CNT JOB_ID JOB_CNT -------------------- ------------- ---------- ---------- ---------- Jennifer 10 1 AD_ASST 1 Michael 20 2 MK_MAN 1 Pat 20 2 MK_REP 1 Sigal 30 6 PU_CLERK 5 Alexander 30 6 PU_CLERK 5 Shelli 30 6 PU_CLERK 5 Karen 30 6 PU_CLERK 5 Den 30 6 PU_MAN 1 Guy 30 6 PU_CLERK 5 Susan 40 1 HR_REP 1 Donald 50 45 SH_CLERK 20
当在窗口函数over字句中使用order by 字句时,就指定了两件事:
1、分区中的行如何排序
2、在计算中包含哪些行
请看下面的查询,它计算了30号员工的工资的累计和
1 select department_id,first_name,hire_date,salary, 2 sum(salary) over(partition by department_id) as total1, 3 sum(salary) over() as total2, 4 sum(salary) over(order by hire_date) as running_total 5 from employees 6* where department_id=30 SQL> / DEPARTMENT_ID FIRST_NAME HIRE_DATE SALARY TOTAL1 ------------- -------------------- -------------- ---------- ---------- TOTAL2 RUNNING_TOTAL ---------- ------------- 30 Den 07-12月-02 11000 24900 24900 11000 30 Alexander 18-5月 -03 3100 24900 24900 14100 30 Sigal 24-7月 -05 2800 24900 24900 16900 DEPARTMENT_ID FIRST_NAME HIRE_DATE SALARY TOTAL1 ------------- -------------------- -------------- ---------- ---------- TOTAL2 RUNNING_TOTAL ---------- ------------- 30 Shelli 24-12月-05 2900 24900 24900 19800 30 Guy 15-11月-06 2600 24900 24900 22400 30 Karen 10-8月 -07 2500 24900 24900 24900 已选择6行。
上面的查询语句相当于:
1 select department_id,first_name,hire_date,salary, 2 sum(salary) over(partition by department_id) as total1, 3 sum(salary) over() as total2, 4 sum(salary) over(order by hire_date range between unbounded preceding and current row) as running_total 5 from employees 6* where department_id=30
也就说默认情况下会告诉查询:计算所有行的和,即从当前行开始、包括它前面的所有行。对从当前行开始、包括它前面的所有行进行求和,就可以得到累计和效果了。
通过,框架字句允许定义数据的不同“子窗口”,以便在计算中使用,有很多方式可以指定这样的子窗口。如:
1 select department_id,first_name,salary, 2 sum(salary) over (order by hire_date range between unbounded preceding and current row) as run_total1, 3 sum(salary) over(order by hire_date rows between 1 preceding and current row) as run_total2, 4 sum(salary) over(order by hire_date range between current row and unbounded following) as run_total3, 5 sum(salary) over(order by hire_date rows between current row and 1 following) as run_total4 6 from employees 7* where department_id=30 SQL> / DEPARTMENT_ID FIRST_NAME SALARY RUN_TOTAL1 RUN_TOTAL2 RUN_TOTAL3 ------------- -------------------- ---------- ---------- ---------- ---------- RUN_TOTAL4 ---------- 30 Den 11000 11000 11000 24900 14100 30 Alexander 3100 14100 14100 13900 5900 30 Sigal 2800 16900 5900 10800 5700 DEPARTMENT_ID FIRST_NAME SALARY RUN_TOTAL1 RUN_TOTAL2 RUN_TOTAL3 ------------- -------------------- ---------- ---------- ---------- ---------- RUN_TOTAL4 ---------- 30 Shelli 2900 19800 5700 8000 5500 30 Guy 2600 22400 5500 5100 5100 30 Karen 2500 24900 5100 2500 2500 已选择6行。
其中:
range between unbounded preceding and current row 指定计算当前行开始、当前行之前的所有值;
rows between 1 preceding and current row 指定计算当前行的前一行开始,其范围一直延续到当前行;
range between current row and unbounded following 指定计算从当前行开始,包括它后面的所有行;
rows between current row and 1 following 指定计算当前行和它后面的一行;
最后一个例子,展示 了框架字句对查询输出的影响,请看下面查询:
1 select first_name,salary,min(salary) over(order by salary) min1, 2 max(salary) over(order by salary) max1, 3 min(salary) over(order by salary range between unbounded preceding and unbounded following) min2, 4 max(salary) over(order by salary range between unbounded preceding and unbounded following) max2, 5 min(salary) over(order by salary range between current row and current row) min3, 6 max(salary) over(order by salary range between current row and current row) max3, 7 max(salary) over(order by salary rows between 3 preceding and 3 following) max4 8* from employees SQL> / FIRST_NAME SALARY MIN1 MAX1 MIN2 MAX2 -------------------- ---------- ---------- ---------- ---------- ---------- MIN3 MAX3 MAX4 ---------- ---------- ---------- TJ 2100 2100 2100 2100 24000 2100 2100 2400 Steven 2200 2100 2200 2100 24000 2200 2200 2400 Hazel 2200 2100 2200 2100 24000 2200 2200 2500
请仔细观察计算结果,领会子窗口的内涵;