Analytic Functions in Oracle 8i and 9i
Oracle 8i and 9i分析函数
Contents目录
Overview and Introduction概述与简介
How Analytic Functions Work分析函数原理
The Syntax句法
Calculate a running Total累计计算
Top-N Queries前N条查询
Example 1例1
Example 2例2
Windows窗口
Range Windows范围窗口
Compute average salary for defined range计算定义范围的平均工资
Row Windows行窗口
Accessing Rows Around Your Current Row访问当前行前后的行
LAG
LEAD
Determine the First Value / Last Value of a Group确定组的首值和末值
Crosstab or Pivot Queries交叉表或Pivot查询
Conclusion结论
Links and Documents链接和文档
Overview概述:
Analytic Functions, which have been available since Oracle 8.1.6, are designed to address such problems as "Calculate a running total", "Find percentages within a group", "Top-N queries", "Compute a moving average" and many more. Most of these problems can be solved using standard PL/SQL, however the performance is often not what it should be. Analytic Functions add extensions to the SQL language that not only make these operations easier to code; they make them faster than could be achieved with pure SQL or PL/SQL. These extensions are currently under review by the ANSI SQL committee for inclusion in the SQL specification.
分析函数,最早是从ORACLE8.1.6开始出现的,它的设计目的是为了解决诸如“累计计算”,“找出分组内百分比”,“前-N条查询”,“移动平均数计算”"等问题。其实大部分的问题都可以用PL/SQL解决,但是它的性能并不能达到你所期望的效果。分析函数是SQL言语的一种扩充,它并不是仅仅试代码变得更简单而已,它的速度比纯粹的SQL或者PL/SQL更快。现在这些扩展已经被纳入了美国国家标准化组织SQL委员会的SQL规范说明书中。
How Analytic Functions Work ?分析函数的原理
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic clause. For each row, a "sliding" window of rows is defined. The window determines the range of rows used to perform the calculations for the "current row". Window sizes can be based on either a physical number of rows or a logical interval such as time. 分析函数是在一个记录行分组的基础上计算它们的总值。与集合函数不同,他们返回各分组的多行记录。行的分组被称窗口,并通过分析语句定义。对于每记录行,定义了一个“滑动”窗口。该窗口确定“当前行”计算的范围。窗口的大小可由各行的实际编号或由时间等逻辑间隔确定。
Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause. 除了ORDER BY(按…排序)语句外,分析函数是一条查询被执行的操作。所有合并、WHERE、GROUP BY、HAVING语句都是分析函数处理之前完成的。因此,分析函数只出现在选择目录或ORDER BY(按…排序)语句中。
The Syntax句法
The Syntax of analytic functions is rather straightforward in appearance分析函数的句法非常简单。
Analytic-Function(<Argument>,<Argument>,...)
OVER (
<Query-Partition-Clause>
<Order-By-Clause>
<Windowing-Clause>
)
o Analytic-Function分析函数的种类
Specify the name of an analytic function, Oracle actually provides many analytic functions such as AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.
分析函数的名称,ORACLE通常多个分析函数,包括:AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, VARIANCE.
o Arguments参数
Analytic functions take 0 to 3 arguments. 分析函数通常有0到3个参数
o Query-Partition-Clause查询划分语句
The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words "partition" and "group" are used synonymously here. The analytic functions are applied to each group independently, they are reset for each group. 根据划分表达式设置的规则,PARTITION BY(按…划分)将一个结果逻辑分成N个分组划分表达式。在此“划分”和“分组”用作同义词。分析函数独立应用于各个分组,并在应用时重置。
o Order-By-Clause排序语句
The ORDER BY clause specifies how the data is sorted within each group (partition). This will definitely affect the outcome of any analytic function. ORDER BY(按…排序)语句规定了每个分组(划分)的数据如何排序。这必然影响分析函数的结果。
o Windowing-Clause窗口生成语句
The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group. More information on windows can be found here. 窗口生成语句用以定义滑动或固定数据窗口,分析函数在分组内进行分析。该语句能够对分组中任意定义的滑动或固定窗口进行计算。点击此处了解更多。
Example: Calculate a running Total例:累计计算
This example shows the cumulative salary within a departement row by row, with each row including a summation of the prior rows salary.本例中对某部门的工资进行逐行计算,每行包括之前所有行中工资的合计。
set autotrace traceonly explain
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
column seq format 999
SELECT ename "Ename", deptno "Deptno", sal "Sal",
SUM(sal)
OVER (ORDER BY deptno, ename) "Running Total",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename) "Dept Total",
ROW_NUMBER()
OVER (PARTITION BY deptno
ORDER BY ENAME) "Seq"
FROM emp
ORDER BY deptno, ename
/
Ename Deptno Sal Running Total Dept Total Seq
------ ------ ------ ------------- ---------- ----
CLARK 10 2450 2450 2450 1
KING 5000 7450 7450 2
MILLER 1300 8750 8750 3
ADAMS 20 1100 9850 1100 1
FORD 3000 12850 4100 2
JONES 2975 15825 7075 3
SCOTT 3000 18825 10075 4
SMITH 800 19625 10875 5
ALLEN 30 1600 21225 1600 1
BLAKE 2850 24075 4450 2
JAMES 950 25025 5400 3
MARTIN 1250 26275 6650 4
TURNER 1500 27775 8150 5
WARD 1250 29025 9400 6
Execution Plan
---------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 WINDOW (SORT)
2 1 TABLE ACCESS (FULL) OF 'EMP'
Statistics
---------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
1658 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
The example shows how to calculate a "Running Total" for the entire query. This is done using the entire ordered result set, via SUM(sal) OVER (ORDER BY deptno, ename).本例指出了如何计算整条查询的“累计”。即使用排序后的整个结果集合,通过SUM(sal) OVER (ORDER BY deptno, ename)函数得到。
Further, we were able to compute a running total within each department, a total that would be reset at the beginning of the next department. The PARTITION BY deptno in that SUM(sal) caused this to happen, a partitioning clause was specified in the query in order to break the data up into groups.可以进一步计算各个部门的累计值,该值在开始下一个部门计算时将被重置。由SUM(sal)中的PARTITION BY deptno实现。该条查询中指定划分语句将数据进行分组。
The ROW_NUMBER() function is used to sequentially number the rows returned in each group, according to our ordering criteria (a "Seq" column was added to in order to display this position).根据排序规则(增加了“Seq”列以显示该状态),ROW_NUMBER()函数将每组返回的记录行进行顺序编号,
The execution plan shows, that the whole query is very well performed with only 3 consistent gets, this can never be accomplished with standard SQL or even PL/SQL. 执行计划显示,整条查询仅需3条一致get函数就可以很好的执行。这一点是标准SQL甚至PL/SQL不能都实现的。
Top-N Queries前N条查询
How can we get the Top-N records by some set of fields ?如何通过部分字段得到前N条记录?
Prior to having access to these analytic functions, questions of this nature were extremely difficult to answer.在未使用这些分析函数之前,很难对此类问题做出回答。
There are some problems with Top-N queries however; mostly in the way people phrase them. It is something to be careful about when designing reports. Consider this seemingly sensible request:人们关于前N条查询的说法存在问题。在设计报告时,应留意这一点。
I would like the top three paid sales reps by department我需要知道部门工资为前3名销售代表的谁。
The problem with this question is that it is ambiguous. It is ambiguous because of repeated values, there might be four people who all make the same salary, what should we do then ?这句话的问题在于含混不清。因为存在重复的值,如果有四个人领着同样的工资,该怎么处理?
Let's look at three examples, all use the well known table EMP.以下3个例子均使用EMP表。
Example 1例1
Sort the sales people by salary from greatest to least. Give the first three rows. If there are less then three people in a department, this will return less than three records.从多到少排列销售人员的工资,取前三行。如果该部门少于三人,则返回的记录少于三个。
set autotrace on explain
break on deptno skip 1
SELECT * FROM (
SELECT deptno, ename, sal, ROW_NUMBER()
OVER (
PARTITION BY deptno ORDER BY sal DESC
) Top3 FROM emp
)
WHERE Top3 <= 3
/
DEPTNO ENAME SAL TOP3
---------- ---------- ---------- ----------
10 KING 5000 1
CLARK 2450 2
MILLER 1300 3
20 SCOTT 3000 1
FORD 3000 2
JONES 2975 3
30 BLAKE 2850 1
ALLEN 1600 2
TURNER 1500 3
9 rows selected.
Execution Plan
--------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 VIEW
2 1 WINDOW (SORT)
3 2 TABLE ACCESS (FULL) OF 'EMP'
This query works by sorting each partition (or group, which is the deptno), in a descending order, based on the salary column and then assigning a sequential row number to each row in the group as it is processed. The use of a WHERE clause after doing this to get just the first three rows in each partition.该查询根据工资列以降序排列各个划分(或分组,属于该deptno),并在处理过程中为每行分配一个顺序号。然后使用WHERE语句得到各划分的前三行。
Example 2例2
Give me the set of sales people who make the top 3 salaries - that is, find the set of distinct salary amounts, sort them, take the largest three, and give me everyone who makes one of those values.我需要工资为前三位的销售人员名字——即查找工资金额、排序、取最高的三项金额、给我领取这些工资的人员的名字。
SELECT * FROM (
SELECT deptno, ename, sal,
DENSE_RANK()
OVER (
PARTITION BY deptno ORDER BY sal desc
) TopN FROM emp
)
WHERE TopN <= 3
ORDER BY deptno, sal DESC
/
DEPTNO ENAME SAL TOPN
---------- ---------- ---------- ----------
10 KING 5000 1
CLARK 2450 2
MILLER 1300 3
20 SCOTT 3000 1 <--- !
FORD 3000 1 <--- !
JONES 2975 2
ADAMS 1100 3
30 BLAKE 2850 1
ALLEN 1600 2
30 TURNER 1500 3
10 rows selected.
Execution Plan
--------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 VIEW
2 1 WINDOW (SORT PUSHED RANK)
3 2 TABLE ACCESS (FULL) OF 'EMP'
Here the DENSE_RANK function was used to get the top three salaries. We assigned the dense rank to the salary column and sorted it in a descending order.其中,使用DENSE_RANK函数得出最高的三个工资金额。然后指定Dense rank至工资列,并将其按降序排列。
The DENSE_RANK function computes the rank of a row in an ordered group of rows. The ranks are consecutive integers beginning with 1. The largest rank value is the number of unique values returned by the query. Rank values are not skipped in the event of ties. Rows with equal values for the ranking criteria receive the same rank. DENSE_RANK函数计算排序后分组中各行的序数。序数为从1开始的连续整数。最大的序数就是查询所所返回唯一值的个数。如果出现并列,序数不跳计。具有相同值的列的序数相同。
The DENSE_RANK function does not skip numbers and will assign the same number to those rows with the same value. Hence, after the resultset is built in the inline view, we can simply select all of the rows with a dense rank of three or less, this gives us everyone who makes the top three salaries by department number. DENSE_RANK函数不跳计序数,并为相同值的列赋予相同的序数。结果集合在当前窗口建立后,通过部门编号选择Dense rank为3 或3之前的行,就可以知道工资在该部门前三位的名字。
Windows窗口
The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. The default window is an anchored window that simply starts at the first row of a group an continues to the current row.窗口语句用以定义滑动或固定数据窗口。在其上面运行组的分析函数。默认窗口为固定窗口,从第一行开始到当前行。
We can set up windows based on two criteria: RANGES of data values or ROWS offset from the current row. It can be said, that the existance of an ORDER BY in an analytic function will add a default window clause of RANGE UNBOUNDED PRECEDING. That says to get all rows in our partition that came before us as specified by the ORDER BY clause.可根据两种规则对窗口进行设置:数据值的范围或当前行指定区距的行。分析函数中的ORDER BY会默认添加一条窗口语句:RANGE UNBOUNDED PRECEDING。即按照ORDER BY语句取得划分中的当前之前的所有行。
Let's look at an example with a sliding window within a group and compute the sum of the current row's SAL column plus the previous 2 rows in that group. If we need a report that shows the sum of the current employee's salary with the preceding two salaries within a departement, it would look like this.5。以下例子为一个分组中的滑动窗口,计算该组中当前行与其前两行的SAL列的和。如我们需要计算当前员工的工资和其之前的两人工资的和,如下例所示。
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
SELECT deptno "Deptno", ename "Ename", sal "Sal",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename
ROWS 2 PRECEDING) "Sliding Total"
FROM emp
ORDER BY deptno, ename
/
Deptno Ename Sal Sliding Total
------ ------ ------ -------------
10 CLARK 2450 2450
KING 5000 7450
MILLER 1300 8750
20 ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075 ^
SCOTT 3000 8975 |
SMITH 800 6775 "-- Sliding Window
30 ALLEN 1600 1600
BLAKE 2850 4450
JAMES 950 5400
MARTIN 1250 5050
TURNER 1500 3700
WARD 1250 4000
The partition clause makes the SUM (sal) be computed within each department, independent of the other groups. Tthe SUM (sal) is ' reset ' as the department changes. The ORDER BY ENAME clause sorts the data within each department by ENAME; this allows the window clause: ROWS 2 PRECEDING, to access the 2 rows prior to the current row in a group in order to sum the salaries.划分语句使SUM (sal)在各部门内进行,并独立于其他组。当部门改变时,SUM (sal) 也被“重置”。ORDER BY ENAME语句通过ENAME排列各部门的数据。这使得窗口语句:ROWS 2 PRECEDING获取该分组中当前行之前两行的数据以计算合计工资。
For example, if you note the SLIDING TOTAL value for SMITH is 6 7 7 5, which is the sum of 800, 3000, and 2975. That was simply SMITH's row plus the salary from the preceding two rows in the window.例如,SMITH的SLIDING TOTAL(滑动合计)6 7 7 5是800、3000以及2975的和。即窗口中SMITH行及其之前两行工资的简单相加。
Range Windows范围窗口
Range windows collect rows together based on a WHERE clause. If I say ' range 5 preceding ' for example, this will generate a sliding window that has the set of all preceding rows in the group such that they are within 5 units of the current row. These units may either be numeric comparisons or date comparisons and it is not valid to use RANGE with datatypes other than numbers and dates.范围窗口根据WHERE语句对行进行收集。例如“之前5”将会生成一个滑动视窗,包括该分组中当前行之前的5个单位所有行。这些单位可以是数值或日期,使用数字或日期以外的其他数据类型表示的范围无效。
Example例
Count the employees which where hired within the last 100 days preceding the own hiredate. The range window goes back 100 days from the current row's hiredate and then counts the rows within this range. The solution ist to use the following window specification:计算当前雇佣日期之前100天内雇佣的员工的数量。范围窗口返回当前行雇佣日期100天之前并在这个范围内计算行数。计算使用以下窗口规格:
COUNT(*) OVER (ORDER BY hiredate ASC RANGE 100 PRECEDING)
column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column cnt heading "Cnt" format 99
SELECT ename, hiredate, hiredate-100 hiredate_pre,
COUNT(*)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) cnt
FROM emp
ORDER BY hiredate ASC
/
Name Hired Hired-100 Cnt
-------- ---------- ---------- ---
SMITH 17-DEC-80 08-SEP-80 1
ALLEN 20-FEB-81 12-NOV-80 2
WARD 22-FEB-81 14-NOV-80 3
JONES 02-APR-81 23-DEC-80 3
BLAKE 01-MAY-81 21-JAN-81 4
CLARK 09-JUN-81 01-MAR-81 3
TURNER 08-SEP-81 31-MAY-81 2
MARTIN 28-SEP-81 20-JUN-81 2
KING 17-NOV-81 09-AUG-81 3
JAMES 03-DEC-81 25-AUG-81 5
FORD 03-DEC-81 25-AUG-81 5
MILLER 23-JAN-82 15-OCT-81 4
SCOTT 09-DEC-82 31-AUG-82 1
ADAMS 12-JAN-83 04-OCT-82 2
We ordered the single partition by hiredate ASC. If we look for example at the row for CLARK we can see that his hiredate was 09-JUN-81, and 100 days prior to that is the date 01-MAR-81. If we look who was hired between 01-MAR-81 and 09-JUN-81, we find JONES (hired: 02-APR-81) and BLAKE (hired: 01-MAY-81). This are 3 rows including the current row, this is what we see in the column "Cnt" of CLARK's row.根据雇佣日期ASC对每个划分进行排序。例中CLARK行可看到其雇佣日期为1981年6月9日,100天之前是1981年3月1日,看看在这期间雇佣的员工,会发现JONES(雇佣日期:1981年4月2日)、BLAKE(雇佣日期:1981年5月1日),共3行,包括当前行,在CLARK行“Cnt”列中。
Compute average salary for defined range计算定义范围的平均工资
As an example, compute the average salary of people hired within 100 days before for each employee. The query looks like this:例如,计算每个员工雇佣之前100天内雇佣员工的平均工资。查询如下:
column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column avg_sal heading "Avg-100" format 999999
SELECT ename, hiredate, sal,
AVG(sal)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) avg_sal
FROM emp
ORDER BY hiredate ASC
/
Name Hired SAL Avg-100
-------- ---------- ---------- -------
SMITH 17-DEC-80 800 800
ALLEN 20-FEB-81 1600 1200
WARD 22-FEB-81 1250 1217
JONES 02-APR-81 2975 1942
BLAKE 01-MAY-81 2850 2169
CLARK 09-JUN-81 2450 2758
TURNER 08-SEP-81 1500 1975
MARTIN 28-SEP-81 1250 1375
KING 17-NOV-81 5000 2583
JAMES 03-DEC-81 950 2340
FORD 03-DEC-81 3000 2340
MILLER 23-JAN-82 1300 2563
SCOTT 09-DEC-82 3000 3000
ADAMS 12-JAN-83 1100 2050
Look at CLARK again, since we understand his range window within the group. We can see that the average salary of 2758 is equal to (2975+2850+2450)/3. This is the average of the salaries for CLARK and the rows preceding CLARK, those of JONES and BLAKE. The data must be sorted in ascending order.再看看CLARK,我们已知道他在本组中的范围窗口,可以看到平均工资2758由(2975+2850+2450)/3得来,是CLARK行和其之前的JONES和BLAKE行工资的平均数。数据必须按由小到大顺序排列。
Row Windows行窗口
Row Windows are physical units; physical number of rows, to include in the window. For example you can calculate the average salary of a given record with the (up to 5) employees hired before them or after them as follows:行窗口为实际单位,是包括在窗口中实际行数。例如可以计算一给定记录的平均工资,该记录包括其之前或之后雇佣的员工(至多5名),具体如下:
set numformat 9999
SELECT ename, hiredate, sal,
AVG(sal)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) AvgAsc,
COUNT(*)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) CntAsc,
AVG(sal)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AvgDes,
COUNT(*)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) CntDes
FROM emp
ORDER BY hiredate
/
ENAME HIREDATE SAL AVGASC CNTASC AVGDES CNTDES
---------- --------- ----- ------ ------ ------ ------
SMITH 17-DEC-80 800 800 1 1988 6
ALLEN 20-FEB-81 1600 1200 2 2104 6
WARD 22-FEB-81 1250 1217 3 2046 6
JONES 02-APR-81 2975 1656 4 2671 6
BLAKE 01-MAY-81 2850 1895 5 2675 6
CLARK 09-JUN-81 2450 1988 6 2358 6
TURNER 08-SEP-81 1500 2104 6 2167 6
MARTIN 28-SEP-81 1250 2046 6 2417 6
KING 17-NOV-81 5000 2671 6 2392 6
JAMES 03-DEC-81 950 2333 6 1588 4
FORD 03-DEC-81 3000 2358 6 1870 5
MILLER 23-JAN-82 1300 2167 6 1800 3
SCOTT 09-DEC-82 3000 2417 6 2050 2
ADAMS 12-JAN-83 1100 2392 6 1100 1
The window consist of up to 6 rows, the current row and five rows " in front of " this row, where " in front of " is defined by the ORDER BY clause. With ROW partitions, we do not have the limitation of RANGE partition - the data may be of any type and the order by may include many columns. Notice, that we selected out a COUNT(*) as well. This is useful just to demonstrate how many rows went into making up a given average. We can see clearly that for ALLEN's record, the average salary computation for people hired before him used only 2 records whereas the computation for salaries of people hired after him used 6.该窗口中包括6行,现有行及此行“之前”的5行,其中“之前”由ORDER BY语句定义。对于ROW(行)的划分,不受RANGE(范围)划分的限制——数据可以是任何类型,order by可包括许多列。注意,也要选择COUNT(*),可以说明是多少行的平均值。从ALLEN记录可以清楚看到,他之前雇佣员工平均工资的计算使用了2个记录,他之后雇佣员工平均工资的计算使用了6个记录。
Accessing Rows Around Your Current Row访问当前行前后的行
Frequently you want to access data not only from the current row but the current row " in front of " or " behind " them. For example, let's say you need a report that shows, by department all of the employees; their hire date; how many days before was the last hire; how many days after was the next hire.我们常常不仅想访问当前行,还想访问“之前”或“之后”行中的数据。例如,某份报告需要表明各部门的所有员工、员工雇佣日期、距上一雇佣的天数、距下一雇佣的天数。
Using straight SQL this query would be difficult to write. Not only that but its performance would once again definitely be questionable. The approach I typically took in the past was either to "select a select" or write a PL/SQL function that would take some data from the current row and "find" the previous and next rows data. This worked,but introduce large overhead into both the development of the query and the run-time execution of the query.直接编写SQL会比较困难,其执行性能必然存在问题。过去我常用的方法是“select a select”或编写PL/SQL函数,从当前行得到数据,并“找到”之前以及之后行中的数据。这样可以达到目的,查询的开发与运行会带来很大开销。
Using analytic functions, this is easy and efficient to do.使用分析函数,简单易行且有效。
set echo on
column deptno format 99 heading Dep
column ename format a6 heading Ename
column hiredate heading Hired
column last_hire heading LastHired
column days_last heading DaysLast
column next_hire heading NextHire
column days_next heading NextDays
break on deptno skip 1
SELECT deptno, ename, hiredate,
LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) last_hire,
hiredate - LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) days_last,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) next_hire,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) - hiredate days_next
FROM emp
ORDER BY deptno, hiredate
/
Dep Ename Hired LastHired DaysLast NextHire NextDays
--- ------ --------- --------- -------- --------- --------
10 CLARK 09-JUN-81 17-NOV-81 161
KING 17-NOV-81 09-JUN-81 161 23-JAN-82 67
MILLER 23-JAN-82 17-NOV-81 67
20 SMITH 17-DEC-80 02-APR-81 106
JONES 02-APR-81 17-DEC-80 106 03-DEC-81 245
FORD 03-DEC-81 02-APR-81 245 09-DEC-82 371
SCOTT 09-DEC-82 03-DEC-81 371 12-JAN-83 34
ADAMS 12-JAN-83 09-DEC-82 34
30 ALLEN 20-FEB-81 22-FEB-81 2
WARD 22-FEB-81 20-FEB-81 2 01-MAY-81 68
BLAKE 01-MAY-81 22-FEB-81 68 08-SEP-81 130
TURNER 08-SEP-81 01-MAY-81 130 28-SEP-81 20
MARTIN 28-SEP-81 08-SEP-81 20 03-DEC-81 66
JAMES 03-DEC-81 28-SEP-81 66
The LEAD and LAG routines could be considered a way to "index into your partitioned group ". Using these functions you can access any individual row. Notice for example in the above printout, it shows that the record for KING includes the data (in bold red font) from the prior row (LAST HIRE) and the next row (NEXT-HIRE). We can access the fields in records preceding or following the current record in an ordered partition easily. LEAD和LAG例程可看作对分组进行索引。使用这些函数可以访问任何一行。上面例子表明KING记录包括之前一行(上一雇佣)和下一行(下一)中的数据(加粗红色字体)。可以很容易的访问排序后的分组中当前记录之前或之后记录中的字段。
LAG
LAG ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )
LAG provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. LAG无须自合并就可以获取同一个表格中的多行记录。知道来自查询中的多行以及光标的位置,LEAD就可以进入位置指定区距之前的行。
If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the window. If you do not specify default, then its default value is null.如果不指定区距,默认值为1。如果区距超出窗口范围,则返回可指定默认值。如不指定默认值,则默认值为null。
The following example provides, for each person in the EMP table, the salary of the employee hired just before:下例中,EMP表中的每个人之前雇佣员工的工资:
SELECT ename,hiredate,sal,
LAG(sal, 1, 0)
OVER (ORDER BY hiredate) AS PrevSal
FROM emp
WHERE job = 'CLERK';
Ename Hired SAL PREVSAL
------ --------- ----- -------
SMITH 17-DEC-80 800 0
JAMES 03-DEC-81 950 800
MILLER 23-JAN-82 1300 950
ADAMS 12-JAN-83 1100 1300
LEAD
LEAD ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )
LEAD provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position. LEAD不用自合并就可同时进入一个表格中的多行。知道来自查询中的多行以及光标的位置,LEAD就可以进入位置指定区距之后的行。
If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the table. If you do not specify default, then its default value is null.如不指定区距,则默认值为1。如区距超出窗口范围则返回任意默认值。如不指定默认值,默认值为0。
The following example provides, for each employee in the EMP table, the hire date of the employee hired just after:下例,EMP表中的每个员工,他们之后雇佣员工的雇佣日期:
SELECT ename, hiredate,
LEAD(hiredate, 1)
OVER (ORDER BY hiredate) AS NextHired
FROM emp WHERE deptno = 30;
Ename Hired NEXTHIRED
------ --------- ---------
ALLEN 20-FEB-81 22-FEB-81
WARD 22-FEB-81 01-MAY-81
BLAKE 01-MAY-81 08-SEP-81
TURNER 08-SEP-81 28-SEP-81
MARTIN 28-SEP-81 03-DEC-81
JAMES 03-DEC-81
Determine the First Value / Last Value of a Group确定组中的第一个值/最后一个值
The FIRST_VALUE and LAST_VALUE functions allow you to select the first and last rows from a group. These rows are especially valuable because they are often used as the baselines in calculations. FIRST_VALUE与LAST_VALUE函数可以选择一组中的第一行和最后一行。这些行很有用,因为它们经常用作计算的基线。
Example例
The following example selects, for each employee in each department, the name of the employee with the lowest salary. 下例为每个部门的每名员工、最高工资员工的姓名。
break on deptno skip 1
SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal ASC) AS MIN_SAL_HAS
FROM emp
ORDER BY deptno, ename;
DEPTNO ENAME SAL MIN_SAL_HAS
---------- ---------- ---------- -----------
10 CLARK 2450 MILLER
KING 5000 MILLER
MILLER 1300 MILLER
20 ADAMS 1100 SMITH
FORD 3000 SMITH
JONES 2975 SMITH
SCOTT 3000 SMITH
SMITH 800 SMITH
30 ALLEN 1600 JAMES
BLAKE 2850 JAMES
JAMES 950 JAMES
MARTIN 1250 JAMES
TURNER 1500 JAMES
WARD 1250 JAMES
The following example selects, for each employee in each department, the name of the employee with the highest salary.下例中为每个部门的每名员工、最高工资员工的姓名。
SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal DESC) AS MAX_SAL_HAS
FROM emp
ORDER BY deptno, ename;
DEPTNO ENAME SAL MAX_SAL_HAS
---------- ---------- ---------- -----_-----
10 CLARK 2450 KING
KING 5000 KING
MILLER 1300 KING
20 ADAMS 1100 FORD
FORD 3000 FORD
JONES 2975 FORD
SCOTT 3000 FORD
SMITH 800 FORD
30 ALLEN 1600 BLAKE
BLAKE 2850 BLAKE
JAMES 950 BLAKE
MARTIN 1250 BLAKE
TURNER 1500 BLAKE
WARD 1250 BLAKE
The following example selects, for each employee in department 30 the name of the employee with the lowest salary using an inline view下例为第30个部门中的每名员工、工资最低员工的姓名,使用内联视图。
SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (ORDER BY sal ASC) AS MIN_SAL_HAS
FROM (SELECT * FROM emp WHERE deptno = 30)
DEPTNO ENAME SAL MIN_SAL_HAS
---------- ---------- ---------- -----------
30 JAMES 950 JAMES
MARTIN 1250 JAMES
WARD 1250 JAMES
TURNER 1500 JAMES
ALLEN 1600 JAMES
BLAKE 2850 JAMES
Crosstab or Pivot Queries交叉表或Pivot查询
A crosstab query, sometimes known as a pivot query, groups your data in a slightly different way from those we have seen hitherto. A crosstab query can be used to get a result with three rows (one for each project), with each row having three columns (the first listing the projects and then one column for each year) -- like this:交叉表查询,或者pivot查询,用稍微不同的方法将这些数据分组。交叉表查询可根据三行(一行代表一个项目),每行有三列(第一列列出项目,然后一列代表一年)得出结果——如下:
Project 2001 2002
ID CHF CHF
-------------------------------
100 123.00 234.50
200 543.00 230.00
300 238.00 120.50
Example例
Let's say you want to show the top 3 salary earners in each department as columns. The query needs to return exactly 1 row per department and the row would have 4 columns. The DEPTNO, the name of the highest paid employee in the department, the name of the next highest paid, and so on. Using analytic functions this almost easy, without analytic functions this was virtually impossible.如果需要以列的形式显示每个部门中工资最多的3个人,查询为每个部门返回一行,一行有4列。DEPTNO,该部门中工资最高的人,工资第二高人的名字,依次类推。使用分析函数很容易做到,不采用有分析函数这将是不可能的。
SELECT deptno,
MAX(DECODE(seq,1,ename,null)) first,
MAX(DECODE(seq,2,ename,null)) second,
MAX(DECODE(seq,3,ename,null)) third
FROM (SELECT deptno, ename,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) seq
FROM emp)
WHERE seq <= 3
GROUP BY deptno
/
DEPTNO FIRST SECOND THIRD
---------- ---------- ---------- ----------
10 KING CLARK MILLER
20 SCOTT FORD JONES
30 BLAKE ALLEN TURNER
Note the inner query, that assigned a sequence (RowNr) to each employee by department number in order of salary.注意内查询,它按照工资高低根据部门为每名员工赋予一个序列(RowNr)。
SELECT deptno, ename, sal,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) RowNr
FROM emp;
DEPTNO ENAME SAL ROWNR
---------- ---------- ---------- ----------
10 KING 5000 1
10 CLARK 2450 2
10 MILLER 1300 3
20 SCOTT 3000 1
20 FORD 3000 2
20 JONES 2975 3
20 ADAMS 1100 4
20 SMITH 800 5
30 BLAKE 2850 1
30 ALLEN 1600 2
30 TURNER 1500 3
30 WARD 1250 4
30 MARTIN 1250 5
30 JAMES 950 6
The DECODE in the outer query keeps only rows with sequences 1, 2 or 3 and assigns them to the correct "column". The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand if you see the resultset without the aggregate function MAX grouped by deptno. 外查询中DECODE只保留含有序列1、2或3的记录行,并将它们分派到正确的“列”。GROUP BY去掉多余行,只剩下压缩结果。如果没有根据deptno分组的合计函数MAX,结果会更容易理解。
SELECT deptno,
DECODE(seq,1,ename,null) first,
DECODE(seq,2,ename,null) second,
DECODE(seq,3,ename,null) third
FROM (SELECT deptno, ename,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) seq
FROM emp)
WHERE seq <= 3
/
DEPTNO FIRST SECOND THIRD
---------- ---------- ---------- ----------
10 KING
10 CLARK
10 MILLER
20 SCOTT
20 FORD
20 JONES
30 BLAKE
30 ALLEN
30 TURNER
The MAX aggregate function will be applied by the GROUP BY column DEPTNO. In any given DEPTNO above only one row will have a non-null value for FIRST, the remaining rows in that group will always be NULL. The MAX function will pick out the non-null row and keep that for us. Hence, the group by and MAX will collapse our resultset, removing the NULL values from it and giving us what we want.通过GROUP BY列DEPTNO(按DEPTNO分组列)应用MAX合计函数。在上面任意给出的DEPTNO中,只有一行将对FIRST有非null值,其余行将永远是NULL。MAX函数将选出非NULL值并保存。因此,group by和MAX将会压缩结果集,去掉NULL值并给出我们想要的值。
Conclusion结论
This new set of functionality holds some exiting possibilities. It opens up a whole new way of looking at the data. It will remove a lot of procedural code and complex or inefficient queries that would have taken a long tome to develop, to achieve the same result.新的函数集包括了一些现有的可能性,它开辟处理数据的一种全新方法,减少过程代码以及浪费大量时间开发的复杂或低效查询,实现相同的结果。
Links and Documents链接和参考文档
Further articles about Analytic Functions can be found in:有关分析函数的更多文章请看:
o Oracle9i SQL Reference Release 1 (9.0.1) Oracle9i SQL参考版本1(9.0.1)。
o Oracle9i Database Concepts Release 1 (9.0.1) Oracle9i数据库概念版本1(9.0.1)。
o Oracle9i Data Warehousing Guide Release 1 (9.0.1) Oracle9i数据仓储指南版本1(9.0.1)。