NTILE (page 263)
The NTILE function divides an ordered set of rows in a data partition, groups them in to buckets , and
assigns a unique group number to each group. This function is useful in statistical analysis. For example,
if you want to remove the outliers (values that are outside the norm), you can group them in the top or
bottom buckets and eliminate those values from the statistical analysis. Oracle Database statistics
collection packages also use NTILE functions to calculate histogram boundaries. In statistical
terminology, the NTILE function creates equi-width histograms.
NTILE函数在一数据分区中划分一有序行集,将他们分组成 桶 ,再赋予每一组唯一的组号。该函数在统计分析中很有用。 例如,如果你想要去除极端值(超出正常范围的值),你能把它们归组到头或者末位桶,再从统计分析的数据中把它们清除掉。Oracle数据库统计集合包也使用NTILE函数计算柱状图边界。用统计术语说,NTILE函数创建等宽柱状图。
The number of buckets is passed as the argument to this analytic function. For example, ntile(100)
will group the rows into 100 buckets, assigning an unique number for each bucket. This function does
not support windowing clauses, however.
桶的数量作为参数传给该分析函数。例如,ntile(100)将把行集分成100桶,赋予每桶唯一的数。然而,该函数不支持开窗子句。
In the Listing 8-18, you split a data partition into 10 buckets using the clause ntile (10). Rows are
sorted by the Sale column in the descending order. The NTILE function groups rows into buckets with
each bucket containing equal number of rows. Since the rows are sorted by the Sale column values in
descending order, rows with lower group numbers have higher Sale column value. Outliers in the data
can be easily removed with this technique.
在列表8-18中,你使用子句ntile(10)划分数据区成10桶。行集按Sale列降序排列。NTILE函数将行集归组成桶,使得各桶包含相等数量的行。由于行集按Sale列降序排列,低组号的行集对应的Sale列值高。使用这种技术数据区中的极端值容易被去除。
There may be a row count difference of at most 1 between the buckets if the rows can not be divided
equally. In this example, rows for the year= 2001 is divided in to 10 buckets, each bucket having 5 rows,
but the last bucket 10 has only 4 rows.
如果行集不能均匀划分可能使得桶之间的行数最多相差1。 在本例中,year=2001的行集划分成10桶,每桶有5行,但是最后第10桶只有4行。
Listing 8-18. NTILE Function
1 select year, week,sale,
2 ntile (10) over(
3 partition by product, country, region , year
4 order by sale desc
5 ) group#
6 from sales_fact
7* where country in ('Australia') and product ='Xtend Memory'
YEAR WEEK SALE GROUP#
----- ---- ---------- ----------
2001 16 278.44 1
2001 4 256.70 1
2001 21 233.70 1
2001 48 182.96 1
2001 14 162.91 1
...
2001 52 23.14 9
2001 50 23.14 10
2001 6 22.44 10
2001 23 22.38 10
2001 18 22.37 10
The NTILE function is useful in real world applications such as dividing total work among N parallel
processes. Let’s say you have ten parallel processes; you can divide the total work into 10 buckets and
assign each bucket to a process.
NTILE 函数在现实应用中很有用,诸如划分整体工作成N个并发的进程。我们说你有十个并发的进程;你可划分整个工作成10个桶然后赋予每个桶一个进程。
Stddev
The stddev function can be used to calculate standard deviation among a set of rows in a data partition or in the result set if no partitioning clause is specified. This function calculates the standard deviation, defined as square root of variance, for a data partition specified using a partitioning clause. If
partitioning clause is not specified, this function calculates the stddev for all rows in the result set.
stddev能被用于 计算 在一数据分区,或者若没有指定分区子句则在结果集,的行集中的标准差。 对于一使用分区子句指定的数据分区,该函数计算标准差,定义平方根的变化 。如果分区子句没有指定,该函数计算结果集所有行的标准差。
In the Listing 8-19, the clause stddev (sale) is calculating the stddev on Sale column among the
rows in a data partition. Partitioning clause partition by product, country, region, year specifies
the partitioning columns. The windowing clause rows between unbounded preceding and unbounded
following specifies the window as all rows in that data partition. Essentially, this SQL is calculating the
standard deviation on Sale column amongst all rows in a data partition.
在列表8-19中,子句stdev(sale)计算在一数据分区行集中Sale列的标准差。分区子句 partition by product, country, region, year 指定分区列。开窗子句 rows between unbounded preceding and unbounded following指定那个数据分区的所有行为窗口。本质上,这条SQL计算在数据分区上所有行间Sale列的标准差。
Standard deviation can be calculated at coarser or granular level by specifying appropriate
partition-by clause and windowing clause.
能够通过指定适当的分区子句和开窗子句在粗粒度 或细粒度 层上计算标准差。
Listing 8-19. STDDEV Function
1 select year, week,sale,
2 stddev (sale) over(
3 partition by product, country, region , year
4 order by Sale desc
5 rows between unbounded preceding and unbounded following
6 ) stddv
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9* order by year, week
YEAR WEEK SALE STDDV
----- ---- ---------- ----------
...
2000 50 21.19 49.8657423
2000 52 67.45 49.8657423
2001 1 92.26 59.1063592
2001 2 118.38 59.1063592
2001 3 47.24 59.1063592
...
There are various other statistics functions that can be used to calculate statistical metrics; for
example, stddev_samp calculates the cumulative sample standard deviation , stddev_pop calculates the
population standard deviation , etc. Detailed discussion about various statistics functions is out of the
scope of this book, however.
有多种其他的统计函数能用来计算统计标准;例如stddev_samp计算 累积样本标准差 ,stddev_pop计算总体标准差 ,等等。然而,关于 各种统计函数的 详细讨论超出了本书的范围。
Listagg
Oracle Database version 11gR2 introduced another analytic function, the Listagg function, which is very useful in string manipulation. This analytic function provides the ability to convert column values from
multiple rows in to a list format. For example, if you want to concatenate all the employee names in a
department, then you can use this function to concatenate all names in to a list.
Oracle数据库版本11gR2 引入了另一个分析函数,Listagg函数,在字符串操作上十分有用。该分析函数提供把多行列值转换成一格式化列表的能力。 例如,如果你想连结某部门的所有员工名字,你就能用这个函数吧所有名字连成一列表。
Syntax for this function is of the format: 该函数的语法格式:
Listagg (string, separator ) within group (order-by-clause)
Over (partition-by-clause )
Syntax for the Listagg function uses the clause within group (order-by-clause) to specify sorting
order. This clause is similar to order-by clause in other analytic functions. The first argument to this
function is the string or column name to concatenate. The second argument is the separator for the
values. In the Listing 8-20, the partitioning clause is not specified and rows are ordered by the Country
column in the descending order. The output shows that country names are converted to a list separated
by comma.
Listagg 函数使用子句 within group (order-by-clause) 定义排序顺序。该子句类似于 其他分析函数中的order-by子句。该函数的第一个参数是要连结的字符串或者列名 。第二个参数是这些值的分隔符。 在列表8-20中,分区子句没有指定且行按照Country列降序排列。输出展示,国家名称被转换成由逗号分隔的列表了。
Note that Listagg function does not support windowing clauses.
注意Listagg函数也不支持开窗子句。
Listing 8-20. LISTAGG Function
1 select listagg (country, ',')
2 within group (order by country desc)
3 from (
4 select distinct country from sales_fact
5 order by country
6* )
LISTAGG(COUNTRY,',')WITHINGROUP(ORDERBYCOUNTRYDESC)
--------------------------------------------------------------
United States of America,United Kingdom,Turkey,Spain,Singapore,
Saudi Arabia,Poland,New Zealand, Japan,Italy,Germany,France,
Denmark,China,Canada,Brazil,Australia,Argentina