Ratio_to_report
(page 259)
The analytic function ratio_to_report calculates the ratio of a value to the sum of values in the data
partition. If the partitioning clause is not specified, this function calculates the ratio of a value to the sum
values in the whole result set. This analytic function is very useful in calculating ratios at various levels
without a need for self-joins.
分析函数ratio_to_report计算在数据分区中对汇总的比率值。如果分区子句没有指定,函数在整个结果集上计算对汇总的比率值。 该分析函数对计算各个层次上的比率,而不需要自连接,是很有用的。
Ratio_to_report is useful in computing the percentage of a value compared to the total value in a
report. For example, consider a sales report of a product in a retail chain.
Each outlet in the retail chain
contributed to the total sum of sales computed for that product , and knowing what percentage of sales is generated from an outlet is quite useful for market trend analysis. Ratio_to_report allows you to
compute the percentage easily. Further, this ratio can be calculated at various levels such as district,
region, and country.
Essentially, data can be diced and sliced at various ways for market trend analysis.
Ratio_to_report 在报告中计算某数值相对于总数的百分比是有用的。 例如,考虑一零售链上某产品的销售报告。
在零售链中的每一分店某款产品的销售额汇总成了那款产品的总销售金额 ,知道某一分店的销售额百分比,对于市场趋势分析是非常有用的。Ratio_to_report使得你计算百分比非常容易。再者,这个比率能在各种层次上计算,诸如district(地区),region(区域), and country(国家).
本质上,数据可以用各种方式切割 以利于市场趋势分析。
In the Listing 8-14, the SQL statement is computing two ratios: Sales_ratio_yr is computed at
product, country, region, and year level and the ratio sales_ratio_prod is computed at the product,
country, and region level. Ratio_to_report function returns a ratio and it is multipled by 100 to compute
percentage.
在列表8-14,SQL语句计算两个比率值:Sales_ratio_yr 计算product, country, region,和 year 层的,sales_ratio_prod计算product, country, 和region 层次的。Ratio_to_report函数返回比率值再乘以100得到百分比。
The ratio_to_report(sale) over( partition by product, country, region, year) clause
calculates the ratio of Sale column value to the sum of Sale column values in a data partition, partitioned
by the columns Product, Country, Region, and Year. The next clause ratio_to_report(sale) over(
partition by product, country, region) is different as the Year column is not included in the
partitioning columns. So, the ratio is calculated for all years.
ratio_to_report(sale) over( partition by product, country, region, year) 子句在数据分区中计算Sale列值对汇总Sale列值的比率,按列Product, Country, Region, 和Year分区。下一个子句 ratio_to_report(sale) over( partition by product, country, region) 的区别是Year列没有包含在数据分区中。这样计算的是在所有年份中的比率。
The ratio_to_report function will return a null value if the expression or column specified in the
function returns null values. But other null values in the data partition will be handled as either zero
values or empty strings, similar to aggregation functions.
如果在函数中指定的表达式或列返回null值则 ratio_to_report 函数将返回null值。但是在数据分区中的其他的null值将被处理成要么0值要么空串,类似于聚合函数。
Listing 8-14. ratio_to_report Function
1 select year, week,sale,
2
trunc(100*
3
ratio_to_report(sale) over(partition by product, country, region ,year)
4
,2) sales_yr,
5
trunc(100*
6
ratio_to_report(sale) over(partition by product, country, region)
7
,2) sales_prod
8 from sales_fact
9 where country in ('Australia') and product ='Xtend Memory'
10 order by product, country,year, week
/
YEAR WEEK SALE SALES_YR SALES_PROD
---- ---- ---------- ---------- ----------
2000 1 46.70 1.31 .35
2000 3 93.41 2.63 .7
2000 4 46.54 1.31 .35
2000 5 46.70 1.31 .35
2000 7 70.80 2 .53
2000 8 46.54 1.31 .35
...
Percent_rank
The percent_rank function returns the rank of a value in a data partition, expressed as a fraction between 0 and 1. Percent_rank is calculated as (rank -1)/(N-1) whereas N is the number of elements in the data partition if the partitioning clause is specified or the total number of rows in the result set if the
partitioning clause is not specified. The percent_rank function is useful to compute the relative standing
of a value in a result set, as a percentile.
percent_rank 返回数据分区中的秩(rank)值,用一个0到1之间分数表示。percent_rank的计算公式是(rank -1)/(N-1),这里的N,如果分区子句指定了则是数据分区中的元素数量,若没有指定分区子句则是结果集中的所有行数。 percent_rank函数 在计算一个结果集中的值的相对位置时是有用的,用百分率表示。
This relative rank can be calculated relative to a partition or the whole result set. For example,
computing the sales percentile of a retail outlet in a district or region helps find the top performing
outlets or the worst performing outlet.
相对秩能相对于分区或整个结果集来计算。例如,计算在一地区或区域某零售商店的销售百分比有助于找出业绩最佳和最差的商店。
In Listing 8-15, you calculate the top fifty sale percentile by year using the percent_rank function.
The clause percent_rank() over(partition by product, country, region , year order by sale desc)
calculates the percent rank of the Sale column in a data partition defined by the partitioning columns
Product, Country, Region, and Year. Rows are ordered by the Sale column in descending order. Function
output is multiplied by 100 to compute percentile.
在列表8-15中,你使用percent_rank函数按年计算前50名销售额的百分比值。子句percent_rank() over(partition by product, country, region , year order by sale desc) 计算分区列所定义数据分区中Sale列的百分比秩。
Listing 8-15. percent_rank Function
1 select * from (
2 select year, week,sale,
3
100 * percent_rank() over(
4
partition by product, country, region, year
5
order by sale desc
6
) pr
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9 ) where pr <50
10* order by year, sale desc
YEAR WEEK SALE PR
----- ---- ---------- -------
2001 16 278.44 .00
2001 4 256.70 2.27
2001 21 233.70 4.55
2001 48 182.96 6.82
...
Percentile_cont
The percentile_cont function is useful to compute the interpolated values, such as the median
household income per region or city.
The percentile_cont function takes a probability value between 0
and 1 and returns an interpolated percentile value that would equal the percent_rank value with respect
to the sort specification. In fact, percentile_cont function performs inverse of percent_rank function
and it is easier to understand the percentile_cont function in conjunction with the output of
percent_rank function.
percentile_cont函数用于计算内插值,诸如每一区域或城市家庭收入的中位数。percentile_cont函数接收一个0到1的概率值, 返回一内插的百分比值,它等于percent_rank相对于排序规范的值。事实上,percentile_cont函数是 percent_rank函数的倒数,结合percent_rank函数的输出就容易理解percentile_cont函数了。
The percentile_cont function retrieves the column value matching (or interpolated) with the
percent_rank of the argument. For example, the clause percentile_cont(0.25) retrieves the value that
has percent_rank of 0.25, assuming matching sort order for these two functions. Another example is
computing the median household income in a city or region.
The median value will have a percent_rank
of 0.5 by the definition of median value. The clause percentile_cont(0.5) will return the median value
as the percentile_cont function is calculating the value with a percent_rank of 0.5.
In fact, median
function is a specific case of the percentile_cont function with a default value of 0.5.
percentile_cont函数检索匹配(或者差值替换)percent_rank参数的列值。例如,子句 percentile_cont(0.25) 检索percent_rank为0.25的值,假定这两个函数的排序顺序一致。另一例子是计算某一城市或区域家庭收入的中位数。
依照中位数值的定义中位数对应的percent_rank值是0.5。 子句 percentile_cont(0.5) 将返回中位数,因为percentile_cont函数计算的是percent_rank为0.5的值。
事实上,median函数是 percentile_cout 函数默认值为0.5的特例。
Nulls are ignored by the function. This function does not support windowing clauses either.
percentile_cont函数忽略Null。该函数也不支持开窗子句。
Syntax for the percentile_cont function is: percentile_cont 函数的语法是:
Percentile_cont(expr) within group (sort-clause)
over (partition-clause order-by-clause)
The syntax for the percentile_cont function is slightly different from the analytic functions
discussed so far.
A new clause within group (order by sale desc) replaces the order-by clause, and it
is functionally same as specifying an order-by clause. In Listing 8-16, the clause percentile_cont (0.5)
within group (order by sale desc) over( partition by product, country, region , year) is calling
the percentile_cont function and passing a probability value of 0.5. Sort order is defined by the clause
within group (order by sale desc) . The partition-by clause over( partition by product, country,
region , year) is specifying the partitioning columns.
percentile_cont函数的句法不同于现在为止所讨论的那些分析函数。
新子句group (order by sale desc)代替了order-by子句,且它的功能和order-by子句一样。 列表8-16中,子句percentile_cont (0.5) within group (order by sale desc) over( partition by product, country, region , year) 调用percentile_cont函数传入概率值0.5。排序顺序由子句group (order by sale desc) 定义。分区(partition-by)子句 over( partition by product, country, region , year) 指定分区列。
Listing 8-16 shows the output of percent_rank in a side-by-side comparison to that from
percentile_cont , with a similar partition-by clause and order-by clause. Notice that for the column
values year=2001 and week=5, the Sale column value is 93.44 and the percent_rank of that value is 0.5.
Essentially, value of 93.44 is occurring with a percent_rank of 0.5 in the descending order of Sale column
values in that data partition. In a nutshell, the value of 93.44 is a median value and thus the percent_rank is 0.5. Hence the percent_rank function with an argument of 0.5 returns a value of 93.44.
列表8-16展示了,带有相似分区(partition-by)子句和排序(order-by)子句的,percent_rank一对一 percentile_cont 的输出。注意列值year=2001 and week=5,Sale列值是93.44,percent_rank值是0.5 。实质上,值93.44出现在那个数据分区降序排列的Sale列值的percent_rank为0.5处。简而言之,值93.44是中位数且 percent_rank是0.5。因此percent_rank函数带有参数0.5返回的是值93.44 。
Further, note that the output row for the column values with year=2000. There is no Sale column
value with a percent_rank matching exactly to 0.5 in the data partition.
If there is no value matching
exactly, then the percentile_cont function computes an interpolated value using the nearest values.
Note that there is a row in that data partition with a percent_rank of 0.48 for the Sale column value of
79.36 and the next row in that sort order has a percent_rank of 0.51 for the Sale column value of 78.82.
Since the specified percent_rank of 0.5 is between 0.48 and 0.51, the percentile_cont function
interpolated these two corresponding Sale column values 79.36 and 78.82 and calculated
percentile_cont (0.5) as 79.09, an average of those two Sale column values.
Values are averaged as this
function assumes continuous distribution.
还有,注意列值为year=2000所在输出行。在数据分区中没有percent_rank准确匹配0.5的Sale列值。
如果没有准确的匹配值,则 percentile_cout函数计算一个最接近的内插值。 注意在数据分区中有一行的percent_rank值是0.48和Sale列值为79.36 且顺序的下一行的percent_rank值是0.51和Sale列值是78.82。因此指定的percent_rank值0.5介于0.48和0.51 之间,percentile_cont函数内插于这两者对应的Sale列值79.36和78.82中且计算percentile_cont (0.5) 是79.09,这两个Sale列值的平均值。
因为函数假定是连续分布所以取的平均值。
Notice that output rows are not sorted in the Listing 8-16.
Reason is that, even though there is an
order-by clause specified in the analytic function specification (line 3 and line 7), there is no order-by
clause in the main body of the query.
Should you need rows to be sorted, you need to specify sorting
order explicitly in the main body of the query also.
注意在列表8-16中输出行没有排序。
因为,即使在分析函数规范中指定order-by子句(行3和行7),在查询的主体中没有order-by子句。如果你需要行有序,就需要在查询主体中明确的指定排序顺序。
Listing 8-16. The percentile_cont Function
1 select year, week,sale,
2
percentile_cont (0.5) within group
3
(order by sale desc)
4
over( partition by product, country, region , year ) pc,
5
percent_rank () over (
6
partition by product, country, region , year
7
order by sale desc ) pr
8 from sales_fact
9* where country in ('Australia') and product ='Xtend Memory'
YEAR WEEK SALE PC PR
---- ---- ---------- ---------- ----------
...
2001 27 94.48 93.44 .454545455
2001 46 93.58 93.44 .477272727
2001 5 93.44 93.44 .5
2001 37 93.16 93.44 .522727273
2001 9 92.67 93.44 .545454545
...
2000 40 89.56 79.09 .435897436
2000 28 88.96 79.09 .461538462
2000 38 79.36 79.09 .487179487
2000 35 78.82 79.09 .512820513
2000 7 70.80 79.09 .538461538
2000 15 70.47 79.09 .564102564
...
Percentile_disc
The percentile_disc function is functionally similar to percentile_cont except that the percentile_cont function uses a continuous distribution model and the percentile_disc function assumes a discrete distribution model. As discussed in the earlier section, when there is no value matching exactly with the specified percent_rank, then the percentile_cont(0.5) computes an average of two nearest values. In contrast, the percentile_disc function retrieves the value with a percent_rank just greater than the passed argument, in the case of ascending order. In the case of descending order, the percentile_cont function retrieves the value that has a percent_rank just smaller than the passed argument.
percentile_dise函数在功能上相似于percentile_cont函数,只不过percentile_cont函数使用连续分布模型而 percentile_disc函数假定离散分布模型。 正如前面一节讨论的,当没有值准确的匹配指定的percent_rank,则 percentile_cont(0.5)计算两个最接近的平均值。相比而言,若升序percentile_disc函数检索其percent_rank 大于传入参数的值。若降序,percentile_disc函数检索其percent_rank小于传入参数的值。
In Listing 8-17, the percentile_cont function is replaced by two calls to the percentile_disc function. The first call to the function starting in line 2 specifies decending sort order, and the next call in line 4 specifies no sort order, so it defaults to ascending sort order. In both calls to the percentile_disc function, and argument of 0.5 is passed. as there is no row with a percent_rank of 0.5, the percentile_disc function with the descending sort order specification returns a value of 79.36, as this value has a percent_rank of 0.48 just below the specified argument 0.5. For the ascending order, this function return a value of 78.82 as this value has a percent_rank of 0.51 ---- just above 0.5 .
在列表8-17中,percentile_cont 函数被两次percentile_disc函数的调用所取代。 第一次函数调用开始于第2行,指定为降序。下一次调用在第4行没有指定排序顺序,所以取默认 值升序排列。两次percentile_disc函数调用,传入的参数都是0.5. 因为没有任何行的percent_rank为0.5,指定降序排列的percentile_disc函数返回值79.36,因为这个值的 percent_rank是0.48低于指定的参数0.5。对于升序,函数返回值78.82该值的percent_rank是0.51 ---- 正好高于0.5 。
Listing 8-17. The percentile_disc Function
1 select year, week, sale,
2 percentile_disc(0.5) within group( order by sale desc )
3 over( partition by product, country, region, year) pd_desc,
4 percentile_disc(0.5) within group( order by sale )
5 over( partition by product, country, region, year) pd_asc,
6 percent_rank() over(
7 partition by product, country, region, year
8 order by sale desc ) pr
9 from sales_fact
10* where country in ('Australia') and product = 'Xtend Memory'
YEAR WEEK SALE PD_DESC PD_ASC PR
---- ---- ---------- ---------- ---------- ----------
2000 3 93.41 79.36 78.82 .41025641
2000 40 89.56 79.36 78.82 .435897436
2000 28 88.96 79.36 78.82 .461538462
2000 38 79.36 79.36 78.82 .487179487
2000 35 78.82 79.36 78.82 .512820513
2000 7 70.80 79.36 78.82 .538461538
2000 15 70.47 79.36 78.82 .564102564
2000 45 67.62 79.36 78.82 .58974359
2000 52 67.45 79.36 78.82 .615384615