Table 9-51. Ordered-Set Aggregate Functions
Function | Direct Argument Type(s) | Aggregated Argument Type(s) | Return Type | Description |
---|---|---|---|---|
mode() WITHIN GROUP (ORDER BY sort_expression) | | any sortable type | same as sort expression | returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results) |
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression) | double precision | double precision or interval | same as sort expression | continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed |
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression) | double precision[] | double precision or interval | array of sort expression's type | multiple continuous percentile: returns an array of results matching the shape of the fractions parameter, with each non-null element replaced by the value corresponding to that percentile |
percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression) | double precision | any sortable type | same as sort expression | discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction |
percentile_disc(fractions) WITHIN GROUP (ORDER BY sort_expression) | double precision[] | any sortable type | array of sort expression's type | multiple discrete percentile: returns an array of results matching the shape of the fractions parameter, with each non-null element replaced by the input value corresponding to that percentile |
All the aggregates listed in Table 9-51 ignore null values in their sorted input. For those that take a fraction parameter, the fraction value must be between 0 and 1; an error is thrown if not. However, a null fraction value simply produces a null result.
postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test values (1,'test1');
INSERT 0 1
postgres=# insert into test values (1,'test1');
INSERT 0 1
postgres=# insert into test values (1,'test2');
INSERT 0 1
postgres=# insert into test values (1,'test3');
INSERT 0 1
postgres=# insert into test values (2,'test1');
INSERT 0 1
postgres=# insert into test values (2,'test1');
INSERT 0 1
postgres=# insert into test values (2,'test1');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# insert into test values (3,'test4');
INSERT 0 1
postgres=# select * from test;
 id | info
----+-------
  1 | test1
  1 | test1
  1 | test2
  1 | test3
  2 | test1
  2 | test1
  2 | test1
  3 | test4
  3 | test4
  3 | test4
  3 | test4
  3 | test4
(12 rows)
postgres=# select mode() within group (order by info) from test;
 mode
-------
 test1
(1 row)

postgres=# select mode() within group (order by info) from test group by info;
 mode
-------
 test1
 test2
 test3
 test4
(4 rows)

postgres=# select mode() within group (order by info) from test group by id;
 mode
-------
 test1
 test1
 test4
(3 rows)
postgres=# select id,info,count(*) from test group by id,info;
 id | info  | count
----+-------+-------
  1 | test1 |     2
  1 | test3 |     1
  3 | test4 |     5
  1 | test2 |     1
  2 | test1 |     3
(5 rows)

postgres=# select id,info,cnt from (select id,info,cnt,row_number() over(partition by id order by cnt desc) as rn from (select id,info,count(*) cnt from test group by id,info) t) t where t.rn=1;
 id | info  | cnt
----+-------+-----
  1 | test1 |   2
  2 | test1 |   3
  3 | test4 |   5
(3 rows)
postgres=# select mode() within group (order by id) from test;
 mode
------
    3
(1 row)

postgres=# select mode() within group (order by id+1) from test;
 mode
------
    4
(1 row)
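The mode() results above can be cross-checked outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: the helper name `mode_within_group` is hypothetical, and it assumes that ties go to the value that sorts first, which is consistent with the first query (test1 and test4 both occur 5 times, and test1 is returned).

```python
from collections import Counter

def mode_within_group(values):
    """Most frequent non-null value; on a tie, the value that sorts first (assumption)."""
    ordered = sorted(v for v in values if v is not None)  # nulls are ignored
    if not ordered:
        return None
    counts = Counter(ordered)
    best = max(counts.values())
    # Scan in sort order so the first value with the top count wins.
    for v in ordered:
        if counts[v] == best:
            return v

# Rows of the first test table as (id, info).
rows = [(1, "test1"), (1, "test1"), (1, "test2"), (1, "test3"),
        (2, "test1"), (2, "test1"), (2, "test1"),
        (3, "test4"), (3, "test4"), (3, "test4"), (3, "test4"), (3, "test4")]

overall = mode_within_group(info for _, info in rows)
by_id = {i: mode_within_group(info for k, info in rows if k == i) for i in (1, 2, 3)}
mode_of_id = mode_within_group(i for i, _ in rows)
print(overall, by_id, mode_of_id)
```

Grouping by id reproduces the same three rows as the row_number() query shown earlier, which is the pre-9.4 way of getting the most frequent value per group.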
src/backend/utils/adt/orderedsetaggs.c

if (percentile < 0 || percentile > 1 || isnan(percentile))
    ereport(ERROR,
            (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
             errmsg("percentile value %g is not between 0 and 1",
                    percentile)));
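To make the fraction rules concrete, here is a small Python sketch that mirrors the C check above together with the documented null behaviour. The function name `check_fraction` and the use of `ValueError` are illustrative assumptions, not part of PostgreSQL.

```python
import math

def check_fraction(percentile):
    """Return None for a null fraction, raise for an invalid one, else return it."""
    if percentile is None:
        # A null fraction simply produces a null result.
        return None
    # Same condition as the C snippet: out of [0, 1] or NaN is an error.
    if percentile < 0 or percentile > 1 or math.isnan(percentile):
        raise ValueError(f"percentile value {percentile:g} is not between 0 and 1")
    return percentile

print(check_fraction(0.5))
print(check_fraction(None))
```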
postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test values (1,'test1');
INSERT 0 1
postgres=# insert into test values (2,'test2');
INSERT 0 1
postgres=# insert into test values (3,'test2');
INSERT 0 1
postgres=# insert into test values (4,'test2');
INSERT 0 1
postgres=# insert into test values (5,'test2');
INSERT 0 1
postgres=# insert into test values (6,'test2');
INSERT 0 1
postgres=# insert into test values (7,'test2');
INSERT 0 1
postgres=# insert into test values (8,'test3');
INSERT 0 1
postgres=# insert into test values (100,'test3');
INSERT 0 1
postgres=# insert into test values (1000,'test4');
INSERT 0 1
postgres=# select * from test;
  id  | info
------+-------
    1 | test1
    2 | test2
    3 | test2
    4 | test2
    5 | test2
    6 | test2
    7 | test2
    8 | test3
  100 | test3
 1000 | test4
(10 rows)
postgres=# select percentile_cont(0.5) within group (order by id) from test;
 percentile_cont
-----------------
             5.5
(1 row)
If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)
N = number of rows in the current group = 10
RN = (1 + fraction * (N - 1)) = (1 + 0.5 * (10 - 1)) = 5.5
CRN = ceiling(RN) = 6
FRN = floor(RN) = 5
value of expression for row at FRN: the value in row FRN of the current group = 5
value of expression for row at CRN: the value in row CRN of the current group = 6
So the final median value is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN) = (6 - 5.5) * 5 + (5.5 - 5) * 6 = 5.5
postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;
 percentile_cont | info
-----------------+-------
               1 | test1
             4.5 | test2
              54 | test3
            1000 | test4
(4 rows)
The test2 group contains these rows:
    2 | test2
    3 | test2
    4 | test2
    5 | test2
    6 | test2
    7 | test2

N = number of rows in the current group = 6
RN = (1 + fraction * (N - 1)) = (1 + 0.5 * (6 - 1)) = 3.5
CRN = ceiling(RN) = 4
FRN = floor(RN) = 3
value of expression for row at FRN: the value in row FRN of the current group = 4
value of expression for row at CRN: the value in row CRN of the current group = 5
So the final median value is:
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN) = (4 - 3.5) * 4 + (3.5 - 3) * 5 = 4.5
postgres=# select percentile_cont(array[0.5, 1]) within group (order by id) from test;
 percentile_cont
-----------------
 {5.5,1000}
(1 row)
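The RN/CRN/FRN calculation worked through above can be written as a short Python sketch. This is an illustration of the interpolation formula under stated assumptions, not PostgreSQL's code: the function name `percentile_cont` is reused only for readability, and the array form is emulated with a list comprehension.

```python
import math

def percentile_cont(fraction, values):
    """Linear interpolation between the rows at FRN and CRN (1-based row numbers)."""
    if fraction is None:
        return None
    ordered = sorted(v for v in values if v is not None)  # nulls are ignored
    if not ordered:
        return None
    n = len(ordered)
    rn = 1 + fraction * (n - 1)            # RN = 1 + P * (N - 1)
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:                          # RN is a whole row number
        return ordered[frn - 1]
    return (crn - rn) * ordered[frn - 1] + (rn - frn) * ordered[crn - 1]

ids = [1, 2, 3, 4, 5, 6, 7, 8, 100, 1000]    # id column of the second test table

print(percentile_cont(0.5, ids))              # whole table: 5.5
print(percentile_cont(0.5, ids[1:7]))         # the six test2 rows: 4.5
print([percentile_cont(f, ids) for f in (0.5, 1)])  # array form
```

With fraction 1, RN equals N, so no interpolation happens and the largest value (1000) is returned as is.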
postgres=# select id from test;
  id
------
    1
    2
    3
    4
    5
    6
    7
    8
  100
 1000
(10 rows)

postgres=# select percentile_disc(0.5) within group (order by id) from test;
 percentile_disc
-----------------
               5
(1 row)

postgres=# select percentile_disc(0.5) within group (order by id^2) from test;
 percentile_disc
-----------------
              25
(1 row)

postgres=# select percentile_disc(0.11) within group (order by id) from test;
 percentile_disc
-----------------
               2
(1 row)
postgres=# select id,info,count(*) over (partition by info) from test;
  id  | info  | count
------+-------+-------
    1 | test1 |     1
    2 | test2 |     6
    3 | test2 |     6
    4 | test2 |     6
    5 | test2 |     6
    6 | test2 |     6
    7 | test2 |     6
    8 | test3 |     2
  100 | test3 |     2
 1000 | test4 |     1
(10 rows)

postgres=# select info,percentile_disc(0.3) within group (order by id) from test group by info;
 info  | percentile_disc
-------+-----------------
 test1 |               1
 test2 |               3
 test3 |               8
 test4 |            1000
(4 rows)
postgres=# select percentile_cont(0.5) within group (order by id^2),info from test group by info;
 percentile_cont | info
-----------------+-------
               1 | test1
            20.5 | test2
            5032 | test3
         1000000 | test4
(4 rows)

postgres=# select percentile_cont(0.5) within group (order by id),info from test group by info;
 percentile_cont | info
-----------------+-------
               1 | test1
             4.5 | test2
              54 | test3
            1000 | test4
(4 rows)

postgres=# select 4.5^2;
      ?column?
---------------------
 20.2500000000000000
(1 row)

postgres=# select 54^2;
 ?column?
----------
     2916
(1 row)
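The four queries above show that interpolating over id^2 is not the same as squaring the interpolated id (20.5 vs 20.25 for test2, 5032 vs 2916 for test3). A quick way to reproduce this is Python's `statistics.median`, which for these inputs is assumed to match percentile_cont(0.5) because both average the two middle values of an even-sized group. The variable names are illustrative.

```python
from statistics import median

test2_ids = [2, 3, 4, 5, 6, 7]
test3_ids = [8, 100]

results = []
for ids in (test2_ids, test3_ids):
    median_of_squares = median(i ** 2 for i in ids)   # like ORDER BY id^2
    square_of_median = median(ids) ** 2               # like squaring the ORDER BY id result
    results.append((median_of_squares, square_of_median))
    print(median_of_squares, square_of_median)
```

The interpolation is linear in the sorted values, so it only commutes with linear transformations of the sort expression. A discrete percentile returns an actual input row, which is why percentile_disc(0.5) over id^2 gave exactly 5^2 = 25 earlier.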
Explanation of the PERCENTILE_CONT function:

The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, you can compute the row number you are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1 + (P*(N-1))). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
The final result will be:
If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)
Explanation of the PERCENTILE_DISC function:

The first expr must evaluate to a numeric value between 0 and 1, because it is a percentile value. This expression must be constant within each aggregate group. The ORDER BY clause takes a single expression that can be of any type that can be sorted.
For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.
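The CUME_DIST rule just described can be sketched in Python as follows. This is a hedged illustration, not the database's implementation: the function name is reused for readability, and the cumulative distribution at sorted position k of n rows is taken as k / n, which returns the same value as CUME_DIST even when there are ties.

```python
def percentile_disc(fraction, values):
    """First value in sort order whose cumulative distribution is >= fraction."""
    if fraction is None:
        return None
    ordered = sorted(v for v in values if v is not None)  # nulls are ignored
    n = len(ordered)
    for position, value in enumerate(ordered, start=1):
        if position / n >= fraction:
            return value
    return None  # empty input

ids = [1, 2, 3, 4, 5, 6, 7, 8, 100, 1000]    # id column of the second test table
groups = {"test1": [1], "test2": [2, 3, 4, 5, 6, 7], "test3": [8, 100], "test4": [1000]}

print(percentile_disc(0.5, ids))
print(percentile_disc(0.11, ids))
print({info: percentile_disc(0.3, g) for info, g in groups.items()})
```

For fraction 0.11 the first row has a cumulative distribution of 0.1, which is too small, so the second row (2) is returned, matching the query output shown earlier.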
MEDIAN in detail. Oracle has a dedicated function for computing the median, which is effectively PERCENTILE_CONT(0.5):

MEDIAN is an inverse distribution function that assumes a continuous distribution model. It takes a numeric or datetime value and returns the middle value or an interpolated value that would be the middle value once the values are sorted. Nulls are ignored in the calculation.

This function takes as arguments any numeric data type or any nonnumeric data type that can be implicitly converted to a numeric data type. If you specify only expr, then the function returns the same data type as the numeric data type of the argument. If you specify the OVER clause, then Oracle Database determines the argument with the highest numeric precedence, implicitly converts the remaining arguments to that data type, and returns that data type.

The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N-1))). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
The final result will be:
if (CRN = FRN = RN) then
    (value of expression from row at RN)
else
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)