group by 用来在原始数据上创建聚合来将数据转化为有用的信息。
基本的group by 列出个个部门的名称,员工总数
select d.dname, count(empno) empcount
from scott.dept d
left outer join scott.emp e
on d.deptno = e.deptno
group by d.dname
order by d.dname;
select列表中的每一列必须包含在group by子句中。如果没有则会导致错误。如:
SQL> select d.dname, d.loc, count(empno) empcount
2 from scott.emp e
3 join scott.dept d
4 on d.deptno = e.deptno
5 group by d.dname;
select d.dname, d.loc, count(empno) empcount
*
第 1 行出现错误:
ORA-00979: 不是 GROUP BY 表达式
尽管包含group by子句的select语句输出看上去是按顺序列出的,你不能期待group by每次都排好序地返回数据,如果输出结果必须按照一定的顺序排列,则必须使用order by子句。
--没有排序的group by
select deptno,count(*)
from emp
group by deptno;
--复杂的sql
set serveroutput off;
--复杂的sql
select /* lst7-4 */
distinct dname,
decode(d.deptno,
1,
(select count(*) from emp where deptno = 1),
2,
(select count(*) from emp where deptno = 2),
3,
(select count(*) from emp where deptno = 3),
(select count(*) from emp where deptno not in (1, 2, 3))) dept_count
from (select distinct deptno from emp) d
join dept d2
on d.deptno = d.deptno;
@E:\bjc2016\study\pln lst7-4
上面的写法,会使SQL语句更加复杂难以理解并且难以维护。group by子句极大的简化必须写的sql语句以外,还消除了数据库不必要的IO。
set serveroutput off;
--复杂的sql
select /* lst7-5 */
distinct dname,
count(empno) empcount
from emp e
join dept d
on d.deptno = d.deptno
group by d.dname
order by d.dname;
@E:\bjc2016\study\pln lst7-5
group by 优点:
- 使sql语句更具有可读性
- 书写起来比使用很多相关子查询更简单
- 减少了重复访问同一个数据块的次数,从而得到更好的性能。
在分组之后,还在数据集上应用了having子句。另一方面,在获取数据行之后,进行分组之前,应用了where子句。having 子句中可以使用运算,函数及子查询。
SQL> --having子句
SQL> select /* lst7-6 */
2 d.dname, trunc(e.hiredate, 'yyyy') hiredate, count(empno) empcount
3 from emp e
4 join dept d
5 on e.deptno = e.deptno
6 group by d.dname, trunc(e.hiredate, 'yyyy')
7 having count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select min(hiredate)
8 from scott.emp) and (select max(hiredate)
9 from scott.emp)
10 order by d.dname;
未选定行
SQL> @E:\bjc2016\study\pln lst7-6
原值 8: WHERE UPPER(SQL_TEXT) LIKE '%&1%'
新值 8: WHERE UPPER(SQL_TEXT) LIKE '%lst7-6%'
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 0wcfknkztdxqt, child number 0
-------------------------------------
select /* lst7-6 */ d.dname, trunc(e.hiredate, 'yyyy') hiredate,
count(empno) empcount from emp e join dept d on e.deptno =
e.deptno group by d.dname, trunc(e.hiredate, 'yyyy') having
count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select
min(hiredate)
from scott.emp) and (select max(hiredate)
from
scott.emp) order by d.dname
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 239717969
---------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
|* 1 | FILTER | | | | | |
| 2 | SORT GROUP BY | | 1 | 2048 | 2048 | 2048 (0)|
| 3 | MERGE JOIN CARTESIAN| | 40 | | | |
| 4 | TABLE ACCESS FULL | DEPT | 4 | | | |
| 5 | BUFFER SORT | | 10 | 2048 | 2048 | 2048 (0)|
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 6 | TABLE ACCESS FULL | EMP | 10 | | | |
| 7 | SORT AGGREGATE | | 1 | | | |
| 8 | TABLE ACCESS FULL | EMP | 10 | | | |
| 9 | SORT AGGREGATE | | 1 | | | |
| 10 | TABLE ACCESS FULL | EMP | 10 | | | |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((COUNT(*)<=5 AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
fmyyyy')>= AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'fmyyyy')<=))
Note
-----
- Warning: basic plan statistics not available. These are only collected when
:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system leve
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
l
已选择41行。
SQL>
group by的cube扩展
当与group by子句一起使用时,将会使得对每一行都要考虑包含在cube的参数中的所有可能的元素组合。这个运算将会生成比表中实际存在的行数更多的数据行。
-- hr.emplyees表的cube运算
select last_name, first_name
from hr.employees
group by first_name, last_name;
set autotrace off;
set autotrace on statistics;
with emps as
(select /* lst-7 */
last_name, first_name
from hr.employees
group by cube(last_name, first_name))
select rownum, last_name, first_name from emps;
对于每一对last_name,first_name,cube将会按顺序为每个元素替换为null值。cube生成的数据行在Oracle文档中称为超级聚合行,可以在运算列中加入null值来识别。
SQL> set autotrace off;
SQL> --预测cube返回行数
SQL> with counts as
2 (select count(distinct first_name) first_name_count,
3 count(distinct last_name) last_name_count,
4 count(distinct(first_name || last_name)) full_name_count
5 from hr.employees)
6 select first_name_count,
7 last_name_count,
8 full_name_count,
9 first_name_count + last_name_count + full_name_count + 1 total_count
10 from counts;
FIRST_NAME_COUNT LAST_NAME_COUNT FULL_NAME_COUNT TOTAL_COUNT
---------------- --------------- --------------- -----------
91 102 107 301
下面用SQL语句模拟cube,可以看出cube为我们节省了许多力气。
--用union all生成cube数据行
with emps as (
select last_name,first_name from hr.employees
),
mycube as (
select last_name,first_name from emps
union all
select last_name,null first_name from emps
union all
select null last_name,first_name from emps
union all
select null last_name,null first_name from emps
)
select /*+ gather_plan_statistics */ *
from mycube group by last_name,first_name;
cube实际应用
sales_history模式中包含1998~2001年的销售数据。
下面的SQL展示2001年的所有销售数据。并想要查看各个产品种类的销售情况汇总,包含基于10年消费者年龄段,收入水平的聚合;按照收入水平而不考虑年龄的汇总;以及按年龄而不考虑收入水平的聚合。
--销售数据的union all查询
with tsales as
(select /* lst7-10 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001)),
gb as
(select --Q1 - 所有分类通过收入和年龄范围
'Q1' query_tag,
prod_category,
cust_income_level,
age_range,
sum(profit) profit
from tsales
group by prod_category, cust_income_level, age_range
union all
select --Q2 - 所有分类通过年龄范围
'Q2' query_tag,
prod_category,
'ALL INCOME' cust_income_level,
age_range,
sum(profit) profit
from tsales
group by prod_category, 'ALL INCOME', age_range
union all
select --Q3 - 所有分类通过收入
'Q3' query_tag,
prod_category,
cust_income_level,
'ALL AGE' age_range,
sum(profit) profit
from tsales
group by prod_category, cust_income_level, 'ALL AGE'
union all
select --Q4 - 所有分类
'Q4' query_tag,
prod_category,
'ALL INCOME' cust_income_level,
'ALL AGE' age_range,
sum(profit) profit
from tsales
group by prod_category, 'ALL INCOME', 'ALL AGE'
)
select * from gb order by prod_category, profit;
【语法】NVL (expr1, expr2)
【功能】若expr1为NULL,返回expr2;expr1不为NULL,返回expr1。
注意两者的类型要一致
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level
若c.cust_income_level为null,则返回'A: below 30,000'
mod(x,y)
【功能】返回x除以y的余数
【参数】x,y,数字型表达式
【返回】数字
【示例】
select mod(23,8),mod(24,8) from dual;
返回:7,0
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' || to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range
上面是求年龄段,如果56,则求出的范围为50_60
--用cube代替union all
with tsales as
(select /* lst7-11 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
null,
decode(age_range, null, 4, 3),
decode(age_range, null, 2, 1)) query_tag,
prod_category,
cust_income_level,
age_range,
sum(profit) profit
from tsales
group by prod_category, cube(cust_income_level, age_range)
order by prod_category, profit;
decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)
【功能】根据条件返回相应值
【参数】c1, c2, ...,cn,字符型/数值型/日期型,必须类型相同或null
注:值1……n 不能为条件表达式,这种情况只能用case when then end解决
·含义解释:
decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)
该函数的含义如下:
IF 条件=值1 THEN
RETURN(翻译值1)
ELSIF 条件=值2 THEN
RETURN(翻译值2)
......
ELSIF 条件=值n THEN
RETURN(翻译值n)
ELSE
RETURN(缺省值)
END IF
或:
when case 条件=值1 THEN
RETURN(翻译值1)
ElseCase 条件=值2 THEN
RETURN(翻译值2)
......
ElseCase 条件=值n THEN
RETURN(翻译值n)
ELSE
RETURN(缺省值)
END
'Q' || decode(cust_income_level, null,decode(age_range, null, 4, 3),decode(age_range, null, 2, 1)) query_tag
是返回查询分类标识cust_income_level为null返返回decode(age_range, null, 4, 3)否则返回decode(age_range, null, 2, 1)
cust_income_level==null and age_range==null,query_tag=4
cust_income_level==null and age_range!=null,query_tag=3
cust_income_level!=null and age_range==null,query_tag=2
cust_income_level!=null and age_range!=null,query_tag=1
用grouping()函数排除空值
上面的SQL有个问题,尽管总行数与之前使用union all运算符所得到的相一致,一些数据行中的cust_income_level和age_range具有空值,并且有一行的这两列都为空值。当cube的参数中包含生成列的所有可能组合时,每一列都有会产生n-1个空值,n为列表中的数目。在查询的例子中有两个例,因此对于每个唯一的age_range值都会在cust_income_level列上产生空值。对于age_range列来说也适用同样的规则。如果这两列中的数据在某些行上原本就有空值,这些空值就可能出问题。如何辨别数据中原有的空值和cube扩展所插入的值呢?在oracle 8i中引入了grouping()函数,可以用来识别这些超聚合行。被用来作为grouping()函数参数的表达式必须与出现在group by子句中的表达式相匹配。例如
decode(grouping(age_range),1,'ALL AGE',age_range) age_range
age_range检测age_range是否有一行由cube产生的空值,或者是否其在数据库中本身就是空值。如果当前行是由cube生成的超聚合行则返回值为1,对于其它所有情况返回值都为0。
当与case()表达式或decode()函数组合时,超聚合行中的空值可以用一个报告中有用的值替换。这种情况下,decode()看上去是更好的选择,因为它更简便并且grouping()函数仅有两种可能的返回值。
--grouping()函数
--无grouping
with tsales as
(select /* lst7-11 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
null,
decode(age_range, null, 4, 3),
decode(age_range, null, 2, 1)) query_tag,
prod_category,
cust_income_level,
age_range,
sum(profit) profit
from tsales
group by prod_category, cube(cust_income_level, age_range)
order by prod_category, profit;
--有grouping
--case和decode都可以工作,我更喜欢用decode
with tsales as
(select /* lst7-12 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
null,
decode(age_range, null, 4, 3),
decode(age_range, null, 2, 1)) query_tag,
prod_category,
case grouping(cust_income_level)
when 1 then
'ALL INCOME'
else
cust_income_level
end cust_income_level,
decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
sum(profit) profit
from tsales
group by prod_category, cube(cust_income_level, age_range)
order by prod_category, profit;
用grouping()扩展报告
另一种使用grouping()的方法是放在having子句中,用来控制在输出中显示哪个层级的聚合。
使用grouping()函数可以被浓缩为对cube扩展中的各行或所有行进行滚动小计。
--在having子句中进行grouping()
with tsales as
(select /* lst7-13 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
null,
decode(age_range, null, 4, 3),
decode(age_range, null, 2, 1)) query_tag,
prod_category,
case grouping(cust_income_level)
when 1 then
'ALL INCOME'
else
cust_income_level
end cust_income_level,
decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
sum(profit) profit
from tsales
group by prod_category, cube(cust_income_level, age_range)
--having grouping(cust_income_level)=1
--having grouping(age_range)=1
having grouping(cust_income_level)=1 and grouping(age_range)=1
order by prod_category, profit;
上面sql的数据可以看到将grouping()应用到cust_income_level列对所有age_range值跨各个收入层次创建聚合。对age_range列进行这样的操作会得到类似的效果,对所有cust_income_level值进行聚合而不考虑age_range的值。将cube扩展中的所有列作为grouping()函数的参数将会导致聚合被浓缩为一行类似sum(profit)和group by prod_category所实现的功能。但是,使用cube扩展简单修改having子句就可以创建几份不同的报告。
用grouping_id()扩展报告
grouping_id()函数相对grouping()函数来说是相对较新的,在oracle 9i中引入,与grouping()函数在某种程度上是类似的。不同的是grouping()计算一个表达式并返回0或1,而grouping_id()计算一个表达式,确定其参数中的哪一行(如果有的话)用来生成超聚合行,然后创建一个位矢量,并将该值作为整形值返回。
--group_id()位矢量
with rowgen as (
select 1 bit_1,0 bit_0
from dual
), cubed as (
select
grouping_id(bit_1,bit_0) gid,
to_char(grouping(bit_1)) bv_1,
to_char(grouping(bit_0)) bv_0,
decode(grouping(bit_1),1,'GRP BIT 1') gb_1,
decode(grouping(bit_0),1,'GRP BIT 0') gb_0
from rowgen
group by cube(bit_1,bit_0)
)
select gid,bv_1 || bv_0 bit_vector,
gb_1,
gb_0
from cubed
order by gid;
我们己经知道如何使用grouping()通过having子句来控制输出,但考虑数据库效率时,单独的grouping_id()调用可以用来取代所有不同的having grouping()子句。grouping()函数的功能仅仅用来辨别数据行,因为它仅能返回0或1。由于grouping_id()函数返回一个基于位矢量的数值,它可以轻易被用来进行各种不同的比较而不用修改sql语句。
为什么要关注不改变SQL语句就能改变比较呢?如上面基于销售历史的例子中,用户可能会被给出4个输出选项,任意一个或多个可能会被选中。用户的选择可以用来作为使用having grouping_id()函数的一个单独的sql语句,而不是基于having grouping()的不同组全的多个sql语句的输入,因此需要数据库解析sql语句的次数也比就较少。同时这也会使得需要执行的sql语句更少,使用更小的IO,以及更少的内存。
正如使用cube来避免通过union all将多个sql语句结合起来一样,grouping_id()能够避免在应用中使用多个sql语句。
--显示所有收入层次和年龄段的聚合
variable N_ALL_DATA number
--显示所有年龄段的聚合
variable N_AGE_RANGE number
--显示所有收入层次的聚合
variable N_INCOME_LEVEL number
--只给出汇总
variable N_SUMMAY number
begin
:N_ALL_DATA := 0; -- 1 生效
:N_AGE_RANGE := 2; -- 2 生效
:N_INCOME_LEVEL := 0; -- 3 生效
:N_SUMMAY := 4; -- 4 生效
end;
/
with tsales as
(select /* lst7-15 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range,
sum(profit) profit
from tsales
group by prod_category,cube(cust_income_level,age_range)
having grouping_id(cust_income_level,age_range)+1 in(:N_ALL_DATA,:N_AGE_RANGE,:N_INCOME_LEVEL,:N_SUMMAY)
order by prod_category,profit;
使用grouping函数也可以实现同的结果,但需要在having子句中进行一些测试。示例销售历史数据查询在cube参数中只包含两列。在having子句中总共需要进行4次测试,因为grouping子句将会返回1或者0,每一列有两个可能的值。从而需要4次测试。如果3列,则需要8次,所需的测试次数将会是2的n次方,其中n为cube中参数列或表达式的个数。
用grouping()代替grouping_id()的having子句的例子
--显示所有收入层次和年龄段的聚合
variable N_ALL_DATA number
--显示所有年龄段的聚合
variable N_AGE_RANGE number
--显示所有收入层次的聚合
variable N_INCOME_LEVEL number
--只给出汇总
variable N_SUMMAY number
begin
:N_ALL_DATA := 0; -- 1 生效
:N_AGE_RANGE := 2; -- 2 生效
:N_INCOME_LEVEL := 0; -- 3 生效
:N_SUMMAY := 4; -- 4 生效
end;
/
with tsales as
(select /* lst7-16 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range,
sum(profit) profit
from tsales
group by prod_category,cube(cust_income_level,age_range)
having
(bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_ALL_DATA)
or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_AGE_RANGE)
or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_INCOME_LEVEL)
or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_SUMMAY)
order by prod_category,profit;
1. 使用grouping可以判断该行是数据库中本来的行,还是有统计产生的行
grouping值为0时说明这个值是数据库中本来的值,为1说明是统计的结果(也可以说该列为空时是1,不为空时是0)
2. GROUPING_ID()函数可以接受一列或多列,返回GROUPING位向量的十进制值。GROUPING位向量的计算方法是将按照顺序对每一列调用GROUPING函数的结果组合起来,所以说01和10的值不一样的
3. group_id的使用 当group by子句中重复使用一个列时,通过group_id来去除重复值
grouping sets与rollup()
group by的grouping sets()扩展在oracle 9i中初次登场,前面的例子中的整个group by...having子句可以用group by grouping sets()替换。
with tsales as
(select /* lst7-17 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range,
sum(profit) profit
from tsales
group by prod_category,grouping sets(
rollup(prod_category), --产品分类小计
(cust_income_level),--产品分类和收入层次
(age_range), --产品分类和年龄范围
(cust_income_level,age_range) --产品分类,年龄范围和收入层次
)
--having group_id() < 1
order by prod_category,profit;
group by cube having grouping_id()与group by grouping sets一个主要的区别是,前者能将变量设定为正确的值来简便修改输出,而后者的输出不能修改,除非修改或动态生成sql语句。修改sql语句意味着需要维护更多的代码并且占用更多的数据库资源。最好尽量避免使用动态生成sql语句,因为它会消耗的数据库资源更多,并且在出现问题时难以检修。
某些时候grouping_sets()扩展会导致输出中出现重复。重复是由rollup(prod_category)产生的。可以通过去掉rollup()然后重新运行得到验证,重复的行将不复存在。但是,每种产品种类的总计也不存在了。解决的办法就是使用group_id()函数标记重复的行,并将其插入到having子句中。
在上面的sql中将--having group_id() < 1
改成 having group_id() < 1
这样,输出结果就如预期的那样不包含重复的行了。有趣的是如果将rollup(prod_category)
改成null
,去掉having子句,同时还能得到预期的输出。代码如下:
with tsales as
(select /* lst7-17-1 */
s.quantity_sold,
s.amount_sold,
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
p.prod_name,
p.prod_desc,
p.prod_category,
(pf.unit_cost * s.quantity_sold) total_cost,
s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
from sh.sales s
join sh.customers c
on c.cust_id = s.cust_id
join sh.products p
on p.prod_id = s.prod_id
join sh.times t
on t.time_id = s.time_id
join sh.costs pf
on pf.channel_id = s.channel_id
and pf.prod_id = s.prod_id
and pf.promo_id = s.promo_id
and pf.time_id = s.time_id
where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range,
sum(profit) profit
from tsales
group by prod_category,grouping sets(
null,
--rollup(prod_category), --产品分类小计
(cust_income_level),--产品分类和收入层次
(age_range), --产品分类和年龄范围
(cust_income_level,age_range) --产品分类,年龄范围和收入层次
)
--having group_id() < 1
order by prod_category,profit
group by的rollup()扩展也可以单独用来计算否则将会需要由union all结合起来的多个查询完成的小计。
例如:创建显示器所有名字以Sul开头的消费者各自的购买总额报告,并且要求对每个消费者分别按年,产品分类进行小计,还要有所有消费的总计。这种类型的任务可以使用rollup()完成。
--rollup()小计
with mysales as (
select c.cust_last_name || ',' || c.cust_first_name cust_name,
p.prod_category,
to_char(trunc(time_id,'YYYY'),'YYYY') sale_year,
p.prod_name,
s.amount_sold
from sh.sales s
join sh.products p on p.prod_id=s.prod_id
join sh.customers c on c.cust_id=s.cust_id
where c.cust_last_name like 'Sul%'
)
select
decode(grouping(m.cust_name),1,'GRAND TOTAL',m.cust_name) cust_name,
decode(grouping(m.sale_year),1,'TOTAL BY YEAR',m.sale_year) sale_year,
decode(grouping(m.prod_category),1,'TOTAL BY CATEGORY',m.prod_category) prod_category,
sum(m.amount_sold) amount_sold
from mysales m
group by rollup(m.cust_name,m.prod_category,m.sale_year)
order by grouping(m.cust_name), 1,2,3;
注意decode()和grouping()函数再一次被用来表示小计行。使用grouping(m.cust_name)将总计显示在报告的最后。由于这个值>0的唯一情况就是当计算所有消费者总计时,这个总计值只会出现在报告的最后。
group by的局限性
- LOB列,嵌套表或数组不能用做group by表达式的一部分
SQL> with lobtest as (
2 select to_clob(d.dname) dname
3 from scott.emp e
4 join scott.dept d on d.deptno=e.deptno
5 )
6 select l.dname
7 from lobtest l
8 group by l.dname;
group by l.dname
*
第 8 行出现错误:
ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB
- 不允许使用标量子查询表达式
SQL> select d.dname,count(empno) empcount
2 from scott.emp e
3 join scott.dept d on d.deptno=e.deptno
4 group by (select dname from scott.dept d2 where d2.dname = d.dname )
5 order by d.dname;
group by (select dname from scott.dept d2 where d2.dname = d.dname )
*
第 4 行出现错误:
ORA-22818: 这里不允许出现子查询表达式
- 如果group by子句引用任何对象类型的列则查询不能并行化
create type dept_location_type as object
(
street_address varchar2(40),
postal_code varchar2(10),
city varchar2(30),
state_province varchar2(10),
country_id char(2),
order member function match (e dept_location_type) return integer
);
/
create or replace type body dept_location_type
as order member function match (e dept_location_type) return integer
is
begin
if city e.city then
return 1;
else
return 0;
end if;
end;
end;
/
create table deptobj
as
select d.deptno,d.dname
from scott.dept d;
alter table deptobj add (dept_location dept_location_type);
select * from deptobj;
update deptobj set dept_location=dept_location_type('1234 fenmenao st','453076','ShenZhen','GuangDong','GD') where deptno=1;
update deptobj set dept_location=dept_location_type('345 Leshan st','123456','LeShan','SiCuan','SC') where deptno=2;
update deptobj set dept_location=dept_location_type('345 ChongQing st','123456','ChongQing','ChongQing','CQ') where deptno=3;
update deptobj set dept_location=dept_location_type('345 ChangChun st','123456','ChangChun','GuiYang','GY') where deptno=4;
--对象列的并行group by
select /*+ gather_plan_statictics parallel(e 2)*/
d.dept_location,count(e.ename) ecount
from scott.emp e,deptobj d
where e.deptno=d.deptno
group by d.dept_location
order by d.dept_location;
执行结果
dept_location类型体中的成员函数匹配用来进行城市值的比较,然后使用group by将雇员按城市分组。最后一个列出的局限性在后期的版本是可以工作的。
总结
Oracle以group by子句扩展的形式为SQL开发者提供了一些极佳的工具,帮助我们不仅能够减少代码量,并且能提高数据库效率。大多数的特性也要与其它不同的功能进行组合。