oracle学习笔记7: 高级分组

group by 用来在原始数据上创建聚合来将数据转化为有用的信息。

基本的group by 列出个个部门的名称,员工总数

select d.dname, count(empno) empcount
  from scott.dept d
  left outer join scott.emp e
    on d.deptno = e.deptno
 group by d.dname
 order by d.dname;

select列表中的每一列必须包含在group by子句中。如果没有则会导致错误。如:


SQL> select d.dname, d.loc, count(empno) empcount
  2    from scott.emp e
  3     join scott.dept d
  4      on d.deptno = e.deptno
  5   group by d.dname;
select d.dname, d.loc, count(empno) empcount
                *
第 1 行出现错误:
ORA-00979: 不是 GROUP BY 表达式

尽管包含group by子句的select语句输出看上去是按顺序列出的,你不能期待group by每次都排好序地返回数据,如果输出结果必须按照一定的顺序排列,则必须使用order by子句。

--没有排序的group by 
select deptno,count(*)
from emp
group by deptno;
--复杂的sql
set serveroutput off;
--复杂的sql
select /* lst7-4 */
distinct dname,
         decode(d.deptno,
                1,
                (select count(*) from emp where deptno = 1),
                2,
                (select count(*) from emp where deptno = 2),
                3,
                (select count(*) from emp where deptno = 3),
                (select count(*) from emp where deptno not in (1, 2, 3))) dept_count
  from (select distinct deptno from emp) d
  join dept d2
    on d.deptno = d.deptno;

@E:\bjc2016\study\pln lst7-4

上面的写法,会使SQL语句更加复杂难以理解并且难以维护。group by子句极大的简化必须写的sql语句以外,还消除了数据库不必要的IO。

set serveroutput off;
--复杂的sql
select /* lst7-5 */
distinct dname,
         count(empno) empcount
  from  emp e
  join dept d
    on d.deptno = d.deptno
    group by d.dname
    order by d.dname;

@E:\bjc2016\study\pln lst7-5

group by 优点:

  • 使sql语句更具有可读性
  • 书写起来比使用很多相关子查询更简单
  • 减少了重复访问同一个数据块的次数,从而得到更好的性能。

在分组之后,还在数据集上应用了having子句。另一方面,在获取数据行之后,进行分组之前,应用了where子句。having 子句中可以使用运算,函数及子查询。


SQL> --having子句
SQL> select /* lst7-6 */
  2   d.dname, trunc(e.hiredate, 'yyyy') hiredate, count(empno) empcount
  3    from emp e
  4    join dept d
  5      on e.deptno = e.deptno
  6   group by d.dname, trunc(e.hiredate, 'yyyy')
  7  having count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select min(hiredate)
  8                                                                    from scott.emp) and (select max(hiredate)
  9                                                                                           from scott.emp)
 10   order by d.dname;

未选定行

SQL> @E:\bjc2016\study\pln lst7-6
原值    8:        WHERE UPPER(SQL_TEXT) LIKE '%&1%'
新值    8:        WHERE UPPER(SQL_TEXT) LIKE '%lst7-6%'

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  0wcfknkztdxqt, child number 0
-------------------------------------
select /* lst7-6 */  d.dname, trunc(e.hiredate, 'yyyy') hiredate,
count(empno) empcount   from emp e   join dept d     on e.deptno =
e.deptno  group by d.dname, trunc(e.hiredate, 'yyyy') having
count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select
min(hiredate)
        from scott.emp) and (select max(hiredate)
                                                                   from
scott.emp)  order by d.dname


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 239717969

---------------------------------------------------------------------------
| Id  | Operation              | Name | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |        |       |       |          |
|*  1 |  FILTER                |      |        |       |       |          |
|   2 |   SORT GROUP BY        |      |      1 |  2048 |  2048 | 2048  (0)|
|   3 |    MERGE JOIN CARTESIAN|      |     40 |       |       |          |
|   4 |     TABLE ACCESS FULL  | DEPT |      4 |       |       |          |
|   5 |     BUFFER SORT        |      |     10 |  2048 |  2048 | 2048  (0)|

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   6 |      TABLE ACCESS FULL | EMP  |     10 |       |       |          |
|   7 |   SORT AGGREGATE       |      |      1 |       |       |          |
|   8 |    TABLE ACCESS FULL   | EMP  |     10 |       |       |          |
|   9 |   SORT AGGREGATE       |      |      1 |       |       |          |
|  10 |    TABLE ACCESS FULL   | EMP  |     10 |       |       |          |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((COUNT(*)<=5 AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
              fmyyyy')>= AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'fmyyyy')<=))


Note
-----
   - Warning: basic plan statistics not available. These are only collected when
:

       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system leve

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
l



已选择41行。

SQL>

group by的cube扩展

当与group by子句一起使用时,将会使得对每一行都要考虑包含在cube的参数中的所有可能的元素组合。这个运算将会生成比表中实际存在的行数更多的数据行。

-- hr.emplyees表的cube运算
select last_name, first_name
  from hr.employees
 group by first_name, last_name;

set autotrace off;
set autotrace on statistics;
with emps as
 (select /* lst-7 */
   last_name, first_name
    from hr.employees
   group by cube(last_name, first_name))
select rownum, last_name, first_name from emps;

对于每一对last_name,first_name,cube将会按顺序为每个元素替换为null值。cube生成的数据行在Oracle文档中称为超级聚合行,可以在运算列中加入null值来识别。


SQL> set autotrace off;
SQL> --预测cube返回行数
SQL> with counts as
  2   (select count(distinct first_name) first_name_count,
  3           count(distinct last_name) last_name_count,
  4           count(distinct(first_name || last_name)) full_name_count
  5      from hr.employees)
  6  select first_name_count,
  7         last_name_count,
  8         full_name_count,
  9         first_name_count + last_name_count + full_name_count + 1 total_count
 10    from counts;

FIRST_NAME_COUNT LAST_NAME_COUNT FULL_NAME_COUNT TOTAL_COUNT
---------------- --------------- --------------- -----------
              91             102             107         301

下面用SQL语句模拟cube,可以看出cube为我们节省了许多力气。

--用union all生成cube数据行
with emps as (
 select last_name,first_name from hr.employees
),
mycube as (
       select last_name,first_name from emps
       union all
       select last_name,null first_name from emps
       union all
       select null last_name,first_name from emps
       union all
       select null last_name,null first_name from emps
)
select /*+ gather_plan_statistics */ * 
from mycube group by last_name,first_name;

cube实际应用

sales_history模式中包含1998~2001年的销售数据。
下面的SQL展示2001年的所有销售数据。并想要查看各个产品种类的销售情况汇总,包含基于10年消费者年龄段,收入水平的聚合;按照收入水平而不考虑年龄的汇总;以及按年龄而不考虑收入水平的聚合。

--销售数据的union all查询
with tsales as
 (select /* lst7-10 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001)),
gb as
 (select --Q1 - 所有分类通过收入和年龄范围
   'Q1' query_tag,
   prod_category,
   cust_income_level,
   age_range,
   sum(profit) profit
    from tsales
   group by prod_category, cust_income_level, age_range
  union all
  select --Q2 - 所有分类通过年龄范围
   'Q2' query_tag,
   prod_category,
   'ALL INCOME' cust_income_level,
   age_range,
   sum(profit) profit
    from tsales
   group by prod_category, 'ALL INCOME', age_range
  union all
  select --Q3 - 所有分类通过收入
   'Q3' query_tag,
   prod_category,
   cust_income_level,
   'ALL AGE' age_range,
   sum(profit) profit
    from tsales
   group by prod_category, cust_income_level, 'ALL AGE'
  union all
  select --Q4 - 所有分类
   'Q4' query_tag,
   prod_category,
   'ALL INCOME' cust_income_level,
   'ALL AGE' age_range,
   sum(profit) profit
    from tsales
   group by prod_category, 'ALL INCOME', 'ALL AGE'
  )
select * from gb order by prod_category, profit;

【语法】NVL (expr1, expr2)
【功能】若expr1为NULL,返回expr2;expr1不为NULL,返回expr1。
注意两者的类型要一致
nvl(c.cust_income_level, 'A: below 30,000') cust_income_level
若c.cust_income_level为null,则返回'A: below 30,000'
mod(x,y)
【功能】返回x除以y的余数
【参数】x,y,数字型表达式
【返回】数字
【示例】
select mod(23,8),mod(24,8) from dual;
返回:7,0
to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' || to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range
上面是求年龄段,如果56,则求出的范围为50_60

--用cube代替union all
with tsales as
 (select /* lst7-11 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       cust_income_level,
       age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 order by prod_category, profit;

decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)
【功能】根据条件返回相应值
【参数】c1, c2, ...,cn,字符型/数值型/日期型,必须类型相同或null
注:值1……n 不能为条件表达式,这种情况只能用case when then end解决
·含义解释:  
  decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)  
  该函数的含义如下:  
  IF 条件=值1 THEN
  RETURN(翻译值1)
  ELSIF 条件=值2 THEN
  RETURN(翻译值2)
  ......
  ELSIF 条件=值n THEN
  RETURN(翻译值n)  
  ELSE
  RETURN(缺省值)
  END IF
  
或:
  when case 条件=值1 THEN
  RETURN(翻译值1)
  ElseCase 条件=值2 THEN
  RETURN(翻译值2)
  ......
  ElseCase 条件=值n THEN
  RETURN(翻译值n)  
  ELSE

RETURN(缺省值)
  END
'Q' || decode(cust_income_level, null,decode(age_range, null, 4, 3),decode(age_range, null, 2, 1)) query_tag
是返回查询分类标识cust_income_level为null返返回decode(age_range, null, 4, 3)否则返回decode(age_range, null, 2, 1)
cust_income_level==null and age_range==null,query_tag=4
cust_income_level==null and age_range!=null,query_tag=3
cust_income_level!=null and age_range==null,query_tag=2
cust_income_level!=null and age_range!=null,query_tag=1

用grouping()函数排除空值

上面的SQL有个问题,尽管总行数与之前使用union all运算符所得到的相一致,一些数据行中的cust_income_level和age_range具有空值,并且有一行的这两列都为空值。当cube的参数中包含生成列的所有可能组合时,每一列都有会产生n-1个空值,n为列表中的数目。在查询的例子中有两个例,因此对于每个唯一的age_range值都会在cust_income_level列上产生空值。对于age_range列来说也适用同样的规则。如果这两列中的数据在某些行上原本就有空值,这些空值就可能出问题。如何辨别数据中原有的空值和cube扩展所插入的值呢?在oracle 8i中引入了grouping()函数,可以用来识别这些超聚合行。被用来作为grouping()函数参数的表达式必须与出现在group by子句中的表达式相匹配。例如
decode(grouping(age_range),1,'ALL AGE',age_range) age_range
age_range检测age_range是否有一行由cube产生的空值,或者是否其在数据库中本身就是空值。如果当前行是由cube生成的超聚合行则返回值为1,对于其它所有情况返回值都为0。
当与case()表达式或decode()函数组合时,超聚合行中的空值可以用一个报告中有用的值替换。这种情况下,decode()看上去是更好的选择,因为它更简便并且grouping()函数仅有两种可能的返回值。

--grouping()函数
--无grouping
with tsales as
 (select /* lst7-11 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       cust_income_level,
       age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 order by prod_category, profit;

--有grouping
--case和decode都可以工作,我更喜欢用decode
with tsales as
 (select /* lst7-12 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       case grouping(cust_income_level)
         when 1 then
          'ALL INCOME'
         else
          cust_income_level
       end cust_income_level,
       decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 order by prod_category, profit;

用grouping()扩展报告

另一种使用grouping()的方法是放在having子句中,用来控制在输出中显示哪个层级的聚合。
使用grouping()函数可以被浓缩为对cube扩展中的各行或所有行进行滚动小计。

--在having子句中进行grouping()
with tsales as
 (select /* lst7-13 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || decode(cust_income_level,
                     null,
                     decode(age_range, null, 4, 3),
                     decode(age_range, null, 2, 1)) query_tag,
       prod_category,
       case grouping(cust_income_level)
         when 1 then
          'ALL INCOME'
         else
          cust_income_level
       end cust_income_level,
       decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
       sum(profit) profit
  from tsales
 group by prod_category, cube(cust_income_level, age_range)
 --having grouping(cust_income_level)=1
 --having grouping(age_range)=1
 having grouping(cust_income_level)=1 and grouping(age_range)=1
 order by prod_category, profit;

上面sql的数据可以看到将grouping()应用到cust_income_level列对所有age_range值跨各个收入层次创建聚合。对age_range列进行这样的操作会得到类似的效果,对所有cust_income_level值进行聚合而不考虑age_range的值。将cube扩展中的所有列作为grouping()函数的参数将会导致聚合被浓缩为一行类似sum(profit)和group by prod_category所实现的功能。但是,使用cube扩展简单修改having子句就可以创建几份不同的报告。

用grouping_id()扩展报告

grouping_id()函数相对grouping()函数来说是相对较新的,在oracle 9i中引入,与grouping()函数在某种程度上是类似的。不同的是grouping()计算一个表达式并返回0或1,而grouping_id()计算一个表达式,确定其参数中的哪一行(如果有的话)用来生成超聚合行,然后创建一个位矢量,并将该值作为整形值返回。

--group_id()位矢量
with rowgen as (
     select 1 bit_1,0 bit_0
     from dual
), cubed as (
 select 
 grouping_id(bit_1,bit_0) gid,
 to_char(grouping(bit_1)) bv_1,
 to_char(grouping(bit_0)) bv_0,
 decode(grouping(bit_1),1,'GRP BIT 1') gb_1,
 decode(grouping(bit_0),1,'GRP BIT 0') gb_0
 from rowgen
 group by cube(bit_1,bit_0)
)
select gid,bv_1 || bv_0 bit_vector,
gb_1,
gb_0
from cubed
order by gid;
oracle学习笔记7: 高级分组_第1张图片
group_id()位矢量运行结果

我们己经知道如何使用grouping()通过having子句来控制输出,但考虑数据库效率时,单独的grouping_id()调用可以用来取代所有不同的having grouping()子句。grouping()函数的功能仅仅用来辨别数据行,因为它仅能返回0或1。由于grouping_id()函数返回一个基于位矢量的数值,它可以轻易被用来进行各种不同的比较而不用修改sql语句。
为什么要关注不改变SQL语句就能改变比较呢?如上面基于销售历史的例子中,用户可能会被给出4个输出选项,任意一个或多个可能会被选中。用户的选择可以用来作为使用having grouping_id()函数的一个单独的sql语句,而不是基于having grouping()的不同组全的多个sql语句的输入,因此需要数据库解析sql语句的次数也比就较少。同时这也会使得需要执行的sql语句更少,使用更小的IO,以及更少的内存。
正如使用cube来避免通过union all将多个sql语句结合起来一样,grouping_id()能够避免在应用中使用多个sql语句。

--显示所有收入层次和年龄段的聚合
variable N_ALL_DATA number 
--显示所有年龄段的聚合
variable N_AGE_RANGE number 
--显示所有收入层次的聚合
variable N_INCOME_LEVEL number 
--只给出汇总
variable N_SUMMAY number 
begin
 :N_ALL_DATA      := 0;      -- 1 生效
 :N_AGE_RANGE     := 2;      -- 2 生效
 :N_INCOME_LEVEL  := 0;      -- 3 生效
 :N_SUMMAY        := 4;      -- 4 生效
end;
/
with tsales as
 (select /* lst7-15 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,cube(cust_income_level,age_range)
having grouping_id(cust_income_level,age_range)+1 in(:N_ALL_DATA,:N_AGE_RANGE,:N_INCOME_LEVEL,:N_SUMMAY)
order by prod_category,profit;

使用grouping函数也可以实现同的结果,但需要在having子句中进行一些测试。示例销售历史数据查询在cube参数中只包含两列。在having子句中总共需要进行4次测试,因为grouping子句将会返回1或者0,每一列有两个可能的值。从而需要4次测试。如果3列,则需要8次,所需的测试次数将会是2的n次方,其中n为cube中参数列或表达式的个数。

用grouping()代替grouping_id()的having子句的例子

--显示所有收入层次和年龄段的聚合
variable N_ALL_DATA number 
--显示所有年龄段的聚合
variable N_AGE_RANGE number 
--显示所有收入层次的聚合
variable N_INCOME_LEVEL number 
--只给出汇总
variable N_SUMMAY number 
begin
 :N_ALL_DATA      := 0;      -- 1 生效
 :N_AGE_RANGE     := 2;      -- 2 生效
 :N_INCOME_LEVEL  := 0;      -- 3 生效
 :N_SUMMAY        := 4;      -- 4 生效
end;
/
with tsales as
 (select /* lst7-16 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,cube(cust_income_level,age_range)
having 
 (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_ALL_DATA)
 or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_AGE_RANGE)
 or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_INCOME_LEVEL)
 or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_SUMMAY)
order by prod_category,profit;

1. 使用grouping可以判断该行是数据库中本来的行,还是有统计产生的行
grouping值为0时说明这个值是数据库中本来的值,为1说明是统计的结果(也可以说该列为空时是1,不为空时是0)
2. GROUPING_ID()函数可以接受一列或多列,返回GROUPING位向量的十进制值。GROUPING位向量的计算方法是将按照顺序对每一列调用GROUPING函数的结果组合起来,所以说01和10的值不一样的
3. group_id的使用 当group by子句中重复使用一个列时,通过group_id来去除重复值

grouping sets与rollup()

group by的grouping sets()扩展在oracle 9i中初次登场,前面的例子中的整个group by...having子句可以用group by grouping sets()替换。


with tsales as
 (select /* lst7-17 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,grouping sets(
rollup(prod_category), --产品分类小计
(cust_income_level),--产品分类和收入层次
(age_range), --产品分类和年龄范围
(cust_income_level,age_range) --产品分类,年龄范围和收入层次
)
--having group_id() < 1
order by prod_category,profit;

group by cube having grouping_id()与group by grouping sets一个主要的区别是,前者能将变量设定为正确的值来简便修改输出,而后者的输出不能修改,除非修改或动态生成sql语句。修改sql语句意味着需要维护更多的代码并且占用更多的数据库资源。最好尽量避免使用动态生成sql语句,因为它会消耗的数据库资源更多,并且在出现问题时难以检修。
某些时候grouping_sets()扩展会导致输出中出现重复。重复是由rollup(prod_category)产生的。可以通过去掉rollup()然后重新运行得到验证,重复的行将不复存在。但是,每种产品种类的总计也不存在了。解决的办法就是使用group_id()函数标记重复的行,并将其插入到having子句中。
在上面的sql中将--having group_id() < 1 改成 having group_id() < 1
这样,输出结果就如预期的那样不包含重复的行了。有趣的是如果将rollup(prod_category)改成null,去掉having子句,同时还能得到预期的输出。代码如下:


with tsales as
 (select /* lst7-17-1 */
   s.quantity_sold,
   s.amount_sold,
   to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
   to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
   nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
   p.prod_name,
   p.prod_desc,
   p.prod_category,
   (pf.unit_cost * s.quantity_sold) total_cost,
   s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
    from sh.sales s
    join sh.customers c
      on c.cust_id = s.cust_id
    join sh.products p
      on p.prod_id = s.prod_id
    join sh.times t
      on t.time_id = s.time_id
    join sh.costs pf
      on pf.channel_id = s.channel_id
     and pf.prod_id = s.prod_id
     and pf.promo_id = s.promo_id
     and pf.time_id = s.time_id
   where (t.fiscal_year = 2001))
select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
prod_category,
decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
sum(profit) profit
from tsales
group by prod_category,grouping sets(
null,
--rollup(prod_category), --产品分类小计
(cust_income_level),--产品分类和收入层次
(age_range), --产品分类和年龄范围
(cust_income_level,age_range) --产品分类,年龄范围和收入层次
)
--having group_id() < 1
order by prod_category,profit

group by的rollup()扩展也可以单独用来计算否则将会需要由union all结合起来的多个查询完成的小计。
例如:创建显示器所有名字以Sul开头的消费者各自的购买总额报告,并且要求对每个消费者分别按年,产品分类进行小计,还要有所有消费的总计。这种类型的任务可以使用rollup()完成。

--rollup()小计
with mysales as (
    select c.cust_last_name || ',' || c.cust_first_name cust_name,
    p.prod_category,
    to_char(trunc(time_id,'YYYY'),'YYYY') sale_year,
    p.prod_name,
    s.amount_sold
    from sh.sales s
    join sh.products p on p.prod_id=s.prod_id
    join sh.customers c on c.cust_id=s.cust_id
    where c.cust_last_name like 'Sul%'
)
select 
    decode(grouping(m.cust_name),1,'GRAND TOTAL',m.cust_name) cust_name,
    decode(grouping(m.sale_year),1,'TOTAL BY YEAR',m.sale_year) sale_year,
    decode(grouping(m.prod_category),1,'TOTAL BY CATEGORY',m.prod_category) prod_category,
    sum(m.amount_sold) amount_sold
from mysales m
group by rollup(m.cust_name,m.prod_category,m.sale_year)
order by grouping(m.cust_name), 1,2,3;

注意decode()和grouping()函数再一次被用来表示小计行。使用grouping(m.cust_name)将总计显示在报告的最后。由于这个值>0的唯一情况就是当计算所有消费者总计时,这个总计值只会出现在报告的最后。

group by的局限性

  • LOB列,嵌套表或数组不能用做group by表达式的一部分
SQL> with lobtest as (
  2   select to_clob(d.dname) dname
  3   from scott.emp e
  4   join scott.dept d on d.deptno=e.deptno
  5  )
  6  select l.dname
  7  from lobtest l
  8  group by l.dname;
group by l.dname
         *
第 8 行出现错误:
ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB
  • 不允许使用标量子查询表达式

SQL> select d.dname,count(empno) empcount
  2  from scott.emp e
  3  join scott.dept d on d.deptno=e.deptno
  4  group by (select dname from scott.dept d2 where d2.dname = d.dname )
  5  order by d.dname;
group by (select dname from scott.dept d2 where d2.dname = d.dname )
          *
第 4 行出现错误:
ORA-22818: 这里不允许出现子查询表达式
  • 如果group by子句引用任何对象类型的列则查询不能并行化
create type dept_location_type as object
(
       street_address varchar2(40),
       postal_code varchar2(10),
       city varchar2(30),
       state_province varchar2(10),
       country_id char(2),
       order member function match (e dept_location_type) return integer
);
/
create or replace type body  dept_location_type
as order member function match (e dept_location_type) return integer
is 
   begin
         if city  e.city then
                 return 1;
         else
                 return 0;        
         end if;
   end;
end;
/      

create table deptobj
as 
select d.deptno,d.dname
from scott.dept d;     

alter table  deptobj add (dept_location dept_location_type);   

select * from deptobj;

update deptobj set dept_location=dept_location_type('1234 fenmenao st','453076','ShenZhen','GuangDong','GD') where deptno=1;
update deptobj set dept_location=dept_location_type('345 Leshan st','123456','LeShan','SiCuan','SC') where deptno=2;
update deptobj set dept_location=dept_location_type('345 ChongQing st','123456','ChongQing','ChongQing','CQ') where deptno=3;
update deptobj set dept_location=dept_location_type('345 ChangChun st','123456','ChangChun','GuiYang','GY') where deptno=4;
--对象列的并行group by
select /*+ gather_plan_statictics parallel(e 2)*/
d.dept_location,count(e.ename) ecount
from scott.emp e,deptobj d
where e.deptno=d.deptno
group by d.dept_location
order by d.dept_location;

执行结果

oracle学习笔记7: 高级分组_第2张图片
对象列的并行group by

dept_location类型体中的成员函数匹配用来进行城市值的比较,然后使用group by将雇员按城市分组。最后一个列出的局限性在后期的版本是可以工作的。

总结

Oracle以group by子句扩展的形式为SQL开发者提供了一些极佳的工具,帮助我们不仅能够减少代码量,并且能提高数据库效率。大多数的特性也要与其它不同的功能进行组合。

oracle学习笔记7: 高级分组_第3张图片
group by总结

你可能感兴趣的:(oracle学习笔记7: 高级分组)