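Today I was reading the Oracle Concepts guide and did not really follow the part about dimensions. I searched online, found an experiment on AskTom, and ran it myself. After doing that I finally understood what the query rewrite I had read about earlier means, and how much a dimension helps when combined with a materialized view: it can improve query performance considerably, which matters a lot in data warehouses.
First, an introduction to the dimension. This is the definition from the Concepts guide: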
A dimension table is a logical structure that defines hierarchical (parent/child) relationships between pairs of columns or column sets. For example, a dimension can indicate that within a row the city column implies the value of the state column, the state column implies the value of the country column, and so on.
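In short, a dimension is only a logical structure, and it has three main properties. First, level, which defines one column or a set of columns as a unit. Second, hierarchy, which defines the parent/child relationships between levels. Third, attribute, which declares that a level uniquely determines the value of some other column.
Dimensions come into play with materialized view query rewrite. With query rewrite enabled, a SQL statement that contains aggregate functions can be redirected to a matching materialized view, which already holds the precomputed data, so the query runs faster. Sometimes, though, the query does not match what the materialized view defines. For example, the materialized view aggregates by month and the query asks for totals by quarter. In that case the materialized view is not used and performance drops sharply. A dimension solves this problem: it declares the hierarchy among day, month, quarter, year and so on, so that a query by quarter can be answered from the monthly data.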
Let the experiment speak for itself.
First, create a table with more than 10 million rows:
SQL> desc sales;
Name Null? Type
----------------------------------------------------------------------------------- -------- --------------------------------------------------------
TRANS_DATE DATE
CUST_ID NUMBER(38)
SALES_AMOUNT NUMBER
SQL> select count(*) from sales;
COUNT(*)
----------
14680064
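The post does not show how sales was populated. The row count, 14,680,064, equals 14 × 2^20, and the monthly totals shown later match SCOTT.EMP salaries multiplied by 2^20 (for example 3950 × 1,048,576 = 4,141,875,200 for 1981-12). One plausible way to build a comparable table is therefore to copy the 14 rows of SCOTT.EMP and double them 20 times. This is a sketch under that assumption, not the author's script; the mapping of cust_id to empno in particular is a guess.

```sql
-- Assumption: the standard SCOTT.EMP demo table (14 rows) is available.
create table sales as
select hiredate                  as trans_date,
       cast(empno as number(38)) as cust_id,       -- guessed mapping
       cast(sal as number)       as sales_amount
from   scott.emp;

-- Double the table 20 times: 14 * 2^20 = 14,680,064 rows.
begin
  for i in 1 .. 20 loop
    insert /*+ append */ into sales select * from sales;
    commit;  -- a direct-path insert must be committed before the table is read again
  end loop;
end;
/
```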
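Create an index-organized table that holds the relationship between day, month, quarter and year:
create table time_hierarchy (day primary key, monthy, qtr_yyyy, year)
organization index
as
select distinct
       to_char(trans_date,'yyyy-mm-dd'),  -- day
       to_char(trans_date,'yyyy-mm'),     -- month
       to_char(trans_date,'Q'),           -- quarter number (1 to 4)
       to_char(trans_date,'yyyy')         -- year
from sales;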
Create a materialized view that stores each customer's sales per month:
create materialized view mv_sales
build immediate
refresh on demand
enable query rewrite
as
select sales.cust_id,time_hierarchy.monthy,sum(sales.sales_amount) sales_amount
from sales,time_hierarchy
where to_char(sales.trans_date,'yyyy-mm-dd')=time_hierarchy.day
group by sales.cust_id,time_hierarchy.monthy;
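Because the view is declared refresh on demand, it is not updated automatically when sales changes, and a stale materialized view is only used for rewrite when query_rewrite_integrity is stale_tolerated. After loading data it is worth checking the view's state and refreshing it. A minimal sketch, assuming the view lives in the current schema:

```sql
-- Check whether the materialized view is fresh and eligible for rewrite.
select mview_name, staleness, rewrite_enabled, last_refresh_date
from   user_mviews
where  mview_name = 'MV_SALES';

-- Complete refresh ('C') on demand.
begin
  dbms_mview.refresh(list => 'MV_SALES', method => 'C');
end;
/
```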
Analyze the base tables so that the optimizer has the statistics it needs to cost the query rewrite:
analyze table sales compute statistics;
analyze table time_hierarchy compute statistics;
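ANALYZE still works here, but Oracle recommends the DBMS_STATS package for gathering optimizer statistics. The execution plans in this post also carry the note "dynamic sampling used for this statement", which suggests that MV_SALES itself had no statistics. A sketch that gathers statistics on all three objects, assuming they are in the current schema:

```sql
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'SALES');
  dbms_stats.gather_table_stats(ownname => user, tabname => 'TIME_HIERARCHY');
  -- The materialized view's container table can be analyzed like any other table.
  dbms_stats.gather_table_stats(ownname => user, tabname => 'MV_SALES');
end;
/
```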
Enable query rewrite for the session:
alter session set query_rewrite_enabled=true;
alter session set query_rewrite_integrity=trusted;
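The integrity setting matters for what follows. Dimensions are only taken into account when query_rewrite_integrity is trusted or stale_tolerated; under the default, enforced, the optimizer relies only on relationships the database itself enforces. The plans and statistics shown in the transcripts also require autotrace to be switched on in SQL*Plus, a step the post does not show. A small sketch for checking the settings, assuming the session may read v$parameter and has the privileges autotrace needs:

```sql
-- Confirm the session-level rewrite settings.
select name, value
from   v$parameter
where  name in ('query_rewrite_enabled', 'query_rewrite_integrity');

-- SQL*Plus command: print the result, the execution plan and the statistics.
set autotrace on
```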
Total sales by month:
SQL> edit
Wrote file afiedt.buf
1 select time_hierarchy.monthy,sum(sales_amount) from scott.sales,scott.time_hierarchy
2 where to_char(scott.sales.trans_date,'yyyy-mm-dd')=scott.time_hierarchy.day
3* group by scott.time_hierarchy.monthy
SQL> /
MONTHY SUM(SALES_AMOUNT)
--------------------- -----------------
1981-12 4141875200
1987-04 3145728000
1981-05 2988441600
1982-01 1363148800
1981-09 2883584000
1987-05 1153433600
1981-02 2988441600
1981-11 5242880000
1981-04 3119513600
1980-12 838860800
1981-06 2569011200
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3566649941
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 252 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 14 | 252 | 4 (25)| 00:00:01 |
| 2 | MAT_VIEW REWRITE ACCESS FULL| MV_SALES | 14 | 252 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
6 physical reads
0 redo size
835 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed
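As you can see, the optimizer went to the materialized view instead of the base tables. There were only 3 consistent gets, and the query was very fast.
If we total sales by quarter instead of by month, the result is as follows: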
SQL> edit
Wrote file afiedt.buf
1 select time_hierarchy.qtr_yyyy,sum(sales_amount) from scott.sales,scott.time_hierarchy
2 where to_char(scott.sales.trans_date,'yyyy-mm-dd')=scott.time_hierarchy.day
3* group by scott.time_hierarchy.qtr_yyyy
SQL> /
QTR SUM(SALES_AMOUNT)
--- -----------------
1 4351590400
3 2883584000
4 1.0224E+10
2 1.2976E+10
Execution Plan
----------------------------------------------------------
Plan hash value: 3402703070
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 84 | 10095 (10)| 00:02:02 |
| 1 | HASH GROUP BY | | 4 | 84 | 10095 (10)| 00:02:02 |
|* 2 | HASH JOIN | | 14M| 294M| 9329 (3)| 00:01:52 |
| 3 | INDEX FULL SCAN | SYS_IOT_TOP_58953 | 13 | 143 | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| SALES | 14M| 140M| 9256 (2)| 00:01:52 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TIME_HIERARCHY"."DAY"=TO_CHAR(INTERNAL_FUNCTION("SALES"."TRANS_DATE"),'yyyy-mm-dd'))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
41368 consistent gets
41362 physical reads
0 redo size
683 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
This time the query went to the base tables, not the materialized view. There were 41368 consistent gets, and the query was slow.
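When a query is not rewritten, DBMS_MVIEW.EXPLAIN_REWRITE can report the reason. The sketch below assumes that REWRITE_TABLE has already been created in the current schema by running utlxrw.sql from $ORACLE_HOME/rdbms/admin; the statement_id value is an arbitrary label chosen for this example.

```sql
begin
  dbms_mview.explain_rewrite(
    query        => q'[select time_hierarchy.qtr_yyyy, sum(sales_amount)
                       from scott.sales, scott.time_hierarchy
                       where to_char(scott.sales.trans_date,'yyyy-mm-dd') = scott.time_hierarchy.day
                       group by scott.time_hierarchy.qtr_yyyy]',
    mv           => 'SCOTT.MV_SALES',
    statement_id => 'qtr_rollup');   -- hypothetical label
end;
/

-- Read back the optimizer's explanation.
select message
from   rewrite_table
where  statement_id = 'qtr_rollup'
order  by sequence;
```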
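Next, create a dimension that describes the hierarchy among day, month, quarter and year, so that query rewrite can be used for this query as well.
Create the dimension: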
create dimension time_hierarchy_dim
level day is time_hierarchy.day
level monthy is time_hierarchy.monthy
level qtr_yyyy is time_hierarchy.qtr_yyyy
level year is time_hierarchy.year
hierarchy time_rollup
(
day child of
monthy child of
qtr_yyyy child of
year
)
attribute monthy
determines monthy;
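With query_rewrite_integrity=trusted, Oracle takes the declared hierarchy on faith, so it is worth checking that every child value really has exactly one parent. Oracle ships DBMS_DIMENSION.VALIDATE_DIMENSION for this; the plain queries below are a simpler sketch of the same check. Note that qtr_yyyy was built with the 'Q' format, which yields only 1 to 4 with no year. Given the months shown in this post (1980 through 1987), the second query would be expected to return rows, meaning the quarter-to-year step is not a true hierarchy for this data.

```sql
-- Months that map to more than one quarter (should return no rows).
select monthy, count(distinct qtr_yyyy) as parents
from   time_hierarchy
group  by monthy
having count(distinct qtr_yyyy) > 1;

-- Quarters that map to more than one year (should return no rows for a valid hierarchy).
select qtr_yyyy, count(distinct year) as parents
from   time_hierarchy
group  by qtr_yyyy
having count(distinct year) > 1;

-- One possible fix (an assumption, not what the post ran): make the quarter
-- value carry its year, e.g. to_char(trans_date,'yyyy"-Q"Q') giving '1981-Q2'.
```

The month-to-quarter step used in the quarterly query is unaffected, since each 'yyyy-mm' value falls in exactly one quarter.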
Run the sales-by-quarter query again:
SQL> edit
Wrote file afiedt.buf
1 select time_hierarchy.qtr_yyyy,sum(sales_amount) from scott.sales,scott.time_hierarchy
2 where to_char(scott.sales.trans_date,'yyyy-mm-dd')=scott.time_hierarchy.day
3* group by scott.time_hierarchy.qtr_yyyy
SQL> /
QTR SUM(SALES_AMOUNT)
--- -----------------
1 4351590400
3 2883584000
4 1.0224E+10
2 1.2976E+10
Execution Plan
----------------------------------------------------------
Plan hash value: 1315230953
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 42 | 6 (34)| 00:00:01 |
| 1 | HASH GROUP BY | | 3 | 42 | 6 (34)| 00:00:01 |
| 2 | VIEW | | 3 | 42 | 6 (34)| 00:00:01 |
| 3 | HASH UNIQUE | | 3 | 114 | 6 (34)| 00:00:01 |
|* 4 | HASH JOIN | | 17 | 646 | 5 (20)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_IOT_TOP_58953 | 13 | 104 | 1 (0)| 00:00:01 |
| 6 | MAT_VIEW REWRITE ACCESS FULL| MV_SALES | 14 | 420 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("MONTHY"="MV_SALES"."MONTHY")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
683 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
The query is served from the materialized view again, and logical reads drop to just 4 (down from 41368), a very noticeable improvement.
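To make the plan easier to read: the rewritten query joins MV_SALES to the distinct month-to-quarter pairs from time_hierarchy and then sums the monthly totals. Written by hand it would look roughly like the sketch below. It assumes the materialized view's aggregate column is named sales_amount, and the SQL the optimizer generates internally may differ.

```sql
select th.qtr_yyyy,
       sum(mv.sales_amount) as sales_amount
from   mv_sales mv
join   (select distinct monthy, qtr_yyyy
        from   time_hierarchy) th      -- one row per month, giving its quarter
on     th.monthy = mv.monthy
group  by th.qtr_yyyy;
```

The dimension is what lets the optimizer treat this form as equivalent to the original query: it declares that each month belongs to exactly one quarter, so summing monthly totals per quarter cannot double count.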