7.5 Putting CUBE to work
When teaching us a new word in fourth grade English class, Mrs. Draper would say, “Now use it in a sentence.” In much the same way, you now need to put the CUBE extension to practical use. It was fun to see what CUBE does and how much work it saves you, but now you need to see it applied to a real problem.
When using the GROUP BY clause to perform aggregations, you’ve probably written several similar SQL statements just so you could see the aggregations based on different sets of columns, much like those in Listing 7-10. You already know that the CUBE extension can eliminate a lot of work in the database, so let’s now put it to “real world” practice, using the demo test data created earlier.
The SALES_HISTORY schema contains sales data for the years 1998 through 2001. You need to provide a report to satisfy the following request: “Please show me all sales data for the year 2001. I would like to see sales summarized by product category, with aggregates based on 10-year customer age ranges and income levels, as well as summaries broken out by income level regardless of age group, and by age group regardless of income level.”
Your task probably seems daunting at first, but you know all the data is available. You will need to build a query using the COSTS, CUSTOMERS, PRODUCTS, SALES, and TIMES tables. (Now would be a good time to put this book down and try your hand at building such a query.) Perhaps you will create a query like the one in Listing 7-11, as it is a common type of solution for such a request. Prior to the introduction of the CUBE extension, Listing 7-11 is the style of query that was needed to satisfy the request.
Looking at Listing 7-11, you will find four separate queries joined by the UNION ALL operator. These queries are labeled Q1 through Q4. The output includes a QUERY_TAG column so that the rows from each separate query can be clearly identified. The customer is happy; the output is exactly what was asked for. The query can also be easily changed to report on data for any year.
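One way to make the year easy to change is a bind variable in place of the literal 2001. The following is a minimal SQL*Plus sketch, not part of the original listing: the variable name fiscal_year is an assumption made for illustration, and the query is a cut-down stand-in for the full report that only shows where the bind would go.

```sql
-- SQL*Plus sketch: pass the reporting year as a bind variable.
-- The name fiscal_year is hypothetical; any valid bind name would do.
variable fiscal_year number
exec :fiscal_year := 2001

-- Cut-down stand-in for the report: the same SALES-to-TIMES join and
-- the same filter column as Listing 7-11, with the literal replaced.
select t.fiscal_year
     , count(*)            sales_rows
     , sum(s.amount_sold)  amount_sold
from sh.sales s
join sh.times t on t.time_id = s.time_id
where t.fiscal_year = :fiscal_year
group by t.fiscal_year;
```

In Listing 7-11 the equivalent change would be confined to the WHERE clause of the tsales subquery.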
The operations folks who run the data center, however, are not so happy with this new report. When you take a look at the query statistics for the SQL, you can understand why they may not hold this report in high regard. Maybe it’s the 10521 physical reads that concern them. If the query were run only once, this would not be a problem, but the marketing folks are running this query multiple times daily to report on different years, trying to discover sales trends, and it is causing all sorts of havoc as IO rates and response times increase for other users of the database.
The execution plan shows four full table scans of the same temporary table. The factored subquery tsales allows the optimizer to create a temporary table that can then be used by all the queries in the gb subquery, but the use of UNION ALL makes it necessary to do four full table scans on that table, resulting in a lot of database IO.
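For readers who want to reproduce figures like the statistics and the execution plan discussed above, the following is a rough SQL*Plus sketch of one way to collect them. It assumes the session has the privileges autotrace needs (typically the PLUSTRACE role) and can read the V$ views that DBMS_XPLAN.DISPLAY_CURSOR relies on. The small query against sh.times is only a placeholder for the report query.

```sql
-- Step 1: session statistics such as "physical reads" via SQL*Plus autotrace.
set autotrace on statistics

select /*+ gather_plan_statistics */ count(*)
from sh.times
where fiscal_year = 2001;   -- placeholder for the report query

set autotrace off

-- Step 2: row-source statistics (Starts, E-Rows, A-Rows).
-- The statement must carry the gather_plan_statistics hint, as the
-- listings in this section do. serveroutput is switched off so that the
-- last cursor in the session is the query itself.
set serveroutput off

select /*+ gather_plan_statistics */ count(*)
from sh.times
where fiscal_year = 2001;   -- placeholder for the report query

select *
from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

The two steps are kept separate because autotrace issues its own statements, which can get in the way when asking DISPLAY_CURSOR for the last cursor executed.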
Listing 7-11. UNION ALL Query of Sales Data
with tsales as (
  select /*+ gather_plan_statistics */
      s.quantity_sold
    , s.amount_sold
    , to_char(mod(cust_year_of_birth,10) * 10 ) || '-' ||
      to_char((mod(cust_year_of_birth,10) * 10 ) + 10) age_range
    , nvl(c.cust_income_level,'A: Below 30,000') cust_income_level
    , p.prod_name
    , p.prod_desc
    , p.prod_category
    , (pf.unit_cost * s.quantity_sold) total_cost
    , s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
  from sh.sales s
  join sh.customers c on c.cust_id = s.cust_id
  join sh.products p on p.prod_id = s.prod_id
  join sh.times t on t.time_id = s.time_id
  join sh.costs pf on
    pf.channel_id = s.channel_id
    and pf.prod_id = s.prod_id
    and pf.promo_id = s.promo_id
    and pf.time_id = s.time_id
  where (t.fiscal_year = 2001)
)
, gb as (
  select -- Q1 - all categories by cust income and age range
      'Q1' query_tag
    , prod_category
    , cust_income_level
    , age_range
    , sum(profit) profit
  from tsales
  group by prod_category, cust_income_level, age_range
  union all
  select -- Q2 - all categories by cust income
      'Q2' query_tag
    , prod_category
    , cust_income_level
    , 'ALL AGE' age_range
    , sum(profit) profit
  from tsales
  group by prod_category, cust_income_level, 'ALL AGE'
  union all
  select -- Q3 - all categories by cust age range
      'Q3' query_tag
    , prod_category
    , 'ALL INCOME' cust_income_level
    , age_range
    , sum(profit) profit
  from tsales
  group by prod_category, 'ALL INCOME', age_range
  union all
  select -- Q4 - all categories
      'Q4' query_tag
    , prod_category
    , 'ALL INCOME' cust_income_level
    , 'ALL AGE' age_range
    , sum(profit) profit
  from tsales
  group by prod_category, 'ALL INCOME', 'ALL AGE'
)
select *
from gb
order by prod_category, profit;
QUERY                                                      AGE
TAG    PRODUCT CATEGORY               INCOME LEVEL         RANGE             PROFIT
------ ------------------------------ -------------------- -------- ---------------
...
Q2     Hardware                       K: 250,000 - 299,999 ALL AGE       $26,678.00
Q2     Hardware                       L: 300,000 and above ALL AGE       $28,974.28
Q1     Hardware                       F: 110,000 - 129,999 70-80         $30,477.16
Q2     Hardware                       J: 190,000 - 249,999 ALL AGE       $43,761.47
Q2     Hardware                       B: 30,000 - 49,999   ALL AGE       $53,612.04
Q2     Hardware                       A: Below 30,000      ALL AGE       $55,167.88
Q2     Hardware                       I: 170,000 - 189,999 ALL AGE       $57,089.05
Q2     Hardware                       C: 50,000 - 69,999   ALL AGE       $76,612.64
Q3     Hardware                       ALL INCOME           60-70         $85,314.04
Q3     Hardware                       ALL INCOME           10-20         $90,849.87
Q3     Hardware                       ALL INCOME           0-10          $92,207.47
Q3     Hardware                       ALL INCOME           50-60         $93,811.96
Q3     Hardware                       ALL INCOME           80-90         $95,391.82
Q2     Hardware                       H: 150,000 - 169,999 ALL AGE       $95,437.74
Q3     Hardware                       ALL INCOME           40-50         $97,492.51
Q3     Hardware                       ALL INCOME           20-30        $101,140.69
Q2     Hardware                       D: 70,000 - 89,999   ALL AGE      $102,940.44
Q3     Hardware                       ALL INCOME           30-40        $102,946.85
Q3     Hardware                       ALL INCOME           90-100       $110,310.69
Q2     Hardware                       G: 130,000 - 149,999 ALL AGE      $112,688.64
Q3     Hardware                       ALL INCOME           70-80        $117,920.88
Q2     Hardware                       E: 90,000 - 109,999  ALL AGE      $135,154.59
Q2     Hardware                       F: 110,000 - 129,999 ALL AGE      $199,270.01
Q4     Hardware                       ALL INCOME           ALL AGE      $987,386.78
...
714 rows selected.
Elapsed: 00:00:14.53
Statistics
----------------------------------------------------------
18464 recursive calls
4253 db block gets
22759 consistent gets
10521 physical reads
4216 redo size
25086 bytes sent via SQL*Net to client
601 bytes received via SQL*Net from client
9 SQL*Net roundtrips to/from client
174 sorts (memory)
0 sorts (disk)
714 rows processed
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
| Id | Operation                    | Name                      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |                           |      1 |        |    714 |
|  1 |  TEMP TABLE TRANSFORMATION   |                           |      1 |        |    714 |
|  2 |   LOAD AS SELECT             |                           |      1 |        |      0 |
|* 3 |    HASH JOIN                 |                           |      1 |  17116 |   258K |
|  4 |     TABLE ACCESS FULL        | PRODUCTS                  |      1 |     72 |     72 |
|* 5 |     HASH JOIN                |                           |      1 |  17116 |   258K |
|* 6 |      HASH JOIN               |                           |      1 |  17116 |   258K |
|* 7 |       TABLE ACCESS FULL      | TIMES                     |      1 |    304 |    364 |
|  8 |       PARTITION RANGE AND    |                           |      1 |  82112 |   259K |
|* 9 |        HASH JOIN             |                           |      4 |  82112 |   259K |
| 10 |         TABLE ACCESS FULL    | COSTS                     |      4 |  82112 |  29766 |
| 11 |         TABLE ACCESS FULL    | SALES                     |      4 |   918K |   259K |
| 12 |      TABLE ACCESS FULL       | CUSTOMERS                 |      1 |  55500 |  55500 |
| 13 |   SORT ORDER BY              |                           |      1 |     16 |    714 |
| 14 |    VIEW                      |                           |      1 |     16 |    714 |
| 15 |     UNION-ALL                |                           |      1 |        |    714 |
| 16 |      HASH GROUP BY           |                           |      1 |      3 |    599 |
| 17 |       VIEW                   |                           |      1 |  17116 |   258K |
| 18 |        TABLE ACCESS FULL     | SYS_TEMP_0FD9D6620_8BE55C |      1 |  17116 |   258K |
| 19 |      HASH GROUP BY           |                           |      1 |      4 |     60 |
| 20 |       VIEW                   |                           |      1 |  17116 |   258K |
| 21 |        TABLE ACCESS FULL     | SYS_TEMP_0FD9D6620_8BE55C |      1 |  17116 |   258K |
| 22 |      HASH GROUP BY           |                           |      1 |      4 |     50 |
| 23 |       VIEW                   |                           |      1 |  17116 |   258K |
| 24 |        TABLE ACCESS FULL     | SYS_TEMP_0FD9D6620_8BE55C |      1 |  17116 |   258K |
| 25 |      HASH GROUP BY           |                           |      1 |      5 |      5 |
| 26 |       VIEW                   |                           |      1 |  17116 |   258K |
| 27 |        TABLE ACCESS FULL     | SYS_TEMP_0FD9D6620_8BE55C |      1 |  17116 |   258K |
-------------------------------------------------------------------------------------------
Thinking back on your earlier experiment with CUBE, you know that multiple queries each doing a GROUP BY and joined by UNION ALL can be replaced with one query using GROUP BY with the CUBE extension. This is possible here because the request calls for summaries based on all possible combinations of the CUST_INCOME_LEVEL and AGE_RANGE columns output from the tsales subquery. The CUBE extension can accomplish the same result, but with less code and less database IO.
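To see why CUBE covers all four of the UNION ALL branches, it can help to try it on a handful of rows. The following is a small self-contained sketch that builds three made-up rows from DUAL (the values are invented for illustration and are not taken from the SH schema) and groups them the same way the report does.

```sql
-- Three invented rows standing in for the output of the tsales subquery.
with t as (
  select 'Hardware' prod_category, 'A' cust_income_level,
         '0-10' age_range, 10 profit from dual
  union all
  select 'Hardware', 'A', '10-20', 20 from dual
  union all
  select 'Hardware', 'B', '0-10', 30 from dual
)
select prod_category
     , cust_income_level
     , age_range
     , sum(profit) profit
from t
-- CUBE(a, b) groups by (a, b), (a), (b) and (), each within prod_category.
group by prod_category, cube(cust_income_level, age_range)
order by cust_income_level nulls last, age_range nulls last;
```

With this data the query returns eight rows: three for the (income, age) combinations, two per income level, two per age range, and one grand total for the category. A NULL in a column marks a level that has been summarized away. The same groupings could also be spelled out explicitly as `group by prod_category, grouping sets ((cust_income_level, age_range), (cust_income_level), (age_range), ())`, which is what CUBE expands to for two columns.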
While the difference in IO rate and timing in that earlier experiment was not very significant, you will see that when used with larger data sets, the difference can be substantial. Listing 7-12 shows the query after it has been modified to use the CUBE extension to GROUP BY. After running the new query, the first things you look at are the statistics and the execution plan. Removing the entire gb subquery and using GROUP BY CUBE on the output from the tsales subquery reduced physical IO from 10521 physical reads to 2169, nearly a factor of 5. That alone is enough to recommend the use of CUBE; the fact that it results in much less SQL to write is a bonus.
Listing 7-12. Replace UNION ALL with CUBE
with tsales as (
  select /*+ gather_plan_statistics */
      s.quantity_sold
    , s.amount_sold
    , to_char(mod(cust_year_of_birth,10) * 10 ) || '-' ||
      to_char((mod(cust_year_of_birth,10) * 10 ) + 10) age_range
    , nvl(c.cust_income_level,'A: Below 30,000') cust_income_level
    , p.prod_name
    , p.prod_desc
    , p.prod_category
    , (pf.unit_cost * s.quantity_sold) total_cost
    , s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
  from sh.sales s
  join sh.customers c on c.cust_id = s.cust_id
  join sh.products p on p.prod_id = s.prod_id
  join sh.times t on t.time_id = s.time_id
  join sh.costs pf on
    pf.channel_id = s.channel_id
    and pf.prod_id = s.prod_id
    and pf.promo_id = s.promo_id
    and pf.time_id = s.time_id
  where (t.fiscal_year = 2001)
)
select
    'Q' || decode(cust_income_level,
             null,decode(age_range,null,4,3),
             decode(age_range,null,2,1)
           ) query_tag
  , prod_category
  , cust_income_level
  , age_range
  , sum(profit) profit
from tsales
group by prod_category, cube(cust_income_level,age_range)
order by prod_category, profit;
QUERY                                                      AGE
TAG    PRODUCT CATEGORY               INCOME LEVEL         RANGE             PROFIT
------ ------------------------------ -------------------- -------- ---------------
...
Q2     Hardware                       K: 250,000 - 299,999               $26,678.00
Q2     Hardware                       L: 300,000 and above               $28,974.28
Q1     Hardware                       F: 110,000 - 129,999 70-80         $30,477.16
Q2     Hardware                       J: 190,000 - 249,999               $43,761.47
Q2     Hardware                       B: 30,000 - 49,999                 $53,612.04
Q2     Hardware                       A: Below 30,000                    $55,167.88
Q2     Hardware                       I: 170,000 - 189,999               $57,089.05
Q2     Hardware                       C: 50,000 - 69,999                 $76,612.64
Q3     Hardware                                            60-70         $85,314.04
Q3     Hardware                                            10-20         $90,849.87
Q3     Hardware                                            0-10          $92,207.47
Q3     Hardware                                            50-60         $93,811.96
Q3     Hardware                                            80-90         $95,391.82
Q2     Hardware                       H: 150,000 - 169,999               $95,437.74
Q3     Hardware                                            40-50         $97,492.51
Q3     Hardware                                            20-30        $101,140.69
Q2     Hardware                       D: 70,000 - 89,999                $102,940.44
Q3     Hardware                                            30-40        $102,946.85
Q3     Hardware                                            90-100       $110,310.69
Q2     Hardware                       G: 130,000 - 149,999              $112,688.64
Q3     Hardware                                            70-80        $117,920.88
Q2     Hardware                       E: 90,000 - 109,999               $135,154.59
Q2     Hardware                       F: 110,000 - 129,999              $199,270.01
Q4     Hardware                                                         $987,386.78
...
714 rows selected.
Elapsed: 00:00:08.98
Statistics
----------------------------------------------------------
17901 recursive calls
0 db block gets
5935 consistent gets
2169 physical reads
260 redo size
24694 bytes sent via SQL*Net to client
601 bytes received via SQL*Net from client
9 SQL*Net roundtrips to/from client
174 sorts (memory)
0 sorts (disk)
714 rows processed
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
| Id | Operation                    | Name                 | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |                      |      1 |        |    714 |
|  1 |  SORT ORDER BY               |                      |      1 |   2251 |    714 |
|  2 |   SORT GROUP BY              |                      |      1 |   2251 |    714 |
|  3 |    GENERATE CUBE             |                      |      1 |   2251 |   2396 |
|  4 |     SORT GROUP BY            |                      |      1 |   2251 |    599 |
|* 5 |      HASH JOIN               |                      |      1 |  17116 |   258K |
|  6 |       VIEW                   | index$_join$_004     |      1 |     72 |     72 |
|* 7 |        HASH JOIN             |                      |      1 |        |     72 |
|  8 |         INDEX FAST FULL SCAN | PRODUCTS_PK          |      1 |     72 |     72 |
|  9 |         INDEX FAST FULL SCAN | PRODUCTS_PROD_CAT_IX |      1 |     72 |     72 |
|*10 |       HASH JOIN              |                      |      1 |  17116 |   258K |
|*11 |        HASH JOIN             |                      |      1 |  17116 |   258K |
|*12 |         TABLE ACCESS FULL    | TIMES                |      1 |    304 |    364 |
| 13 |         PARTITION RANGE AND  |                      |      1 |  82112 |   259K |
|*14 |          HASH JOIN           |                      |      4 |  82112 |   259K |
| 15 |           TABLE ACCESS FULL  | COSTS                |      4 |  82112 |  29766 |
| 16 |           TABLE ACCESS FULL  | SALES                |      4 |   918K |   259K |
| 17 |        TABLE ACCESS FULL     | CUSTOMERS            |      1 |  55500 |  55500 |
---------------------------------------------------------------------------------------
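One visible difference between the two listings is that the CUBE output in Listing 7-12 leaves the summarized columns blank where the UNION ALL version printed ALL INCOME and ALL AGE. A possible way to restore those labels, and to build QUERY_TAG without testing the data columns for NULL, is the GROUPING function. The sketch below is an illustration rather than the author's query: it uses three invented rows from DUAL in place of the real tsales subquery so that it can be run on its own.

```sql
-- Invented rows standing in for the tsales subquery of Listing 7-12.
with tsales as (
  select 'Hardware' prod_category, 'A: Below 30,000' cust_income_level,
         '0-10' age_range, 10 profit from dual
  union all
  select 'Hardware', 'A: Below 30,000', '10-20', 20 from dual
  union all
  select 'Hardware', 'B: 30,000 - 49,999', '0-10', 30 from dual
)
select
    -- GROUPING(col) returns 1 when col has been summarized away by CUBE.
    'Q' || decode(grouping(cust_income_level),
             1, decode(grouping(age_range), 1, 4, 3),
             decode(grouping(age_range), 1, 2, 1)
           ) query_tag
  , prod_category
  , decode(grouping(cust_income_level), 1, 'ALL INCOME', cust_income_level)
      cust_income_level
  , decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range
  , sum(profit) profit
from tsales
group by prod_category, cube(cust_income_level, age_range)
order by prod_category, profit;
```

The tag numbering follows the DECODE in Listing 7-12: Q1 for income and age together, Q2 for income only, Q3 for age only, and Q4 for the category total. Because GROUPING distinguishes a summarized column from a genuinely NULL data value, this form would keep working even if the underlying columns could contain NULLs.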