大多数维度都具有一个或多个层次。例如,示例数据仓库中的日期维度就有一个四级层次:年、季度、月和日。这些级别用date_dim表里的列表示。日期维度是一个单路径层次,因为除了年-季度-月-日这条路径外,它没有任何其它层次。为了识别数据仓库里一个维度的层次,首先要理解维度中列的含义,然后识别两个或多个列是否具有相同的主题。例如,年、季度、月和日具有相同的主题,因为它们都是关于日期的。具有相同主题的列形成一个组,组中的一列必须包含至少一个组内的其它成员(除了最低级别的列),如在前面提到的组中,月包含日。这些列的链条形成了一个层次,例如,年-季度-月-日这个链条是一个日期维度的层次。除了日期维度,邮编维度中的地理位置信息,产品维度的产品与产品分类,也都构成层次关系。表1显示了三个维度的层次。
zip_code_dim |
product_dim |
date_dim |
zip_code |
product_name |
date |
city |
product_category |
month |
state |
|
quarter |
|
|
year |
select product_category,year,quarter,month,sum(order_amount) s_amount from v_sales_order_fact a,product_dim b,date_dim c where a.product_sk = b.product_sk and a.year_month = c.year * 100 + c.month group by product_category, year, quarter, month order by product_category, year, quarter, month;这是一个非常简单的分组查询,结果输出的每一行度量(销售订单金额)都沿着年-季度-月的层次分组。
-- 使用union all select product_category, time, order_amount from (select product_category, case when sequence = 1 then 'year: '||time when sequence = 2 then 'quarter: '||time else 'month: '||time end time, order_amount, sequence, date from (select product_category, min(date) date, year time, 1 sequence, sum(order_amount) order_amount from v_sales_order_fact a, product_dim b, date_dim c where a.product_sk = b.product_sk and a.year_month = c.year * 100 + c.month group by product_category , year union all select product_category, min(date) date, quarter time, 2 sequence, sum(order_amount) order_amount from v_sales_order_fact a, product_dim b, date_dim c where a.product_sk = b.product_sk and a.year_month = c.year * 100 + c.month group by product_category , year , quarter union all select product_category, min(date) date, month time, 3 sequence, sum(order_amount) order_amount from v_sales_order_fact a, product_dim b, date_dim c where a.product_sk = b.product_sk and a.year_month = c.year * 100 + c.month group by product_category , year , quarter , month) x) y order by product_category , date , sequence , time; -- 使用grouping sets select product_category, case when gid = 3 then 'year: '||year when gid = 1 then 'quarter: '||quarter else 'month: '||month end time, order_amount from (select product_category, year, quarter, month, min(date) date, sum(order_amount) order_amount,grouping(product_category,year,quarter,month) gid from v_sales_order_fact a, product_dim b, date_dim c where a.product_sk = b.product_sk and a.year_month = c.year * 100 + c.month group by grouping sets ((product_category,year,quarter,month),(product_category,year,quarter),(product_category,year))) x order by product_category , date , gid desc, time;以上两种不同写法的查询语句结果相同,如图1所示。
alter table tds.month_dim add column campaign_session varchar(30) default null; comment on column tds.month_dim.campaign_session is '促销期'; create table rds.campaign_session (campaign_session varchar(30),month smallint,year smallint);
假设所有促销期都不跨年,并且一个促销期可以包含一个或多个月份,但一个月份只能属于一个促销期。为了理解促销期如何工作,表2给出了一个促销期定义的示例。
促销期 |
月份 |
2017 年第一促销期 |
1月—4月 |
2017 年第二促销期 |
5月—7月 |
2017 年第三促销期 |
8月 |
2017 年第四促销期 |
9月—12月 |
表2
每个促销期有一个或多个月。一个促销期也许并不是正好一个季度,也就是说,促销期级别不能上卷到季度,但是促销期可以上卷至年级别。假设2017年促销期的数据如下,并保存在/home/gpadmin/campaign_session.csv文件中。2016 First Campaign,1,2017 2017 First Campaign,2,2017 2017 First Campaign,3,2017 2017 First Campaign,4,2017 2017 Second Campaign,5,2017 2017 Second Campaign,6,2017 2017 Second Campaign,7,2017 2017 Third Campaign,8,2017 2017 Last Campaign,9,2017 2017 Last Campaign,10,2017 2017 Last Campaign,11,2017 2017 Last Campaign,12,2017现在可以执行下面的脚本把2017年的促销期数据装载进月维度。
copy rds.campaign_session from '/home/gpadmin/campaign_session.csv' with delimiter ','; set search_path = tds; create table tmp as select t1.month_sk month_sk, t1.month month1, t1.month_name month_name, t1.quarter quarter, t1.year year1, t2.campaign_session campaign_session from month_dim t1 left join rds.campaign_session t2 on t1.year = t2.year and t1.month = t2.month; truncate table month_dim; insert into month_dim select * from tmp; drop table tmp;此时查询月份维度表,可以看到2017年的促销期已经有数据,其它年份的campaign_session字段值为null,如图2所示。
,1,2017 2017 Early Spring Campaign,2,2017 2017 Early Spring Campaign,3,2017 ,4,2017 2017 Spring Campaign,5,2017 ,6,2017 2017 Last Campaign,7,2017 2017 Last Campaign,8,2017 ,9,2017 ,10,2017 ,11,2017 ,12,2017重新向month_dim表装载促销期数据。
truncate table rds.campaign_session; copy rds.campaign_session from '/home/gpadmin/ragged_campaign.csv' with delimiter ','; set search_path = tds; create table tmp as select t1.month_sk month_sk, t1.month month1, t1.month_name month_name, t1.quarter quarter, t1.year year1, null campaign_session from month_dim t1; truncate table month_dim; insert into month_dim select t1.month_sk, t1.month1, t1.month_name, t1.quarter, t1.year1, case when t2.campaign_session != '' then t2.campaign_session else t1.month_name end campaign_session from tmp t1 left join rds.campaign_session t2 on t1.year1 = t2.year and t1.month1 = t2.month; drop table tmp;在有促销期的月份,campaign_session列填写促销期名称,而对于没有促销期的月份,该列填写月份名称。轻微参差不齐层次没有固定的层次深度,但层次深度有限。如地理层次深度通常包含3到6层。与其使用复杂的机制构建难以预测的可变深度层次,不如将其变换为固定深度位置设计,针对不同的维度属性确立最大深度,然后基于业务规则放置属性值。
下面的语句查询年-促销期-月层次,查询结果如图3所示。
select product_category,
case when gid = 3 then cast(year as varchar(10))
when gid = 1 then campaign_session
else month_name
end time,
order_amount
from (select product_category,
year,
campaign_session,
month,
month_name,
sum(month_order_amount) order_amount,
sum(month_order_quantity) order_quantity,
grouping(product_category,year,campaign_session,month) gid,
min(month) min_month
from v_month_end_sales_order_fact a, product_dim b, month_dim c
where a.product_sk = b.product_sk
and a.year_month = c.year * 100 + c.month
and c.year = 2017
group by grouping sets ((product_category,year,campaign_session,month,month_name),(product_category,year,campaign_session),(product_category,year))) x
order by product_category, min_month, gid desc, month;