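CAMPAIGN SESSION,MONTH,YEAR
2016 First Campaign,1,2016
2016 First Campaign,2,2016
2016 First Campaign,3,2016
2016 First Campaign,4,2016
2016 Second Campaign,5,2016
2016 Second Campaign,6,2016
2016 Second Campaign,7,2016
2016 Third Campaign,8,2016
2016 Last Campaign,9,2016
2016 Last Campaign,10,2016
2016 Last Campaign,11,2016
2016 Last Campaign,12,2016

As shown above, the granularity of the campaign session source data is the month, because every row carries a month element. A single campaign session may span several months: the first campaign session of 2016, for example, lasts four months, so its information is repeated four times, on four rows. Suppose we want to simplify the preparation of the campaign session source data so that each campaign session, however long it lasts, needs only one row. The new data format is shown below and is stored in the file non_campaign_session.csv.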
2016 First Campaign,1,2016,4,2016
2016 Second Campaign,5,2016,7,2016
2016 Third Campaign,8,2016,8,2016
2016 Last Campaign,9,2016,12,2016

1. Modify the campaign session load script
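USE rds;

-- Staging table: one row per campaign session, with its start and end month.
CREATE TABLE non_straight_campaign (
    campaign_session CHAR(30),
    start_month      CHAR(9),
    start_year       INT,
    end_month        CHAR(9),
    end_year         INT
)
row format delimited fields terminated by ','
stored as textfile;

Note that the new staging table has both start year/month columns and end year/month columns. The modified campaign session load script is given below.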
use rds;

-- Reload the staging table from the range-style CSV file.
load data local inpath '/root/non_campaign_session.csv'
overwrite into table non_straight_campaign;

use dw;

-- Expand each campaign session into one row per month it covers.
drop table if exists tmp;
create table tmp as
select t1.month_sk,
       t1.month,
       t1.month_name,
       t3.campaign_session,
       t1.quarter,
       t1.year
  from month_dim t1,
       month_dim t2,
       rds.non_straight_campaign t3
 where t1.year = t3.start_year
   and t1.month >= t3.start_month
   and t2.year = t3.end_year
   and t2.month <= t3.end_month
   and t1.year = t2.year
   and t1.month = t2.month;

-- Replace the affected month_dim rows with the expanded rows.
delete from month_dim
 where month_dim.month_sk in (select month_sk from tmp);

insert into month_dim
select * from tmp;

2. Test
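USE dw;
UPDATE month_dim SET campaign_session = NULL;

After running the modified campaign session load script, query the month_dim table to confirm that it was loaded correctly. The query is as follows.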
select month_sk m_sk,
       month_name,
       month m,
       campaign_session,
       quarter q
  from dw.month_dim
 where year = 2016;
The query result is shown in the figure below.
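
The range condition used by the load script can also be checked outside Hive. The following is a minimal sketch, assuming Python with the standard-library sqlite3 module as a stand-in for Hive and simplified tables that keep only the columns the join needs. The function name expand_campaigns and the reduced schemas are illustrative and are not part of the original scripts. It uses BETWEEN instead of the self-join on month_dim, which selects the same rows as long as each (year, month) pair appears only once in month_dim.

```python
import sqlite3

# Range-style rows in the layout of non_campaign_session.csv:
# campaign_session, start_month, start_year, end_month, end_year
CAMPAIGN_RANGES = [
    ("2016 First Campaign", 1, 2016, 4, 2016),
    ("2016 Second Campaign", 5, 2016, 7, 2016),
    ("2016 Third Campaign", 8, 2016, 8, 2016),
    ("2016 Last Campaign", 9, 2016, 12, 2016),
]


def expand_campaigns(ranges, year=2016):
    """Return (month, campaign_session) rows, one per covered month of `year`."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-ins for dw.month_dim and rds.non_straight_campaign
    # (assumption: only the columns used by the range condition are kept).
    conn.execute("CREATE TABLE month_dim (month INTEGER, year INTEGER)")
    conn.execute(
        "CREATE TABLE non_straight_campaign ("
        "campaign_session TEXT, start_month INTEGER, start_year INTEGER, "
        "end_month INTEGER, end_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO month_dim VALUES (?, ?)",
        [(m, year) for m in range(1, 13)],
    )
    conn.executemany(
        "INSERT INTO non_straight_campaign VALUES (?, ?, ?, ?, ?)", ranges
    )
    # A month belongs to a campaign session when it lies between the
    # session's start month and end month within the same year.
    rows = conn.execute(
        "SELECT t1.month, t3.campaign_session "
        "FROM month_dim t1 JOIN non_straight_campaign t3 "
        "  ON t1.year = t3.start_year AND t1.year = t3.end_year "
        " AND t1.month BETWEEN t3.start_month AND t3.end_month "
        "ORDER BY t1.month"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for month, session in expand_campaigns(CAMPAIGN_RANGES):
        print(month, session)
```

With the four range rows above, each of the twelve months of 2016 should map to exactly one campaign session: months 1 to 4 to the first, 5 to 7 to the second, 8 to the third, and 9 to 12 to the last.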