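A date dimension table is a key part of a data warehouse, because many statistics are aggregated by time.
Below is my date dimension table:
Many examples online populate this table with a MySQL or Oracle stored procedure. Here I populate it with Hive SQL instead and write down the steps.
1. First, set two variables, the start date and the end date: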
0: jdbc:hive2://node1.ansunangel.com:2181,nod> set hivevar:start_day=2020-07-01;
No rows affected (0.004 seconds)
0: jdbc:hive2://node1.ansunangel.com:2181,nod> set hivevar:end_day=2020-08-01;
No rows affected (0.004 seconds)
2. Use Hive's datediff function to compute the number of days between the two dates, which is 31:
select datediff("${end_day}", "${start_day}");
INFO : Compiling command(queryId=hive_20200805170701_f6e524fb-71c3-42f3-bba2-6307fb5a9313): select datediff("2020-08-01", "2020-07-01")
INFO : Executing command(queryId=hive_20200805170701_f6e524fb-71c3-42f3-bba2-6307fb5a9313): select datediff("2020-08-01", "2020-07-01")
INFO : Completed executing command(queryId=hive_20200805170701_f6e524fb-71c3-42f3-bba2-6307fb5a9313); Time taken: 0.004 seconds
INFO : OK
+------+
| _c0 |
+------+
| 31 |
+------+
3. Then use the repeat function to build a string of 31 'o' characters:
0: jdbc:hive2://node1.ansunangel.com:2181,nod> select repeat('o',31);
INFO : Compiling command(queryId=hive_20200805171047_82678305-d51c-4f98-99b1-cf3aa77c2e13): select repeat('o',31)
INFO : Completed executing command(queryId=hive_20200805171047_82678305-d51c-4f98-99b1-cf3aa77c2e13); Time taken: 0.004 seconds
INFO : OK
+----------------------------------+
| _c0 |
+----------------------------------+
| ooooooooooooooooooooooooooooooo |
+----------------------------------+
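4. Next, apply the split function to the output above. Splitting on 'o' leaves only the empty strings around the 31 separators, so the result is an array of 32 empty strings (one more than the number of separators):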
0: jdbc:hive2://node1.ansunangel.com:2181,nod> select split('ooooooooooooooooooooooooooooooo','o');
INFO : Completed executing command(queryId=hive_20200805171255_a2c9911d-4d75-4df8-b385-9716400f58ce); Time taken: 0.004 seconds
INFO : OK
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| ["","","","","","","","","","","","","","","","","","","","","","","","","","","","","","","",""] |
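As a quick check on the element count, the size function can be applied to the split result. This is a minimal sketch that reuses only the functions shown above; the alias element_count is made up for illustration. For N separator characters, split should return N + 1 empty strings, which matches the 32 elements in the output above.

```sql
-- 31 'o' separators leave 32 empty strings around them,
-- so this is expected to return 32 rather than 31.
select size(split(repeat('o', 31), 'o')) as element_count;
```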
5. Use the posexplode function to turn the 32-element array into 32 rows, one per element, with pos running from 0 to 31:
0: jdbc:hive2://node1.ansunangel.com:2181,nod> select posexplode(split("ooooooooooooooooooooooooooooooo", "o"));
INFO : Completed executing command(queryId=hive_20200805171616_fa236d28-eda1-4dcb-b975-a4044d21a56b); Time taken: 0.004 seconds
INFO : OK
+------+------+
| pos | val |
+------+------+
| 0 | |
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | |
| 13 | |
| 14 | |
| 15 | |
| 16 | |
| 17 | |
| 18 | |
| 19 | |
| 20 | |
| 21 | |
| 22 | |
| 23 | |
| 24 | |
| 25 | |
| 26 | |
| 27 | |
| 28 | |
| 29 | |
| 30 | |
| 31 | |
+------+------+
32 rows selected (0.109 seconds)
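In a larger query, the same expansion is often written with lateral view, which lets pos be used next to other expressions. The following is only a sketch under assumptions: the three-day range and the aliases s, t, and arr are made up for illustration and do not come from the original query.

```sql
-- repeat('o', 3) contains 3 separators, so split yields 4 empty strings
-- and posexplode emits pos = 0, 1, 2, 3.
-- pos 0 maps to the start date itself, pos 3 to three days later.
select
    t.pos
  , date_add('2020-07-01', t.pos) as d
from (select split(repeat('o', 3), 'o') as arr) s
lateral view posexplode(s.arr) t as pos, val;
```

This should return four rows, 2020-07-01 through 2020-07-04, that is, one more row than the repeat count.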
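The complete HQL. Because pos runs from 0 to datediff(end_day, start_day), the dates CTE covers every day from start_day through end_day inclusive:
set hivevar:start_day=2020-07-01;
set hivevar:end_day=2020-08-01;
-- dates: one row per day, d = start_day + pos.
-- The '' column is a placeholder that is left empty.
-- date_format(d, 'u') uses the SimpleDateFormat pattern 'u': day number of week, 1 = Monday ... 7 = Sunday.
-- concat(year(d), month(d)) is not zero-padded, e.g. 20207 for July 2020.
with dates as (
  select date_add("${start_day}", a.pos) as d
  from (select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o"))) a
)
insert into dwd_dim_date
select
    d as d
  , year(d) as year
  , month(d) as month
  , day(d) as day
  , quarter(d) as quarter
  , ''
  , date_format(d, 'u') as daynumber_of_week
  , concat(year(d), month(d))
from dates
order by year, month, day
;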
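Before inserting, it can help to run the dates CTE on its own and check the range it covers. The sketch below assumes the same two hivevar values as above; the aliases first_day, last_day, day_count, ym_unpadded, and ym_padded are illustrative only. The second query compares the unpadded concat(year(d), month(d)) with a zero-padded date_format(d, 'yyyyMM'), in case a fixed-width year-month key is preferred.

```sql
set hivevar:start_day=2020-07-01;
set hivevar:end_day=2020-08-01;

-- Range check: first date, last date, and number of generated rows.
with dates as (
  select date_add("${start_day}", a.pos) as d
  from (select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o"))) a
)
select min(d) as first_day, max(d) as last_day, count(*) as day_count
from dates;

-- Year-month key: unpadded concat versus zero-padded date_format.
with dates as (
  select date_add("${start_day}", a.pos) as d
  from (select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o"))) a
)
select
    d
  , concat(year(d), month(d)) as ym_unpadded
  , date_format(d, 'yyyyMM') as ym_padded
from dates
limit 3;
```

With the values above, the first query should report 2020-07-01, 2020-08-01, and 32 rows. If end_day should be excluded, passing datediff("${end_day}", "${start_day}") - 1 to repeat would drop the last row.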
The final date dimension table: