I. Data preparation
Fact table: dwd_payment_info
Dimension tables: dwd_order_info and dwd_user_info
1.1 Create the tables
- dwd_payment_info
hive (gmall)>
drop table if exists dwd_payment_info;
create external table dwd_payment_info(
`id` bigint COMMENT '',
`out_trade_no` string COMMENT '',
`order_id` string COMMENT '',
`user_id` string COMMENT '',
`alipay_trade_no` string COMMENT '',
`total_amount` decimal(16,2) COMMENT '',
`subject` string COMMENT '',
`payment_type` string COMMENT '',
`payment_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy")
;
- dwd_order_info
hive (gmall)>
drop table if exists dwd_order_info;
create external table dwd_order_info (
`id` string COMMENT '',
`total_amount` decimal(10,2) COMMENT '',
`order_status` string COMMENT ' 1 2 3 4 5',
`user_id` string COMMENT 'id',
`payment_way` string COMMENT '',
`out_trade_no` string COMMENT '',
`create_time` string COMMENT '',
`operate_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy")
;
- dwd_user_info
hive (gmall)>
drop table if exists dwd_user_info;
create external table dwd_user_info(
`id` string COMMENT 'id',
`name` string COMMENT '',
`birthday` string COMMENT '',
`gender` string COMMENT '',
`email` string COMMENT '',
`user_level` string COMMENT '',
`create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy")
;
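These three tables form a simple star schema.
Because dwd_order_info and dwd_user_info are partitioned by day, the same id appears once in every daily partition. Kylin does not support partitioned dimension tables, so using these tables directly produces duplicate join keys. The workaround is to use a temporary table or a view that keeps only one partition.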
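To see why the daily snapshots cause trouble, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Hive. The tables are cut down to a few columns and the sample rows are hypothetical. The same user id sits in two daily partitions, so the key is no longer unique, and a plain join counts the single payment twice.

```python
import sqlite3


def build_snapshot_db() -> sqlite3.Connection:
    """In-memory stand-in for the Hive tables (columns trimmed, rows hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dwd_payment_info (id INTEGER, user_id TEXT, total_amount REAL, dt TEXT);
        CREATE TABLE dwd_user_info (id TEXT, gender TEXT, dt TEXT);
        INSERT INTO dwd_payment_info VALUES (1, 'u1', 100.0, '2019-02-10');
        -- the same user is stored once per daily snapshot partition
        INSERT INTO dwd_user_info VALUES ('u1', 'F', '2019-02-09'), ('u1', 'F', '2019-02-10');
    """)
    return conn


def duplicate_keys(conn: sqlite3.Connection) -> list:
    """Dimension keys that occur more than once across all partitions."""
    return conn.execute(
        "SELECT id, COUNT(*) FROM dwd_user_info GROUP BY id HAVING COUNT(*) > 1"
    ).fetchall()


def naive_join_total(conn: sqlite3.Connection) -> float:
    """Sum of payments after joining against every partition of the dimension table."""
    return conn.execute(
        "SELECT SUM(p.total_amount) FROM dwd_payment_info p "
        "JOIN dwd_user_info u ON p.user_id = u.id"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = build_snapshot_db()
    print(duplicate_keys(conn))    # [('u1', 2)]
    print(naive_join_total(conn))  # 200.0, although only 100.0 was paid
```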
Create views over the dimension tables:
- dwd_order_view
hive (gmall)>
create view dwd_order_view as select * from dwd_order_info where dt=current_date;
- dwd_user_view
hive (gmall)>
create view dwd_user_view as select * from dwd_user_info where dt=current_date;
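The effect of these views can be checked with a similar sqlite3 sketch. SQLite has no Hive partitions, so the view below filters on a fixed, hypothetical date (`SNAPSHOT_DT`) that stands in for `current_date`. Once the view is restricted to a single partition, each id occurs only once and the join no longer multiplies fact rows.

```python
import sqlite3

# Hypothetical fixed date standing in for Hive's current_date.
SNAPSHOT_DT = "2019-02-10"


def build_view_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dwd_payment_info (id INTEGER, user_id TEXT, total_amount REAL, dt TEXT);
        CREATE TABLE dwd_user_info (id TEXT, gender TEXT, dt TEXT);
        INSERT INTO dwd_payment_info VALUES (1, 'u1', 100.0, '2019-02-10');
        INSERT INTO dwd_user_info VALUES ('u1', 'F', '2019-02-09'), ('u1', 'F', '2019-02-10');
    """)
    # SQLite views cannot take bound parameters, so the constant is inlined.
    conn.execute(
        f"CREATE VIEW dwd_user_view AS SELECT * FROM dwd_user_info WHERE dt = '{SNAPSHOT_DT}'"
    )
    return conn


def view_keys_unique(conn: sqlite3.Connection) -> bool:
    """True if no id occurs more than once in the view."""
    dupes = conn.execute(
        "SELECT id FROM dwd_user_view GROUP BY id HAVING COUNT(*) > 1"
    ).fetchall()
    return not dupes


def view_join_total(conn: sqlite3.Connection) -> float:
    """Sum of payments after joining against the single-partition view."""
    return conn.execute(
        "SELECT SUM(p.total_amount) FROM dwd_payment_info p "
        "JOIN dwd_user_view u ON p.user_id = u.id"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = build_view_db()
    print(view_keys_unique(conn))  # True
    print(view_join_total(conn))   # 100.0
```

One caveat with `current_date`: it is evaluated when the view is queried, so the view returns no rows until that day's partition has been loaded.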
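II. Kylin operations
1. Create a project (analogous to a database)
Click Add Project -> gmall -> test
2. Load the tables
Data Source -> Load Table From Tree -> select the three tables prepared above
Once selected, the table names are shown in bold.
Their metadata then becomes visible below.
3. Create the model
3.1 Click New Model -> Model Name: module_payment
3.2 Select the fact table (DWD_PAYMENT_INFO)
3.3 Add the lookup (dimension) tables
3.3.1 DWD_PAYMENT_INFO -> INNER JOIN -> DWD_ORDER_INFO -> New Join Condition: ORDER_ID = ID
3.3.2 DWD_PAYMENT_INFO -> INNER JOIN -> DWD_USER_INFO -> New Join Condition: USER_ID = ID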
3.4 Dimensions
1.DWD_PAYMENT_INFO : PAYMENT_TYPE
2.DWD_ORDER_INFO : PAYMENT_WAY
3.DWD_USER_INFO : GENDER, USER_LEVEL
3.5 Measures
1.DWD_PAYMENT_INFO : TOTAL_AMOUNT
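Conceptually, the model defined above (one fact table, two INNER JOINs, four dimensions and one measure) describes a flat table that Kylin pre-aggregates. The sketch below writes an equivalent aggregation as plain SQL and runs it with sqlite3. It assumes the measure is a SUM of TOTAL_AMOUNT, and the sample values for payment_type, payment_way, gender and user_level are hypothetical. Each dimension table holds a single snapshot, so its keys are unique.

```python
import sqlite3

# The four dimensions and the (assumed) SUM measure from steps 3.4 and 3.5.
FLAT_QUERY = """
SELECT p.payment_type, o.payment_way, u.gender, u.user_level,
       SUM(p.total_amount) AS total_amount
FROM dwd_payment_info p
INNER JOIN dwd_order_info o ON p.order_id = o.id
INNER JOIN dwd_user_info u ON p.user_id = u.id
GROUP BY p.payment_type, o.payment_way, u.gender, u.user_level
ORDER BY p.payment_type
"""


def build_model_db() -> sqlite3.Connection:
    """In-memory star schema with trimmed columns and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dwd_payment_info (id INTEGER, order_id TEXT, user_id TEXT,
                                       payment_type TEXT, total_amount REAL);
        CREATE TABLE dwd_order_info (id TEXT, payment_way TEXT);
        CREATE TABLE dwd_user_info (id TEXT, gender TEXT, user_level TEXT);
        INSERT INTO dwd_payment_info VALUES
            (1, 'o1', 'u1', 'alipay', 100.0),
            (2, 'o2', 'u2', 'wechat', 50.0),
            (3, 'o3', 'u1', 'alipay', 30.0);
        INSERT INTO dwd_order_info VALUES ('o1', '1'), ('o2', '2'), ('o3', '1');
        INSERT INTO dwd_user_info VALUES ('u1', 'F', '1'), ('u2', 'M', '2');
    """)
    return conn


def run_flat_query(conn: sqlite3.Connection) -> list:
    """Aggregated rows: one per distinct combination of the four dimensions."""
    return conn.execute(FLAT_QUERY).fetchall()


if __name__ == "__main__":
    for row in run_flat_query(build_model_db()):
        print(row)
```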
3.6 Settings
3.6.1 Partition
Select the partition table DWD_PAYMENT_INFO, the partition column DT, and the date format yyyy-MM-dd.
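Because DT is a string, the yyyy-MM-dd format matters. Zero-padded dates sort lexicographically in chronological order, so a date range can be applied directly to the string column. Here is a minimal Python sketch. It assumes the half-open [start, end) range shown in Kylin's build dialog, where the start date is included and the end date excluded. The function names are hypothetical.

```python
from datetime import date, timedelta

# Python's spelling of the yyyy-MM-dd pattern chosen for DT.
DT_FORMAT = "%Y-%m-%d"


def segment_filter(start: date, end: date, column: str = "DT") -> str:
    """WHERE clause for a build range that includes start and excludes end."""
    return (
        f"{column} >= '{start.strftime(DT_FORMAT)}' "
        f"AND {column} < '{end.strftime(DT_FORMAT)}'"
    )


def daily_ranges(first: date, days: int) -> list:
    """One [day, next day) range per daily build."""
    return [
        (first + timedelta(days=i), first + timedelta(days=i + 1))
        for i in range(days)
    ]


if __name__ == "__main__":
    for start, end in daily_ranges(date(2019, 2, 10), 2):
        print(segment_filter(start, end))
```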
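3.6.2 Filter
Optional; set it according to your business needs.
4. Create the cube
4.1 Cube Info: select the model module_payment and name the cube Cube_payment
4.2 Dimensions
Add Dimensions -> DWD_PAYMENT_INFO [FactTable]: select PAYMENT_TYPE -> DWD_ORDER_INFO: select PAYMENT_WAY -> DWD_USER_INFO: select GENDER and USER_LEVEL
Choose Normal for each of them, not Derived (derived dimensions are an optimization).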
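Each combination of Normal dimensions is a candidate cuboid, so the number of combinations grows as 2^N in the number of dimensions N. The sketch below only enumerates the combinations for the four dimensions chosen here. It ignores Kylin's pruning options, such as aggregation groups and mandatory, hierarchy and joint dimensions. Derived dimensions reduce N because a lookup table's columns are resolved through its key rather than each entering the combinations. That is the optimization the step above refers to.

```python
from itertools import combinations

# The four Normal dimensions selected in step 4.2.
DIMENSIONS = ["PAYMENT_TYPE", "PAYMENT_WAY", "GENDER", "USER_LEVEL"]


def candidate_cuboids(dims: list) -> list:
    """Every subset of dims, from the empty grand total up to the base cuboid."""
    return [combo for r in range(len(dims) + 1) for combo in combinations(dims, r)]


if __name__ == "__main__":
    cuboids = candidate_cuboids(DIMENSIONS)
    print(len(cuboids))  # 16 == 2 ** 4
    print(cuboids[-1])   # ('PAYMENT_TYPE', 'PAYMENT_WAY', 'GENDER', 'USER_LEVEL')
```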
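4.3 Measures
4.4 Refresh Setting
Keep the default values.
Each daily build stores its result in HBase as a new table. A query over one month of data would therefore have to read about 30 tables, which is slow. The refresh setting's auto-merge thresholds solve this by merging segments: every 7 days the daily segments are merged into one (minor merge), and every 28 days the 7-day segments are merged into one (major merge).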
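The merge schedule can be simulated with a simplified model. This Python sketch is not Kylin's actual merge code. It assumes one 1-day segment per build. After each build, it merges the earliest run of consecutive segments whose combined span reaches a threshold, where each segment in the run is shorter than that threshold. The 28-day threshold is checked before the 7-day one.

```python
from typing import List, Sequence, Tuple

# A segment covers [start_day, end_day); end_day is exclusive.
Segment = Tuple[int, int]


def merge_once(segments: List[Segment], thresholds: Sequence[int]) -> bool:
    """Merge the first run of consecutive segments, each shorter than a threshold,
    whose combined span reaches that threshold. Larger thresholds are tried first.
    Returns True if a merge happened."""
    for threshold in sorted(thresholds, reverse=True):
        for i in range(len(segments)):
            total = 0
            for j in range(i, len(segments)):
                length = segments[j][1] - segments[j][0]
                if length >= threshold:
                    break  # already at least this size, so the run stops here
                total += length
                if total >= threshold:
                    segments[i:j + 1] = [(segments[i][0], segments[j][1])]
                    return True
    return False


def simulate(days: int, thresholds: Sequence[int] = (7, 28)) -> List[Segment]:
    """Build one 1-day segment per day and auto-merge after every build."""
    segments: List[Segment] = []
    for day in range(days):
        segments.append((day, day + 1))
        while merge_once(segments, thresholds):
            pass  # keep merging until no threshold is reached
    return segments


if __name__ == "__main__":
    for n in (7, 10, 28, 35):
        print(n, simulate(n))
```

Under these assumptions, 10 daily builds leave one 7-day segment plus three 1-day segments. After 35 builds, only a 28-day segment and a 7-day segment remain, so a month-long query touches a handful of segments instead of about 30.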