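I have recently been using StarRocks to build a real-time data project, and tried out its asynchronous materialized views.
Version used: 3.1.2-4f3a2ee
Create three test tables. Note that only test_mv_table1 is partitioned; the other two are non-partitioned: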
CREATE TABLE `test_mv_table1` (
`periodday` DATE NOT NULL COMMENT "",
`fid` varchar(44) NOT NULL COMMENT "",
`fnumber` int NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`periodday`, `fid`)
COMMENT "数据1-分区"
PARTITION BY date_trunc('month', `periodday`)
DISTRIBUTED BY HASH(`fid`)
ORDER BY(`fid`, `fnumber`)
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true"
);
CREATE TABLE `test_mv_table2` (
`fid` varchar(44) NOT NULL COMMENT "",
`fnumber` int NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`fid`)
COMMENT "数据2-明细"
DISTRIBUTED BY HASH(`fid`)
ORDER BY(`fid`, `fnumber`)
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true"
);
CREATE TABLE `test_mv_table3` (
`fid` varchar(44) NOT NULL COMMENT "",
`fnumber` int NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`fid`)
COMMENT "数据3-明细"
DISTRIBUTED BY HASH(`fid`)
ORDER BY(`fid`, `fnumber`)
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true"
);
Insert initial data into each of the three tables:
insert into test_mv_table1 (periodday, fid, fnumber) values
("2023-09-01", "aaa", 111),
("2023-09-02", "bbb", 222),
("2023-09-03", "ccc", 333),
("2023-10-01", "aaa", 111),
("2023-10-02", "bbb", 222),
("2023-10-03", "ccc", 333),
("2023-11-01", "aaa", 111),
("2023-11-02", "bbb", 222),
("2023-11-03", "ccc", 333);
insert into test_mv_table2 (fid, fnumber) values
("aaa", 666),
("bbb", 777),
("ccc", 888);
insert into test_mv_table3 (fid, fnumber) values
("aaa", 22200),
("bbb", 33300),
("ccc", 44400);
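Before building the view, it can help to confirm that the expression partitioning on test_mv_table1 produced one partition per month. The statements below are a minimal check, assuming the three inserts above succeeded; the aliases month_start and row_count are only illustrative names.

```sql
-- List the partitions that were created automatically for the partitioned table.
SHOW PARTITIONS FROM test_mv_table1;

-- Count the rows that fall into each month of the sample data.
SELECT date_trunc('month', periodday) AS month_start,
       count(*) AS row_count
FROM test_mv_table1
GROUP BY date_trunc('month', periodday)
ORDER BY month_start;
```

With the sample data above, the second query should list September, October and November 2023 with three rows each.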
Create the materialized view. Its partitioning can follow only one base table (here test_mv_table1, through periodday):
CREATE MATERIALIZED VIEW view_mv_test
COMMENT 'test-物化视图'
PARTITION BY `periodday`
DISTRIBUTED BY HASH(`fnumber`)
REFRESH ASYNC
PROPERTIES (
"replication_num" = "3",
"excluded_trigger_tables" = "test_mv_table2,test_mv_table3",
"session.exec_mem_limit"="9147483648",
"session.query_timeout"="259000",
"session.new_planner_optimize_timeout"="5000",
"session.parallel_fragment_exec_instance_num"="10"
)
AS
select
a.periodday as periodday,
b.fnumber as fnumber,
a.fid as fid,
now() as insert_time
from test_mv_table1 a
left join test_mv_table2 b
on a.fid = b.fid
inner join test_mv_table3 c
on a.fid = c.fid
and b.fid = c.fid
;
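Note that the statement above creates a view that refreshes automatically when its base tables change. It can also be created with a scheduled refresh (both statements use the name view_mv_test, so drop the first view or pick another name before running this one):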
CREATE MATERIALIZED VIEW view_mv_test
COMMENT 'test-物化视图'
PARTITION BY `periodday`
DISTRIBUTED BY HASH(`fnumber`)
REFRESH ASYNC START('2023-11-20 10:00:00') EVERY (interval 1 minute)
PROPERTIES (
"replication_num" = "3",
"excluded_trigger_tables" = "test_mv_table2,test_mv_table3",
"session.exec_mem_limit"="9147483648",
"session.query_timeout"="259000",
"session.new_planner_optimize_timeout"="5000",
"session.parallel_fragment_exec_instance_num"="10"
)
AS
select
a.periodday as periodday,
b.fnumber as fnumber,
a.fid as fid,
now() as insert_time
from test_mv_table1 a
left join test_mv_table2 b
on a.fid = b.fid
inner join test_mv_table3 c
on a.fid = c.fid
and b.fid = c.fid
;
Once the view has been created, an initial refresh runs to compute its data.
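To see whether that initial refresh has finished, the view's status and its refresh task runs can be inspected. This is a sketch rather than a definitive recipe: it assumes the view is named view_mv_test as above, and the column names selected from information_schema.task_runs are the ones I expect in 3.1, so adjust them if your version differs.

```sql
-- Status of the view, including the state of its most recent refresh.
SHOW MATERIALIZED VIEWS WHERE NAME = "view_mv_test";

-- Most recent refresh task runs (column names assumed, see note above).
SELECT task_name, state, create_time, finish_time, error_message
FROM information_schema.task_runs
ORDER BY create_time DESC
LIMIT 5;

-- Once the refresh has succeeded, the view can be queried like a table.
SELECT periodday, fid, fnumber, insert_time
FROM view_mv_test
ORDER BY periodday, fid;
```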
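Reference: https://docs.starrocks.io/zh-cn/latest/using_starrocks/data_modeling_with_materialized_views
The materialized view joins the fact table with several dimension tables.
Joining on a partitioned fact table in this way can support a variety of business scenarios.
Note that an update to a dimension table triggers a refresh of the entire materialized view, not just of a single partition.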
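The dimension tables are not partitioned, so a change to one of them cannot be mapped to a particular partition of the view. In the definitions above both dimension tables are also listed in excluded_trigger_tables, which, as I understand it, keeps their changes from triggering the automatic refresh. A manual refresh after a dimension change might then look like the sketch below; the value 999 and the November date range are made up for illustration.

```sql
-- Change one dimension row (hypothetical new value).
UPDATE test_mv_table2 SET fnumber = 999 WHERE fid = 'aaa';

-- Refresh the whole view manually so that it picks up the change.
REFRESH MATERIALIZED VIEW view_mv_test;

-- Alternatively, refresh only a range of partitions, here November 2023.
-- FORCE asks for the refresh even if the fact table data has not changed.
REFRESH MATERIALIZED VIEW view_mv_test
PARTITION START ("2023-11-01") END ("2023-12-01") FORCE;
```

Limiting the refresh to a partition range keeps the cost down when only recent data needs the updated dimension values, at the price of older partitions still holding the previous values.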