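PipelineDB
PipelineDB has a very handy stream-computation feature: you store the logic for the result you want as a materialized view over a stream (a continuous view).
Inserted rows themselves are not stored. On every insert or change, the stored result is updated according to the logic of the VIEW you defined, so very little data has to be kept.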
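The idea of keeping only aggregate state per group can be sketched in plain Python. This is an illustration of incremental aggregation, not PipelineDB's actual implementation; GroupStats and apply_insert are hypothetical names, and the percentile column is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GroupStats:
    """Running aggregates for one group; no raw rows are kept."""
    total_count: int = 0
    total_sum: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def update(self, value: int) -> None:
        # Fold one new value into the stored aggregates.
        self.total_count += 1
        self.total_sum += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def avg(self) -> float:
        return self.total_sum / self.total_count


def apply_insert(view: Dict[int, GroupStats], row_id: int, col2: int) -> None:
    # The row updates its group's state and is then discarded.
    view.setdefault(row_id, GroupStats()).update(col2)


view: Dict[int, GroupStats] = {}
for row_id, col2 in [(1, 10), (1, 30), (2, 5)]:
    apply_insert(view, row_id, col2)

print(len(view))    # number of groups, not number of inserted rows
print(view[1].avg)  # average of col2 for id = 1
```

However many rows are inserted, the dictionary never holds more entries than there are distinct group keys.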
Experiment:
CentOS 7 + PG 10.5 + PipelineDB 1.0.0-4
Create the stream (declared as a foreign table):
mytest=# CREATE FOREIGN TABLE t_test1104_pl_steam(
mytest(# id int,
mytest(# col1 varchar(100),
mytest(# col2 int,
mytest(# c_time timestamp without time zone)
mytest-# SERVER pipelinedb;
CREATE FOREIGN TABLE
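Data inserted in the experiment:
id: random, 0 to 9
col1: random, a to z
col2: random, 0 to 100
c_time: now() (not actually used in this experiment)
View 1: group by id and compute various aggregates over col2. The result only ever has 10 rows, but their values keep changing while data is being inserted, so you can query the view repeatedly during the insert.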
mytest=# CREATE VIEW vw_test1104_pl_stats_01 WITH (action=materialize) AS
mytest-# SELECT id,
mytest-# count(*) AS total_count,
mytest-# sum(col2) AS sum,
mytest-# min(col2) AS min,
mytest-# max(col2) AS max,
mytest-# avg(col2) AS avg,
mytest-# percentile_cont(0.99) WITHIN GROUP (ORDER BY col2) AS p99_views
mytest-# FROM t_test1104_pl_steam
mytest-# GROUP BY id;
CREATE VIEW
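View 2: group by id and col1 and compute the same aggregates over col2. This ends up with 260 rows (10 id values × 26 letters). As before, the values keep changing while data is being inserted.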
mytest=# create view vw_test1104_pl_stats_02 with (action=materialize) as
mytest-# SELECT id, col1,
mytest-# count(*) AS total_count,
mytest-# sum(col2) AS sum,
mytest-# min(col2) AS min,
mytest-# max(col2) AS max,
mytest-# avg(col2) AS avg,
mytest-# percentile_cont(0.99) WITHIN GROUP (ORDER BY col2) AS p99_views
mytest-# FROM t_test1104_pl_steam
mytest-# GROUP BY id, col1;
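The 260-row figure for view 2 follows from the value ranges of id and col1. A quick check, assuming every (id, col1) combination eventually appears in the random data:

```python
from itertools import product
import string

ids = range(10)                   # id takes the values 0..9
letters = string.ascii_lowercase  # col1 takes the values a..z

# One view row per distinct (id, col1) pair.
groups = list(product(ids, letters))
print(len(groups))
```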
===============================================
Use a Python script to insert 100 million random rows. The views can be queried repeatedly while it runs.
#!/usr/bin/env python
import random

import psycopg2

conn = psycopg2.connect("dbname=mytest user=dbadmin")
cur = conn.cursor()
for i in range(100000000):
    row_id = random.randint(0, 9)        # id: 0..9
    col1 = chr(random.randint(97, 122))  # col1: 'a'..'z'
    col2 = random.randint(0, 100)        # col2: 0..100
    cur.execute(
        "INSERT INTO t_test1104_pl_steam (id, col1, col2, c_time) "
        "VALUES (%s, %s, %s, now())",
        (row_id, col1, col2),
    )
    # Commit after each insert.
    conn.commit()
cur.close()
conn.close()
===============================================
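To watch a view change during the load, the same SELECT can be run in a loop from a second session. The following is a minimal sketch: poll_view and main are hypothetical helper names, and the connection string is assumed to be the same one the insert script uses.

```python
import time


def poll_view(cur, view_name, rounds=5, delay=1.0):
    """Run the same SELECT several times and return one snapshot per round."""
    snapshots = []
    for _ in range(rounds):
        # view_name is interpolated directly, so only pass trusted names.
        cur.execute("SELECT * FROM {} ORDER BY id".format(view_name))
        snapshots.append(cur.fetchall())
        time.sleep(delay)
    return snapshots


def main():
    import psycopg2  # same driver as the insert script

    conn = psycopg2.connect("dbname=mytest user=dbadmin")
    conn.autocommit = True  # run each SELECT in its own transaction
    with conn.cursor() as cur:
        for snapshot in poll_view(cur, "vw_test1104_pl_stats_01"):
            print(snapshot)
    conn.close()
```

Call main() in a separate terminal while the insert script is running. total_count should grow from one snapshot to the next while the number of rows stays at 10.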
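View 3: start the insert first and watch the existing views for a while, then create view 3. It does not pick up any of the rows inserted earlier. It only starts from the moment it is created and writes the results computed from then on into the materialized view.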
mytest=# create view vw_test1104_pl_stats_03 with (action=materialize) as
mytest-# SELECT col2,
mytest-# count(*) AS total_count,
mytest-# sum(col2) AS sum,
mytest-# min(col2) AS min,
mytest-# max(col2) AS max,
mytest-# avg(col2) AS avg,
mytest-# percentile_cont(0.99) WITHIN GROUP (ORDER BY col2) AS p99_views
mytest-# FROM t_test1104_pl_steam
mytest-# GROUP BY col2;
CREATE VIEW
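The behavior seen with view 3 is what you would expect when the stream keeps no history. A toy model in Python, where Stream and CountView are hypothetical classes used only for illustration and do not reflect PipelineDB internals:

```python
class Stream:
    """Toy stream: rows are pushed to attached views and then discarded."""

    def __init__(self):
        self.views = []

    def attach(self, view):
        self.views.append(view)

    def insert(self, value):
        for view in self.views:
            view.consume(value)
        # Nothing is stored here, so there is no history to replay later.


class CountView:
    """Toy view that only keeps a running row count."""

    def __init__(self):
        self.total_count = 0

    def consume(self, value):
        self.total_count += 1


stream = Stream()
early = CountView()
stream.attach(early)
for v in range(5):
    stream.insert(v)

late = CountView()  # created after 5 rows have already gone by
stream.attach(late)
for v in range(3):
    stream.insert(v)

print(early.total_count, late.total_count)  # 8 3
```

The late view counts only the three rows inserted after it was attached, just as vw_test1104_pl_stats_03 only reflects rows that arrive after its CREATE VIEW.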