PostgreSQL - PipelineDB - A First Look at the Stream Feature - Stream Computing

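PipelineDB has a very handy stream-computing feature: you write the query that produces the result you want and store it as a materialized view over a stream.

Inserted rows themselves are not stored. Instead, every insert or change updates the stored result according to the logic of the view defined beforehand, so very little data has to be kept.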
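
The idea of keeping only the aggregated result can be sketched in plain Python. This is a conceptual model only, not PipelineDB's actual implementation; `GroupStats` and `consume` are hypothetical names made up for the illustration. Each event updates a small per-group record and is then discarded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupStats:
    """Running aggregates for one group; no raw rows are kept."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def update(self, value: int) -> None:
        # Fold one new value into the running aggregates.
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def avg(self) -> float:
        return self.total / self.count


def consume(events, stats=None):
    """Fold (group_key, value) events into per-group running aggregates."""
    stats = {} if stats is None else stats
    for key, value in events:
        stats.setdefault(key, GroupStats()).update(value)
    return stats


stats = consume([(1, 10), (1, 30), (2, 5)])
print(stats[1].count, stats[1].total, stats[1].avg)  # 2 40 20.0
```

However many events arrive, the stored state grows only with the number of distinct groups, which is why so little has to be kept.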

 

Experiment:

CentOS 7 + PG 10.5 + PipelineDB 1.0.0-4

 

Create the table (the stream is declared as a foreign table):

mytest=# CREATE FOREIGN TABLE t_test1104_pl_steam(
mytest(#     id int,
mytest(#     col1 varchar(100),
mytest(#     col2 int,
mytest(#     c_time timestamp without time zone)
mytest-# SERVER pipelinedb;
CREATE FOREIGN TABLE

 

The data inserted in the experiment:

id: random value in 0--9

col1: random letter in a--z

col2: random value in 0--100

c_time: now() (not actually used in this experiment)

 

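View 1: group by id and compute several aggregates over col2. The view ends up with only 10 rows, but their values keep changing while data is being inserted, so you can query it repeatedly during the insert.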

mytest=# CREATE VIEW vw_test1104_pl_stats_01 WITH (action=materialize) AS
mytest-# SELECT id,
mytest-#     count(*) AS total_count,
mytest-#     sum(col2) AS sum,
mytest-#     min(col2) AS min,
mytest-#     max(col2) AS max,
mytest-#     avg(col2) AS avg,
mytest-#     percentile_cont(0.99) WITHIN GROUP (ORDER BY col2) AS p99_views
mytest-# FROM t_test1104_pl_steam
mytest-# GROUP BY id;
CREATE VIEW

 

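View 2: group by id and col1 and compute the same aggregates over col2. The view ends up with 260 rows. As before, the values keep changing while data is being inserted.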

mytest=# create view vw_test1104_pl_stats_02 with (action=materialize) as
mytest-# SELECT id, col1,
mytest-#     count(*) AS total_count,
mytest-#     sum(col2) AS sum,
mytest-#     min(col2) AS min,
mytest-#     max(col2) AS max,
mytest-#     avg(col2) AS avg,
mytest-#     percentile_cont(0.99) WITHIN GROUP (ORDER BY col2) AS p99_views
mytest-# FROM t_test1104_pl_steam
mytest-# GROUP BY id, col1;
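
The row counts quoted for views 1 and 2 follow directly from the number of distinct group keys the test data can produce. A quick check, assuming id is drawn from 0..9 and col1 from a..z as described above (the variable names are only for illustration):

```python
from itertools import product
from string import ascii_lowercase

ids = range(10)            # id is drawn from 0..9
letters = ascii_lowercase  # col1 is drawn from a..z

groups_view1 = len(set(ids))                    # GROUP BY id
groups_view2 = len(set(product(ids, letters)))  # GROUP BY id, col1

print(groups_view1, groups_view2)  # 10 260
```

These are upper bounds: a view reaches them once every key combination has appeared at least once in the stream.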

 

===============================================

A Python script inserts 100 million random rows; the views can be queried repeatedly while the insert is running.

 

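#!/bin/python
import random

import psycopg2

conn = psycopg2.connect("dbname=mytest user=dbadmin")
cur = conn.cursor()

# Insert 100 million random rows into the stream, one row per statement.
for i in range(100000000):
    row_id = random.randint(0, 9)        # id: 0..9
    col1 = chr(random.randint(97, 122))  # col1: 'a'..'z'
    col2 = random.randint(0, 100)        # col2: 0..100
    cur.execute(
        "INSERT INTO t_test1104_pl_steam (id, col1, col2, c_time) "
        "VALUES (%s, %s, %s, now())",
        (row_id, col1, col2),
    )

conn.commit()
cur.close()
conn.close()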

 

===============================================
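
To watch a view change while the insert script runs, a second session can poll it. The following is a minimal sketch, not part of the original experiment: `poll_view` and `QUERY` are hypothetical names, and the cursor is assumed to be a psycopg2 cursor opened on the same mytest database.

```python
import time

# Columns as defined in vw_test1104_pl_stats_01.
QUERY = (
    "SELECT id, total_count, sum, min, max, avg "
    "FROM vw_test1104_pl_stats_01 ORDER BY id"
)


def poll_view(cur, rounds=3, interval=1.0):
    """Read the whole view `rounds` times, sleeping `interval` seconds in between."""
    snapshots = []
    for n in range(rounds):
        cur.execute(QUERY)
        rows = cur.fetchall()
        snapshots.append(rows)
        for row in rows:
            print(row)
        if n < rounds - 1:
            time.sleep(interval)
    return snapshots


# Possible usage from a second terminal (assumes psycopg2 is installed):
#   import psycopg2
#   conn = psycopg2.connect("dbname=mytest user=dbadmin")
#   conn.autocommit = True  # avoid holding one long transaction open while polling
#   poll_view(conn.cursor(), rounds=10, interval=2.0)
```

Comparing consecutive snapshots should show total_count and sum growing for each id while the number of rows stays at 10.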

 

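View 3: start the insert first and watch the existing views for a while, then create view 3. It does not pick up any of the rows inserted before it was created; only rows arriving from that moment on are aggregated into the materialized view.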

mytest=# create view vw_test1104_pl_stats_03 with (action=materialize) as
mytest-# SELECT col2,
mytest-#     count(*) AS total_count,
mytest-#     sum(col2) AS sum,
mytest-#     min(col2) AS min,
mytest-#     max(col2) AS max,
mytest-#     avg(col2) AS avg,
mytest-#     percentile_cont(0.99) WITHIN GROUP (ORDER BY col2) AS p99_views
mytest-# FROM t_test1104_pl_steam
mytest-# GROUP BY col2;
CREATE VIEW
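
Why view 3 misses the earlier rows can be illustrated with a toy model in Python. This is only a sketch of the observed behaviour, not how PipelineDB is implemented; the `Stream` class and its methods are hypothetical. Events are handed to whichever views exist at insert time and are then dropped, so a view created later has nothing to replay.

```python
class Stream:
    """Toy stream: each row is pushed to the current views and then dropped."""

    def __init__(self):
        self.views = []

    def create_view(self, key_fn):
        # A view here is just a key function plus per-key row counts.
        view = {"key_fn": key_fn, "counts": {}}
        self.views.append(view)
        return view

    def insert(self, row):
        for view in self.views:
            key = view["key_fn"](row)
            view["counts"][key] = view["counts"].get(key, 0) + 1


stream = Stream()
early = stream.create_view(lambda row: row["id"])   # exists from the start
stream.insert({"id": 1, "col2": 7})
stream.insert({"id": 1, "col2": 9})

late = stream.create_view(lambda row: row["col2"])  # created after two rows
stream.insert({"id": 1, "col2": 7})

print(early["counts"])  # {1: 3}
print(late["counts"])   # {7: 1}
```

The early view has counted all three rows, while the late view has seen only the row inserted after it was created.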

 
