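It offers almost all of the functionality of Greenplum (GP). While keeping all of GP's strengths, Deepgreen optimizes the original query processing engine. The new-generation query processing engine adds:
Superior join and aggregation algorithms
A new spill-handling subsystem
JIT-based query optimization, vectorized scans, and data path optimization
Below is a brief introduction to Deepgreen's main features (mostly in comparison with Greenplum):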
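1. 100% GPDB
Deepgreen is almost 100% identical to Greenplum. I say "almost" because Deepgreen also drops a few Greenplum features of marginal value, such as MapReduce support; what it keeps is the essence. From SQL syntax and stored procedure syntax, to the data storage format, to components such as gpstart and gpfdist, Deepgreen keeps the impact of migration to a minimum for users who want to move over from Greenplum. In particular:
Data does not need to be reloaded, except for data compressed with quicklz, which has to be changed
DML and DDL statements are unchanged
UDF (user-defined function) syntax is unchanged
Stored procedure syntax is unchanged
Connection and authorization protocols such as JDBC/ODBC are unchanged
Operational scripts (backup scripts, for example) are unchanged
So where do Deepgreen and Greenplum differ? In one word: fast! Fast! Fast! (Important things are worth saying three times.) Most OLAP work is CPU-bound, so Deepgreen, which is optimized for the CPU, can run 3 to 5 times faster than stock Greenplum in performance tests.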
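2. Faster Decimal types
Deepgreen provides two more precise Decimal types, Decimal64 and Decimal128, which are more efficient than Greenplum's original Decimal type (Numeric). Because they are more precise than the float/double types, they are a better fit for banking and other workloads that demand high data accuracy.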
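To make the accuracy point concrete, here is a minimal Python sketch. It uses Python's standard `decimal` module purely as a stand-in for a decimal type; it is not Deepgreen's Decimal64/Decimal128 implementation, and the values 0.1, 0.2 and 0.3 are chosen only for illustration.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly equal to 0.3.
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# A decimal type stores these values exactly, so the sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = (decimal_sum == Decimal("0.3"))

print(float_sum, float_matches)
print(decimal_sum, decimal_matches)
```

The float comparison fails while the decimal comparison succeeds, which is the kind of discrepancy that matters when amounts have to reconcile to the cent.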
Installation:
After the database has been initialized, load these two data types into each database that needs them with the following commands:
dgadmin@flash:~$ source deepgreendb/greenplum_path.sh
dgadmin@flash:~$ cd $GPHOME/share/postgresql/contrib/
dgadmin@flash:~/deepgreendb/share/postgresql/contrib$ psql postgres -f pg_decimal.sql
A quick test:
Statement used: select avg(x), sum(2*x) from table
Data volume: 1 million rows
dgadmin@flash:~$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# drop table if exists tt;
NOTICE:  table "tt" does not exist, skipping
DROP TABLE
postgres=# create table tt(
postgres(# ii bigint,
postgres(# f64 double precision,
postgres(# d64 decimal64,
postgres(# d128 decimal128,
postgres(# n numeric(15, 3))
postgres-# distributed randomly;
CREATE TABLE
postgres=# insert into tt
postgres-# select i,
postgres-# i + 0.123,
postgres-# (i + 0.123)::decimal64,
postgres-# (i + 0.123)::decimal128,
postgres-# i + 0.123
postgres-# from generate_series(1, 1000000) i;
INSERT 0 1000000
postgres=# \timing on
Timing is on.
postgres=# select count(*) from tt;
  count
---------
 1000000
(1 row)
Time: 161.500 ms
postgres=# set vitesse.enable=1;
SET
Time: 1.695 ms
postgres=# select avg(f64),sum(2*f64) from tt;
       avg        |       sum
------------------+------------------
 500000.622996815 | 1000001245993.63
(1 row)
Time: 45.368 ms
postgres=# select avg(d64),sum(2*d64) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)
Time: 135.693 ms
postgres=# select avg(d128),sum(2*d128) from tt;
    avg     |        sum
------------+-------------------
 500000.623 | 1000001246000.000
(1 row)
Time: 148.286 ms
postgres=# set vitesse.enable=1;
SET
Time: 11.691 ms
postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)
Time: 154.189 ms
postgres=# set vitesse.enable=0;
SET
Time: 1.426 ms
postgres=# select avg(n),sum(2*n) from tt;
         avg         |        sum
---------------------+-------------------
 500000.623000000000 | 1000001246000.000
(1 row)
Time: 296.291 ms
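Results:
45ms - 64-bit float
136ms - decimal64
148ms - decimal128
154ms - deepgreen numeric
296ms - greenplum numeric
In the test above, decimal64 (136ms) is faster than deepgreen numeric (154ms) and roughly twice as fast as greenplum numeric (296ms). In production environments the gap can reportedly exceed 5x.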
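The ratios behind that comparison can be checked with a few lines of Python. This is only arithmetic on the timings printed in the psql session above; the dictionary keys are labels chosen here, following the naming of the results list.

```python
# Timings in milliseconds, copied from the psql session above.
timings_ms = {
    "float64": 45.368,
    "decimal64": 135.693,
    "decimal128": 148.286,
    "deepgreen numeric": 154.189,
    "greenplum numeric": 296.291,
}

baseline = timings_ms["greenplum numeric"]

# Speedup of each type relative to greenplum numeric (higher is faster).
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name:18s} {factor:.2f}x")
```

On this single run, decimal64 comes out a little over 2x faster than greenplum numeric, and deepgreen numeric a little under 2x.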
3. JSON support
Deepgreen supports the JSON type, but not completely. The unsupported functions are: json_each, json_each_text, json_extract_path, json_extract_path_text, json_object_keys, json_populate_record, json_populate_recordset, json_array_elements, and json_agg.
Installation:
Run the following command to add JSON support:
dgadmin@flash:~$ psql postgres -f $GPHOME/share/postgresql/contrib/json.sql
A quick test:
dgadmin@flash:~$ psql postgres
psql (8.2.15)
Type "help" for help.
postgres=# select '[1,2,3]'::json->2;
 ?column?
----------
 3
(1 row)
postgres=# create temp table mytab(i int, j json) distributed by (i);
CREATE TABLE
postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3
postgres=#
postgres=# insert into mytab values (1, null), (2, '[2,3,4]'), (3, '[3000,4000,5000]');
INSERT 0 3
postgres=# select i, j->2 from mytab;
 i | ?column?
---+----------
 2 | 4
 2 | 4
 1 |
 3 | 5000
 1 |
 3 | 5000
(6 rows)
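The `->` operator with an integer argument indexes into a JSON array starting from zero, which is why `->2` returns the third element. The Python sketch below mimics that behavior with the standard `json` module. The helper name `json_arrow` is made up for this illustration, and `None` stands in for SQL NULL; it is not part of Deepgreen or PostgreSQL.

```python
import json


def json_arrow(doc, index):
    """Mimic `j -> index` for a JSON array stored as text (None stands for SQL NULL)."""
    if doc is None:
        return None
    return json.loads(doc)[index]


# The three distinct rows inserted into mytab in the session above.
rows = [(1, None), (2, "[2,3,4]"), (3, "[3000,4000,5000]")]
result = [(i, json_arrow(j, 2)) for i, j in rows]
print(result)
```

The row with a NULL document yields an empty result, matching the blank cells for i = 1 in the query output.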
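4. Efficient compression algorithms
Deepgreen keeps Greenplum's zlib algorithm for storage compression. In addition, it offers two compression formats that are better suited to database workloads: zstd and lz4.
If you need a better compression ratio for column-oriented or append-only heap table storage, choose zstd. Compared with zlib, zstd achieves a better compression ratio and uses the CPU more efficiently.
If you have a read-heavy workload, choose lz4, because its decompression speed is astonishing. lz4's compression ratio is not as good as that of zlib or zstd, but the sacrifice is worth it to serve a heavy read load.
For details on these two algorithms, see their home pages:
zstd home page http://facebook.github.io/zstd/
lz4 home page http://lz4.github.io/lz4/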
A quick test:
Here I run a simple test of just four options: no compression, zlib, zstd, and lz4. My machine is not very powerful, so treat all results as a rough reference only:
postgres=# create temp table ttnone (
postgres(# i int,
postgres(# t text,
postgres(# default column encoding (compresstype=none))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
postgres=# \timing on
Timing is on.
postgres=# create temp table ttzlib(
postgres(# i int,
postgres(# t text,
postgres(# default column encoding (compresstype=zlib, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 762.596 ms
postgres=# create temp table ttzstd (
postgres(# i int,
postgres(# t text,
postgres(# default column encoding (compresstype=zstd, compresslevel=1))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 827.033 ms
postgres=# create temp table ttlz4 (
postgres(# i int,
postgres(# t text,
postgres(# default column encoding (compresstype=lz4))
postgres-# with (appendonly=true, orientation=column)
postgres-# distributed by (i);
CREATE TABLE
Time: 845.728 ms
postgres=# insert into ttnone select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 104641.369 ms
postgres=# insert into ttzlib select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 99557.505 ms
postgres=# insert into ttzstd select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 98800.567 ms
postgres=# insert into ttlz4 select i, 'user '||i from generate_series(1, 100000000) i;
INSERT 0 100000000
Time: 96886.107 ms
postgres=# select pg_size_pretty(pg_relation_size('ttnone'));
 pg_size_pretty
----------------
 1708 MB
(1 row)
Time: 83.411 ms
postgres=# select pg_size_pretty(pg_relation_size('ttzlib'));
 pg_size_pretty
----------------
 374 MB
(1 row)
Time: 4.641 ms
postgres=# select pg_size_pretty(pg_relation_size('ttzstd'));
 pg_size_pretty
----------------
 325 MB
(1 row)
Time: 5.015 ms
postgres=# select pg_size_pretty(pg_relation_size('ttlz4'));
 pg_size_pretty
----------------
 785 MB
(1 row)
Time: 4.483 ms
postgres=# select sum(length(t)) from ttnone;
    sum
------------
 1288888898
(1 row)
Time: 4414.965 ms
postgres=# select sum(length(t)) from ttzlib;
    sum
------------
 1288888898
(1 row)
Time: 4500.671 ms
postgres=# select sum(length(t)) from ttzstd;
    sum
------------
 1288888898
(1 row)
Time: 3849.648 ms
postgres=# select sum(length(t)) from ttlz4;
    sum
------------
 1288888898
(1 row)
Time: 3160.477 ms
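To read the numbers more easily, the short Python sketch below derives the compression ratio and scan time for each table. It only does arithmetic on the sizes and timings printed in the session above; the dictionary layout and variable names are chosen here for illustration.

```python
# Table sizes (MB) and scan times (ms) copied from the session above.
results = {
    "none": {"size_mb": 1708, "scan_ms": 4414.965},
    "zlib": {"size_mb": 374, "scan_ms": 4500.671},
    "zstd": {"size_mb": 325, "scan_ms": 3849.648},
    "lz4": {"size_mb": 785, "scan_ms": 3160.477},
}

uncompressed_mb = results["none"]["size_mb"]

# Compression ratio = uncompressed size / compressed size.
ratios = {name: uncompressed_mb / r["size_mb"] for name, r in results.items()}

for name, r in results.items():
    print(f"{name:5s} ratio {ratios[name]:.2f}x  scan {r['scan_ms'] / 1000:.2f}s")

# Which option gave the smallest table, and which gave the fastest scan?
smallest = min(results, key=lambda name: results[name]["size_mb"])
fastest_scan = min(results, key=lambda name: results[name]["scan_ms"])
print(smallest, fastest_scan)
```

On this run zstd produces the smallest table and lz4 the fastest scan, which lines up with the guidance above on when to pick each one.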
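5. Data sampling
Starting with Deepgreen 16.16, there is built-in support for true data sampling through SQL. You can sample in two ways, either by specifying a number of rows or by specifying a sampling percentage: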
SELECT {select-clauses} LIMIT SAMPLE {n} ROWS;
SELECT {select-clauses} LIMIT SAMPLE {n} PERCENT;
A quick test:
postgres=# select count(*) from ttlz4;
   count
-----------
 100000000
(1 row)
Time: 903.661 ms
postgres=# select * from ttlz4 limit sample 0.00001 percent;
    i     |       t
----------+---------------
  3442917 | user 3442917
  9182620 | user 9182620
  9665879 | user 9665879
 13791056 | user 13791056
 15669131 | user 15669131
 16234351 | user 16234351
 19592531 | user 19592531
 39097955 | user 39097955
 48822058 | user 48822058
 83021724 | user 83021724
  1342299 | user 1342299
 20309120 | user 20309120
 34448511 | user 34448511
 38060122 | user 38060122
 69084858 | user 69084858
 73307236 | user 73307236
 95421406 | user 95421406
(17 rows)
Time: 4208.847 ms
postgres=# select * from ttlz4 limit sample 10 rows;
    i     |       t
----------+---------------
 78259144 | user 78259144
 85551752 | user 85551752
 90848887 | user 90848887
 53923527 | user 53923527
 46524603 | user 46524603
 31635115 | user 31635115
 19030885 | user 19030885
 97877732 | user 97877732
 33238448 | user 33238448
 20916240 | user 20916240
(10 rows)
Time: 3578.031 ms
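Note that the PERCENT query returned 17 rows, while 0.00001 percent of 100,000,000 rows is nominally 10. The Python sketch below works out that nominal figure and contrasts two generic sampling strategies: a fixed-size sample, which always returns exactly n rows, and a per-row sample, whose size varies from run to run. Whether Deepgreen's PERCENT mode samples per row is an assumption made for illustration; the source does not say. The small `table` list and the probability `p` are hypothetical stand-ins.

```python
import random
from fractions import Fraction

total_rows = 100_000_000
percent = Fraction("0.00001")  # the PERCENT value used in the session above

# Nominal sample size for a percentage-based sample.
expected_rows = total_rows * percent / 100
print(expected_rows)

# Hypothetical stand-in table, far smaller than ttlz4.
rng = random.Random(42)
table = list(range(1, 100_001))

# Fixed-size sampling: always returns exactly n distinct rows.
fixed = rng.sample(table, 10)

# Per-row sampling (assumed strategy): each row is kept with probability p,
# so the number of rows returned varies around len(table) * p.
p = 0.0001
per_row = [row for row in table if rng.random() < p]
print(len(fixed), len(per_row))
```

If you need an exact number of rows, the ROWS form is the safer choice, as the second query in the session shows.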
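6. TPC-H performance
For a performance comparison between Deepgreen and Greenplum, see my two other posts:
"Deepgreen vs. Greenplum TPC-H Performance Test Comparison (using 德哥's scripts)"
"Deepgreen vs. Greenplum TPC-H Performance Test Comparison (using VitesseData's scripts)"
I will also cover Xdrive, the high-performance component that ships with Deepgreen, in a separate post later~
End~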