PG系列数据库TPCH测试文档

PG系列数据库TPCH测试文档

该文档适用于使用postgresql语法的数据库

一、机器环境

此次测试只为走通该测试的测试流程,准备的服务器是本地虚拟化的机器,故性能不具备参考价值

主机名 ip 内存 磁盘 带宽
ym01 192.168.110.221 8G 40G 共享千兆
ym02 192.168.110.222 8G 40G 共享千兆
ym03 192.168.110.223 8G 40G 共享千兆

二、TPCH环境准备

1、下载TPCH

这里由于网络问题,我将https://github.com/tpch-dbgen.git 同步至gitee上后进行拉取

git clone https://gitee.com/mklzl/tpch-dbgen.git

2、编译TPCH

  1. 修改makefile

    ################
    ## CHANGE NAME OF ANSI COMPILER HERE
    ################
    CC      = gcc
    # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
    #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
    # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS, 
    #                                  SGI, SUN, U2200, VMS, LINUX, WIN32 
    # Current values for WORKLOAD are:  TPCH
    DATABASE=ORACLE
    MACHINE = LINUX
    WORKLOAD = TPCH
    
  2. 编译

    yum install -y gcc
    make
    

3、准备数据

  1. 生成数据

    ./dbgen -s 10 -f
    

    准备大约10G数据

  2. 修改数据

    for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; done
    
  3. 生成软链接(方便执行数据导入脚本)

    ln -s /home/pgtpch10/ /tmp/dss-data
    

    这里会将生成的 .tbl 的文件所在的目录软链接至 /tmp/dss-data

4、 查询语句生成

使用该tpch的查询语句的生成,是无法适用于pg系列的查询的,需要进行改写,也可以使用下面的模板生成查询语句

  1. 模板链接:
https://github.com/digoal/pg_tpch/tree/master/dss/templates

需要将这个链接中templates中所有的文件下载下来,放在一个固定目录,例如就放在tpch的目录下

  1. 编写脚本
#!/bin/bash
for ((i=1;i<=22;i++)); do
     ./qgen -v -c -s $1 ${i} > $2/tpch-q${i}.sql
done
  1. export模板变量
 export DSS_QUERY=./templates
  1. 执行脚本

该脚本传入两个参数,参数1为scale,和生成数据时传入的数量保持一致,参数二为生成的查询语句对应的目录,该目录需要提前创建

执行该脚本后,会在对应目录下生成查询语句,如下:

[root@ym01 tpch-dbgen]# sh create_query.sh 10 ./querysql
[root@ym01 tpch-dbgen]# cd querysql/
[root@ym01 querysql]# ll
总用量 88
-rw-r--r-- 1 root root  647 1110 10:39 10.sql
-rw-r--r-- 1 root root  631 1110 10:39 11.sql
-rw-r--r-- 1 root root  720 1110 10:39 12.sql
-rw-r--r-- 1 root root  470 1110 10:39 13.sql
-rw-r--r-- 1 root root  442 1110 10:39 14.sql
-rw-r--r-- 1 root root  641 1110 10:39 15.sql
-rw-r--r-- 1 root root  609 1110 10:39 16.sql
-rw-r--r-- 1 root root  476 1110 10:39 17.sql
-rw-r--r-- 1 root root  582 1110 10:39 18.sql
-rw-r--r-- 1 root root 1090 1110 10:39 19.sql
-rw-r--r-- 1 root root  664 1110 10:39 1.sql
-rw-r--r-- 1 root root  865 1110 10:39 20.sql
-rw-r--r-- 1 root root  805 1110 10:39 21.sql
-rw-r--r-- 1 root root  797 1110 10:39 22.sql
-rw-r--r-- 1 root root  818 1110 10:39 2.sql
-rw-r--r-- 1 root root  519 1110 10:39 3.sql
-rw-r--r-- 1 root root  476 1110 10:39 4.sql
-rw-r--r-- 1 root root  607 1110 10:39 5.sql
-rw-r--r-- 1 root root  365 1110 10:39 6.sql
-rw-r--r-- 1 root root  926 1110 10:39 7.sql
-rw-r--r-- 1 root root  900 1110 10:39 8.sql
-rw-r--r-- 1 root root  741 1110 10:39 9.sql

三、建表

  • part

    CREATE TABLE PART (
                          P_PARTKEY               SERIAL8,
                          P_NAME                  VARCHAR(55),
                          P_MFGR                  CHAR(25),
                          P_BRAND                 CHAR(10),
                          P_TYPE                  VARCHAR(25),
                          P_SIZE                  INTEGER,
                          P_CONTAINER             CHAR(10),
                          P_RETAILPRICE   DECIMAL,
                          P_COMMENT               VARCHAR(23)
    ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (p_partkey);
    
  • region

    CREATE TABLE REGION (
                            R_REGIONKEY     SERIAL8,
                            R_NAME          CHAR(25),
                            R_COMMENT       VARCHAR(152)
    )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (r_regionkey);
    
  • partsupp

    CREATE TABLE PARTSUPP (
                                  PS_PARTKEY              BIGINT NOT NULL, -- references P_PARTKEY
                                  PS_SUPPKEY              BIGINT NOT NULL, -- references S_SUPPKEY
                                  PS_AVAILQTY             INTEGER,
                                  PS_SUPPLYCOST   DECIMAL,
                                  PS_COMMENT              VARCHAR(199)
        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (ps_partkey,ps_suppkey);
    
  • nation

    CREATE TABLE NATION (
                            N_NATIONKEY             SERIAL8,
                            N_NAME                  CHAR(25),
                            N_REGIONKEY             BIGINT NOT NULL,  -- references R_REGIONKEY
                            N_COMMENT               VARCHAR(152)
    )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (n_nationkey);
    
  • supplier

    CREATE TABLE SUPPLIER (
                              S_SUPPKEY               SERIAL8,
                              S_NAME                  CHAR(25),
                              S_ADDRESS               VARCHAR(40),
                              S_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                              S_PHONE                 CHAR(15),
                              S_ACCTBAL               DECIMAL,
                              S_COMMENT               VARCHAR(101)
    )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (s_suppkey);
    
  • customer

    CREATE TABLE CUSTOMER (
                              C_CUSTKEY               SERIAL8,
                              C_NAME                  VARCHAR(25),
                              C_ADDRESS               VARCHAR(40),
                              C_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                              C_PHONE                 CHAR(15),
                              C_ACCTBAL               DECIMAL,
                              C_MKTSEGMENT    CHAR(10),
                              C_COMMENT               VARCHAR(117)
    )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (c_custkey);
    
  • lineitem

    CREATE TABLE LINEITEM (
                              L_ORDERKEY              BIGINT NOT NULL, -- references O_ORDERKEY
                              L_PARTKEY               BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP)
                              L_SUPPKEY               BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP)
                              L_LINENUMBER    INTEGER,
                              L_QUANTITY              DECIMAL,
                              L_EXTENDEDPRICE DECIMAL,
                              L_DISCOUNT              DECIMAL,
                              L_TAX                   DECIMAL,
                              L_RETURNFLAG    CHAR(1),
                              L_LINESTATUS    CHAR(1),
                              L_SHIPDATE              DATE,
                              L_COMMITDATE    DATE,
                              L_RECEIPTDATE   DATE,
                              L_SHIPINSTRUCT  CHAR(25),
                              L_SHIPMODE              CHAR(10),
                              L_COMMENT               VARCHAR(44)
    )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (l_orderkey, l_linenumber);
    
  • orders

    CREATE TABLE ORDERS (
                            O_ORDERKEY              SERIAL8,
                            O_CUSTKEY               BIGINT NOT NULL, -- references C_CUSTKEY
                            O_ORDERSTATUS   CHAR(1),
                            O_TOTALPRICE    DECIMAL,
                            O_ORDERDATE             DATE,
                            O_ORDERPRIORITY CHAR(15),
                            O_CLERK                 CHAR(15),
                            O_SHIPPRIORITY  INTEGER,
                            O_COMMENT               VARCHAR(79)
    )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (o_orderkey);
    

四、导入数据

可以通过登录psql的命令行,执行copy命令,逐表进行数据导入,如下:

登录数据库:

psql -d tpch -h 192.168.110.221 -p 5432 -U mxadmin

打开计时:

tpch=# \timing on
Timing is on.

执行导入:

COPY lineitem FROM '/tmp/dss-data/lineitem.csv' WITH csv DELIMITER '|';
COPY orders FROM '/tmp/dss-data/orders.csv' WITH csv DELIMITER '|';
COPY partsupp FROM '/tmp/dss-data/partsupp.csv' WITH csv DELIMITER '|';
COPY customer FROM '/tmp/dss-data/customer.csv' WITH csv DELIMITER '|';
COPY supplier FROM '/tmp/dss-data/supplier.csv' WITH csv DELIMITER '|';
COPY nation FROM '/tmp/dss-data/nation.csv' WITH csv DELIMITER '|';
COPY region FROM '/tmp/dss-data/region.csv' WITH csv DELIMITER '|';
COPY part FROM '/tmp/dss-data/part.csv' WITH csv DELIMITER '|';

也可以通过编写下面的脚本进行导入

将上述语句放入一个名为copy.sql的文件中,执行下面的语句:

psql -f copy.sql -U user_name -d db_name -h host -p port

五、查询

1、收集统计信息

在SQL中执行:

analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;

2、执行查询

可以手动执行上面生成的查询语句,也可以使用下面的脚本进行执行

执行脚本如下:

total_cost=0

for i in {1..2}
do
        echo "begin run Q${i}, /home/tpch/tpch-dbgen/querysql/tpch-q$i.sql , `date`"
        begin_time=`date +%s.%N`
        psql -d tpch -h 192.168.110.221 -p 5432 -U mxadmin -f /home/tpch/tpch-dbgen/querysql/tpch-q${i}.sql > ./log/log_q${i}.out
        rc=$?
        end_time=`date +%s.%N`
        cost=`echo "$end_time-$begin_time"|bc`
        total_cost=`echo "$total_cost+$cost"|bc`
        if [ $rc -ne 0 ] ; then
              printf "run Q%s fail, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
        else
              printf "run Q%s succ, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
        fi
done

请自行修改脚本中对应数据连接和执行查询SQL路径的参数

执行后会有以下输出:

begin run Q1, /home/tpch/tpch-dbgen/querysql/tpch-q1.sql , 20221110日 星期四 10:44:28 CST
run Q1 succ, cost: 16.02, totalCost: 16.02, 20221110日 星期四 10:44:44 CST
begin run Q2, /home/tpch/tpch-dbgen/querysql/tpch-q2.sql , 20221110日 星期四 10:44:44 CST
run Q2 succ, cost: 0.23, totalCost: 16.25, 20221110日 星期四 10:44:44 CST
......

你可能感兴趣的:(mpp,大数据,数仓,数据库,postgresql)