greenplum tpch测试记录

1、到tpc组织下载dbgen
http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
需要注册一下,tpc会发生一封邮件给出一个下载地址
2、编译tpch和生成测试数据


[gpadmin@gp-node0 dbgen]$ unzip c1ac817c-d2f5-4a24-b7bd-c1f84e75949e-tpc-h-tool.zip
[gpadmin@gp-node0 tpch]$ cd tpch_2_17_0/
[gpadmin@gp-node0 tpch_2_17_0]$ ll
total 8
drwxrwxr-x. 8 gpadmin gpadmin 4096 May 5 2014 dbgen
drwxrwxr-x. 2 gpadmin gpadmin 33 May 5 2014 dev-tools
drwxrwxr-x. 10 gpadmin gpadmin 4096 May 5 2014 ref_data
[gpadmin@gp-node0 tpch_2_17_0]$ cd dbgen/
[gpadmin@gp-node0 dbgen]$ ll

[gpadmin@gp-node0 dbgen]$cp makefile.suite Makefile

[gpadmin@gp-node0 dbgen]$vi Makefile
第109行,设置
CC = gcc

DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH

[gpadmin@gp-node0 dbgen]$ make
生成sf=100的数据
[gpadmin@gp-node0 dbgen]$ ./dbgen -s 100 -f
TPC-H Population Generator (Version 2.17.0)
Copyright Transaction Processing Performance Council 1994 - 2010


3、将测试数据转换为GP识别的格式,删除末尾的分隔符|。

$for i in ls *.tbl; do sed 's/|$//' $i > ${i/tbl/csv}; done

4、下载tpch,采用digoal修改过适应GP的脚本

git clone https://github.com/digoal/pg_tpch.git
cd pg_tpch

修改tpch-load.sql中指向的文件路径
采用vi查找替换命令
:%s+/tmp/dss-data+/data/tpch/csv100+

[gpadmin@gp-node0 dss]$ more tpch-load.sql
BEGIN;

    CREATE TABLE PART (
            P_PARTKEY               SERIAL8,
            P_NAME                  VARCHAR(55),
            P_MFGR                  CHAR(25),
            P_BRAND                 CHAR(10),
            P_TYPE                  VARCHAR(25),
            P_SIZE                  INTEGER,
            P_CONTAINER             CHAR(10),
            P_RETAILPRICE   DECIMAL,
            P_COMMENT               VARCHAR(23)
    ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (p_partkey);

    COPY part FROM '/data/tpch/csv100/part.csv' WITH csv DELIMITER '|';

COMMIT;
......


将pg_tpch的文件都拷贝到dbgen所在的目录:

$cp -r pg_tpch/* tpch/tpch_2_17_0/dbgen/
$cd tpch/tpch_2_17_0/dbgen

创建一个queries目录,用于存放转换后的tpc-h 测试SQL。

[gpadmin@gp-node0 dbgen]$ SF=100
[gpadmin@gp-node0 dbgen]$ mkdir dss/queries
mkdir: cannot create directory ‘dss/queries’: File exists
[gpadmin@gp-node0 dbgen]$ for q in seq 1 22

do
DSS_QUERY=dss/templates ./qgen -s $SF $q > dss/queries/$q.sql
sed 's/^select/explain select/' dss/queries/$q.sql > dss/queries/$q.explain.sql
done
[gpadmin@gp-node0 dbgen]$


5、创建数据库用户

[gpadmin@gp-node0 pg_tpch]$ psql tpch
psql (8.2.15)
Type "help" for help.

tpch=# CREATE ROLE "tpchuser" login PASSWORD 'tpchuser' SUPERUSER NOINHERIT;


设置几个参数:

[gpadmin@gp-node0 pg_tpch]$ gpconfig -c enable_nestloop -v off
20160819:17:13:44:021871 gpconfig:gp-node0:gpadmin-[INFO]:-completed successfully
[gpadmin@gp-node0 pg_tpch]$ gpconfig -c work_mem -v 256MB
20160819:17:13:57:022043 gpconfig:gp-node0:gpadmin-[INFO]:-completed successfully
[gpadmin@gp-node0 pg_tpch]$ gpstop -u
20160819:17:14:00:022187 gpstop:gp-node0:gpadmin-[INFO]:-Starting gpstop with args: -u
20160819:17:14:00:022187 gpstop:gp-node0:gpadmin-[INFO]:-Gathering information and validating the environment...
20160819:17:14:00:022187 gpstop:gp-node0:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20160819:17:14:00:022187 gpstop:gp-node0:gpadmin-[INFO]:-Obtaining Segment details from master...
20160819:17:14:02:022187 gpstop:gp-node0:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.9.0 build 1'
20160819:17:14:02:022187 gpstop:gp-node0:gpadmin-[INFO]:-Signalling all postmaster processes to reload
..
[gpadmin@gp-node0 pg_tpch]$

6、执行测试

[gpadmin@gp-node0 dbgen]$ ./tpch.sh ./results tpch tpchtest
vmstat: /usr/local/greenplum-db/./lib/libz.so.1: no version information available (required by /lib64/libdw.so.1)

./tpch.sh: line 166: kill: (30294) - No such process
./tpch.sh: line 189: 24564 Terminated vmstat $DELAY >> $RESULTS/vmstat.log


7、使用以下方法生成CSV报告。

[gpadmin@gp-node0 dbgen]$ php process.php ./results output.csv
input directory: ./results
output file: output.csv
[root@gp-node0 dbgen]# vi output.csv
tpch_load;tpch_pkeys;tpch_fkeys;tpch_indexes;tpch_analyze;tpch_total;query_1;query_2;query_3;query_4;query_5;query_6;query_7;query_8;query_9;query_10;query_11;query_12;query_13;query_14;query_15;query_16;query_17;query_18;query_19;query_20;query_21;query_2
2;query_1_hash;query_2_hash;query_3_hash;query_4_hash;query_5_hash;query_6_hash;query_7_hash;query_8_hash;query_9_hash;query_10_hash;query_11_hash;query_12_hash;query_13_hash;query_14_hash;query_15_hash;query_16_hash;query_17_hash;query_18_hash;query_19_ha
sh;query_20_hash;query_21_hash;query_22_hash;db_cache_hit_ratio
4.00;0.00;1.00;0.00;127.00;763.00;40.55;10.04;34.40;2.32;3.09;0.02;89.88;27.56;100.19;3.64;5.09;1.33;16.33;1.02;1.04;8.66;87.57;131.51;38.19;5.01;128.12;11.84;afc07662ad7b9f84497d46ec3e2982ca;58c9a2cadeac374f8d4f140f99edb6aa;2fc3f1557173580fea23723865a4b68
e;cd1b53d233e808cdcdedbcaaa1878596;65363ff4ff7557e250b34f7600b28c03;064cb1b93292905a03b1ae773255c930;9af0a7cf3c4836428ea37bc04f73f9a5;51c9d36bdece8ebf910c55547c07a6cf;fc1f609fda8b3c83013ebc33e7daeb68;1d90d4fdce96f68f67ef2ca88376f969;c2994eb3044e9ad1668608a
ef389f655;70eee67cbec1a9e64f3b55902c498dfb;509f9f92f0c59402422c8c6f82195134;fabf21ebb069cabf9d211643fe29bedf;734d0ea2c20114d51cc7b2f27c96b687;fe286ea7839d129719aff3afd4bb94b4;90dd1b04700d9154a2d741e090c4c709;146b5bc30d8f7ba184b7a81c360676b1;660bd16ff9a6f85
1d853e1d5530cd61f;12558d5227c3d83a0043255f46b351a2;6ec86cb3d0faa18fbae06cc8dcd5a245;7a6756e65d19c578f988901faa5ef3b4;0.00;

8、php安装


[root@gp-node0 dbgen]# yum -y install php

如果“ImportError: No module named site”的错误,是因为greenplum修改了Python的PATH
设置一下环境变量:
export PYTHONPATH=/usr/lib64/python2.7:$PYTHONPATH

你可能感兴趣的:(greenplum tpch测试记录)