Download the TPC-DS source:
http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
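Choose a suitable version; this walkthrough uses tpc-ds-tool-v2.1.0.zip.
Build and install:
Before running make, check that the dependencies are installed: gcc gcc-c++ libstdc++-devel bison byacc flex
If any of them are missing, install them with yum install.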
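A quick pre-flight check along the following lines can report which build tools are missing before make is run. This is only a sketch: the helper name missing_tools is made up for illustration, and it looks for commands on PATH rather than RPM package names, so packages such as libstdc++-devel and byacc still need to be verified with the package manager.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the TPC-DS kit.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools gcc g++ bison flex make` prints nothing when every command is available, and otherwise prints one missing command per line.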
[adb3.1@intel176 ~]$ unzip tpc-ds-tool-v2.1.0.zip
[adb3.1@intel176 ~]$ cd tpc-ds-tool-v2.1.0/tools/
[adb3.1@intel176 tools]$ make
[adb3.1@intel176 tools]$ ll | grep dsd
-rwxrwxr-x 1 adb3.1 adb3.1 455504 Mar 4 14:59 dsdgen
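The TPC-DS test tools are now ready.
1. Metadata:
tpcds.sql: SQL statements that create the 25 tables
tpcds_ri.sql: SQL statements that create the relationships (referential-integrity constraints) between the tables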
[adb3.1@intel176 tools]$ ll | grep tpcds
-rw-rw-r-- 1 adb3.1 adb3.1 1722 Nov 11 2015 tpcds_20080910.sum
-rw-rw-r-- 1 adb3.1 adb3.1 20097 Nov 11 2015 tpcds.dst
-rw-rw-r-- 1 adb3.1 adb3.1 640583 Mar 4 14:59 tpcds.idx
-rw-rw-r-- 1 adb3.1 adb3.1 7125 Mar 4 14:59 tpcds.idx.h
-rw-rw-r-- 1 adb3.1 adb3.1 13869 Nov 11 2015 tpcds_ri.sql
-rw-rw-r-- 1 adb3.1 adb3.1 22153 Nov 11 2015 tpcds_source.sql
-rw-rw-r-- 1 adb3.1 adb3.1 30001 Nov 11 2015 tpcds.sql
-rw-rw-r-- 1 adb3.1 adb3.1 233068 Nov 11 2015 tpcds.wam
2. Import the metadata into AntDB:
[adb3.1@intel176 tools]$ coord1
psql (4.0.0 7beda81237 based on PG 10.7)
Type "help" for help.
postgres=# create database tpcds;
CREATE DATABASE
postgres=# \c tpcds
You are now connected to database "tpcds" as user "adb3.1".
tpcds=# \i ~/tpc-ds-tool-v2.1.0/tools/tpcds.sql
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
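The number of CREATE TABLE lines printed by psql can be compared with the number of table definitions in tpcds.sql. The sketch below counts them with grep; it assumes that every definition starts on a line beginning with "create table" (in any letter case), and count_tables is a hypothetical helper name.

```shell
# Count the table definitions in a DDL script.
# Assumption: each definition starts on a line beginning with "create table".
count_tables() {
    grep -ci '^[[:space:]]*create table' "$1"
}
```

Running `count_tables ~/tpc-ds-tool-v2.1.0/tools/tpcds.sql` should then print the same number as the count of CREATE TABLE lines in the psql session.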
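1. Generate the data
-scale 1 generates about 1 GB of data
-parallel 10 splits the data into 10 chunks; each chunk is produced by a separate dsdgen run with -child <n> (dsdgen does not start 10 threads by itself)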
[adb3.1@intel176 tools]$ ./dsdgen -h
DBGEN2 Population Generator (Version 2.0.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
USAGE: DBGEN2 [options]
Note: When defined in a parameter file (using -p), parmeters should
use the form below. Each option can also be set from the command
line, using a form of '-param [optional argument]'
Unique anchored substrings of options are also recognized, and
case is ignored, so '-sc' is equivalent to '-SCALE'
General Options
===============
ABREVIATION = <s> -- build table with abreviation <s>
DIR = <s> -- generate tables in directory <s>
HELP = <n> -- display this message
PARAMS = <s> -- read parameters from file <s>
QUIET = [Y|N] -- disable all output to stdout/stderr
SCALE = <n> -- volume of data to generate in GB
TABLE = <s> -- build only table <s>
UPDATE = <n> -- generate update data set <n>
VERBOSE = [Y|N] -- enable verbose output
PARALLEL = <n> -- build data in <n> separate chunks
CHILD = <n> -- generate <n>th chunk of the parallelized data
RELEASE = [Y|N] -- display the release information
_FILTER = [Y|N] -- output data to stdout
VALIDATE = [Y|N] -- produce rows for data validation
Advanced Options
===============
DELIMITER = <s> -- use <s> as output field separator
DISTRIBUTIONS = <s> -- read distributions from file <s>
FORCE = [Y|N] -- over-write data files without prompting
SUFFIX = <s> -- use <s> as output file suffix
TERMINATE = [Y|N] -- end each record with a field delimiter
VCOUNT = <n> -- set number of validation rows to be produced
VSUFFIX = <s> -- set file suffix for data validation
RNGSEED = <n> -- set RNG seed
[adb3.1@intel176 tools]$ ./dsdgen -scale 1
DBGEN2 Population Generator (Version 2.0.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
Warning: This scale factor is valid for QUALIFICATION ONLY
[adb3.1@intel176 tools]$
[adb3.1@intel176 tools]$ ll -lth
total 1.2G
-rw-rw-r-- 1 adb3.1 adb3.1 37 Mar 4 15:45 dbgen_version.dat
-rw-rw-r-- 1 adb3.1 adb3.1 8.6K Mar 4 15:45 web_site.dat
-rw-rw-r-- 1 adb3.1 adb3.1 141M Mar 4 15:45 web_sales.dat
-rw-rw-r-- 1 adb3.1 adb3.1 9.4M Mar 4 15:45 web_returns.dat
To generate the data in parallel, a script like the following can be used:
[ms@intel175 tpcds]$ vim tpcds-dsdgen.sh
cur_dir=`pwd`
tpcds_home="/ssd/zgy/tpcds"
datadir=$1
scale=$2
parallel=$3
logfile=${tpcds_home}/log-dbgen.log
# the output directory must already exist
if [ ! -d "$datadir" ] ;then
    echo "please input correct data dir"
    exit 1
fi
cd ${tpcds_home}
# start one dsdgen process per chunk in the background so that the chunks are generated in parallel
for i in $(seq 1 $parallel)
do
    echo `date "+%Y-%m-%d %H:%M:%S"` "start to generate child $i" >> ${logfile}
    ${tpcds_home}/dsdgen -TERMINATE N -VERBOSE -dir $datadir -scale $scale -parallel $parallel -child $i >> ${logfile} &
done
# wait for all background dsdgen processes to finish
wait
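I ran into one problem here, which others may also have hit: when an output directory is specified for the generated data, dsdgen dumps core. The error message points to a failure to open the output file, so the directory may not exist or may not be writable: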
[adb3.1@intel176 tools]$ ./dsdgen -scale 1 -dir /data/zgy/csv
DBGEN2 Population Generator (Version 2.0.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
Warning: This scale factor is valid for QUALIFICATION ONLY
ERROR: Failed to open output file!
File: print.c
Line: 490
Segmentation fault (core dumped)
[adb3.1@intel176 tools]$ gdb dsdgen core.82407
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /data/adb3.1/tpc-ds-tool-v2.1.0/tools/dsdgen...done.
[New LWP 82407]
Core was generated by `./dsdgen -scale 1 -dir /data/zgy/csv'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002b697f75f4ad in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.3.x86_64
(gdb) bt
#0 0x00002b697f75f4ad in vfprintf () from /lib64/libc.so.6
#1 0x00002b697f76a287 in fprintf () from /lib64/libc.so.6
#2 0x0000000000419d3a in print_key (nColumn=1, val=1, sep=1) at print.c:347
#3 0x0000000000407c35 in pr_w_call_center (row=0x0) at w_call_center.c:245
#4 0x00000000004154a0 in gen_tbl (tabid=0, kFirstRow=1, kRowCount=6) at driver.c:321
#5 0x0000000000415b27 in main (ac=5, av=0x7fffbe22c178) at driver.c:563
(gdb)
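The "Failed to open output file!" message suggests that dsdgen could not create its files under the -dir path and then crashed while writing to the unopened file. Assuming the cause is a missing or unwritable directory (this was not confirmed in the session above), a guard like the following could be run before calling dsdgen. The helper name ensure_outdir is hypothetical.

```shell
# Return 0 if the directory exists (creating it if needed) and is writable,
# otherwise print a message to stderr and return 1.
ensure_outdir() {
    dir=$1
    mkdir -p "$dir" 2>/dev/null || { echo "cannot create $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "$dir is not writable" >&2; return 1; }
}
```

A possible use is `ensure_outdir /data/zgy/csv && ./dsdgen -scale 1 -dir /data/zgy/csv`, so that dsdgen only starts when the directory is usable.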
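2. Each line of the generated data ends with an extra "|", which has to be stripped before loading. (Passing -TERMINATE N to dsdgen, as the parallel script above does, avoids the trailing delimiter in the first place.)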
[adb3.1@intel176 tools]$ mkdir csv
[adb3.1@intel176 tools]$ mv *.dat csv
[adb3.1@intel176 csv]$ vim data_process.sh
#!/bin/bash
# strip the trailing "|" from every line of each .dat file
for name in *.dat
do
    echo "$name"
    sed -i 's/|$//' "$name"
done
[adb3.1@intel176 csv]$ sh data_process.sh
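To confirm that the cleanup worked, the lines that still end with the delimiter can be counted. A minimal sketch, with count_trailing_pipes as a hypothetical helper name:

```shell
# Count the lines in a file that still end with the field delimiter "|".
count_trailing_pipes() {
    grep -c '|$' "$1"
}
```

After data_process.sh has run, `count_trailing_pipes call_center.dat` should print 0. A non-zero count means the file has not been processed yet.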
3. Load the data into AntDB. First write the COPY statements to copy.sql:
[adb3.1@intel176 csv]$ vim copy.sql
copy call_center from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/call_center.dat' with delimiter as '|' NULL '';
copy catalog_page from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/catalog_page.dat' with delimiter as '|' NULL '';
copy catalog_returns from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/catalog_returns.dat' with delimiter as '|' NULL '';
copy catalog_sales from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/catalog_sales.dat' with delimiter as '|' NULL '';
copy customer from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/customer.dat' with delimiter as '|' NULL '';
copy customer_address from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/customer_address.dat' with delimiter as '|' NULL '';
copy customer_demographics from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/customer_demographics.dat' with delimiter as '|' NULL '';
copy date_dim from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/date_dim.dat' with delimiter as '|' NULL '';
copy dbgen_version from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/dbgen_version.dat' with delimiter as '|' NULL '';
copy household_demographics from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/household_demographics.dat' with delimiter as '|' NULL '';
copy income_band from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/income_band.dat' with delimiter as '|' NULL '';
copy inventory from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/inventory.dat' with delimiter as '|' NULL '';
copy item from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/item.dat' with delimiter as '|' NULL '';
copy promotion from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/promotion.dat' with delimiter as '|' NULL '';
copy reason from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/reason.dat' with delimiter as '|' NULL '';
copy ship_mode from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/ship_mode.dat' with delimiter as '|' NULL '';
copy store from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/store.dat' with delimiter as '|' NULL '';
copy store_returns from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/store_returns.dat' with delimiter as '|' NULL '';
copy store_sales from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/store_sales.dat' with delimiter as '|' NULL '';
copy time_dim from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/time_dim.dat' with delimiter as '|' NULL '';
copy warehouse from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/warehouse.dat' with delimiter as '|' NULL '';
copy web_page from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/web_page.dat' with delimiter as '|' NULL '';
copy web_returns from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/web_returns.dat' with delimiter as '|' NULL '';
copy web_sales from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/web_sales.dat' with delimiter as '|' NULL '';
copy web_site from '/data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv/web_site.dat' with delimiter as '|' NULL '';
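Instead of typing the 25 statements by hand, copy.sql can be generated from the .dat file names. The sketch below assumes that each file is named after its table, as in the listing above (call_center.dat loads into call_center). gen_copy_sql is a hypothetical helper name.

```shell
# Emit one COPY statement per .dat file in the given directory.
# Assumption: the table name equals the file name without the .dat suffix.
gen_copy_sql() {
    dir=$1
    for f in "$dir"/*.dat; do
        [ -e "$f" ] || continue
        table=$(basename "$f" .dat)
        printf "copy %s from '%s' with delimiter as '|' NULL '';\n" "$table" "$f"
    done
}
```

For example, `gen_copy_sql /data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv > copy.sql` would produce statements in the same form as the ones listed above.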
Then run copy.sql in AntDB:
[adb3.1@intel176 csv]$ coord1
psql (4.0.0 7beda81237 based on PG 10.7)
Type "help" for help.
postgres=# \c tpcds
You are now connected to database "tpcds" as user "adb3.1".
tpcds=# \timing
Timing is on.
tpcds=# \i copy.sql
COPY 6
Time: 107.000 ms
COPY 11718
Time: 224.743 ms
COPY 144067
Time: 2566.562 ms (00:02.567)
tpcds=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+--------+----------+---------+-------+-----------------------+---------+------------+--------------------------------------------
postgres | adb3.1 | UTF8 | C | C | | 16 MB | pg_default | default administrative connection database
template0 | adb3.1 | UTF8 | C | C | =c/"adb3.1" +| 16 MB | pg_default | unmodifiable empty database
| | | | | "adb3.1"=CTc/"adb3.1" | | |
template1 | adb3.1 | UTF8 | C | C | =c/"adb3.1" +| 16 MB | pg_default | default template for new databases
| | | | | "adb3.1"=CTc/"adb3.1" | | |
tpcds | adb3.1 | UTF8 | C | C | | 2338 MB | pg_default |
(4 rows)
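Each "COPY n" line reported by \i copy.sql should equal the number of lines in the corresponding .dat file; for instance, the first statement loads call_center and reports COPY 6. The sketch below prints the line count per file for such a cross-check. It assumes one row per line, and dat_row_counts is a hypothetical helper name.

```shell
# Print "<table> <line count>" for every .dat file in a directory.
# Assumption: every row occupies exactly one line of its file.
dat_row_counts() {
    for f in "$1"/*.dat; do
        [ -e "$f" ] || continue
        printf '%s %s\n' "$(basename "$f" .dat)" "$(wc -l < "$f" | tr -d ' ')"
    done
}
```

The output of `dat_row_counts /data/adb3.1/tpc-ds-tool-v2.1.0/tools/csv` can be compared line by line with the COPY counts, since both follow the alphabetical order of the table names.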
After the import completes, remember to run VACUUM and refresh the statistics (for example with VACUUM ANALYZE).
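1. Modify the query1 to query99 templates under query_templates: append define _END = ""; to the end of each template file, otherwise the query generation command fails.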
[adb3.1@intel176 ~]$ cd tpc-ds-tool-v2.1.0/query_templates
[adb3.1@intel176 query_templates]$ vim alter_query.sh
#!/bin/bash
# append the _END definition to query1.tpl ... query99.tpl
for i in $(seq 1 99)
do
    echo $i
    echo "define _END = \"\";" >> query$i.tpl
done
[adb3.1@intel176 query_templates]$ chmod +x alter_query.sh
[adb3.1@intel176 query_templates]$ sh alter_query.sh
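Note that running alter_query.sh a second time appends the definition again. A variant that checks first might look like the sketch below; add_end_define is a hypothetical helper name.

```shell
# Append the _END definition to a template only if it is not already present.
add_end_define() {
    tpl=$1
    grep -q '^define _END' "$tpl" || echo 'define _END = "";' >> "$tpl"
}
```

It could be applied to all templates with `for i in $(seq 1 99); do add_end_define query$i.tpl; done`, and is safe to repeat.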
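2. Generate the 99 SQL queries using the oracle dialect (supported dialects: oracle, db2, sqlserver, netezza, ansi).
Awkwardly, there is no postgres dialect; fortunately AntDB supports Oracle syntax, so the queries can be generated with the oracle dialect.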
[adb3.1@intel176 tools]$ cd ~/tpc-ds-tool-v2.1.0/tools/
[adb3.1@intel176 tools]$ ./dsqgen -output_dir /data/adb3.1/csv -input ../query_templates/templates.lst -scale 1 -dialect oracle -directory ../query_templates
qgen2 Query Generator (Version 2.0.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
Warning: This scale factor is valid for QUALIFICATION ONLY
For the 99 queries in postgres syntax, a modified set published on GitHub can be used:
[adb3.1@intel176 ~]$ git clone https://github.com/arikarchmer/TPC-DS-qry-plan-to-execution-time.git
Cloning into 'TPC-DS-qry-plan-to-execution-time'...
remote: Enumerating objects: 106, done.
remote: Total 106 (delta 0), reused 0 (delta 0), pack-reused 106
Receiving objects: 100% (106/106), 41.20 KiB | 141.00 KiB/s, done.
Resolving deltas: 100% (24/24), done.
[adb3.1@intel176 TPC-DS-qry-plan-to-execution-time]$ vim tpcds_query.sh
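#!/bin/bash
# usage: tpcds_query.sh RESULTS_DIR DBNAME USER PORT
RESULTS_DIR=$1
DBNAME=$2
USER=$3
PORT=$4
# make sure the output directories exist
mkdir -p "$RESULTS_DIR/results" "$RESULTS_DIR/errors"
for n in `seq 1 99`
do
    q="/data/adb3.1/TPC-DS-qry-plan-to-execution-time/sqlFiles/query_$n.sql"
    if [ -f "$q" ]; then
        echo "======= query $n ======="
        # append "<n> = <elapsed seconds>" to results.log; query output and errors go to separate files
        /usr/bin/time -a -f "$n = %e" -o results.log psql -h localhost -U $USER -d $DBNAME -p $PORT < $q > $RESULTS_DIR/results/$n 2> $RESULTS_DIR/errors/$n
    fi
    sleep 3
done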
[adb3.1@intel176 TPC-DS-qry-plan-to-execution-time]$ sh tpcds_query.sh /data/adb3.1/TPC-DS-qry-plan-to-execution-time tpcds adb3.1 6603
======= query 1 =======
======= query 2 =======
======= query 3 =======
======= query 4 =======
======= query 5 =======
======= query 6 =======
======= query 7 =======
======= query 8 =======
======= query 9 =======
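The script writes one "<query> = <elapsed seconds>" line per query to results.log, so the timings can be summarised afterwards. The sketch below totals them and lists the slowest queries. Both function names are hypothetical, and lines that do not match the expected form (such as any extra messages written by /usr/bin/time) are skipped.

```shell
# Sum the elapsed seconds recorded as "<query> = <seconds>" lines.
total_seconds() {
    awk -F' = ' 'NF == 2 && $2 ~ /^[0-9.]+$/ { sum += $2 } END { printf "%.2f\n", sum }' "$1"
}

# Print "<seconds> <query>" for the slowest queries, slowest first.
# The second argument is the number of queries to show (default 5).
slowest_queries() {
    awk -F' = ' 'NF == 2 && $2 ~ /^[0-9.]+$/ { print $2, $1 }' "$1" | sort -rn | head -n "${2:-5}"
}
```

For example, `total_seconds results.log` prints the total run time in seconds and `slowest_queries results.log 10` shows the ten slowest queries.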