Generating TPC-H Test Data for Greenplum
The Greenplum cluster is installed and running, but it holds no test data, so we use the TPC-H benchmark kit to generate some. An earlier article covered using TPC-DS to generate and query test data against Oracle; see https://www.cndba.cn/Marvinn/categoryId/201 . Here we use TPC-H to generate test data in preparation for later backup and restore exercises. There is only one environment, but the Greenplum cluster hosts two databases, so we generate data in one database, back it up, and restore it into the other, walking through a complete Greenplum backup/restore cycle.
1. Introduction to TPC-H
The TPC-H benchmark evolved from TPC-D, a decision-support benchmark standard defined by the TPC in 1994. TPC-H implements a data warehouse in 3NF with 8 base tables, and the data volume can be set anywhere from 1 GB to 3 TB. The benchmark comprises 22 queries (Q1–Q22); the primary evaluation criterion is each query's response time, i.e. the time from submitting the query to receiving the result. The headline metric is queries per hour, QphH@Size, where H is the average number of complex queries the system executes per hour and Size is the database scale; it reflects the system's query-processing capability. Because TPC-H is modeled on real production environments, it can evaluate key performance characteristics that other benchmarks cannot. In short, the TPC-H standard meets the testing needs of the data-warehouse field and pushes vendors and research institutions to drive the technology to its limits.
2. Setting Up the TPC-H Software
2.1 Download
Official TPC software downloads: http://www.tpc.org/information/current_specifications.asp
This article uses tpch_2_14_3 from the official download page, uploaded to the master node.
2.2 Build and Install TPC-H
Build steps reference: https://github.com/digoal/pg_tpch?spm=a2c4e.11153940.blogcont93.5.df104cd2p9viqS
2.2.1 Set Ownership
[root@master ~]# chown gpadmin.gpadmin tpch_2_14_3.tgz
Create a directory for the software as the gpadmin user:
[root@master gp]# su - gpadmin
[gpadmin@master ~]$ cd /gp/
[gpadmin@master gp]$ mkdir tpch
[gpadmin@master gp]$ exit
logout
[root@master gp]# mv /root/tpch_2_14_3.tgz /gp/tpch/
2.2.2 Extract
[gpadmin@master tpch]$ cd /gp/tpch
[gpadmin@master tpch]$ tar -xvf tpch_2_14_3.tgz
[gpadmin@master tpch]$ ll
total 21916
drwxr-xr-x 8 gpadmin gpadmin 4096 Nov 4 2011 dbgen
drwxr-xr-x 10 gpadmin gpadmin 4096 Nov 4 2011 ref_data
-rw-r--r-- 1 gpadmin gpadmin 22431994 Oct 30 11:19 tpch_2_14_3.tgz
2.2.3 Create the Makefile
[gpadmin@master tpch]$ cd dbgen
[gpadmin@master dbgen]$ cp makefile.suite Makefile
2.2.4 Edit the Makefile
[gpadmin@master dbgen]$ vi Makefile
Edit lines 103–111 of the file as follows:
103 CC = gcc
104 # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
105 # SQLSERVER, SYBASE, ORACLE, VECTORWISE
106 # Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
107 # SGI, SUN, U2200, VMS, LINUX, WIN32
108 # Current values for WORKLOAD are: TPCH
109 DATABASE= ORACLE
110 MACHINE = LINUX
111 WORKLOAD = TPCH
112 #
2.2.5 Build TPC-H with make
[gpadmin@master dbgen]$ make
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o
The build produces the executables dbgen and qgen in the current directory.
2.2.6 View the dbgen Help
[gpadmin@master dbgen]$ ./dbgen -help
TPC-H Population Generator (Version 2.14.3 build 0)
Copyright Transaction Processing Performance Council 1994 - 2010
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
[-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]
Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f -- force. Overwrite existing files
-h -- display this message
-q -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v -- enable VERBOSE mode
Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c -- generate cutomers ONLY
-T l -- generate nation/region ONLY
-T L -- generate lineitem ONLY
-T n -- generate nation ONLY
-T o -- generate orders/lineitem ONLY
-T O -- generate orders ONLY
-T p -- generate parts/partsupp ONLY
-T P -- generate parts ONLY
-T r -- generate region ONLY
-T s -- generate suppliers ONLY
-T S -- generate partsupp ONLY
To generate the SF=1 (1GB), validation database population, use:
dbgen -vf -s 1
To generate updates for a SF=1 (1GB), use:
dbgen -v -U 1 -s 1
The option used most often here:
-s — scale factor (roughly the data size in GB); set it according to the data volume you want to test.
3. Greenplum Test Procedure
Reference: https://yq.aliyun.com/articles/93?spm=a2c4e.11153940.blogcont98613.10.6e386c3eB5OOZf&commentId=29
3.1 Generate Test Data
3.1.1 Generate the first batch of test data (2 GB)
[gpadmin@master dbgen]$ ./dbgen -s 2 -f
TPC-H Population Generator (Version 2.14.3)
Copyright Transaction Processing Performance Council 1994 - 2010
View the generated data files (pipe-delimited .tbl text files; they are converted to CSV later in 3.3.1):
[gpadmin@master dbgen]$ ll *tbl
-rw-rw-r-- 1 gpadmin gpadmin 48803120 Oct 30 13:47 customer.tbl
-rw-rw-r-- 1 gpadmin gpadmin 1532344491 Oct 30 13:47 lineitem.tbl
-rw-rw-r-- 1 gpadmin gpadmin 2224 Oct 30 13:47 nation.tbl
-rw-rw-r-- 1 gpadmin gpadmin 345760490 Oct 30 13:47 orders.tbl
-rw-rw-r-- 1 gpadmin gpadmin 239178674 Oct 30 13:47 partsupp.tbl
-rw-rw-r-- 1 gpadmin gpadmin 48380536 Oct 30 13:47 part.tbl
-rw-rw-r-- 1 gpadmin gpadmin 389 Oct 30 13:47 region.tbl
-rw-rw-r-- 1 gpadmin gpadmin 2826802 Oct 30 13:47 supplier.tbl
3.1.2 Generate the second batch of test data (3 GB), prefixed with time to measure how long generation takes
[gpadmin@master dbgen]$ time ./dbgen -s 3 -f
TPC-H Population Generator (Version 2.14.3)
Copyright Transaction Processing Performance Council 1994 - 2010
real 1m0.570s
user 0m44.379s
sys 0m1.708s
The command above generated 3 GB of test data in 1m0.570s. Note that -f overwrites the existing files, so the data set on disk is now SF=3, not 2 GB plus 3 GB.
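As a rough sanity check (back-of-the-envelope arithmetic only, using the 3 GB size and 60.57 s wall time from the run above), the generation throughput works out to about 51 MB/s:

```shell
# ~3 GB of flat files written in 60.57 s of wall time.
awk 'BEGIN { printf "%.0f MB/s\n", 3 * 1024 / 60.57 }'
```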
3.2 Examine the Test Data
3.2.1 What each table holds
For a detailed description of the tables, see: https://blog.csdn.net/woshisap/article/details/44427899
The 8 tables (subject to constraints that must be satisfied; see the TPC-H specification) are:
1. PART: part information (Table 4-1)
2. SUPPLIER: supplier information (Table 4-2)
3. PARTSUPP: part–supplier (supply) information (Table 4-3)
4. CUSTOMER: customer information (Table 4-4)
5. ORDERS: order information (Table 4-5)
6. LINEITEM: order line-item information (Table 4-6)
7. NATION: nation information (Table 4-7)
8. REGION: region information (Table 4-8)
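Per the TPC-H specification, these tables scale linearly with the scale factor SF, except NATION and REGION, which are fixed. A quick sketch of the expected row counts, using the spec's base cardinalities and SF=2 to match the first batch generated above:

```shell
# Expected TPC-H row counts at scale factor SF (spec base cardinalities).
# LINEITEM is approximate: the number of line items per order varies.
SF=2
echo "region    5"
echo "nation    25"
echo "supplier  $(( 10000 * SF ))"
echo "customer  $(( 150000 * SF ))"
echo "part      $(( 200000 * SF ))"
echo "partsupp  $(( 800000 * SF ))"
echo "orders    $(( 1500000 * SF ))"
echo "lineitem  ~$(( 6000000 * SF ))"
```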
3.2.2 Customer data
[gpadmin@master dbgen]$ head -n 2 customer.tbl
1|Customer#000000001|IVhzIApeRb ot,c,E|15|25-989-741-2988|711.56|BUILDING|to the even, regular platelets. regular, ironic epitaphs nag e|
2|Customer#000000002|XSTf4,NCwDVaWNe6tEgvwfmRchLXak|13|23-768-687-3665|121.65|AUTOMOBILE|l accounts. blithely ironic theodolites integrate boldly: caref|
[gpadmin@master dbgen]$
3.2.3 Part data
[gpadmin@master dbgen]$ head -n 2 part.tbl
1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PKG|901.00|ly. slyly ironi|
2|blush thistle blue yellow saddle|Manufacturer#1|Brand#13|LARGE BRUSHED BRASS|1|LG CASE|902.00|lar accounts amo|
[gpadmin@master dbgen]$
3.2.4 Supplier data
[gpadmin@master dbgen]$ head -n 2 supplier.tbl
1|Supplier#000000001| N kD4on9OM Ipw3,gf0JBoQDd7tgrzrddZ|17|27-918-335-1736|5755.94|each slyly above the careful|
2|Supplier#000000002|89eJ5ksX3ImxJQBvxObC,|5|15-679-861-2259|4032.68| slyly bold instructions. idle dependen|
[gpadmin@master dbgen]$
3.2.5 Part–supplier data
[gpadmin@master dbgen]$ head -n 2 partsupp.tbl
1|2|3325|771.64|, even theodolites. regular, final theodolites eat after the carefully pending foxes. furiously regular deposits sleep slyly. carefully bold realms above the ironic dependencies haggle careful|
1|7502|8076|993.49|ven ideas. quickly even packages print. pending multipliers must have to are fluff|
[gpadmin@master dbgen]$
3.2.6 Order data
[gpadmin@master dbgen]$ head -n 2 orders.tbl
1|110701|O|212804.02|1996-01-02|5-LOW|Clerk#000002852|0|nstructions sleep furiously among |
2|234005|O|60987.15|1996-12-01|1-URGENT|Clerk#000002638|0| foxes. pending accounts at the pending, silent asymptot|
[gpadmin@master dbgen]$
3.2.7 Line-item data
[gpadmin@master dbgen]$ head -n 2 lineitem.tbl
1|465569|23085|1|17|26087.18|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|
1|201928|21929|2|36|65876.76|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|ly final dependencies: slyly bold |
[gpadmin@master dbgen]$
3.2.8 Nation data
[gpadmin@master dbgen]$ head -n 2 nation.tbl
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
1|ARGENTINA|1|al foxes promise slyly according to the regular accounts. bold requests alon|
[gpadmin@master dbgen]$
3.2.9 Region data
[gpadmin@master dbgen]$ head -n 2 region.tbl
0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
1|AMERICA|hs use ironic, even requests. s|
[gpadmin@master dbgen]$
3.3 Create Test Tables and Load Data
3.3.1 Convert the data format
Convert the *.tbl test data into CSV that PostgreSQL can load by stripping the trailing | delimiter from each row:
[gpadmin@master dbgen]$ for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; done
[gpadmin@master dbgen]$
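The sed expression removes only the single trailing delimiter and leaves the field separators intact; a minimal illustration on one sample row (the row below is mocked up, not real dbgen output):

```shell
# dbgen terminates every row with a trailing '|'; strip it for clean loading.
line='1|Customer#000000001|711.56|BUILDING|'
echo "$line" | sed 's/|$//'
```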
3.3.2 Create the tables and load the data automatically; see the tpch.sh script
Note: tpch.sh relies on a file from GitHub.
Download:
wget https://github.com/tvondra/pg_tpch/archive/master.zip
Extract and install:
[root@master home]# ll
total 28
drwx------ 4 gpadmin gpadmin 4096 Oct 31 11:30 gpadmin
-rw-r--r-- 1 root root 21812 Oct 30 14:41 pg_tpch-master.zip
[root@master home]# unzip pg_tpch-master.zip
[root@master home]# chown -R gpadmin.gpadmin pg_tpch-master
[root@master home]# mv pg_tpch-master /gp/
Its dss directory contains the scripts that load the TPC-H data into Greenplum.
tpch-load.sql creates column-oriented tables and tpch-load_pg.sql row-oriented ones; tune them yourself once you are familiar with Greenplum. The other scripts set up the tables' keys and indexes.
Copy all the pg_tpch files into the dbgen directory:
[root@master home]# su - gpadmin
[gpadmin@master gp]$ cp -r /gp/pg_tpch-master/* /gp/tpch/dbgen
[gpadmin@master gp]$
3.3.3 Create a queries directory for the converted TPC-H test SQL.
[gpadmin@master gp]$ mkdir dss/queries
3.3.4 Delete the .tbl files (to free space)
[gpadmin@master dbgen]$ rm -rf *.tbl
[gpadmin@master dbgen]$
3.3.5 Generate the test SQL. SF is the scale value passed to dbgen; test data was generated twice above (2 GB, then 3 GB), and because the second run used -f it overwrote the first batch, so the effective scale here is 3.
[gpadmin@master dbgen]$ vi qsql.sh
#! /bin/bash
SF=?
for q in `seq 1 22`
do
DSS_QUERY=dss/templates ./qgen -s $SF $q > dss/queries/$q.sql
sed 's/^select/explain select/' dss/queries/$q.sql > dss/queries/$q.explain.sql
done
[gpadmin@master dbgen]$ sh qsql.sh
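The second sed in qsql.sh derives an EXPLAIN twin for each query by prefixing the leading select, so every query also gets a plan-only variant; for example:

```shell
# Rewrite a generated query into its EXPLAIN variant, as qsql.sh does per file.
echo 'select count(*) from lineitem;' | sed 's/^select/explain select/'
```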
3.3.6 Create a database and user in Greenplum (skip if suitable ones already exist)
$ psql
postgres=# create role marvin with login;
CREATE ROLE
postgres=# create database marvin template template0 owner marvin;
CREATE DATABASE
Since the Greenplum cluster already has a database PG with user ccdba, this step is skipped here.
3.3.7 Configure pg_hba.conf (md5 password authentication was configured earlier)
$ vi $MASTER_DATA_DIRECTORY/pg_hba.conf
host all all 127.0.0.1/32 trust
or
host all ccdba 172.16.10.21/32 md5
Reload the configuration file:
$ gpstop -u
3.3.8 Set a few parameters:
gpconfig -c enable_nestloop -v off
gpconfig -c work_mem -v 256MB
gpstop -u
The runs look like this:
[gpadmin@master queries]$ gpconfig -c enable_nestloop -v off
20181030:14:55:17:021285 gpconfig:master:gpadmin-[INFO]:-completed successfully
[gpadmin@master queries]$
[gpadmin@master queries]$ gpconfig -c work_mem -v 256MB
20181030:14:55:28:021345 gpconfig:master:gpadmin-[INFO]:-completed successfully
[gpadmin@master queries]$ gpstop -u
20181030:14:55:36:021403 gpstop:master:gpadmin-[INFO]:-Starting gpstop with args: -u
20181030:14:55:36:021403 gpstop:master:gpadmin-[INFO]:-Gathering information and validating the environment...
20181030:14:55:36:021403 gpstop:master:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20181030:14:55:36:021403 gpstop:master:gpadmin-[INFO]:-Obtaining Segment details from master...
20181030:14:55:36:021403 gpstop:master:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.7.3 build 2'
20181030:14:55:36:021403 gpstop:master:gpadmin-[INFO]:-Signalling all postmaster processes to reload
.
3.3.9 Run the test as user ccdba against the PG database, writing results to the ./results directory:
---------------------------------------------------------------------------------------------------------------
Create the tables and load the data automatically; see the tpch.sh script.
Command format:
./tpch.sh ./results tpch-db tpch-user
The GitHub script loads into the postgres database over localhost by default, and COPY requires a superuser, so tpch.sh and the dss/tpch-load.sql it calls need modifying before the load can run into the PG database as user ccdba. The modified tpch.sh and dss/tpch-load.sql are in the appendix.
-------------------------------------------------------------------------------------------------------------------------
Note: the modified tpch.sh is not invoked the old way; it now takes the database name, user name, and server IP:
[gpadmin@master dbgen]$ ./tpch.sh ./results PG ccdba 172.16.10.21   # then wait for the data load to complete
Check the size of the PG database:
PG=> select pg_size_pretty(pg_database_size('PG'));
pg_size_pretty
----------------
6981 MB
(1 row)
3.4 Appendix
The modified tpch.sh (invoked as in 3.3.9: ./tpch.sh ./results PG ccdba 172.16.10.21):
[gpadmin@master dbgen]$ cat tpch.sh
#!/bin/sh
RESULTS=$1
DBNAME=$2
USER=$3
localhost=$4   # server host/IP passed to psql -h (variable name kept from the original script)
# delay between stats collections (iostat, vmstat, ...)
DELAY=15
# block devices for iostat to monitor, e.g. "sda sdb" (empty here, so iostat collection is skipped)
DEVICES=""
# DSS queries timeout (5 minutes)
DSS_TIMEOUT=300 # in seconds
# log
LOGFILE=bench.log
function benchmark_run() {
mkdir -p $RESULTS
# store the settings
psql -h $localhost PG -c "select name,setting from pg_settings" > $RESULTS/settings.log 2> $RESULTS/settings.err
print_log "preparing TPC-H database"
# create database, populate it with data and set up foreign keys
# psql -h $localhost tpch < dss/tpch-create.sql > $RESULTS/create.log 2> $RESULTS/create.err
print_log " loading data"
psql -h $localhost -U $USER $DBNAME < dss/tpch-load.sql > $RESULTS/load.log 2> $RESULTS/load.err
print_log " creating primary keys"
psql -h $localhost -U $USER $DBNAME < dss/tpch-pkeys.sql > $RESULTS/pkeys.log 2> $RESULTS/pkeys.err
print_log " creating foreign keys"
psql -h $localhost -U $USER $DBNAME < dss/tpch-alter.sql > $RESULTS/alter.log 2> $RESULTS/alter.err
print_log " creating indexes"
psql -h $localhost -U $USER $DBNAME < dss/tpch-index.sql > $RESULTS/index.log 2> $RESULTS/index.err
print_log " analyzing"
psql -h $localhost -U $USER $DBNAME -c "analyze" > $RESULTS/analyze.log 2> $RESULTS/analyze.err
print_log "running TPC-H benchmark"
benchmark_dss $RESULTS
print_log "finished TPC-H benchmark"
}
function benchmark_dss() {
mkdir -p $RESULTS
mkdir $RESULTS/vmstat-s $RESULTS/vmstat-d $RESULTS/explain $RESULTS/results $RESULTS/errors
# get bgwriter stats
psql PG -c "SELECT * FROM pg_stat_bgwriter" > $RESULTS/stats-before.log 2>> $RESULTS/stats-before.err
psql PG -c "SELECT * FROM pg_stat_database WHERE datname = '$DBNAME'" >> $RESULTS/stats-before.log 2>> $RESULTS/stats-before.err
vmstat -s > $RESULTS/vmstat-s-before.log 2>&1
vmstat -d > $RESULTS/vmstat-d-before.log 2>&1
print_log "running queries defined in TPC-H benchmark"
for n in `seq 1 22`
do
q="dss/queries/$n.sql"
qe="dss/queries/$n.explain.sql"
if [ -f "$q" ]; then
print_log " running query $n"
echo "======= query $n =======" >> $RESULTS/data.log 2>&1;
# run explain
psql -h $localhost -U $USER $DBNAME < $qe > $RESULTS/explain/$n 2>> $RESULTS/explain.err
vmstat -s > $RESULTS/vmstat-s/before-$n.log 2>&1
vmstat -d > $RESULTS/vmstat-d/before-$n.log 2>&1
# run the query on background
/usr/bin/time -a -f "$n = %e" -o $RESULTS/results.log psql -h $localhost -U $USER $DBNAME < $q > $RESULTS/results/$n 2> $RESULTS/errors/$n &
# wait up to the given number of seconds, then terminate the query if still running (don't wait for too long)
for i in `seq 0 $DSS_TIMEOUT`
do
# the query is still running - check the time
if [ -d "/proc/$!" ]; then
# the time is over, kill it with fire!
if [ $i -eq $DSS_TIMEOUT ]; then
print_log " killing query $n (timeout)"
# echo "$q : timeout" >> $RESULTS/results.log
psql -h $localhost PG -c "SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE datname = '$DBNAME'" >> $RESULTS/queries.err 2>&1;
# time to do a cleanup
sleep 10;
# just check how many backends are there (should be 0)
psql -h $localhost PG -c "SELECT COUNT(*) AS tpch_backends FROM pg_stat_activity WHERE datname = '$DBNAME'" >> $RESULTS/queries.err 2>&1;
else
# the query is still running and we have time left, sleep another second
sleep 1;
fi;
else
# the query finished in time, do not wait anymore
print_log " query $n finished OK ($i seconds)"
break;
fi;
done;
vmstat -s > $RESULTS/vmstat-s/after-$n.log 2>&1
vmstat -d > $RESULTS/vmstat-d/after-$n.log 2>&1
fi;
done;
# collect stats again
psql PG -c "SELECT * FROM pg_stat_bgwriter" > $RESULTS/stats-after.log 2>> $RESULTS/stats-after.err
psql PG -c "SELECT * FROM pg_stat_database WHERE datname = '$DBNAME'" >> $RESULTS/stats-after.log 2>> $RESULTS/stats-after.err
vmstat -s > $RESULTS/vmstat-s-after.log 2>&1
vmstat -d > $RESULTS/vmstat-d-after.log 2>&1
}
function stat_collection_start()
{
local RESULTS=$1
# run some basic monitoring tools (iotop, iostat, vmstat)
for dev in $DEVICES
do
iostat -t -x /dev/$dev $DELAY >> $RESULTS/iostat.$dev.log &
done;
vmstat $DELAY >> $RESULTS/vmstat.log &
}
function stat_collection_stop()
{
# wait to get a complete log from iostat etc. and then kill them
sleep $DELAY
for p in `jobs -p`; do
kill $p;
done;
}
function print_log() {
local message=$1
echo `date +"%Y-%m-%d %H:%M:%S"` "["`date +%s`"] : $message" >> $RESULTS/$LOGFILE;
}
mkdir $RESULTS;
# start statistics collection
stat_collection_start $RESULTS
# run the benchmark
benchmark_run $RESULTS $DBNAME $USER $localhost
# stop statistics collection
stat_collection_stop
The modified tpch-load.sql:
[gpadmin@master dss]$ cat tpch-load.sql
BEGIN;
CREATE TABLE PART (
P_PARTKEY SERIAL,
P_NAME VARCHAR(55),
P_MFGR CHAR(25),
P_BRAND CHAR(10),
P_TYPE VARCHAR(25),
P_SIZE INTEGER,
P_CONTAINER CHAR(10),
P_RETAILPRICE DECIMAL,
P_COMMENT VARCHAR(23)
);
\COPY part FROM '/tmp/dss-data/part.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE REGION (
R_REGIONKEY SERIAL,
R_NAME CHAR(25),
R_COMMENT VARCHAR(152)
);
\COPY region FROM '/tmp/dss-data/region.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE NATION (
N_NATIONKEY SERIAL,
N_NAME CHAR(25),
N_REGIONKEY BIGINT NOT NULL, -- references R_REGIONKEY
N_COMMENT VARCHAR(152)
);
\COPY nation FROM '/tmp/dss-data/nation.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE SUPPLIER (
S_SUPPKEY SERIAL,
S_NAME CHAR(25),
S_ADDRESS VARCHAR(40),
S_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY
S_PHONE CHAR(15),
S_ACCTBAL DECIMAL,
S_COMMENT VARCHAR(101)
);
\COPY supplier FROM '/tmp/dss-data/supplier.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE CUSTOMER (
C_CUSTKEY SERIAL,
C_NAME VARCHAR(25),
C_ADDRESS VARCHAR(40),
C_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY
C_PHONE CHAR(15),
C_ACCTBAL DECIMAL,
C_MKTSEGMENT CHAR(10),
C_COMMENT VARCHAR(117)
);
\COPY customer FROM '/tmp/dss-data/customer.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE PARTSUPP (
PS_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY
PS_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY
PS_AVAILQTY INTEGER,
PS_SUPPLYCOST DECIMAL,
PS_COMMENT VARCHAR(199)
);
\COPY partsupp FROM '/tmp/dss-data/partsupp.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE ORDERS (
O_ORDERKEY SERIAL,
O_CUSTKEY BIGINT NOT NULL, -- references C_CUSTKEY
O_ORDERSTATUS CHAR(1),
O_TOTALPRICE DECIMAL,
O_ORDERDATE DATE,
O_ORDERPRIORITY CHAR(15),
O_CLERK CHAR(15),
O_SHIPPRIORITY INTEGER,
O_COMMENT VARCHAR(79)
);
\COPY orders FROM '/tmp/dss-data/orders.csv' WITH csv DELIMITER '|';
COMMIT;
BEGIN;
CREATE TABLE LINEITEM (
L_ORDERKEY BIGINT NOT NULL, -- references O_ORDERKEY
L_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP)
L_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP)
L_LINENUMBER INTEGER,
L_QUANTITY DECIMAL,
L_EXTENDEDPRICE DECIMAL,
L_DISCOUNT DECIMAL,
L_TAX DECIMAL,
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE,
L_SHIPINSTRUCT CHAR(25),
L_SHIPMODE CHAR(10),
L_COMMENT VARCHAR(44)
);
\COPY lineitem FROM '/tmp/dss-data/lineitem.csv' WITH csv DELIMITER '|';
COMMIT;
Copyright notice: this is the author's original article; do not repost without the author's permission.