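Importing the IMDB dataset into PostgreSQL and generating the join order benchmark (JOB) queries:
join order benchmark (JOB) on GitHub (includes installation instructions)
Go to the GitHub repository; if you only need the query statements, download them directly:
Note: the repository's instructions include a download for the IMDB dataset, but the website link in its second step is dead, so the data is imported another way here:
Importing and using the TPC-H, TPC-DS, and IMDB datasets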
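(1) Download the CSV files
Download imdb.tgz and place it in a directory of your choice. Remember that path, because it is needed later. This post uses /var/lib/pgsql/benchmark.
Next, extract imdb.tgz: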
tar -zxvf imdb.tgz
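Before or after extracting, it can help to confirm that the archive holds the files the later steps expect. The following is a minimal sketch, not part of the JOB repository: list_archive_csvs is a hypothetical helper name, and the expectation of 21 CSV files is only inferred from the 21 tables imported later in this post.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the JOB repository):
# print the names of the CSV files stored in a gzip-compressed tar
# archive without extracting anything.
list_archive_csvs() {
    # -t lists entries, -z handles gzip, -f names the archive file
    tar -tzf "$1" | grep '\.csv$'
}

# Example (assumes imdb.tgz is in the current directory):
#   list_archive_csvs imdb.tgz | wc -l
```

If the count differs from the number of tables imported below, the download was probably incomplete or the archive layout differs from what this post assumes.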
The commands below must all be run inside psql:
(2) psql
Enter PostgreSQL and create the database:
CREATE DATABASE imdbload;
Switch to the imdbload database:
\c imdbload
Then run the SQL script that creates the tables. Change the leading path to the directory where imdb.tgz was extracted:
\i /var/lib/pgsql/benchmark/schematext.sql;
(3) Import the data. Again, change the leading path to the directory where imdb.tgz was extracted:
\copy aka_name from '/var/lib/pgsql/benchmark/aka_name.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy aka_title from '/var/lib/pgsql/benchmark/aka_title.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy cast_info from '/var/lib/pgsql/benchmark/cast_info.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy char_name from '/var/lib/pgsql/benchmark/char_name.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy comp_cast_type from '/var/lib/pgsql/benchmark/comp_cast_type.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy company_name from '/var/lib/pgsql/benchmark/company_name.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy company_type from '/var/lib/pgsql/benchmark/company_type.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy complete_cast from '/var/lib/pgsql/benchmark/complete_cast.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy info_type from '/var/lib/pgsql/benchmark/info_type.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy keyword from '/var/lib/pgsql/benchmark/keyword.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy kind_type from '/var/lib/pgsql/benchmark/kind_type.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy link_type from '/var/lib/pgsql/benchmark/link_type.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy movie_companies from '/var/lib/pgsql/benchmark/movie_companies.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy movie_info from '/var/lib/pgsql/benchmark/movie_info.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy movie_info_idx from '/var/lib/pgsql/benchmark/movie_info_idx.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy movie_keyword from '/var/lib/pgsql/benchmark/movie_keyword.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy movie_link from '/var/lib/pgsql/benchmark/movie_link.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy name from '/var/lib/pgsql/benchmark/name.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy person_info from '/var/lib/pgsql/benchmark/person_info.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy role_type from '/var/lib/pgsql/benchmark/role_type.csv' with delimiter as ',' csv quote '"' escape as '\';
\copy title from '/var/lib/pgsql/benchmark/title.csv' with delimiter as ',' csv quote '"' escape as '\';
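Since the 21 commands above differ only in the table name, they can also be generated instead of typed. The following is a rough sketch under stated assumptions: gen_copy_commands and IMDB_TABLES are hypothetical names introduced here, the table list is copied from the commands above, and the example invocation assumes the imdbload database and its tables already exist.

```shell
#!/bin/bash
# Table names taken from the \copy commands in this post.
IMDB_TABLES="aka_name aka_title cast_info char_name comp_cast_type company_name company_type complete_cast info_type keyword kind_type link_type movie_companies movie_info movie_info_idx movie_keyword movie_link name person_info role_type title"

# Hypothetical helper: print one \copy command per table, reading the
# CSV files from the directory given as $1.
gen_copy_commands() {
    dir=$1
    for t in $IMDB_TABLES; do
        # In an unquoted here-document, \\ yields a single backslash.
        cat <<EOF
\\copy $t from '$dir/$t.csv' with delimiter as ',' csv quote '"' escape as '\\';
EOF
    done
}

# Example (assumes the database and tables from the earlier steps):
#   gen_copy_commands /var/lib/pgsql/benchmark | psql -d imdbload
```

Printing the commands first and piping them to psql only after reading them over keeps the generated text easy to compare with the hand-written list.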
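(4) Verify the data (optional)
After the import there is no direct confirmation that it succeeded, so a shell script can check it. If that is too much trouble, skip the script and spot-check one or two tables.
This bash command lists all tables in imdbload: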
echo "\dt" | psql -t -A -d imdbload
If the output looks like this:
public|aka_name|table|postgres
public|aka_title|table|postgres
public|cast_info|table|postgres
public|char_name|table|postgres
public|comp_cast_type|table|postgres
public|company_name|table|postgres
public|company_type|table|postgres
public|complete_cast|table|postgres
public|info_type|table|postgres
public|keyword|table|postgres
public|kind_type|table|postgres
public|link_type|table|postgres
public|movie_companies|table|postgres
public|movie_info|table|postgres
public|movie_info_idx|table|postgres
public|movie_keyword|table|postgres
public|movie_link|table|postgres
public|name|table|postgres
public|person_info|table|postgres
public|role_type|table|postgres
public|title|table|postgres
then the script needs to split each line on the | character:
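#!/bin/bash
# Get the \dt listing, one line per table (schema|name|type|owner)
TABLES=$(echo "\dt" | psql -t -A -d imdbload)
# Loop over the tables and print each one's row count
for table in $TABLES; do
    # Keep only the second field, which is the table name
    table=$(echo "${table}" | cut -d '|' -f 2)
    count=$(echo "SELECT COUNT(*) FROM $table" | psql -t -A -d imdbload)
    echo "$table: $count"
done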
If each line of the output is only a table name such as movie_info, remove the line table=$(echo "${table}" | cut -d '|' -f 2) from the script.
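The splitting step can be tried in isolation without a database. The sketch below uses a hypothetical helper name, table_name_from_line, and sample lines copied from the listing above. One detail worth noting: cut passes lines that contain no delimiter through unchanged (unless -s is given), so the same call appears to cover both output shapes.

```shell
#!/bin/bash
# Hypothetical helper: extract the table name from one line of \dt output.
table_name_from_line() {
    # Field 2 of schema|name|type|owner. A line without '|' is printed
    # unchanged, because cut only suppresses such lines when -s is used.
    printf '%s\n' "$1" | cut -d '|' -f 2
}

# Sample calls with lines in both shapes:
table_name_from_line 'public|movie_info|table|postgres'
table_name_from_line 'movie_info'
```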
Test: run vim test_imdb.sh, paste in the complete script, then save and quit with :wq. Then run sh test_imdb.sh. If every table in the output has data, the import succeeded!
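Scanning 21 counts by eye is easy to get wrong, so the check can be automated. The following is a minimal sketch: find_empty_tables is a hypothetical helper, and it assumes the script's output format of one "table: count" line per table. It treats a missing or non-numeric count (for example, after a failed query) the same as zero.

```shell
#!/bin/bash
# Hypothetical helper: read "table: count" lines on stdin, print the
# tables whose count is zero, empty, or non-numeric, and return a
# non-zero status if any such table was found.
find_empty_tables() {
    awk -F': ' '($2 + 0) == 0 { print $1; bad = 1 } END { exit bad + 0 }'
}

# Example (assumes test_imdb.sh from this post is in the current directory):
#   sh test_imdb.sh | find_empty_tables && echo "all tables have data"
```

A non-empty result only says that a table has no rows. It does not confirm that every row of the CSV file was loaded.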