此处,我简单的写一个例子,供参考(在Postgresql9.1下面做的):
(1)准备知识
针对 Postgres生成数据方法
生成序列
SELECT * FROM generate_series(1,5);
生成date
SELECT date(generate_series(now(), now() + '1 week', '1 day'));
生成integer 随机数
SELECT (random()*(2*10^9))::integer;
生成numeric 随机数
random()*100.)::numeric(4,2)
生成字符串
substr('abcdefghijklmnopqrstuvwxyz',1,(random()*26)::integer)
生成重复串
repeat('1',(random()*40)::integer)
举例:
SELECT generate_series(1,10) as key,(random()*100.)::numeric(4,2),repeat('1',(random()*25)::integer) ORDER BY random();
结果:
(2)使用存储过程生成
看完上面的数据 我们有必要写一个Fucntion去生成数据
CREATE OR REPLACE FUNCTION generate_test_data(IN num integer, IN table_name text) RETURNS void AS $BODY$ DECLARE query_sql text :=''; insert_sql text :=''; insert_sql_1 text :=''; insert_sql_2 text :=''; column_info record; type_name name :=''; column_name name :=''; BEGIN -- 根据表名查出各列的类型和列名 query_sql := 'select t.typname as type, a.attname as name from pg_class c, pg_attribute a, pg_type t where c.relname = '''|| table_name ||''' and a.attnum > 0 and a.attrelid = c.oid and a.atttypid = t.oid'; --组成插入语句 insert_sql_1 := 'insert into '||table_name||'('; insert_sql_2 := 'values('; for column_info in execute(query_sql) loop type_name := column_info.type; column_name := column_info.name; --根据类型写表达式拼insert语句 if substring(type_name from 1 for 3) = 'int' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || '(random()*(2*10^9))::integer' ||','; elsif type_name = 'text' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'substr(''abcdefghijklmnopqrstuvwxyz'',1, (random()*26)::integer)'||','; elsif type_name = 'char' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'substr(''abcdefghijklmnopqrstuvwxyz'',1, (random()*26)::integer)::char'||','; elsif substring(type_name from 1 for 5) = 'float' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || '(random()*(2*10^9))::float' ||','; elsif type_name = 'varchar' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'substr(''abcdefghijklmnopqrstuvwxyz'',1, (random()*26)::integer)'||','; elsif type_name = 'date' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'date(generate_series(now(), now() + ''1 week'', ''1 day''))' ||','; elsif type_name = 'time' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'generate_series(now(), now() + ''1 week'', ''1 day'')::time' ||','; elsif type_name = 'timestamp' then insert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'generate_series(now(), now() + ''1 week'', ''1 day'')::timestamp' ||','; end if; end loop; --insert语句生成OK insert_sql_1 := substring(insert_sql_1 from 1 for char_length(insert_sql_1)-1)||')'; insert_sql_2 := substring(insert_sql_2 from 1 for char_length(insert_sql_2)-1)||')'; insert_sql := insert_sql_1 ||' '||insert_sql_2; --raise notice '%',insert_sql; for i in 1..num loop --执行insert语句 --raise notice '%',i; execute(insert_sql); end loop; END; $BODY$ LANGUAGE 'plpgsql';
之后执行
select generate_test_data(1000000,'test');
不过这样Insert语句会很慢,之后放弃,采用insert into select写法
(3)采用Insert into select生成一千万条测试数据
如果您想知道执行该sql的时间,请在执行上述命令前设置:
postgres=# \timing on
Timing is on.
最后说明一点的是,这是在我的笔记本上测试(内存8G,SSD 硬盘),插入一千万条记录要25秒 ,faint:
主要参考:
[1]PostgreSQL 之 生成测试数据,http://hi.baidu.com/hjzheng/blog/item/c78ad09320eb5297a877a4f6.html,
[2]http://www.depesz.com/2010/02/02/waiting-for-9-0-table-and-index-sizes/
若查看其中的index的空间或整个relation的空间,请参考:
http://www.postgresql.org/docs/9.1/static/functions-admin.html ,
或:http://www.postgresql.org/docs/9.1/static/functions-admin.html。
[postgres@localhost ~]$ cd /home/postgres/db/master/pgsql/data/base/16384 注释:这里是pgsql的data目录
[postgres@localhost 16384]$ du -sh
6.3M(6)如果想用delete清空该表,然后真正清空硬盘空间
postgres=# delete from foo;
DELETE 10000
然后在命令行:
[postgres@localhost ~]$ /home/postgres/db/master/pgsql/bin/vacuumdb mydb
然后再用第3步中的看看,是不是又减少到最初的几兆(我的大约是6M)空间了。
(7)通过Shell脚本执行测试数据生成
result=`psql -U postgres -h localhost -d postgres << EOF
\timing on
insert into test(id, name) select i, 'text:' || i from generate_series(1,1) i;
EOF`
echo ${result}