前篇学习了Oracle add column default 在各版本的优化,顺便也再研究了下pg对于add column default的优化及实现原理,记录一下。
Oracle的优化关注点在于新增default时是否有not null约束,而pg则在于新增的default值是否是volatile的。具体而言:
pg 11开始,pg_attribute系统表中新增了两个字段:atthasmissing和attmissingval
因此在add column default时,对于常量默认值,并不对表进行rewrite,只更新pg_attribute系统表这两个字段,因此效率非常高。且此时会发现新加列后,表大小没有增长。
这两个pg_attribute字段仅在必要时使用。如果表在任何时候触发了重写,pg会趁机为每一行插入默认值并取消设置atthasmissing和 attmissingval。
后续插入的行及对该列进行更新时,其值会直接写入元组中。
例如:alter table t add credits int default 0;
既然实际上业务表是没存默认值的,查询到的时候怎么办?
例如设置DEFAULT now()会将事务的当前值now()放入atthasmissingval ,所有现有行查询将继承它,而新插入的行将返回该行实际插入的时间。
创建一个1000万行的表,验证一下上面的说法。
postgres=# create table t(a int);
CREATE TABLE
postgres=# insert into t select generate_series(1,10000000);
INSERT 0 10000000
postgres=#
postgres=# select pg_size_pretty(pg_total_relation_size('t'));
pg_size_pretty
----------------
346 MB
(1 row)
不像Oracle,pg中加不加not null对default是否需要重写表没有影响,有兴趣可以去掉试试,效果是一样的。这里主要有3个关注点:
postgres=# alter table t add b varchar(20) default '13888888888' not null;
ALTER TABLE
Time: 12.858 ms
postgres=# select pg_size_pretty(pg_total_relation_size('t'));
pg_size_pretty
----------------
346 MB
(1 row)
postgres=# select * from pg_attribute where attrelid='t'::regclass and attname='b';
-[ RECORD 1 ]--+--------------
attrelid | 49919
attname | b
atttypid | 1043
attstattarget | -1
attlen | -1
attnum | 2
attndims | 0
attcacheoff | -1
atttypmod | 24
attbyval | f
attalign | i
attstorage | x
attcompression |
attnotnull | t
atthasdef | t
atthasmissing | t
attidentity |
attgenerated |
attisdropped | f
attislocal | t
attinhcount | 0
attcollation | 100
attacl |
attoptions |
attfdwoptions |
attmissingval | {13888888888}
Time: 0.694 ms
同样有3个关注点:
postgres=# alter table t add d timestamp default timeofday()::timestamp;
ALTER TABLE
Time: 13616.290 ms (00:13.616)
postgres=# select pg_size_pretty(pg_total_relation_size('t'));
pg_size_pretty
----------------
651 MB
(1 row)
postgres=# select * from pg_attribute where attrelid='t'::regclass and attname='d';
-[ RECORD 1 ]--+------
attrelid | 49919
attname | d
atttypid | 1114
attstattarget | -1
attlen | 8
attnum | 4
attndims | 0
attcacheoff | -1
atttypmod | -1
attbyval | t
attalign | d
attstorage | p
attcompression |
attnotnull | f
atthasdef | t
atthasmissing | f
attidentity |
attgenerated |
attisdropped | f
attislocal | t
attinhcount | 0
attcollation | 0
attacl |
attoptions |
attfdwoptions |
attmissingval |
Time: 4.199 ms
还记得前面提到过,如果表在任何时候触发了重写,pg会趁机为每一行插入默认值并取消设置atthasmissing和attmissingval。此时再去看b字段的值:发现也变为atthasmissing=f,attmissingval为空
postgres=# select * from pg_attribute where attrelid='t'::regclass and attname='b';
-[ RECORD 1 ]--+------
attrelid | 49919
attname | b
atttypid | 1043
attstattarget | -1
attlen | -1
attnum | 2
attndims | 0
attcacheoff | -1
atttypmod | 24
attbyval | f
attalign | i
attstorage | x
attcompression |
attnotnull | t
atthasdef | t
atthasmissing | f
attidentity |
attgenerated |
attisdropped | f
attislocal | t
attinhcount | 0
attcollation | 100
attacl |
attoptions |
attfdwoptions |
attmissingval |
Time: 0.338 ms
说明在d字段触发表rewrite时,把b字段一起处理了。这一点从表大小也可以看出来,如果只加需要rewrite的字段,实际表只有400多M
postgres=# create table t2(a int);
CREATE TABLE
Time: 4.362 ms
postgres=# insert into t2 select generate_series(1,10000000);
INSERT 0 10000000
Time: 4942.546 ms (00:04.943)
postgres=# alter table t2 add d timestamp default timeofday()::timestamp;
ALTER TABLE
postgres=# select pg_size_pretty(pg_total_relation_size('t2'));
pg_size_pretty
----------------
422 MB
(1 row)
参考
https://www.modb.pro/db/80151
https://www.postgresql.org/docs/current/catalog-pg-attribute.html
https://stackoverflow.com/questions/19525083/does-adding-a-null-column-to-a-postgres-table-cause-a-lock
https://github.com/postgres/postgres/commit/16828d5c0273b4fe5f10f42588005f16b415b2d8
https://github.com/wyphandy/postgresql/blob/master/201805/20180518_01.md
https://dataegret.com/2018/03/waiting-for-postgresql-11-pain-free-add-column-with-non-null-defaults/
https://brandur.org/postgres-default