PostgreSQL——对一列做全表更新——update还是alter

在 PostgreSQL 上,如果我们想对一张表的一个列做全表更新,我们有两种方法可选择。一种方法是通过DML修改,即使用 update 语句;另一种方法是通过DDL 修改,即使用 alter语句。

现在我们通过一个简单的例子,比较这两种方法的性能。

 

我们分别创建了两张结构相同的表student1和 student2,并分别插入1百万条测试数据。

 

CREATE table student1

(

    id int,

    name varchar(32),

    gender int,

    grade int,

    class int,

    sno   varchar(32),

    birthday date,

    phone varchar(32),

    created_at timestamptz

);

insert into student1 (id, name, gender, grade, class,sno, birthday, phone, created_at)

select no, 'jim', 1, 1, 1, to_char(no, 'FM0000000'), to_timestamp('2000-01-01', 'YYYY-MM-DD'), '12345678901', now()  from (select generate_series(1,1000000) as no) a;

 

CREATE table student2

(

    id int,

    name varchar(32),

    gender int,

    grade int,

    class int,

    sno   varchar(32),

    birthday date,

    phone varchar(32),

    created_at timestamptz

);

insert into student2 (id, name, gender, grade, class,sno, birthday, phone, created_at)

select no, 'jim', 1, 1, 1, to_char(no, 'FM0000000'), to_timestamp('2000-01-01', 'YYYY-MM-DD'), '12345678901', now()  from (select generate_series(1,1000000) as no) a;

 

接下来,我们为这两张表分别增加两个字段,created_at_utc 和 created_at_tmz。

alter table student1

    add column created_at_utc bigint,

    add column created_at_tmz varchar(8);

 

alter table student2

    add column created_at_utc bigint,

    add column created_at_tmz varchar(8);

 

我们要为这两个字段赋值。其中,create_at_utc 的值是 created_at 转化为以毫秒为单位的UTC时间,而 created_at_tmz 的值是 “+08:00”。

 

update student1 set created_at_utc =  (extract(epoch from created_at)*1000), created_at_tmz = '+08:00';

时间消耗: 16.6s

 

alter table student2

    alter column created_at_utc type bigint using (extract(epoch from created_at)*1000),

    alter  column created_at_tmz type varchar(8) using '+08:00';

时间消耗:6.9

 

接下来,我们要查看两张表各自的大小:

select pg_size_pretty(pg_total_relation_size('student1'));

结果:177MB

 

select pg_size_pretty(pg_total_relation_size('student2'));

结果:96MB

 

最后,我们要查看两张表中生活的行和死亡的行的数量;

Select n_live_tup,n_dead_tup from pg_class where relname  = 'student1';

结果:1000000,1000000

 

Select n_live_tup,n_dead_tup from pg_class where relname  = 'student1';

结果:1000000,0

 

从比较结果来看,使用alter修改的性能更好。

 

为什么第二种方法的性能更好呢?

原来,PostgreSQL 实现了一种独特的基于行的多版本并发控制(MVCC)机制。在这种机制下,更新一个元组的实质是先将这一元组标记为对未来的事务不可用,然后向表中插入修改后的元组。因此,在全表更新时,实际修改和insert的元组的数量是表中活跃元组数量的两倍,修改后表的实际元组数也是原来的两倍。而使用 alter 修改字段值时,修改后的表的实际行的数量不会有变化。

 

因此,如果你希望修改整张表上某个字段的值,而目标的值是固定的,或者只取决于另一列,那么通过alter修改是一个更好的选择。

 

-- English

PostgreSQL——full-table update on a column ——update or alter

On PostgreSQL, if we want to update a column of a table, we have two options. One is to modify it by DML, that is, using UPDATE statement; the other is to modify it by DDL, that is using the ALTER statement.

Now we use a simple example to compare the performance of these two methods.

We created two tables student1 and student2 with the same structure, and inserted 1 million rows of test data to each of then.

 

CREATE table student1

(

    id int,

    name varchar(32),

    gender int,

    grade int,

    class int,

    sno   varchar(32),

    birthday date,

    phone varchar(32),

    created_at timestamptz

);

insert into student1 (id, name, gender, grade, class,sno, birthday, phone, created_at)

select no, 'jim', 1, 1, 1, to_char(no, 'FM0000000'), to_timestamp('2000-01-01', 'YYYY-MM-DD'), '12345678901', now()  from (select generate_series(1,1000000) as no) a;

 

CREATE table student2

(

    id int,

    name varchar(32),

    gender int,

    grade int,

    class int,

    sno   varchar(32),

    birthday date,

    phone varchar(32),

    created_at timestamptz

);

insert into student2 (id, name, gender, grade, class,sno, birthday, phone, created_at)

select no, 'jim', 1, 1, 1, to_char(no, 'FM0000000'), to_timestamp('2000-01-01', 'YYYY-MM-DD'), '12345678901', now()  from (select generate_series(1,1000000) as no) a;

 

Next, we added two columns, created_at_utc and created_at_tmz to each of the the two tables.

alter table student1

    add column created_at_utc bigint,

    add column created_at_tmz varchar(8);

 

alter table student2

    add column created_at_utc bigint,

    add column created_at_tmz varchar(8);

 

We needed to assign values to these two columns. Among them, the value of create_at_utc is created_at converted to UTC time in milliseconds, and the value of created_at_tmz is "+08:00".

 

update student1 set created_at_utc =  (extract(epoch from created_at)*1000), created_at_tmz = '+08:00';

time spent:  16.6s

 

alter table student2

    alter column created_at_utc type bigint using (extract(epoch from created_at)*1000),

    alter  column created_at_tmz type varchar(8) using '+08:00';

time spent: 6.9

 

Then, we would check the sizes of the two tables:

select pg_size_pretty(pg_total_relation_size('student1'));

result: 177MB

 

select pg_size_pretty(pg_total_relation_size('student2'));

result: 96MB

 

Finally, we would check the number of live and dead rows in the two tables:

Select n_live_tup,n_dead_tup from pg_class where relname  = 'student1';

result: 1000000,1000000

 

Select n_live_tup,n_dead_tup from pg_class where relname  = 'student1';

result: 1000000,0

 

From the results, it is clear that the performance of modify by ALTER  is better.

 

Why does the second method perform better?

PostgreSQL implements a  row-based multi-version concurrency-control (MVCC) mechanism. Under this mechanism, the essence of a program updating a tuple is to mark it as unavailable for future transactions, and then insert a modified tuple into the table. Therefore, when the full table is updated, the number of actually modified and inserted tuples is twice the number of active tuples in the table, and the actual number of tuples in the modified table is twice as large. When you modify the column values with ALTER, the actual number of rows in the modified table won't change.

 

Therefore, if you want to do a full-table update on a column, and the target value is fixed or only depends on another column,  modification by alter is a better choice.

 

 

你可能感兴趣的:(postgresql,postgresql,数据库)