对PostgreSQL xmax的理解

xmax

The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.

http://www.postgresql.org/docs/9.1/static/ddl-system-columns.html

作一个实验:

我数据库关闭前,最后一条transaction的id是 1874。

我开一个终端A,此时终端A的当前transactionId为 1875。

[pgsql@localhost bin]$ ./psql

psql (9.1.2)

Type "help" for help.



pgsql=# begin;

BEGIN



pgsql=# select xmin,xmax,* from tab01;

 xmin | xmax | id | cd 

------+------+----+----

 1866 |    0 |  3 | 3

 1867 |    0 |  4 | 4

 1868 |    0 |  5 | 5

 1870 |    0 |  6 | 6

 1872 |    0 |  7 | 7

 1873 |    0 |  8 | 8

 1874 |    0 |  9 | 9

(7 rows)



pgsql=# 

我再开一个终端B,此时,终端B的transactionId为:1876。

[pgsql@localhost bin]$ ./psql

psql (9.1.2)

Type "help" for help.



pgsql=# begin;

BEGIN

pgsql=# select xmin,xmax,* from tab01;

 xmin | xmax | id | cd 

------+------+----+----

 1866 |    0 |  3 | 3

 1867 |    0 |  4 | 4

 1868 |    0 |  5 | 5

 1870 |    0 |  6 | 6

 1872 |    0 |  7 | 7

 1873 |    0 |  8 | 8

 1874 |    0 |  9 | 9

(7 rows)



pgsql=# 

回到终端A,执行 delete 操作:

pgsql=# delete from tab01 where id=9;

DELETE 1

pgsql=# 

此时,在终端A中,已经看不到删除后的数据:

pgsql=# select xmin,xmax,* from tab01;

 xmin | xmax | id | cd 

------+------+----+----

 1866 |    0 |  3 | 3

 1867 |    0 |  4 | 4

 1868 |    0 |  5 | 5

 1870 |    0 |  6 | 6

 1872 |    0 |  7 | 7

 1873 |    0 |  8 | 8

(6 rows)



pgsql=# 

此时,由于终端A尚未提交,所以,可以在终端B中看到如下的情形:

pgsql=# select xmin,xmax,* from tab01;

 xmin | xmax | id | cd 

------+------+----+----

 1866 |    0 |  3 | 3

 1867 |    0 |  4 | 4

 1868 |    0 |  5 | 5

 1870 |    0 |  6 | 6

 1872 |    0 |  7 | 7

 1873 |    0 |  8 | 8

 1874 | 1875 |  9 | 9

(7 rows)



pgsql=# 

也就是说,id为9的那条记录,其xmax为1875,表明其为 transactionid为 1875的事务所删除。

回到终端A,进行提交:

pgsql=# commit;

COMMIT

pgsql=# select xmin,xmax,* from tab01;

 xmin | xmax | id | cd 

------+------+----+----

 1866 |    0 |  3 | 3

 1867 |    0 |  4 | 4

 1868 |    0 |  5 | 5

 1870 |    0 |  6 | 6

 1872 |    0 |  7 | 7

 1873 |    0 |  8 | 8

(6 rows)



pgsql=# 

再回到终端B,查看:

pgsql=# select xmin,xmax,* from tab01;

 xmin | xmax | id | cd 

------+------+----+----

 1866 |    0 |  3 | 3

 1867 |    0 |  4 | 4

 1868 |    0 |  5 | 5

 1870 |    0 |  6 | 6

 1872 |    0 |  7 | 7

 1873 |    0 |  8 | 8

(6 rows)



pgsql=# 

让我来再进一步,看一看:

重新开两个终端:

终端A和终端B。

在终端A中:

[pgsql@localhost bin]$ ./psql

psql (9.1.2)

Type "help" for help.



pgsql=# select xmin,xmax,cmin,cmax,* from tab01;

 xmin | xmax | cmin | cmax | id | cd 

------+------+------+------+----+----

 1878 |    0 |    0 |    0 |  1 | 1

 1879 |    0 |    0 |    0 |  2 | 2

(2 rows)



pgsql=# \q

可以看到两条由不同的事务提交所形成的记录。

然后再次使用psql: 经过update 后,自己所看到的是 xmin的变化,这时尚未提交,别的终端看到就不一样了。

终端A:

[pgsql@localhost bin]$ ./psql

psql (9.1.2)

Type "help" for help.



pgsql=# begin;

BEGIN

pgsql=# update tab01 set id=3 where cd='2';

UPDATE 1

pgsql=# select xmin,xmax,cmin,cmax,* from tab01;

 xmin | xmax | cmin | cmax | id | cd 

------+------+------+------+----+----

 1878 |    0 |    0 |    0 |  1 | 1

 1880 |    0 |    0 |    0 |  3 | 2

(2 rows)



pgsql=# 

此时,进入终端B:

[pgsql@localhost bin]$ ./psql

psql (9.1.2)

Type "help" for help.



pgsql=# select xmin,xmax,cmin,cmax,* from tab01;

 xmin | xmax | cmin | cmax | id | cd 

------+------+------+------+----+----

 1878 |    0 |    0 |    0 |  1 | 1

 1879 | 1880 |    0 |    0 |  2 | 2

(2 rows)



pgsql=# 

我推测,update的时候,自身事务所看到的是内存中的影像。同时它也已经提交到了物理文件上。而别的事务是从物理文件来读取的。

在A终端提交以后:

pgsql=# commit;

COMMIT

pgsql=# 

B终端所看到的:

pgsql=# select xmin,xmax,cmin,cmax,* from tab01;

 xmin | xmax | cmin | cmax | id | cd 

------+------+------+------+----+----

 1878 |    0 |    0 |    0 |  1 | 1

 1880 |    0 |    0 |    0 |  3 | 2

(2 rows)



pgsql=# 

 继续从代码上进行分析:

 

/*

 *    heap_update - replace a tuple

 *

 * NB: do not call this directly unless you are prepared to deal with

 * concurrent-update conditions.  Use simple_heap_update instead.

 *

 *    relation - table to be modified (caller must hold suitable lock)

 *    otid - TID of old tuple to be replaced

 *    newtup - newly constructed tuple data to store

 *    ctid - output parameter, used only for failure case (see below)

 *    update_xmax - output parameter, used only for failure case (see below)

 *    cid - update command ID (used for visibility test, and stored into

 *        cmax/cmin if successful)

 *    crosscheck - if not InvalidSnapshot, also check old tuple against this

 *    wait - true if should wait for any conflicting update to commit/abort

 *

 * Normal, successful return value is HeapTupleMayBeUpdated, which

 * actually means we *did* update it.  Failure return codes are

 * HeapTupleSelfUpdated, HeapTupleUpdated, or HeapTupleBeingUpdated

 * (the last only possible if wait == false).

 *

 * On success, the header fields of *newtup are updated to match the new

 * stored tuple; in particular, newtup->t_self is set to the TID where the

 * new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set iff a HOT

 * update was done.  However, any TOAST changes in the new tuple's

 * data are not reflected into *newtup.

 *

 * In the failure cases, the routine returns the tuple's t_ctid and t_xmax.

 * If t_ctid is the same as otid, the tuple was deleted; if different, the

 * tuple was updated, and t_ctid is the location of the replacement tuple.

 * (t_xmax is needed to verify that the replacement tuple matches.)

 */

HTSU_Result

heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,

            ItemPointer ctid, TransactionId *update_xmax,

            CommandId cid, Snapshot crosscheck, bool wait)

{

    HTSU_Result result;

    TransactionId xid = GetCurrentTransactionId();

    Bitmapset  *hot_attrs;

    ItemId        lp;

    HeapTupleData oldtup;

    HeapTuple    heaptup;

    Page        page;

    Buffer        buffer,

                newbuf;

    bool        need_toast,

                already_marked;

    Size        newtupsize,

                pagefree;

    bool        have_tuple_lock = false;

    bool        iscombo;

    bool        use_hot_update = false;

    bool        all_visible_cleared = false;

    bool        all_visible_cleared_new = false;



    ...



    ////////////////First Phase marked by gaojian



    newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);

    newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);

    newtup->t_data->t_infomask |= (HEAP_XMAX_INVALID | HEAP_UPDATED);

    HeapTupleHeaderSetXmin(newtup->t_data, xid);

    HeapTupleHeaderSetCmin(newtup->t_data, cid);

    HeapTupleHeaderSetXmax(newtup->t_data, 0);    /* for cleanliness */

    newtup->t_tableOid = RelationGetRelid(relation);



    ...
if (!already_marked) { /* Clear obsolete visibility flags ... */ oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID | HEAP_XMAX_IS_MULTI | HEAP_IS_LOCKED | HEAP_MOVED); /* ... and store info about transaction updating this tuple */ ///HeapTupleHeaderSetXmax(oldtup.t_data, xid); /////>>>>>added by gaojian for testing. ////xid = (TransactionId)8888; fprintf(stderr,"x-----1,xid is :%d \n",(int)xid); HeapTupleHeaderSetXmax(oldtup.t_data, xid); HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo); sleep(60); } ... return HeapTupleMayBeUpdated; }

 

可以看到,第一段的 :

    newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK);

    newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);

    newtup->t_data->t_infomask |= (HEAP_XMAX_INVALID | HEAP_UPDATED);

    HeapTupleHeaderSetXmin(newtup->t_data, xid);

    HeapTupleHeaderSetCmin(newtup->t_data, cid);

    HeapTupleHeaderSetXmax(newtup->t_data, 0);    /* for cleanliness */

    newtup->t_tableOid = RelationGetRelid(relation);

写的很明白, HeapTupleHeaderSetXmin(newtup->t_data, xid);

而第二段的 :

    if (!already_marked)

    {

        /* Clear obsolete visibility flags ... */

        oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |

                                       HEAP_XMAX_INVALID |

                                       HEAP_XMAX_IS_MULTI |

                                       HEAP_IS_LOCKED |

                                       HEAP_MOVED);

        /* ... and store info about transaction updating this tuple */



        HeapTupleHeaderSetXmax(oldtup.t_data, xid);

        HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);

    }

然后,我再来验证一下,加点调试代码,这样,我执行sql时会出错:

    if (!already_marked)

    {

        /* Clear obsolete visibility flags ... */

        oldtup.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |

                                       HEAP_XMAX_INVALID |

                                       HEAP_XMAX_IS_MULTI |

                                       HEAP_IS_LOCKED |

                                       HEAP_MOVED);

        /* ... and store info about transaction updating this tuple */



        ///HeapTupleHeaderSetXmax(oldtup.t_data, xid);

        /////>>>>>added by gaojian for testing.

        ////xid = (TransactionId)8888;



        fprintf(stderr,"x-----1,xid is :%d \n",(int)xid);



        HeapTupleHeaderSetXmax(oldtup.t_data, xid);

        HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);



    }

执行结果:

pgsql=# begin;

BEGIN

pgsql=# update tab01 set id=2 where cd = '2';

ERROR:  could not access status of transaction 8888

DETAIL:  Could not read from file "pg_subtrans/0000" at offset 32768: Success.

pgsql=# \q

[pgsql@loca

再次登陆进来看看,哈,这证明我所看到的代码部分,就是实现写入xmax的部分。

总结来说,就是,在update(可能delete也是),用向oldtup写入transaction id的方式,来记录哪个transaction改动了记录。

[pgsql@localhost bin]$ ./psql

psql (9.1.2)

Type "help" for help.



pgsql=# select xmin,xmax,cmin,cmax, * from tab01;

 xmin | xmax | cmin | cmax |    id     | cd 

------+------+------+------+-----------+----

 1878 |    0 |    0 |    0 |         1 | 1

 1884 | 8888 |    0 |    0 | 999888777 | 2

 1885 |    0 |    0 |    0 |         2 | 2

(3 rows)



pgsql=# 

 

 

 

 

你可能感兴趣的:(PostgreSQL)