SQLCODE -964 causes apply to fail

Environment:
apply runs on a server other than the one hosting the target database
apply instance:dpapsort@g03edzrdb001
target instance:a3insort@g03edzrdb002


Rough approach:
1. Identify the problem SET
2. Check the apply log
3. Check the corresponding source, target, and table
4. Check the snapshot


First, the following SET had a problem; its data was not being updated in time:
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "select activate, SET_NAME, STATUS, LASTSUCCESS, SYNCHTIME from asn.IBMSNAP_SUBS_SET where status=1"

ACTIVATE SET_NAME           STATUS LASTSUCCESS                SYNCHTIME                 
-------- ------------------ ------ -------------------------- --------------------------
       1 RPTREP2WEB              1 2011-10-06-16.59.35.330525 2011-10-06-16.56.09.000000

dpapsort@g03edzrdb001:/home/dpapsort/torun
=> date
Fri Oct  7 19:48:46 MDT 2011


Look up what status 1 means in the asn.IBMSNAP_SUBS_SET table in the Information Center.
Quote:

-1
    The replication failed. The Apply program backed out the entire set of rows it had applied, and no data was committed. If the startup parameter SQLERRCONTINUE = Y, the SQLSTATE that is returned to the Apply program during the last cycle is not one of the acceptable errors you indicated in the input file for SQLERRCONTINUE (apply qualifier.SQS).
0
    The Apply program processed the subscription set successfully. If the startup parameter SQLERRCONTINUE = Y, the Apply program did not encounter any SQL errors that you indicated for the SQLERRCONTINUE startup parameter (in apply_qualifier.SQS) and did not reject any rows.
2
    The Apply program is processing the subscription set in multiple cycles. It successfully processed a single logical subscription that was divided according to the MAX_SYNCH_MINUTES control column.


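There is no explanation for 1. I asked L2, who explained that it means apply is waiting for the data transfer to complete, similar to 2.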
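
As a quick reference, the status values quoted above can be kept in a small lookup table. This is only a sketch: the wording is paraphrased from the quote, and the entry for 1 is the explanation L2 gave in this post, not Information Center text. The names STATUS_MEANING and describe_status are made up for this illustration.

```python
# Meanings of IBMSNAP_SUBS_SET.STATUS, paraphrased from the quote above.
# The entry for 1 comes from L2's explanation in this post (an assumption,
# not documentation text).
STATUS_MEANING = {
    -1: "replication failed; the whole set was backed out",
    0: "subscription set processed successfully",
    1: "waiting for the data transfer to complete (per L2)",
    2: "set is being processed in multiple cycles (MAX_SYNCH_MINUTES)",
}


def describe_status(status: int) -> str:
    """Return a short description for a STATUS value."""
    return STATUS_MEANING.get(status, "unknown status")


print(describe_status(1))
```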

As usual, check the apply output log first:
2011-10-07-17.24.48.797176 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".


It shows SQLCODE is "-964"; look up what that means:
=> db2 ? sql964

SQL0964C  The transaction log for the database is full.
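
When the apply log is long, the interesting fields of an ASN1001E line can be pulled out with a regular expression. The sketch below assumes the message layout seen in the log line above; parse_asn1001e is a hypothetical helper, not part of any IBM tool.

```python
import re

# The ASN1001E line from the apply log shown above.
LINE = (
    '2011-10-07-17.24.48.797176 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : '
    '"WorkerThread". The Apply program encountered an SQL error. '
    'The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". '
    'The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". '
    'The table name is "REPORT_REPOSITORY".'
)

# Matches fragments such as: The SQLCODE is "-964"
FIELD = re.compile(r'The (ERRCODE|SQLSTATE|SQLCODE|SQLERRP|table name) is "([^"]*)"')


def parse_asn1001e(line: str) -> dict:
    """Extract the timestamp and the quoted fields from one ASN1001E line."""
    fields = dict(FIELD.findall(line))
    fields["timestamp"] = line.split(" ", 1)[0]
    return fields


info = parse_asn1001e(LINE)
print(info["timestamp"], info["SQLCODE"], info["SQLSTATE"], info["table name"])
```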


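Transaction log full, in other words the active log is full.

Note that the log in question is the one that records the updates apply makes to the target database, so it is the target database's log that is full.
This is easy to mix up; many people go and check the logs on the machine where apply is running.

There is, of course, another case, which is explained later.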

Query the details of this SET:

dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "select * from asn.IBMSNAP_SUBS_SET where SET_NAME='RPTREP2WEB'"

APPLY_QUAL         SET_NAME           SET_TYPE WHOS_ON_FIRST ACTIVATE SOURCE_SERVER      SOURCE_ALIAS TARGET_SERVER      TARGET_ALIAS STATUS LASTRUN                    REFRESH_TYPE SLEEP_MINUTES EVENT_NAME         LASTSUCCESS                SYNCHPOINT              SYNCHTIME                  CAPTURE_SCHEMA                 TGT_CAPTURE_SCHEMA             FEDERATED_SRC_SRVR FEDERATED_TGT_SRVR JRN_LIB    JRN_NAME   OPTION_FLAGS COMMIT_COUNT MAX_SYNCH_MINUTES AUX_STMTS ARCH_LEVEL
------------------ ------------------ -------- ------------- -------- ------------------ ------------ ------------------ ------------ ------ -------------------------- ------------ ------------- ------------------ -------------------------- ----------------------- -------------------------- ------------------------------ ------------------------------ ------------------ ------------------ ---------- ---------- ------------ ------------ ----------------- --------- ----------
RPTAPPLY2          RPTREP2WEB         R        S                    1 SRTSTG31           SRTSTG31     SORTPW31           SORTPW31         -1 2011-10-09-00.48.42.566513 R                        5 -                  2011-10-06-16.59.35.330525 x'4E8E3209000000040000' 2011-10-06-16.56.09.000000 ASN                            -                              -                  -                  -          -          TNNN                    -                 5         0 0801      



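From TARGET_SERVER and TARGET_ALIAS, the target node and the actual database can be looked up in the catalog; that step is omitted here.

On the target DB, check the database configuration and the relevant snapshot: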
a3insort@g03edzrdb002:/home/a3insort
=> db2 get db cfg |grep -i log
 Log retain for recovery status                          = RECOVERY
 User exit for logging status                            = YES
 Catalog cache size (4KB)              (CATALOGCACHE_SZ) = (MAXAPPLS*5)
 Log buffer size (4KB)                        (LOGBUFSZ) = 8
 Log file size (4KB)                         (LOGFILSIZ) = 8196
 Number of primary log files                (LOGPRIMARY) = 56
 Number of secondary log files               (LOGSECOND) = 200
 
-- the log space configured here is not small: (56+200)*8196*4 KB = about 8 GB in total

a3insort@g03edzrdb002:/home/a3insort
=> db2 get snapshot for all on SORTPW31 > snapshot.guoyanxi
a3insort@g03edzrdb002:/home/a3insort
=> grep -i ava snapshot.guoyanxi
Log space available to the database (Bytes)= 8545509551

-- the space actually free is about 7.9 GB as well
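
The arithmetic behind those two comments can be checked in a few lines. This sketch uses only the values shown in the db cfg and snapshot output above; LOGFILSIZ is counted in 4 KB pages.

```python
PAGE_BYTES = 4096        # LOGFILSIZ is expressed in 4 KB pages
LOGFILSIZ = 8196         # pages per log file
LOGPRIMARY = 56
LOGSECOND = 200
AVAILABLE = 8545509551   # "Log space available to the database (Bytes)"

GIB = 1024 ** 3
total_bytes = (LOGPRIMARY + LOGSECOND) * LOGFILSIZ * PAGE_BYTES

print(f"configured: {total_bytes} bytes = {total_bytes / GIB:.2f} GiB")
print(f"available:  {AVAILABLE} bytes = {AVAILABLE / GIB:.2f} GiB")
```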



So why does the problem occur?

Try looking at the structure of the problem table (REPORT_REPOSITORY) by connecting to the target database:
a3insort@g03edzrdb002:/home/a3insort
=> db2 connect to SORTPW31

   Database Connection Information

 Database server        = DB2/LINUXZ64 9.5.6
 SQL authorization ID   = A3INSORT
 Local database alias   = SORTPW31

a3insort@g03edzrdb002:/home/a3insort
=> db2 list tables for all |grep -i REPORT_REPOSITORY
REPORT_REPOSITORY               SORT            T     2005-09-06-13.38.33.588317
REPORT_REPOSITORY_NEW           SORT            T     2005-09-10-16.51.12.699525
a3insort@g03edzrdb002:/home/a3insort
=> db2 describe table SORT.REPORT_REPOSITORY

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
PUBLISH_TS                      SYSIBM    TIMESTAMP                   10     0 No    
SEQUENCE_ID                     SYSIBM    CHARACTER                    3     0 No    
REPORT_ID                       SYSIBM    CHARACTER                   30     0 No    
REPORT_NAME                     SYSIBM    VARCHAR                    512     0 Yes   
REPORT                          SYSIBM    BLOB                  62914560     0 No    

  5 record(s) selected.
  



Note the BLOB column; that is most likely what is using so much space.

Check the apply trail table for matching errors:
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "select whos_on_first concat set_name,status,SQLCODE,SQLSTATE,lastrun,full_refresh,set_inserted,set_updated,set_deleted,set_reworked from asn.ibmsnap_applytrail where APPLY_QUAL='RPTAPPLY2' order by lastrun desc fetch first 30 rows only"

1                   STATUS SQLCODE     SQLSTATE LASTRUN                    FULL_REFRESH SET_INSERTED SET_UPDATED SET_DELETED SET_REWORKED
------------------- ------ ----------- -------- -------------------------- ------------ ------------ ----------- ----------- ------------
SRPTREP2WEB              2           - -        2011-10-07-17.26.33.116850 -                   56557           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-07-17.26.33.101000 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-17.26.33.060036 N                       0           0           0            0
SRPTREP2WEB              2           - -        2011-10-07-14.34.22.673597 -                   56477           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-07-11.42.11.645470 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-11.42.11.627289 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-07-08.34.54.296089 -                   56516           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-07-08.34.54.285864 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-08.34.54.263251 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-07-05.30.35.934544 -                   56559           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-07-05.30.35.911078 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-05.30.35.885605 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-07-01.58.42.982568 -                   56451           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-07-01.58.42.971046 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-01.58.42.831020 N                       0           0           0            0
SRPTREP2WEB             -1        -911 40001    2011-10-07-00.13.11.902757 -                       0           0           0            0
SRPTDEL2WEB              0           - -        2011-10-07-00.13.11.865380 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-00.13.11.763378 N                       0           0           0            0
SRPTREP2WEB             -1        -452 428A1    2011-10-06-21.19.25.525620 -                       0           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-06-21.19.25.504677 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-06-21.19.24.942895 N                       0           0           0            0
SRPTREP2WEB             -1        -452 428A1    2011-10-06-20.04.51.612274 -                   15078           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-06-20.04.51.597841 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-06-20.04.51.575335 N                       0           0           0            0
SRPTREP2WEB              2           - -        2011-10-06-17.04.36.117466 -                   56452           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-06-17.04.27.029707 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-06-17.04.18.935458 N                       0           0           0            0
SRPTREP2WEB              0           - -        2011-10-06-16.59.35.843988 N                       0           0           0            0
SRPTDEL2WEB              0           - -        2011-10-06-16.59.26.774660 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-06-16.59.18.619988 N                       0           0           0            0


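Oddly, the most recent cycles report a good status (0 and 2), which suggests apply is still trying to catch up, even though several earlier cycles ended with -1.

Try restarting APPLY manually so that it resynchronizes the problem table, and watch the values above.
During the restart, also lower MAX_SYNCH_MINUTES from 10 minutes to 5, hoping to force apply to commit more often and keep -964 from recurring.
-- the start time is around 10-08 00:00 on the server clock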
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> date
Fri Oct  7 23:57:12 MDT 2011
-- stop apply
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> nohup /home/a3insort/sqllib/bin/asnacmd apply_qual=RPTAPPLY2 control_server=SRTSTG31 stop &

--update
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "update ASN.IBMSNAP_SUBS_SET set MAX_SYNCH_MINUTES=5 where APPLY_QUAL='RPTAPPLY2'"
DB20000I  The SQL command completed successfully.
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "select APPLY_QUAL ,SET_NAME ,STATUS ,LASTRUN ,MAX_SYNCH_MINUTES from ASN.IBMSNAP_SUBS_SET where APPLY_QUAL='RPTAPPLY2'"

APPLY_QUAL         SET_NAME           STATUS LASTRUN                    MAX_SYNCH_MINUTES
------------------ ------------------ ------ -------------------------- -----------------
RPTAPPLY2          RPTDEF2WEB              0 2011-10-07-17.26.33.060036                 5
RPTAPPLY2          RPTDEL2WEB              0 2011-10-07-17.26.33.101000                 5
RPTAPPLY2          RPTREP2WEB              1 2011-10-07-17.26.33.116850                 5

  3 record(s) selected.

-- start apply
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> nohup /home/a3insort/sqllib/bin/asnapply CONTROL_SERVER=SRTSTG31 APPLY_QUAL=RPTAPPLY2 APPLY_PATH="/home/dpapsort/torun" PWDFILE=asnpwd.aut inamsg=n &



Check the apply trail table:

dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "select whos_on_first concat set_name,status,SQLCODE,SQLSTATE,lastrun,full_refresh,set_inserted,set_updated,set_deleted,set_reworked from asn.ibmsnap_applytrail where >"

1                   STATUS SQLCODE     SQLSTATE LASTRUN                    FULL_REFRESH SET_INSERTED SET_UPDATED SET_DELETED SET_REWORKED
------------------- ------ ----------- -------- -------------------------- ------------ ------------ ----------- ----------- ------------
SRPTDEL2WEB              0           - -        2011-10-08-00.00.34.823373 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-00.00.34.809366 N                       0           0           0            0
SRPTREP2WEB              2           - -        2011-10-07-17.26.33.116850 -                   56557           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-07-17.26.33.101000 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-07-17.26.33.060036 N                       0           0           0            0


-- everything looks normal



At this point a large number of LOB files start appearing in the directory apply runs from
-- count the ones created on October 8
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> date
Sat Oct  8 00:15:55 MDT 2011
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> ls -atrl |grep LOB |grep "10-08" |wc -l
44952

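-- it finally stopped at about 86,000 files, 27 GB in total (du ignores the piped input, so the figure below is for the whole directory)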
dpapsort@g03edzrdb001:/home/dpapsort/torun
=>  ls -atrl |grep LOB |grep "10-08" |du -sm
27715   .
dpapsort@g03edzrdb001:/home/dpapsort/torun
=>  ls -atrl |grep LOB |grep "10-08" |wc -l       
86588
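
Because du does not read file names from a pipe, a more precise count and size for just the LOB files needs a different approach. A minimal sketch follows; it assumes the spill files can be recognised by "LOB" in their names (as the grep above does), it does not filter by date, and it reports apparent file sizes rather than disk blocks. lob_usage is a hypothetical helper, and the temporary directory only stands in for the apply directory.

```python
import os
import tempfile


def lob_usage(directory: str, marker: str = "LOB") -> tuple:
    """Return (file_count, total_bytes) for regular files whose name contains marker."""
    count = 0
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and marker in entry.name:
                count += 1
                total += entry.stat().st_size
    return count, total


# Small demonstration on a throwaway directory instead of the real apply path.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "RPTAPPLY2.LOB001"), "wb") as handle:
        handle.write(b"x" * 10)
    with open(os.path.join(demo_dir, "apply.log"), "wb") as handle:
        handle.write(b"y" * 5)
    demo_count, demo_bytes = lob_usage(demo_dir)

print(demo_count, demo_bytes)
```

On the real system the call would be lob_usage("/home/dpapsort/torun").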


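At this point the problem is predictable: if LOB data is logged too, the target database's 8 GB of log space is nowhere near enough, even for the LOB column alone.
(Production databases generally keep the default logging policy, and nobody is going to volunteer to tune it column by column for you unless something breaks.)

Now watch the log usage on the target database: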
a3insort@g03edzrdb002:/home/a3insort
=> db2 get snapshot for all on SORTPW31 > snapshot.guoyanxi
a3insort@g03edzrdb002:/home/a3insort
=> grep -i avail snapshot.guoyanxi
Log space available to the database (Bytes)= 6748851456
-- dropping fast

-- eventually
a3insort@g03edzrdb002:/home/a3insort
=> grep "Log space available to the database" snapshot.guoyanxi
Log space available to the database (Bytes)= 33466620
You have new mail in /var/spool/mail/a3insort
a3insort@g03edzrdb002:/home/a3insort
=> db2 get snapshot for all on SORTPW31 > snapshot.guoyanxi
a3insort@g03edzrdb002:/home/a3insort
=> grep "Log space available to the database" snapshot.guoyanxi
Log space available to the database (Bytes)= 8545958228

-- the UOW failed and everything was rolled back
-- this took about 3 hours; CPU and I/O looked healthy throughout
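
The same readings can be turned into a percentage of the configured log space. This sketch assumes the snapshot line format shown above and the log configuration from the db cfg output; log_space_available is a hypothetical helper.

```python
import re

# (LOGPRIMARY + LOGSECOND) * LOGFILSIZ * 4 KB, from the db cfg output.
TOTAL_LOG_BYTES = (56 + 200) * 8196 * 4096

PATTERN = re.compile(r"Log space available to the database \(Bytes\)\s*=\s*(\d+)")


def log_space_available(snapshot_text: str) -> int:
    """Return the free log space in bytes found in snapshot output."""
    match = PATTERN.search(snapshot_text)
    if match is None:
        raise ValueError("no 'Log space available' line found")
    return int(match.group(1))


# The three readings taken during and after the failed cycle.
samples = [
    "Log space available to the database (Bytes)= 6748851456",
    "Log space available to the database (Bytes)= 33466620",
    "Log space available to the database (Bytes)= 8545958228",
]
for text in samples:
    free = log_space_available(text)
    used_pct = 100 * (1 - free / TOTAL_LOG_BYTES)
    print(f"{free:>12} bytes free, about {used_pct:.1f}% used")
```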


Now look at what else the snapshot tells us:
a3insort@g03edzrdb002:/home/a3insort
=> db2 list application

Auth Id  Application    Appl.      Application Id                                                 DB       # of
         Name           Handle                                                                    Name    Agents
-------- -------------- ---------- -------------------------------------------------------------- -------- -----
SORTWAPP db2jcc_applica 46940      9.17.246.92.46156.111007170931                                 SORTPW31 1    
DPSORTNE asnapply       19709      9.63.48.130.59173.111008064111                                 SORTPW31 1    
SORTWAPP db2jcc_applica 29768      9.17.246.106.37012.111007105832                                SORTPW31 1    
SORTWAPP db2jcc_applica 33393      9.17.246.154.41820.111006125748                                SORTPW31 1    
DPCPSORT asncap         5814       *LOCAL.a3insort.111003050005                                   SORTPW31 1    
DPCPSORT asncap         5794       *LOCAL.a3insort.111003050006                                   SORTPW31 1    
SORTWAPP db2jcc_applica 11168      9.17.246.65.49996.110904033955                                 SORTPW31 1    
DPCPSORT asncap         20957      *LOCAL.a3insort.111008070737                                   SORTPW31 1    
DPCPSORT asncap         5825       *LOCAL.a3insort.111003050007                                   SORTPW31 1    
SORTWAPP db2jcc_applica 45542      9.17.246.94.45198.111007163718                                 SORTPW31 1    
DPAPSORT asnapply       7574       *LOCAL.a3insort.111003055906                                   SORTPW31 1    
DPCPSORT asncap         5830       *LOCAL.a3insort.111003050004                                   SORTPW31 1    

-- the application handle used by asnapply is 19709

a3insort@g03edzrdb002:/home/a3insort
=> db2 get snapshot for application agentid 19709 > app.snap.guoyanxi

UOW log space used (Bytes)                 = 1710859471				-- a large amount of log space is in use
Previous UOW completion timestamp          = 10/08/2011 00:41:11.575385
Elapsed time of last completed uow (sec.ms)= 0.002934
UOW start timestamp                        = 10/08/2011 00:41:11.575846

-- CPU time
Total User CPU Time used by agent (s)      = 438.752110
Total System CPU Time used by agent (s)    = 0.000000
Host execution elapsed time                = 1557.234473

-- as you can see, the damn programmer simply never commits!
Number of SQL requests since last commit   = 102113
Commit statements                          = 1
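
A check like the following could flag such a unit of work automatically. It is only a sketch: the sample text reuses three lines from the application snapshot above, snapshot_value is a hypothetical helper, and both thresholds are arbitrary values picked for illustration.

```python
import re

# Three lines copied from the application snapshot above.
SNAPSHOT = """\
UOW log space used (Bytes)                 = 1710859471
Number of SQL requests since last commit   = 102113
Commit statements                          = 1
"""


def snapshot_value(text: str, label: str) -> int:
    """Return the integer that follows 'label = ' in snapshot output."""
    match = re.search(re.escape(label) + r"\s*=\s*(\d+)", text)
    if match is None:
        raise KeyError(label)
    return int(match.group(1))


uow_bytes = snapshot_value(SNAPSHOT, "UOW log space used (Bytes)")
pending = snapshot_value(SNAPSHOT, "Number of SQL requests since last commit")
commits = snapshot_value(SNAPSHOT, "Commit statements")

# Hypothetical thresholds, chosen only for this illustration.
MAX_UOW_BYTES = 1024 ** 3
MAX_PENDING = 10_000

suspicious = uow_bytes > MAX_UOW_BYTES or pending > MAX_PENDING
if suspicious:
    print(f"long-running UOW: {uow_bytes / 1024 ** 3:.2f} GiB logged, "
          f"{pending} statements since last commit, {commits} commit(s)")
```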



At this point it is easy to predict that -964 will show up in the apply log:
2011-10-08-02.54.29.936390 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".


Check the apply trail table again:
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 select whos_on_first concat set_name,status,SQLCODE,SQLSTATE,lastrun,full_refresh,set_inserted,set_updated,set_deleted,set_reworked from asn.ibmsnap_applytrail where > 


1                   STATUS SQLCODE     SQLSTATE LASTRUN                    FULL_REFRESH SET_INSERTED SET_UPDATED SET_DELETED SET_REWORKED
------------------- ------ ----------- -------- -------------------------- ------------ ------------ ----------- ----------- ------------
SRPTDEL2WEB              0           - -        2011-10-08-02.56.07.347312 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-02.56.07.329110 N                       0           0           0            0
SRPTREP2WEB              2           - -        2011-10-08-00.00.34.832285 -                   56626           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-00.00.34.823373 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-00.00.34.809366 N                       0           0           0            0


It looks normal! Even L2 could not explain this.


SYNCHTIME, of course, has not moved either:
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 "select activate, SET_NAME, STATUS, LASTSUCCESS, SYNCHTIME from asn.IBMSNAP_SUBS_SET where status=1"

ACTIVATE SET_NAME           STATUS LASTSUCCESS                SYNCHTIME                 
-------- ------------------ ------ -------------------------- --------------------------
       1 RPTREP2WEB              1 2011-10-06-16.59.35.330525 2011-10-06-16.56.09.000000



################################ One day later ###############################################
Looking again a day later:
apply log:
2011-10-08-02.54.29.936390 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-08-06.01.09.191652 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-08-09.12.04.227249 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-08-12.11.30.560552 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-08-15.16.24.835641 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-08-18.19.40.147862 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-08-21.15.56.355534 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
2011-10-09-00.47.02.106943 <CPDOINS(C9/02)> ASN1001E  APPLY "RPTAPPLY2" : "WorkerThread". The Apply program encountered an SQL error. The ERRCODE is "C90102". The SQLSTATE is "57011". The SQLCODE is "-964". The SQLERRM is "". The SQLERRP is "SQLRI039". The server name is "". The table name is "REPORT_REPOSITORY".
-- it keeps failing, roughly every 3 hours
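
The spacing between failures can be verified from the timestamps in the apply log above. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps of the ASN1001E messages in the apply log above.
FAILURES = [
    "2011-10-08-02.54.29.936390",
    "2011-10-08-06.01.09.191652",
    "2011-10-08-09.12.04.227249",
    "2011-10-08-12.11.30.560552",
    "2011-10-08-15.16.24.835641",
    "2011-10-08-18.19.40.147862",
    "2011-10-08-21.15.56.355534",
    "2011-10-09-00.47.02.106943",
]
FORMAT = "%Y-%m-%d-%H.%M.%S.%f"

times = [datetime.strptime(stamp, FORMAT) for stamp in FAILURES]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

for gap in gaps:
    print(gap)

average_hours = sum(gap.total_seconds() for gap in gaps) / len(gaps) / 3600
print(f"average gap: {average_hours:.2f} hours")
```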



With intervals this regular, it is easy to picture each cycle eating up the log space and then returning the error.

Check the apply trail table again:
dpapsort@g03edzrdb001:/home/dpapsort/torun
=> db2 select whos_on_first concat set_name,status,SQLCODE,SQLSTATE,lastrun,full_refresh,set_inserted,set_updated,set_deleted,set_reworked from asn.ibmsnap_applytrail where APPLY_QUAL=

1                   STATUS SQLCODE     SQLSTATE LASTRUN                    FULL_REFRESH SET_INSERTED SET_UPDATED SET_DELETED SET_REWORKED
------------------- ------ ----------- -------- -------------------------- ------------ ------------ ----------- ----------- ------------
SRPTREP2WEB             -1           - -        2011-10-09-00.48.42.566513 -                   56268           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-09-00.48.42.551125 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-09-00.48.42.526091 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-08-21.23.31.937023 -                   56626           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-21.23.31.927683 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-21.23.31.906231 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-08-18.21.21.394640 -                   56559           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-18.21.21.379054 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-18.21.21.354030 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-08-15.18.01.192603 -                   56551           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-15.18.01.179879 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-15.18.01.161890 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-08-12.13.11.878571 -                   56408           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-12.13.11.863796 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-12.13.11.838547 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-08-09.13.37.238808 -                   56559           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-09.13.37.225786 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-09.13.37.160247 N                       0           0           0            0
SRPTREP2WEB             -1           - -        2011-10-08-06.02.56.674972 -                   56547           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-06.02.56.662221 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-06.02.56.643343 N                       0           0           0            0
SRPTREP2WEB              2           - -        2011-10-08-02.56.07.362930 -                   56557           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-02.56.07.347312 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-02.56.07.329110 N                       0           0           0            0
SRPTREP2WEB              2           - -        2011-10-08-00.00.34.832285 -                   56626           0       86483            0
SRPTDEL2WEB              0           - -        2011-10-08-00.00.34.823373 N                       0           0           0            0
SRPTDEF2WEB              0           - -        2011-10-08-00.00.34.809366 N                       0           0           0            0


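This matches the times in the apply log.

Conclusion:
After discussing with L2, we concluded that the application ran a large batch operation against the problem table around 2011-10-06-16.59.35.330525.
During that period the UOW was not committed effectively or frequently,
so a single UOW grew too large and filled the active log space of the target database.

Solution:
Obviously we cannot add 30 GB of log space just for a single UOW.
Still less should we indulge the programmers' laziness! (Even after an outage they insist they only provide 5*12 support.)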


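PS:
As mentioned earlier, large volumes of data, LOB data in particular, can fill the target database's log space.
In addition, once apply has fetched the data from capture, it stores the LOB data as files in the directory apply runs from.
So make sure that directory has enough space.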

PPS:
All DBAs agree on one thing: whether you use Oracle, DB2, or any other RDBMS, committing often is good for everyone.
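
To make "commit often" concrete, here is a minimal sketch of a batch insert that commits every fixed number of rows so that no single unit of work grows without bound. It uses Python's built-in sqlite3 module purely as a stand-in for a real RDBMS; the table, the batch size, and the helper insert_in_batches are all hypothetical and have nothing to do with the application in this incident.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing after every batch_size rows; return the commit count."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO report (id, body) VALUES (?, ?)", row)
        pending += 1
        if pending == batch_size:
            conn.commit()      # end the unit of work before it grows too large
            commits += 1
            pending = 0
    if pending:
        conn.commit()          # commit the final partial batch
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, body BLOB)")

rows = ((i, b"x" * 100) for i in range(2500))
commit_count = insert_in_batches(conn, rows, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
print(commit_count, row_count)
```

The batch size is a trade-off: smaller batches keep each unit of work small in the log, larger ones reduce commit overhead.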

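PPPS:
As always, the process above looks straightforward, but it actually took a full 10 hours of trial and observation.
Writing it up here (which took almost another 2 hours) should make it easier to look up later.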
