在窗口1,设置非自动提交
db2set db2options=+c
在窗口1,执行一条insert,但是不提交
$ db2 +c "insert into t1 values('leo1','leo2')" DB20000I The SQL command completed successfully. $ db2 "select * from t1" NAME1 NAME2 ---------- ---------- leo1 leo2 leo1 leo2 2 record(s) selected
在窗口2是看不到这条的,因为没有提交
$ db2 "select * from t1" NAME1 NAME2 ---------- ---------- leo1 leo2 1 record(s) selected.
在窗口2,编写脚本,并执行,脚本内容如下:
db2 connect to sample >> /dev/null i=1 while [ $i -lt 10000 ] do db2 "insert into t1 values('3','3')" >> /dev/null db2 commit >> /dev/null i=$i+1 done
在窗口1,查询插入条数,发现一段时间后,数据不增长了
$ db2 "select count(*) from t1" 1 ----------- 778
在窗口1,查看db2diag,其中报错如下:
2013-07-15-15.29.01.020342+120 I8184848A550 LEVEL: Error PID : 19857580 TID : 9983 PROC : db2sysc 0 INSTANCE: db2test NODE : 000 DB : SAMPLE APPHDL : 0-10490 APPID: *LOCAL.db2test.130715132615 AUTHID : DB2TEST EDUID : 9983 EDUNAME: db2agent (SAMPLE) 0 FUNCTION: DB2 UDB, data protection services, sqlpWriteLR, probe:6680 MESSAGE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE "Log File has reached its saturation point" DIA8309C Log file was full. 2013-07-15-15.29.01.046535+120 E8185399A590 LEVEL: Error PID : 19857580 TID : 9983 PROC : db2sysc 0 INSTANCE: db2test NODE : 000 DB : SAMPLE APPHDL : 0-10490 APPID: *LOCAL.db2test.130715132615 AUTHID : DB2TEST EDUID : 9983 EDUNAME: db2agent (SAMPLE) 0 FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:2860 MESSAGE : ADM1823E The active log is full and is held by application handle "10409..". Terminate this application by COMMIT, ROLLBACK or FORCE APPLICATION.
查看最早未提交的事务
db2 "select db.DBPARTITIONNUM,ai.agent_id, substr(ai.appl_status,1,20) as Status,substr(ai.primary_auth_id,1,10) as Authid,substr(ai.appl_name,1,15) as Appl_Name,int(ap.UOW_LOG_SPACE_USED/1024/1024) as Log_Used_M,int(ap.appl_idle_time/60) as Idle_for_min,ap.appl_con_time as Connected_Since from sysibmadm.snapdb db,sysibmadm.snapappl ap,sysibmadm.snapappl_info ai where ai.agent_id=db.APPL_ID_OLDEST_XACT and ap.agent_id=ai.agent_id" DBPARTITIONNUM AGENT_ID STATUS AUTHID APPL_NAME LOG_USED_M IDLE_FOR_MIN CONNECTED_SINCE -------------- -------------------- -------------------- ---------- --------------- ----------- ------------ -------------------------- 0 10409 UOWWAIT DB2TEST db2bp 0 0 2013-07-15-14.47.21.274710 1 record(s) selected
通过快照确定
db2 get snapshot for database on sample Appl id holding the oldest transaction = 10409
查看此application的状态
db2 list applications show detail CONNECT Auth Id Application Name Appl. Application Id Seq# Number of Coordinating DB Coordinator Status Status Change Time DB Name DB Path Handle Agents partition number pid/thread -------------------------------------------------------------------------------------------------------------------------------- -------------------- ---------- -------------------------------------------------------------- ----- ---------- ---------------- --------------- ------------------------------ -------------------------- -------- -------------------- DB2TEST db2fw7 10421 *LOCAL.DB2.130715124733 00001 1 0 9211 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw0 10414 *LOCAL.DB2.130715124726 00001 1 0 7156 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw6 10420 *LOCAL.DB2.130715124732 00001 1 0 8954 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2lused 10413 *LOCAL.DB2.130715124725 00001 1 0 6899 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw5 10419 *LOCAL.DB2.130715124731 00001 1 0 8697 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2wlmd 10412 *LOCAL.DB2.130715124724 00001 1 0 6642 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw4 10418 *LOCAL.DB2.130715124730 00001 1 0 8440 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2taskd 10411 *LOCAL.DB2.130715124723 00001 1 0 6385 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw3 10417 *LOCAL.DB2.130715124729 00001 1 0 8183 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2stmm 10410 *LOCAL.DB2.130715124722 00001 1 0 6128 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw2 10416 *LOCAL.DB2.130715124728 00001 1 0 7670 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2bp 10409 *LOCAL.db2test.130715124721 00006 1 0 1759 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2bp 10455 *LOCAL.db2test.130715130654 00003 1 0 9726 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2evmg_DB2DETAILDEA 10422 *LOCAL.DB2.130715124734 00001 1 0 9468 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw1 10415 *LOCAL.DB2.130715124727 00001 1 0 7413 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/
处于uow waiting 状态;
此时,尝试插入数据
$ db2 "insert into t1 values ('4','4')" DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL0964C The transaction log for the database is full. SQLSTATE=57011
查看此时的活动日志所在的目录空间
$ df -g /home/db2test/db2test/NODE0000/SQL00001/SQLOGDIR/ Filesystem GB blocks Free %Used Iused %Iused Mounted on /dev/hd1 5.00 4.10 18% 470 1% /home
查看此时活动日志中的日志文件
$ ls /home/db2test/db2test/NODE0000/SQL00001/SQLOGDIR/ S0000001.LOG S0000002.LOG S0000003.LOG S0000004.LOG S0000005.LOG
查看此时归档日志中的日志文件以及归档情况
$ ls /home/db2test/arch_log/db2test/SAMPLE/NODE0000/C0000000/ S0000000.LOG S0000001.LOG S0000002.LOG S0000003.LOG S0000004.LOG $ db2pd -d sample -logs Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 01:01:57 -- Date 07/15/2013 15:49:18 Logs: Current Log Number 5 Pages Written 2 Cur Commit Disk Log Reads 0 Cur Commit Total Log Reads 0 Method 1 Archive Status Success Method 1 Next Log to Archive 5 Method 1 First Failure n/a Method 2 Archive Status n/a Method 2 Next Log to Archive n/a Method 2 First Failure n/a Log Chain ID 0 Current LSN 0x0000000004A53F21 Address StartLSN State Size Pages Filename 0x0A00020010068650 0000000004A3D010 0x00000000 5 5 S0000001.LOG 0x0A00020010068EB0 0000000004A42010 0x00000000 5 5 S0000002.LOG 0x0A00020010042330 0000000004A47010 0x00000000 5 5 S0000003.LOG 0x0A00020013EE8BB0 0000000004A4C010 0x00000000 5 5 S0000004.LOG 0x0A00020013EE9410 0000000004A51010 0x00000000 5 5 S0000005.LOG
因为有事务hold住日志,导致数据库在再次分配LSN时,此LSN被不能被分配新的LSN,造成日志空间无法使用,解决此问题
如果是uow waiting且已经等待很多天,force application,
$ db2 force application (10409)db2 "force application (10409)"force application (10409)" DB20000I The FORCE APPLICATION command completed successfully. DB21024I This command is asynchronous and may not be effective immediately. $ db2 list applications show detail CONNECT Auth Id Application Name Appl. Application Id Seq# Number of Coordinating DB Coordinator Status Status Change Time DB Name DB Path Handle Agents partition number pid/thread -------------------------------------------------------------------------------------------------------------------------------- -------------------- ---------- -------------------------------------------------------------- ----- ---------- ---------------- --------------- ------------------------------ -------------------------- -------- -------------------- DB2TEST db2fw7 10421 *LOCAL.DB2.130715124733 00001 1 0 9211 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw0 10414 *LOCAL.DB2.130715124726 00001 1 0 7156 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw6 10420 *LOCAL.DB2.130715124732 00001 1 0 8954 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2lused 10413 *LOCAL.DB2.130715124725 00001 1 0 6899 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw5 10419 *LOCAL.DB2.130715124731 00001 1 0 8697 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2wlmd 10412 *LOCAL.DB2.130715124724 00001 1 0 6642 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw4 10418 *LOCAL.DB2.130715124730 00001 1 0 8440 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2taskd 10411 *LOCAL.DB2.130715124723 00001 1 0 6385 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw3 10417 *LOCAL.DB2.130715124729 00001 1 0 8183 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2stmm 10410 *LOCAL.DB2.130715124722 00001 1 0 6128 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw2 10416 *LOCAL.DB2.130715124728 00001 1 0 7670 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2bp 10455 *LOCAL.db2test.130715130654 00003 1 0 9726 UOW Waiting Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2evmg_DB2DETAILDEA 10422 *LOCAL.DB2.130715124734 00001 1 0 9468 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/ DB2TEST db2fw1 10415 *LOCAL.DB2.130715124727 00001 1 0 7413 Connect Completed Not Collected SAMPLE /home/db2test/db2test/NODE0000/SQL00001/
检查hold日志的应用是否还存在
db2 "select db.DBPARTITIONNUM,ai.agent_id, substr(ai.appl_status,1,20) as Status,substr(ai.primary_auth_id,1,10) as Authid,substr(ai.appl_name,1,15) as Appl_Name,int(ap.UOW_LOG_SPACE_USED/1024/1024) as Log_Used_M,int(ap.appl_idle_time/60) as Idle_for_min,ap.appl_con_time as Connected_Since from sysibmadm.snapdb db,sysibmadm.snapappl ap,sysibmadm.snapappl_info ai where ai.agent_id=db.APPL_ID_OLDEST_XACT and ap.agent_id=ai.agent_id" DBPARTITIONNUM AGENT_ID STATUS AUTHID APPL_NAME LOG_USED_M IDLE_FOR_MIN CONNECTED_SINCE -------------- -------------------- -------------------- ---------- --------------- ----------- ------------ -------------------------- 0 record(s) selected.
查看归档日志和活动日志状况
归档日志
$ db2pd -d sample -logs Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 01:06:22 -- Date 07/15/2013 15:53:43 Logs: Current Log Number 5 Pages Written 2 Cur Commit Disk Log Reads 0 Cur Commit Total Log Reads 0 Method 1 Archive Status Success Method 1 Next Log to Archive 5 Method 1 First Failure n/a Method 2 Archive Status n/a Method 2 Next Log to Archive n/a Method 2 First Failure n/a Log Chain ID 0 Current LSN 0x0000000004A53F6B Address StartLSN State Size Pages Filename 0x0A00020013EE9410 0000000004A51010 0x00000000 5 5 S0000005.LOG 0x0A00020010068650 0000000004A56010 0x00000000 5 5 S0000006.LOG 0x0A00020010068EB0 0000000004A5B010 0x00000000 5 5 S0000007.LOG 0x0A00020010042330 0000000004A60010 0x00000000 5 5 S0000008.LOG 0x0A00020013E099B0 0000000004A65010 0x00000000 5 5 S0000009.LOG $ ls /home/db2test/arch_log/db2test/SAMPLE/NODE0000/C0000000/ S0000000.LOG S0000001.LOG S0000002.LOG S0000003.LOG S0000004.LOG 活动日志 $ ls /home/db2test/db2test/NODE0000/SQL00001/SQLOGDIR/ S0000005.LOG S0000006.LOG S0000007.LOG S0000008.LOG S0000009.LOG
(被hold的日志被释放了,LSN继续分配)
查询数据库在窗口1
$ db2 "select count(*) from t1" SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032
因为已经被force掉了,重新连接再查询
$ db2 connect to sample Database Connection Information Database server = DB2/AIX64 9.7.5 SQL authorization ID = DB2TEST Local database alias = SAMPLE $ db2 "select count(*) from t1" 1 ----------- 777 1 record(s) selected.
有一条因为hold住事务,在被杀掉后,回滚了。
结论:在生产系统中,遇到事务日志所在文件系统满的情况微乎其微,但是,遇到日志被hold住的可能却很高,这个时候的表象是文件系统有空间,但是日志使用率很高,此时,就可以考虑看一下是否需要检查一下是否有日志被hold了。