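Symptoms:
I recently ran into a DB2 problem on DB2 10.5 with an EXT3 file system. After I issued alter tablespace reduce max against an automatic storage table space, the table space state stayed at 0x10080000, that is, Move in Progress + DMS Rebalance in Progress, for a long time. During that period the background reduce thread kept holding a latch named SQLO_LT_SQLB_POOL_CB__readLotch, and any other application that needed this latch (for example mon_get_database) kept waiting on it.
Unfortunately I reproduced it only once and could not reproduce it again no matter what I tried (in the normal case the table space state is 0x00080000). I opened a case with IBM, but IBM could not give an explanation, citing insufficient data (the stack is actually quite clear, but IBM insisted on two rounds of stack dumps).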
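The state value is a bitmask, so it helps to decode it rather than compare it by eye. The following is a minimal sketch that only knows the two flags named in this post (0x00080000 for Move in Progress and 0x10000000 for DMS Rebalance in Progress); the function name and the table are illustrative, and all other DB2 state bits are deliberately left out.

```python
# Only the two flags discussed in this post; other DB2 state bits are omitted on purpose.
KNOWN_STATE_FLAGS = {
    0x00080000: "Move in Progress",
    0x10000000: "DMS Rebalance in Progress",
}


def decode_tablespace_state(state):
    """Return the names of the known flags set in `state` (hypothetical helper)."""
    names = [name for bit, name in sorted(KNOWN_STATE_FLAGS.items()) if state & bit]
    # Report any bits this sketch does not know about instead of silently dropping them.
    unknown = state & ~sum(KNOWN_STATE_FLAGS)
    if unknown:
        names.append("unknown bits 0x%08X" % unknown)
    return names or ["Normal"]


print(decode_tablespace_state(0x10080000))
print(decode_tablespace_state(0x00080000))
```

Decoded this way, the problem state differs from the usual one only by the extra DMS Rebalance in Progress bit.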
TBS1 table space before the reduce:
db2inst1@node01:~> db2pd -d sample -tab
Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:01:04 -- Date 2020-06-03-09.11.35.827486
Tablespace Configuration:
Address Id Type Content PageSz ExtentSz Auto Prefetch BufID BufIDDisk FSC NumCntrs MaxStripe LastConsecPg RSE Name
0x00007FD7E2253D40 0 DMS Regular 4096 4 Yes 4 1 1 Def 1 0 3 Yes SYSCATSPACE
0x00007FD7E2260EE0 1 SMS SysTmp 4096 32 Yes 32 1 1 On 1 0 31 No TEMPSPACE1
0x00007FD7E22A40C0 2 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes USERSPACE1
0x00007FD7E22B1260 3 DMS Large 4096 4 Yes 4 1 1 Def 1 0 3 Yes SYSTOOLSPACE
0x00007FD7E226E080 4 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes TBS1
0x00007FD7E227B220 5 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes TBS2
0x00007FD7E22883C0 6 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes TBS3
Tablespace Statistics:
Address Id TotalPgs UsablePgs UsedPgs PndFreePgs FreePgs HWM Max HWM State MinRecTime NQuiescers PathsDropped TrackmodState
0x00007FD7E2253D40 0 32768 32764 29420 0 3344 29420 29420 0x00000000 0 0 No n/a
0x00007FD7E2260EE0 1 1 1 1 0 0 - - 0x00000000 0 0 No n/a
0x00007FD7E22A40C0 2 1196032 1196000 96 820480 375424 820576 820576 0x00000000 0 0 No n/a
0x00007FD7E22B1260 3 8192 8188 152 0 8036 152 152 0x00000000 0 0 No n/a
0x00007FD7E226E080 4 9715712 9715680 6919008 1302656 1494016 9715040 9715040 0x00000000 0 0 No n/a
0x00007FD7E227B220 5 8192 8160 160 64 7936 224 224 0x00000000 0 0 No n/a
0x00007FD7E22883C0 6 8192 8160 160 64 7936 288 288 0x00000000 0 0 No n/a
Tablespace Autoresize Statistics:
Address Id AS AR InitSize IncSize IIP MaxSize LastResize LRF
0x00007FD7E2253D40 0 Yes Yes 33554432 -1 No None None No
0x00007FD7E2260EE0 1 Yes No 0 0 No 0 None No
0x00007FD7E22A40C0 2 Yes Yes 33554432 -1 No None None No
0x00007FD7E22B1260 3 Yes Yes 33554432 -1 No None None No
0x00007FD7E226E080 4 Yes Yes 33554432 -1 No None None No
0x00007FD7E227B220 5 Yes Yes 33554432 -1 No None None No
0x00007FD7E22883C0 6 Yes Yes 33554432 -1 No None None No
Tablespace Storage Statistics:
Address Id DataTag Rebalance SGID SourceSGID
0x00007FD7E2253D40 0 0 No 0 -
0x00007FD7E2260EE0 1 0 No 0 -
0x00007FD7E22A40C0 2 -1 No 0 -
0x00007FD7E22B1260 3 -1 No 0 -
0x00007FD7E226E080 4 -1 No 0 -
0x00007FD7E227B220 5 -1 No 0 -
0x00007FD7E22883C0 6 -1 No 0 -
Containers:
Address TspId ContainNum Type TotalPgs UseablePgs PathID StripeSet Container
0x00007FD7E224D8A0 0 0 File 32768 32764 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000000/C0000000.CAT
0x00007FD7E22BF000 1 0 Path 1 1 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000001/C0000000.TMP
0x00007FD7E222D980 2 0 File 1196032 1196000 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000002/C0000000.LRG
0x00007FD7E222E8C0 3 0 File 8192 8188 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000003/C0000000.LRG
0x00007FD7E21F2B80 4 0 File 9715712 9715680 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000004/C0000000.LRG
0x00007FD7E21ECBA0 5 0 File 8192 8160 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000005/C0000000.LRG
0x00007FD7E21E6BA0 6 0 File 8192 8160 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000006/C0000000.LRG
After issuing the reduce command, the table space state stayed at 0x10080000 (for at least 40 seconds):
db2inst1@node01:~> date; db2 "alter tablespace tbs1 reduce max"; date
Wed Jun 3 09:19:12 EDT 2020
DB20000I The SQL command completed successfully.
Wed Jun 3 09:19:13 EDT 2020
db2inst1@node01:~> db2pd -d sample -tab
Database Member 0 -- Database SAMPLE -- Active -- Up 0 days 00:09:23 -- Date 2020-06-03-09.19.54.216004
Tablespace Configuration:
Address Id Type Content PageSz ExtentSz Auto Prefetch BufID BufIDDisk FSC NumCntrs MaxStripe LastConsecPg RSE Name
0x00007FD7E2253D40 0 DMS Regular 4096 4 Yes 4 1 1 Def 1 0 3 Yes SYSCATSPACE
0x00007FD7E2260EE0 1 SMS SysTmp 4096 32 Yes 32 1 1 On 1 0 31 No TEMPSPACE1
0x00007FD7E22A40C0 2 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes USERSPACE1
0x00007FD7E22B1260 3 DMS Large 4096 4 Yes 4 1 1 Def 1 0 3 Yes SYSTOOLSPACE
0x00007FD7E226E080 4 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes TBS1
0x00007FD7E227B220 5 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes TBS2
0x00007FD7E22883C0 6 DMS Large 4096 32 Yes 32 1 1 Def 1 0 31 Yes TBS3
Tablespace Statistics:
Address Id TotalPgs UsablePgs UsedPgs PndFreePgs FreePgs HWM Max HWM State MinRecTime NQuiescers PathsDropped TrackmodState
0x00007FD7E2253D40 0 32768 32764 29448 0 3316 29448 29448 0x00000000 0 0 No n/a
0x00007FD7E2260EE0 1 1 1 1 0 0 - - 0x00000000 0 0 No n/a
0x00007FD7E22A40C0 2 1196032 1196000 96 820480 375424 820576 820576 0x00000000 0 0 No n/a
0x00007FD7E22B1260 3 8192 8188 152 0 8036 152 152 0x00000000 0 0 No n/a
0x00007FD7E226E080 4 192 160 96 0 64 96 9715040 0x10080000 0 0 No n/a
0x00007FD7E227B220 5 8192 8160 96 64 8000 160 224 0x00000000 0 0 No n/a
0x00007FD7E22883C0 6 8192 8160 96 64 8000 288 288 0x00000000 0 0 No n/a
Tablespace Autoresize Statistics:
Address Id AS AR InitSize IncSize IIP MaxSize LastResize LRF
0x00007FD7E2253D40 0 Yes Yes 33554432 -1 No None None No
0x00007FD7E2260EE0 1 Yes No 0 0 No 0 None No
0x00007FD7E22A40C0 2 Yes Yes 33554432 -1 No None None No
0x00007FD7E22B1260 3 Yes Yes 33554432 -1 No None None No
0x00007FD7E226E080 4 Yes Yes 33554432 -1 No None None No
0x00007FD7E227B220 5 Yes Yes 33554432 -1 No None None No
0x00007FD7E22883C0 6 Yes Yes 33554432 -1 No None None No
Tablespace Storage Statistics:
Address Id DataTag Rebalance SGID SourceSGID
0x00007FD7E2253D40 0 0 No 0 -
0x00007FD7E2260EE0 1 0 No 0 -
0x00007FD7E22A40C0 2 -1 No 0 -
0x00007FD7E22B1260 3 -1 No 0 -
0x00007FD7E226E080 4 -1 Yes 0 -
0x00007FD7E227B220 5 -1 No 0 -
0x00007FD7E22883C0 6 -1 No 0 -
Containers:
Address TspId ContainNum Type TotalPgs UseablePgs PathID StripeSet Container
0x00007FD7E224D8A0 0 0 File 32768 32764 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000000/C0000000.CAT
0x00007FD7E22BF000 1 0 Path 1 1 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000001/C0000000.TMP
0x00007FD7E222D980 2 0 File 1196032 1196000 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000002/C0000000.LRG
0x00007FD7E222E8C0 3 0 File 8192 8188 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000003/C0000000.LRG
0x00007FD8364334C0 4 0 File 192 160 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000004/C0000000.LRG
0x00007FD7E21ECBA0 5 0 File 8192 8160 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000005/C0000000.LRG
0x00007FD7E21E6BA0 6 0 File 8192 8160 0 0 /home/db2inst1/db2inst1/NODE0000/SAMPLE/T0000006/C0000000.LRG
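To put the two db2pd snapshots in perspective, the page counts for TBS1 can be converted to bytes. This is a small worked calculation using only the TotalPgs values (9715712 before, 192 after) and the 4096-byte page size shown in the output above; the helper name is illustrative.

```python
PAGE_SIZE = 4096  # PageSz column for TBS1


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a db2pd page count to bytes (hypothetical helper)."""
    return pages * page_size


before = pages_to_bytes(9715712)  # TotalPgs of TBS1 before the reduce
after = pages_to_bytes(192)       # TotalPgs of TBS1 after the reduce

print("before: %.4f GiB" % (before / 2**30))   # before: 37.0625 GiB
print("after:  %.2f MiB" % (after / 2**20))    # after:  0.75 MiB
print("shrunk by %d bytes" % (before - after))
```

So the single container file C0000000.LRG has to shrink from roughly 37 GiB to under 1 MiB in one reduce operation.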
At this point any monitoring command hangs, for example db2 -v "select TBSP_ID,substr(TBSP_NAME,1,20) as TBSP_NAME, substr(TBSP_STATE,1,10) as STATE, TBSP_TOTAL_SIZE_KB/1024/1024 as TOTAL_GB,(TBSP_PAGE_TOP*TBSP_PAGE_SIZE)/1024/1024/1024 as top_GB,TBSP_USED_SIZE_KB/1024/1024 as USED_GB,TBSP_UTILIZATION_PERCENT from sysibmadm.tbsp_utilization where TBSP_NAME='TBS1'"
There is a latch wait on SQLO_LT_SQLB_POOL_CB__readLotch.
Latch information:
Database Member 0 -- Active -- Up 0 days 00:12:00 -- Date 2020-06-03-09.21.07.837897
Latches:
Address Holder Waiter Filename LOC LatchType HoldCount
0x00000002018D0470 14 0 Unknown 1391 SQLO_LT_sqeWLDispatcher__m_tunerLatch 1
0x00007FD7E07156F0 84 0 Unknown 3095 SQLO_LT_SQLB_PTBL__pool_table_latch 1
0x00000002025C6C08 84 0 Unknown 2460 SQLO_LT_sqlmon_dbcb__inSnapshotLatch 1
0x00007FD7E224CEE0 86 84 Unknown 5598 SQLO_LT_SQLB_POOL_CB__readLotch 1
0x00000002025C7C28 86 0 Unknown 14789 SQLO_LT_preventSuspendIOLotch 1
0x00007FD7E224CF60 86 0 Unknown 1157 SQLO_LT_SQLB_POOL_CB__ptfLotch 1
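The holder and waiter relationship can be read straight out of this table. As a rough sketch, the snippet below parses the latch rows exactly as pasted above, assuming the seven whitespace-separated columns shown here and that a Waiter value of 0 means nobody is waiting; the function names are made up for illustration and this is not a DB2 tool.

```python
LATCH_OUTPUT = """\
Address Holder Waiter Filename LOC LatchType HoldCount
0x00000002018D0470 14 0 Unknown 1391 SQLO_LT_sqeWLDispatcher__m_tunerLatch 1
0x00007FD7E07156F0 84 0 Unknown 3095 SQLO_LT_SQLB_PTBL__pool_table_latch 1
0x00000002025C6C08 84 0 Unknown 2460 SQLO_LT_sqlmon_dbcb__inSnapshotLatch 1
0x00007FD7E224CEE0 86 84 Unknown 5598 SQLO_LT_SQLB_POOL_CB__readLotch 1
0x00000002025C7C28 86 0 Unknown 14789 SQLO_LT_preventSuspendIOLotch 1
0x00007FD7E224CF60 86 0 Unknown 1157 SQLO_LT_SQLB_POOL_CB__ptfLotch 1
"""


def parse_latches(text):
    """Yield (latch_type, holder_edu, waiter_edu) for each data row."""
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a seven-column data row.
        if len(fields) != 7 or not fields[0].startswith("0x"):
            continue
        yield fields[5], int(fields[1]), int(fields[2])


def contended(text):
    """Latches that currently have a waiter (Waiter column is non-zero)."""
    return [row for row in parse_latches(text) if row[2] != 0]


def held_by(text, edu):
    """Latch types held by the given EDU id."""
    return [latch for latch, holder, _ in parse_latches(text) if holder == edu]


print(contended(LATCH_OUTPUT))
print(held_by(LATCH_OUTPUT, 84))
```

On this dump the only contended latch is SQLO_LT_SQLB_POOL_CB__readLotch, held by EDU 86 and awaited by EDU 84, while EDU 84 itself holds the pool table latch and the snapshot latch.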
The latch holder is doing the table space rebalance:
-----FUNC-ADDR---- ------FUNCTION + OFFSET------
0x00007FD8DEB12D05 _Z25ossDumpStackTraceInternalmR11OSSTrapFileiP7siginfoPvmm + 0x0385
(/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x00007FD8DEB1290C ossDumpStackTraceV98 + 0x002c
(/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x00007FD8DEB0E9AD _ZN11OSSTrapFile6dumpExEmiP7siginfoPvm + 0x00fd
(/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x00007FD8E517B004 sqlo_trce + 0x0404
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E52057FE sqloDumpDiagInfoHandler + 0x010e
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8EA907850 address: 0x00007FD8EA907850 ; dladdress: 0x00007FD8EA8F8000 ; offset in lib: 0x000000000000F850 ;
(/lib64/libpthread.so.0)
0x00007FD8DDE83807 ftruncate + 0x0007
(/lib64/libc.so.6)
0x00007FD8E318F1B8 sqloSetFileSize + 0x0408
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E4162386 _Z21sqlbPostRebalanceWorkP12SQLB_POOL_CBPK9SQLP_LSN8P12SQLB_GLOBALSbb + 0x14b6
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E414F516 address: 0x00007FD8E414F516 ; dladdress: 0x00007FD8DF039000 ; offset in lib: 0x0000000005116516 ;
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E02EC097 _Z16sqlbAlterPoolActtP9SQLP_LSN8P12SQLB_GLOBALS + 0x1767
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E0410186 address: 0x00007FD8E0410186 ; dladdress: 0x00007FD8DF039000 ; offset in lib: 0x00000000013D7186 ;
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E040F985 _Z8sqldmpndP8sqeAgentiPcP9SQLP_LSN8PmP15SQLD_RECOV_INFO + 0x05a5
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E31EB1FB _Z8sqlptpplP8sqeAgenti + 0x02bb
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E31DE886 _Z8sqlpxcm1P8sqeAgentP15SQLXA_CALL_INFOi + 0x0be6
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E33A0801 _Z12sqlrrcom_dpsP8sqlrr_cbiiP15SQLXA_CALL_INFO + 0x0251
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E339DD64 _Z8sqlrrcomP8sqlrr_cbii + 0x0374
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E40FF828 _Z22sqlbEMReduceContainersP12SQLB_POOL_CBjP9sqeBsuEdu + 0x04a8
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E40FFF6A _Z22sqlbLockAndMoveExtentsP12SQLB_POOL_CBbjP9sqeBsuEdu + 0x044a
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E410265C _Z28sqlbExtentMovementEntryPointP9sqeBsuEduPv + 0x012c
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E21A8A58 _Z26sqleIndCoordProcessRequestP8sqeAgent + 0x0f98
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E21B7E7C _ZN8sqeAgent6RunEDUEv + 0x04cc
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E39F0897 _ZN9sqzEDUObj9EDUDriverEv + 0x00f7
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E31A0A83 sqloEDUEntry + 0x0303
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8EA8FF806 address: 0x00007FD8EA8FF806 ; dladdress: 0x00007FD8EA8F8000 ; offset in lib: 0x0000000000007806 ;
(/lib64/libpthread.so.0)
0x00007FD8DDE8864D clone + 0x006d
(/lib64/libc.so.6)
Holding Latch type: (SQLO_LT_SQLB_POOL_CB__readLotch) - Address: (0x7fd7e224cee0), Line: 5598, File: /view/db2_v105fp8_linuxamd64_s160901/vbs/engn/include/sqlbistorage_inlines.h HoldCount: 1
Holding Latch type: (SQLO_LT_preventSuspendIOLotch) - Address: (0x2025c7c28), Line: 14789, File: sqlbpool.C HoldCount: 1
Holding Latch type: (SQLO_LT_SQLB_POOL_CB__ptfLotch) - Address: (0x7fd7e224cf60), Line: 1157, File: sqlbptf.C HoldCount: 1
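In the holder's stack, the innermost DB2 frame is sqloSetFileSize calling ftruncate, with the three latches still held. One way to check whether the file system itself is slow at shrinking large files is to time a plain ftruncate outside of DB2. The sketch below is only that: it assumes the time is spent in the truncate, which this post does not prove, and the function name, directory argument, and sizes are placeholders.

```python
import os
import tempfile
import time


def time_shrink(directory, start_bytes, end_bytes, chunk=1 << 20):
    """Write start_bytes of real data into a temp file under `directory`,
    then time how long ftruncate takes to shrink it to end_bytes.
    Returns (elapsed_seconds, final_size)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = b"\0" * chunk
        remaining = start_bytes
        while remaining > 0:
            # os.write may write fewer bytes than requested, so count what it reports.
            written = os.write(fd, block[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)  # make sure blocks are really allocated before timing

        started = time.perf_counter()
        os.ftruncate(fd, end_bytes)
        elapsed = time.perf_counter() - started

        return elapsed, os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling it as time_shrink("/path/on/the/ext3/mount", some_large_size, 192 * 4096) on the same file system as the container, with a size close to the real container, gives a number to compare against the 40 seconds observed here.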
The latch waiter is the monitoring statement:
-----FUNC-ADDR---- ------FUNCTION + OFFSET------
0x00007FD8DEB12D05 _Z25ossDumpStackTraceInternalmR11OSSTrapFileiP7siginfoPvmm + 0x0385
(/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x00007FD8DEB1290C ossDumpStackTraceV98 + 0x002c
(/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x00007FD8DEB0E9AD _ZN11OSSTrapFile6dumpExEmiP7siginfoPvm + 0x00fd
(/home/db2inst1/sqllib/lib64/libdb2osse.so.1)
0x00007FD8E517B004 sqlo_trce + 0x0404
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E52057FE sqloDumpDiagInfoHandler + 0x010e
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8EA907850 address: 0x00007FD8EA907850 ; dladdress: 0x00007FD8EA8F8000 ; offset in lib: 0x000000000000F850 ;
(/lib64/libpthread.so.0)
0x00007FD8DDE89F07 semop + 0x0007
(/lib64/libc.so.6)
0x00007FD8E61CD659 _ZN17SQLO_SLATCH_CAS6418getConflictComplexEm + 0x0579
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E31034B1 _ZN17SQLO_SLATCH_CAS6411getConflictEm + 0x0051
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E007803C _ZN12sqlpValLotch12getLatchOnlyEmPKcm + 0x011c
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E007619E _Z21sqlbLatchPoolR_inlineP12SQLB_POOL_CBibiPKc + 0x0e4e
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E40F279E _Z30sqlbSnapshotTablespaceEstimatejP16sqeLocalDatabaseb + 0x01ae
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E48DB494 _Z12sqlmonszagntj13sqm_entity_idP6sqlmaiPjP5sqlca + 0x0534
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E4928DBF _Z15sqlmonszbackendP12SQLE_DB2RA_T + 0x093f
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E2194972 _Z8sqlesrvrP14db2UCinterface + 0x1542
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E21B400E _Z19sqleMappingFnServerP5sqldaP5sqlca + 0x04de
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E23DADE7 _Z19sqlerKnownProcedureiPcPiP5sqldaS2_P13sqlerFmpTableP8sqeAgentP5sqlca + 0x0247
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E23DCBFD _Z11sqlerCallDLP14db2UCinterfaceP9UCstpInfo + 0x06dd
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E614A97E _Z19sqljs_ddm_excsqlsttP14db2UCinterfaceP13sqljDDMObject + 0x09be
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E6149A0E _Z21sqljsParseRdbAccessedP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x007e
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E24C537B _Z10sqljsParseP13sqljsDrdaAsCbP14db2UCinterfaceP8sqeAgentb + 0x036b
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E24BF97F address: 0x00007FD8E24BF97F ; dladdress: 0x00007FD8DF039000 ; offset in lib: 0x000000000348697F ;
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E24BDDFC address: 0x00007FD8E24BDDFC ; dladdress: 0x00007FD8DF039000 ; offset in lib: 0x0000000003484DFC ;
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E24BAE49 address: 0x00007FD8E24BAE49 ; dladdress: 0x00007FD8DF039000 ; offset in lib: 0x0000000003481E49 ;
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E24BAA3B _Z17sqljsDrdaAsDriverP18SQLCC_INITSTRUCT_T + 0x00eb
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E21B847F _ZN8sqeAgent6RunEDUEv + 0x0acf
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E39F0897 _ZN9sqzEDUObj9EDUDriverEv + 0x00f7
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8E31A0A83 sqloEDUEntry + 0x0303
(/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00007FD8EA8FF806 address: 0x00007FD8EA8FF806 ; dladdress: 0x00007FD8EA8F8000 ; offset in lib: 0x0000000000007806 ;
(/lib64/libpthread.so.0)
0x00007FD8DDE8864D clone + 0x006d
(/lib64/libc.so.6)
Waiting on latch type: (SQLO_LT_SQLB_POOL_CB__readLotch) - Address: (0x7fd7e224cee0), Line: 2498, File: sqlbdbmon.C
Holding Latch type: (SQLO_LT_SQLB_PTBL__pool_table_latch) - Address: (0x7fd7e07156f0), Line: 3095, File: /view/db2_v105fp8_linuxamd64_s160901/vbs/engn/include/sqlbistorage_inlines.h HoldCount: 1
Holding Latch type: (SQLO_LT_sqlmon_dbcb__inSnapshotLatch) - Address: (0x2025c6c08), Line: 2460, File: sqlbdbmon.C HoldCount: 1