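By default AUTORESTART is ON, so after a database crash the first connection automatically performs a RESTART DATABASE, that is, crash recovery runs automatically. If AUTORESTART is set to OFF, a connection attempt does not trigger crash recovery; because the database is still inconsistent, the connection fails with SQL1015N and you must issue the RESTART DATABASE command explicitly: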
$ db2 "update db cfg for sample using autorestart off"
$ db2 "connect to sample"
$ db2 +c "insert into t1 select * from t1" <-- this transaction must be large enough that log buffer data is flushed to disk; only then is there work for crash recovery to do.
$ db2_kill <-- simulate a crash
Application ipclean: Removing DB2 engine and client IPC resources for e105q5a
$ db2start
2017-03-14 08:38:58 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
$ db2 "connect to sample" <-- crash recovery has not run, so the database is in an inconsistent state
SQL1015N The database is in an inconsistent state. SQLSTATE=55025
2017-03-14-08.43.04.742052+480 E3152A612 LEVEL: Error
PID : 27852862 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-7 APPID: *LOCAL.e105q5a.170314004304
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:2535
MESSAGE : SQL1015N The database is in an inconsistent state.
DATA #1 : String, 91 bytes
Crash Recovery is needed.
Issue RESTART DATABASE on this node before issuing this command.
$ db2 "restart db sample" <-- the diagnostic log (db2diag.log) shows crash recovery running; once it completes, restart db succeeds.
DB20000I The RESTART DATABASE command completed successfully.
2017-03-14-08.44.29.649628+480 E5852A467 LEVEL: Info
PID : 27852862 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.e105q5a.170314004429
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:5528
MESSAGE : Crash Recovery is needed.
..
2017-03-14-08.44.29.812231+480 E7352A483 LEVEL: Info
PID : 27852862 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.e105q5a.170314004429
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:200
MESSAGE : ADM1530I Crash recovery has been initiated.
..
2017-03-14-08.44.30.992167+480 E26963A492 LEVEL: Info
PID : 27852862 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.e105q5a.170314004429
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:3100
MESSAGE : ADM1531I Crash recovery has completed successfully.
$ db2 list applications <-- restart db also connects to the database
Auth Id  Application    Appl.      Application Id                                                 DB       # of
         Name           Handle                                                                    Name     Agents
-------- -------------- ---------- -------------------------------------------------------------- -------- -----
E105Q5A  db2bp          8          *LOCAL.e105q5a.170314004429                                    SAMPLE   1
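
The manual sequence above (connect, see SQL1015N, issue restart db) can be wrapped in a small script. The following is only a sketch under assumptions: the function names next_action and connect_or_restart and the DB2_CMD override variable are hypothetical and exist purely for illustration; only the db2 commands themselves come from the transcript.

```shell
#!/bin/sh
# Decide the next step from the text that "db2 connect" printed.
# Prints "restart" when SQL1015N is present, otherwise "ok".
next_action() {
    case "$1" in
        *SQL1015N*) echo "restart" ;;
        *)          echo "ok" ;;
    esac
}

# Try to connect; if crash recovery is pending, run RESTART DATABASE instead.
# DB2_CMD is a hypothetical override so the flow can be tried without DB2.
connect_or_restart() {
    db=$1
    db2cmd=${DB2_CMD:-db2}
    tmp=$(mktemp) || return 1
    # Write to a file rather than capturing with $(...): the CLP ties its
    # connection to the calling shell process, so db2 should run as a direct
    # child of this shell and not inside a command-substitution subshell.
    $db2cmd "connect to $db" >"$tmp" 2>&1
    out=$(cat "$tmp")
    rm -f "$tmp"
    if [ "$(next_action "$out")" = "restart" ]; then
        # As the transcript shows, restart db also leaves a connection open.
        $db2cmd "restart db $db"
    else
        printf '%s\n' "$out"
    fi
}
```

Calling connect_or_restart sample from a script gives roughly the behaviour of AUTORESTART ON while keeping the restart step explicit and visible in the script's output.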
// First simulate a crash, then delete a table space container; crash recovery then fails:
$ db2 restart db sample <-- restart db fails because the container of table space TBS1 was deleted, so crash recovery cannot complete
SQL0290N Table space access is not allowed. SQLSTATE=55039
2017-03-14-11.56.28.645793+480 E138213A633 LEVEL: Info
PID : 37093388 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.e105q5a.170314035628
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbStartPoolsErrorHandling, probe:55
MESSAGE : ADM6023I The table space "TBS1" (ID "6") is in state "0x00000000".
The table space cannot be accessed. Refer to the documentation for
SQLCODE -290.
2017-03-14-11.56.28.646872+480 E138847A1413 LEVEL: Error
PID : 37093388 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-8 APPID: *LOCAL.e105q5a.170314035628
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbStartPools, probe:63
MESSAGE : ADM6049E The database cannot be restarted because one or more table
spaces cannot be brought online. To restart the database specify the
"DROP PENDING TABLESPACES" option on the RESTART DATABASE command.
Putting a table space into the drop pending state means that no
further access to the table space will be allowed. The contents of
the table space will be inaccessible throughout the remainder of the
life of the table space; and the only operation that will be allowed
on that table space is "DROP TABLE SPACE". There is no way in which
it can be brought back. It is important that you consider the
consequences of this action as data can be lost as a result. Before
proceeding consult the DB2 documentation and contact IBM support if
necessary. The table spaces to specify in the DROP PENDING
TABLESPACES list are: "TBS1,".
$ db2 "restart db sample drop pending tablespaces (TBS1)" <-- skip crash recovery for table space TBS1
DB20000I The RESTART DATABASE command completed successfully.
2017-03-14-11.58.42.738412+480 E175431A535 LEVEL: Warning
PID : 37093388 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-9 APPID: *LOCAL.e105q5a.170314035841
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:2850
MESSAGE : ADM1533W Database has recovered. However, one or more table spaces
are offline.
2017-03-14-11.58.42.739983+480 E175967A492 LEVEL: Info
PID : 37093388 TID : 3343 PROC : db2sysc 0
INSTANCE: e105q5a NODE : 000 DB : SAMPLE
APPHDL : 0-9 APPID: *LOCAL.e105q5a.170314035841
AUTHID : E105Q5A HOSTNAME: db2b
EDUID : 3343 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:3100
MESSAGE : ADM1531I Crash recovery has completed successfully.
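
The table space names passed to DROP PENDING TABLESPACES come from the end of the ADM6049E message. A minimal sketch of pulling that list out of db2diag.log text follows; the function names drop_pending_list and restart_cmd are hypothetical, and the sketch assumes the quoted list sits on one line after "list are:", as it does in the excerpt here. A long list that wraps across lines would need extra handling.

```shell
#!/bin/sh
# Read db2diag.log text on stdin and print the table space list that
# ADM6049E recommends, without the trailing comma (for example TBS1).
drop_pending_list() {
    sed -n 's/.*list are: "\([^"]*\)".*/\1/p' | sed 's/,$//'
}

# Print (do not run) the RESTART DATABASE command for a database and list.
# Printing first leaves room to review the list, because the data in these
# table spaces becomes inaccessible once they are in drop pending state.
restart_cmd() {
    printf 'db2 "restart db %s drop pending tablespaces (%s)"\n' "$1" "$2"
}
```

For the excerpt in this example, feeding the ADM6049E text to drop_pending_list and passing the result to restart_cmd reproduces the command used above.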
$ db2 list tablespaces show detail
Tablespaces for Current Database
..
Tablespace ID = 6
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0xc000
Detailed explanation:
Offline
Drop Pending
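
The State value 0xc000 is a bit mask: 0x4000 (offline and not accessible) combined with 0x8000 (drop pending). The sketch below decodes only these two bits to show how the "Detailed explanation" lines are derived; the function name decode_tbs_state is hypothetical, and every other state bit is deliberately ignored. DB2 also ships the db2tbst utility, which decodes a full state value.

```shell
#!/bin/sh
# Decode the two table space state bits that appear in this example.
#   0x4000 = Offline and not accessible
#   0x8000 = Drop pending
# Other state bits are ignored in this sketch.
decode_tbs_state() {
    state=$(( $1 ))    # accepts a hex value such as 0xc000
    if [ "$state" -eq 0 ]; then
        echo "Normal"
        return 0
    fi
    [ $(( state & 0x4000 )) -ne 0 ] && echo "Offline"
    [ $(( state & 0x8000 )) -ne 0 ] && echo "Drop Pending"
    return 0
}
```

Running decode_tbs_state 0xc000 prints Offline and Drop Pending on separate lines, matching the listing above.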
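As in case 3, if a table contains a bad page and crash recovery fails because of that table, crash recovery can skip the table so that connections succeed; the table is then placed in drop pending state and must be dropped. This is a hidden option, however, and may only be used under the guidance of IBM Support, so holding an IBM support contract matters.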
In cases 3 and 4, even though the table space or table is in drop pending state, the db2dart tool can still salvage part of the data.
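
As a rough illustration of the salvage step, db2dart has a /DDEL action that dumps table data in delimited ASCII format. The sketch below only prints the command line and does not run it; the function name dart_dump_cmd is hypothetical. When run, db2dart asks for the table (name or object ID), the table space ID, the starting page, and the number of pages. It is normally run while no applications are connected to the database.

```shell
#!/bin/sh
# Print (do not run) a db2dart invocation that dumps table data in
# delimited ASCII form for the given database.
dart_dump_cmd() {
    printf 'db2dart %s /DDEL\n' "$1"
}
```

How much data can actually be recovered depends on which pages are still readable, so treat the dump as a partial rescue and not as a substitute for a backup.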