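What does a DBA fear most? A database that will not start, or a database that has lost data.
Today a friend told me that one of his Oracle databases would not start after a sudden power failure, and sent me the logs.
[Screenshot of the startup error]
Errors in the alert log: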
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/siteps/siteps/trace/siteps_ora_14490.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Errors in file /u01/app/oracle/diag/rdbms/siteps/siteps/trace/siteps_ora_14490.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Error 704 happened during db open, shutting down database
USER (ospid: 14490): terminating the instance due to error 704
Instance terminated by USER, pid = 14490
One look at this error and I knew it meant trouble: had the bootstrap of the Oracle data dictionary failed?
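When the alert log is long, it can help to reduce it to the distinct ORA- codes before reading further. The function below is a minimal sketch; its name is illustrative and the alert log path is passed in by the caller.

```shell
#!/bin/sh
# Sketch: print each distinct ORA- error code found in an alert log.
# distinct_ora_codes is a hypothetical helper name; $1 is the alert log path.
distinct_ora_codes() {
    # Keep only lines that start with "ORA-<digits>:", strip the message text,
    # then remove duplicates.
    sed -n 's/^\(ORA-[0-9][0-9]*\):.*/\1/p' "$1" | sort -u
}
```

Run against the excerpt above, this would list ORA-00702 and ORA-00704 once each.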
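Since the problem has appeared, it has to be solved. To solve it, we first need to understand the mechanism behind the error.
During the open phase, Oracle creates the definition of the BOOTSTRAP$ table in memory (the table itself already exists in the SYSTEM tablespace) and then reads its rows. The table has the following structure: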
SQL> desc BOOTSTRAP$;
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
LINE#                                     NOT NULL NUMBER
OBJ#                                      NOT NULL NUMBER
SQL_TEXT                                  NOT NULL VARCHAR2(4000)
SQL>
The table contains records like the following (there are many rows, so only the first five are shown):
SQL> r
1* select * from BOOTSTRAP$ where rownum<=5
LINE# OBJ# SQL_TEXT
---------- ---------- --------------------------------------------------------------------------------
-1 -1 8.0.0.0.0
0 0 CREATE ROLLBACK SEGMENT SYSTEM STORAGE ( INITIAL 112K NEXT 56K MINEXTENTS 1 MAX
EXTENTS 32765 OBJNO 0 EXTENTS (FILE 1 BLOCK 128))
20 20 CREATE TABLE ICOL$("OBJ#" NUMBER NOT NULL,"BO#" NUMBER NOT NULL,"COL#" NUMBER NO
T NULL,"POS#" NUMBER NOT NULL,"SEGCOL#" NUMBER NOT NULL,"SEGCOLLENGTH" NUMBER NO
T NULL,"OFFSET" NUMBER NOT NULL,"INTCOL#" NUMBER NOT NULL,"SPARE1" NUMBER,"SPARE
2" NUMBER,"SPARE3" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE
6" DATE) STORAGE ( OBJNO 20 TABNO 4) CLUSTER C_OBJ#(BO#)
42 42 CREATE INDEX I_ICOL1 ON ICOL$(OBJ#) PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE (
INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO
42 EXTENTS (FILE 1 BLOCK 384))
28 28 CREATE TABLE CON$("OWNER#" NUMBER NOT NULL,"NAME" VARCHAR2(30) NOT NULL,"CON#" N
UMBER NOT NULL,"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3" NUMBER,"SPARE4" VARCHAR2
(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE 10 PCTUSED 40 INITRANS 1 M
AXTRANS 255 STORAGE ( INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 OBJNO 28 EXTENTS (FILE 1 BLOCK 288))
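As you can see, the rows in BOOTSTRAP$ are the creation statements for the base dictionary objects that the Oracle database needs later on. The first row (LINE# = -1) holds the bootstrap version string 8.0.0.0.0, which is the version ORA-00702 says it expected; the empty string '' in the error means Oracle could not read that row.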
Next, I enabled event 10046 and attempted the open again to get more detail.
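The post does not show how the event was enabled. A common approach, given here as an assumption rather than the exact commands used in this incident, is to mount the instance, set event 10046 in the session, and then attempt the open so that the recursive bootstrap SQL is traced. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: emit SQL*Plus commands that trace a failing open with event 10046.
# Assumptions: run as SYSDBA on the idle instance; level 12 records both
# bind values and wait events.
print_10046_open_script() {
    printf '%s\n' \
        "startup mount" \
        "alter session set events '10046 trace name context forever, level 12';" \
        "alter database open;"
}

# Print the commands; they could be saved to a file and run from sqlplus.
print_10046_open_script
```

The trace is written by the server process of that session, so the file name carries its OS process id.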
Here is the relevant part of the trace file siteps_ora_8435.trc:
WAIT #140366039205736: nam='db file sequential read' ela= 30 file#=1 block#=520 blocks=1 obj#=-1 tim=1565341813593782
=====================
PARSING IN CURSOR #140366039175448 len=188 dep=1 uid=0 oct=1 lid=0 tim=1565341813594930 hv=4006182593 ad='2dde40068' sqlid='32r4f1brckzq1'
create table bootstrap$ (
END OF STMT
PARSE #140366039175448:c=1000,e=994,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1565341813594929
EXEC #140366039175448:c=0,e=266,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=1565341813595275
CLOSE #140366039175448:c=0,e=5,dep=1,type=0,tim=1565341813595356
=====================
PARSING IN CURSOR #140366039175448 len=55 dep=1 uid=0 oct=3 lid=0 tim=1565341813595880 hv=2111436465 ad='2dde3f380' sqlid='6apq2rjyxmxpj'
select line#, sql_text from bootstrap$ where obj# != :1
END OF STMT
PARSE #140366039175448:c=1000,e=505,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1565341813595879
BINDS #140366039175448:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7fa983e74cd0 bln=22 avl=02 flg=05
value=59
EXEC #140366039175448:c=2000,e=43313,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=1565341813639302
WAIT #140366039175448: nam='db file sequential read' ela= 15 file#=1 block#=520 blocks=1 obj#=59 tim=1565341813639416
WAIT #140366039175448: nam='db file scattered read' ela= 32 file#=1 block#=521 blocks=3 obj#=59 tim=1565341813639791
FETCH #140366039175448:c=0,e=565,p=4,cr=5,cu=0,mis=0,r=0,dep=1,og=4,plh=867914364,tim=1565341813639902
STAT #140366039175448 id=1 cnt=0 pid=0 pos=1 obj=59 op='TABLE ACCESS FULL BOOTSTRAP$ (cr=5 pr=4 pw=0 time=569 us)'
*** 2019-08-09 17:10:13.651
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x977B2D8, lmebucp()+24] [flags: 0x0, count: 1]
Incident 26553 created, dump file: /u01/app/oracle/diag/rdbms/siteps/siteps/incident/incdir_26553/siteps_ora_8435_i26553.trc
ORA-07445: exception encountered: core dump [lmebucp()+24] [SIGSEGV] [ADDR:0x0] [PC:0x977B2D8] [Address not mapped to object] []
ssexhd: crashing the process...
Shadow_Core_Dump = partial
ksdbgcra: writing core file to directory '/u01/app/oracle/diag/rdbms/siteps/siteps/cdump'
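As you can see, the BOOTSTRAP$ definition was created successfully, but the full table scan (TABLE ACCESS FULL) failed: after the multiblock read starting at block 521, the fetch returned no rows (r=0) and the process crashed with SIGSEGV, "Address not mapped to object".
At this point we can conclude that the contents of BOOTSTRAP$ are damaged, so Oracle cannot load them.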
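To see at a glance which blocks the failing session touched, the WAIT lines of the trace can be reduced to their file#, block# and blocks fields. This is a small sketch that assumes the 10046 line layout shown above; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: list file#/block#/blocks for every "db file ... read" wait
# in a 10046 trace file. $1 is the trace file path.
list_read_blocks() {
    sed -n "s/^WAIT .*nam='db file [a-z]* read'.* file#=\([0-9]*\) block#=\([0-9]*\) blocks=\([0-9]*\) .*/file=\1 block=\2 blocks=\3/p" "$1"
}
```

For the trace excerpt above, the reads are a single-block read of block 520 followed by a three-block read starting at block 521, all in file 1.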
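There are two ways to fix this:
restore and recover from an RMAN backup
repair the blocks with bbed
Unfortunately, there was no backup, so bbed was the only option.
First, we need to know that in 11g BOOTSTRAP$ starts at block 520 of datafile 1. This can be confirmed from the extent information (queried on a healthy database of the same version, since the damaged one cannot be opened):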
SQL> r
1 select segment_name,tablespace_name,extent_id,file_id,block_id,blocks from dba_extents where segment_name='BOOTSTRAP$'
2*
SEGMENT_NAME                   TABLESPACE_NAME                 EXTENT_ID    FILE_ID   BLOCK_ID     BLOCKS
------------------------------ ------------------------------ ---------- ---------- ---------- ----------
BOOTSTRAP$                     SYSTEM                                  0          1        520          8
BOOTSTRAP$ occupies a single extent of 8 blocks, starting at block 520.
Next, query the table rows to see which blocks actually hold data.
SQL> select distinct dbms_rowid.rowid_relative_fno(rowid),dbms_rowid.rowid_block_number(rowid) from BOOTSTRAP$;
DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------ ------------------------------------
                                   1                                  521
                                   1                                  523
                                   1                                  522
Together with block 520 (the segment header, which holds no rows), BOOTSTRAP$ therefore occupies four blocks: 520, 521, 522 and 523.
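bbed overwrites blocks in place, so it is prudent to save the damaged blocks first. The post does not mention doing so; the sketch below is a suggested precaution. It assumes 8192-byte blocks and that Oracle block N begins at byte N * block size in the OS file (block 0 being the OS header block). Function names and the BLOCK_SIZE variable are illustrative.

```shell
#!/bin/sh
# Sketch: locate and save a range of Oracle blocks from a datafile.
# Assumption: 8192-byte blocks unless BLOCK_SIZE is set by the caller.
BLOCK_SIZE=${BLOCK_SIZE:-8192}

# Byte offset at which Oracle block $1 starts in the OS file.
block_offset() {
    echo $(( $1 * BLOCK_SIZE ))
}

# Copy $3 blocks starting at block $2 from datafile $1 into file $4.
save_blocks() {
    dd if="$1" of="$4" bs="$BLOCK_SIZE" skip="$2" count="$3" 2>/dev/null
}

block_offset 520   # prints 4259840
```

A call such as `save_blocks system01.dbf 520 4 bootstrap_blocks.bak` would then keep blocks 520 to 523 aside before any edit; the backup file name here is only an example.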
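Now for the key part. With the block addresses of BOOTSTRAP$ known, I used bbed to restore the blocks:
1. Copy the blocks from another database in an identical environment (same operating system, same database version, and so on)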
dd if=/u01/app/oracle/data/orcl/system01.dbf of=/tmp/bbed_system.dbf bs=10M count=1
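The dd command copies only the first 10 MB of the donor SYSTEM datafile, so it is worth confirming that this range includes block 523. A minimal check, assuming 8192-byte blocks and that block N ends at byte (N + 1) * block size; the function name is illustrative:

```shell
#!/bin/sh
# Sketch: succeed if block $2 lies entirely within the first $1 bytes
# of a datafile whose block size is $3.
covers_block() {
    [ $(( ($2 + 1) * $3 )) -le "$1" ]
}

# 10 MB copied from the start of the file, block 523, 8192-byte blocks.
if covers_block $(( 10 * 1024 * 1024 )) 523 8192; then
    echo "block 523 is inside the copied range"
fi
```

With these numbers the copy spans blocks 0 through 1279, so all four BOOTSTRAP$ blocks are included.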
2. Configure the bbed list file (file 1 is the damaged SYSTEM datafile, file 20 is the copy taken in step 1)
[oracle@oracle11g oracle]$ cat bbed_1.txt
1 /u01/app/oracle/data/orcl/system01.dbf
20 /u01/app/oracle/tmp/bbed_system.dbf
3. Run bbed
bbed PASSWORD=blockedit mode=edit blocksize=8192 listfile=/u01/app/oracle/bbed_1.txt
BBED> info
BBED> set count 128
BBED> copy file 20 block 520 to file 1 block 520
BBED> copy file 20 block 521 to file 1 block 521
BBED> copy file 20 block 522 to file 1 block 522
BBED> copy file 20 block 523 to file 1 block 523
BBED> sum apply
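For a longer block range, typing each copy command by hand invites mistakes. The sketch below generates the same copy commands as above for any range; the function name is illustrative, and the output is meant to be reviewed and pasted into the bbed session.

```shell
#!/bin/sh
# Sketch: print one bbed "copy" command per block in a range.
# $1: source file#, $2: target file#, $3: first block, $4: last block
gen_bbed_copy() {
    b=$3
    while [ "$b" -le "$4" ]; do
        echo "copy file $1 block $b to file $2 block $b"
        b=$(( b + 1 ))
    done
}

# Reproduces the four commands used in this recovery.
gen_bbed_copy 20 1 520 523
```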
4. Restart the database
[oracle@oracle11g oracle]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Sun Aug 11 22:27:24 2019
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 983449600 bytes
Fixed Size 1340720 bytes
Variable Size 734005968 bytes
Database Buffers 243269632 bytes
Redo Buffers 4833280 bytes
Database mounted.
Database opened.
SQL>
Congratulations, the database is up again! The fault is resolved and the database is back to normal.