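Database block corruption means that a data block was written abnormally because of a hardware error or a similar problem (NOLOGGING operations can also cause it).
Corrupt blocks are usually divided into two categories: physical corruption and logical corruption.
Causes: hardware I/O problems, operating system problems, memory errors, disk repair operations such as fsck, problems in Oracle itself, NOLOGGING operations, and so on.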
Bad header - the beginning of the block (cache header) is corrupt with invalid values
The block is Fractured/Incomplete - header and footer of the block do not match
  (compare the tail value)
The block checksum is invalid
  (see the bbed command sum apply)
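The header/footer comparison behind a fractured block can be illustrated with a little arithmetic. The sketch below assumes the commonly described tail layout (low 16 bits of the SCN base, then the block type byte, then the seq byte); expected_tail is a made-up helper name for this illustration, not an Oracle utility.

```shell
# Sketch: rebuild the tail value that a consistent block should carry.
# Assumed layout: tail = (low 16 bits of SCN base) | block type | seq.
# expected_tail is a hypothetical helper, not part of any Oracle tool.
expected_tail() {
    # $1 = SCN base, $2 = block type, $3 = seq (decimal or 0x-prefixed hex)
    printf '0x%04x%02x%02x\n' "$(( $1 & 0xFFFF ))" "$(( $2 ))" "$(( $3 ))"
}

# Header fields as they appear in the dbv output in this article:
# last change scn 0x0000.006bea5f, type 6, seq 0x66.
expected_tail 0x006bea5f 6 0x66   # prints 0xea5f0666
```

If the value printed here differs from the "consistency value in tail" that dbv reports (0xea5f0602 in this article's first example), the header and the tail disagree, which is what dbv calls a fractured block.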
[ora10g@killdb ~]$ dbv file=/home/ora10g/oradata/roger/roger01.dbf
DBVERIFY: Release 10.2.0.5.0 - Production on Sat Apr 6 05:22:48 2013
Copyright (c) 1982, 2007, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /home/ora10g/oradata/roger/roger01.dbf
Page 1948 is influx - most likely media corrupt
Corrupt block relative dba: 0x0080079c (file 2, block 1948)
Fractured block found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x0080079c
last change scn: 0x0000.006bea5f seq: 0x66 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xea5f0602
check value in block header: 0xbf20
computed block checksum: 0x0
// ASM: [oracle@node1 db_1]$ dbv file='+DATA/rac/datafile/users.259.794654745'
// The quotes are required.
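The "relative dba" in the dbv output packs the relative file number and the block number into one 32-bit value. Here is a minimal sketch of the decoding, assuming the usual layout of 10 bits for the file number and 22 bits for the block number; decode_rdba is a hypothetical helper name.

```shell
# Sketch: split an rdba into relative file number and block number.
# Assumes 10 high bits = file number, 22 low bits = block number.
# decode_rdba is a hypothetical helper, not an Oracle utility.
decode_rdba() {
    # $1 = rdba as a 0x-prefixed hex string
    printf 'file=%d block=%d\n' "$(( $1 >> 22 ))" "$(( $1 & 0x3FFFFF ))"
}

decode_rdba 0x0080079c   # prints: file=2 block=1948
```

The result agrees with the "(file 2, block 1948)" that dbv prints next to the rdba.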
RMAN> backup validate datafile 2; //check for corrupt blocks
Starting backup at 06-APR-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00002 name=/home/ora10g/oradata/roger/roger01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 06-APR-13
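To recover the block, an up-to-date RMAN backup set must exist. Then run the following command:
RMAN> blockrecover datafile 2 block 1948 from backupset;
This command recovers the corrupt block without losing data. It requires that the database runs in ARCHIVELOG mode (otherwise RMAN cannot do the recovery) and that a recent database backup has been taken with RMAN.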
SQL> select * from V$DATABASE_BLOCK_CORRUPTION;
     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         2       1948          1                  0 FRACTURED
SQL> desc V$DATABASE_BLOCK_CORRUPTION
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 FILE#                                              NUMBER
 BLOCK#                                             NUMBER
 BLOCKS                                             NUMBER
 CORRUPTION_CHANGE#                                 NUMBER
 CORRUPTION_TYPE                                    VARCHAR2(9)
V$DATABASE_BLOCK_CORRUPTION displays information about database blocks that were corrupted after the last backup.
1. exp (the export fails when it hits a corrupt block)
++++++ Checking with exp
[ora10g@killdb ~]$ exp roger/roger file=a.dmp tables=test_corrupt
Export: Release 10.2.0.5.0 - Production on Sat Apr 6 06:03:20 2013
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses ZHS16GBK character set (possible charset conversion)
About to export specified tables via Conventional Path ...
. . exporting table TEST_CORRUPT
EXP-00056: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 2, block # 1948)
ORA-01110: data file 2: '/home/ora10g/oradata/roger/roger01.dbf'
Export terminated successfully with warnings.
The export stops as soon as it hits the corrupt block, so this method is not normally used.
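Given the error above, first find out which object is damaged:
select tablespace_name, segment_type, owner, segment_name from dba_extents where file_id = 2 and 1948 between block_id and block_id + blocks - 1;
If the damaged block belongs to an index, rebuilding the index usually fixes it. If it holds table data (segment_type is TABLE), set the following internal event so that exp skips the corrupt block:
alter session set events '10231 trace name context forever, level 10';
Then rerun the export for the affected table, drop the table with DROP TABLE, recreate it, and finally import the data.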
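2. The ANALYZE command (the information it returns is not very direct)
SQL> analyze table T_LOGICAL_CORRUPTION VALIDATE STRUCTURE CASCADE ONLINE;
analyze table T_LOGICAL_CORRUPTION VALIDATE STRUCTURE CASCADE ONLINE
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file
The error does not show any details about the corrupt block; you have to read the trace file!
Look for the trace file in the udump directory:
ls -ltr | tail -5
3. Oracle's own package: DBMS_REPAIR.CHECK_OBJECT
See the experiment below!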
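4. bbed
Recovering with bbed requires a copy of the data file.
bbed is short for block browser/editor. It is a tool for directly viewing and modifying the data in data files.
It is available on both Windows and Linux, but on Linux it has to be built first:
[oracle@test oracle]$ cd $ORACLE_HOME/rdbms/lib
[oracle@test lib]$ make -f ins_rdbms.mk $ORACLE_HOME/rdbms/lib/bbed
Then add $ORACLE_HOME/rdbms/lib to the PATH environment variable so that bbed can be run directly from the command line.
The default BBED password is blockedit. The tool is for Oracle internal use only; use it with caution, because Oracle does not provide technical support for it.
Once inside bbed, use help to see the available commands:
BBED> help
The detailed usage of bbed is not covered here.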
[Experiment]
SQL> create user roger identified by roger;
User created.
SQL> grant connect,resource,dba to roger;
Grant succeeded.
SQL> conn roger/roger
Connected.
SQL> create table tt as select * from dba_objects where object_id < 500;
Table created.
SQL> select distinct dbms_rowid.rowid_relative_fno(rowid) rfile# from tt order by 1;
RFILE#
----------
4
SQL> select name from v$datafile where file#=4;
NAME
--------------------------------------------------------------------------------
/u01/oradata2/TSH1/users01.dbf
SQL> select name from v$datafile where rfile#=4;
NAME
--------------------------------------------------------------------------------
/u01/oradata2/TSH1/users01.dbf
SQL> select distinct dbms_rowid.rowid_block_number(rowid) block# from tt order by 1;
BLOCK#
----------
1252
1253
1254 //the block we will try to corrupt
1255
1256
1257
6 rows selected.
Corrupt block 1254, then try to rescue it.
[oracle@sharqueen ~]$ dd if=/u01/oradata2/TSH1/users01.dbf of=0417.dd skip=1254 bs=8192 count=1 //count=1 copies just one block
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000572308 seconds, 14.3 MB/s
[oracle@sharqueen ~]$ vi 0417.dd
Write something arbitrary into the file and save it.
[oracle@sharqueen ~]$ dd if=0417.dd of=/u01/oradata2/TSH1/users01.dbf seek=1254 bs=8192 count=1 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00041154 seconds, 19.9 MB/s
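The same extract, modify, write-back sequence can be rehearsed on a scratch file before touching a real data file. This is only an illustrative sketch: the file names scratch.dbf and blk.dd are made up, the scratch file is zero-filled rather than an Oracle data file, and block 3 stands in for block 1254.

```shell
# Work in a throwaway directory so no real data file is touched.
work=$(mktemp -d) && cd "$work" || exit 1

bs=8192     # block size used in the article
blk=3       # stand-in for block 1254

# Create an 8-block, zero-filled scratch file.
dd if=/dev/zero of=scratch.dbf bs=$bs count=8 2>/dev/null

# Extract one block (skip counts input blocks of size bs).
dd if=scratch.dbf of=blk.dd bs=$bs skip=$blk count=1 2>/dev/null

# "Damage" the copy by overwriting its first bytes in place.
printf 'garbage' | dd of=blk.dd conv=notrunc 2>/dev/null

# Write it back: seek positions the output, and conv=notrunc keeps
# the blocks after it (without it the file would be cut off there).
dd if=blk.dd of=scratch.dbf bs=$bs seek=$blk count=1 conv=notrunc 2>/dev/null

size=$(wc -c < scratch.dbf | tr -d ' ')
first=$(dd if=scratch.dbf bs=$bs skip=$blk count=1 2>/dev/null | head -c 7)
echo "size=$size first_bytes=$first"
```

The file size stays at 8 blocks while only the chosen block changes, which is why conv=notrunc matters when writing a block back into a data file.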
[oracle@sharqueen ~]$ dbv file=/u01/oradata2/TSH1/users01.dbf
DBVERIFY: Release 10.2.0.1.0 - Production on Wed Apr 17 21:39:17 2013
Copyright (c) 1982, 2005, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /u01/oradata2/TSH1/users01.dbf
Page 1254 is influx - most likely media corrupt
Corrupt block relative dba: 0x010004e6 (file 4, block 1254)
Fractured block found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x010004e6
last change scn: 0x0000.00099af0 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x41560539
check value in block header: 0xe4b1
computed block checksum: 0xb5fb
DBVERIFY - Verification complete
Total Pages Examined : 2080
Total Pages Processed (Data) : 904
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 577
Total Pages Failing (Index): 0
Total Pages Processed (Other): 245
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 353
Total Pages Marked Corrupt : 1
Total Pages Influx : 1
Highest block SCN : 629492 (0.629492)
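For scripted checks, the summary at the end of the dbv output can be reduced to a single number. This is a minimal sketch, assuming the summary lines keep the "label : value" shape shown above; corrupt_pages is a hypothetical helper name, and the here-document stands in for a real dbv log.

```shell
# Sketch: pull the "Marked Corrupt" count out of dbv summary text.
# corrupt_pages is a hypothetical helper; it reads dbv output on stdin.
corrupt_pages() {
    awk -F: '/Total Pages Marked Corrupt/ { gsub(/ /, "", $2); print $2 }'
}

# Sample input copied from the summary in this article.
corrupt_pages <<'EOF'
Total Pages Examined : 2080
Total Pages Failing (Data) : 0
Total Pages Marked Corrupt : 1
Total Pages Influx : 1
EOF
```

In practice the input would be a saved dbv log rather than a here-document. A non-zero result is the signal to look at the per-block messages earlier in the log.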
Restart the database:
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 285212672 bytes
Fixed Size 1218992 bytes
Variable Size 100664912 bytes
Database Buffers 180355072 bytes
Redo Buffers 2973696 bytes
SQL> alter database mount;
Database altered.
SQL> alter database open;
Database altered.
SQL> conn roger/roger
Connected.
SQL> select count(1) from tt;
select count(1) from tt
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 1254)
ORA-01110: data file 4: '/u01/oradata2/TSH1/users01.dbf'
Handling methods (applicable to data blocks):
1. DBMS_REPAIR.CHECK_OBJECT (run as the SYS user)
SQL> BEGIN
DBMS_REPAIR.ADMIN_TABLES (
TABLE_NAME => 'REPAIR_TABLE',
TABLE_TYPE => dbms_repair.repair_table,
ACTION => dbms_repair.create_action,
TABLESPACE => 'USERS');
END;
/
PL/SQL procedure successfully completed.
SQL> set serveroutput on
SQL> DECLARE num_corrupt INT;
2 BEGIN
3 num_corrupt := 0;
4 DBMS_REPAIR.CHECK_OBJECT (
5 SCHEMA_NAME => 'ROGER',
6 OBJECT_NAME => 'TT',
7 REPAIR_TABLE_NAME => 'REPAIR_TABLE',
8 corrupt_count => num_corrupt);
9 DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));
10 END;
11 /
number corrupt: 1
PL/SQL procedure successfully completed.
SQL> set lines 200
SQL> set long 999999
SQL> col CORRUPT_DESCRIPTION for a60
SQL> col object_name for a30
SQL> select OBJECT_ID,RELATIVE_FILE_ID,BLOCK_ID,CORRUPT_TYPE,OBJECT_NAME,CORRUPT_DESCRIPTION from REPAIR_TABLE;
 OBJECT_ID RELATIVE_FILE_ID   BLOCK_ID CORRUPT_TYPE OBJECT_NAME                    CORRUPT_DESCRIPTION
---------- ---------------- ---------- ------------ ------------------------------ ------------------------------------------------------------
     52734                4       1254         6148 TT
SQL> declare
fix_count int;
begin
fix_count := 0;
dbms_repair.fix_corrupt_blocks (
schema_name => 'ROGER',
object_name => 'TT',
object_type => dbms_repair.table_object,
repair_table_name => 'REPAIR_TABLE',
fix_count => fix_count);
dbms_output.put_line('fix count: ' || to_char(fix_count));
end;
/
PL/SQL procedure successfully completed.
SQL> select count(1) from roger.tt;
select count(1) from roger.tt
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 1254)
ORA-01110: data file 4: '/u01/oradata2/TSH1/users01.dbf'
The query still fails! The corrupt block has to be skipped.
SQL> begin
dbms_repair.skip_corrupt_blocks(
schema_name => 'ROGER',
object_name => 'TT',
object_type => dbms_repair.table_object,
flags => dbms_repair.skip_flag);
end;
/
PL/SQL procedure successfully completed.
SQL> select count(1) from roger.tt;
COUNT(1)
----------
390
SQL> select count(1) from dba_objects where object_id < 500;
COUNT(1)
----------
476
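The corrupt block was skipped, so the row count is lower!
Because the corrupt block is skipped, the rows stored in it are lost after this repair!
2. dbms_rowid
SQL> select object_id from dba_objects where object_name='TT' and owner='ROGER';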
OBJECT_ID
----------
52734
SQL> conn roger/roger
Connected.
SQL> create table tt_1 as select * from tt where 1=2; //copies only the table structure
Table created.
SQL> select dbms_rowid.rowid_create(1,52734,4,1254,0) from dual; //52734: object_id (used as the data object number); 4: data file number; 1254: corrupt block number
DBMS_ROWID.ROWID_C
------------------
AAAM3+AAEAAAATmAAA
Get the rowid of the block right after the corrupt block:
SQL> select dbms_rowid.rowid_create(1,52734,4,1255,0) from dual;
DBMS_ROWID.ROWID_C
------------------
AAAM3+AAEAAAATnAAA
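SQL> insert into tt_1 select * from tt where rowid < CHARTOROWID('AAAM3+AAEAAAATmAAA') or rowid >= CHARTOROWID('AAAM3+AAEAAAATnAAA');
390 rows created. //390 rows recovered
The corrupt blocks currently recorded for the database can be looked up in v$database_block_corruption.
Call dbms_rowid.rowid_create to work out the rowid range covered by the corrupt block, create a new table with the same structure, and then use a rowid condition to copy the good rows (those outside the corrupt block) into this intermediate table.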
SQL> select count(*) from tt_1;
COUNT(*)
----------
390
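The rowids returned by dbms_rowid.rowid_create above are a base-64 rendering of the four components. The sketch below assumes the extended rowid layout (6 characters for the object number, 3 for the relative file number, 6 for the block number, 3 for the row number) and the alphabet A-Z, a-z, 0-9, +, /; b64 and make_rowid are hypothetical helper names used only for illustration.

```shell
# Sketch: rebuild an extended rowid string from its components.
# Assumed layout: 6 chars object#, 3 chars file#, 6 chars block#, 3 chars row#.
alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

b64() {
    # Encode number $1 as exactly $2 base-64 characters.
    n=$1; width=$2; out=''
    while [ "$width" -gt 0 ]; do
        ch=$(printf '%s' "$alphabet" | cut -c $(( n % 64 + 1 )))
        out="$ch$out"
        n=$(( n / 64 ))
        width=$(( width - 1 ))
    done
    printf '%s' "$out"
}

make_rowid() {
    # $1 = object number, $2 = relative file#, $3 = block#, $4 = row#
    printf '%s%s%s%s\n' "$(b64 "$1" 6)" "$(b64 "$2" 3)" "$(b64 "$3" 6)" "$(b64 "$4" 3)"
}

make_rowid 52734 4 1254 0   # prints AAAM3+AAEAAAATmAAA
make_rowid 52734 4 1255 0   # prints AAAM3+AAEAAAATnAAA
```

Both strings match the values returned by the database, which shows why the predicate "rowid < first row of block 1254 or rowid >= first row of block 1255" excludes exactly the rows of the corrupt block.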
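Alternatively, the corrupt block can be skipped from plain SQL.
Set diagnostic event 10231 so that full table scans skip corrupt blocks:
SQL> alter session SET EVENTS '10231 trace name context forever, level 10';
Session altered. //Event 10231, provided by Oracle, makes full table scans ignore corrupt data blocks. It only affects the current session.
SQL> create table tt_2 as select * from tt where 1=1;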
Table created.
SQL> select count(*) from tt_2;
COUNT(*)
----------
390
As the experiments above show, these methods all give the same result: each one recovers the same 390 rows.