昨天下午数据库负载突然升高,最高达每分钟40左右。正常情况下 最高也就每分钟2左右。经过常用的方法检查,也没有找到原因,后来自动好了。为了确保这样的情况不再莫名的发生,需要从更深的层次的找到原因。
今天打开alert日志文件,发现有坏块错误。这时才怀疑是不是由于坏块导致的负载升高,这还有待进一步的确认。不过当前最主要的是把坏块给处理掉。
alert.log错误日志如下:
Hex dump of (file 34, block 43487) in trace file /opt/oracle/admin/mbar/udump/orcl_ora_18143.trc
Corrupt block relative dba: 0x0880a9df (file 34, block 43487)
Fractured block found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x0880a9df
last change scn: 0x09c9.33765b70 seq: 0x2 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x44a90602
check value in block header: 0xf70d
computed block checksum: 0xe59a
Reread of rdba: 0x0880a9df (file 34, block 43487) found same corrupted data
Wed Jun 27 15:11:24 2012
Corrupt Block Found
TSN = 10, TSNAME = BBS_DATA
RFN = 34, BLK = 43487, RDBA = 142649823
OBJN = 97543, BJD = 97543, BJECT = T_USER_LOGON, SUBOBJECT =
SEGMENT WNER = BBS, SEGMENT TYPE = Table Segment
从日志中可以清晰的看出是用户BBS下的BBS_DATA表空间的文件号为34存在坏块,并且表名也给出来了是T_USER_LOGON。
[oracle@db02 ~]$ sqlplus 用户名/密码
SQL>
SELECT COUNT(*) FROM T_USER_LOGON;
执行命令后可以查出来数据,并没有报错。但是执行下面命令后就报错了。
SQL>
SELECT COUNT(*) FROM T_USER_LOGON T WHERE TO_CHAR(T.LOGON_TIME,'YYYY-MM-DD')='2012-06-28';
提示有坏块,为什么加上个where条件就不行,不知道是那里的问题。
以下是解决过程 :
1、根据文件号查找对应的文件:
SQL>
select file#,name from v$datafile where file#=34;
2、使用dbv检查:
[oracle@db02 ~]$
dbv file=/opt/oradata/play03.dbf blocksize=8192
DBVERIFY: Release 10.2.0.1.0 - Production on 星期四 28 10:30:51 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
DBVERIFY - Verification starting : FILE = /opt/oradata/play03.dbf
DBV-00200: Block, dba 30767628, already marked corrupted
DBVERIFY - Verification complete
Total Pages Examined : 4194302
Total Pages Processed (Data) : 3508146
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 605887
Total Pages Failing (Index): 0
Total Pages Processed (Other): 75876
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 4393
Total Pages Marked Corrupt : 1
Total Pages Influx : 0
Highest block SCN : 3781089739 (0.3781089739)
3、其次要确认坏块类型是物理坏块还是逻辑坏块
physical corruption check: backup validate datafile 34;
logical corruption check: backup check logical validate datafile 34;
先进行物理坏块检测
RMAN>
backup validate datafile 34;
Starting backup at 28-6月 -12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=1035 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00034 name=/opt/oradata/play03.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:25
Finished backup at 28-6月 -12
4、查看视图:
SQL>
select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
34 43487 1 0 FRACTURED
确认为物理坏块!
5、
最后也是最重要的一步,进行坏块恢复。注意这一步必须在有备份的情况下执行,否则恢复不了。
RMAN>
BLOCKRECOVER CORRUPTION LIST;
注:
由于我没有备份,所以也不能用这种方式恢复。哭呀!现在终于知道备份的重要性了。所以改用其他方法(exp/imp导出导入)。
6、
恢复步骤:
[oracle@db02 bdump]$ exp 用户名/密码 file='/opt/tlogon-0628.dmp' tables=(T_USER_LOGON)
Export: Release 10.2.0.1.0 - Production on 28 10:08:43 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table T_USER_LOGON
EXP-00056: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 34, block # 43487)
ORA-01110: data file 34: '/opt/oradata/play03.dbf'
Export terminated successfully with warnings.
[oracle@db02 bdump]$
[oracle@db02 bdump]$ sqlplus
用户名/密码
SQL*Plus: Release 10.2.0.1.0 - Production on 星期四 28 10:30:51 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> ALTER SYSTEM SET EVENTS='10231 trace name context forever,level 10';
System altered.
SQL> host
I注:
ALTER SYSTEM SET EVENTS='10231 trace name context forever,level 10' ;
内部事件,设置在全表扫描时跳过损坏的数据块.
[oracle@db02 bdump]$ exp
用户名/密码 file='/opt/tlogon-0628.dmp' tables=(T_USER_LOGON)
Export: Release 10.2.0.1.0 - Production on 星期四 28 10:31:11 2012
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table T_USER_LOGON 768657 rows exported
Export terminated successfully without warnings.
注:这时候成功导出!
然后我们可以drop table,recreate,然后导入数据!
[oracle@db02 bdump]$ exit
SQL> ALTER SYSTEM SET EVENTS='10231 trace name context off';
System altered.
SQL> drop table T_USER_LOGON;
SQL>host
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
[oracle@db02 bdump]$ imp
用户名/密码 file='/opt/tlogon-0628.dmp' ignore=y full=y
7、
完成数据恢复!验证。
[oracle@db02 ~]$ sqlplus 用户名/密码
SQL>
SELECT COUNT(*) FROM T_USER_LOGON;
SQL>
SELECT COUNT(*) FROM T_USER_LOGON T WHERE TO_CHAR(T.LOGON_TIME,'YYYY-MM-DD')='2012-06-28';
都没有报错,正常。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/23009281/viewspace-734044/,如需转载,请注明出处,否则将追究法律责任。