An Oracle Database Incident in a Pre-production Environment: A Disaster Caused by One Power Outage

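  • Background:
    After the weekend I came in on Monday, tried to connect to the database, and could not. A closer look showed that the whole database service was down. A serious incident (this was the pre-production environment, the project's development database), and I had been getting by on wishful thinking...
    Only after asking around did I learn that the machine room had suddenly lost power over the weekend.
  • Process:
    First log in to the server (Linux), su to the oracle user, and start the database listener: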
lsnrctl start

The listener started normally on port 1521.
Mounting and then opening the database reported an error:


[Screenshot: database error.png]

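After some analysis, the situation was this: following the sudden power loss, the Oracle instance could be mounted but not opened. I also tried recover, with no effect: it reported that recovery requires the database to be opened in read/write mode (that is, an open instance). Opening needed recovery and recovery needed an open database, so I was stuck in a loop.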

  • Troubleshooting:
SQL> recover datafile '/db/app/oracle/oradata/orcl/system01.dbf';
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

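At this point my head hurt. What do you do when recover is useless? Archiving was not enabled (and there was no RMAN backup). Total despair.
I calmed myself down and thought it through:
1. First, the cause. The external cause was the power cut. The internal cause was most likely that the database was committing transactions when the power went out, so those commits never completed and the state inside the database no longer matched what was recorded on disk. The SCNs were inconsistent (after a clean shutdown, the stop SCN should equal the SCN checked at startup), so the database refused to open. This is Oracle's safety mechanism.
2. Is there any way to open the instance? If it opens, I can get the data and tables out first (no backup, so I had to swallow the loss; I had assumed that since it was not the production database I could get away with it, and I blame myself).
I then spent a long time searching online and found a post about force-opening an Oracle database. It was explained fairly clearly, but only in a test environment. I had my doubts, but decided to go for it: failing to open the database would be the same as losing everything (mainly the table structures, columns, indexes, and views).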
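The SCN mismatch described in point 1 can be inspected while the instance is only mounted. The sketch below is not from the original procedure: it is a small shell function (the name scn_check_sql is hypothetical) that prints three queries against the standard views v$database, v$datafile and v$datafile_header, so that the checkpoint SCN in the control file can be compared with the one stored in each datafile header.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that compares control-file SCNs with
# datafile-header SCNs. It only prints text; nothing touches the database
# until the output is piped into sqlplus.
scn_check_sql() {
    # Quoted 'EOF' keeps the shell from expanding the $ in view names.
    cat <<'EOF'
set linesize 200 pagesize 100
-- Checkpoint SCN recorded in the control file for the whole database.
select checkpoint_change# from v$database;
-- Per-file checkpoint SCN and stop SCN as recorded in the control file.
select file#, checkpoint_change#, last_change# from v$datafile;
-- Per-file checkpoint SCN read from each datafile header.
select file#, status, fuzzy, checkpoint_change# from v$datafile_header;
EOF
}
```

Assuming OS authentication works for the oracle user, it could be run as `scn_check_sql | sqlplus -S "/ as sysdba"`. Files whose header SCN differs from the control-file SCN, or that are flagged fuzzy, are the ones the database considers to need recovery.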

  • Detailed steps:
    1. Create new control files
    Delete the existing control files with rm:
rm /db/app/oracle/oradata/jkdata/control01.ctl
rm /db/app/oracle/oradata/jkdata/control02.ctl
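Deleting the control files outright leaves no way back if the rebuild fails. A more cautious variant, sketched here as an assumption and not as part of the original procedure, copies each file aside first and removes the originals only once every copy has succeeded. The function name backup_then_remove and the backup directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: back up files into a directory, then delete the originals.
# Usage: backup_then_remove BACKUP_DIR FILE...
# Note: files are stored by basename, so two inputs with the same file name
# in different directories would overwrite each other in BACKUP_DIR.
backup_then_remove() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        # Stop at the first failed copy so nothing is deleted without a backup.
        cp -p "$f" "$backup_dir/" || return 1
    done
    # Every copy succeeded; only now remove the originals.
    rm -f "$@"
}
```

For the two files above this might be called as `backup_then_remove /home/oracle/ctl_backup /db/app/oracle/oradata/jkdata/control01.ctl /db/app/oracle/oradata/jkdata/control02.ctl`, where /home/oracle/ctl_backup is an illustrative path.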

First generate a text-format parameter file (pfile) from the spfile:

SQL> create pfile='/home/oracle/pfile' from spfile;
File created.

Create the control file with a control-file creation script. The script contents are as follows:

[oracle@DATA1 ~]$ cat createcontrolfile.sql 
CREATE CONTROLFILE SET DATABASE "orcl" RESETLOGS FORCE LOGGING NOARCHIVELOG
    MAXLOGFILES 5
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 1
    MAXLOGHISTORY 226
LOGFILE
  GROUP 2 '/db/app/oracle/oradata/orcl/redo02.log' SIZE 100M,
  GROUP 3 '/db/app/oracle/oradata/orcl/redo03.log' SIZE 50M
DATAFILE
  '/db/app/oracle/oradata/orcl/system01.dbf',
  '/db/app/oracle/oradata/orcl/sysaux01.dbf',
  '/db/app/oracle/oradata/orcl/undotbs01.dbf',
  '/db/app/oracle/oradata/orcl/users01.dbf',
  '/db/app/oracle/oradata/orcl/kthbdata.dbf'
CHARACTER SET UTF8
;

Rebuild the control file (CREATE CONTROLFILE requires the instance to be started in NOMOUNT state):

SQL> @/home/oracle/createcontrolfile.sql 
Control file created

2. Configure hidden parameters
Add the following parameters to /home/oracle/pfile:

_allow_resetlogs_corruption=true
_allow_error_simulation=true
_offline_rollback_segments="_SYSSMU10_3550978943$"
_corrupted_rollback_segments="_SYSSMU10_3550978943$"
_minimum_giga_scn=1

Add them by editing the file, which then looks like this:

[oracle@DATA1 ~]$ cd $ORACLE_HOME/dbs
[oracle@DATA1 ~]$ vim /home/oracle/pfile 
orcl.__db_cache_size=369098752
orcl.__java_pool_size=16777216
orcl.__large_pool_size=16777216
orcl.__oracle_base='/db/app/oracle'#ORACLE_BASE set from environment
orcl.__pga_aggregate_target=603979776
orcl.__sga_target=1124073472
orcl.__shared_io_pool_size=0
orcl.__shared_pool_size=687865856
orcl.__streams_pool_size=0
_allow_resetlogs_corruption=true
_allow_error_simulation=true
_offline_rollback_segments="_SYSSMU10_3550978943$"
_corrupted_rollback_segments="_SYSSMU10_3550978943$"
_minimum_giga_scn=1
*.audit_file_dest='/db/app/oracle/admin/orcl/adump'
*.audit_trail='db'
*.compatible='11.2.0.0.0'
*.control_files='/db/app/oracle/oradata/orcl/control01.ctl','/db/app/oracle/flash_recovery_area/orcl/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='orcl'
*.db_recovery_file_dest='/db/app/oracle/flash_recovery_area'
*.db_recovery_file_dest_size=4070572032
*.diagnostic_dest='/db/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
*.memory_target=1717567488
*.open_cursors=300
*.processes=1000
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=1105
*.undo_tablespace='UNDOTBS2'
undo_management='manual'
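Editing the pfile by hand makes it easy to paste a parameter twice. As a minimal sketch, assuming the pfile is a plain text file with one name=value entry per line, the hypothetical add_param function below appends a line only when no existing line already sets that parameter (with or without a "*." prefix). It is an illustration, not the method used in the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: append "name=value" to a pfile unless the
# parameter is already set there.
# Usage: add_param PFILE 'name=value'
add_param() {
    pfile=$1
    line=$2
    # The parameter name is everything before the first '='.
    name=${line%%=*}
    # Already present, either as "name=" or as "*.name="? Then do nothing.
    if grep -Eq "^(\*\.)?${name}=" "$pfile"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$pfile"
}
```

A possible use, after taking a copy of the file with `cp /home/oracle/pfile /home/oracle/pfile.bak` (the .bak name is illustrative), would be `add_param /home/oracle/pfile '_allow_resetlogs_corruption=true'`, repeated for each parameter. Single quotes matter for the rollback-segment entries, because their values contain a `$`.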

3. Force-open the database
First bring the database to the mount state using the edited pfile:

SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup mount pfile='/home/oracle/pfile';
ORACLE instance started.
Total System Global Area   97588504 bytes
Fixed Size                   451864 bytes
Variable Size              33554432 bytes
Database Buffers           62914560 bytes
Redo Buffers                 667648 bytes
Database mounted.

Then run recover. The errors reported here can be ignored; just press Enter a few times to get out:

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 897612315 generated at 10/19/2005 16:54:18 needed for thread 1
ORA-00289: suggestion : /opt/oracle/oradata/conner/archive/1_160.dbf
ORA-00280: change 897612315 for thread 1 is in sequence #160
Specify log: {=suggested | filename | AUTO | CANCEL}
....press Enter quickly here to get out; do not type anything
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/conner/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
Database altered.
SQL> alter database open;
Database altered.

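4. Copy out the data
The database was finally open. Without a second thought I logged in (I used Navicat here) and backed up the tables, data, structures, and columns to my local machine. Luckily the backup ran without errors. I will not go into the backup steps; see my earlier post:
https://www.jianshu.com/p/9007058e115a
5. Create a new database
For this, too, see my earlier post:
https://www.jianshu.com/p/2e38174e449c
6. Create a tablespace and a user, and grant the necessary privileges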

create tablespace JKHT datafile '/db/app/oracle/jkdata/jkdata.dbf' size 200m autoextend on next 100m maxsize unlimited;
create user jkdata identified by jkdata123456 default tablespace JKHT;
grant connect,resource,dba to jkdata;
grant create any view to jkdata;

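7. Restore the data
This was also done with Navicat, though views and similar objects had to be recreated.
8. Data backup strategy
See this post of mine:
https://www.jianshu.com/p/f730fef1ebf8
9. Summary
In one sentence: "Back up, back up, back up. Important things get said three times!"
This should not have been a major incident, but because of my own neglect and wishful thinking it dragged on for three days and many failed attempts. Fortunately the problem was solved. Remember: back up!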
