Exploring Ways to Recover a Recovery Pending Database After 2 of 3 Cluster Nodes Fail

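You may run into a situation where two of the three nodes in a cluster go down. The database on the one remaining node then enters the Recovery Pending state and becomes inaccessible. The AlwaysOn availability group and its replica also become unhealthy and show the Resolving state. If there is no backup of the data, the two failed nodes cannot be brought back, and you still need to return the Recovery Pending database to a normal, accessible state, what do you do? This post explores possible solutions.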
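The symptoms described above can also be checked from a query window. The following is a minimal sketch, assuming the surviving instance still accepts connections and that the availability group metadata is readable while the cluster is down; it is not taken from the original investigation.

```sql
-- State of every database on the surviving instance;
-- the affected database is expected to show RECOVERY_PENDING.
SELECT name, state_desc
FROM sys.databases
ORDER BY name;

-- Role and health of the local availability replica, per group;
-- a replica stuck after the failure is expected to show RESOLVING.
SELECT ag.name AS group_name,
       rs.role_desc,
       rs.operational_state_desc,
       rs.connected_state_desc,
       rs.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS rs
JOIN sys.availability_groups AS ag
  ON ag.group_id = rs.group_id
WHERE rs.is_local = 1;
```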


First I tried Detach, Take Offline, and similar operations, but all of them failed.

Both Detach and Take Offline report the following error:

The operation cannot be performed on database "ASRS_F1" because it is involved in a database mirroring session or an availability group. Some operations are not allowed on a database that is participating in a database mirroring session or in an availability group.
ALTER DATABASE statement failed. (Microsoft SQL Server, Error: 1468)

Rename reports the following error:

Database 'ASRS_F1' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details. (Microsoft SQL Server, Error: 945)

Delete reports the following error (I tried this in a test environment to see what would happen; never experiment like this in production):

The database 'ASRS_F1' is currently joined to an availability group.  Before you can drop the database, you need to remove it from the availability group. (Microsoft SQL Server, Error: 3752)

Forcing the database offline (ALTER DATABASE [ASRS_F1] SET OFFLINE WITH ROLLBACK IMMEDIATE) reports the following error:

Msg 1468, Level 16, State 1, Line 1
The operation cannot be performed on database "ASRS_F1" because it is involved in a database mirroring session or an availability group. Some operations are not allowed on a database that is participating in a database mirroring session or in an availability group.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

Restoring (RESTORE DATABASE ASRS_F1 WITH RECOVERY) does not work either:
Msg 3104, Level 16, State 1, Line 1
RESTORE cannot operate on database 'ASRS_F1' because it is configured for database mirroring or has joined an availability group. If you intend to restore the database, use ALTER DATABASE to remove mirroring or to remove the database from its availability group.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

Turning off HADR (ALTER DATABASE ASRS_F1 SET HADR OFF) does not work either:
Msg 35220, Level 16, State 1, Line 1
Could not process the operation. AlwaysOn Availability Groups replica manager is waiting for the host computer to start a Windows Server Failover Clustering (WSFC) cluster and join it. Either the local computer is not a cluster node, or the local cluster node is not online. If the computer is a cluster node, wait for it to join the cluster. If the computer is not a cluster node, add the computer to a WSFC cluster. Then, retry the operation.

Following the hint given by most of the earlier errors, I tried removing the database from the group:

ALTER AVAILABILITY GROUP agASRS REMOVE DATABASE ASRS_F1

That fails as well:
Msg 35220, Level 16, State 1, Line 1
Could not process the operation. AlwaysOn Availability Groups replica manager is waiting for the host computer to start a Windows Server Failover Clustering (WSFC) cluster and join it. Either the local computer is not a cluster node, or the local cluster node is not online. If the computer is a cluster node, wait for it to join the cluster. If the computer is not a cluster node, add the computer to a WSFC cluster. Then, retry the operation.
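Error 35220 says that the local WSFC node is not online, which is presumably because the cluster lost quorum when two of its three nodes went down. A minimal sketch for inspecting what the instance can see of the cluster, using the standard HADR dynamic management views, is shown below. The interpretation is an assumption: an empty result set would be consistent with the local node not having quorum, not proof of it.

```sql
-- Cluster-level view: name, quorum model, and quorum state.
SELECT cluster_name, quorum_type_desc, quorum_state_desc
FROM sys.dm_hadr_cluster;

-- Member-level view: which nodes (and any witness) are up, and their votes.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```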

Next I thought of disabling AlwaysOn Availability Groups on the instance.

Unfortunately, clicking Apply brought up a dialog (the screenshot of the dialog is not reproduced here). When I finally clicked OK to exit, the service restarted once more without showing any error, and after a refresh the properties page showed, surprisingly, that the feature had been disabled.

 

Trying the removal again:

ALTER AVAILABILITY GROUP agASRS REMOVE DATABASE ASRS_F1

now reports this error instead:
Msg 35221, Level 16, State 1, Line 1
Could not process the operation. AlwaysOn Availability Groups replica manager is disabled on this instance of SQL Server. Enable AlwaysOn Availability Groups, by using the SQL Server Configuration Manager. Then, restart the SQL Server service, and retry the currently operation. For information about how to enable and disable AlwaysOn Availability Groups, see SQL Server Books Online.
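Whether the feature is really disabled after the service restart can be confirmed without the properties dialog. The sketch below uses two documented SERVERPROPERTY values; it was not part of the original test and is offered only as a way to cross-check what the dialog shows.

```sql
-- IsHadrEnabled:     0 = disabled, 1 = enabled.
-- HadrManagerStatus: 0 = not started (pending communication),
--                    1 = started and running,
--                    2 = not started and failed.
SELECT SERVERPROPERTY('IsHadrEnabled')     AS is_hadr_enabled,
       SERVERPROPERTY('HadrManagerStatus') AS hadr_manager_status;
```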

Whatever I tried, nothing seemed to work. My feeling was that updating the relevant system tables might do it:

SELECT name, state, state_desc, replica_id, group_database_id FROM sys.databases

 

However, the update fails:

Msg 259, Level 16, State 1, Line 1
Ad hoc updates to system catalogs are not allowed.

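To get around this error I configured the relevant option through sp_configure, but the update still could not be made.

Among the many attempts, I found one approach that makes the current database usable: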

1. Stop the SQL Server service.

2. Copy the data files (mdf, ldf) somewhere else, for example to another server.

3. Attach the data files there.
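The steps above can be sketched in T-SQL as follows. This is an illustration under stated assumptions, not the exact commands used in the test: the file paths `D:\Data\...` and the log file name are hypothetical, and the first query must be run on the original instance before its service is stopped.

```sql
-- On the original instance, before stopping the service:
-- find where the data and log files of the database live.
SELECT name, type_desc, physical_name
FROM sys.master_files
WHERE database_id = DB_ID(N'ASRS_F1');

-- On the other instance, after the files have been copied there.
-- Both paths below are hypothetical placeholders.
CREATE DATABASE ASRS_F1
ON (FILENAME = N'D:\Data\ASRS_F1.mdf'),
   (FILENAME = N'D:\Data\ASRS_F1_log.ldf')
FOR ATTACH;
```

Copying the files only after the service has stopped matters because a running instance keeps them open; the copy is then a consistent snapshot of what was on disk at shutdown.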

I hope to eventually find a way to recover the database directly on the original server.

 
