Environment:
DRBD resource name: jcluster
Primary node: primary
Secondary node: secondary
Mount point: /data
Main commands used:
service drbd start
service drbd stop
service drbd status
service mysqld stop
Check which processes are using the mount point:
fuser -m -v /data/
If umount fails, kill the processes that are holding the mount point, then unmount:
fuser -m -v -k /data/
umount /data/
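drbdadm connect jcluster # connect the resource to its peer
drbdadm disconnect jcluster # disconnect the resource from its peer
drbdadm connect --discard-my-data jcluster # run on the secondary node: discard its own changes and resynchronize from the primary node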
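The two connect commands are used on opposite sides of a split-brain recovery, so it helps to write the order down before touching a live cluster. The following is a minimal sketch that only prints the commands and never runs them. The function name split_brain_plan and the labels "survivor" and "victim" are made up for this illustration and are not DRBD keywords. The drbdadm secondary step comes from the usual manual recovery procedure and does nothing if the node is already Secondary, as it is on this cluster.

```shell
# Print (do not run) the DRBD commands for one side of a split-brain recovery.
#   $1 = survivor (node whose data is kept) or victim (node whose changes are discarded)
#   $2 = resource name, defaults to jcluster
# The labels "survivor" and "victim" are illustrative, not DRBD keywords.
split_brain_plan() (
    role=$1
    res=${2:-jcluster}
    case "$role" in
        survivor)
            # Only needed when the survivor is also StandAlone.
            echo "drbdadm connect $res"
            ;;
        victim)
            # Demote first; this is a no-op if the node is already Secondary.
            echo "drbdadm secondary $res"
            echo "drbdadm connect --discard-my-data $res"
            ;;
        *)
            echo "usage: split_brain_plan survivor|victim [resource]" >&2
            exit 1
            ;;
    esac
)
```

For example, split_brain_plan victim prints the two commands to run on the secondary node, which can be reviewed and then pasted by hand.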
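Recently a production server was rebooted. It uses Pacemaker + DRBD + MySQL for high availability. When the on-site staff started the nodes again, the heartbeat cable was not connected, which left DRBD in a split-brain state. The output of drbd-overview and crm status was as follows:
Primary node: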
[root@primary ~]# drbd-overview
0:jcluster/0 StandAlone Primary/Unknown UpToDate/DUnknown r----- /data ext4 2.7T 2.0G 2.6T 1%
[root@primary ~]# crm status
Last updated: Thu Mar 30 10:03:18 2017
Last change: Tue Mar 28 10:25:46 2017
Stack: classic openais (with plugin)
Current DC: primary - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
6 Resources configured
Online: [ primary secondary ]
Master/Slave Set: ms_drbd_just [drbd_just]
Masters: [ primary ]
Slaves: [ secondary ]
Resource Group: justcall
fs_just (ocf::heartbeat:Filesystem): Started primary
ip_just (ocf::heartbeat:IPaddr2): Started primary
crond_just (lsb:crond): Started primary
apache_just (ocf::heartbeat:apache): Started primary
Failed actions:
drbd_just_monitor_30000 on secondary 'not running' (7): call=25, status=complete, last-rc-change='Thu Mar 30 10:01:36 2017', queued=0ms, exec=0ms
Secondary node:
[root@secondary ~]# drbd-overview
0:jcluster/0 StandAlone Secondary/Unknown UpToDate/DUnknown r-----
[root@secondary ~]# crm status
Last updated: Thu Mar 30 10:02:58 2017
Last change: Tue Mar 28 10:25:46 2017
Stack: classic openais (with plugin)
Current DC: primary - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
6 Resources configured
Online: [ primary secondary ]
Master/Slave Set: ms_drbd_just [drbd_just]
Masters: [ primary ]
Slaves: [ secondary ]
Resource Group: justcall
fs_just (ocf::heartbeat:Filesystem): Started primary
ip_just (ocf::heartbeat:IPaddr2): Started primary
crond_just (lsb:crond): Started primary
apache_just (ocf::heartbeat:apache): Started primary
Failed actions:
drbd_just_monitor_30000 on secondary 'not running' (7): call=25, status=complete, last-rc-change='Thu Mar 30 10:01:36 2017', queued=0ms, exec=0ms
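Both nodes are in the StandAlone state: the primary node shows StandAlone Primary/Unknown UpToDate/DUnknown, and the secondary node shows StandAlone Secondary/Unknown UpToDate/DUnknown. Each node considers its own data UpToDate but has dropped the connection and no longer sees its peer.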
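Reading these fields by eye is error-prone when several resources are listed. As a rough sketch, the connection state, roles, and disk states can be pulled out of the drbd-overview output with awk. The function names drbd_state and drbd_is_standalone are hypothetical helpers, and the field positions are assumed from the output shown in this post.

```shell
# Print "<connection-state> <roles> <disk-states>" for one resource,
# reading drbd-overview output on stdin. Exits non-zero if the resource
# is not found.
drbd_state() {
    awk -v res="$1" '
        # Field 1 looks like "0:jcluster/0": minor number, resource name, volume.
        { split($1, id, "[:/]") }
        id[2] == res { print $2, $3, $4; found = 1 }
        END { exit !found }
    '
}

# Succeed when the resource is StandAlone, i.e. the node has stopped
# trying to reach its peer.
drbd_is_standalone() {
    [ "$(drbd_state "$1" | cut -d" " -f1)" = "StandAlone" ]
}
```

On a node, drbd-overview | drbd_state jcluster would print one line such as StandAlone Primary/Unknown UpToDate/DUnknown, and drbd-overview | drbd_is_standalone jcluster can be used as a condition in a script.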
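This case is straightforward to fix. First make sure the heartbeat cable is connected properly and the two nodes can ping each other.
On the primary node, connect the resource: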
[root@primary ~]# drbdadm connect jcluster
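After about 5-10 minutes the state had changed to WFConnection, meaning the primary node is waiting for its peer to connect: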
[root@primary ~]# drbd-overview
0:jcluster/0 WFConnection Primary/Unknown UpToDate/DUnknown C r----- /data ext4 2.7T 2.0G 2.6T 1%
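The secondary node is still StandAlone at this point. Running drbdadm connect --discard-my-data jcluster on it makes the secondary node discard its own data and resynchronize from the primary node. State on the secondary node before the command: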
[root@secondary ~]# drbd-overview
0:jcluster/0 StandAlone Secondary/Unknown UpToDate/DUnknown r-----
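On the secondary node:
[root@secondary ~]# drbdadm connect --discard-my-data jcluster
The data then synchronizes automatically. With a small amount of data, the sync finishes in about 5 minutes.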
[root@secondary ~]# drbd-overview
0:jcluster/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[>....................] sync'ed: 4.1% (4332/4512)M
[root@secondary ~]# drbd-overview
0:jcluster/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[=>..................] sync'ed: 10.9% (4024/4512)M
[root@secondary ~]# drbd-overview
0:jcluster/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[=>..................] sync'ed: 12.5% (3956/4512)M
[root@secondary ~]# drbd-overview
0:jcluster/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[=>..................] sync'ed: 13.4% (3912/4512)M
[root@secondary ~]# drbd-overview
0:jcluster/0 SyncTarget Secondary/Primary Inconsistent/UpToDate C r-----
[=>..................] sync'ed: 14.8% (3848/4512)M
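Instead of rerunning drbd-overview and reading the bar, the progress line can be reduced to numbers. This is a small sketch, assuming the line format shown above, where the pair in parentheses is remaining/total and the unit suffix is M. The function name sync_progress is made up for illustration.

```shell
# Read drbd-overview output on stdin and print "<percent> <remaining> <total>"
# for each progress line such as:
#   [=>..................] sync'ed: 14.8% (3848/4512)M
# Only the "M" unit suffix seen in this post is handled.
sync_progress() {
    sed -n "s/.*sync'ed: *\([0-9.]*\)% *(\([0-9]*\)\/\([0-9]*\))M.*/\1 \2 \3/p"
}
```

A typical use would be drbd-overview | sync_progress, which prints nothing once the sync has finished and the progress line is gone.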
State after synchronization:
[root@primary ~]# drbd-overview
0:jcluster/0 Connected Primary/Secondary UpToDate/UpToDate C r----- /data ext4 2.7T 2.0G 2.6T 1%
[root@secondary ~]# drbd-overview
0:jcluster/0 Connected Secondary/Primary UpToDate/UpToDate C r-----
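Waiting for this final state can be scripted rather than checked by hand. The sketch below polls a status command until the resource reports Connected with UpToDate/UpToDate. The function name wait_for_sync and its argument order are assumptions for this example, and the status command is passed in as an argument so the loop can be tried out without a DRBD node.

```shell
# Poll a status command until the resource is Connected and UpToDate/UpToDate.
#   $1 = resource name
#   $2 = maximum number of polls
#   $3 = seconds to sleep between polls
#   remaining arguments = command that prints drbd-overview style output
# Returns 0 once the healthy state is seen, 1 if the polls run out.
wait_for_sync() (
    res=$1 tries=$2 delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        if "$@" | awk -v res="$res" '
                { split($1, id, "[:/]") }
                id[2] == res && $2 == "Connected" && $4 == "UpToDate/UpToDate" { ok = 1 }
                END { exit !ok }'; then
            exit 0
        fi
        tries=$((tries - 1))
        # Do not sleep after the last attempt.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    exit 1
)
```

On a real node this might be called as wait_for_sync jcluster 60 10 drbd-overview, which checks every 10 seconds for up to 10 minutes.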
All services are back to normal. The entry under Failed actions is a record of the earlier failure, not a new error.
[root@primary ~]# crm status
Last updated: Thu Mar 30 10:19:56 2017
Last change: Tue Mar 28 10:25:46 2017
Stack: classic openais (with plugin)
Current DC: primary - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
6 Resources configured
Online: [ primary secondary ]
Master/Slave Set: ms_drbd_just [drbd_just]
Masters: [ primary ]
Slaves: [ secondary ]
Resource Group: justcall
fs_just (ocf::heartbeat:Filesystem): Started primary
ip_just (ocf::heartbeat:IPaddr2): Started primary
crond_just (lsb:crond): Started primary
apache_just (ocf::heartbeat:apache): Started primary
Failed actions:
drbd_just_monitor_30000 on secondary 'not running' (7): call=25, status=complete, last-rc-change='Thu Mar 30 10:08:36 2017', queued=0ms, exec=0ms
[root@secondary ~]# crm status
Last updated: Thu Mar 30 10:20:31 2017
Last change: Tue Mar 28 10:25:46 2017
Stack: classic openais (with plugin)
Current DC: primary - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
6 Resources configured
Online: [ primary secondary ]
Master/Slave Set: ms_drbd_just [drbd_just]
Masters: [ primary ]
Slaves: [ secondary ]
Resource Group: justcall
fs_just (ocf::heartbeat:Filesystem): Started primary
ip_just (ocf::heartbeat:IPaddr2): Started primary
crond_just (lsb:crond): Started primary
apache_just (ocf::heartbeat:apache): Started primary
Failed actions:
drbd_just_monitor_30000 on secondary 'not running' (7): call=25, status=complete, last-rc-change='Thu Mar 30 10:08:36 2017', queued=0ms, exec=0ms
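To see at a glance which resource and node a stale Failed actions entry belongs to, the crm status output can be filtered. This is a sketch only: the function name failed_actions is hypothetical, and it assumes the operation key has the form resource_operation_interval, as in drbd_just_monitor_30000 above.

```shell
# Read `crm status` output on stdin and print "<resource> <node>" for every
# entry listed under "Failed actions:".
# Assumes keys of the form <resource>_<operation>_<interval>.
failed_actions() {
    awk '
        /^[ \t]*Failed actions:/ { in_failed = 1; next }
        in_failed && $2 == "on" {
            key = $1
            sub(/_[a-z]+_[0-9]+$/, "", key)   # strip e.g. "_monitor_30000"
            print key, $3
        }
    '
}
```

Running crm status | failed_actions on this cluster would be expected to list drbd_just on secondary. Once DRBD is confirmed healthy, such a record is commonly cleared with crm resource cleanup drbd_just.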
References:
A DRBD split-brain recovery example
http://blog.csdn.net/levy_cui/article/details/56484618
Simulating a DRBD split brain
http://myhat.blog.51cto.com/391263/606318/
Handling DRBD split brain
http://itindex.net/detail/50197-drbd
Notes on troubleshooting a DRBD Unknown state failure
http://koumm.blog.51cto.com/703525/1769112/
Fixing the "Device is busy" error on umount under Linux
http://blog.csdn.net/mzpmzk/article/details/53892956