DRBD + Pacemaker Explained

I. DRBD Definition

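  • DRBD stands for Distributed Replicated Block Device. It consists of a kernel module plus supporting scripts and is used to build high-availability clusters. It works by mirroring an entire block device over the network, so you can keep a real-time mirror of a local block device on a remote machine. Combined with a heartbeat link, it can be thought of as a network RAID (RAID 1 style mirroring). DRBD runs in kernel space rather than user space and replicates raw block data directly, which is the main reason it is fast.

  • A DRBD setup consists of two or more nodes. As in an HA cluster, there is a primary node and a secondary (standby) node. Applications and the operating system can run against and access the DRBD device only on the node that holds the primary role.

  • Data written on the primary node goes through the DRBD device to the primary's local disk. At the same time it is sent automatically to the corresponding DRBD device on the secondary node and ends up on the secondary's disk. On the secondary, DRBD does nothing more than pass the received data from the DRBD device to the local disk.

  • DRBD sits below the buffer cache: as data leaves the buffer cache on its way to disk, it passes through the DRBD layer first. DRBD makes a copy, wraps it in TCP/IP, and sends it to the other node, which receives the copy over TCP/IP and writes it to the DRBD device on the secondary.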

  • How DRBD works inside the kernel:

[Figure: DRBD working model in the kernel I/O stack]

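A DRBD resource is made up of the following:

  • resource name: any ASCII characters except whitespace.

  • drbd device: the access path of the DRBD device, that is, the device file /dev/drbd#.

  • disk: the block device each node contributes to back this DRBD device.

  • network settings: the network configuration the nodes use to mirror the disk across hosts.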

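DRBD replication modes:

Protocol A (asynchronous): a write is considered complete once the data is on the local disk and has been handed to the local TCP/IP stack.
Protocol B (semi-synchronous): a write is considered complete once the data is on the local disk and the peer has acknowledged receiving it, meaning it has reached the peer's TCP/IP stack but not necessarily its disk.
Protocol C (synchronous): a write is considered complete only after the peer has also written the data to disk and returned an acknowledgement.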

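Automatic split-brain recovery policies:

  • Discard the changes of the younger primary:

    When the network link is re-established and split brain is detected, DRBD discards the changes made on the node that switched to the primary role last.

  • Discard the changes of the older primary:

    DRBD discards the changes made on the node that switched to the primary role first.

  • Discard the changes of the primary with fewer changes:

    DRBD checks which node has modified less data and discards that node's changes.

  • Graceful recovery when one node has no changes:

    If one of the hosts made no changes at all during the split brain, DRBD simply recovers and declares the split brain resolved.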
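The four policies above correspond to values of the `after-sb-0pri` option in the `net` section of a DRBD 8.x resource. As a rough sketch, the helper below prints a config fragment for a chosen policy. The function name is made up for this example, and the `after-sb-1pri` and `after-sb-2pri` values are only one illustrative choice, not something the setup in this post uses.

```shell
#!/bin/sh
# Print a net-section fragment for the chosen after-sb-0pri policy.
# sb_policy_fragment is a hypothetical helper name; the policy keywords
# are DRBD 8.x values for after-sb-0pri.
sb_policy_fragment() {
    case "$1" in
        discard-younger-primary|discard-older-primary|discard-least-changes|discard-zero-changes|disconnect)
            ;;
        *)
            echo "unknown after-sb-0pri policy: $1" >&2
            return 1
            ;;
    esac
    # The 1pri/2pri lines are an assumed, conservative example.
    cat <<EOF
net {
        after-sb-0pri $1;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
}
EOF
}

sb_policy_fragment discard-zero-changes
```

The printed fragment would be merged into the `net { }` block of the resource or of `global_common.conf`; with `disconnect`, DRBD does not attempt automatic recovery and simply drops the connection.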

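With the basic concepts covered, we can install and configure DRBD.

II. Installation

DRBD has been part of the mainline kernel since Linux 2.6.33; on earlier kernels it can only be installed in one of two ways:

1. Patch the kernel and compile the module

2. Install RPM packages (the kernel-module package must match the running kernel exactly)

Preparation:

#Download the kernel-module RPM that matches the running kernel, plus the user-space tools RPM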
[root@node1 drbd]# ls
drbd-8.4.3-33.el6.x86_64.rpm  drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm
[root@node1 drbd]# uname -r
2.6.32-431.el6.x86_64
                                                                                                                                                                                                                                                                                                                                                                                                              
#Two test nodes
node1.soul.com    192.168.0.111
node2.soul.com    192.168.0.112
                                                                                                                                                                                                                                                                                                                                                                                                        
#Make sure the clocks are in sync and passwordless SSH trust is set up between the two nodes
[root@node1 ~]# ssh node2 'date';date
Wed Apr 23 18:13:30 CST 2014
Wed Apr 23 18:13:30 CST 2014
[root@node1 ~]#
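Because the kernel-module package has to match the running kernel exactly, a quick check before installing can save a failed module load. The sketch below assumes the `drbd-kmdl-<kernel>-<drbd version>.<arch>.rpm` naming seen in the listing above; the function name is made up for illustration.

```shell
#!/bin/sh
# Return 0 if the kmdl RPM file name embeds the given kernel release.
# Assumes the naming scheme drbd-kmdl-<kernel base>-<drbd version>.<arch>.rpm.
kmdl_matches_kernel() {
    rpm_file=$1
    kernel=$2                 # e.g. the output of `uname -r`
    arch=${kernel##*.}        # x86_64
    base=${kernel%.*}         # 2.6.32-431.el6
    case "$rpm_file" in
        drbd-kmdl-"$base"-*."$arch".rpm) return 0 ;;
        *) return 1 ;;
    esac
}

if kmdl_matches_kernel drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm 2.6.32-431.el6.x86_64; then
    echo "kernel module package matches"
else
    echo "kernel module package does NOT match"
fi
```

On a real node the second argument would be `"$(uname -r)"` instead of a literal string.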

Installation:

[root@node1 drbd]# rpm -ivh drbd-8.4.3-33.el6.x86_64.rpm drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm
warning: drbd-8.4.3-33.el6.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 66534c2b: NOKEY
Preparing...                ########################################### [100%]
   1:drbd-kmdl-2.6.32-431.el########################################### [ 50%]
   2:drbd                   ########################################### [100%]
[root@node1 drbd]# scp *.rpm node2:/root/drbd/
drbd-8.4.3-33.el6.x86_64.rpm                                     100%  283KB 283.3KB/s   00:00  
drbd-kmdl-2.6.32-431.el6-8.4.3-33.el6.x86_64.rpm                 100%  145KB 145.2KB/s   00:00  
[root@node1 drbd]# ssh node2 "rpm -ivh /root/drbd/*.rpm"
warning: /root/drbd/drbd-8.4.3-33.el6.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 66534c2b: NOKEY
Preparing...                ##################################################
drbd-kmdl-2.6.32-431.el6    ##################################################
drbd                        ##################################################
[root@node1 drbd]#
#Install the same packages on both nodes

View the configuration file

[root@node1 ~]# cd /etc/drbd.d/
[root@node1 drbd.d]# ls
global_common.conf
[root@node1 drbd.d]#
[root@node1 drbd.d]# vim global_common.conf
global {
        usage-count no;    #with Internet access, DRBD can report usage statistics; disabled here
        # minor-count dialog-refresh disable-ip-verification
}
common {
        handlers {    #event handlers
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when chosing your poison.
                #enable the following three handlers:
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {    #startup options (timeouts while waiting for the peer)
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }
        options {
                # cpu-mask on-no-data-accessible
        }
        disk {    #backing-disk options
                # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
                on-io-error     detach;    #on an I/O error, detach the backing disk
        }
        net {
                # protocol timeout max-epoch-size max-buffers unplug-watermark
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
                cram-hmac-alg    "sha1";
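                shared-secret     "node.soul.com";    #a random value is recommended
                protocol    C;    #replication protocol
        }
        syncer {    #8.3-style section; in 8.4 syntax the equivalent is resync-rate in the disk section
                rate    1000M;    #resynchronization rate
        }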
}
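The comment on `shared-secret` recommends a random value. One possible way to produce one with standard tools is sketched below; the function name is hypothetical, and the 32-character length is an arbitrary choice for this example.

```shell
#!/bin/sh
# Generate a 32-character hexadecimal secret from 16 random bytes.
# gen_secret is an illustrative helper, not part of DRBD.
gen_secret() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

secret=$(gen_secret)
# Print a line that can be pasted into the net section.
printf 'shared-secret "%s";\n' "$secret"
```

The same secret has to be present in the configuration on both nodes, so generate it once and copy the file, as is done with scp later on.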

III. Configuration and Use

Provide a disk partition, or a dedicated disk, on each node; the two must be the same size.

[root@node1 ~]# fdisk -l /dev/sda
Disk /dev/sda: 128.8 GB, 128849018880 bytes
255 heads, 63 sectors/track, 15665 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00076a20
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          26      204800   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              26        7859    62914560   8e  Linux LVM
/dev/sda3            7859        8512     5252256   83  Linux    #this partition is used as the DRBD backing device
#node2 is laid out the same way

Define the resource

[root@node1 ~]# vim /etc/drbd.d/web.res
resource web {
        on node1.soul.com {
                device  /dev/drbd0;
                disk    /dev/sda3;
                address 192.168.0.111:7789;
                meta-disk       internal;
        }
        on node2.soul.com {
                device  /dev/drbd0;
                disk    /dev/sda3;
                address 192.168.0.112:7789;
                meta-disk       internal;
        }
}
                                                                                                                                                                                                                                                                                                                                                       
#Copy the files to node2
[root@node1 ~]# scp /etc/drbd.d/web.res node2:/etc/drbd.d/
web.res                                                                100%  247     0.2KB/s   00:00  
[root@node1 ~]# scp /etc/drbd.d/global_common.conf node2:/etc/drbd.d/
global_common.conf                                                     100% 1945     1.9KB/s   00:00  
[root@node1 ~]#

Initialize the resource and start the service

[root@node1 ~]# drbdadm create-md web
Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
#Run the same command on node2
                                                                                                                                                                                                                                                                                                              
[root@node1 ~]# service drbd start
Starting DRBD resources: [
     create res: web
   prepare disk: web
    adjust disk: web
     adjust net: web
]
..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 0 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'web'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [  11]:     #waiting for node2; it continues once DRBD is started on node2
.
[root@node1 ~]#

#Check the status
[root@node1 ~]# drbd-overview
  0:web/0  Connected Secondary/Secondary Inconsistent/Inconsistent C r-----
[root@node1 ~]#

Manually promote one of the nodes to primary

[root@node1 ~]# drbdadm primary --force web
[root@node1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-11-29 12:28:00
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:321400 nr:0 dw:0 dr:329376 al:0 bm:19 lo:0 pe:1 ua:8 ap:0 ep:1 wo:f oos:4931544
    [>...................] sync'ed:  6.3% (4812/5128)M
    finish: 0:02:33 speed: 32,048 (32,048) K/sec
#The output shows the initial sync progressing normally
#In Primary/Secondary, the left side is the local node and the right side is the peer
                                                                                                                                                                                                                                                                                           
[root@node2 ~]# drbd-overview
  0:web/0  Connected Secondary/Primary UpToDate/UpToDate C r-----
#Output on node2 once the initial sync has completed
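The fields in /proc/drbd are easy to pull apart in a script, and the "finish" estimate can be reproduced as out-of-sync data (oos, in KiB) divided by the sync speed (K/sec). The sketch below works on the sample text copied from the output above; the helper names are made up for this example.

```shell
#!/bin/sh
# Sample taken from the /proc/drbd output shown above.
sample=' 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:321400 nr:0 dw:0 dr:329376 al:0 bm:19 lo:0 pe:1 ua:8 ap:0 ep:1 wo:f oos:4931544'

# Extract the value of a "key:value" field from status text on stdin.
# $1 = field name (cs, ro, ds, oos, ...)
drbd_field() {
    tr ' ' '\n' | sed -n "s/^$1://p" | head -n 1
}

# Estimate remaining sync time: $1 = oos in KiB, $2 = speed in K/sec.
sync_eta() {
    secs=$(( $1 / $2 ))
    printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

role=$(printf '%s\n' "$sample" | drbd_field ro)
echo "local role: ${role%%/*}, peer role: ${role##*/}"

oos=$(printf '%s\n' "$sample" | drbd_field oos)
sync_eta "$oos" 32048
```

With oos:4931544 and a speed of 32,048 K/sec this gives 153 seconds, which matches the `finish: 0:02:33` line in the output. On a live node, `cat /proc/drbd` would take the place of the sample string.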

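Manual primary/secondary switchover

#Only one node can be primary at a time, so the current primary must be demoted before the other node is promoted
#Also, only the primary can format, mount, and use the device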
node1
[root@node1 ~]# drbdadm secondary web
[root@node1 ~]# drbd-overview
  0:web/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
                                                                                                                                                                                                                                                                          
node2
[root@node2 ~]# drbdadm primary web
[root@node2 ~]# drbd-overview
  0:web/0  Connected Primary/Secondary UpToDate/UpToDate C r-----
                                                                                                                                                                                                                                                                          
#Format and mount
[root@node2 ~]# mke2fs -t ext4 /dev/drbd0
This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@node2 ~]#
[root@node2 ~]# mount /dev/drbd0 /mnt/
[root@node2 ~]# ls /mnt/
lost+found
#Mounted successfully
#To make node1 the primary, first unmount and demote on this node
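The manual switchover order (unmount, demote, promote the peer, mount there) can be written down as a small script. This is only a sketch: the `run` and `switch_primary` helpers are hypothetical, it defaults to a dry run that prints the commands, and it assumes passwordless SSH to the peer as set up earlier.

```shell
#!/bin/sh
# Set DRY_RUN=0 to really execute; by default commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Move the primary role from the local node to a peer.
# $1 = resource, $2 = peer host, $3 = DRBD device, $4 = mount point
switch_primary() {
    res=$1
    peer=$2
    dev=$3
    mnt=$4
    run umount "$mnt"                       # release the file system first
    run drbdadm secondary "$res"            # demote the current primary
    run ssh "$peer" drbdadm primary "$res"  # promote the peer
    run ssh "$peer" mount "$dev" "$mnt"     # mount on the new primary
}

# Example: run on node2 to hand the primary role to node1.
switch_primary web node1 /dev/drbd0 /mnt
```

Each step has to succeed before the next one makes sense, so a real version would stop on the first error; this is exactly the bookkeeping that Pacemaker takes over in the next section.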

IV. Automatic Role Switching with Pacemaker

[root@node1 ~]# drbdadm secondary web
[root@node1 ~]# drbd-overview
  0:web/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----
                                                                                                                                                                          
[root@node1 ~]# service drbd stop
Stopping all DRBD resources: .
[root@node1 ~]# ssh node2 'service drbd stop'
Stopping all DRBD resources: .
[root@node1 ~]#
[root@node1 ~]# chkconfig drbd off
[root@node1 ~]# ssh node2 'chkconfig drbd off'
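#Unmount, demote, stop the service, and disable start at boot; do the same on both nodes

Configure Pacemaker

#The cluster setup itself is not covered in detail here; it was described in the previous post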
[root@node1 ~]# crm status
Last updated: Wed Apr 23 21:00:11 2014
Last change: Wed Apr 23 18:59:58 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.soul.com node2.soul.com ]
                                                                                                                                                                 
crm(live)configure# show
node node1.soul.com
node node2.soul.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
                                                                                                                                                                 
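#With the cluster configured, view its status and configuration as above

Configure DRBD as a highly available resource

#The DRBD resource agent is provided by LINBIT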
crm(live)ra# info ocf:linbit:drbd
Parameters (* denotes required, [] the default):
drbd_resource* (string): drbd resource name
    The name of the drbd resource from the drbd.conf file.
drbdconf (string, [/etc/drbd.conf]): Path to drbd.conf
    Full path to the drbd.conf file.
....
#View the agent's details
#Define the resource
crm(live)configure# primitive webdrbd ocf:linbit:drbd params drbd_resource=web op monitor role=Master interval=40s timeout=30s op monitor role=Slave interval=60s timeout=30s op start timeout=240s op stop timeout=100s
crm(live)configure# verify
crm(live)configure# master MS_webdrbd webdrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# verify
crm(live)configure# show
node node1.soul.com
node node2.soul.com
primitive webdrbd ocf:linbit:drbd \
    params drbd_resource="web" \
    op monitor role="Master" interval="40s" timeout="30s" \
    op monitor role="Slave" interval="60s" timeout="30s" \
    op start timeout="240s" interval="0" \
    op stop timeout="100s" interval="0"
ms MS_webdrbd webdrbd \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# commit

Once the resource is defined, check the current status

crm(live)# status
Last updated: Wed Apr 23 21:39:49 2014
Last change: Wed Apr 23 21:39:32 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured
Online: [ node1.soul.com node2.soul.com ]
 Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node1.soul.com ]
     Slaves: [ node2.soul.com ]
                                                                                                                                 
#Put node1 in standby to test failover
crm(live)# node standby node1.soul.com
crm(live)# status
Last updated: Wed Apr 23 21:44:31 2014
Last change: Wed Apr 23 21:44:27 2014 via crm_attribute on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured
Node node1.soul.com: standby
Online: [ node2.soul.com ]
 Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node2.soul.com ]
     Stopped: [ node1.soul.com ]
#node2 has automatically become the master

V. Defining the webfs Service on the Shared DRBD Resource

crm(live)configure# primitive webfs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/var/www/html" fstype="ext4" op monitor interval=30s timeout=40s on-fail=restart op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# colocation webfs_with_MS_webdrbd_M inf: webfs MS_webdrbd:Master
crm(live)configure# verify
crm(live)configure# order webfs_after_MS_webdrbd_M inf: MS_webdrbd:promote webfs:start
crm(live)configure# verify
crm(live)configure# commit
                                                                                                              
crm(live)# status
Last updated: Wed Apr 23 21:57:23 2014
Last change: Wed Apr 23 21:57:11 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.soul.com node2.soul.com ]
 Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node2.soul.com ]
     Slaves: [ node1.soul.com ]
 webfs  (ocf::heartbeat:Filesystem):    Started node2.soul.com
[root@node2 ~]# ls /var/www/html/
issue  lost+found
#Verified on node2: the file system is mounted

#Test failover
crm(live)# node standby node2.soul.com
crm(live)# status
Last updated: Wed Apr 23 21:59:00 2014
Last change: Wed Apr 23 21:58:51 2014 via crm_attribute on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Node node2.soul.com: standby
Online: [ node1.soul.com ]
 Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node1.soul.com ]
     Stopped: [ node2.soul.com ]
 webfs  (ocf::heartbeat:Filesystem):    Started node1.soul.com
                                                                                                              
[root@node1 ~]# ls /var/www/html/
issue  lost+found
[root@node1 ~]#
#Failover succeeded
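The colocation constraint says webfs must run where MS_webdrbd is master. A small check of that property against `crm status` text might look like the sketch below; it runs on a sample copied from the status output above, and the helper names are made up for this example.

```shell
#!/bin/sh
# Sample lines taken from the `crm status` output shown above.
status=' Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node2.soul.com ]
     Slaves: [ node1.soul.com ]
 webfs  (ocf::heartbeat:Filesystem):    Started node2.soul.com'

# Print the node listed under "Masters:" (status text on stdin).
master_node() {
    awk '$1 == "Masters:" { print $3; exit }'
}

# Print the node on which resource $1 is reported as Started.
started_on() {
    awk -v r="$1" '$1 == r && $(NF-1) == "Started" { print $NF; exit }'
}

m=$(printf '%s\n' "$status" | master_node)
f=$(printf '%s\n' "$status" | started_on webfs)
if [ "$m" = "$f" ]; then
    echo "OK: webfs runs on the DRBD master ($m)"
else
    echo "MISMATCH: master=$m webfs=$f"
fi
```

On a live cluster the sample string would be replaced by piping `crm status` into the two helpers. The parsing assumes the status layout shown in this post (Pacemaker 1.1.10) and may need adjusting for other versions.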

VI. Configuring webip and webserver for a Highly Available Web Service

crm(live)# configure primitive webip ocf:heartbeat:IPaddr params ip="192.168.0.222" op monitor interval=30s on-fail=restart
crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=30s on-fail=restart
crm(live)configure# group webcluster webip webfs webserver
INFO: resource references in colocation:webfs_with_MS_webdrbd_M updated
INFO: resource references in order:webfs_after_MS_webdrbd_M updated
crm(live)configure# show
group webcluster webip webfs webserver
ms MS_webdrbd webdrbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation webfs_with_MS_webdrbd_M inf: webcluster MS_webdrbd:Master
order webfs_after_MS_webdrbd_M inf: MS_webdrbd:promote webcluster:start
#The configuration shown is as intended
            
crm(live)# status
Last updated: Wed Apr 23 23:02:26 2014
Last change: Wed Apr 23 22:55:36 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured
Node node2.soul.com: standby
Online: [ node1.soul.com ]
 Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node1.soul.com ]
     Stopped: [ node2.soul.com ]
 Resource Group: webcluster
     webip  (ocf::heartbeat:IPaddr):    Started node1.soul.com
     webfs  (ocf::heartbeat:Filesystem):    Started node1.soul.com
     webserver  (lsb:httpd):    Started node1.soul.com
            
#Failover test
crm(live)# node standby node1.soul.com
crm(live)# status
Last updated: Wed Apr 23 23:03:18 2014
Last change: Wed Apr 23 23:03:15 2014 via crm_attribute on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured
Node node1.soul.com: standby
Online: [ node2.soul.com ]
 Master/Slave Set: MS_webdrbd [webdrbd]
     Masters: [ node2.soul.com ]
     Stopped: [ node1.soul.com ]
 Resource Group: webcluster
     webip  (ocf::heartbeat:IPaddr):    Started node2.soul.com
     webfs  (ocf::heartbeat:Filesystem):    Started node2.soul.com
     webserver  (lsb:httpd):    Started node2.soul.com

That completes the DRBD basics and automatic failover with Pacemaker.



If you spot any errors, corrections are welcome.

