In DRBD dual-primary mode, the DRBD resource can be accessed on both nodes at the same time, which allows for load balancing. Dual-primary mode requires a cluster filesystem such as GFS2 or OCFS2, together with the Distributed Lock Manager (DLM) for cluster-wide locking. Below we walk through building a highly available pacemaker + DRBD dual-primary cluster.
Note: this article builds on the environment configured in the post "pacmaker+drbd主从" (pacemaker + DRBD master/slave).
GFS2 and DLM require working fencing in the cluster, so we need to configure fencing first. Since our virtual machines run on ESXi 5.5, we use fence-agents-vmware-soap as the fence device.
1. Install fence-agents-vmware-soap
# Run on both pcmk-1 and pcmk-2
yum install fence-agents-vmware-soap.x86_64
# Relevant command options
[root@pcmk-1 ~]# fence_vmware_soap -h
Usage:
fence_vmware_soap [options]
Options:
-a, --ip=[ip] IP address or hostname of fencing device
-l, --username=[name] Login name
-p, --password=[password] Login password or passphrase
-z, --ssl Use ssl connection
-t, --notls Disable TLS negotiation and force SSL3.0.
This should only be used for devices that do not support TLS1.0 and up.
-n, --plug=[id] Physical plug number on device, UUID or
identification of machine
-u, --ipport=[port] TCP/UDP port to use
(default 80, 443 if --ssl option is used)
-4, --inet4-only Forces agent to use IPv4 addresses only
-6, --inet6-only Forces agent to use IPv6 addresses only
-S, --password-script=[script] Script to run to retrieve password
--ssl-secure Use ssl connection with verifying certificate
--ssl-insecure Use ssl connection without verifying certificate
-o, --action=[action] Action: status, reboot (default), off or on
-v, --verbose Verbose mode
-D, --debug-file=[debugfile] Debugging to output file
-V, --version Output version information and exit
-h, --help Display this help and exit
-C, --separator=[char] Separator for CSV created by 'list' operation
--power-timeout=[seconds] Test X seconds for status change after ON/OFF
--shell-timeout=[seconds] Wait X seconds for cmd prompt after issuing command
--login-timeout=[seconds] Wait X seconds for cmd prompt after login
--power-wait=[seconds] Wait X seconds after issuing ON/OFF
--delay=[seconds] Wait X seconds before fencing is started
--retry-on=[attempts] Count of attempts to retry power on
2. Look up the VM UUIDs
[root@pcmk-1 ~]# fence_vmware_soap -z -l admin -p admin -a 10.10.10.21 -o list --ssl-insecure|grep pacemaker
pacemaker-test2,4207c8bc-3412-8450-98a1-7a67287f0b39
pacemaker-test1,4207131e-bb72-bc48-d68c-a302664b6abf
Here 10.10.10.21 is the ESXi host (or vCenter server) on which the two virtual machines run.
Alternatively, run the following on each virtual machine:
dmidecode | grep -i uuid | tr A-Z a-z
uuid: 4207131e-bb72-bc48-d68c-a302664b6abf
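Before creating the STONITH resource, it is worth confirming that the agent can actually query each VM's power state. A minimal check using the -n/--plug and -o status options from the help output above, with the UUIDs listed in step 2 (expect "Status: ON"):
[root@pcmk-1 ~]# fence_vmware_soap -z --ssl-insecure -a 10.10.10.21 -l admin -p admin -o status -n 4207131e-bb72-bc48-d68c-a302664b6abf
[root@pcmk-1 ~]# fence_vmware_soap -z --ssl-insecure -a 10.10.10.21 -l admin -p admin -o status -n 4207c8bc-3412-8450-98a1-7a67287f0b39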
3. Add a STONITH resource to the cluster
[root@pcmk-1 drbd.d]# pcs cluster cib stonith_cfg
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg stonith create fence_vmware fence_vmware_soap ipaddr=10.10.10.21 ipport=443 ssl_insecure=1 inet4_only=1 login="admin" passwd="admin" action=reboot pcmk_host_map="pcmk-1:4207131e-bb72-bc48-d68c-a302664b6abf;pcmk-2:4207c8bc-3412-8450-98a1-7a67287f0b39" pcmk_host_list="pcmk-1,pcmk-2" pcmk_host_check=static-list power_wait=3 op monitor interval=60s
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg stonith
fence_vmware (stonith:fence_vmware_soap): Stopped
The resource shows as Stopped because it only exists in the stonith_cfg shadow file at this point; it will start once the file is pushed to the live CIB in step 5.
4. Enable STONITH
Recall that in the "corosync+pacemaker高可用" post we did not configure any STONITH resource and therefore disabled STONITH. Now that a STONITH resource is configured, we need to turn it back on.
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg property set stonith-enabled=true
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.13-10.el7-44eb2dd
have-watchdog: false
last-lrm-refresh: 1453774514
stonith-enabled: true
5. Push the updated configuration
[root@pcmk-1 drbd.d]# pcs cluster cib-push stonith_cfg
CIB updated
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Tue Jan 26 14:06:37 2016 Last change: Tue Jan 26 14:06:14 2016 by root via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
Master/Slave Set: drbd_data_clone [drbd_data]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
drbd_fs (ocf::heartbeat:Filesystem): Started pcmk-1
fence_vmware (stonith:fence_vmware_soap): Started pcmk-2
PCSD Status:
pcmk-1: Online
pcmk-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Fencing is now fully configured. In a split-brain situation, fencing will automatically reboot one of the cluster's nodes to resolve it. For example, when the network on pcmk-1 is cut, the resources fail over to pcmk-2 and the pcmk-1 node is rebooted.
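To confirm that fencing works without pulling a network cable, one node can be fenced manually from the other; this is an optional test that will reboot the target node:
[root@pcmk-2 ~]# pcs stonith fence pcmk-1
pcmk-1 should power-cycle and rejoin the cluster shortly afterwards. With fencing in place, we can move on to DLM and GFS2 so that both nodes can mount the DRBD device at the same time.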
1. Install the required packages
# Run on both pcmk-1 and pcmk-2
[root@pcmk-1 ~]# yum install -y gfs2-utils dlm lvm2-cluster
2. Configure the DLM resource for the cluster
[root@pcmk-1 drbd.d]# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
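It can be worth confirming that the dlm clone is running on both nodes before continuing, for example:
[root@pcmk-1 drbd.d]# pcs status resources
# look for:
# Clone Set: dlm-clone [dlm]
#     Started: [ pcmk-1 pcmk-2 ]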
3. Create and apply the GFS2 filesystem
(1) Stop the existing drbd_fs resource
[root@pcmk-1 drbd.d]# pcs resource disable drbd_fs
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Wed Jan 27 09:18:42 2016 Last change: Wed Jan 27 09:18:39 2016 by root via crm_resource on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 9 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
Master/Slave Set: drbd_data_clone [drbd_data]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
drbd_fs (ocf::heartbeat:Filesystem): (target-role:Stopped) Stopped
(2) Format the device with GFS2. The -t lock table takes the form clustername:fsname; clustername must match the corosync cluster name (mycluster here), and -j 2 creates one journal per node.
[root@pcmk-1 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:drbd /dev/drbd0
It appears to contain an existing filesystem (xfs)
This will destroy any data on /dev/drbd0
Are you sure you want to proceed? [y/n]y
Device: /dev/drbd0
Block size: 4096
Device size: 3.50 GB (917467 blocks)
Filesystem size: 3.50 GB (917463 blocks)
Journals: 2
Resource groups: 15
Locking protocol: "lock_dlm"
Lock table: "mycluster:drbd"
UUID: b36b86be-9d7b-f49b-17b8-590b82331b03
(3) Reconfigure drbd_fs for the cluster
[root@pcmk-1 ~]# pcs resource show drbd_fs
Resource: drbd_fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd0 directory=/drbd fstype=xfs
Meta Attrs: target-role=Stopped
Operations: start interval=0s timeout=60 (drbd_fs-start-interval-0s)
stop interval=0s timeout=60 (drbd_fs-stop-interval-0s)
monitor interval=20 timeout=40 (drbd_fs-monitor-interval-20)
# fstype=xfs needs to be changed to fstype=gfs2
[root@pcmk-1 ~]# pcs resource update drbd_fs fstype=gfs2
[root@pcmk-1 ~]# pcs resource show drbd_fs
Resource: drbd_fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd0 directory=/drbd fstype=gfs2
Meta Attrs: target-role=Stopped
Operations: start interval=0s timeout=60 (drbd_fs-start-interval-0s)
stop interval=0s timeout=60 (drbd_fs-stop-interval-0s)
monitor interval=20 timeout=40 (drbd_fs-monitor-interval-20)
(4) Configure the constraints
GFS2 needs DLM to be started first, so we add the following colocation and ordering constraints:
[root@pcmk-1 ~]# pcs constraint colocation add drbd_fs with dlm-clone INFINITY
[root@pcmk-1 ~]# pcs constraint order dlm-clone then drbd_fs
Adding dlm-clone drbd_fs (kind: Mandatory) (Options: first-action=start then-action=start)
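Optionally, the constraints can be reviewed with pcs; the exact output format varies by pcs version, but it should list the new ordering and colocation entries:
[root@pcmk-1 ~]# pcs constraint
# Ordering Constraints should include:   start dlm-clone then start drbd_fs (kind:Mandatory)
# Colocation Constraints should include: drbd_fs with dlm-clone (score:INFINITY)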
4. Clone the cluster IP
[root@pcmk-1 drbd.d]# pcs cluster cib loadbalance_cfg
[root@pcmk-1 drbd.d]# pcs -f loadbalance_cfg resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true
[root@pcmk-1 drbd.d]# pcs -f loadbalance_cfg resource update ClusterIP clusterip_hash=sourceip
[root@pcmk-1 drbd.d]# pcs cluster cib-push loadbalance_cfg
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Wed Jan 27 10:01:35 2016 Last change: Wed Jan 27 10:01:32 2016 by root via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 8 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
Master/Slave Set: drbd_data_clone [drbd_data]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
drbd_fs (ocf::heartbeat:Filesystem): (target-role:Stopped) Stopped
fence_vmware (stonith:fence_vmware_soap): Started pcmk-2
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: ClusterIP-clone [ClusterIP] (unique)
ClusterIP:0 (ocf::heartbeat:IPaddr2): Started pcmk-1
ClusterIP:1 (ocf::heartbeat:IPaddr2): Started pcmk-2
PCSD Status:
pcmk-1: Online
pcmk-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
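With globally-unique=true and clusterip_hash=sourceip, the cloned IPaddr2 resource distributes connections to ClusterIP between the two nodes using the iptables CLUSTERIP target, hashing on the client's source address. As an optional check, the rule can be inspected on either node:
[root@pcmk-1 ~]# iptables -nL INPUT | grep CLUSTERIP
# expect a CLUSTERIP rule for the ClusterIP address with hashmode=sourceip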
5. Clone the filesystem
[root@pcmk-1 drbd.d]# pcs cluster cib active_cfg
[root@pcmk-1 drbd.d]# pcs -f active_cfg resource clone drbd_fs
6. Update drbd_data_clone from master/slave to dual-primary
[root@pcmk-1 drbd.d]# pcs -f active_cfg resource update drbd_data_clone master-max=2
[root@pcmk-1 drbd.d]# pcs cluster cib-push active_cfg
CIB updated
7. Re-enable the cluster's drbd_fs resource
[root@pcmk-1 drbd.d]# pcs resource enable drbd_fs
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Wed Jan 27 10:32:55 2016 Last change: Wed Jan 27 10:32:51 2016 by root via crm_resource on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 9 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
Master/Slave Set: drbd_data_clone [drbd_data]
Masters: [ pcmk-1 pcmk-2 ]
fence_vmware (stonith:fence_vmware_soap): Started pcmk-2
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: ClusterIP-clone [ClusterIP] (unique)
ClusterIP:0 (ocf::heartbeat:IPaddr2): Started pcmk-2
ClusterIP:1 (ocf::heartbeat:IPaddr2): Started pcmk-1
Clone Set: drbd_fs-clone [drbd_fs]
Started: [ pcmk-1 pcmk-2 ]
PCSD Status:
pcmk-1: Online
pcmk-2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
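Both nodes should now be DRBD Primary with the GFS2 filesystem mounted on each. A simple verification (the device /dev/drbd0 and mount point /drbd come from the configuration above; test.txt is just a throwaway file):
[root@pcmk-1 ~]# cat /proc/drbd
# expect: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate
[root@pcmk-1 ~]# echo "written on pcmk-1" > /drbd/test.txt
[root@pcmk-2 ~]# cat /drbd/test.txt
# the file should be readable on pcmk-2 immediately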
1. After stopping the drbd_fs resource, configuring GFS2, and then re-enabling drbd_fs, cat /proc/drbd showed the DRBD resource on both nodes in StandAlone state, unable to connect or synchronize. It was only resolved by shutting down the cluster and reconfiguring and resyncing DRBD. Take special care with this when configuring a production environment.
2. Before switching DRBD to dual-primary, add "allow-two-primaries yes" to the DRBD resource configuration file on both nodes (a sketch of the relevant net section follows these notes).
3. Choose a fence device appropriate for your environment.
4. Normally LVM on a cluster filesystem is set up as GFS2 + clvm (cluster-lvm2) + DLM. Since we followed the "Pacemaker-1.1-Clusters_from_Scratch-zh-CN" tutorial here, clvm was not used. See "https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Global_File_System_2/ch-clustsetup-GFS2.html".
5. The configuration in this article follows "http://clusterlabs.org/doc/zh-CN/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html".
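A minimal sketch of the net section mentioned in note 2, assuming the DRBD resource from the earlier master/slave post (resource and file names will differ in your setup); the after-sb-* split-brain policies are commonly recommended conservative settings for dual-primary, not something this post's base configuration specifies:
# /etc/drbd.d/<your-resource>.res -- net section, identical on both nodes
net {
    allow-two-primaries yes;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}
# apply the change without a full restart, on both nodes
[root@pcmk-1 ~]# drbdadm adjust <your-resource>
[root@pcmk-2 ~]# drbdadm adjust <your-resource>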