pacemaker + DRBD Dual-Primary

Introduction

In DRBD dual-primary mode the DRBD resource can be accessed on both nodes at the same time, which lets us balance load across them. Dual-primary mode requires a cluster filesystem such as GFS2 or OCFS2, together with the Distributed Lock Manager (DLM) for lock coordination. This article walks through building a highly available cluster with pacemaker and DRBD in dual-primary mode.

Note: this article builds on the environment set up in the earlier post "pacemaker + DRBD primary/secondary".

Configuring Fencing

GFS2 and DLM require the cluster to have working fencing, so we need to configure a fence device. Since our virtual machines run on ESXi 5.5, we use fence-agents-vmware-soap as the fence agent.
1. Install fence-agents-vmware-soap

# run on both pcmk-1 and pcmk-2
yum install fence-agents-vmware-soap.x86_64
# relevant command-line options
[root@pcmk-1 ~]# fence_vmware_soap -h
Usage:
    fence_vmware_soap [options]
Options:
   -a, --ip=[ip]                  IP address or hostname of fencing device
   -l, --username=[name]          Login name
   -p, --password=[password]      Login password or passphrase
   -z, --ssl                      Use ssl connection
   -t, --notls                    Disable TLS negotiation and force SSL3.0.
                                        This should only be used for devices that do not support TLS1.0 and up.
   -n, --plug=[id]                Physical plug number on device, UUID or
                                        identification of machine
   -u, --ipport=[port]            TCP/UDP port to use
                                        (default 80, 443 if --ssl option is used)
   -4, --inet4-only               Forces agent to use IPv4 addresses only
   -6, --inet6-only               Forces agent to use IPv6 addresses only
   -S, --password-script=[script] Script to run to retrieve password
   --ssl-secure                   Use ssl connection with verifying certificate
   --ssl-insecure                 Use ssl connection without verifying certificate
   -o, --action=[action]          Action: status, reboot (default), off or on
   -v, --verbose                  Verbose mode
   -D, --debug-file=[debugfile]   Debugging to output file
   -V, --version                  Output version information and exit
   -h, --help                     Display this help and exit
   -C, --separator=[char]         Separator for CSV created by 'list' operation
   --power-timeout=[seconds]      Test X seconds for status change after ON/OFF
   --shell-timeout=[seconds]      Wait X seconds for cmd prompt after issuing command
   --login-timeout=[seconds]      Wait X seconds for cmd prompt after login
   --power-wait=[seconds]         Wait X seconds after issuing ON/OFF
   --delay=[seconds]              Wait X seconds before fencing is started
   --retry-on=[attempts]          Count of attempts to retry power on

2. Find the UUIDs of the virtual machines

[root@pcmk-1 ~]# fence_vmware_soap -z -l admin -p admin  -a 10.10.10.21 -o list --ssl-insecure|grep pacemaker
pacemaker-test2,4207c8bc-3412-8450-98a1-7a67287f0b39
pacemaker-test1,4207131e-bb72-bc48-d68c-a302664b6abf

Here 10.10.10.21 is the ESXi host (or vCenter server) that hosts the two virtual machines.
Alternatively, run the following command on each of the two virtual machines:

dmidecode | grep -i uuid | tr A-Z a-z
uuid: 4207131e-bb72-bc48-d68c-a302664b6abf
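
Before wiring the agent into the cluster, it is worth checking that it can actually query a node. A quick sanity check, reusing the credentials and the pcmk-1 UUID from above; it should report the VM's power status:

[root@pcmk-1 ~]# fence_vmware_soap -z --ssl-insecure -a 10.10.10.21 -l admin -p admin -o status -n 4207131e-bb72-bc48-d68c-a302664b6abf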

3. Add a STONITH resource to the cluster

[root@pcmk-1 drbd.d]# pcs cluster cib stonith_cfg
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg stonith create fence_vmware fence_vmware_soap  ipaddr=10.10.10.21 ipport=443 ssl_insecure=1 inet4_only=1 login="admin" passwd="admin" action=reboot pcmk_host_map="pcmk-1:4207131e-bb72-bc48-d68c-a302664b6abf;pcmk-2:4207c8bc-3412-8450-98a1-7a67287f0b39" pcmk_host_list="pcmk-1,pcmk-2" pcmk_host_check=static-list  power_wait=3 op monitor interval=60s
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg stonith
 fence_vmware   (stonith:fence_vmware_soap):    Stopped

4. Enable STONITH
Recall that in the post "corosync + pacemaker high availability" we had not configured any STONITH resources and therefore disabled STONITH. Now that a STONITH device is configured, it has to be switched back on.

[root@pcmk-1 drbd.d]# pcs -f stonith_cfg property set stonith-enabled=true
[root@pcmk-1 drbd.d]# pcs -f stonith_cfg property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: mycluster
 dc-version: 1.1.13-10.el7-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1453774514
 stonith-enabled: true

5. Push the configuration

[root@pcmk-1 drbd.d]# pcs cluster cib-push stonith_cfg
CIB updated
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Tue Jan 26 14:06:37 2016      Last change: Tue Jan 26 14:06:14 2016 by root via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started pcmk-1
 Master/Slave Set: drbd_data_clone [drbd_data]
     Masters: [ pcmk-1 ]
     Slaves: [ pcmk-2 ]
 drbd_fs    (ocf::heartbeat:Filesystem):    Started pcmk-1
 fence_vmware   (stonith:fence_vmware_soap):    Started pcmk-2

PCSD Status:
  pcmk-1: Online
  pcmk-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Fencing is now configured: in a split-brain situation the fence agent will automatically reboot one of the cluster nodes to resolve it. For example, if pcmk-1's network is taken down, the resources fail over to pcmk-2 and the pcmk-1 node is rebooted.
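
Besides pulling a node's network, the fence path can also be exercised directly through pacemaker. The command below manually fences (reboots) the named node, so only run it when a reboot of that node is acceptable:

[root@pcmk-1 ~]# pcs stonith fence pcmk-2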

Configuring the GFS2 Filesystem

1. Install the required packages

# run on both pcmk-1 and pcmk-2
[root@pcmk-1 ~]# yum install -y gfs2-utils dlm lvm2-cluster

2. Configure the DLM resource

[root@pcmk-1 drbd.d]# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
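
Once created, the dlm clone should start on both nodes (ocf:pacemaker:controld manages the dlm_controld daemon). A quick check:

[root@pcmk-1 drbd.d]# pcs status resources
[root@pcmk-1 drbd.d]# ps -ef | grep dlm_controld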

3. Create and apply the GFS2 filesystem
(1) Stop the existing drbd_fs resource

[root@pcmk-1 drbd.d]# pcs resource disable drbd_fs
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Wed Jan 27 09:18:42 2016      Last change: Wed Jan 27 09:18:39 2016 by root via crm_resource on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 9 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started pcmk-1
 Master/Slave Set: drbd_data_clone [drbd_data]
     Masters: [ pcmk-1 ]
     Slaves: [ pcmk-2 ]
 drbd_fs    (ocf::heartbeat:Filesystem):    (target-role:Stopped) Stopped

(2) Format the filesystem

[root@pcmk-1 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:drbd /dev/drbd0
It appears to contain an existing filesystem (xfs)
This will destroy any data on /dev/drbd0
Are you sure you want to proceed? [y/n]y
Device:                    /dev/drbd0
Block size:                4096
Device size:               3.50 GB (917467 blocks)
Filesystem size:           3.50 GB (917463 blocks)
Journals:                  2
Resource groups:           15
Locking protocol:          "lock_dlm"
Lock table:                "mycluster:drbd"
UUID:                      b36b86be-9d7b-f49b-17b8-590b82331b03
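
Note that the -t lock table must have the form <cluster-name>:<fs-name>, and the cluster-name part has to match the corosync cluster name (mycluster here), otherwise GFS2 will refuse to mount. If in doubt, read it back before formatting:

[root@pcmk-1 ~]# pcs property | grep cluster-name
 cluster-name: mycluster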

(3) Reconfigure drbd_fs for the cluster

[root@pcmk-1 ~]# pcs resource show drbd_fs
 Resource: drbd_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/drbd fstype=xfs 
  Meta Attrs: target-role=Stopped 
  Operations: start interval=0s timeout=60 (drbd_fs-start-interval-0s)
              stop interval=0s timeout=60 (drbd_fs-stop-interval-0s)
              monitor interval=20 timeout=40 (drbd_fs-monitor-interval-20)
# fstype=xfs has to be changed to fstype=gfs2
[root@pcmk-1 ~]# pcs resource update drbd_fs fstype=gfs2
[root@pcmk-1 ~]# pcs resource show drbd_fs
 Resource: drbd_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/drbd fstype=gfs2 
  Meta Attrs: target-role=Stopped 
  Operations: start interval=0s timeout=60 (drbd_fs-start-interval-0s)
              stop interval=0s timeout=60 (drbd_fs-stop-interval-0s)
              monitor interval=20 timeout=40 (drbd_fs-monitor-interval-20)

(4) Configure constraints
GFS2 requires DLM to be started first, so we add the following constraints:

[root@pcmk-1 ~]# pcs constraint colocation add drbd_fs with dlm-clone INFINITY
[root@pcmk-1 ~]# pcs constraint order dlm-clone then drbd_fs
Adding dlm-clone drbd_fs (kind: Mandatory) (Options: first-action=start then-action=start)
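
The resulting rules can be reviewed with pcs constraint; together with the constraints carried over from the primary/secondary post, drbd_fs now depends on both the DRBD master and dlm-clone:

[root@pcmk-1 ~]# pcs constraint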

4. Clone the cluster IP

[root@pcmk-1 drbd.d]# pcs cluster cib loadbalance_cfg
[root@pcmk-1 drbd.d]# pcs -f loadbalance_cfg resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true
[root@pcmk-1 drbd.d]#  pcs -f loadbalance_cfg resource update ClusterIP clusterip_hash=sourceip
[root@pcmk-1 drbd.d]# pcs cluster cib-push loadbalance_cfg
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Wed Jan 27 10:01:35 2016      Last change: Wed Jan 27 10:01:32 2016 by root via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 8 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 Master/Slave Set: drbd_data_clone [drbd_data]
     Masters: [ pcmk-1 ]
     Slaves: [ pcmk-2 ]
 drbd_fs    (ocf::heartbeat:Filesystem):    (target-role:Stopped) Stopped
 fence_vmware   (stonith:fence_vmware_soap):    Started pcmk-2
 Clone Set: dlm-clone [dlm]
     Started: [ pcmk-1 pcmk-2 ]
 Clone Set: ClusterIP-clone [ClusterIP] (unique)
     ClusterIP:0    (ocf::heartbeat:IPaddr2):   Started pcmk-1
     ClusterIP:1    (ocf::heartbeat:IPaddr2):   Started pcmk-2

PCSD Status:
  pcmk-1: Online
  pcmk-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
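
Behind the scenes, a globally-unique IPaddr2 clone with clusterip_hash=sourceip installs an iptables CLUSTERIP rule on each node, so clients are hashed by source IP and each node answers for a share of them. A rough way to confirm the rule is present (the exact rule text depends on the IPaddr2 version):

[root@pcmk-1 ~]# iptables -n -L INPUT | grep -i clusterip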

5. Clone the filesystem resource

[root@pcmk-1 drbd.d]# pcs cluster cib active_cfg
[root@pcmk-1 drbd.d]# pcs -f active_cfg resource clone drbd_fs

6. Update drbd_data_clone from primary/secondary to dual-primary

[root@pcmk-1 drbd.d]# pcs -f active_cfg resource update drbd_data_clone master-max=2
[root@pcmk-1 drbd.d]# pcs cluster cib-push active_cfg
CIB updated

7. Enable the cluster's drbd_fs resource

[root@pcmk-1 drbd.d]# pcs resource enable drbd_fs
[root@pcmk-1 drbd.d]# pcs status
Cluster name: mycluster
Last updated: Wed Jan 27 10:32:55 2016      Last change: Wed Jan 27 10:32:51 2016 by root via crm_resource on pcmk-1
Stack: corosync
Current DC: pcmk-1 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 9 resources configured

Online: [ pcmk-1 pcmk-2 ]

Full list of resources:

 Master/Slave Set: drbd_data_clone [drbd_data]
     Masters: [ pcmk-1 pcmk-2 ]
 fence_vmware   (stonith:fence_vmware_soap):    Started pcmk-2
 Clone Set: dlm-clone [dlm]
     Started: [ pcmk-1 pcmk-2 ]
 Clone Set: ClusterIP-clone [ClusterIP] (unique)
     ClusterIP:0    (ocf::heartbeat:IPaddr2):   Started pcmk-2
     ClusterIP:1    (ocf::heartbeat:IPaddr2):   Started pcmk-1
 Clone Set: drbd_fs-clone [drbd_fs]
     Started: [ pcmk-1 pcmk-2 ]


PCSD Status:
  pcmk-1: Online
  pcmk-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
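
With master-max=2 in effect, DRBD should now report Primary on both nodes. A quick check on either node (the exact format depends on the DRBD version, but it should look roughly like this, abridged):

[root@pcmk-1 ~]# cat /proc/drbd
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----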

Summary

1. After stopping drbd_fs, configuring GFS2, and re-enabling drbd_fs, cat /proc/drbd showed the DRBD resource on both nodes in the StandAlone state, unable to connect or synchronize. It was only resolved by shutting the cluster down and re-initializing and re-syncing DRBD, so be very careful when doing this in a production environment.
2. For DRBD dual-primary, "allow-two-primaries yes" must first be added to the DRBD resource configuration file (see the sketch after this list).
3. Choose a fence device appropriate for your environment.
4. Normally a cluster filesystem is built with GFS2 + clvm (lvm2-cluster) + DLM on top of LVM; here we followed the "Pacemaker-1.1-Clusters_from_Scratch-zh-CN" guide and therefore did not use clvm. See https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Global_File_System_2/ch-clustsetup-GFS2.html
5. The full configuration in this article follows http://clusterlabs.org/doc/zh-CN/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
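
For reference, a minimal sketch of the DRBD net options needed for dual-primary (DRBD 8.4 syntax; the resource name "data" and the split-brain policies below are illustrative, adjust them to the resource defined in the primary/secondary post):

# fragment of /etc/drbd.d/data.res -- hypothetical resource name
resource data {
    net {
        allow-two-primaries yes;
        # commonly suggested automatic split-brain policies for dual-primary:
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    # disks, addresses, etc. stay as in the primary/secondary setup
}

After editing the file, running drbdadm adjust data on both nodes applies the change without restarting DRBD.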
