legehappy

Building a highly available (HA) NFS cluster with NFS + DRBD + corosync + pacemaker (CentOS 7)

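Summary:

I. Environment overview

II. Installing and configuring corosync and pacemaker with pcs (pcs is only a management tool)

III. DRBD installation and configuration; see the earlier post "DRBD-MYSQL: high availability with a distributed block device"

http://legehappy.blog.51cto.com/13251607/1975804

IV. NFS installation and configuration

V. Installing crmsh and managing resources

VI. Testing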

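I. Environment overview:

This post builds on the previous one, "Corosync+pacemaker+DRBD+mysql (mariadb): a highly available (HA) MySQL cluster (CentOS 7)" (http://legehappy.blog.51cto.com/13251607/1976251). It occurred to me that NFS can use the same architecture to remove its single point of failure: NFS + DRBD + corosync + pacemaker gives a multi-node, highly available NFS cluster.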

System version:

[root@cml1 ~]# cat /etc/redhat-release

CentOS Linux release 7.2.1511 (Core)

[root@cml2 ~]# cat /etc/redhat-release

CentOS Linux release 7.2.1511 (Core)

Host mapping:

node1: cml1: 192.168.5.101

node2: cml2: 192.168.5.102

client: cml3: 192.168.5.104

Cluster prerequisites:

(1) Time synchronization

[root@cml1 ~]# ntpdate cn.pool.ntp.org

[root@cml2 ~]# ntpdate cn.pool.ntp.org

(2) The nodes can reach each other by hostname

[root@cml1 ~]# ssh-keygen

[root@cml1 ~]# ssh-copy-id cml2

[root@cml1 ~]# hostname

cml1

[root@cml1 ~]# cat /etc/hosts

192.168.5.101 cml1 www.cml1.com

192.168.5.102 cml2 www.cml2.com

192.168.5.104 cml3 www.cml3.com

192.168.5.105 cml4 www.cml4.com

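(3) Whether to use a quorum device.

Not needed on CentOS 7: corosync's votequorum provider has a two-node mode (two_node: 1 in the generated corosync.conf).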
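
The first two prerequisites can be checked with a small script. This is only a sketch: the function names `hosts_has_all` and `skew_within` are made up for illustration, and the 2-second tolerance in the usage note is an arbitrary assumption, not a value from the cluster documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the cluster prerequisites.

# hosts_has_all FILE NAME...
# Succeeds only if every NAME appears as a hostname or alias in FILE
# (FILE uses the /etc/hosts layout: address first, then names).
hosts_has_all() {
    local file=$1 name
    shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }     # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$file" || { echo "missing: $name"; return 1; }
    done
}

# skew_within T1 T2 MAX
# Succeeds if two epoch timestamps differ by at most MAX seconds.
skew_within() {
    local d=$(( $1 - $2 ))
    [ "${d#-}" -le "$3" ]     # ${d#-} drops the sign
}
```

On cml1 this could be used as `hosts_has_all /etc/hosts cml1 cml2 && skew_within "$(date +%s)" "$(ssh cml2 date +%s)" 2`.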

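II. Installing and configuring corosync and pacemaker with pcs (pcs is only a management tool)

1. Run on both nodes:

[root@cml1 ~]# yum install -y pacemaker pcs psmisc policycoreutils-python

2. Start pcsd on both nodes and enable it at boot:

[root@cml1 ~]# systemctl start pcsd.service
[root@cml1 ~]# systemctl enable pcsd.service

3. Set the password of the hacluster user on both nodes (the user name is fixed and cannot be changed):

[root@cml1 ~]# echo redhat | passwd --stdin hacluster

4. Authenticate the cluster hosts with pcs (by default this uses the hacluster user and its password):

[root@cml1 corosync]# pcs cluster auth cml1 cml2    ## the nodes to authenticate
cml1: Already authorized
cml2: Already authorized

5. Create the cluster from the two nodes:

[root@cml1 corosync]# pcs cluster setup --name mycluster cml1 cml2 --force    ## set up the cluster

6. The corosync configuration file has now been generated on the node: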

[root@cml1 corosync]# ls

corosync.conf corosync.conf.example corosync.conf.example.udpu corosync.xml.example uidgid.d

# corosync.conf has been generated.

7. Look at the generated file:

[root@cml1 corosync]# cat corosync.conf
totem {
   version: 2
   secauth: off
   cluster_name: webcluster
   transport: udpu
}
 
nodelist {
   node {
       ring0_addr: cml1
       nodeid: 1
    }
 
   node {
       ring0_addr: cml2
       nodeid: 2
    }
}
 
quorum {
   provider: corosync_votequorum
   two_node: 1
}
 
logging {
   to_logfile: yes
   logfile: /var/log/cluster/corosync.log
   to_syslog: yes
}
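
As a quick sanity check on the generated file, the node list and the two-node quorum setting can be pulled out with awk. The sketch below assumes the simple `key: value` layout shown above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a corosync.conf like the one above.

# corosync_nodes FILE
# Prints the ring0_addr value of every node entry, one per line.
corosync_nodes() {
    awk '$1 == "ring0_addr:" { print $2 }' "$1"
}

# corosync_two_node FILE
# Succeeds if the file contains "two_node: 1".
corosync_two_node() {
    awk '$1 == "two_node:" && $2 == 1 { ok = 1 } END { exit ok ? 0 : 1 }' "$1"
}
```

For example, `corosync_nodes /etc/corosync/corosync.conf` should list cml1 and cml2 on both machines if the setup step distributed the same file to each node.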

8. Start the cluster:

[root@cml1 corosync]# pcs cluster start --all
cml1: Starting Cluster...
cml2: Starting Cluster...
## This starts both pacemaker and corosync:
[root@cml1 corosync]# ps -ef | grep corosync
root     57490      1  1 21:47 ?        00:00:52 corosync
root     75893  51813  0 23:12 pts/0    00:00:00 grep --color=auto corosync
[root@cml1 corosync]# ps -ef | grep pacemaker
root     57502      1  0 21:47 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+ 57503  57502  0 21:47 ?        00:00:03 /usr/libexec/pacemaker/cib
root     57504  57502  0 21:47 ?        00:00:00 /usr/libexec/pacemaker/stonithd
root     57505  57502  0 21:47 ?        00:00:01 /usr/libexec/pacemaker/lrmd
haclust+ 57506  57502  0 21:47 ?        00:00:01 /usr/libexec/pacemaker/attrd
haclust+ 57507  57502  0 21:47 ?        00:00:00 /usr/libexec/pacemaker/pengine
haclust+ 57508  57502  0 21:47 ?        00:00:01 /usr/libexec/pacemaker/crmd
root     75938  51813  0 23:12 pts/0    00:00:00 grep --color=auto pacemaker

9. Check the cluster status ("no faults" means OK)

[root@cml1 corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
       id    = 192.168.5.101
       status     = ring 0 active with no faults
[root@cml1 corosync]# ssh cml2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
       id    = 192.168.5.102
       status     = ring 0 active with no faults
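
The "no faults" check can be scripted instead of read by eye. The sketch below assumes the output format shown above (one `status = ...` line per ring); `rings_healthy` is a hypothetical name.

```shell
#!/usr/bin/env bash
# rings_healthy: reads `corosync-cfgtool -s` output on stdin.
# Succeeds only if at least one status line was seen and every
# status line reports "no faults".
rings_healthy() {
    awk '
        $1 == "status" { seen++; if ($0 !~ /no faults/) bad++ }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}
```

A possible use from cml1: `for n in cml1 cml2; do ssh "$n" corosync-cfgtool -s | rings_healthy || echo "$n: ring problem"; done`.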

10. Check the cluster configuration for errors:

[root@cml1 corosync]# crm_verify -L -V
   error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

## No STONITH device is configured here, so STONITH is disabled below

11. Disable STONITH:

[root@cml1 corosync]# pcs property set stonith-enabled=false
[root@cml1 corosync]# crm_verify -L -V
[root@cml1 corosync]# pcs property list
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: mycluster
 dc-version: 1.1.16-12.el7_4.2-94ff4df
 have-watchdog: false
 stonith-enabled: false
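
Since the verifier itself warns that clusters with shared data need STONITH, it may be useful to have a scripted way to see how the property is currently set. A minimal sketch, assuming the `name: value` listing format shown above; `cluster_property` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cluster_property NAME: reads `pcs property list` output on stdin and
# prints the value of NAME; fails if the property is not listed.
cluster_property() {
    awk -v key="$1:" '
        $1 == key { print $2; found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

For example, `pcs property list | cluster_property stonith-enabled` would print `false` for the listing above.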

III. DRBD installation and configuration

See the earlier post "DRBD-MYSQL: high availability with a distributed block device": http://legehappy.blog.51cto.com/13251607/1975804

[root@cml1 drbd.d]# cat nfs.res
resource nfs {
protocol C;
meta-disk internal;
device /dev/drbd1;
syncer {
verify-alg sha1;
}
net {
allow-two-primaries;
}
on cml1 {
disk /dev/sdb1;
address 192.168.5.101:7789;
}
on cml2 {
disk /dev/sdb1;
address 192.168.5.102:7789;
}
}
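
Both nodes need the same resource file, with one `on <host>` section per node. The sketch below prints the per-host disk and address so the two copies can be compared; it assumes each `on` section lists `disk` before `address`, as in the file above, and `drbd_peers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# drbd_peers FILE
# Prints "<host> <disk> <address>" for every `on <host> { ... }` section.
drbd_peers() {
    awk '
        $1 == "on"      { host = $2 }
        $1 == "disk"    { disk = $2; sub(/;$/, "", disk) }
        $1 == "address" { addr = $2; sub(/;$/, "", addr); print host, disk, addr }
    ' "$1"
}
```

For example, `diff <(drbd_peers /etc/drbd.d/nfs.res) <(ssh cml2 cat /etc/drbd.d/nfs.res | drbd_peers /dev/stdin)` should print nothing when the two nodes agree (the `/etc/drbd.d` path is assumed from the prompt above).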

IV. NFS installation and configuration:

## Set up the NFS service on node1 and node2:

[root@cml1 ~]# yum install  nfs-utils -y
[root@cml1 ~]# systemctl enable nfs-server
[root@cml1 ~]# systemctl start nfs-server
[root@cml1 ~]# systemctl start rpcbind
[root@cml1 ~]# systemctl enable rpcbind

## Create the mount point (the exported directory):

[root@cml1 ~]# cat /etc/exports
/nfs_data 192.168.5.0/24(rw,sync)
[root@cml1 ~]# mkdir /nfs_data
[root@cml2 ~]# cat /etc/exports
/nfs_data 192.168.5.0/24(rw,sync)
[root@cml2 ~]# mkdir /nfs_data
[root@cml1 ~]# systemctl restart nfs-server
[root@cml2 ~]# systemctl restart nfs-server
## From the client, check the exported directory:
[root@cml3 ~]# showmount -e 192.168.5.101
Export list for 192.168.5.101:
/nfs_data 192.168.5.0/24
[root@cml3 ~]# showmount -e 192.168.5.102
Export list for 192.168.5.102:
/nfs_data 192.168.5.0/24
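
The same client-side check can be scripted for both servers. A minimal sketch, assuming the `showmount -e` output format shown above (one header line, then one export per line); `export_listed` is a hypothetical name.

```shell
#!/usr/bin/env bash
# export_listed DIR: reads `showmount -e <server>` output on stdin and
# succeeds if DIR appears in the export list.
export_listed() {
    awk -v dir="$1" '
        NR > 1 && $1 == dir { ok = 1 }    # NR > 1 skips the header line
        END { exit ok ? 0 : 1 }
    '
}
```

On cml3: `for s in 192.168.5.101 192.168.5.102; do showmount -e "$s" | export_listed /nfs_data || echo "$s: /nfs_data not exported"; done`.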

V. Installing crmsh and managing resources

1. Install crmsh:

The cluster can be managed with crmsh (download it from GitHub, unpack it and install it directly). Installing it on one node is enough, although installing it on both nodes makes testing easier.

[root@cml1 ~]# cd /usr/local/src/
[root@cml1 src]# ls
nginx-1.12.0         php-5.5.38.tar.gz
crmsh-2.3.2.tar  nginx-1.12.0.tar.gz  zabbix-3.2.7.tar.gz
[root@cml1 src]# tar -xf crmsh-2.3.2.tar
[root@cml1 src]# cd crmsh-2.3.2
[root@cml1 crmsh-2.3.2]# python setup.py install

2. Manage the cluster with crmsh:

[root@cml1 ~]# crm help

Help overview for crmsh

Available topics:

    Overview                 Help overview for crmsh
    Topics                   Available topics
    Description              Program description
    CommandLine              Command line options
    Introduction             Introduction
    Interface                User interface
    Completion               Tab completion
    Shorthand                Shorthand syntax
    Features                 Features
    Shadows                  Shadow CIB usage
    Checks                   Configuration semantic checks
    Templates                Configuration templates
    Testing                  Resource testing
    Security                 Access Control Lists (ACL)
    Resourcesets             Syntax: Resource sets
    AttributeListReferences  Syntax: Attribute list references
    AttributeReferences      Syntax: Attribute references
    RuleExpressions          Syntax: Rule expressions
    Lifetime                 Lifetime parameter format
    Reference                Command reference

3. Use crm to configure the DRBD + NFS + corosync + pacemaker HA cluster:

## First stop the nfs and drbd services; the cluster will manage them from now on

[root@cml1 ~]# systemctl stop nfs-server
[root@cml1 ~]# systemctl stop drbd
[root@cml2 ~]# systemctl stop nfs-server
[root@cml2 ~]# systemctl stop drbd

[root@cml1 ~]# crm
crm(live)# status
Stack: corosync
Current DC: cml1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Oct 26 08:52:49 2017
Last change: Thu Oct 26 08:51:45 2017 by root via cibadmin on cml1

2 nodes configured
0 resources configured

Online: [ cml1 cml2 ]

No resources
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# property migration-limit=1   ### meant to hand the service to the other node after one failed attempt; note that in Pacemaker this behaviour is normally set with the migration-threshold resource default, while migration-limit caps the number of parallel migration jobs per node
crm(live)configure# primitive nfsdrbd ocf:linbit:drbd params drbd_resource=nfs op start timeout=240 op stop timeout=100 op monitor role=Master interval=20
crm(live)configure# ms ms_nfsdrbd nfsdrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(live)configure# verify

2. Add the filesystem (mount) resource:

crm(live)configure# primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd1 directory=/nfs_data fstype=ext4 op start timeout=60 op stop timeout=60
crm(live)configure# colocation mystore_with_ms_nfsdrbd inf: mystore ms_nfsdrbd:Master
crm(live)configure# order ms_nfsdrbd_befor_mystore Mandatory: ms_nfsdrbd mystore
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Stack: corosync
Current DC: cml1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Oct 26 21:08:41 2017
Last change: Thu Oct 26 21:08:38 2017 by root via cibadmin on cml1
 
2 nodes configured
3 resources configured
 
Online: [ cml1 cml2 ]
 
Full list of resources:
 
 Master/Slave Set: ms_nfsdrbd [nfsdrbd]
    Masters: [ cml2 ]
    Slaves: [ cml1 ]
 mystore (ocf::heartbeat:Filesystem):  Started cml2
[root@cml2 ~]# df -TH
Filesystem              Type      Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs        19G 6.7G   13G  36% /
devtmpfs                devtmpfs  501M    0  501M   0% /dev
tmpfs                   tmpfs     512M 278M  234M  55% /dev/shm
tmpfs                   tmpfs     512M  27M  486M   6% /run
tmpfs                   tmpfs     512M    0 512M   0% /sys/fs/cgroup
/dev/sda1               xfs       521M 161M  361M  31% /boot
tmpfs                   tmpfs     103M    0  103M   0% /run/user/0
/dev/drbd1              ext4       11G  69M  9.9G   1% /nfs_data

3. Add the nfs_server resource:

crm(live)configure# primitive nfs_server systemd:nfs-server op start timeout=100 interval=0 op stop timeout=100 interval=0
crm(live)configure# verify
crm(live)configure# colocation nfs_server_with_mystore inf: nfs_server mystore
crm(live)configure# order mystore_befor_nfs Mandatory: mystore nfs_server
crm(live)configure# show
node 1: cml1 \
        attributes standby=off
node 2: cml2 \
        attributes standby=off
primitive mystore Filesystem \
        params device="/dev/drbd1" directory="/nfs_data" fstype=ext4 \
        op start timeout=60 interval=0 \
        op stop timeout=60 interval=0
primitive nfs_server systemd:nfs-server \
        op start timeout=100 interval=0 \
        op stop timeout=100 interval=0
primitive nfsdrbd ocf:linbit:drbd \
        params drbd_resource=nfs \
        op start timeout=240 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor role=Master interval=20 timeout=30 \
        op monitor role=Slave interval=30 timeout=30
ms ms_nfsdrbd nfsdrbd \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order ms_nfsdrbd_befor_mystore Mandatory: ms_nfsdrbd mystore
order mystore_befor_nfs Mandatory: mystore nfs_server
colocation mystore_with_ms_nfsdrbd inf: mystore ms_nfsdrbd:Master
colocation nfs_server_with_mystore inf: nfs_server mystore
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-12.el7_4.4-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=webcluster \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        migration-limit=1

4. Add the virtual IP (VIP):

crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.5.200 op monitor interval=20 timeout=20 on-fail=restart
crm(live)configure# verify
crm(live)configure# colocation vip_with_nfs inf: vip nfs_server
crm(live)configure# verify
crm(live)configure# show
node 1: cml1 \
        attributes standby=off
node 2: cml2 \
        attributes standby=off
primitive mystore Filesystem \
        params device="/dev/drbd1" directory="/nfs_data" fstype=ext4 \
        op start timeout=60 interval=0 \
        op stop timeout=60 interval=0
primitive nfs_server systemd:nfs-server \
        op start timeout=100 interval=0 \
        op stop timeout=100 interval=0
primitive nfsdrbd ocf:linbit:drbd \
        params drbd_resource=nfs \
        op start timeout=240 interval=0 \
        op stop timeout=100 interval=0 \
        op monitor role=Master interval=20 timeout=30 \
        op monitor role=Slave interval=30 timeout=30
primitive vip IPaddr \
        params ip=192.168.5.200 \
        op monitor interval=20 timeout=20 on-fail=restart
ms ms_nfsdrbd nfsdrbd \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order ms_nfsdrbd_befor_mystore Mandatory: ms_nfsdrbd mystore
order mystore_befor_nfs Mandatory: mystore nfs_server
colocation mystore_with_ms_nfsdrbd inf: mystore ms_nfsdrbd:Master
colocation nfs_server_with_mystore inf: nfs_server mystore
colocation vip_with_nfs inf: vip nfs_server
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-12.el7_4.4-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=webcluster \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        migration-limit=1
crm(live)configure# commit
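
The interactive session can also be captured as a file so that it is repeatable. The sketch below only prints the resource and constraint definitions matching the `show` output; `nfs_ha_config` is a hypothetical helper, and loading its output with `crm configure load update <file>` reflects my understanding of crmsh batch loading rather than something the original session did.

```shell
#!/usr/bin/env bash
# nfs_ha_config DRBD_RES DRBD_DEV MOUNT_DIR VIP
# Prints a crmsh configuration equivalent to the interactive session.
nfs_ha_config() {
    local res=$1 dev=$2 dir=$3 vip=$4
    # In an unquoted here-document "\\" produces one literal backslash,
    # which crmsh reads as a line continuation.
    cat <<EOF
primitive nfsdrbd ocf:linbit:drbd \\
    params drbd_resource=$res \\
    op start timeout=240 interval=0 \\
    op stop timeout=100 interval=0 \\
    op monitor role=Master interval=20 timeout=30 \\
    op monitor role=Slave interval=30 timeout=30
ms ms_nfsdrbd nfsdrbd \\
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
primitive mystore ocf:heartbeat:Filesystem \\
    params device=$dev directory=$dir fstype=ext4 \\
    op start timeout=60 interval=0 \\
    op stop timeout=60 interval=0
primitive nfs_server systemd:nfs-server \\
    op start timeout=100 interval=0 \\
    op stop timeout=100 interval=0
primitive vip ocf:heartbeat:IPaddr \\
    params ip=$vip \\
    op monitor interval=20 timeout=20 on-fail=restart
colocation mystore_with_ms_nfsdrbd inf: mystore ms_nfsdrbd:Master
order ms_nfsdrbd_befor_mystore Mandatory: ms_nfsdrbd mystore
colocation nfs_server_with_mystore inf: nfs_server mystore
order mystore_befor_nfs Mandatory: mystore nfs_server
colocation vip_with_nfs inf: vip nfs_server
EOF
}
```

For the values used in this post: `nfs_ha_config nfs /dev/drbd1 /nfs_data 192.168.5.200 > nfs-ha.crm`. The colocation and order chain is what keeps the DRBD master, the filesystem, the NFS server and the VIP together on one node and starts them in that order.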

5. Check the node status:

crm(live)# status
Stack: corosync
Current DC: cml1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Thu Oct 26 21:14:37 2017
Last change: Thu Oct 26 21:14:22 2017 by root via cibadmin on cml1
 
2 nodes configured
5 resources configured
 
Online: [ cml1 cml2 ]
 
Full list of resources:
 
 Master/Slave Set: ms_nfsdrbd [nfsdrbd]
    Masters: [ cml2 ]
    Slaves: [ cml1 ]
 mystore (ocf::heartbeat:Filesystem):  Started cml2
 nfs_server    (systemd:nfs-server):    Started cml2
 vip (ocf::heartbeat:IPaddr): Started cml2
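
Because of the colocation constraints, the DRBD master and every started resource should sit on the same node. The sketch below checks this from the status text; it assumes the `Masters: [ node ]` and `Started node` layout shown above, and `active_node` is a hypothetical name.

```shell
#!/usr/bin/env bash
# active_node: reads `crm status` output on stdin. Prints the single node
# that holds the DRBD Master role and runs every started resource; fails
# if no such node is found or if more than one node is involved.
active_node() {
    awk '
        /Masters: \[/ { nodes[$3] = 1 }     # "Masters: [ cml2 ]"
        /Started/     { nodes[$NF] = 1 }    # "... Started cml2"
        END {
            n = 0
            for (k in nodes) { n++; last = k }
            if (n == 1) { print last; exit 0 }
            exit 1
        }
    '
}
```

For the status above, `crm status | active_node` would print `cml2`.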

VI. Testing:

[root@cml2 ~]# df -TH
Filesystem              Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root xfs        19G  6.7G   13G  36% /
devtmpfs                devtmpfs  501M     0  501M   0% /dev
tmpfs                   tmpfs     512M  278M  234M  55% /dev/shm
tmpfs                   tmpfs     512M   27M  486M   6% /run
tmpfs                   tmpfs     512M     0  512M   0% /sys/fs/cgroup
/dev/sda1               xfs       521M  161M  361M  31% /boot
tmpfs                   tmpfs     103M     0  103M   0% /run/user/0
/dev/drbd1              ext4       11G   69M  9.9G   1% /nfs_data

[root@cml2 ~]# ip addr

2: ens34: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:5a:c5:ee brd ff:ff:ff:ff:ff:ff
    inet 192.168.5.102/24 brd 192.168.5.255 scope global ens34
       valid_lft forever preferred_lft forever
    inet 192.168.5.200/24 brd 192.168.5.255 scope global secondary ens34
       valid_lft forever preferred_lft forever

### The VIP is now on host cml2

[root@cml3 ~]# showmount -e 192.168.5.200

Export list for 192.168.5.200:

/nfs_data 192.168.5.0/24

[root@cml3 ~]# mkdir /nfs

[root@cml3 ~]# mount -t nfs 192.168.5.200:/nfs_data/ /nfs

[root@cml3 ~]# df -TH
Filesystem              Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root xfs        19G  6.6G   13G  35% /
devtmpfs                devtmpfs  503M     0  503M   0% /dev
tmpfs                   tmpfs     513M     0  513M   0% /dev/shm
tmpfs                   tmpfs     513M   14M  500M   3% /run
tmpfs                   tmpfs     513M     0  513M   0% /sys/fs/cgroup
/dev/sda1               xfs       521M  131M  391M  25% /boot