Corosync
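Corosync started out as an application that demonstrated the OpenAIS cluster framework interface specification. It carries HA heartbeat messages, is one of many HA cluster implementations, and can be regarded as part of OpenAIS. Its later development went well beyond what was originally planned, and more and more vendors have adopted corosync as a cluster solution; Red Hat's RHCS cluster suite, for example, is built on corosync.
Corosync provides only the message layer (that is, heartbeat + CCM) and does not itself provide a CRM; Pacemaker is normally used for resource management.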
Pacemaker
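Pacemaker is an open-source high-availability cluster resource manager (CRM). It sits at the resource-management and resource-agent (RA) layer of the HA cluster architecture. It cannot transmit heartbeat messages itself; to talk to peer nodes it relies on the underlying messaging service to deliver its messages.
Architecture diagram of corosync and pacemaker
Corosync implements the message layer of the cluster: it delivers cluster heartbeats and cluster transaction messages.
Pacemaker manages the cluster's resources (CRM). The component that actually starts and stops clustered services is the RA (resource agent). RAs come in the following classes: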
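LSB: scripts under /etc/rc.d/init.d/*; must support at least start, stop, restart, status, reload, and force-reload;
Note: the service must not be started automatically at boot; the CRM starts it //this class is used on CentOS 6
OCF: /usr/lib/ocf/resource.d/provider/; similar to LSB scripts, but supporting start, stop, status, monitor, and meta-data;
STONITH: drives STONITH (fencing) devices
systemd: unit files under /usr/lib/systemd/system/
Note: the service must be enabled (started at boot); //supported on CentOS 7
service: calls user-defined scripts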
Lab environment:
VM IP: 172.18.250.77 node1.magedu.com CentOS7
VM IP: 172.18.250.78 node2.magedu.com CentOS7
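I. Installing corosync and pacemaker
Before installing, confirm that the nodes' clocks are synchronized, that the firewall and SELinux will not get in the way, that the nodes can reach each other by hostname, and that they can log in to each other with SSH keys.
]# hostname node1.magedu.com //on node1
]# hostname node2.magedu.com //on node2
]# ntpdate 172.18.0.1 //synchronize both nodes against a time server
]# systemctl stop firewalld.service //stop the firewall
]# getenforce //make sure SELinux is disabled
Disabled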
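The hostname requirement above can be checked before installing. The following is a minimal sketch that assumes the nodes resolve each other through a hosts-format file such as /etc/hosts (an assumption; the original does not say how names are resolved). The function name `hosts_has_entry` is made up for this example.

```shell
# hosts_has_entry FILE NAME
# Succeed if NAME appears as a hostname or alias (any field after the
# address) on a non-comment line of a hosts-format FILE.
hosts_has_entry() {
    awk -v name="$2" '
        {
            sub(/#.*/, "")                 # drop comments; awk re-splits the fields
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }' "$1"
}
```

On each node one might run `hosts_has_entry /etc/hosts node2.magedu.com || echo "node2 missing from /etc/hosts"`. A check through DNS would need a different test.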
Install the packages:
]# yum -y install corosync pacemaker
]# rpm -ql corosync
/etc/corosync
/etc/corosync/corosync.conf.example //sample corosync configuration file
/etc/corosync/corosync.conf.example.udpu //sample configuration for UDP unicast
/etc/corosync/corosync.xml.example //sample configuration in XML format
/usr/sbin/corosync //the daemon binary
]# vim /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: mycluster //cluster name
    crypto_cipher: aes128 //symmetric cipher
    crypto_hash: sha1 //hash algorithm
    interface {
        ringnumber: 0 //ring number; distinguishes rings when a host has several NICs so heartbeat traffic does not loop back
        bindnetaddr: 172.18.0.0 //network to bind the heartbeat to; corosync works out which local IP address belongs to this network and uses that interface for multicast heartbeat traffic
        mcastaddr: 239.25.1.1 //multicast address for heartbeats (must be identical on all nodes)
        mcastport: 5405 //multicast port
        ttl: 1 //heartbeat packets travel only one hop, which avoids multicast loops
    }
}
logging { //logging; the defaults are fine
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: no //whether to log to syslog
    debug: off
    timestamp: on //whether to log timestamps; helps locate errors, but costs extra system calls and CPU
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum { //voting system
    provider: corosync_votequorum //quorum provider to use
    # expected_votes: 8 //total number of votes
    two_node: 1 //special handling for a two-node cluster
}
nodelist { //node list
    node {
        ring0_addr: 172.18.250.77 //node IP
        nodeid: 1 //node ID
    }
    node {
        ring0_addr: 172.18.250.78
        nodeid: 2
    }
}
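The bindnetaddr comment says corosync picks the local address that falls inside the given network. As a rough illustration only (this is not corosync's code), the sketch below computes the network an IPv4 address belongs to. The function names and the 255.255.0.0 netmask are assumptions chosen for the example; the original does not state the netmask.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
# The body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# int_to_ip N: print a 32-bit integer as a dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# network_of IP NETMASK: print the network address (IP AND NETMASK).
network_of() {
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

network_of 172.18.250.77 255.255.0.0   # prints 172.18.0.0
```

Both lab nodes map to 172.18.0.0 under this netmask, which is the value used for bindnetaddr.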
Create the shared key used to authenticate heartbeat messages between the cluster nodes:
]# corosync-keygen --help
Usage: corosync-keygen [-k] [-l]
    -k / --key-file= - Write to the specified keyfile instead of the default /etc/corosync/authkey.
    -l / --less-secure - Use a less secure random number source (/dev/urandom) that is guaranteed not to require user input for entropy. This can be used when this application is used from a script.
]# corosync-keygen -l //generate a key from /dev/urandom (less secure, but needs no keyboard input)
]# cd /etc/corosync/
]# ll
total 16
-r-------- 1 root root 128 May 29 18:16 authkey //created in /etc/corosync/; make sure its mode is 400
#Copy the key and the configuration file to the other node:
]# scp -p authkey corosync.conf [email protected]:/etc/corosync/
]# systemctl start corosync.service pacemaker.service
]# ss -uan //confirm the ports are open
State   Recv-Q Send-Q Local Address:Port   Peer Address:Port
UNCONN  0      0      172.18.250.77:5404   *:*
UNCONN  0      0      172.18.250.77:5405   *:*
UNCONN  0      0      239.25.1.1:5405      *:*
#After startup, run a series of checks to confirm that each component is working:
]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log //run the same command on the other node to confirm that the Cluster Engine started properly
]# tail /var/log/cluster/corosync.log //check that membership between the two corosync nodes has been initialized; the nodes should start synchronizing cluster transaction information
]# grep "TOTEM" /var/log/cluster/corosync.log //check that the initial membership notifications were sent
]# grep ERROR /var/log/cluster/corosync.log //check for errors during startup
]# crm_mon //show the cluster nodes
Last updated: Sun May 29 21:53:09 2016
Last change: Sun May 29 20:09:08 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured
Online: [ node1.magedu.com node2.magedu.com ]
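The listing above notes that authkey should have mode 400. A small sketch of that check follows; it assumes GNU `stat` (as shipped with CentOS), and the function name `check_key_mode` is made up for this example.

```shell
# check_key_mode FILE
# Succeed only if FILE is a regular file whose permission bits are exactly 400.
check_key_mode() {
    [ -f "$1" ] || return 1
    mode=$(stat -c '%a' "$1") || return 1   # GNU stat: octal permission bits
    [ "$mode" = "400" ]
}
```

On each node this could be run as `check_key_mode /etc/corosync/authkey || echo "authkey missing or not mode 400"`. `scp -p` preserves the mode, so the copy on the second node should pass as well.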
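Corosync is now working, and the next step is to configure the service resources. Pacemaker is only a resource manager and ships no management interface of its own, so a separate CRM management interface is needed. There are two kinds:
CLI: command-line interfaces
crmsh (SUSE)
pcs
GUI: graphical interfaces
HB_GUI
Conga (luci/ricci): web interface
crmsh provides an interactive command-line interface for managing Pacemaker clusters. It is powerful, easy to use, and widely deployed; pcs is a similar tool. Note: configuration made through the crm interface is synchronized to every node.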
II. Installing crmsh
]# ls
crmsh-2.1.4-1.1.x86_64.rpm pssh-2.3.1-4.2.x86_64.rpm python-pssh-2.3.1-4.2.x86_64.rpm
]# yum -y install *.rpm
]# systemctl start corosync.service pacemaker.service
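Characteristics of crm:
1. Configuration changes take effect only after they are committed with commit;
2. A resource must be stopped before it can be deleted;
3. help COMMAND shows the help for that command;
4. TAB completion works as on the Linux command line.
crm can be used in two ways:
1. Command-line mode: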
]# crm status
Last updated: Mon May 30 10:07:02 2016
Last change: Mon May 30 10:06:50 2016 by hacluster via crmd on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured
Online: [ node1.magedu.com node2.magedu.com ]
2. Interactive mode:
]# crm
crm(live)# status
Last updated: Mon May 30 10:07:08 2016
Last change: Mon May 30 10:06:50 2016 by hacluster via crmd on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured
Online: [ node1.magedu.com node2.magedu.com ]
crm(live)# help
cib         manage shadow CIBs //CIB management
resource    resources management //start, stop, and otherwise manage resources
configure   CRM cluster configuration //edit the cluster configuration
node        nodes management //cluster node management
history     CRM cluster history
site        Geo-cluster support
ra          resource agents information center //resource agent subcommands (everything related to resource agents lives here)
status      show cluster status //show the current cluster status
template    Edit and import a configuration from a template //edit or import a configuration template
script      Cluster script management //cluster script management
Commonly used crm management commands:
1. configure: manage resource constraints, resource stickiness, and resource types. CIB: the Cluster Information Base, the file that stores the cluster configuration and is propagated to all nodes.
crm(live)# configure
crm(live)configure# help
acl_target        Define target access rights
_test             Help for command _test
clone             Define a clone //define a clone resource
colocation        Colocate resources //define a colocation constraint
commit            Commit the changes to the CIB //save the configuration to the CIB
default-timeouts  Set timeouts for operations to minimums from the meta-data
delete            Delete CIB objects //delete a CIB object
edit              Edit CIB objects //edit the configuration
erase             Erase the CIB
fencing_topology  Node fencing order //order in which nodes are fenced
filter            Filter CIB objects //filter CIB objects
graph             Generate a directed graph
group             Define a group //define a group
load              Import the CIB from a file //import the CIB from a file
location          A location preference //define a location constraint (which node a resource prefers; when several constraints apply, the resource runs on the node with the highest score)
modgroup          Modify group
monitor           Add monitor operation to a primitive //monitor a resource
ms                Define a master-slave resource
node              Define a cluster node //define a cluster node
op_defaults       Set resource operations defaults
order             Order resources //order in which resources start
primitive         Define a resource //define a resource
property          Set a cluster property //set a cluster-wide property
verify            verify the CIB with crm_verify //check the CIB syntax
show              display CIB objects //display the CIB configuration
rsc_defaults      set resource defaults //set resource defaults (for example stickiness)
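Resource constraints (the default score is 0):
1. location: location constraint; expresses how strongly a resource prefers a node
2. colocation: expresses whether resources should be kept "together", that is, run on the same node
3. order: the order in which resources are started and stopped
Resource types:
primitive: basic resource; runs on only one node
group: group resource; bundles all the resources needed for one HA service
clone: clone; several copies of the same resource can exist and run on several nodes
multi-state (master/slave): a special kind of clone whose copies have a master/slave relationship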
2. resource: manage the state of resources:
crm(live)resource# help
cleanup      Cleanup resource status //clean up a resource's status
demote       Demote a master-slave resource //demote a resource
failcount    Manage failcounts //manage a resource's failure count
help         Show help (help topics for list of topics)
ls           List levels and commands //list levels and commands
maintenance  Enable/disable per-resource maintenance mode
manage       Put a resource into managed mode
meta         Manage a meta attribute //manage a resource's meta attributes
migrate      Migrate a resource to another node //force a resource to move
param        Manage a parameter of a resource //manage a resource's parameters
promote      Promote a master-slave resource //promote a resource
quit         Exit the interactive shell //quit
refresh      Refresh CIB from the LRM status
reprobe      Probe for resources not started by the CRM
restart      Restart a resource //restart a resource
scores       Display resource scores
secret       Manage sensitive parameters
start        Start a resource
status       Show status of resources //show resource status
stop         Stop a resource
trace        Start RA tracing //enable RA tracing
unmanage     Put a resource into unmanaged mode
unmigrate    Unmigrate a resource to another node
untrace      Stop RA tracing
up           Go back to previous level
utilization  Manage a utilization attribute
3. node: manage nodes:
crm(live)# node
crm(live)node# help
attribute    Manage attributes
cd           Navigate the level structure
clearstate   Clear node state //clear a node's state
delete       Delete node //delete a node
fence        Fence node //fence a node
help         Show help (help topics for list of topics)
ls           List levels and commands
maintenance  Put node into maintenance mode
online       Set node online //bring a node online
quit         Exit the interactive shell
ready        Put node into ready mode
show         Show node //show node information
standby      Put node into standby //take a node out of service
status       Show nodes' status as XML //show node information as XML
status-attr  Manage status attributes
up           Go back to previous level
utilization  Manage utilization attributes
4. ra: resource agents, which actually start and stop the services
crm(live)ra# help
cd         Navigate the level structure
classes    List classes and providers //list resource agent classes
help       Show help (help topics for list of topics)
info       Show meta data for a RA //show a resource agent's attributes
list       List RA for a class (and provider) //list the services an RA class can manage
ls         List levels and commands
providers  Show providers for a RA and a class
quit       Exit the interactive shell
up         Go back to previous level
Example: making httpd highly available
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=172.18.250.79 op monitor interval=20 timeout=20
crm(live)configure# primitive webserver systemd:httpd op monitor interval=20 timeout=20
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status
Last updated: Mon May 30 10:49:44 2016
Last change: Mon May 30 10:49:27 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ node1.magedu.com node2.magedu.com ]
vip (ocf::heartbeat:IPaddr): Started node1.magedu.com
webserver (systemd:httpd): Started node2.magedu.com //webserver was placed on node2 automatically; without constraints, pacemaker spreads resources across the nodes
crm(live)configure# group webservice vip webserver //define a group so both resources stay together
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status //both resources are now on node1
Last updated: Mon May 30 10:52:53 2016
Last change: Mon May 30 10:52:31 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
    vip (ocf::heartbeat:IPaddr): Started node1.magedu.com
    webserver (systemd:httpd): Started node1.magedu.com
Test whether httpd is now highly available:
Manually take node1 out of service:
]# crm node standby //run on 172.18.250.77 to put node1 into standby
The resources moved to node2, so httpd stays available:
crm(live)# status
Last updated: Mon May 30 10:59:04 2016
Last change: Mon May 30 10:57:45 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
Resource Group: webservice
    vip (ocf::heartbeat:IPaddr): Started node2.magedu.com
    webserver (systemd:httpd): Started node2.magedu.com
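After a standby test the node still has to be brought back, which the `online` subcommand listed in the node help does. A minimal sketch of the whole round trip follows; the wrapper name `standby_roundtrip` is made up for this example, and it assumes `crm` is installed and run as root on a cluster node.

```shell
# standby_roundtrip NODE
# Put NODE into standby, print the cluster status, then bring NODE back online.
standby_roundtrip() {
    crm node standby "$1" || return 1
    crm status               # resources should now be reported on the other node
    crm node online "$1"
}
```

For example: `standby_roundtrip node1.magedu.com`. The status printed immediately after `standby` may still show the resources in transition, and whether they move back after `online` depends on resource stickiness and location constraints.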
Example: a highly available mariadb service backed by NFS
]# systemctl enable mariadb.service //enable mariadb on both nodes
Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service.
crm(live)configure# primitive webnfs ocf:heartbeat:Filesystem params device="172.18.250.76:/data" directory="/mydata/data" fstype="nfs" op monitor interval=20 timeout=20 //define the NFS resource
crm(live)configure# primitive webmysql systemd:mariadb op monitor interval=20 timeout=20 //define the mysql resource
crm(live)configure# group webservice vip webnfs webmysql //define the group
crm(live)configure# verify
crm(live)configure# commit
]# vim /etc/my.cnf
[mysqld]
datadir=/mydata/data //point the database at the NFS-backed directory
]# chown mysql:mysql /mydata/data
Test whether mysql fails over when its node goes into standby:
crm(live)# status //currently running on node1
Last updated: Mon May 30 13:05:46 2016
Last change: Mon May 30 12:54:24 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 3 resources configured
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: webservice
    vip (ocf::heartbeat:IPaddr): Started node1.magedu.com
    webnfs (ocf::heartbeat:Filesystem): Started node1.magedu.com
    webmysql (systemd:mariadb): Started node1.magedu.com
]# crm node standby //run standby on node1
crm(live)# status //the resources moved to node2
Last updated: Mon May 30 13:07:00 2016
Last change: Mon May 30 13:06:35 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 3 resources configured
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
Resource Group: webservice
    vip (ocf::heartbeat:IPaddr): Started node2.magedu.com
    webnfs (ocf::heartbeat:Filesystem): Started node2.magedu.com
    webmysql (systemd:mariadb): Started node2.magedu.com
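Note: corosync has a built-in voting system. When the cluster suffers a network partition and the node holding the resources can no longer send heartbeats to the other nodes, the two sides are no longer coordinated, and the quorum mechanism decides where the resources should run.
Quorum is the number of votes a partition needs in order to act. When a network partition occurs, each side checks whether the number of nodes it holds is greater than half of the total. The side with more than half has quorum; the resources are moved to it and it elects a DC. The side without quorum applies the no-quorum policy: stop (stop all resources, the default), ignore (carry on as before), suicide (fence itself), or freeze.
crm(live)configure# property no-quorum-policy= //default is stop
no-quorum-policy (enum, [stop]): What to do when the cluster does not have quorum
    What to do when the cluster does not have quorum Allowed values: stop, freeze, ignore, suicide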
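The majority rule described above is plain arithmetic and can be sketched as follows. The function name `has_quorum` is made up for this example, and it assumes one vote per node.

```shell
# has_quorum VOTES TOTAL
# Succeed if a partition holding VOTES out of TOTAL votes has a strict majority.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

has_quorum 2 3 && echo "2 of 3: quorum"
has_quorum 1 2 || echo "1 of 2: no quorum"
```

The second call shows why a two-node cluster is a special case: after a split each side holds 1 of 2 votes, so neither side reaches a strict majority under the plain rule.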
Resource monitoring:
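primitive <rsc> {[<class>:[<provider>:]]<type>|@<template>}
  [description=<description>]
  [params attr_list]
  [meta attr_list]
  [utilization attr_list]
  [operations id_spec]
    [op op_type [<attribute>=<value>...] ...]
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
id_spec :: $id=<id> | $id-ref=<id>
op_type :: start | stop | monitor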
When defining monitoring for a resource, look up the advisory default timeouts first; the values you set must not be lower than them:
crm(live)ra# info systemd:mariadb
systemd unit file for mariadb (systemd:mariadb)
MariaDB database server
Operations' defaults (advisory minimum):
    start    timeout=15
    stop     timeout=15
    status   timeout=15
    restart  timeout=15
    monitor  timeout=15 interval=15 start-delay=15
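The advisory minimum can be pulled out of that output before an `op monitor` line is written. The sketch below assumes the text layout shown above (operation name in the first column, `key=value` pairs after it). The function name `advisory_timeout` is made up for this example.

```shell
# advisory_timeout OP
# Read `crm ra info ...` output on stdin and print the timeout value listed
# for operation OP (first matching line only).
advisory_timeout() {
    awk -v op="$1" '
        $1 == op {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^timeout=/) {
                    sub(/^timeout=/, "", $i)
                    print $i
                    exit
                }
        }'
}
```

Usage would look like `crm ra info systemd:mariadb | advisory_timeout monitor`. The value is printed exactly as listed, so any unit suffix an agent reports is passed through unchanged.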
Resource constraints:
1. Define a group
2. location and colocation
location
colocation
crm(live)configure# location webip_pre_node1 vip inf: node1.magedu.com //location constraint; inf means infinity
crm(live)configure# colocation webip_webnfs_webmysql inf: vip webnfs webmysql //colocation constraint keeping the resources together
crm(live)configure# order webip_before_webnfs_webmysql mandatory: vip webnfs webmysql //start order of the resources; mandatory makes the order compulsory
III. Installing ldirectord and making it highly available
ldirectord drives LVS load balancing and also health-checks the back-end hosts.
]# ls
ldirectord-3.9.6-0rc1.1.1.x86_64.rpm
]# yum -y install *.rpm
]# systemctl enable ldirectord.service
Edit the configuration file:
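]# vim /etc/ha.d/ldirectord.cf
# Global Directives
checktimeout=3 //timeout for each health check
checkinterval=1 //interval between health checks
#fallback=127.0.0.1:80 //server that answers when all back-end hosts are down
#fallback6=[::1]:80
autoreload=yes //reload the configuration automatically when the file changes
logfile="/var/log/ldirectord.log" //log file
#logfile="local0" //alternatively, log to a syslog facility
#emailalert="[email protected]" //address for e-mail alerts
#emailalertfreq=3600
#emailalertstatus=all
quiescent=no //whether to use quiescent mode
# Sample for an http virtual service
virtual=172.18.250.80:80 //the VIP
    real=172.18.250.76:80 gate //back-end host; forwarding method is DR (gate)
    real=172.18.250.75:80 gate //back-end host; forwarding method is DR (gate)
    fallback=127.0.0.1:80 gate
    service=http //type of service the back-end hosts provide
    scheduler=rr //scheduling algorithm: round robin
    #persistent=600 //connection persistence
    #netmask=255.255.255.255
    protocol=tcp //protocol
    checktype=negotiate //check method
    checkport=80 //port to check
    request="index.html" //page to request
    receive="Test Page" //text the response must contain for the back-end host to count as healthy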
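The request/receive pair above amounts to "fetch a page and look for a string". The sketch below reproduces that check by hand, which can help when a back-end host is unexpectedly marked down. It is not ldirectord's own code; it assumes `curl` is installed, and the function names `body_matches` and `check_real` are made up for this example.

```shell
# body_matches EXPECTED
# Read a response body on stdin; succeed if it contains the fixed string EXPECTED.
body_matches() {
    grep -qF -- "$1"
}

# check_real HOST PORT PAGE EXPECTED TIMEOUT
# Fetch http://HOST:PORT/PAGE within TIMEOUT seconds and test the body.
check_real() {
    curl -s --max-time "$5" "http://$1:$2/$3" | body_matches "$4"
}
```

With the values from the configuration, `check_real 172.18.250.76 80 index.html "Test Page" 3 && echo healthy` mirrors what the negotiate check expects from that host.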
Define the ldirectord resources in the cluster:
crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip="172.18.250.80" lvs_support="true" op monitor interval=15 timeout=15
crm(live)configure# primitive director systemd:ldirectord op monitor interval=15 timeout=15
crm(live)configure# group dirservice vip director
crm(live)# status
Last updated: Mon May 30 14:18:50 2016
Last change: Mon May 30 14:18:43 2016 by root via cibadmin on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ node1.magedu.com node2.magedu.com ]
Resource Group: dirservice
    vip (ocf::heartbeat:IPaddr2): Started node1.magedu.com
    director (systemd:ldirectord): Started node1.magedu.com
Check the ipvs rules:
]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port     Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
  -> 172.18.250.75:80       Route   1      0          0
  -> 172.18.250.76:80       Route   1      0          0
Put one node into standby and check that the resources fail over:
crm(live)# status //failover succeeded
Last updated: Mon May 30 14:32:31 2016
Last change: Mon May 30 14:32:26 2016 by root via crm_attribute on node1.magedu.com
Stack: corosync
Current DC: node1.magedu.com (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Node node1.magedu.com: standby
Online: [ node2.magedu.com ]
Resource Group: dirservice
    vip (ocf::heartbeat:IPaddr2): Started node2.magedu.com
    director (systemd:ldirectord): Started node2.magedu.com
]# ipvsadm -Ln //run on node2
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port     Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
  -> 172.18.250.75:80       Route   1      0          0
  -> 172.18.250.76:80       Route   1      0          0
Take one back-end host down and check that the failure is detected and the host is removed from the service:
]# killall httpd //stop httpd on one back-end host (172.18.250.76)
]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port     Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
  -> 172.18.250.75:80       Route   1      0          0
]# service httpd start //start httpd again
]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port     Forward Weight ActiveConn InActConn
TCP  172.18.250.80:80 rr
  -> 172.18.250.75:80       Route   1      0          0
  -> 172.18.250.76:80       Route   1      0          0 //added back automatically