---- Outline

What corosync and pacemaker each are
Common high-availability cluster solutions
Installing corosync and pacemaker
Annotated pacemaker resource manager (CRM) commands
A worked example
I. What corosync and pacemaker each are

corosync provides the communication service in a high-availability environment. It sits at the bottom of the HA stack (the Messaging Layer), where it carries heartbeat information between the nodes.

pacemaker is an open-source cluster resource manager (CRM). It sits at the resource-management / resource-agent (RA) layer of the HA stack. It cannot pass heartbeat information itself; to talk to the peer node it relies on the underlying heartbeat service to deliver its messages. It is usually combined with corosync in one of two ways:

pacemaker runs as a corosync plugin;
pacemaker runs as a standalone daemon.

Note:

Because early corosync releases had no voting capability of their own, the total number of nodes in the cluster should be odd, and greater than two.
corosync 1.x itself has no vote (votes) facility; votequorum was only introduced in corosync 2.0.
cman (DC) + corosync: if you want to use pacemaker together with cman, cman can only be used as a plugin of corosync.
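For reference, in corosync 2.x the votequorum provider is declared directly in corosync.conf; a minimal sketch (illustrative only, since this article's demo runs corosync 1.x):

quorum {
    provider: corosync_votequorum   # votequorum, available since corosync 2.0
    expected_votes: 2
    two_node: 1                     # special-case handling for a two-node cluster
}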
II. Common high-availability cluster solutions

heartbeat + crm
cman + rgmanager
cman + pacemaker
corosync + pacemaker (pacemaker as the resource manager)
III. Installing corosync and pacemaker

# yum install -y corosync pacemaker

The configuration lives under /etc/corosync/; the template is corosync.conf.example.
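A typical first step is to create the real configuration from the template; a sketch (adjust bindnetaddr to your own network):

# cd /etc/corosync
# cp -p corosync.conf.example corosync.conf
# vim corosync.conf    # at minimum, point bindnetaddr at the nodes' network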
# Please read the corosync.conf.5 manual page
compatibility: whitetank            # compatible with releases before 0.8 (whitetank)
totem {
    version: 2                      # version of the totem protocol
    secauth: off                    # whether secure authentication is enabled; best to turn it on
    threads: 0                      # number of parallel threads used when secauth is on
    interface {
        ringnumber: 0               # ring number; with several NICs in one host this keeps heartbeats from looping back
        bindnetaddr: 192.168.1.1    # network address of the network the nodes sit on
        mcastaddr: 226.94.1.1       # multicast address
        mcastport: 5405             # port used for multicast
        ttl: 1                      # send heartbeats only one hop out, to avoid multicast routing loops
    }
}
# totem defines how the cluster nodes talk to each other; totem is itself a
# versioned protocol that corosync uses for inter-node communication
logging {
    fileline: off
    to_stderr: no                   # do not send log messages to standard error
    to_logfile: yes                 # write to a log file
    to_syslog: yes                  # also log through syslog (these entries end up in /var/log/message)
    logfile: /var/log/cluster/corosync.log   # where the log file lives
    debug: off                      # leave debug off unless troubleshooting; it is extremely verbose and eats disk I/O
    timestamp: on                   # timestamp every log entry
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
To run pacemaker as a corosync plugin, add the following to corosync.conf:

service {
    ver: 0
    name: pacemaker
}
# with ver: 0, corosync starts pacemaker automatically once it is up

aisexec {
    user: root
    group: root
}
# the identity used when the AIS functionality is enabled; root is the
# default, so the aisexec block can also be omitted
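The authkey file copied in the next step does not exist by default. If secauth is (or will be) turned on, generate it first with corosync-keygen; a sketch (the tool reads /dev/random, so it can block until enough entropy has been gathered):

# corosync-keygen    # writes /etc/corosync/authkey, mode 0400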
Copy the authentication key and the configuration to the other node:

# scp -p authkey corosync.conf 192.168.1.111:/etc/corosync/
Step 2: start corosync
[root@essun corosync]# ssh essun.node2.com 'service corosync start'
Starting Corosync Cluster Engine (corosync):               [ OK ]
[root@essun corosync]# service corosync start
Starting Corosync Cluster Engine (corosync):               [ OK ]
Check the log to confirm that corosync started properly (do this on every node):
# tail -40 /var/log/cluster/corosync.log
Apr 25 23:12:01 [2811] essun.node3.com crmd: info: update_attrd: Connecting to attrd... 5 retries remaining
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_replace: Digest matched on replace from essun.node2.com: cb225a22df77f4f0bfbf7bd73c7d4160
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_replace: Replaced 0.4.1 with 0.4.1 from essun.node2.com
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_replace operation for section 'all': OK (rc=0, origin=essun.node2.com/crmd/24, version=0.4.1)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='essun.node3.com']/transient_attributes to master (origin=local/crmd/9)
Apr 25 23:12:01 [2811] essun.node3.com crmd: info: do_log: FSA: Input I_NOT_DC from do_cl_join_finalize_respond() received in state S_PENDING
Apr 25 23:12:01 [2811] essun.node3.com crmd: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Apr 25 23:12:01 [2809] essun.node3.com attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: write_cib_contents: Wrote version 0.3.0 of the CIB to disk (digest: 02ededba58f5938f53dd45f5bd06f577)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section nodes: OK (rc=0, origin=essun.node2.com/crmd/26, version=0.5.1)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: info: cib_process_diff: Diff 0.4.1 -> 0.5.1 from local not applied to 0.3.1: current "epoch" is less than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed, requesting a full refresh (-207)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/crmd/29, version=0.5.2)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/crmd/31, version=0.5.3)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/4, version=0.5.3)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: info: cib_process_diff: Diff 0.5.1 -> 0.5.2 from local not applied to 0.5.3: current "num_updates" is greater than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/5, version=0.5.3)
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: info: cib_process_diff: Diff 0.5.2 -> 0.5.3 from local not applied to 0.5.3: current "num_updates" is greater than required
Apr 25 23:12:01 [2807] essun.node3.com stonith-ng: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crmd/6, version=0.5.3)
Apr 25 23:12:01 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section cib: OK (rc=0, origin=essun.node2.com/crmd/34, version=0.5.4)
Apr 25 23:12:02 [2809] essun.node3.com attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='essun.node3.com']//transient_attributes//nvpair[@name='probe_complete']: No such device or address (rc=-6, origin=local/attrd/2, version=0.5.4)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section /cib: OK (rc=0, origin=local/attrd/3, version=0.5.4)
Apr 25 23:12:02 [2809] essun.node3.com attrd: notice: attrd_perform_update: Sent update 4: probe_complete=true
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/attrd/4, version=0.5.5)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/4)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='essun.node3.com']//transient_attributes//nvpair[@name='probe_complete']: No such device or address (rc=-6, origin=local/attrd/5, version=0.5.5)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section /cib: OK (rc=0, origin=local/attrd/6, version=0.5.5)
Apr 25 23:12:02 [2809] essun.node3.com attrd: notice: attrd_perform_update: Sent update 7: probe_complete=true
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/7)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.dnz3rc (digest: /var/lib/pacemaker/cib/cib.dOgpug)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_apply_diff operation for section status: OK (rc=0, origin=essun.node2.com/attrd/4, version=0.5.6)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: write_cib_contents: Archived previous version as /var/lib/pacemaker/cib/cib-2.raw
Apr 25 23:12:02 [2806] essun.node3.com cib: info: write_cib_contents: Wrote version 0.5.0 of the CIB to disk (digest: 420e9390e2cb813eebbdf3bb73416dd2)
Apr 25 23:12:02 [2806] essun.node3.com cib: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.kgClFd (digest: /var/lib/pacemaker/cib/cib.gQtyTi)
Apr 25 23:12:14 [2806] essun.node3.com cib: info: crm_client_new: Connecting 0x1d8dc80 for uid=0 gid=0 pid=2828 id=2dfaa45a-28c4-4c7e-9613-603fb1217e12
Apr 25 23:12:14 [2806] essun.node3.com cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/cibadmin/2, version=0.5.6)
Apr 25 23:12:14 [2806] essun.node3.com cib: info: crm_client_destroy: Destroying 0 events
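Rather than reading the whole log, you can grep out the lines that matter; the patterns below are just illustrative:

# grep -i "corosync cluster engine" /var/log/cluster/corosync.log   # did the engine start?
# grep -i "error:" /var/log/cluster/corosync.log                    # anything fatal?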
If everything is normal, you can use the crm status command to look at the current cluster membership (the crm shell is provided by crmsh, which may need to be installed separately):
[root@essun corosync]# crm status
Last updated: Fri Apr 25 23:18:11 2014
Last change: Fri Apr 25 23:12:01 2014 via crmd on essun.node2.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
0 Resources configured

Online: [ essun.node2.com essun.node3.com ]
Two nodes are currently online: node2 and node3.
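On corosync 1.x, membership can also be read straight out of corosync's object database, assuming the corosync-objctl tool is installed:

# corosync-objctl runtime.totem.pg.mrp.srp.members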
IV. Annotated pacemaker resource manager (CRM) commands

All of the commands below are executed interactively inside crmsh.

An introduction to the crm commands

Top-level subcommands
[root@essun corosync]# crm
crm(live)# help

This is crm shell, a Pacemaker command line interface.

Available commands:

    cib              manage shadow CIBs                    # CIB sandboxes
    resource         resources management                  # every resource is operated on under this subcommand
    configure        CRM cluster configuration             # edit the cluster configuration
    node             nodes management                      # cluster node management
    options          user preferences                      # user preferences
    history          CRM cluster history
    site             Geo-cluster support
    ra               resource agents information center    # resource agent subcommand (everything RA-related lives here)
    status           show cluster status                   # show the current cluster status
    help,?           show help (help topics for list of topics)   # list the commands available at this level
    end,cd,up        go back one level                     # return to the top level (crm(live)#)
    quit,bye,exit    exit the program                      # leave the crm(live) interactive shell
The resource subcommand

All resource state is controlled here.
crm(live)resource# help

Available commands:

    status           show status of resources              # show resource status
    start            start a resource                      # start a resource
    stop             stop a resource                       # stop a resource
    restart          restart a resource                    # restart a resource
    promote          promote a master-slave resource       # promote a master/slave resource
    demote           demote a master-slave resource        # demote a master/slave resource
    manage           put a resource into managed mode
    unmanage         put a resource into unmanaged mode
    migrate          migrate a resource to another node    # move a resource to another node
    unmigrate        unmigrate a resource to another node
    param            manage a parameter of a resource      # manage resource parameters
    secret           manage sensitive parameters           # manage sensitive parameters
    meta             manage a meta attribute               # manage meta attributes
    utilization      manage a utilization attribute
    failcount        manage failcounts                     # manage the fail counters
    cleanup          cleanup resource status               # clean up resource status
    refresh          refresh CIB from the LRM status       # refresh the CIB (cluster information base) from the LRM (local resource manager)
    reprobe          probe for resources not started by the CRM   # probe for resources not started by the CRM
    trace            start RA tracing                      # enable resource agent (RA) tracing
    untrace          stop RA tracing                       # disable resource agent (RA) tracing
    help             show help (help topics for list of topics)   # show help
    end              go back one level                     # back to the level above (crm(live)#)
    quit             exit the program                      # leave the interactive shell
The configure subcommand

All resource definitions are done under this subcommand.
crm(live)configure# help

Available commands:

    node             define a cluster node                 # define a cluster node
    primitive        define a resource                     # define a resource
    monitor          add monitor operation to a primitive  # add monitoring to a primitive (e.g. timeout, action on start failure)
    group            define a group                        # define a group (ties several resources together)
    clone            define a clone                        # define a clone (set the total clone count and how many clones may run per node)
    ms               define a master-slave resource        # define a master/slave set (only one node runs the master; the others stand by as slaves)
    rsc_template     define a resource template            # define a resource template
    location         a location preference                 # location constraint (which node the resource runs on by default; with equal scores, the higher preference wins)
    colocation       colocate resources                    # colocation constraint (how likely several resources are to run together)
    order            order resources                       # the order in which resources start
    rsc_ticket       resources ticket dependency
    property         set a cluster property                # set a cluster property
    rsc_defaults     set resource defaults                 # set resource defaults (stickiness)
    fencing_topology node fencing order                    # node fencing order
    role             define role access rights             # define role access rights
    user             define user access rights             # define user access rights
    op_defaults      set resource operations defaults      # set default options for resource operations
    schema           set or display current CIB RNG schema
    show             display CIB objects                   # display CIB objects
    edit             edit CIB objects                      # edit CIB objects (in a vim-style editor)
    filter           filter CIB objects                    # filter CIB objects
    delete           delete CIB objects                    # delete CIB objects
    default-timeouts set timeouts for operations to minimums from the meta-data
    rename           rename a CIB object                   # rename a CIB object
    modgroup         modify group                          # modify a resource group
    refresh          refresh from CIB                      # re-read the CIB
    erase            erase the CIB                         # wipe the CIB
    ptest            show cluster actions if changes were committed
    rsctest          test resources as currently configured
    cib              CIB shadow management
    cibstatus        CIB status management and editing
    template         edit and import a configuration from a template
    commit           commit the changes to the CIB         # write the pending changes into the CIB
    verify           verify the CIB with crm_verify        # syntax-check the CIB
    upgrade          upgrade the CIB to version 1.0
    save             save the CIB to a file                # export the current CIB to a file (saved in the directory you started crm from)
    load             import the CIB from a file            # load the CIB from a file
    graph            generate a directed graph
    xml              raw xml
    help             show help (help topics for list of topics)   # show help
    end              go back one level                     # back to the top level (crm(live)#)
    quit             exit the program                      # leave the crm interactive shell
The node subcommand

Node management and status commands.
crm(live)resource# cd ..
crm(live)# node
crm(live)node# help

Node management and status commands.

Available commands:

    status           show nodes status as XML              # show node status in XML
    show             show node                             # show node status on the command line
    standby          put node into standby                 # simulate taking a node offline (standby must be followed by the FQDN)
    online           set node online                       # bring a node back online
    maintenance      put node into maintenance mode
    ready            put node into ready mode
    fence            fence node                            # fence a node
    clearstate       Clear node state                      # clear a node's state information
    delete           delete node                           # delete a node
    attribute        manage attributes
    utilization      manage utilization attributes
    status-attr      manage status attributes
    help             show help (help topics for list of topics)
    end              go back one level
    quit             exit the program
The ra subcommand

The resource agent classes live here.
crm(live)node# cd ..
crm(live)# ra
crm(live)ra# help

Available commands:

    classes          list classes and providers            # the classes resource agents fall into
    list             list RA for a class (and provider)    # list the agents a class provides
    meta             show meta data for a RA               # show an agent's available parameters (e.g. meta ocf:heartbeat:IPaddr2)
    providers        show providers for a RA and a class
    help             show help (help topics for list of topics)
    end              go back one level
    quit             exit the program
Note:

These commands are all simple words, but I have still annotated the ones that get used the most. Right now, fresh from learning them, I remember them clearly, but some day one of these commands may go completely blank on me, and that would be a bitter regret. (Never overestimate your own memory; one careless moment and it will betray you!)
V. A worked example

Note:

Prerequisites for high availability:

time synchronization
password-less SSH login between the nodes
host name resolution

This is only a demonstration of the commands, not a production configuration.
1. Environment

OS:

CentOS 6.5 x86_64

Nodes:

essun.node2.com 192.168.1.111
essun.node3.com 192.168.1.108

Software and resources needed by the nodes:

one virtual IP: 192.168.1.100
httpd installed on both nodes with a default test page; after testing, the service must be disabled at boot, since the cluster will manage it (a preparation sketch follows this list)
an NFS share to mount; the NFS server is 192.168.1.110
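For reference, the preparation might look like the sketch below. The /share export path and the test-page text are taken from the later sections; the exports line and the rest are illustrative:

# on each cluster node
yum install -y httpd
echo "<h1>$(hostname)</h1>" > /var/www/html/index.html   # per-node test page
service httpd start        # verify it serves, e.g. curl http://localhost/
service httpd stop
chkconfig httpd off        # the cluster, not init, must control httpd

# on the NFS server (192.168.1.110)
mkdir -p /share
echo '来自于NFS文件系统' > /share/index.html
echo '/share 192.168.1.0/24(rw,no_root_squash)' >> /etc/exports
service nfs start && exportfs -rv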
2. Defining resources

Disable stonith-enabled (if you are unsure which parameters exist, press Tab twice to complete the command; cd .. returns to the previous level):
crm(live)configure# property stonith-enabled=false   # treat failed nodes as already safely powered off; do not call on stonith to adjudicate
crm(live)configure# verify    # no output here means the operation is valid
crm(live)configure# commit    # now the change can be committed
crm(live)configure# show      # show the committed, in-effect configuration
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
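Why stonith has to be disabled here: with stonith-enabled=true but no stonith resources defined, the configuration does not validate. You can see the complaint with crm_verify; a sketch:

# crm_verify -L -V    # validates the live CIB; prints stonith errors until stonith is disabled or actually configured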
Tell the cluster to ignore loss of quorum (necessary in a two-node cluster, where the surviving node can never hold a majority of votes):

crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
Define a virtual IP:
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Here ip is the resource agent's parameter name.
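Once committed, the address should be visible on whichever node the resource started on; a quick check (sketch):

# ip addr show | grep 192.168.1.100   # run on the node currently holding webip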
Define a filesystem mount.

First, look in ra for the resource agent that handles filesystems:
crm(live)configure ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)configure ra# list ocf
CTDB           ClusterMon     Delay          Dummy          Filesystem     HealthCPU
HealthSMART    IPaddr         IPaddr2        IPsrcaddr      LVM            MailTo
Route          SendArp        Squid          Stateful       SysInfo        SystemHealth
VirtualDomain  Xinetd         apache         conntrackd     controld       dhcpd
ethmonitor     exportfs       mysql          named          nfsserver      pgsql
ping           pingd          postfix        remote         rsyncd         symlink
crm(live)configure ra# providers Filesystem
heartbeat
So the Filesystem resource agent is provided by ocf:heartbeat.

Look at the parameters this resource agent accepts:
crm(live)configure ra# meta ocf:heartbeat:Filesystem
Manages filesystem mounts (ocf:heartbeat:Filesystem)

Resource script for Filesystem. It manages a Filesystem on a
shared storage medium.

The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:

10: read first 16 blocks of the device (raw read)

This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is noop for non-block devices
such as NFS, SMBFS, or bind mounts.

20: test if a status file can be written and read

The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.

Parameters (* denotes required, [] the default):

device* (string): block device
    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.

directory* (string): mount point
    The mount point for the filesystem.

fstype* (string): filesystem type
    The type of filesystem to be mounted.

........... output truncated ...........
Parameters marked with * are required. Now we can define the resource (a manual mount check follows the notes below):
crm(live)configure# primitive webnfs ocf:heartbeat:Filesystem params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" op monitor interval=60s timeout=60s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Notes:

primitive # the command that defines a resource
webnfs # the resource ID
ocf:heartbeat:Filesystem # the resource agent (RA)
params device="192.168.1.110:/share" # the exported share
directory="/var/www/html" # the mount point
fstype="nfs" # the filesystem type
op monitor # add monitoring for webnfs
interval=60s # monitoring interval
timeout=60s # monitoring timeout
op start timeout=60s # start timeout
op stop timeout=60s # stop timeout
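Before handing the mount to the cluster, it is worth verifying by hand that both nodes can actually mount the export; a sketch:

# mount -t nfs 192.168.1.110:/share /mnt   # should succeed on both nodes
# ls /mnt && umount /mnt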
Define the web service resource (lsb:httpd drives the /etc/init.d/httpd init script, which is why httpd must not be enabled to start at boot):
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Tie the resources together in a group (so they are bound to run as one unit):
crm(live)configure# group webservice webip webnfs webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
group webservice webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Now look at the resources that have taken effect from a different angle:
crm(live)configure# cd ..
crm(live)# status
Last updated: Sat Apr 26 01:51:45 2014
Last change: Sat Apr 26 01:49:54 2014 via cibadmin on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ essun.node2.com essun.node3.com ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started essun.node2.com
     webnfs     (ocf::heartbeat:Filesystem):    Started essun.node2.com
     webserver  (lsb:httpd):                    Started essun.node2.com
The output above shows all the resources running on node2, i.e. 192.168.1.111. Check with curl from the NFS server (the test page reads 来自于NFS文件系统, "served from the NFS filesystem"):

[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.111
来自于NFS文件系统
Now simulate a failure of node2 and watch whether the resources move:
crm(live)node# standby essun.node2.com
crm(live)# status
Last updated: Sat Apr 26 02:05:24 2014
Last change: Sat Apr 26 02:04:17 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Node essun.node2.com: standby
Online: [ essun.node3.com ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started essun.node3.com
     webnfs     (ocf::heartbeat:Filesystem):    Started essun.node3.com
     webserver  (lsb:httpd):                    Started essun.node3.com
curl again:
[root@bogon share]# curl http://192.168.1.111
curl: (7) couldn't connect to host
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
来自于NFS文件系统
Notes:

The first curl shows that httpd is no longer running on node2.
The second shows the VIP still serves the mounted page, so the service did not stop when node2 went offline.
The third shows that node3's own IP serves the page too, so we can conclude the service now runs on node3.

At this point, if node2 comes back online, the service will not move back to it. To make node2 take the service back when it returns, give it a higher score with a location constraint.

Next, let's constrain the resources a second way. First delete the group definition: you can run edit under crm(live)configure# and simply remove the group entry from the CIB.
crm(live)node# online essun.node2.com
crm(live)# status
Last updated: Sat Apr 26 02:20:13 2014
Last change: Sat Apr 26 02:19:29 2014 via crm_attribute on essun.node2.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ essun.node2.com essun.node3.com ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started essun.node3.com
     webnfs     (ocf::heartbeat:Filesystem):    Started essun.node3.com
     webserver  (lsb:httpd):                    Started essun.node3.com
Sure enough, the service did not come back. Watch me reclaim it!

Step one: remove the group definition. The cleanest way is the edit command, but plain commands work too:
crm(live)resource# stop webservice      # the group name
crm(live)configure# delete webservice   # delete the group
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
With the group definition gone, my "plan" can proceed.

Define a colocation constraint (how strongly the resources stick together):
crm(live)configure# colocation webserver-with-webnfs-webip inf: webip webnfs webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Notes:

colocation: the colocation-constraint command
webserver-with-webnfs-webip: the constraint name (ID)
inf: the score (inf means the resources must always run together; a finite number can also be used)
webip webnfs webserver: the resource names
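For contrast, a finite score (a hypothetical example, not used in this demo) would only express a preference, not a hard requirement:

crm(live)configure# colocation webserver-near-webip 200: webserver webip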
Define the resource start order:
crm(live)configure# order ip_before_webnfs_before_webserver mandatory: webip webnfs webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Notes:

order: the ordering-constraint command
ip_before_webnfs_before_webserver # the constraint ID
mandatory: # the kind of ordering (three kinds: Mandatory, Optional, Serialize); note that crmsh displays Mandatory as inf: in the show output above
webip webnfs webserver # the resource names; the order they are written in matters a great deal
Define a location constraint:
crm(live)configure# location webip_and_webnfs_and_webserver webip 500: essun.node2.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
location webip_and_webnfs_and_webserver webip 500: essun.node2.com
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
Notes:

location: the location-constraint command
webip_and_webnfs_and_webserver # the constraint ID
webip # the constrained resource (the colocation above drags webnfs and webserver along with it)
500: # the score expressing how strongly the resource prefers the node
essun.node2.com # the preferred node
Define default resource properties (stickiness):
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node essun.node2.com \
    attributes standby="off"
node essun.node3.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.1.100"
primitive webnfs ocf:heartbeat:Filesystem \
    params device="192.168.1.110:/share" directory="/var/www/html" fstype="nfs" \
    op monitor interval="60s" timeout="60s" \
    op start timeout="60s" interval="0" \
    op stop timeout="60s" interval="0"
primitive webserver lsb:httpd
location webip_and_webnfs_and_webserver webip 500: essun.node2.com
colocation webserver-with-webnfs-webip inf: webip webnfs webserver
order ip_before_webnfs_before_webserver inf: webip webnfs webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.3-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1398450597"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
Notes:

This sets the default stickiness of every resource in the cluster. Stickiness only comes into play when deciding whether to move a resource off the node it currently runs on. Here three resources are defined, webip, webnfs and webserver, each with stickiness 100, so together they score 300. The location constraint defined earlier gives node2 a score of 500; when node2 goes down and later comes back, 500 beats 300 and the resources move back to node2.
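The same arithmetic can be turned around. If you wanted the resources to stay put instead of failing back, raise the default stickiness until the total beats the location score, e.g. (illustrative):

crm(live)configure# rsc_defaults resource-stickiness=200   # 3 resources x 200 = 600 > 500, so no failback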
Finally, check the state: all resources run on node2. Now fail node2:
crm(live)# status
Last updated: Sat Apr 26 03:14:30 2014
Last change: Sat Apr 26 03:14:19 2014 via cibadmin on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ essun.node2.com essun.node3.com ]

 webip      (ocf::heartbeat:IPaddr):        Started essun.node2.com
 webnfs     (ocf::heartbeat:Filesystem):    Started essun.node2.com
 webserver  (lsb:httpd):                    Started essun.node2.com
crm(live)# node
crm(live)node# standby essun.node2.com
The resources are now running on node3:
crm(live)# status
Last updated: Sat Apr 26 03:18:17 2014
Last change: Sat Apr 26 03:15:20 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Node essun.node2.com: standby
Online: [ essun.node3.com ]

 webip      (ocf::heartbeat:IPaddr):        Started essun.node3.com
 webnfs     (ocf::heartbeat:Filesystem):    Started essun.node3.com
 webserver  (lsb:httpd):                    Started essun.node3.com
curl twice more:
[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
来自于NFS文件系统
[root@bogon share]#
Bring node2 back online and see whether the resources come back:
crm(live)node# online essun.node2.com
crm(live)node# cd ..
crm(live)# status
Last updated: Sat Apr 26 03:21:46 2014
Last change: Sat Apr 26 03:21:36 2014 via crm_attribute on essun.node3.com
Stack: classic openais (with plugin)
Current DC: essun.node2.com - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ essun.node2.com essun.node3.com ]

 webip      (ocf::heartbeat:IPaddr):        Started essun.node2.com
 webnfs     (ocf::heartbeat:Filesystem):    Started essun.node2.com
 webserver  (lsb:httpd):                    Started essun.node2.com
curl three more times:
[root@bogon share]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:63:4A:25
          inet addr:192.168.1.110  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe63:4a25/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2747 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:212090 (207.1 KiB)  TX bytes:99626 (97.2 KiB)
[root@bogon share]# curl http://192.168.1.100
来自于NFS文件系统
[root@bogon share]# curl http://192.168.1.108
curl: (7) couldn't connect to host
[root@bogon share]# curl http://192.168.1.111
来自于NFS文件系统
[root@bogon share]#
Notes:

192.168.1.100 is the cluster's virtual IP
192.168.1.108 is essun.node3.com
192.168.1.111 is essun.node2.com

The facts speak for themselves: node2 did take the resources back.
======================= That concludes this tour of the common crmsh commands for corosync + pacemaker ===========
PS:

My English is not great, so some of the annotations may be imprecise; please bear with me~