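Case overview:
Case (1): Fully automated corosync cluster setup with ansible
In production you often need to set up a cluster, or a group of servers with identical configuration. Configuring the machines by hand, one at a time, adds to the operations workload and makes mistakes more likely. How can we reduce both? The following case automates the deployment of a corosync environment: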
Preparation:
(1) Save the already-configured corosync configuration files as templates
# mkdir -pv /ansible/corosync/{conf,packages}
# cp -p /etc/corosync/{authkey,corosync.conf} /ansible/corosync/conf/
(2) Collect the rpm packages that the corosync cluster stack depends on
  corosync-1.4.1-17.el6.x86_64
  crmsh-1.2.6-4.el6.x86_64.rpm
  pssh-2.3.1-2.el6.x86_64.rpm
# cp -p /root/corosync_packages/{crmsh-1.2.6-4.el6.x86_64.rpm,pssh-2.3.1-2.el6.x86_64.rpm} /ansible/corosync/packages/
(3) Pack the configuration files and rpm packages, copy the archive to the ansible server, and unpack it there
# cd /
# tar -czvf corosync.tar.gz ansible/
# scp corosync.tar.gz [email protected]:/
On the ansible server:
# cd /
# tar xf corosync.tar.gz
# tree -a /ansible
/ansible
`-- corosync
    |-- .corosync_install.yaml.swp
    |-- conf
    |   |-- authkey
    |   `-- corosync.conf
    `-- packages
        |-- crmsh-1.2.6-4.el6.x86_64.rpm
        `-- pssh-2.3.1-2.el6.x86_64.rpm

3 directories, 5 files
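Configuring ansible:
(1) The deployment layout is as follows:
node1.samlee.com    172.16.100.6    group: hanodes
node2.samlee.com    172.16.100.7    group: hanodes
## node1 and node2 must trust each other over SSH (key-based login in both directions)
(2) Let the ansible host reach every managed node with key-based authentication
-- Generate the public/private key pair
# ssh-keygen -t rsa -P ''
-- Copy the public key to each managed node
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
(3) Define the host group
# vim /etc/ansible/hosts
[hanodes]
node1.samlee.com
node2.samlee.com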
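Before running any playbook, it can help to confirm that every host in the group accepts a key-based login without prompting. The following is a minimal sketch under the assumption that the inventory looks like the one in step (3): one host per line, no ranges or child groups. The helper names group_hosts and check_ssh are illustrative and not part of ansible.

```shell
#!/bin/bash
# Print the hosts listed under one [group] header of an INI-style inventory.
# Assumption: one host per line, optionally followed by inline variables.
group_hosts() {
    local group="$1" inventory="${2:-/etc/ansible/hosts}"
    awk -v g="[$group]" '
        /^[[:space:]]*\[/ { in_group = ($1 == g); next }   # every section header resets the state
        in_group && NF && $1 !~ /^[#;]/ { print $1 }      # skip blank lines and comments
    ' "$inventory"
}

# Try a non-interactive login on each host.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_ssh() {
    local host rc=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "ok:   $host"
        else
            echo "FAIL: $host"
            rc=1
        fi
    done
    return "$rc"
}
```

A possible invocation is `check_ssh $(group_hosts hanodes)`. Ansible's own connectivity check, `ansible hanodes -m ping`, covers the same ground once the inventory is in place.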
Define a playbook that performs the whole installation and configuration automatically:
# vim /ansible/corosync/corosync_install.yaml
- hosts: hanodes
  remote_user: root
  vars:
    crmsh: crmsh-1.2.6-4.el6.x86_64.rpm
    pssh: pssh-2.3.1-2.el6.x86_64.rpm
  tasks:
    # corosync and pacemaker come from the yum repositories
    - name: corosync installing
      yum: name=corosync state=present
    - name: pacemaker installing
      yum: name=pacemaker state=present
    # crmsh and pssh are shipped as local rpm files
    - name: crmsh rpm packages
      copy: src=/ansible/corosync/packages/{{ crmsh }} dest=/tmp/{{ crmsh }}
    - name: pssh rpm packages
      copy: src=/ansible/corosync/packages/{{ pssh }} dest=/tmp/{{ pssh }}
    - name: crmsh_pssh installing
      command: yum -y install /tmp/{{ crmsh }} /tmp/{{ pssh }}
    # cluster authentication key and main configuration file
    - name: authkey configure file
      copy: src=/ansible/corosync/conf/authkey dest=/etc/corosync/authkey
    - name: authkey mode 400
      file: path=/etc/corosync/authkey mode=0400
      notify:
        - restart corosync
    - name: corosync.conf configure file
      copy: src=/ansible/corosync/conf/corosync.conf dest=/etc/corosync/corosync.conf
      tags:
        - conf
      notify:
        - restart corosync
    - name: ensure the corosync service startup on boot
      service: name=corosync state=started enabled=yes
  handlers:
    - name: restart corosync
      service: name=corosync state=restarted
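A playbook of this size is easy to break with a stray indent, so a quick check before the first real run can save a round trip. The sketch below wraps two standard ansible-playbook options, --syntax-check and --list-tasks; the function name preflight is a hypothetical helper, not something ansible provides.

```shell
#!/bin/bash
# Hypothetical helper: validate a playbook before running it for real.
preflight() {
    local playbook="$1"
    if [ ! -r "$playbook" ]; then
        echo "cannot read playbook: $playbook" >&2
        return 1
    fi
    # Parse only: reports YAML and structural errors without changing any host.
    ansible-playbook "$playbook" --syntax-check || return 1
    # List the tasks in execution order, together with their tags.
    ansible-playbook "$playbook" --list-tasks
}
```

For example: `preflight /ansible/corosync/corosync_install.yaml`. ansible-playbook also accepts --check for a dry run, although on a node where corosync is not yet installed the final service task may fail in that mode.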
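Run the playbook:
# ansible-playbook /ansible/corosync/corosync_install.yaml
If a configuration file changes and has to be synchronized again, run only the tagged part of the playbook with -t <tag>:
# ansible-playbook /ansible/corosync/corosync_install.yaml -t conf
That completes the fully automated corosync cluster setup with ansible.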
---------------------------------------------------------------------------------------------------------
Case (2): Highly available web service with NFS + httpd
(1) The configuration is as follows:
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.100.56 op monitor interval=30s timeout=20s on-fail=restart
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.100.9:/web/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s on-fail=restart
crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=20s on-fail=restart
crm(live)configure# group webservice webip webstore webserver
## Define the resource ordering constraint:
crm(live)configure# order webip_before_webstore_before_webserver mandatory: webip webstore webserver
crm(live)configure# verify
crm(live)configure# commit
Afterwards, check the resulting cluster state, for example with crm status.
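One way to see the effect is to force a failover and watch the webservice group move to the other node. This is a minimal sketch, assuming crmsh is installed on the node where it runs. The helper names run and failover_test are illustrative, and the 10-second settle time is an arbitrary default rather than a recommended value.

```shell
#!/bin/bash
# Print commands instead of executing them when DRY_RUN=1 (useful off-cluster).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Put one node into standby, show where the resources landed, then bring it back.
failover_test() {
    local node="$1" settle="${2:-10}"
    run crm status
    run crm node standby "$node"
    run sleep "$settle"          # give the cluster time to move the group
    run crm status
    run crm node online "$node"
}
```

Running `DRY_RUN=1 failover_test node1.samlee.com` only prints the command sequence. Without DRY_RUN it executes the commands against the live cluster. Because resource-stickiness is set to 100, the group is expected to stay on the second node after the first one comes back online.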
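Case (3): Managing a corosync high-availability cluster with pcs
Common pcs commands:
Show the cluster status:
# pcs status
Show the current cluster configuration:
# pcs config
Start the cluster automatically at boot:
# pcs cluster enable --all
Start the cluster:
# pcs cluster start --all
Show the status of the cluster resources:
# pcs resource show
Validate the cluster configuration:
# crm_verify -L -V
Test a resource configuration:
# pcs resource debug-start resource
Put a node into standby:
# pcs cluster standby node1
Show all cluster properties:
# pcs property list --all
Show the default values of all cluster properties:
# pcs property list --default
Show the RA classes (resource standards):
# pcs resource standards
Show the OCF providers:
# pcs resource providers
Example: list all RAs of a given class:
# pcs resource agents ocf:heartbeat
Example: show the parameters of a given RA:
# pcs resource describe ocf:heartbeat:IPaddr
Show all resource constraints:
# pcs constraint list --full
Remove a resource constraint:
# pcs constraint order remove webstore webip
Delete a resource group together with its resources:
# pcs resource delete webservice
Remove only the group and keep its member resources:
# pcs resource ungroup webservice
Move a resource (same result as crmsh's # crm resource migrate webip):
# pcs resource move webip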
1. Change a global cluster setting: disable the STONITH device
# pcs property set stonith-enabled=false
-- Check the setting afterwards
# pcs property list --all | grep stonith
 stonith-action: reboot
 stonith-enabled: false
 stonith-timeout: 60s
2. Change a global cluster setting: set the no-quorum-policy option
# pcs property set no-quorum-policy=ignore
-- Check the setting afterwards
# pcs property list --all | grep no-quorum-policy
 no-quorum-policy: ignore
3. Change a global cluster setting: set the default resource-stickiness value
# pcs resource defaults resource-stickiness=100
-- Check the setting afterwards
# crm configure show
node node1.samlee.com
node node2.samlee.com
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-14.el6-368c726" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1471324551"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
Applied example: a highly available web service configured with pcs
(1) The configuration is as follows:
## Configure the floating VIP
# pcs resource create webip ocf:heartbeat:IPaddr ip=172.16.100.55 op monitor interval=30s timeout=20s
## Configure the NFS shared storage
# pcs resource create webstore ocf:heartbeat:Filesystem device="172.16.100.9:/web/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s
## Configure the httpd service
# pcs resource create webserver lsb:httpd op monitor interval=30s timeout=20s on-fail=restart
## Create the resource group
# pcs resource group add webservice webip webstore webserver
## Define the start-order constraints: webip first, then webstore, then webserver
# pcs constraint order webip then webstore
# pcs constraint order webstore then webserver
## Show the ordering constraints
# pcs constraint order show
Ordering Constraints:
  start webip then start webstore
  start webstore then start webserver
## Define a location constraint: prefer running on node1
# pcs constraint location webservice prefers node1.samlee.com=500
## Show the location constraints
# pcs constraint location show
Location Constraints:
  Resource: webservice
    Enabled on: node1.samlee.com (score:500)
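To observe how long the service is unreachable during a failover, the VIP can be polled from a client while a node is put into standby. The sketch below assumes curl is available on the client and that httpd answers on port 80 of the VIP. The function name probe_vip and its default count and interval are illustrative choices.

```shell
#!/bin/bash
# Poll a URL and print one timestamped status line per attempt.
# curl reports code 000 when it cannot connect, which marks the failover gap.
probe_vip() {
    local url="$1" count="${2:-30}" interval="${3:-1}" i code
    for ((i = 0; i < count; i++)); do
        code=$(curl -s -o /dev/null -m 2 -w '%{http_code}' "$url")
        echo "$(date +%T) $url -> ${code:-000}"
        sleep "$interval"
    done
}
```

A possible test run: start `probe_vip http://172.16.100.55/ 60` on a client, then on a cluster node run `pcs cluster standby node1.samlee.com`, and later `pcs cluster unstandby node1.samlee.com`. Lines showing 000 or an error code indicate the window in which the group was moving.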