Lab environment:
1) Built on VMware virtual machines.
2) There are two test nodes, node1.a.com and node2.a.com, with the IP addresses 172.16.4.11 and 172.16.4.22 respectively.
3) The clustered service being simulated is the mysql service.
4) The mysql service is provided at the address 172.16.4.1 (the cluster VIP).
Network configuration for node1 (/etc/sysconfig/network-scripts/ifcfg-eth0):
- DEVICE=eth0
- BOOTPROTO=static
- IPADDR=172.16.4.11
- NETMASK=255.255.0.0
- ONBOOT=yes
- HWADDR=00:0c:29:92:c3:da
Hostname settings for node1 (/etc/sysconfig/network):
- NETWORKING=yes
- NETWORKING_IPV6=no
- HOSTNAME=node1.a.com
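Assuming the standard CentOS 5 init scripts, the new settings can be applied without a reboot:
# service network restart
# hostname node1.a.com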
3) Hostname-to-IP resolution must work on every node. No DNS server is needed; it is enough for the /etc/hosts file on both nodes to contain the following entries (configure this on both nodes):
172.16.4.11 node1.a.com node1
172.16.4.22 node2.a.com node2
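Later steps run commands on node2 directly from node1 (for example `ssh node2 -- mkdir ...`), so key-based passwordless SSH between the two nodes is worth setting up first. A minimal sketch on node1 (repeat from node2 to node1 as well):
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2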
Install the dependencies first, then download the following packages on both nodes (a one-shot install command follows the list):
# yum install -y libibverbs librdmacm lm_sensors libtool-ltdl openhpi-libs openhpi perl-TimeDate
- cluster-glue-1.0.6-1.6.el5.i386.rpm
- cluster-glue-libs-1.0.6-1.6.el5.i386.rpm
- corosync-1.2.7-1.1.el5.i386.rpm
- corosynclib-1.2.7-1.1.el5.i386.rpm
- heartbeat-3.0.3-2.3.el5.i386.rpm <---- carries the heartbeat messaging; this package must be installed
- heartbeat-libs-3.0.3-2.3.el5.i386.rpm
- libesmtp-1.0.4-5.el5.i386.rpm
- openais-1.1.3-1.6.el5.i386.rpm
- openaislib-1.1.3-1.6.el5.i386.rpm
- pacemaker-1.0.11-1.2.el5.i386.rpm
- pacemaker-libs-1.0.11-1.2.el5.i386.rpm
- perl-TimeDate-1.16-5.el5.noarch.rpm
- resource-agents-1.0.4-1.1.el5.i386.rpm
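Assuming all of the RPMs listed above have been downloaded into the current directory and are not GPG-signed, they can be installed on each node in a single transaction:
# yum -y --nogpgcheck localinstall *.rpm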
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
# vim /etc/corosync/corosync.conf
- # Please read the corosync.conf.5 manual page
- compatibility: whitetank
- totem {
- version: 2
- secauth: on <---- enable mutual authentication of heartbeat messages
- threads: 0
- interface {
- ringnumber: 0
- bindnetaddr: 172.16.0.0 <---- change this: it must be the network address of the NIC used for cluster traffic
- mcastaddr: 226.94.1.1 <---- multicast address for cluster messages; the default is fine
- mcastport: 5405 <---- multicast port; the default is fine
- }
- }
- logging { <---- logging options
- fileline: off
- to_stderr: no
- to_logfile: yes
- to_syslog: yes
- logfile: /var/log/cluster/corosync.log <---- where the log is written
- debug: off <---- debug logging disabled
- timestamp: on <---- timestamps enabled
- logger_subsys {
- subsys: AMF
- debug: off
- }
- }
- amf {
- mode: disabled
- }
- service { <---- everything from here down is what you need to add
- ver: 0
- name: pacemaker <---- start pacemaker as a corosync service
- }
- aisexec {
- user: root <---- the user that runs openais
- group: root
- }
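Because secauth is set to on, corosync needs a shared authentication key before it will start. Generate it on node1 (corosync-keygen writes /etc/corosync/authkey) and copy both the key and the configuration file over to node2:
# corosync-keygen
# scp -p /etc/corosync/authkey /etc/corosync/corosync.conf node2:/etc/corosync/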
Create the directory that corosync logs into on both nodes:
# mkdir /var/log/cluster
# ssh node2 -- mkdir /var/log/cluster
# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ] <---- this message means corosync has started
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
- Sep 15 11:20:41 localhost smartd[2350]: Opened configuration file /etc/smartd.conf
- Sep 15 11:24:24 localhost smartd[2416]: Opened configuration file /etc/smartd.conf
- Sep 22 10:38:39 localhost smartd[2671]: Opened configuration file /etc/smartd.conf
- Sep 22 11:23:01 localhost corosync[3530]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
- Sep 22 11:23:01 localhost corosync[3530]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
- Sep 22 11:23:01 localhost corosync[3530]: [TOTEM ] Initializing transport (UDP/IP).
- Sep 22 11:23:01 localhost corosync[3530]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
- Sep 22 11:23:01 localhost corosync[3530]: [TOTEM ] The network interface [172.16.4.11] is now up.
- Sep 22 11:23:02 localhost corosync[3530]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
- Sep 22 11:23:02 localhost corosync[3530]: [pcmk ] info: pcmk_startup: CRM: Initialized
- Sep 22 11:23:02 localhost corosync[3530]: [pcmk ] Logging: Initialized pcmk_startup
- Sep 22 11:23:02 localhost corosync[3530]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
- Sep 22 11:23:02 localhost corosync[3530]: [pcmk ] info: pcmk_startup: Service: 9
- Sep 22 11:23:02 localhost corosync[3530]: [pcmk ] info: pcmk_startup: Local hostname: node1.a.com
# ssh node2 -- /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ] <---- corosync is now up on node2 as well; repeat the verification steps above on node2 to make sure there are no errors
# crm status
- ============
- Last updated: Thu Sep 22 11:24:46 2011
- Stack: openais
- Current DC: node1.a.com - partition with quorum
- Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
- 2 Nodes configured, 2 expected votes
- 0 Resources configured.
- ============
- Online: [ node1.a.com node2.a.com ] <---- both cluster nodes are online and working normally
# crm_verify -L
- crm_verify[3590]: 2011/09/22_11:25:33 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
- crm_verify[3590]: 2011/09/22_11:25:33 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
- crm_verify[3590]: 2011/09/22_11:25:33 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
- Errors found during check: config not valid
- -V may provide more details
# crm configure property stonith-enabled=false <---- a command issued this way is committed and takes effect immediately
# crm configure show
- node node1.a.com
- node node2.a.com
- property $id="cib-bootstrap-options" \
- dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
- cluster-infrastructure="openais" \
- expected-quorum-votes="2" \
- stonith-enabled="false" <---- stonith is now disabled
- drbd83-8.3.8-1.el5.centos.i386.rpm
- kmod-drbd83-8.3.8-1.el5.centos.i686.rpm
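These two packages (the userland tools and the matching kernel module) are installed the same way as the cluster packages, and must be present on both nodes; a sketch:
# yum -y --nogpgcheck localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm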
# cp -f /usr/share/doc/drbd83-8.3.8/drbd.conf /etc/
- # You can find an example in /usr/share/doc/drbd.../drbd.conf.example <---- comment line
- include "drbd.d/global_common.conf"; <---- defines the global and common sections
- include "drbd.d/*.res"; <---- pulls in the files that define the resources
Edit /etc/drbd.d/global_common.conf:
- global {
- usage-count no;
- # minor-count dialog-refresh disable-ip-verification
- }
- common {
- protocol C;
- handlers {
- pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
- pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
- local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
- fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; <---- enables the fencing handler
- split-brain "/usr/lib/drbd/notify-split-brain.sh root"; <---- notifies root when a split brain is detected
- out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root"; <---- notifies root when out-of-sync blocks are found
- before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
- after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
- }
- startup {
- wfc-timeout 120;
- degr-wfc-timeout 120;
- }
- disk {
- on-io-error detach;
- fencing resource-only;
- }
- net {
- cram-hmac-alg "sha1";
- shared-secret "mydrbdlab";
- }
- syncer {
- rate 100M; <---- bandwidth used for resynchronization
- }
- }
Define the resource in its own file under /etc/drbd.d/ (e.g. /etc/drbd.d/mysql.res):
- resource mysql {
- on node1.a.com {
- device /dev/drbd0;
- disk /dev/sdb5;
- address 172.16.4.11:7789;
- meta-disk internal;
- }
- on node2.a.com {
- device /dev/drbd0;
- disk /dev/sdb5;
- address 172.16.4.22:7789;
- meta-disk internal;
- }
- }
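The DRBD configuration has to be identical on both nodes, and the resource metadata must be initialized on each node before first use; the create-md call below is what produces the "Writing meta data..." output that follows:
# scp /etc/drbd.conf node2:/etc/
# scp -r /etc/drbd.d node2:/etc/
# drbdadm create-md mysql
# ssh node2 -- drbdadm create-md mysql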
- Writing meta data...
- initializing activity log
- NOT initialized bitmap
- New drbd meta data block successfully created. <---- the metadata was created successfully
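With the metadata in place, start the drbd service on both nodes (the init script comes with the drbd83 package):
# /etc/init.d/drbd start
# ssh node2 -- /etc/init.d/drbd start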
# cat /proc/drbd
- version: 8.3.8 (api:88/proto:86-94)
- GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
- 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:4891540
# drbd-overview
- 0:mysql Connected Secondary/Secondary Inconsistent/Inconsistent C r----
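Both sides come up Secondary and Inconsistent, so nothing syncs until one node is forced to become primary. On DRBD 8.3 the usual command, run on node1, is:
# drbdadm -- --overwrite-data-of-peer primary mysql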
# cat /proc/drbd
- version: 8.3.8 (api:88/proto:86-94)
- GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by [email protected], 2010-06-04 08:04:16
- 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
- ns:918728 nr:0 dw:0 dr:925856 al:0 bm:55 lo:1 pe:42 ua:223 ap:0 ep:1 wo:b oos:3974132
- [==>.................] sync'ed: 18.9% (3880/4776)M delay_probe: 55 <---- the initial sync is in progress
- finish: 0:00:43 speed: 91,176 (70,568) K/sec
Once the sync finishes, drbd-overview reports both sides UpToDate:
- 0:mysql Connected Primary/Secondary UpToDate/UpToDate C r----
The roles as seen from node1 and from node2, respectively:
- Primary/Secondary
- Secondary/Primary
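Before mysql can put data on the device, the primary side needs a filesystem and both nodes need the mount point; ext3 is used here to match the fstype given to the Filesystem resource later on:
# mke2fs -j /dev/drbd0
# mkdir /data
# ssh node2 -- mkdir /data
# mount /dev/drbd0 /data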
# chkconfig drbd off <---- drbd must be managed by the cluster, not by init
In /etc/my.cnf, under the [mysqld] section, add: datadir=/data
# chkconfig mysqld off <---- likewise, mysqld is only ever started by the cluster
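The resulting my.cnf fragment, plus giving the mysql service account ownership of the new datadir (the account name mysql is the package default; adjust if yours differs):
# vim /etc/my.cnf
- [mysqld]
- datadir=/data
# chown -R mysql:mysql /data
Before defining the cluster resources below, stop mysqld and unmount /data so that Pacemaker can take over managing both.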
# crm configure property stonith-enabled=false <---- there is no STONITH device, so it is disabled here
# crm configure property no-quorum-policy=ignore <---- ignore the loss of quorum (a two-node cluster has no quorum once a node fails)
# crm
# configure
# primitive MysqlIP ocf:heartbeat:IPaddr params ip="172.16.4.1" <---- define the IP (VIP) resource
# primitive Mysql lsb:mysqld op start timeout="120" op stop timeout="120" op monitor interval="20" timeout="30" <---- define the mysql service resource
# primitive MysqlDrbd ocf:linbit:drbd params drbd_resource="mysql" op monitor interval="15" role="Master" op monitor interval="30" role="Slave" op start timeout="240" op stop timeout="120" <---- define the drbd resource; drbd has a master/slave mechanism, so both the "Master" and "Slave" roles get a monitor
# primitive MysqlFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/data" fstype="ext3" op start timeout="60" op stop timeout="60" <---- define the filesystem resource that holds mysql's data
# ms MS_MysqlDrbd MysqlDrbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" <---- the master/slave clone of the drbd resource
# colocation Mysql_on_MysqlDrbd inf: Mysql MS_MysqlDrbd:Master <---- mysql runs on the master node (colocation constraint)
# colocation MysqlFS_on_MS_MysqlDrbd inf: MysqlFS MS_MysqlDrbd:Master <---- the filesystem also lives on the master node (colocation constraint)
# order Drbd_before_MysqlFS inf: MS_MysqlDrbd:promote MysqlFS:start <---- inf: makes the ordering mandatory; promote drbd to master before mounting MysqlFS
# order Mysql_after_MysqlFS inf: MysqlFS:start Mysql:start <---- likewise: mount the filesystem before starting mysql
# group Gmysql MysqlFS MysqlIP Mysql <---- keep all of the resources together on one node
# commit <---- commit the configuration
# crm status
- ============
- Last updated: Thu Sep 22 12:34:47 2011
- Stack: openais
- Current DC: node1.a.com - partition with quorum
- Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
- 2 Nodes configured, 2 expected votes
- 2 Resources configured.
- ============
- Online: [ node1.a.com node2.a.com ]
- Master/Slave Set: MS_MysqlDrbd
- Masters: [ node2.a.com ]
- Slaves: [ node1.a.com ]
- Resource Group: Gmysql
- MysqlFS (ocf::heartbeat:Filesystem): Started node2.a.com
- MysqlIP (ocf::heartbeat:IPaddr): Started node2.a.com
- Mysql (lsb:mysqld): Started node2.a.com
[root@node2 ~]# mysql <---- the mysql service is up at this point
- Welcome to the MySQL monitor. Commands end with ; or \g.
- Your MySQL connection id is 3
- Server version: 5.0.77 Source distribution
- Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
- mysql> show databases;
- +--------------------+
- | Database |
- +--------------------+
- | information_schema |
- | lost+found |
- | mysql |
- | test |
- +--------------------+
- 4 rows in set (0.00 sec)
- mysql> create database mydb; <---- create a database named mydb
- Query OK, 1 row affected (0.02 sec)
- mysql> show databases;
- +--------------------+
- | Database |
- +--------------------+
- | information_schema |
- | lost+found |
- | mydb |
- | mysql |
- | test |
- +--------------------+
- 5 rows in set (0.01 sec)
- mysql>
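The service is also reachable through the cluster address 172.16.4.1. A hypothetical check from a client machine; it assumes a MySQL account that is allowed to connect remotely, which the steps above do not create:
# mysql -h 172.16.4.1 -u root -p -e 'show databases;'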
# crm node standby <---- run on node2, the current master, to simulate a node failure
# crm status
- ============
- Last updated: Thu Sep 22 12:49:19 2011
- Stack: openais
- Current DC: node1.a.com - partition with quorum
- Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
- 2 Nodes configured, 2 expected votes
- 2 Resources configured.
- ============
- Node node2.a.com: standby
- Online: [ node1.a.com ]
- Master/Slave Set: MS_MysqlDrbd
- Slaves: [ node2.a.com node1.a.com ]
Once the failover completes, node1 has been promoted and owns all of the resources:
- ============
- Last updated: Thu Sep 22 12:44:10 2011
- Stack: openais
- Current DC: node1.a.com - partition with quorum
- Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
- 2 Nodes configured, 2 expected votes
- 2 Resources configured.
- ============
- Online: [ node1.a.com node2.a.com ]
- Master/Slave Set: MS_MysqlDrbd
- Masters: [ node1.a.com ]
- Slaves: [ node2.a.com ]
- Resource Group: Gmysql
- MysqlFS (ocf::heartbeat:Filesystem): Started node1.a.com
- MysqlIP (ocf::heartbeat:IPaddr): Started node1.a.com
- Mysql (lsb:mysqld): Started node1.a.com
PS: In my own test, the first failover succeeded, but when I then simulated a failure of node1, node2 could not take over the resources. As my instructor pointed out, the fix is to manually delete the location constraint that the drbd fencing handler (crm-fence-peer.sh, configured above) had added:
location drbd-fence-by-handler-MS_MysqlDrbd MS_MysqlDrbd \
rule $id="drbd-fence-by-handler-rule-MS_MysqlDrbd" $role="Master" -inf: #uname ne node1.a.com
That constraint pins the master role to node1, which is why drbd would not let the resources move over to node2.
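A sketch of removing that constraint from the crm shell, using the constraint id generated by the fencing handler:
# crm configure delete drbd-fence-by-handler-MS_MysqlDrbd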