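Prerequisites:
1) This setup uses two test nodes, node1.blue.com and node2.blue.com, with IP addresses 172.16.8.100 and 172.16.8.101; shared data is served over NFS from 172.16.8.102;
2) The clustered service is MariaDB;
3) The address that provides the MariaDB service is 172.16.8.200, i.e. the VIP;
The clocks on both nodes must be synchronized (you can simply set up an NTP server):
# ntpdate 172.16.0.1 (run on both nodes)
1) Hostname-to-IP resolution must work for all nodes, and each node's hostname must match the output of "uname -n". Therefore, make sure /etc/hosts on both nodes contains the following: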
172.16.8.100 node1.blue.com node1
172.16.8.101 node2.blue.com node2
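The hosts entries above can be checked with a small script instead of by eye. The following is only a sketch: hosts_has_entry is a hypothetical helper name, not part of any package, and it simply looks for a line that maps the given IP to the given name.

```shell
# Hypothetical helper (not from the original text): succeed if the hosts
# file $1 has a non-comment line mapping IP $2 to name $3, either as the
# canonical name or as an alias.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[ \t]*#/ { next }                  # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example calls (run on each node):
#   hosts_has_entry /etc/hosts 172.16.8.100 node1.blue.com && echo "node1 ok"
#   hosts_has_entry /etc/hosts 172.16.8.101 node2.blue.com && echo "node2 ok"
```

The helper takes the file path as an argument so it can also be pointed at a copy of the file before it is installed.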
To keep these hostnames across reboots, also run commands like the following on each node:
NODE1:172.16.8.100
# vim /etc/sysconfig/network
HOSTNAME=node1.blue.com
# hostname node1.blue.com
NODE2:172.16.8.101
# vim /etc/sysconfig/network
HOSTNAME=node2.blue.com
# hostname node2.blue.com
2) Set up key-based ssh communication between the two nodes, which can be done with commands like the following:
Node1:172.16.8.100
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
Node2:172.16.8.101
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
2. Configure corosync (run the following commands on node1.blue.com)
# yum install corosync pacemaker -y (do the same on NODE2)
# rpm -ql corosync (list the files the package installed)
# cd /etc/corosync
# ls
# cp corosync.conf.example corosync.conf
# vim corosync.conf
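Change:
secauth: on
Also set the IP address after bindnetaddr in this file to the network address of the network your NIC is on:
bindnetaddr: 172.16.0.0
mcastaddr: 239.123.321.99 (just avoid the default address; note that 321 is not a valid octet, so substitute a valid multicast address of your own)
mcastport: 5405 (listening port)
to_syslog: no
service {
ver: 0 (version number)
name: pacemaker
# use_mgmtd: yes (optional; works with or without this entry)
}
aisexec { (optional; not essential)
user: root
group: root
}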
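Two values in this file are easy to get wrong: bindnetaddr must be the network address rather than the node's own IP, and mcastaddr must be a real IPv4 multicast address. The sketch below shows one way to check both. The function names are hypothetical, and the 255.255.0.0 netmask is an assumption inferred from the 172.16.0.0 value used above.

```shell
# Hypothetical helpers (not from the original text). They use the globals
# old_ifs and octet as scratch variables.

# Print the network address for IP $1 and netmask $2 (dotted quads).
network_address() {
    old_ifs=$IFS; IFS=.
    set -- $1 $2              # split both quads into $1..$8
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Succeed only if $1 is a well-formed IPv4 multicast address (224-239.x.x.x).
is_ipv4_multicast() {
    case $1 in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;   # reject anything but digits and single dots
    esac
    old_ifs=$IFS; IFS=.
    set -- $1                 # split into octets
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    [ "$1" -ge 224 ] && [ "$1" -le 239 ]
}

# Example calls:
#   network_address 172.16.8.100 255.255.0.0     # value to use for bindnetaddr
#   is_ipv4_multicast 239.255.1.1 && echo "usable as mcastaddr"
```

An address whose octet exceeds 255 is rejected by the second helper, which is a quick way to catch a typo in mcastaddr before starting corosync.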
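Generate the authentication key file used for communication between the nodes:
# corosync-keygen (if the system runs short of entropy, the command waits; downloading and installing an RPM package generates activity that helps refill the pool)
Copy corosync.conf and authkey from node1 to node2:
# scp -p corosync.conf authkey node2:/etc/corosync/
Then verify on NODE2: # ll /etc/corosync/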
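Beyond listing the directory, it can be worth confirming that the copied key kept its restrictive permissions, since scp -p is meant to preserve them. This is a minimal sketch, assuming GNU stat (as on CentOS) and assuming the key is expected to be mode 400, which is what corosync-keygen typically produces. check_authkey is a hypothetical helper name.

```shell
# Hypothetical helper (not from the original text): check that the key
# file $1 exists and is readable by its owner only (mode 400).
# Assumes GNU stat; uses the global variable mode as scratch space.
check_authkey() {
    [ -f "$1" ] || { echo "missing: $1"; return 1; }
    mode=$(stat -c '%a' "$1") || return 1
    [ "$mode" = "400" ] || { echo "unexpected mode $mode on $1"; return 1; }
    echo "ok: $1"
}

# Example calls (from node1):
#   check_authkey /etc/corosync/authkey
#   ssh node2 "$(typeset -f check_authkey 2>/dev/null); check_authkey /etc/corosync/authkey"
```

The second example call relies on typeset -f, which is available in bash but not in every POSIX shell, so treat it as optional.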
On both nodes, create the directory where corosync writes its logs:
# mkdir /var/log/cluster
# ssh node2 'mkdir /var/log/cluster'
3. Start corosync (run the following commands on node1):
# service corosync start; ssh node2 'service corosync start'
# ss -tnul (UDP port 5405, the multicast port, should now be open)
# cd /var/log/cluster
# ls
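# tail -f corosync.log (follow the log in real time)
Check whether the corosync engine started properly:
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Check whether the initial membership notifications were sent properly:
# grep TOTEM /var/log/cluster/corosync.log
Check whether any errors occurred during startup. The error this may report indicates that pacemaker will soon no longer run as a corosync plugin, so cman is recommended as the cluster infrastructure service; it can be safely ignored here.
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Check whether pacemaker started properly:
# grep pcmk_startup /var/log/cluster/corosync.log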
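The individual grep checks above can be bundled into one pass over the log. The sketch below is an illustration only: check_corosync_log is a hypothetical helper, and it reports merely whether each marker string appears, not whether the cluster is healthy.

```shell
# Hypothetical helper (not from the original text): report which of the
# startup markers searched for above appear in the log file $1.
# Returns non-zero if any marker is missing; uses globals log, rc, pattern.
check_corosync_log() {
    log=${1:-/var/log/cluster/corosync.log}
    rc=0
    for pattern in "Corosync Cluster Engine" "TOTEM" "pcmk_startup"; do
        if grep -q -- "$pattern" "$log"; then
            echo "found:   $pattern"
        else
            echo "missing: $pattern"
            rc=1
        fi
    done
    return "$rc"
}

# Example call:
#   check_corosync_log /var/log/cluster/corosync.log
```

A missing marker only means the corresponding line was not logged yet; the log itself remains the place to look for the reason.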
4. Install crmsh
# cd
# lftp 172.16.0.1/pub
lftp 172.16.0.1:/pub/Sources/6.x86_64/corosync> mget crmsh-2.1-1.6.x86_64.rpm
lftp 172.16.0.1:/pub/Sources/6.x86_64/crmsh> mget pssh-2.3.1-2.e16.x86_64.rpm
# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-4.1.x86_64.rpm
Install the same packages on node 2 as well:
# scp crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-4.1.x86_64.rpm node2:/root
# crm (crm is a modal shell)
crm(live)# help
crm(live)# help status (show how a command is used)
With crmsh installed, the following command shows the startup status of the cluster nodes:
[root@node1 ~]# crm status
Last updated: Sun May 31 15:35:28 2015
Last change: Sun May 31 13:35:35 2015
Stack: classic openais (with plugin)
Current DC: node1.blue.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.blue.com node2.blue.com ]
The output above shows that both nodes have started normally and that the cluster is in a healthy working state.
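If the node list needs to be consumed by a script, the "Online:" line of the status output can be picked apart with awk. This is a sketch under the assumption that the output keeps the format shown above; online_nodes is a hypothetical helper name.

```shell
# Hypothetical helper (not from the original text): read "crm status"
# output on stdin and print each online node on its own line.
online_nodes() {
    awk '/^Online:/ {
        for (i = 2; i <= NF; i++)
            if ($i != "[" && $i != "]") print $i
    }'
}

# Example call:
#   crm status | online_nodes
```

Counting the lines it prints, for instance with wc -l, gives a quick check that both nodes have joined.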
Running ps auxf shows the processes started by corosync.
root 2546 0.5 1.4 554968 14560 ? Ssl 13:13 0:46 corosync
189 2552 0.0 1.2 95180 12248 ? S< 13:13 0:05 \_ /usr/libexec/pacemaker/cib
root 2553 0.0 0.5 95288 5104 ? S< 13:13 0:01 \_ /usr/libexec/pacemaker/stonithd
root 2554 0.0 0.3 62928 3428 ? S< 13:13 0:01 \_ /usr/libexec/pacemaker/lrmd
189 2555 0.0 0.3 85928 3752 ? S< 13:13 0:01 \_ /usr/libexec/pacemaker/attrd
189 2556 0.0 2.3 121748 23560 ? S< 13:13 0:01 \_ /usr/libexec/pacemaker/pengine
189 2557 0.0 0.8 136148 8048 ? S< 13:13 0:02 \_ /usr/libexec/pacemaker/crmd
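5. Configure cluster properties and disable stonith (the property is stored in the cluster-wide configuration, so setting it on one node is enough)
Pacemaker enables stonith by default, but this cluster has no stonith device, so the default configuration is not usable yet. This can be verified with a command such as:
# crm_verify -L -V
For now, stonith can be disabled with the following command:
# crm configure property stonith-enabled=false
or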
# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# show
node node1.blue.com \
attributes standby=off
node node2.blue.com \
attributes standby=off
property cib-bootstrap-options: \
dc-version=1.1.11-97629de \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false
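To confirm from a script that the property really ended up in the configuration, the show output can be searched for it. A minimal sketch, assuming the property is printed as in the listing above (optionally with quotes around the value); stonith_disabled is a hypothetical helper name.

```shell
# Hypothetical helper (not from the original text): read "crm configure show"
# output on stdin and succeed if stonith-enabled is set to false.
stonith_disabled() {
    grep -Eq 'stonith-enabled="?false"?'
}

# Example call:
#   crm configure show | stonith_disabled && echo "stonith is disabled"
```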
6. Create the NFS directory and export it:
NFS: 172.16.8.102
# mkdir /mydata
# vim /etc/exports
/mydata 172.16.0.0/16(rw,no_root_squash)
# service nfs restart
Node1:
# mkdir /mydata
# mount -t nfs 172.16.8.102:/mydata /mydata
Node2:
# mount -t nfs 172.16.8.102:/mydata /mydata
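Whether the export is actually mounted can be checked from the kernel's mount table rather than assumed. The sketch below parses a table in /proc/mounts format; is_nfs_mounted is a hypothetical helper, and the optional second argument exists only so the function can be tried against a saved copy of the table.

```shell
# Hypothetical helper (not from the original text): succeed if mount point
# $1 is mounted with an NFS filesystem type (nfs, nfs4, ...).
# $2 is the mount table to read and defaults to /proc/mounts.
is_nfs_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Example call (run on each node):
#   is_nfs_mounted /mydata && echo "/mydata is on NFS"
```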
7. Install MariaDB:
[root@node1 ~]# lftp 172.16.0.1/pub
cd ok, cwd=/pub
lftp 172.16.0.1:/pub> cd Sources/sources/mariadb/
lftp 172.16.0.1:/pub/Sources/sources/mariadb> mget mariadb-5.5.43-linux-x86_64.tar.gz
# tar xf mariadb-5.5.43-linux-x86_64.tar.gz -C /usr/local
# cd /usr/local/
# groupadd -r -g 306 mysql
# useradd -r -g 306 -u 306 mysql
# mkdir /mydata/data
# chown -R mysql.mysql /mydata/data
# ll /mydata/data
# ln -sv mariadb-5.5.43-linux-x86_64 mysql
# cd mysql
# ll
# chown -R root.mysql ./*
# scripts/mysql_install_db --user=mysql --datadir=/mydata/data
[root@node1 mysql]# ll /mydata/data/
total 32
-rw-rw---- 1 nobody nobody 16384 May 31 12:06 aria_log.00000001
-rw-rw---- 1 nobody nobody 52 May 31 12:06 aria_log_control
drwx------ 2 nobody nobody 4096 May 31 12:06 mysql
drwx------ 2 nobody nobody 4096 May 31 12:06 performance_schema
drwx------ 2 nobody nobody 4096 May 31 12:06 test
# cp support-files/mysql.server /etc/rc.d/init.d/mysqld
# chkconfig --add mysqld
# chkconfig mysqld off
# mkdir /etc/mysql
# cp support-files/my-large.cnf /etc/mysql/my.cnf
# vim /etc/mysql/my.cnf
Add the following to the [mysqld] section:
datadir = /mydata/data
innodb_file_per_table = on
skip_name_resolve = on
# service mysqld start
# /usr/local/mysql/bin/mysql
> GRANT ALL ON *.* TO 'root'@'172.16.%.%' IDENTIFIED BY 'mageedu';
> FLUSH PRIVILEGES;
> quit
# service mysqld stop
# scp /etc/mysql/my.cnf node2:/etc/mysql
Repeat the same steps on NODE2:
# groupadd -r -g 306 mysql
# useradd -r -g 306 -u 306 mysql
# lftp 172.16.0.1/pub
> cd Sources/sources/mariadb/
> mget mariadb-5.5.43-linux-x86_64.tar.gz
> bye
[root@node2 ~]# mkdir /mydata
[root@node2 ~]# mount -t nfs 172.16.8.102:/mydata /mydata
Check that the MySQL data files just created from Node1 are present:
[root@node2 ~]# cd /mydata/
[root@node2 mydata]# cd data/
[root@node2 data]# ls
aria_log.00000001 ibdata1 ib_logfile1 mysql-bin.000001 performance_schema
aria_log_control ib_logfile0 mysql mysql-bin.index test
# tar xf mariadb-5.5.43-linux-x86_64.tar.gz -C /usr/local
# cd /usr/local
# ln -sv mariadb-5.5.43-linux-x86_64 mysql
# cd mysql
# ll
# chown -R root.mysql ./*
# ll
# cp support-files/mysql.server /etc/rc.d/init.d/mysqld
# chkconfig --add mysqld
# chkconfig mysqld off
# service mysqld start
# /usr/local/mysql/bin/mysql
> CREATE DATABASE testdb;
> SHOW DATABASES;
> exit
# service mysqld stop
NODE1:
# service mysqld start
# /usr/local/mysql/bin/mysql
> SHOW DATABASES;
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
| testdb |
+--------------------+
5 rows in set (0.02 sec)
Check whether the testdb database just created on NODE2 exists; it does, which shows that both nodes are working on the same data directory;
# service mysqld stop
8. Next, create an IP address resource for the MariaDB cluster, to be used when the service is provided through the cluster; this can be done as follows:
# crm
crm(live)# configure
Define the IP resource:
crm(live)configure# primitive myip ocf:heartbeat:IPaddr params ip='172.16.8.200' op monitor interval=10s timeout=20s
crm(live)configure# verify (check for syntax errors)
Define the MariaDB resource:
crm(live)configure# primitive myserver lsb:mysqld op monitor interval=20s timeout=20s
crm(live)configure# verify
Define the NFS resource:
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.8.102:/mydata" directory="/mydata" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# commit (commit the resources defined so far)
Define a colocation constraint: the NFS filesystem (webstore) must run together with myip;
crm(live)configure# colocation webstore_with_myip inf: webstore myip
inf: means infinity; the colon must be followed by a space, otherwise it is a syntax error;
crm(live)configure# verify
Define a colocation constraint: MariaDB (myserver) must run together with the NFS filesystem;
crm(live)configure# colocation myserver_with_webstore inf: myserver webstore
crm(live)configure# verify
Define an order constraint: the NFS filesystem may start only after myip has started
crm(live)configure# order myip_before_webstore Mandatory: myip webstore
crm(live)configure# verify
Define an order constraint: MariaDB may start only after the NFS filesystem has started
crm(live)configure# order webstore_before_myserver Mandatory: webstore myserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node node1.blue.com \
attributes standby=off
node node2.blue.com \
attributes standby=on
primitive myip IPaddr \
params ip=172.16.8.200 \
op monitor interval=10s timeout=20s
primitive myserver lsb:mysqld \
op monitor interval=20s timeout=20s
primitive webstore Filesystem \
params device="172.16.8.102:/mydata" directory="/mydata" fstype=nfs \
op monitor interval=20s timeout=40s \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0
colocation myserver_with_webstore inf: myserver webstore
Node2 :
# crm status
Last updated: Sun May 31 17:15:10 2015
Last change: Sun May 31 17:14:49 2015
Stack: classic openais (with plugin)
Current DC: node1.blue.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.blue.com node2.blue.com ]
myip (ocf::heartbeat:IPaddr): Started node1.blue.com
myserver (lsb:mysqld): Started node1.blue.com
webstore (ocf::heartbeat:Filesystem): Started node1.blue.com
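For failover tests it is handy to ask which node currently hosts a given resource. The sketch below extracts that from the status output, assuming the line format shown above; resource_node is a hypothetical helper name.

```shell
# Hypothetical helper (not from the original text): read "crm status"
# output on stdin and print the node on which resource $1 is started.
# Prints nothing if the resource is not listed as Started.
resource_node() {
    awk -v res="$1" '$1 == res && $(NF - 1) == "Started" { print $NF }'
}

# Example calls:
#   crm status | resource_node myip
#   crm status | resource_node myserver
```

With the colocation constraints in place, all three resources should report the same node.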
Test MariaDB from the NFS server:
# yum install -y mysql
# mysql -uroot -h172.16.8.200 -p
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
| testdb |
+--------------------+
5 rows in set (0.02 sec)
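The testdb database created earlier is still listed;
The resources are currently running on NODE1;
Put NODE1 into standby and check whether MariaDB is affected;
NODE1:
# crm node standby
The service keeps running normally.
To bring node1 back online afterwards:
# crm node online
Check the status (the output below was captured while node1 was still in standby):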
# crm status
Last updated: Sun May 31 17:21:30 2015
Last change: Sun May 31 17:21:25 2015
Stack: classic openais (with plugin)
Current DC: node1.blue.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Node node1.blue.com: standby
Online: [ node2.blue.com ]
myip (ocf::heartbeat:IPaddr): Started node2.blue.com
myserver (lsb:mysqld): Started node2.blue.com
webstore (ocf::heartbeat:Filesystem): Started node2.blue.com
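The resources are now running on node2.
The MariaDB service keeps running normally; at this point a highly available MariaDB service is complete (this is active/passive failover, not load balancing);
Note that the NFS server can easily become the bottleneck of this service!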