Version:
[root@node2 ~]# cat /proc/version
Linux version 2.6.32-220.el6.x86_64 ([email protected]) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Symptom:
The Apache cluster Resource Group (RG) haweb on RHEL 6.2 fails to start:
[root@node1 cluster]# clusvcadm -e haweb -F
Local machine trying to enable service:haweb...Failure
Troubleshooting:
rgmanager.log had almost nothing useful in it, but along the way I came across a very handy command I hadn't known about, rg_test (ref 1):
[root@node1 cluster]# cat rgmanager.log
Jun 18 20:37:11 rgmanager [clusterfs] mounting /dev/dm-5 on /var/www/html
Jun 18 20:37:11 rgmanager [clusterfs] mount -t gfs2 /dev/dm-5 /var/www/html
Jun 18 20:37:12 rgmanager [apache] Verifying Configuration Of apache:haweb
Jun 18 20:37:12 rgmanager [apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
Jun 18 20:37:12 rgmanager [apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
Jun 18 20:37:12 rgmanager [apache] Monitoring Service apache:haweb
Jun 18 20:37:12 rgmanager [apache] Checking Existence Of File /var/run/cluster/apache/apache:haweb.pid [apache:haweb] > Failed
Jun 18 20:37:12 rgmanager [apache] Monitoring Service apache:haweb > Service Is Not Running
Jun 18 20:37:12 rgmanager [apache] Starting Service apache:haweb
Jun 18 20:37:12 rgmanager [apache] Looking For IP Addresses
Jun 18 20:37:12 rgmanager [apache] 1 IP addresses found for haweb/haweb
Jun 18 20:37:13 rgmanager [apache] Looking For IP Addresses > Succeed - IP Addresses Found
Jun 18 20:37:13 rgmanager [apache] Checking: SHA1 checksum of config file /apache/apache:haweb/httpd.conf
Jun 18 20:37:13 rgmanager [apache] Checking: SHA1 checksum > succeed
Jun 18 20:37:13 rgmanager [apache] Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf
Jun 18 20:37:13 rgmanager [apache] Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
Jun 18 20:37:13 rgmanager [apache] Starting Service apache:haweb > Failed
Jun 18 20:37:13 rgmanager [ip] 172.16.20.50/24 is not configured
Jun 18 20:37:13 rgmanager [apache] Verifying Configuration Of apache:haweb
Jun 18 20:37:13 rgmanager [apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
Jun 18 20:37:14 rgmanager [apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
Jun 18 20:37:14 rgmanager [apache] Stopping Service apache:haweb
Jun 18 20:37:14 rgmanager [apache] Checking Existence Of File /var/run/cluster/apache/apache:haweb.pid [apache:haweb] > Failed - File Doesn't Exist
Jun 18 20:37:14 rgmanager [apache] Stopping Service apache:haweb > Succeed
Jun 18 20:37:14 rgmanager [clusterfs] unmounting /var/www/html
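Since rgmanager.log only records each agent's step results, a quick way to see where a start attempt broke is to filter for the failed steps. The sketch below is a hypothetical Python helper (`failed_steps` is my own name, not part of rgmanager) and assumes the `rgmanager [agent] message` line layout seen in the log above. It also makes the limitation visible: in this excerpt httpd's own error message never reaches the log, only the agent's "Starting Service apache:haweb > Failed".

```python
import re

# Matches the agent part of an rgmanager.log line, e.g.
#   "Jun 18 20:37:13 rgmanager [apache] Starting Service apache:haweb > Failed"
LINE_RE = re.compile(r"rgmanager \[(?P<agent>[^\]]+)\] (?P<msg>.*)$")


def failed_steps(log_lines):
    """Return (agent, message) pairs for steps that report a failure."""
    hits = []
    for line in log_lines:
        m = LINE_RE.search(line.rstrip("\n"))
        if not m:
            continue  # not an agent line
        msg = m.group("msg")
        # "> Failed" marks a failed step; "is not configured" is what the
        # ip agent reports when the address is not up.
        if "> Failed" in msg or "is not configured" in msg:
            hits.append((m.group("agent"), msg))
    return hits
```

Usage: `with open("rgmanager.log") as f: print(failed_steps(f))`. On the log above it lists the PID-file checks, the apache start failure and `172.16.20.50/24 is not configured`, but nothing explaining why httpd failed.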
The rg_test output finally gave a lead:
[root@node1 cluster]# rg_test test /etc/cluster/cluster.conf start service haweb
Running in test mode.
Loading resource rule from /usr/share/cluster/openldap.sh
Loading resource rule from /usr/share/cluster/fs.sh
Loading resource rule from /usr/share/cluster/svclib_nfslock
Loading resource rule from /usr/share/cluster/ip.sh
Loading resource rule from /usr/share/cluster/SAPDatabase
Loading resource rule from /usr/share/cluster/orainstance.sh
Loading resource rule from /usr/share/cluster/samba.sh
Loading resource rule from /usr/share/cluster/named.sh
Loading resource rule from /usr/share/cluster/mysql.sh
Loading resource rule from /usr/share/cluster/vm.sh
Loading resource rule from /usr/share/cluster/ocf-shellfuncs
Loading resource rule from /usr/share/cluster/checkquorum
Loading resource rule from /usr/share/cluster/nfsexport.sh
Loading resource rule from /usr/share/cluster/script.sh
Loading resource rule from /usr/share/cluster/oracledb.sh
Loading resource rule from /usr/share/cluster/lvm_by_lv.sh
Loading resource rule from /usr/share/cluster/service.sh
Loading resource rule from /usr/share/cluster/apache.sh
Loading resource rule from /usr/share/cluster/lvm.sh
Loading resource rule from /usr/share/cluster/oralistener.sh
Loading resource rule from /usr/share/cluster/lvm_by_vg.sh
Loading resource rule from /usr/share/cluster/tomcat-6.sh
Loading resource rule from /usr/share/cluster/ASEHAagent.sh
Loading resource rule from /usr/share/cluster/nfsclient.sh
Loading resource rule from /usr/share/cluster/clusterfs.sh
Loading resource rule from /usr/share/cluster/SAPInstance
Loading resource rule from /usr/share/cluster/nfsserver.sh
Loading resource rule from /usr/share/cluster/netfs.sh
Loading resource rule from /usr/share/cluster/postgres-8.sh
Loading resource rule from /usr/share/cluster/fence_scsi_check.pl
Starting haweb...
<info> mounting /dev/dm-5 on /var/www/html
[clusterfs] mounting /dev/dm-5 on /var/www/html
<err> mount -t gfs2 /dev/dm-5 /var/www/html
[clusterfs] mount -t gfs2 /dev/dm-5 /var/www/html
<debug> Verifying Configuration Of apache:haweb
[apache] Verifying Configuration Of apache:haweb
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
<debug> Monitoring Service apache:haweb
[apache] Monitoring Service apache:haweb
<error> Checking Existence Of File /var/run/cluster/apache/apache:haweb.pid [apache:haweb] > Failed
[apache] Checking Existence Of File /var/run/cluster/apache/apache:haweb.pid [apache:haweb] > Failed
<error> Monitoring Service apache:haweb > Service Is Not Running
[apache] Monitoring Service apache:haweb > Service Is Not Running
<info> Starting Service apache:haweb
[apache] Starting Service apache:haweb
<debug> Looking For IP Addresses
[apache] Looking For IP Addresses
<debug> 1 IP addresses found for haweb/haweb
[apache] 1 IP addresses found for haweb/haweb
<debug> Looking For IP Addresses > Succeed - IP Addresses Found
[apache] Looking For IP Addresses > Succeed - IP Addresses Found
<debug> Checking: SHA1 checksum of config file /apache/apache:haweb/httpd.conf
[apache] Checking: SHA1 checksum of config file /apache/apache:haweb/httpd.conf
<debug> Checking: SHA1 checksum > succeed
[apache] Checking: SHA1 checksum > succeed
<debug> Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf
[apache] Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf
<debug> Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
[apache] Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
(99)Cannot assign requested address: make_sock: could not bind to address 172.16.20.50:80
no listening sockets available, shutting down
Unable to open logs
<error> Starting Service apache:haweb > Failed
[apache] Starting Service apache:haweb > Failed
Failed to start haweb
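The key line is `(99)Cannot assign requested address: make_sock: could not bind to address 172.16.20.50:80`: httpd tried to listen on the floating IP before it existed on any interface. The minimal Python sketch below reproduces that class of error; it is an illustration, not part of the cluster tooling. `try_bind` is a hypothetical helper, and 192.0.2.1 (a documentation-only address) stands in for 172.16.20.50 before the ip resource has been started. On Linux the second bind is expected to fail with errno 99 (EADDRNOTAVAIL), unless the `net.ipv4.ip_nonlocal_bind` sysctl is enabled.

```python
import errno
import os
import socket


def try_bind(addr, port=0):
    """Bind a TCP socket to (addr, port) and close it again.

    Returns None if the bind succeeds, otherwise the errno of the failure.
    Port 0 lets the kernel pick a free port, so no root is needed
    (httpd itself was binding port 80).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        return None
    except OSError as e:
        return e.errno
    finally:
        s.close()


if __name__ == "__main__":
    # 192.0.2.1 (TEST-NET-1) is not assigned to any local interface, like
    # 172.16.20.50 before the ip resource has added it to eth1.
    for addr in ("127.0.0.1", "192.0.2.1"):
        err = try_bind(addr)
        if err is None:
            print(f"{addr}: bind OK")
        else:
            name = errno.errorcode.get(err, "?")
            print(f"{addr}: {name} ({err}) {os.strerror(err)}")
```

This is why the order matters: the address must be added by the ip resource before apache starts, not after.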
Searching for this rg_test error turned up ref 2, which showed the cause: the ip resource had not been brought up before apache started, so httpd could not bind to 172.16.20.50:80.
So I rearranged the parent-child relationships of the three resources. Before the change (ip and clusterfs are siblings, apache is a child of clusterfs; note the indentation):
[root@node1 ~]# ccs -h node1 --lsservices
service: domain=node1orbust, name=haweb, recovery=relocate
  ip: ref=172.16.20.50/24
  clusterfs: ref=webdata
    apache: ref=haweb
resources:
  ip: monitor_link=on, sleeptime=10, disable_rdisc=on, address=172.16.20.50/24
  apache: shutdown_wait=5, config_file=conf/httpd.conf, name=haweb, server_root=/etc/httpd
  clusterfs: self_fence=on, name=webdata, force_unmount=on, fstype=gfs2, device=/dev/clusteredvg/webdata, mountpoint=/var/www/html, fsid=42848
[root@node1 ~]#
After the change (apache is a child of clusterfs, and clusterfs is a child of ip; note the indentation):
[root@node1 ~]# ccs -h node1 --lsservices
service: domain=node1orbust, name=haweb, recovery=relocate
  ip: ref=172.16.20.50/24
    clusterfs: ref=webdata
      apache: ref=haweb
resources:
  ip: monitor_link=on, sleeptime=10, disable_rdisc=on, address=172.16.20.50/24
  apache: shutdown_wait=5, config_file=conf/httpd.conf, name=haweb, server_root=/etc/httpd
  clusterfs: self_fence=on, name=webdata, force_unmount=on, fstype=gfs2, device=/dev/clusteredvg/webdata, mountpoint=/var/www/html, fsid=42848
[root@node1 ~]#
Testing again with rg_test, there are no errors this time:
[root@node1 cluster]# rg_test test /etc/cluster/cluster.conf start service haweb
Running in test mode.
Loading resource rule from /usr/share/cluster/openldap.sh
Loading resource rule from /usr/share/cluster/fs.sh
Loading resource rule from /usr/share/cluster/svclib_nfslock
Loading resource rule from /usr/share/cluster/ip.sh
Loading resource rule from /usr/share/cluster/SAPDatabase
Loading resource rule from /usr/share/cluster/orainstance.sh
Loading resource rule from /usr/share/cluster/samba.sh
Loading resource rule from /usr/share/cluster/named.sh
Loading resource rule from /usr/share/cluster/mysql.sh
Loading resource rule from /usr/share/cluster/vm.sh
Loading resource rule from /usr/share/cluster/ocf-shellfuncs
Loading resource rule from /usr/share/cluster/checkquorum
Loading resource rule from /usr/share/cluster/nfsexport.sh
Loading resource rule from /usr/share/cluster/script.sh
Loading resource rule from /usr/share/cluster/oracledb.sh
Loading resource rule from /usr/share/cluster/lvm_by_lv.sh
Loading resource rule from /usr/share/cluster/service.sh
Loading resource rule from /usr/share/cluster/apache.sh
Loading resource rule from /usr/share/cluster/lvm.sh
Loading resource rule from /usr/share/cluster/oralistener.sh
Loading resource rule from /usr/share/cluster/lvm_by_vg.sh
Loading resource rule from /usr/share/cluster/tomcat-6.sh
Loading resource rule from /usr/share/cluster/ASEHAagent.sh
Loading resource rule from /usr/share/cluster/nfsclient.sh
Loading resource rule from /usr/share/cluster/clusterfs.sh
Loading resource rule from /usr/share/cluster/SAPInstance
Loading resource rule from /usr/share/cluster/nfsserver.sh
Loading resource rule from /usr/share/cluster/netfs.sh
Loading resource rule from /usr/share/cluster/postgres-8.sh
Loading resource rule from /usr/share/cluster/fence_scsi_check.pl
Starting haweb...
<debug> Link for eth1: Detected
[ip] Link for eth1: Detected
<info> Adding IPv4 address 172.16.20.50/24 to eth1
[ip] Adding IPv4 address 172.16.20.50/24 to eth1
<debug> Pinging addr 172.16.20.50 from dev eth1
[ip] Pinging addr 172.16.20.50 from dev eth1
<debug> Sending gratuitous ARP: 172.16.20.50 52:54:00:01:20:01 brd ff:ff:ff:ff:ff:ff
[ip] Sending gratuitous ARP: 172.16.20.50 52:54:00:01:20:01 brd ff:ff:ff:ff:ff:ff
rdisc: no process killed
<debug> /dev/dm-5 already mounted
[clusterfs] /dev/dm-5 already mounted
<debug> Verifying Configuration Of apache:haweb
[apache] Verifying Configuration Of apache:haweb
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
<debug> Monitoring Service apache:haweb
[apache] Monitoring Service apache:haweb
<error> Checking Existence Of File /var/run/cluster/apache/apache:haweb.pid [apache:haweb] > Failed
[apache] Checking Existence Of File /var/run/cluster/apache/apache:haweb.pid [apache:haweb] > Failed
<error> Monitoring Service apache:haweb > Service Is Not Running
[apache] Monitoring Service apache:haweb > Service Is Not Running
<info> Starting Service apache:haweb
[apache] Starting Service apache:haweb
<debug> Looking For IP Addresses
[apache] Looking For IP Addresses
<debug> 1 IP addresses found for haweb/haweb
[apache] 1 IP addresses found for haweb/haweb
<debug> Looking For IP Addresses > Succeed - IP Addresses Found
[apache] Looking For IP Addresses > Succeed - IP Addresses Found
<debug> Checking: SHA1 checksum of config file /apache/apache:haweb/httpd.conf
[apache] Checking: SHA1 checksum of config file /apache/apache:haweb/httpd.conf
<debug> Checking: SHA1 checksum > succeed
[apache] Checking: SHA1 checksum > succeed
<debug> Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf
[apache] Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf
<debug> Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
[apache] Generating New Config File /apache/apache:haweb/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
<debug> Starting Service apache:haweb > Succeed
[apache] Starting Service apache:haweb > Succeed
Start of haweb complete
clusvcadm now starts the RG successfully:
[root@node1 cluster]# clusvcadm -e haweb -F
Local machine trying to enable service:haweb...Success
service:haweb is now running on node1.private.cluster20.example.com
[root@node1 cluster]#
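Summary:
The error came from trusting Red Hat's built-in start ordering of resource types too much. According to service.sh, resources not listed in it should start last and stop first. Here clusterfs and ip are listed in service.sh and apache is not, so ip should have been started before apache, but that is clearly not what happened. My guess is that it has to do with making apache a child of clusterfs: clusterfs starts before ip (start level 3 vs. 7), so its child apache was also started before ip, which produced the error. That seems a reasonable explanation. If so, the safest approach is to define the resource tree according to the resources' dependencies, with no sibling nodes.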
[root@node1 cluster]# cat /usr/share/cluster/service.sh
...
<special tag="rgmanager">
<attributes maxinstances="1"/>
<child type="lvm" start="1" stop="9"/>
<child type="fs" start="2" stop="8"/>
<child type="clusterfs" start="3" stop="7"/>
<child type="netfs" start="4" stop="6"/>
<child type="nfsexport" start="5" stop="5"/>
<child type="nfsclient" start="6" stop="4"/>
<child type="ip" start="7" stop="2"/>
<child type="smb" start="8" stop="3"/>
<child type="script" start="9" stop="1"/>
</special>
...
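The guess in the summary can be checked against a simplified model of the ordering. The Python sketch below assumes that rgmanager (a) orders a service's direct children by the start level of their type in the service.sh excerpt above, with unlisted types such as apache last, and (b) starts each resource before its own children. `Res` and `start_order` are hypothetical names for illustration; this is not rgmanager's code, and nested children are simply started in declared order.

```python
# Start levels copied from the <special tag="rgmanager"> block of service.sh.
SERVICE_START_LEVEL = {
    "lvm": 1, "fs": 2, "clusterfs": 3, "netfs": 4, "nfsexport": 5,
    "nfsclient": 6, "ip": 7, "smb": 8, "script": 9,
}


class Res:
    """A resource node: type, name and child resources (hypothetical)."""

    def __init__(self, rtype, name, children=()):
        self.rtype = rtype
        self.name = name
        self.children = list(children)


def start_order(service_children):
    """Predict the start order of a service's resource tree.

    Top-level children are sorted by their type's service.sh start level;
    types not listed there go last. Each resource starts before its own
    children, which start in declared order (a simplification).
    """
    unlisted = max(SERVICE_START_LEVEL.values()) + 1
    # sorted() is stable, so equal levels keep their declared order.
    top = sorted(service_children,
                 key=lambda r: SERVICE_START_LEVEL.get(r.rtype, unlisted))
    order = []

    def walk(res):
        order.append(f"{res.rtype}:{res.name}")
        for child in res.children:
            walk(child)

    for res in top:
        walk(res)
    return order


# Before the fix: ip and clusterfs are siblings, apache is under clusterfs.
before = [
    Res("ip", "172.16.20.50/24"),
    Res("clusterfs", "webdata", [Res("apache", "haweb")]),
]
# After the fix: ip -> clusterfs -> apache, no siblings.
after = [
    Res("ip", "172.16.20.50/24",
        [Res("clusterfs", "webdata", [Res("apache", "haweb")])]),
]

print(start_order(before))  # ['clusterfs:webdata', 'apache:haweb', 'ip:172.16.20.50/24']
print(start_order(after))   # ['ip:172.16.20.50/24', 'clusterfs:webdata', 'apache:haweb']
```

Under these assumptions the predicted orders match the two rg_test runs above: before the fix clusterfs and apache come up before ip, after it ip comes first. With no siblings there is nothing for the type-level sorting to reorder, so the nesting alone fixes the order.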
REF:
1. [SOLVED] Cluster Apache start
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=40798&forum=55
2. [Retained] Question about the floating ip in a redhat cluster suite and apache configuration
http://www.chinaunix.net/old_jh/4/835696.html