The uneven load distribution previously found on the M5000 database server cluster has been investigated and resolved; node 2 can now accept new connections through the SCAN IP.
The resolution process is reported below:
Problem symptoms:
1. During a routine inspection, the session counts assigned to the two RAC nodes were found to differ greatly: most sessions had been created on node 1, and node 2 could not be assigned new sessions through the SCAN IP.
Before the fix, the session counts per instance were as follows (a sketch of the query appears after the listing):
   INST_ID COUNT(USERNAME)
---------- ---------------
         2              85
         1            1463
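The query behind this listing is not recorded in the report; a minimal sketch that produces the same kind of output, assuming the counts come from gv$session, is:
-- Count user sessions per RAC instance (reconstruction for reference only;
-- adjust the filters to match the actual inspection query).
SELECT inst_id, COUNT(username)
  FROM gv$session
 WHERE username IS NOT NULL
 GROUP BY inst_id
 ORDER BY inst_id;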
2. The cluster service processes all showed as normal, the local and remote listener services were normal, and the parameter files were configured correctly, but the services registered with the SCAN IP did not include node 2's listener information; see the srvctl cross-check sketch after the listener output below.
The listener registration information was as follows:
-bash-3.00$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:10
Copyright (c) 1991, 2010, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 13 min. 28 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/ecsyhdb2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
The command completed successfully
---- The output shows that only instance orcl1 is registered under service orcl; instance orcl2 is missing.
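Besides lsnrctl, the SCAN and SCAN listener configuration can be cross-checked from the Grid Infrastructure side. The standard 11.2 srvctl commands below are given only as a reference checklist for repeating the inspection; they are not output captured at the time:
-bash-3.00$ srvctl config scan            # SCAN name and SCAN VIPs
-bash-3.00$ srvctl config scan_listener   # SCAN listener names and port (1525 here)
-bash-3.00$ srvctl status scan_listener   # which node each SCAN listener is currently running on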
Resolution process:
1. Analysis showed that node 2's listener information had not been registered with the SCAN listener; this matches a known Oracle RAC bug, Bug 13066936.
2. To work around the bug, the remote listener registration was manually redone; after this change, node 2's listener information registered successfully. The commands executed were:
1. show parameter remote_listener
2. alter system set remote_listener='';
3. alter system register;
4. alter system set remote_listener='db-cluster-scan:1525';
5. alter system register;
Note: the second command deregisters all nodes' services from the SCAN listener, so at that point a query of the SCAN shows no services. It is therefore advisable to prepare a script before making the change so that the whole sequence runs quickly, avoiding errors on the application side. After the fifth command has been executed, checking the SCAN again shows that the listener information of both nodes is registered with the SCAN IP.
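A minimal SQL*Plus script along the lines suggested above (run as SYSDBA; the remote_listener value is the one already used in this cluster) could look like this:
-- reregister_scan.sql: clear and re-set remote_listener so that both instances
-- re-register with the SCAN listener; run the whole file in one go to keep the
-- window in which the SCAN has no registered services as short as possible.
show parameter remote_listener
alter system set remote_listener='';
alter system register;
alter system set remote_listener='db-cluster-scan:1525';
alter system register;
show parameter remote_listener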
SCAN status after executing the second command:
-bash-3.00$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:56
Copyright (c) 1991, 2010, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
The listener supports no services
The command completed successfully
Session counts on the two nodes after the change:
   INST_ID COUNT(USERNAME)
---------- ---------------
         2             343
         1            1201
---------- Node 2's session count has clearly risen.
SCAN IP status:
-bash-3.00$ lsnrctl status listener_scan1
LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:09:09
Copyright (c) 1991, 2010, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully
------------- After the listeners were re-registered, the services registered with the SCAN now show both nodes' information.
Because node 2's listener has only just been re-registered, it still holds fewer sessions; as running time increases, the session counts of the two nodes will gradually even out.
Root cause:
The SCAN listener most likely failed over at some earlier point, and because of the Oracle bug the instance was not registered again after it restarted.
The official Oracle documentation describes the problem as follows:
Description
On migration of remote listener from one node to another, for example during
node eviction (failover), the database does not re-register with the listener
as it does not receive any EOF.
As a result the database keeps listening on the same socket when ideally
it should re-register.
Rediscovery Notes:
Instance does not register services when scan fails over
Workaround
alter system set remote_listener ='';
alter system set remote_listener ='
Note:
This fix is just one piece of a larger solution needing other fixes
so that the clusterware / node monitor can trigger a re-register as
required.
References
Bug:13066936 (This link will only work for PUBLISHED bugs)
Source document: <https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=89966933432507&id=13066936.8&_afrWindowMode=0&_adf.ctrl-state=v9ps5dfil_4>