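The uneven load across the database cluster on the M5000 server, found earlier, has been investigated and resolved: node 2 can again accept new connections through the SCAN IP.

The troubleshooting and resolution process is summarized below: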

Symptoms:

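1. A routine inspection found that session counts on the two RAC nodes differed greatly. Most sessions were created on node 1, and node 2 was not receiving new sessions through the SCAN IP.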

Sessions per instance before the change:

INST_ID COUNT(USERNAME)
------- ---------------
      2              85
      1            1463


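2. The cluster service processes all reported normal status, the local and remote listeners were running, and the parameter files were configured correctly. However, the services registered with the SCAN listener did not include node 2's instance.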

The listener registration information was as follows:


-bash-3.00$ lsnrctl status listener_scan1

LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:10

Copyright (c) 1991, 2010, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 13 min. 28 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/ecsyhdb2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
The command completed successfully

---- The output shows that only instance orcl1 is registered for service orcl; orcl2 is missing.
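The check above was done by reading the `lsnrctl status listener_scan1` output by eye. The sketch below automates the same check: it parses captured status text (assuming the standard 11.2 layout shown above) and reports which expected instances are not registered as READY for a service. It does not run lsnrctl itself; the text would be captured separately, for example with `lsnrctl status listener_scan1 > scan1.txt`. The instance names `orcl1`/`orcl2` come from this report; the function names are hypothetical.

```python
import re

# Matches lines such as: Service "orcl" has 1 instance(s).
SERVICE_RE = re.compile(r'^\s*Service "([^"]+)"\s*has \d+ instance\(s\)\.')
# Matches lines such as: Instance "orcl1", status READY, has 1 handler(s) ...
INSTANCE_RE = re.compile(r'^\s*Instance "([^"]+)",\s*status (\w+),')


def registered_instances(status_text: str) -> dict[str, dict[str, str]]:
    """Map each service name to {instance name: status} from lsnrctl status text."""
    services: dict[str, dict[str, str]] = {}
    current = None  # instance map of the service currently being read
    for line in status_text.splitlines():
        m = SERVICE_RE.match(line)
        if m:
            current = services.setdefault(m.group(1), {})
            continue
        m = INSTANCE_RE.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return services


def missing_instances(status_text: str, service: str, expected: set[str]) -> list[str]:
    """Return the expected instances that are absent or not READY for `service`."""
    found = registered_instances(status_text).get(service, {})
    return sorted(name for name in expected if found.get(name) != "READY")


if __name__ == "__main__":
    sample = (
        "Services Summary...\n"
        'Service "orcl" has 1 instance(s).\n'
        '  Instance "orcl1", status READY, has 1 handler(s) for this service...\n'
        "The command completed successfully\n"
    )
    print(missing_instances(sample, "orcl", {"orcl1", "orcl2"}))
```

Because the "no services" output contains no `Service` lines at all, the same function reports every expected instance as missing in that case, which is the state seen right after clearing `remote_listener`.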

Resolution:

1. Analysis showed that node 2's instance had not registered with the SCAN listener. This matches a known Oracle RAC bug, Bug 13066936.

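2. To work around the bug, the remote listener registration was redone manually. After the following steps, node 2's instance registered successfully: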


1. show parameter remote_listener
2. alter system set remote_listener='';
3. alter system register;
4. alter system set remote_listener='db-cluster-scan:1525';
5. alter system register;


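Note: the second command removes every node's registration with the SCAN listener, so at that point the SCAN listener lists no services. Prepare the script before making the change and run it quickly, to keep the window in which applications may fail to connect as short as possible. After the fifth command, checking the SCAN listener again shows both nodes' instances registered with the SCAN IP.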
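Following the advice to prepare the script in advance, the sketch below renders the five steps above as SQL*Plus text and validates the SCAN address first, since setting `remote_listener` to a malformed value after clearing it would leave the SCAN listener with no services. The `host:port` check and the function name are assumptions for illustration; `db-cluster-scan:1525` is the address used in this report.

```python
import re

# Assumed format check: a host name, a colon, and a numeric port of up to five
# digits, as in 'db-cluster-scan:1525'.
SCAN_ADDRESS_RE = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def workaround_script(scan_address: str) -> str:
    """Return the five re-registration steps from this report as SQL*Plus text."""
    if not SCAN_ADDRESS_RE.match(scan_address):
        # Refuse to generate anything rather than produce a script that
        # clears remote_listener and then sets it to a bad value.
        raise ValueError(f"unexpected SCAN address: {scan_address!r}")
    statements = [
        "show parameter remote_listener",
        "alter system set remote_listener='';",
        "alter system register;",
        f"alter system set remote_listener='{scan_address}';",
        "alter system register;",
    ]
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    print(workaround_script("db-cluster-scan:1525"), end="")
```

The output could be redirected to a file such as `rereg.sql` and run in one go with `@rereg.sql` from SQL*Plus, so that steps 2 to 5 follow each other without manual typing.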


SCAN listener status after running the second command:


-bash-3.00$ lsnrctl status listener_scan1

LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:08:56

Copyright (c) 1991, 2010, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
The listener supports no services
The command completed successfully


Session counts on the two nodes after the change:

INST_ID COUNT(USERNAME)
------- ---------------
      2             343
      1            1201

---------- The session count on node 2 has now risen noticeably.
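To put a number on "noticeably", the sketch below turns the per-instance counts from the two tables into shares of the total and flags instances far below an even split. The counts are copied from this report; the column headings suggest they came from a query such as `SELECT inst_id, COUNT(username) FROM gv$session GROUP BY inst_id`, but the report does not show it, so treat that as an assumption. The 0.5 tolerance is an arbitrary illustrative threshold, not an Oracle setting.

```python
def session_shares(counts: dict[int, int]) -> dict[int, float]:
    """Return each instance's fraction of the total session count."""
    total = sum(counts.values())
    if total == 0:
        return {inst: 0.0 for inst in counts}
    return {inst: n / total for inst, n in counts.items()}


def underloaded_instances(counts: dict[int, int], tolerance: float = 0.5) -> list[int]:
    """List instances whose share is below `tolerance` times an even share.

    With two instances the even share is 0.5, so the default tolerance flags
    any instance holding less than 25% of the sessions.
    """
    if not counts:
        return []
    even_share = 1 / len(counts)
    shares = session_shares(counts)
    return sorted(inst for inst, s in shares.items() if s < tolerance * even_share)


if __name__ == "__main__":
    before = {1: 1463, 2: 85}  # sessions before re-registration
    after = {1: 1201, 2: 343}  # sessions after re-registration
    for label, counts in (("before", before), ("after", after)):
        shares = session_shares(counts)
        summary = ", ".join(f"inst {i}: {s:.1%}" for i, s in sorted(shares.items()))
        print(f"{label}: {summary}; underloaded: {underloaded_instances(counts)}")
```

With these figures node 2's share rises from about 5.5% (85 of 1548) to about 22.2% (343 of 1544). It is still flagged at this threshold, which is consistent with the remark below that node 2 needs time to catch up.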


SCAN listener status:

-bash-3.00$ lsnrctl status listener_scan1

LSNRCTL for Solaris: Version 11.2.0.2.0 - Production on 31-OCT-2013 16:09:09

Copyright (c) 1991, 2010, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Solaris: Version 11.2.0.2.0 - Production
Start Date                16-OCT-2013 14:54:42
Uptime                    15 days 1 hr. 14 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/grid/app/11.2.0/grid/log/diag/tnslsnr/db2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.110)(PORT=1525)))
Services Summary...
Service "orcl" has 2 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
  Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully

------------- After re-registration, the services under the SCAN listener show both nodes' instances.

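Because node 2's instance has only just registered, it still holds relatively few sessions. As the system keeps running, the session counts on the two nodes should gradually even out.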

Root cause:

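The SCAN listener probably failed over at some earlier point, and after the instance restarted, an Oracle bug prevented the instance from registering with it.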


Oracle's official description of this issue follows:

Description

On migration of remote listener from one node to another, for example during
node eviction (failover), the database does not re-register with the listener
as it does not receive any EOF.
As a result the database keeps listening on the same socket when ideally
it should re-register.

Rediscovery Notes:
Instance does not register services when scan fails over

Workaround
alter system set remote_listener = '';
alter system set remote_listener = '<scan_name>:<port>';

Note:
This fix is just one piece of a larger solution needing other fixes
so that the clusterware / node monitor can trigger a re-register as
required.

References

Bug 13066936
Note 245840.1: Information on the sections in this article



Source document: <https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=89966933432507&id=13066936.8&_afrWindowMode=0&_adf.ctrl-state=v9ps5dfil_4>
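The bug description says the database does not re-register because it "does not receive any EOF". The toy model below, written only to illustrate that signal and not Oracle's actual registration code, shows what EOF looks like to a socket client: when the peer closes its end cleanly, a non-blocking `recv()` returns `b""`, and the client can react by registering again. If the listener moves without the old connection being closed, `recv()` keeps raising `BlockingIOError` and nothing triggers re-registration. The class name and counter are hypothetical; `socket.socketpair()` stands in for the database-to-listener connection.

```python
import socket


class ListenerRegistration:
    """Toy model of an instance registered with a listener over one socket."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.sock.setblocking(False)  # poll without waiting
        self.registrations = 1        # registered once at startup
        self.connected = True

    def poll(self) -> bool:
        """Check the socket once; return True if EOF triggered a re-registration."""
        if not self.connected:
            return False
        try:
            data = self.sock.recv(4096)
        except BlockingIOError:
            # No data and no EOF: the connection still looks alive from this
            # side, so nothing happens (the situation described in the bug).
            return False
        if data == b"":
            # EOF: the listener side closed the connection, so register again.
            # A real client would also open a new connection at this point.
            self.registrations += 1
            self.connected = False
            self.sock.close()
            return True
        return False  # ordinary traffic is ignored in this toy model


if __name__ == "__main__":
    db_side, listener_side = socket.socketpair()
    reg = ListenerRegistration(db_side)
    print(reg.poll(), reg.registrations)  # listener still present
    listener_side.close()                 # listener goes away cleanly: EOF
    print(reg.poll(), reg.registrations)
```

Only a clean close produces EOF; an abrupt node eviction may leave the old connection half-open, which is why the manual `remote_listener` reset in the workaround is needed.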