Evmd进程不断重启原因深究

客户现场环境,全国性7*24小时在线系统,HP-ux B11.31 for Oracle 10.2.0.4 RAC双节点集群。

客户于6月7日做了主机层面硬件变更,数据库未做任何参数更改。启动起来后发现evmd进程一直在不断重启。evmd:事件管理守护进程。此进程还负责启动racgevt进程以管理FAN服务器端调用。各log也都有相应体现;

ocssd进程log:

[    CSSD]2016-06-07 04:02:28.853 >USER:    Copyright 2016, Oracle version 10.2.0.4.0

[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=essdb3DBG_CSSD))

[    CSSD]2016-06-07 04:02:28.853 >USER:    CSS daemon log for node essdb3, number 1, in cluster crs

[    CSSD]2016-06-07 04:02:28.868 [1] >TRACE:   clssscmain: local-only set to false

[    CSSD]2016-06-07 04:02:28.876 [1] >TRACE:   clssnmReadNodeInfo: added node 1 (essdb3) to cluster

[    CSSD]2016-06-07 04:02:28.881 [1] >TRACE:   clssnmReadNodeInfo: added node 2 (essdb4) to cluster

[    CSSD]2016-06-07 04:02:28.887 [5] >TRACE:   clssnm_skgxninit: initialized skgxn version (2/0/Hewlett-Packard SKGXN 2.0

)

[    CSSD]2016-06-07 04:02:38.904 [1] >TRACE:   clssnmNMInitialize: misscount set to (600)

[    CSSD]2016-06-07 04:02:38.905 [1] >TRACE:   clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 300000 ms, reconfig start (misscount) 600000 ms

[    CSSD]2016-06-07 04:02:38.908 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/essdb2vg1/rdb2vg1_512_votk)

[    CSSD]2016-06-07 04:02:38.909 [6] >TRACE:   clssnmvDPT: spawned for disk 0 (/dev/essdb2vg1/rdb2vg1_512_votk)

[    CSSD]2016-06-07 04:02:38.910 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (1//dev/essdb2vg2/rdb2vg2_512_votk)

[    CSSD]2016-06-07 04:02:38.911 [7] >TRACE:   clssnmvDPT: spawned for disk 1 (/dev/essdb2vg2/rdb2vg2_512_votk)

[    CSSD]2016-06-07 04:02:38.915 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (2//dev/essdb2vg3/rdb2vg3_512_votk)

[    CSSD]2016-06-07 04:02:38.915 [8] >TRACE:   clssnmvDPT: spawned for disk 2 (/dev/essdb2vg3/rdb2vg3_512_votk)

[    CSSD]2016-06-07 04:02:40.926 [6] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/essdb2vg1/rdb2vg1_512_votk)

[    CSSD]2016-06-07 04:02:40.927 [9] >TRACE:   clssnmvKillBlockThread: spawned for disk 0 (/dev/essdb2vg1/rdb2vg1_512_votk) initial sleep interval (1000)ms

[    CSSD]2016-06-07 04:02:40.927 [6] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(6) LATS(1433998) Disk lastSeqNo(6)

[    CSSD]2016-06-07 04:02:40.928 [7] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (1//dev/essdb2vg2/rdb2vg2_512_votk)

[    CSSD]2016-06-07 04:02:40.929 [10] >TRACE:   clssnmvKillBlockThread: spawned for disk 1 (/dev/essdb2vg2/rdb2vg2_512_votk) initial sleep interval (1000)ms

[    CSSD]2016-06-07 04:02:40.930 [7] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(6) LATS(1434001) Disk lastSeqNo(6)

[    CSSD]2016-06-07 04:02:40.932 [8] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (2//dev/essdb2vg3/rdb2vg3_512_votk)

[    CSSD]2016-06-07 04:02:40.933 [11] >TRACE:   clssnmvKillBlockThread: spawned for disk 2 (/dev/essdb2vg3/rdb2vg3_512_votk) initial sleep interval (1000)ms

[    CSSD]2016-06-07 04:02:40.934 [8] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(6) LATS(1434005) Disk lastSeqNo(6)

[    CSSD]2016-06-07 04:02:40.943 [1] >TRACE:   clssnmFatalInit: fatal mode enabled

[    CSSD]2016-06-07 04:02:40.976 [13] >TRACE:   clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49895))


[    CSSD]2016-06-07 04:02:40.977 [13] >TRACE:   clssnmClusterListener: Probing node essdb4 (2), probcon(6000000000251ed0)

[    CSSD]2016-06-07 04:02:40.978 [13] >TRACE:   clssnmConnComplete: MSGSRC 2, type 6, node 2, flags 0x0001, con 6000000000251ed0, probe 6000000000251ed0

[    CSSD]2016-06-07 04:02:40.978 [13] >TRACE:   clssnmConnComplete: node 2, essdb4, con(6000000000251ed0), probcon(6000000000251ed0), ninfcon(0000000000000000), node unique 1465243343, prev unique 0, msg unique 1465243343 node state 0

[    CSSD]2016-06-07 04:02:40.978 [13] >TRACE:   clssnmConnComplete: connected to node 2 (con 6000000000251ed0), ninfcon (6000000000251ed0), state (0), flag (9229)

[    CSSD]2016-06-07 04:02:40.982 [14] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))

[    CSSD]2016-06-07 04:02:40.982 [14] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_essdb3_crs))

[    CSSD]2016-06-07 04:02:40.985 [17] >TRACE:   clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=27)(HOST=192.168.1.3)(PORT=49894))

[    CSSD]2016-06-07 04:02:43.568 [13] >TRACE:   clssnmHandleSync: diskTimeout set to (597000)ms

[    CSSD]2016-06-07 04:02:43.568 [20] >TRACE:   clssnmRcfgMgrThread: initial lastleader(2) unique(1465243343)

[    CSSD]2016-06-07 04:02:43.568 [13] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[essdb4] seq[1] sync[1]

[    CSSD]2016-06-07 04:02:43.569 [13] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(1)

[    CSSD]2016-06-07 04:02:43.570 [13] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2016-06-07 04:02:43.570 [13] >TRACE:   clssnmUpdateNodeState: node 1, state (1/3) unique (1465243348/1465243348) prevConuni(0) birth (0/1) (old/new)

[    CSSD]2016-06-07 04:02:43.570 [13] >TRACE:   clssnmUpdateNodeState: node 2, state (1/3) unique (1465243343/1465243343) prevConuni(0) birth (0/1) (old/new)

[    CSSD]2016-06-07 04:02:43.570 [13] >USER:    clssnmHandleUpdate: SYNC(1) from node(2) completed

[    CSSD]2016-06-07 04:02:43.570 [13] >USER:    clssnmHandleUpdate: NODE 1 (essdb3) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2016-06-07 04:02:43.570 [13] >USER:    clssnmHandleUpdate: NODE 2 (essdb4) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2016-06-07 04:02:43.570 [13] >TRACE:   clssnmHandleUpdate: diskTimeout set to (597000)ms

[    CSSD]2016-06-07 04:02:43.620 [1] >USER:    NMEVENT_SUSPEND [00][00][00][00]

[    CSSD]2016-06-07 04:02:43.622 [21] >TRACE:   clssgmReconfigThread:  started for reconfig (1)

[    CSSD]2016-06-07 04:02:43.622 [21] >USER:    NMEVENT_RECONFIG [00][00][00][06]

[    CSSD]2016-06-07 04:02:43.622 [21] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 1

[    CSSD]2016-06-07 04:02:43.624 [17] >TRACE:   clssgmInitialRecv: (6000000000ad6ed0) accepted a new connection from node 2 born at 1 active (2, 2), vers (10,3,1,2)

[    CSSD]2016-06-07 04:02:43.624 [17] >TRACE:   clssgmInitialRecv: conns done (2/2)

[    CSSD]2016-06-07 04:02:43.624 [21] >TRACE:   clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)

[    CSSD]2016-06-07 04:02:43.624 [21] >TRACE:   clssgmChangeMasterNode: requeued 0 RPCs

[    CSSD]2016-06-07 04:02:43.624 [21] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2016-06-07 04:02:43.624 [21] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 2 nodes


[    CSSD]CLSS-3001: local node number 1, master node number 1


[    CSSD]2016-06-07 04:02:43.631 [21] >TRACE:   clssgmReconfigThread:  completed for reconfig(1), with status(1)

[    CSSD]2016-06-07 04:02:43.751 [14] >TRACE:   clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)

[    CSSD]2016-06-07 04:08:21.574 [17] >WARNING: clssgmAssignMemberNo(): grock(CRSDMAIN) memberNo(2) already assigned



evmd日志:

2016-06-07 04:02:51.086: [    EVMD][1] EVMD Starting

2016-06-07 04:02:51.086: [    EVMD][1] 

Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2007, Oracle.  All rights reserved

2016-06-07 04:02:51.086: [    EVMD][1] Initializing OCR

2016-06-07 04:02:51.108: [    EVMD][1] Active Version from OCR:10.2.0.4.0

2016-06-07 04:02:51.108: [    EVMD][1] Active Version and Software Version are same

2016-06-07 04:02:51.108: [    EVMD][1] Initializing Diagnostics Settings

2016-06-07 04:02:51.112: [    EVMD][1] ENV Logging level for Module: allcomp  0

2016-06-07 04:02:51.115: [    EVMD][1] ENV Logging level for Module: default  0

2016-06-07 04:02:51.117: [    EVMD][1] ENV Logging level for Module: COMMCRS  0

2016-06-07 04:02:51.119: [    EVMD][1] ENV Logging level for Module: COMMNS  0

2016-06-07 04:02:51.122: [    EVMD][1] ENV Logging level for Module: EVMD  0

2016-06-07 04:02:51.125: [    EVMD][1] ENV Logging level for Module: EVMDMAIN  0

2016-06-07 04:02:51.127: [    EVMD][1] ENV Logging level for Module: EVMCOMM  0

2016-06-07 04:02:51.130: [    EVMD][1] ENV Logging level for Module: EVMEVT  0

2016-06-07 04:02:51.132: [    EVMD][1] ENV Logging level for Module: EVMAPP  0

2016-06-07 04:02:51.133: [    EVMD][1] ENV Logging level for Module: EVMAGENT  0

2016-06-07 04:02:51.135: [    EVMD][1] ENV Logging level for Module: CRSOCR  0

2016-06-07 04:02:51.136: [    EVMD][1] ENV Logging level for Module: CLUCLS  0

2016-06-07 04:02:51.138: [    EVMD][1] ENV Logging level for Module: OCRRAW  0

2016-06-07 04:02:51.141: [    EVMD][1] ENV Logging level for Module: OCROSD  0

2016-06-07 04:02:51.142: [    EVMD][1] ENV Logging level for Module: OCRAPI  0

2016-06-07 04:02:51.144: [    EVMD][1] ENV Logging level for Module: OCRUTL  0

2016-06-07 04:02:51.146: [    EVMD][1] ENV Logging level for Module: OCRMSG  0

2016-06-07 04:02:51.148: [    EVMD][1] ENV Logging level for Module: OCRCLI  0

2016-06-07 04:02:51.149: [    EVMD][1] ENV Logging level for Module: CSSCLNT  0

2016-06-07 04:02:51.150: [    EVMD][1] Creating pidfile  /oracle/product/10.2/crs/evm/init/essdb3.pid

[  clsdmt][4]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=essdb3DBG_EVMD))

2016-06-07 04:02:51.162: [    EVMD][1] Authorization database built successfully.

2016-06-07 04:02:53.370: [  EVMEVT][1][ENTER] EVM Listening on: 160045173

2016-06-07 04:02:53.373: [ COMMCRS][6]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49897))


2016-06-07 04:02:53.393: [  EVMAPP][1] EVMD Started

2016-06-07 04:02:53.393: [ COMMCRS][9]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49898))


2016-06-07 04:02:53.396: [  EVMEVT][12] Default port busy, listening at (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.3)(PORT=49905))

2016-06-07 04:02:53.397: [  OCRCLI][12]proac_set_value: Response message returned with failure keyname = [SYSTEM.evm.cevmport.essdb3], retcode = 5

2016-06-07 04:02:53.397: [  EVMEVT][12][PANIC] EVMD exiting. Couldn't update OCR key rc = 5

2016-06-07 04:02:53.397: [    EVMD][12][PANIC] EVMD exiting.

2016-06-07 04:02:53.398: [    EVMD][12] Done.


2016-06-07 04:02:54.500: [    EVMD][1] EVMD Starting

2016-06-07 04:02:54.501: [    EVMD][1] 

Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2007, Oracle.  All rights reserved

2016-06-07 04:02:54.501: [    EVMD][1] Initializing OCR

2016-06-07 04:02:54.522: [    EVMD][1] Active Version from OCR:10.2.0.4.0

2016-06-07 04:02:54.522: [    EVMD][1] Active Version and Software Version are same

2016-06-07 04:02:54.523: [    EVMD][1] Initializing Diagnostics Settings

2016-06-07 04:02:54.526: [    EVMD][1] ENV Logging level for Module: allcomp  0

2016-06-07 04:02:54.528: [    EVMD][1] ENV Logging level for Module: default  0

2016-06-07 04:02:54.530: [    EVMD][1] ENV Logging level for Module: COMMCRS  0

2016-06-07 04:02:54.531: [    EVMD][1] ENV Logging level for Module: COMMNS  0

2016-06-07 04:02:54.533: [    EVMD][1] ENV Logging level for Module: EVMD  0

2016-06-07 04:02:54.534: [    EVMD][1] ENV Logging level for Module: EVMDMAIN  0

2016-06-07 04:02:54.536: [    EVMD][1] ENV Logging level for Module: EVMCOMM  0

2016-06-07 04:02:54.539: [    EVMD][1] ENV Logging level for Module: EVMEVT  0

2016-06-07 04:02:54.541: [    EVMD][1] ENV Logging level for Module: EVMAPP  0

2016-06-07 04:02:54.543: [    EVMD][1] ENV Logging level for Module: EVMAGENT  0

2016-06-07 04:02:54.544: [    EVMD][1] ENV Logging level for Module: CRSOCR  0

2016-06-07 04:02:54.546: [    EVMD][1] ENV Logging level for Module: CLUCLS  0

2016-06-07 04:02:54.548: [    EVMD][1] ENV Logging level for Module: OCRRAW  0

2016-06-07 04:02:54.549: [    EVMD][1] ENV Logging level for Module: OCROSD  0

2016-06-07 04:02:54.551: [    EVMD][1] ENV Logging level for Module: OCRAPI  0

2016-06-07 04:02:54.552: [    EVMD][1] ENV Logging level for Module: OCRUTL  0

2016-06-07 04:02:54.554: [    EVMD][1] ENV Logging level for Module: OCRMSG  0

2016-06-07 04:02:54.556: [    EVMD][1] ENV Logging level for Module: OCRCLI  0

2016-06-07 04:02:54.558: [    EVMD][1] ENV Logging level for Module: CSSCLNT  0

2016-06-07 04:02:54.559: [    EVMD][1] Creating pidfile  /oracle/product/10.2/crs/evm/init/essdb3.pid

[  clsdmt][4]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=essdb3DBG_EVMD))

2016-06-07 04:02:54.574: [    EVMD][1] Authorization database built successfully.

2016-06-07 04:02:56.731: [  EVMEVT][1][ENTER] EVM Listening on: 160045173

2016-06-07 04:02:56.734: [ COMMCRS][6]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49897))


2016-06-07 04:02:56.759: [ COMMCRS][9]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49898))


2016-06-07 04:02:56.759: [  EVMAPP][1] EVMD Started

2016-06-07 04:02:56.765: [  EVMEVT][12] Default port busy, listening at (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.3)(PORT=49907))

2016-06-07 04:02:56.767: [  OCRCLI][12]proac_set_value: Response message returned with failure keyname = [SYSTEM.evm.cevmport.essdb3], retcode = 5

2016-06-07 04:02:56.767: [  EVMEVT][12][PANIC] EVMD exiting. Couldn't update OCR key rc = 5

2016-06-07 04:02:56.767: [    EVMD][12][PANIC] EVMD exiting.

2016-06-07 04:02:56.768: [    EVMD][12] Done.


alert日志:

2016-06-07 04:02:44.208

[evmd(9218)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:02:47.748

[evmd(9682)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:02:51.151

[evmd(9788)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:02:54.559

[evmd(9894)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:02:57.988

[evmd(10005)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:03:01.329

[evmd(10111)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:03:04.720

[evmd(10222)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:03:08.117

[evmd(10339)]CRS-1401:EVMD started on node essdb3.

2016-06-07 04:03:11.435

[evmd(10447)]CRS-1401:EVMD started on node essdb3.


主机的CPU资源也使用的特别紧张,基本上是evmlogger进程占用的top 5

Evmd进程不断重启原因深究_第1张图片

怀疑是OCR方面的问题,试着看看权限

essdb3_oracle[/oraclelog/admin/essdb2/bdump]$ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     524044
         Used space (kbytes)      :       4372
         Available space (kbytes) :     519672
         ID                       : 1612984325
         Device/File Name         : /dev/essdb2vg1/rdb2vg1_512_ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/essdb2vg2/rdb2vg2_512_ocr
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

essdb3_oracle[/oraclelog/admin/essdb2/bdump]$ll /dev/essdb2vg1/rdb2vg1_512_ocr
crw-r-----   1 root       oinstall   128 0x0200bc 2011年5月18日 /dev/essdb2vg1/rdb2vg1_512_ocr
essdb3_oracle[/oraclelog/admin/essdb2/bdump]$ll /dev/essdb2vg2/rdb2vg2_512_ocr
crw-r-----   1 root       oinstall   128 0x0300bc 2011年5月18日 /dev/essdb2vg2/rdb2vg2_512_ocr


通过分析essdb3节点的evmd日志发现evmd进程一直无法启动反复出现如下evmd进程默认端口被占用错误信息,如下:

2016-06-07 04:02:56.734: [ COMMCRS][6]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49897))

2016-06-07 04:02:56.759: [ COMMCRS][9]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49898))

日志信息Oracle Bug 8200527描述一致,BUG 8200527 信息,如下:

Evmd进程不断重启原因深究_第2张图片


MOS文档 EVMD Coredumps When Starting Up (Doc ID 1091803.1)  信息,有下面的建议:

Evmd进程不断重启原因深究_第3张图片

检查占用端口49897,49898的进程,并将集群重启。evmd恢复正常,不在重启;




你可能感兴趣的:(Evmd进程不断重启原因深究)