AIX: ORA-29770: LMS0 (OSID 123) is hung for more than 70 seconds in 'gcs remote message'

Reference:

AIX: ORA-29770: LMS0 (OSID 123) is hung for more than 70 seconds in 'gcs remote message' (Doc ID 2237182.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
IBM AIX on POWER Systems (64-bit)

SYMPTOMS

Instance 1 was terminated by LMHB because LMS0 had been waiting on "gcs remote message" for too long. The alert log shows:

Sun Jan 01 15:14:38 2017
Archived Log entry 33282 added for thread 1 sequence 10837 ID 0xffffffffff7b47ac dest 1:
Sun Jan 01 15:25:02 2017
LMS0 (ospid: 33621970) waits for event 'gcs remote message' for 83 secs. 
Sun Jan 01 15:25:17 2017
Errors in file /u01/app/oracle/diag/rdbms/fcubsprd/FCUBSPRD1/trace/FCUBSPRD1_lmhb_30148200_FCUBSSTB.trc (incident=1184194):
ORA-29770: global enqueue process LMS0 (OSID 33621970) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/fcubsprd/FCUBSPRD1/incident/incdir_1184194/FCUBSPRD1_lmhb_30148200_i1184194.trc
Sun Jan 01 15:25:23 2017
Sweep [inc][1184194]: completed
Sweep [inc2][1184194]: completed
Sun Jan 01 15:25:28 2017
ERROR: Some process(s) is not making progress.
LMHB (ospid: 30148200) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 30148200): terminating the instance due to error 29770

The LMS0 trace file has no entries for this period.

 

FCUBSPRD1_lmhb_30148200_FCUBSSTB.trc:

*** 2017-01-01 15:25:02.621
....
LMS0 (ospid: 33621970) has no heartbeats for 86 sec. (threshold 70 sec)
: waiting for event 'gcs remote message' for 83 secs with wait_id 121809912.
===[ Wait Chain ]===
Wait chain is empty.
==============================
Dumping PROCESS LMS0 (ospid: 33621970) States
==============================
===[ System Load State ]===
CPU Total 72 Raw 72 Core 18 Socket -1
Load normal: Cur 2718 Highmark 331776 (10.61 1296.00) 
===[ Latch State ]===
Not in Latch Get
===[ Session State Object ]===
----------------------------------------
SO: 0x7000115b1c5e3f0, type: 4, owner: 0x7000115917f01f0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x7000115917f01f0, name=session, file=ksu.h LINE:12729 ID:, pg=0
(session) sid: 1093 ser: 1 trans: 0x0, creator: 0x7000115917f01f0
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: 0x0
edition#: 0 oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'gcs remote message'
waittime=0x1e, poll=0x0, event=0x0
wait_id=121809912 seq_num=45901 snap_id=1
wait times: snap=1 min 23 sec, exc=1 min 23 sec, total=1 min 23 sec

....

*** 2017-01-01 15:25:02.624
Process diagnostic dump for oracle@padc2dbs01 (LMS0), OS id=33621970,
pid: 13, proc_ser: 1, sid: 1093, sess_ser: 1
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 15:25:02 ]
NOTE: scheduling delay has not been sampled for 0.323812 secs 0.000000 secs from [ 15:24:58 - 15:25:03 ], 5 sec avg
0.000000 secs from [ 15:24:02 - 15:25:03 ], 1 min avg
0.003528 secs from [ 15:20:03 - 15:25:03 ], 5 min avg
loadavg : 10.62 11.33 11.31
swap info: free_mem = 34902.60M rsv = 596.00M
alloc = 2533.36M avail = 152576.00M swap_free = 150042.64M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
240001 A oracle 33621970 1 0 39 20 af39ae590 249848 Dec 19 - 181:11 ora_lms0_FCUBSPRD1

*** 2017-01-01 15:25:07.637
Short stack dump: ORA-32516: cannot wait for process 'Unix process pid: 33621970, image: oracle@padc2dbs01 (LMS0)' to finish executing ORADEBUG command 'SHORT_STACK'; wait time exceeds 4940 ms 

-------------------------------------------------------------------------------
Process diagnostic dump actual duration=5.012000 sec
(max dump time=5.000000 sec)

....
*** 2017-01-01 15:25:17.697
==============================
LMS0 (ospid: 33621970) has not moved for 101 sec (1483277117.1483277016)
Incident 1184194 created, dump file: /u01/app/oracle/diag/rdbms/fcubsprd/FCUBSPRD1/incident/incdir_1184194/FCUBSPRD1_lmhb_30148200_i1184194.trc
ORA-29770: global enqueue process LMS0 (OSID 33621970) is hung for more than 70 seconds

ORA-32515: cannot issue ORADEBUG command 'SHORT_STACK' to process 'Unix process pid: 33621970, image: oracle@padc2dbs01 (LMS0)'; prior command execution time exceeds 4955 ms
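
The two LMHB lines that matter most in the excerpt above are the "has no heartbeats" line and the "has not moved" line. The following is a minimal sketch of pulling them out of a trace file, assuming the line formats are exactly as shown in this excerpt. The function name `scan_lmhb_trace` and the dictionary keys are hypothetical, and reading the two numbers in parentheses as epoch seconds is an assumption based only on their difference matching the reported 101 seconds.

```python
import re

# Line formats copied from the excerpt above; other versions may print them differently.
NO_HEARTBEAT = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\) has no heartbeats for "
    r"(?P<gap>\d+) sec\. \(threshold (?P<threshold>\d+) sec\)"
)
NOT_MOVED = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\) has not moved for "
    r"(?P<secs>\d+) sec \((?P<now>\d+)\.(?P<last>\d+)\)"
)


def scan_lmhb_trace(lines):
    """Collect heartbeat findings from LMHB trace lines (hypothetical helper)."""
    findings = []
    for line in lines:
        m = NO_HEARTBEAT.search(line)
        if m:
            findings.append({
                "kind": "no_heartbeat",
                "process": m["proc"],
                "ospid": int(m["ospid"]),
                "seconds": int(m["gap"]),
                "over_threshold": int(m["gap"]) > int(m["threshold"]),
            })
            continue
        m = NOT_MOVED.search(line)
        if m:
            findings.append({
                "kind": "not_moved",
                "process": m["proc"],
                "ospid": int(m["ospid"]),
                "seconds": int(m["secs"]),
                # Assumption: the pair in parentheses is (current, last) in epoch seconds.
                "epoch_delta": int(m["now"]) - int(m["last"]),
            })
    return findings


sample = [
    "LMS0 (ospid: 33621970) has no heartbeats for 86 sec. (threshold 70 sec)",
    "LMS0 (ospid: 33621970) has not moved for 101 sec (1483277117.1483277016)",
]
for finding in scan_lmhb_trace(sample):
    print(finding)
```

For the sample lines, the heartbeat gap (86) exceeds the threshold (70), and the difference between the two numbers in parentheses equals the reported 101 seconds.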

Other LMS processes were waiting normally, for example LMS1:

OSD pid info: Unix process pid: 19923988, image: oracle@padc2dbs01 (LMS1)
....
0: waiting for 'gcs remote message'
waittime=0x1e, poll=0x0, event=0x0
wait_id=121728350 seq_num=29877 snap_id=1
wait times: snap=0.368767 sec, exc=0.368767 sec, total=0.368767 sec
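
The contrast between the hung LMS0 and the healthy LMS1 is easiest to see when both "wait times" lines are converted to seconds. A minimal sketch follows, assuming the only units that occur are "min" and "sec" as in the two excerpts here; longer waits may use other units, which this sketch does not handle. The helper name `snap_seconds` is hypothetical.

```python
import re

# Duration as printed in the excerpts: "1 min 23 sec" or "0.368767 sec".
DURATION = re.compile(r"(?:(?P<min>\d+) min )?(?P<sec>\d+(?:\.\d+)?) sec")


def snap_seconds(line):
    """Return the snap= wait time of a 'wait times:' line in seconds, or None."""
    m = re.search(r"snap=([^,]+)", line)
    if not m:
        return None
    d = DURATION.fullmatch(m.group(1).strip())
    if not d:
        return None  # unit not covered by this sketch
    minutes = int(d["min"]) if d["min"] else 0
    return minutes * 60 + float(d["sec"])


lms0 = "wait times: snap=1 min 23 sec, exc=1 min 23 sec, total=1 min 23 sec"
lms1 = "wait times: snap=0.368767 sec, exc=0.368767 sec, total=0.368767 sec"
print(snap_seconds(lms0), snap_seconds(lms1))
```

The LMS0 value works out to 83 seconds, which matches the "for 83 secs" reported in the alert log, while LMS1 has been waiting for well under a second.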

 

 

CAUSE

The operating system is missing the AIX fix for the defect "ENABLING 'TCP_FASTLO' OPTION COULD LEAD TO MEMORY LEAK".

SOLUTION

Apply the relevant AIX fix:

http://www-01.ibm.com/support/docview.wss?uid=isg1fixinfo153400

Release / TL / SP    Level              APAR
7100 TL3 SP5         7100-03-05-1524    IV66228
6100 TL9 SP6         6100-09-06-1543    IV67463
6100 TL9 SP5         6100-09-05-1524    IV67463
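
To compare a host against the levels listed above, the level string can be split into release, technology level and service pack. The sketch below is illustrative only: the table `FIXED_LEVELS` and the function `fix_status` are hypothetical names, and it assumes that later service packs of the same technology level also carry the fix. Technology levels not listed in this note are reported as unknown rather than guessed. On the host itself, `instfix -ik <APAR>` is the usual way to confirm whether an APAR is installed.

```python
# Lowest service pack per (release, technology level) listed in this note, with its APAR.
FIXED_LEVELS = {
    ("7100", 3): (5, "IV66228"),
    ("6100", 9): (5, "IV67463"),
}


def fix_status(oslevel):
    """Classify a level string such as '7100-03-05-1524' (hypothetical helper).

    Returns (status, apar) where status is 'included', 'missing' or 'unknown'.
    """
    release, tl, sp, _build = oslevel.strip().split("-")
    entry = FIXED_LEVELS.get((release, int(tl)))
    if entry is None:
        # This note lists no fix level for this release / technology level.
        return ("unknown", None)
    min_sp, apar = entry
    # Assumption: service packs at or above the listed one include the fix.
    status = "included" if int(sp) >= min_sp else "missing"
    return (status, apar)


for level in ("7100-03-05-1524", "6100-09-04-1441", "7200-01-01-1642"):
    print(level, fix_status(level))
```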

