Source:
Oracle Cluster failed to start with ASM instance getting ORA-00443 (Doc ID 2000868.1)
Cluster was not starting on one node:
# ./crsctl start cluster
CRS-2672: Attempting to start 'ora.asm' on 'tcpepsd2'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00443: background process "LMD0" did not start
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/tcpepsd2/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'tcpepsd2' failed
CRS-2679: Attempting to clean 'ora.asm' on 'tcpepsd2'
CRS-2681: Clean of 'ora.asm' on 'tcpepsd2' succeeded
CRS-4000: Command Start failed, or completed with errors.
oraagent_grid.log
=================
2015-04-09 12:40:05.848: [ora.asm][36] {0:0:109} [start] InstAgent::start exception }
2015-04-09 12:40:05.849: [ AGENT][36] {0:0:109} UserErrorException: Locale is
2015-04-09 12:40:05.849: [ora.asm][36] {0:0:109} [start] clsnUtils::error Exception type=2 string=
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00443: background process "LMD0" did not start
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/tcpepsd2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2015-04-09 12:40:05.849: [ AGFW][36] {0:0:109} sending status msg [CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00443: background process "LMD0" did not start
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0/grid/log/tcpepsd2/agent/ohasd/oraagent_grid/oraagent_grid.log".
] for start for resource: ora.asm 1 1
2015-04-09 12:40:05.849: [ora.asm][36] {0:0:109} [start] (:CLSN00107:) clsn_agent::start }
Starting the ASM instance manually also failed:
$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Wed Apr 15 11:37:47 2015

Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-00443: background process "LMD0" did not start

alert_+ASM2.log
===============
Fri Apr 17 11:45:33 2015
NOTE: No asm libraries found in the system
MEMORY_TARGET defaulting to 285212672.
* instance_number obtained from CSS = 2, checking for the existence of node 0...
* node 0 does not exist. instance_number = 2
Starting ORACLE instance (normal)
---------- ----------
Fri Apr 17 11:50:55 2015
LMON started with pid=9, OS id=5163
Fri Apr 17 11:52:27 2015
Process LMD0 died, see its trace file <<<<<
USER (ospid: 4932): terminating the instance due to error 443
Instance terminated by USER, pid = 4932
+ASM2_lmd0_5165.trc
===================
*** 2015-04-17 11:50:55.846
Async driver not configured : errno=13
kjmdmi: pmon timed out in attaching.

*** 2015-04-17 11:51:55.899
Process diagnostic dump for oracle@TCPEPSD2 (PMON), OS id=5149,
pid: 2, proc_ser: 1, sid: 1, sess_ser: 1
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
  0.000000 secs at [ 11:51:55 ]
    NOTE: scheduling delay has not been sampled for 0.380300 secs
  0.001209 secs from [ 11:51:51 - 11:51:56 ], 5 sec avg
  0.000346 secs from [ 11:50:56 - 11:51:56 ], 1 min avg
  0.000340 secs from [ 11:50:55 - 11:51:56 ], 5 min avg
loadavg : 0.03 0.03 0.07
Swapinfo :
  Avail = 40154.05Mb Used = 14179.29Mb
  Swap free = 25974.76Mb Kernel rsvd = 1118.85Mb
  Free Mem = 6279.48Mb
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME COMD
1401 S grid 5149 1 0 154 20 e0000003ab571100 86537 e0000001c1ee2040 11:50:54 ? 0:00 asm_pmon_+ASM2
Short stack dump:
ksedsts()+544<-ksdxfstk()+48<-ksdxcb()+3216<-sspuser()+688<-<-_poll_sys()+48<-_poll()+224<-_res_send()+3072<-_res_query()+336<-_res_querydomain()+512<-_res_search()+1792<-C000000000BECA80<-_nss_dns_getipnodebyname()+64<-C000000000BE9AC0<-nss_search()+1056<-__getipnodebyname_r()+816<-_getipnodebyname()+352<-_getaddrinfo()+976<-snlinGetAddrInfo()+576<-nttbnd2addr()+704<-ntacbbnd2addr()+736<-ntacbnd2addr()+272<-nsc2addr()+464<-nscall1()+400<-nscall()+1952<-nsgrcOpen()+688<-nsgrDo()+96<-nsgrrg_Register()+448<-kmmlrl()+10528<-ksucln()+7424<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main()+336
The PMON call stack above shows the process waiting inside the OS name-resolution routines:
_poll()+224<-_res_send()+3072<-_res_query()+336<-_res_querydomain()+512<-_res_search()+1792<-C000000000BECA80<-_nss_dns_getipnodebyname()+64<-C000000000BE9AC0<-nss_search()+1056<-__getipnodebyname_r()+816<-_getipnodebyname()+352<-_getaddrinfo()
This means that the process is trying to resolve the host alias and cannot complete the lookup. This normally suggests an OS configuration issue in DNS or LDAP, if either is in use.
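As a rough illustration of reading such a stack, the sketch below splits a short stack dump on "<-" and keeps the frames whose names look like resolver calls. The prefix list is an assumption taken from the frames seen in this trace, not an official Oracle or OS classification, and resolver_frames is a hypothetical helper name.

```python
# Frame-name prefixes treated as name-resolution routines.
# Assumption: chosen from the frames in this trace only.
RESOLVER_PREFIXES = (
    "_res_",
    "_nss_",
    "nss_",
    "_getaddrinfo",
    "_getipnodebyname",
    "__getipnodebyname",
)


def resolver_frames(short_stack):
    """Return frames of a '<-'-separated short stack that look like resolver calls."""
    frames = [frame.strip() for frame in short_stack.split("<-") if frame.strip()]
    # Compare only the function name, i.e. the part before "()+offset".
    return [f for f in frames if f.split("(")[0].startswith(RESOLVER_PREFIXES)]


# Abbreviated excerpt of the PMON stack from the trace file.
stack = (
    "_poll()+224<-_res_send()+3072<-_res_query()+336"
    "<-nss_search()+1056<-_getaddrinfo()+976<-snlinGetAddrInfo()+576"
)
print(resolver_frames(stack))
```

If several resolver frames sit directly above a poll call, as they do here, the process is most likely waiting on a name-server reply.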
Further checking showed that nslookup could not reach the configured name servers:
# nslookup TCPEPSD2
*** Can't find server name for address 203.176.113.82: Timed out
*** Can't find server name for address 203.176.113.84: Timed out
*** Default servers are not available
Using /etc/hosts on: TCPEPSD2

looking up FILES
Name: TCPEPSD2
Address: 10.30.2.120
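The same check can be approximated from a script by timing a lookup through the system resolver, which is the path the instance processes take. This is a minimal sketch: time_lookup is a hypothetical helper, and "localhost" stands in for the node name so the example is safe to run anywhere.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Resolve host and return (elapsed_seconds, sorted_addresses, error_text)."""
    start = time.monotonic()
    try:
        infos = resolver(host, None)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addresses, error = [], str(exc)
    return time.monotonic() - start, addresses, error


if __name__ == "__main__":
    # Assumption: replace "localhost" with the node name (here TCPEPSD2) on the affected host.
    elapsed, addresses, error = time_lookup("localhost")
    print(f"{elapsed:.2f}s addresses={addresses} error={error}")
```

A lookup that eventually succeeds from /etc/hosts but takes many seconds points to the same condition as the nslookup output above: the resolver waits for unreachable name servers before falling back to local files.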
Some changes had been made to /etc/resolv.conf. After those changes were removed, CRS came up fine.
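To review which name servers a node is configured to use, the nameserver entries can be pulled out of the resolver configuration. The sketch below is illustrative only: parse_nameservers is a hypothetical helper, and the sample text is made up for the example, reusing the two server addresses that nslookup reported as timing out. The actual file contents on the affected node are not shown in the note.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical file contents, for illustration only.
sample = """\
# example only
search example.com
nameserver 203.176.113.82
nameserver 203.176.113.84
"""
print(parse_nameservers(sample))
```

On a real node the input would come from reading /etc/resolv.conf, and each listed server could then be checked for reachability before restarting the cluster stack.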