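This post is not clickbait; it is a real case from my day-to-day work.
Scenario: a Windows Server 2008 R2 x64 machine hosting an Oracle 10.2.0.5.0 physical standby database, with everything running normally. At the customer's request, the server's hostname had to be changed.
Steps: after working out a plan for the change, I started executing it [including stopping the standby database and the ASM instance, and editing the hosts file, tnsnames.ora, and so on]. After the customer's IT staff changed the hostname and rebooted the server, disaster struck: the machine would not boot properly. Clients could still ping the server, but Remote Desktop connections failed.
What now? After some analysis, the most likely culprit appeared to be the OracleCSService service, whose startup type is Automatic. In other words, the service is started together with Windows at boot. Its configuration is hard-coded and appears to be tied to the hostname, so once the hostname changed, OracleCSService could no longer start properly, which in turn kept the operating system from starting normally.
With a theory in hand, the plan was to reboot the server into safe mode, disable the service, and reboot again. However, the machine could no longer enter safe mode, even though it had done so earlier. The cause is unknown; the customer's hardware staff were the ones at the console.
So, while looking for a way into safe mode, we also started sizing up the last resort: reinstalling Windows and rebuilding the Data Guard setup. Then, even more baffling, the server damn well booted normally on its own, without anyone having done anything. I logged in and promptly rebuilt the OracleCSService service:
Remove the service: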
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat
usage: crssetup <config add deconfig del help ladd ldel lres shutdown upgrade>
    config - configure and startup the cluster on nodes
    add - add specified nodes to the cluster
    del - delete the specified nodes from the cluster
    deconfig - wipe out all cluster configuration information
    ldel - local css delete from oracle home
    lres - local css home reset to new oracle home
    ladd - local css add to oracle home
    shutdown - shutdown the selected nodes
    upgrade - upgrade the specified nodes
    help - print out this information

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat deconfig
GetConfiguredClusterNodes: failed to initialize subsystem, rc(21)
failed to determine remaining nodes in the cluster
failed during critical configuration information
please supply <-force> option to continue

C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat deconfig -force
GetConfiguredClusterNodes: failed to initialize subsystem, rc(21)
failed to determine remaining nodes in the cluster
failed during critical configuration information
<-force> option specified, continuing
Step 1: shutting down node apps
failed executing check for CRS resources
[ 2 ] The system cannot find the file specified.
failed executing check for CRS resources
failure determining CRS resources state, continuing due to FORCE option
DEBRESTDDB Removing node apps
PRKC-1056 : Failed to get the hostname for node DEBRESTDDB
PRKH-1010 : Unable to communicate with CRS services.
[Communications Error(Native: prsr_initCLSS:[3])]
DEBRESTDDB Removing ONS configuration
failed to remove ONS configuration
[ 2 ] The system cannot find the file specified.
DEBRESTDDB failed to execute removal of ONS configuration
failuring during delete of node apps, continuing
Step 2: shutting down local CRS stack
DEBRESTDDB failed to located service OracleEVMService, err(1060)
failed to stop CRS stack on all nodes to be removed, continuing
Step 3: removing CRS stack from requested nodes
Step 4: stopping extra CRS services
Step 5: cleanup up registry keys
Step 6: perform cleanup of the OCR repository C:\oracle\product\10.2.0\db_1\cdata\localhost\local.ocr
successful deconfiguration of the cluster

C:\Users\Administrator>
Recreate the service:
C:\Users\Administrator>C:\oracle\product\10.2.0\db_1\BIN\localconfig.bat add
Step 1: creating new OCR repository
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'administrator', privgrp ''..
Operation successful.
Step 2: creating new CSS service
successfully created local CSS service
successfully added CSS to home

C:\Users\Administrator>
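Finally, start the ASM instance, start the physical standby database, re-enable synchronization with the primary, and let it catch up.
Points worth remembering:
① Do not change a hostname casually unless it is really necessary. Before making the change, be absolutely sure to work through a checklist, and do not, as in this case, leave out the rebuild of the OracleCSService service;
② For any operation in a production environment, really do think twice before acting;
③ While writing up this note, I discovered that Metalink has detailed instructions for exactly this case: How to change the Hostname when Oracle 10G and ASM are used [ID 422729.1]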
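One checklist item from point ① can be partly automated: checking whether the old hostname is still mentioned in the text files that were edited by hand (hosts, tnsnames.ora and the like). The following is only a minimal sketch under assumptions. The function name `find_hostname_references`, the placeholder `OLD-HOSTNAME`, and the listener.ora entry and NETWORK\ADMIN paths are illustrative and not taken from the case above. It covers plain text files only; the OracleCSService service still has to be rebuilt with localconfig as shown earlier.

```python
from pathlib import Path


def find_hostname_references(old_hostname, paths):
    """Return (path, line_number, line) for each line mentioning old_hostname.

    Matching is case-insensitive, since Windows host names are not
    case-sensitive. Paths that do not exist are skipped silently.
    """
    needle = old_hostname.lower()
    hits = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            continue
        # Replace undecodable bytes instead of failing on odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed file locations; adjust them to the real Oracle home.
    candidates = [
        r"C:\Windows\System32\drivers\etc\hosts",
        r"C:\oracle\product\10.2.0\db_1\NETWORK\ADMIN\tnsnames.ora",
        r"C:\oracle\product\10.2.0\db_1\NETWORK\ADMIN\listener.ora",
    ]
    # "OLD-HOSTNAME" is a placeholder for the name being retired.
    for path, number, line in find_hostname_references("OLD-HOSTNAME", candidates):
        print(f"{path}:{number}: {line}")
```

Run after the rename, an empty result suggests the hand-edited files no longer point at the old name; any line it prints is a place to revisit before restarting the ASM instance and the standby.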