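This afternoon a colleague messaged me on WeChat. He was installing Oracle 11.2.0.4 RAC, and GI (Grid Infrastructure) had installed without problems. But when he went on to install the database software, the installer GUI did not show the hostnames of the two nodes, so he could not continue. Since we were not in the same place, I searched Baidu for related articles, and there were not many. I found one, https://blog.csdn.net/MFW333/article/details/71122990 , which says the cause is that CRS = TRUE is missing from inventory.xml.
I forwarded it to my colleague to check, and he reported back that the file already contained this setting.
So where was the problem? Apparently this was not something a quick Baidu search could solve. I asked for the server details and logged in myself. At first I suspected a mistake in /etc/hosts or the hostname configuration, but after logging in I found nothing wrong there. So I ran ./runInstaller to see for myself, and at the Grid Installation Options step the node names were indeed missing. I was in a hurry and did not take a screenshot, so you will have to picture it yourselves.
With no node names showing, I wanted to see whether the log reported any errors. Unfortunately there was no log. Without a log there is no error message, and without an error message the problem is hard to track down. So I tried simply clicking Next to see what would happen, and that produced the error I was looking for: [INS-08109] Unexpected error occurred while validating inputs at state 'nodeSelectionPage'.
The details pane said there was no additional information and told me to contact support or consult the manual. We had no support contract, so I searched MOS first, and there really is a note with exactly this error message: INS-08109 Unexpected error occurred While Validating Inputs At State 'nodeSelectionPage' (Doc ID 948382.1). The note lists seven causes of this problem and gives a solution for each: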
1. ORA_CRS_HOME environment variable is set
OUI calls $GRID_HOME/srvm/admin/getcrshome to get Grid Infrastructure home, if environment variable ORA_CRS_HOME is set to anything but GRID home, the error will be reported. In 11gR2, it's not supported to setup environment variable ORA_CRS_HOME.
The solution is to unset the environment variable and restart OUI.
unset ORA_CRS_HOME
2. Database user can not access olr.loc in /etc/oracle or /var/opt/oracle
OUI calls "crsctl query crs activersion" to get active version of clusterware. If it fails to read olr.loc (if grid and database are owned differently), the issue can happen. From strace/truss:
open("/etc/oracle/olr.loc", O_RDONLY) = -1 EACCES (Permission denied)
The solution is to ensure that 'oracle' user has read permission on olr.loc in /etc/oracle or /var/opt/oracle.
3. ORACLE_HOME environment variable not set to GRID_HOME
This problem will occur while installing 11.2.0.4 RAC RDBMS software in a 12c Grid Infrastructure environment.
The solution is to set ORACLE_HOME environment variable pointing to GRID_HOME before installing RDBMS software.
export ORACLE_HOME=
4. Database user does not exist on all nodes
This problem will occur if database user does not exist on all nodes in the cluster. For example, 4-node cluster, trying to install a new RAC DB home on two nodes only so the new DB user is created on two nodes. The solution is to have the same DB user on all nodes.
5. NLS_LANG environment variable is set to Japanese_Japan.JA16SJIS etc
If NLS_LANG is set to "Japanese_Japan.JA16SJIS" etc, the issue will happen as output of "crsctl query crs activeversion" garbles.
The solution is to unset NLS_LANG.
6. ORA_NLS10 environment variable is set
environment variable is not necessary for Oracle 11g/12c
7. Hostnames in /etc/hosts are incorrect
This error is generated when trying to add new nodes into the cluster if the entries for the hostnames in /etc/hosts are incorrect.
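Several of the causes above (items 1, 2, 5 and 6) can be checked from the shell before starting OUI. The following is only a minimal sketch, not something taken from the MOS note: the function names check_oui_env and check_olr_loc are made up for illustration, and the default path /etc/oracle/olr.loc is the Linux location mentioned in item 2. Run it as the database software owner.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for INS-08109 causes 1, 2, 5 and 6.

check_oui_env() {
    # Items 1, 5, 6: warn about each variable that is set and non-empty.
    for var in ORA_CRS_HOME NLS_LANG ORA_NLS10; do
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "WARN: $var is set to '$val'; consider unsetting it before running OUI"
        fi
    done
}

check_olr_loc() {
    # Item 2: olr.loc must be readable by the user running the installer.
    # Pass a different path as $1 (for example one under /var/opt/oracle).
    f="${1:-/etc/oracle/olr.loc}"
    if [ -r "$f" ]; then
        echo "OK: $f is readable by the current user"
    else
        echo "WARN: $f is missing or not readable by the current user"
    fi
}

check_oui_env
check_olr_loc
```

No output from check_oui_env means none of the three variables is set. As the rest of this post shows, passing these checks does not rule out a permission problem elsewhere.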
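After going through the list, only item 2 seemed to match what we were seeing, so I went to the server to check the permissions on olr.loc.
I also ran crsctl query crs activeversion, and it succeeded. (The MOS note actually has a typo here: activeversion is written as activersion.)
Then I ran olsnodes -n, which also listed both nodes.
The cluster was in a normal state as well.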
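For reference, the checks described above could be run along these lines. This is a sketch under assumptions: the GRID_HOME default is a made-up example path, the run_check helper is hypothetical, and crsctl check cluster -all is just one way to look at cluster status (the exact command used here was not recorded).

```shell
#!/bin/sh
# Assumption: GRID_HOME is the Grid Infrastructure home; the default below is only an example.
GRID_HOME="${GRID_HOME:-/u01/app/11.2.0/grid}"

run_check() {
    # Print a label and the command, then run it if the binary can be found.
    label="$1"; shift
    echo "== $label: $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "exit status: $?"
    else
        echo "skipped: $1 not found"
    fi
}

run_check "olr.loc permissions" ls -l /etc/oracle/olr.loc
run_check "active version" "$GRID_HOME/bin/crsctl" query crs activeversion
run_check "node list" "$GRID_HOME/bin/olsnodes" -n
run_check "cluster status" "$GRID_HOME/bin/crsctl" check cluster -all
```

Running this as the oracle user rather than as root or grid matters, because the installer runs as that user.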
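The read permission on olr.loc looked fine too. So how to solve it? I read a few more MOS notes and a number of forum posts, and it seemed that whatever other people had run into was already covered by the seven items above, while my problem was not. Could it be yet another bug? Probably not. If Oracle shipped bugs in something this basic, it would never have become what it is today. So where exactly was the problem?
In fact I had fallen into a fixed way of thinking and skipped over the places I assumed could not possibly be wrong. This time the problem was in exactly one of those places.
It was a permission problem after all, only not just the permissions on olr.loc but the permissions of the oracle user as a whole. So many articles pointed to permissions that I decided to look at the uid and group membership of the oracle user. The moment I saw the output of id oracle, the root cause was obvious: it was a permission problem.
Anyone familiar with Oracle would spot it at a glance: the groups assigned to this oracle user were completely wrong, and that was the key to the whole problem. After deleting the oracle user and recreating it, the problem was solved.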
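A quick way to catch this kind of mistake is to compare the groups reported by id against the groups the installation expects. The sketch below is illustrative only: missing_groups is a hypothetical helper, and oinstall, dba and asmdba are conventional group names used here as an assumption, since the actual groups on this system are not shown. Substitute whatever your installation uses.

```shell
#!/bin/sh
# Print the expected groups that are absent from a group list.
# Usage: missing_groups "<output of id -Gn USER>" GROUP...
missing_groups() {
    actual=" $1 "; shift
    missing=""
    for g in "$@"; do
        case "$actual" in
            *" $g "*) ;;                  # group is present
            *) missing="$missing $g" ;;   # group is absent
        esac
    done
    echo "${missing# }"
}

# Assumed group names; not taken from the system described in this post.
expected="oinstall dba asmdba"

if id oracle >/dev/null 2>&1; then
    id oracle
    # $expected is left unquoted on purpose so it splits into separate arguments.
    echo "missing groups: $(missing_groups "$(id -Gn oracle)" $expected)"
fi

# Recreating the user (as root, on every node, with the same uid/gid everywhere)
# would look roughly like this; the uid and groups are placeholders:
#   userdel oracle
#   useradd -u <uid> -g oinstall -G dba,asmdba oracle
```

An empty "missing groups" line means the user belongs to every expected group. It is worth running the same check on each node.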
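I wrote this post partly to share an approach to troubleshooting, and partly as a reminder to myself: what you assume to be true is not necessarily true, and it still has to be verified and analyzed carefully.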