This article compares building an Oracle RAC environment on cloud ECS instances with building it on a conventional on-premises architecture, and analyzes and resolves the problems encountered when setting up RAC in the cloud. It is intended for learning and discussion only and is not suitable for production use.
Operating system: CentOS 7.2
Database version: 11.2.0.4
ECS1 IP 172.102.2.150
ECS2 IP 172.102.2.151
RAC network plan
IP | Hostname |
---|---|
10.10.10.101 | rac1 |
10.10.10.102 | rac2 |
10.10.10.103 | rac1-vip |
10.10.10.104 | rac2-vip |
192.168.100.101 | rac1-priv |
192.168.100.102 | rac2-priv |
10.10.10.105 | scan-ip |
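Based on this plan, /etc/hosts on both nodes can be populated roughly as follows. This is only a sketch: the SCAN entry simply reuses the name from the table, and in a real deployment the SCAN would normally be resolved through DNS rather than /etc/hosts.
#vi /etc/hosts
# public
10.10.10.101 rac1
10.10.10.102 rac2
# vip
10.10.10.103 rac1-vip
10.10.10.104 rac2-vip
# private
192.168.100.101 rac1-priv
192.168.100.102 rac2-priv
# scan
10.10.10.105 scan-ip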
Oracle RAC requires multicast communication on both the public and private networks, but the VPC network used by ECS does not support multicast. The third-party software N2N is therefore used to build virtual networks that allow multicast communication between servers within the same VPC.
Installation
wget https://github.com/ntop/n2n/archive/master.zip
unzip master.zip
cd n2n-master/
make
make PREFIX=/opt/n2n install
Start the supernode
Run on node 1:
nohup /opt/n2n/sbin/supernode -l 65530 &
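To confirm the supernode process is running and listening on UDP port 65530, something like the following can be used:
#ps -ef | grep supernode
#netstat -anup | grep 65530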
Start the virtual NICs
Node 1
/opt/n2n/sbin/edge -d edge0 -a 10.10.10.101 -s 255.255.255.0 -c dtstack -k dtstack -l 172.102.2.150:65530 -E -r
/opt/n2n/sbin/edge -d edge1 -a 192.168.100.101 -s 255.255.255.0 -c dtstack -k dtstack -l 172.102.2.150:65530 -E -r
Node 2
/opt/n2n/sbin/edge -d edge0 -a 10.10.10.102 -s 255.255.255.0 -c dtstack -k dtstack -l 172.102.2.150:65530 -E -r
/opt/n2n/sbin/edge -d edge1 -a 192.168.100.102 -s 255.255.255.0 -c dtstack -k dtstack -l 172.102.2.150:65530 -E -r
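Once the edge processes are up on both nodes, basic connectivity over the two virtual networks can be verified with ping, for example from node 1:
#ping -c 3 10.10.10.102
#ping -c 3 192.168.100.102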
Check the network interfaces
#ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.102.2.150 netmask 255.255.255.0 broadcast 172.102.2.255
ether 00:16:4f:02:03:2f txqueuelen 1000 (Ethernet)
RX packets 9972204 bytes 7370730206 (6.8 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10323628 bytes 13053626825 (12.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
edge0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1400
inet 10.10.10.101 netmask 255.255.255.0 broadcast 10.10.10.255
ether b6:f8:b3:29:91:54 txqueuelen 1000 (Ethernet)
RX packets 3976751 bytes 357822921 (341.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 7668656 bytes 10636528983 (9.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
edge1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1400
inet 192.168.100.101 netmask 255.255.255.0 broadcast 192.168.100.255
ether aa:26:8e:9f:a1:4f txqueuelen 1000 (Ethernet)
RX packets 2016215 bytes 1345879442 (1.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1587796 bytes 805075807 (767.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
To remove or modify a virtual network, just find and kill the corresponding edge processes:
[grid@rac1 ~]$ ps -ef|grep edg
root 10499 1 0 Jun19 ? 00:15:01 /opt/n2n/sbin/edge -d edge0 -a 10.10.10.101 -s 255.255.255.0 -c dtstack -k dtstack -l 172.102.2.150:65530 -E -r
root 10507 1 0 Jun19 ? 00:04:38 /opt/n2n/sbin/edge -d edge1 -a 192.168.100.101 -s 255.255.255.0 -c dtstack -k dtstack -l 172.102.2.150:65530 -E -r
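For example, to tear down both virtual NICs on node 1 using the PIDs shown above (the PIDs will differ on your system):
#kill 10499 10507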
The network check from runcluvfy.sh later in the installation logs the following:
Checking subnet "172.102.2.0" for multicast communication with multicast group "230.0.1.0"...
PRVG-11134 : Interface "172.102.2.151" on node "rac2" is not able to communicate with interface "172.102.2.150" on node "rac1"
PRVG-11134 : Interface "172.102.2.150" on node "rac1" is not able to communicate with interface "172.102.2.151" on node "rac2"
Checking subnet "172.102.2.0" for multicast communication with multicast group "224.0.0.251"...
PRVG-11134 : Interface "172.102.2.151" on node "rac2" is not able to communicate with interface "172.102.2.150" on node "rac1"
PRVG-11134 : Interface "172.102.2.150" on node "rac1" is not able to communicate with interface "172.102.2.151" on node "rac2"
Checking subnet "10.10.10.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "10.10.10.0" for multicast communication with multicast group "230.0.1.0" passed.
Checking subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0" passed.
As the output shows, the 172.102.2 subnet still cannot do multicast, but multicast communication on the virtual 10.10.10 and 192.168.100 subnets succeeds.
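For reference, the output above comes from the standard pre-installation check, run as the grid user from the directory where the grid installation media was unpacked, with an invocation roughly like this:
./runcluvfy.sh stage -pre crsinst -n rac1,rac2 -verbose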
For shared storage, after consulting with engineers at Alibaba, the conclusion was that the shared disks can only be attached directly as raw devices.
Partition the disks with fdisk (steps omitted).
Modify the udev rules file:
#vi /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="vdb1",RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="vdc1",RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="raw[1-5]", OWNER="grid", GROUP="asmadmin", MODE="660"
#udevadm control --reload-rules
Since this is only a test, I did not strictly use three disks to create the OCR disk group; only vdb1 was used as the voting disk.
Create the raw devices and configure them to be bound at boot:
#/bin/raw /dev/raw/raw1 /dev/vdb1
#/bin/raw /dev/raw/raw2 /dev/vdc1
vi /etc/rc.d/rc.local
/bin/raw /dev/raw/raw1 /dev/vdb1
/bin/raw /dev/raw/raw2 /dev/vdc1
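After reloading the udev rules and creating the bindings, the raw devices and their ownership can be checked with, for example:
#raw -qa
#ls -l /dev/raw/
Also note that on CentOS 7 /etc/rc.d/rc.local is not executable by default, so it needs chmod +x /etc/rc.d/rc.local for the two raw bindings above to be recreated at boot.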
During the GI installation prerequisite checks, a shared-disk warning appears; clicking Details shows a PRVF-5150 error.
PRVF-5150 : Path … is not a valid path on all nodes
This error can simply be ignored.
When the root script is run at the end of the GI installation, it fails with the following error:
Using configuration parameter file: /u01/app/grid/product/11.2.0/grid_1/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Adding Clusterware entries to inittab
ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2019-06-20 09:42:14.653:
[client(29322)]CRS-2101:The OLR was formatted using version 3.
2019-06-20 09:52:51.519:
[ohasd(29958)]CRS-0715:Oracle High Availability Service has timed out waiting for init.ohasd to be started.
/u01/app/grid/product/11.2.0/grid_1/perl/bin/perl -I/u01/app/grid/product/11.2.0/grid_1/perl/lib -I/u01/app/grid/product/11.2.0/grid_1/crs/install /u01/app/grid/product/11.2.0/grid_1/crs/install/rootcrs.pl execution failed
Workaround
#nohup /etc/rc.d/init.d/init.ohasd run &
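Note that this nohup workaround does not survive a reboot: on CentOS 7 systemd ignores the inittab entry that root.sh adds, so init.ohasd has to be kept running some other way after the installation. A minimal sketch using a systemd unit (the unit name ohasd.service is my own choice, not part of the original setup):
#vi /etc/systemd/system/ohasd.service
[Unit]
Description=Oracle High Availability Services init daemon
After=syslog.target network.target

[Service]
# keep init.ohasd running; the Clusterware startup depends on it
ExecStart=/etc/rc.d/init.d/init.ohasd run
Restart=always
Type=simple

[Install]
WantedBy=multi-user.target
#systemctl daemon-reload
#systemctl enable ohasd
#systemctl start ohasd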
Then run root.sh again:
#/u01/app/grid/product/11.2.0/grid_1/root.sh
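Once root.sh completes successfully on both nodes, the state of the cluster stack can be checked from the grid home, for example:
#/u01/app/grid/product/11.2.0/grid_1/bin/crsctl check crs
#/u01/app/grid/product/11.2.0/grid_1/bin/crsctl stat res -t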
On this setup the network throughput between the RAC nodes is only about 10M/s, and the public and private networks are separated only logically while physically sharing the same network, so neither performance nor security meets production standards.