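On a Rocks cluster, compute nodes are named compute-0-0, compute-0-1, ... by default. If nodes are installed without attention to order, or a node has to be reinstalled later during operations, the node names end up bearing no relation to the servers' physical positions in the rack. On this cluster, the rack positions mapped to node names as follows: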
1 compute-0-9
2 compute-0-0
3 compute-0-2
4 compute-0-3
5 master
6 compute-0-12
7 compute-0-4
8 compute-0-5
9 compute-0-6
10 compute-0-7
11 compute-0-11
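Such a jumbled order makes later maintenance much harder, so renaming the nodes is well worth doing. Renaming a node, however, is not as easy as renaming a file: it takes a series of coordinated steps, with most of the effort going into changing hostnames. The procedure is as follows.
The command rocks list host lists the hostnames of all nodes; run it before and after the change to compare.
The renaming script is as follows: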
rocksSetName()
{
rocks set host name compute-0-9 node1
rocks set host name compute-0-0 node2
rocks set host name compute-0-2 node3
rocks set host name compute-0-3 node4
rocks set host name compute-0-12 node5
rocks set host name compute-0-4 node6
rocks set host name compute-0-5 node7
rocks set host name compute-0-6 node8
rocks set host name compute-0-7 node9
rocks set host name compute-0-11 node10
for i in {1..10}
do
rocks set host interface name node$i eth0 node$i
done
}
Running rocks list host again shows the result of the change.
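The hard-coded list above can also be derived from the rack listing itself. The following is a minimal sketch, not the author's method: `gen_rename_cmds` is a hypothetical helper that reads "rack-position name" lines, skips the frontend (`master`), numbers the remaining nodes in rack order, and only prints the `rocks` commands instead of running them.

```shell
# Hypothetical helper: read "rack-position name" lines on stdin and print
# the rename commands. The frontend ("master") keeps its name and does not
# consume a node number. Nothing is executed here; the commands are printed.
gen_rename_cmds() {
    n=0
    while read -r pos old; do
        [ -z "$old" ] && continue          # skip blank or malformed lines
        [ "$old" = "master" ] && continue  # the frontend is not renamed
        n=$((n + 1))
        echo "rocks set host name $old node$n"
        echo "rocks set host interface name node$n eth0 node$n"
    done
}

# Rack listing taken from the table at the top of this article.
gen_rename_cmds <<'EOF'
1 compute-0-9
2 compute-0-0
3 compute-0-2
4 compute-0-3
5 master
6 compute-0-12
7 compute-0-4
8 compute-0-5
9 compute-0-6
10 compute-0-7
11 compute-0-11
EOF
```

With this listing, compute-0-12 becomes node5, the same mapping as in the script above. Review the printed commands first; if they look right, they can be piped to `sh` on the frontend.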
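The system file /etc/hosts records the private-network addresses and hostnames of the frontend (management node) and the compute nodes. Edit it with vim accordingly, taking care that each IP address stays paired with the right hostname.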
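As an alternative to editing by hand, the same change can be scripted. This is a minimal sketch assuming GNU sed and the usual "IP fqdn shortname" line layout; `set_hosts_entry` is a hypothetical helper, and it takes the file as an argument so it can be tried on a copy before touching /etc/hosts.

```shell
# Hypothetical helper: rewrite the line that starts with the given IP so it
# reads "IP NAME.local NAME". Other lines are left untouched.
set_hosts_entry() {
    file=$1
    ip=$2
    name=$3
    # Escape the dots so that "172.16.255.25" cannot match "172.16.255.254".
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    sed -i "s/^${ip_re}[[:space:]].*/${ip} ${name}.local ${name}/" "$file"
}

# Try it on a scratch copy first.
scratch=$(mktemp)
printf '127.0.0.1 localhost\n172.16.255.254 compute-0-0.local compute-0-0\n' > "$scratch"
set_hosts_entry "$scratch" 172.16.255.254 node2
cat "$scratch"
rm -f "$scratch"
```

Anchoring the pattern at the start of the line and requiring whitespace after the address keeps one node's entry from being matched by another node's IP prefix.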
That takes care of the hostnames on the frontend. Each compute node needs the corresponding changes as well; run the following script:
nodeHostname()
{
# modify the hostname
for i in {1..10}
do
ssh node$i "hostname node$i.local"
# quote the whole remote command so the remote shell passes the sed script through intact
ssh node$i "sed -i '/^HOSTNAME=/c\\HOSTNAME=node$i.local' /etc/sysconfig/network"
done
# modify the files /etc/hosts
ssh node2 'sed -i "/172.16.255.254/c\172.16.255.254 node2.local node2 " /etc/hosts'
ssh node1 'sed -i "/172.16.255.245/c\172.16.255.245 node1.local node1 " /etc/hosts'
ssh node10 'sed -i "/172.16.255.246/c\172.16.255.246 node10.local node10" /etc/hosts'
ssh node3 'sed -i "/172.16.255.252/c\172.16.255.252 node3.local node3 " /etc/hosts'
ssh node4 'sed -i "/172.16.255.251/c\172.16.255.251 node4.local node4 " /etc/hosts'
ssh node5 'sed -i "/172.16.255.244/c\172.16.255.244 node5.local node5 " /etc/hosts'
ssh node6 'sed -i "/172.16.255.250/c\172.16.255.250 node6.local node6 " /etc/hosts'
ssh node7 'sed -i "/172.16.255.249/c\172.16.255.249 node7.local node7 " /etc/hosts'
ssh node8 'sed -i "/172.16.255.248/c\172.16.255.248 node8.local node8 " /etc/hosts'
ssh node9 'sed -i "/172.16.255.247/c\172.16.255.247 node9.local node9 " /etc/hosts'
for i in {1..10}
do
ssh node$i "mv /opt/gridengine/default/spool/compute* /opt/gridengine/default/spool/node$i"
done
}
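The `c` command used in the script replaces every matching line wholesale. To see what it does before running it over ssh, the following sketch applies the same idea to a local file. It assumes GNU sed (the one-line `c\text` form is a GNU extension), and `set_sysconfig_hostname` is a hypothetical helper name.

```shell
# Hypothetical helper: replace the HOSTNAME= line of a sysconfig-style file.
# Inside double quotes "\\" becomes a single backslash, so sed receives
# the script  /^HOSTNAME=/c\HOSTNAME=<fqdn>
set_sysconfig_hostname() {
    file=$1
    fqdn=$2
    sed -i "/^HOSTNAME=/c\\HOSTNAME=${fqdn}" "$file"
}

# Try it on a scratch file shaped like /etc/sysconfig/network.
scratch=$(mktemp)
printf 'NETWORKING=yes\nHOSTNAME=compute-0-9.local\n' > "$scratch"
set_sysconfig_hostname "$scratch" node1.local
cat "$scratch"
rm -f "$scratch"
```

Anchoring the pattern with `^HOSTNAME=` avoids rewriting a comment or another variable that merely contains the word HOSTNAME.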
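The directory /opt/gridengine/default/common/local_conf holds the per-node files for the compute nodes.
After renaming these files to match the new node names, update their contents to match as well.
The same applies to two other directories that hold similar files:
/opt/gridengine/default/spool/qmaster/admin_hosts and /opt/gridengine/default/spool/qmaster/exec_hosts
They are changed in the same way; sed can be used to do it in bulk.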
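A bulk change of this kind might look like the sketch below. It rests on assumptions the article does not confirm: that each file is named `<oldname>.local` and refers to the node as `<oldname>.local` inside, and that node names contain no regular-expression metacharacters. `rename_sge_host_file` is a hypothetical helper; check the actual file names on your cluster, and back the directories up, before using anything like it.

```shell
# Hypothetical helper: rename DIR/OLD.local to DIR/NEW.local, then rewrite
# every occurrence of OLD.local inside the file (GNU sed assumed).
rename_sge_host_file() {
    dir=$1
    old=$2
    new=$3
    [ -f "$dir/$old.local" ] || return 0   # no such file in this directory
    mv "$dir/$old.local" "$dir/$new.local"
    sed -i "s/${old}\\.local/${new}.local/g" "$dir/$new.local"
}

# Try it on a scratch directory.
scratch=$(mktemp -d)
printf 'hostname compute-0-9.local\n' > "$scratch/compute-0-9.local"
rename_sge_host_file "$scratch" compute-0-9 node1
ls "$scratch"
cat "$scratch/node1.local"
rm -rf "$scratch"
```

On the frontend, the helper would be called once per old/new pair for each of the three directories named above. Matching on the full `<oldname>.local` form keeps compute-0-1 from also matching compute-0-12.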
Once all of the above is done, restart the sgeqmaster daemon on the frontend and the sgeexecd daemon on every compute node. The script is as follows:
restartSge_execd()
{
/etc/init.d/sgeqmaster.cecs01 stop
/etc/init.d/sgeqmaster.cecs01 start
for i in {1..10}
do
ssh node$i "/etc/init.d/sgeexecd.cecs01 stop"
ssh node$i "/etc/init.d/sgeexecd.cecs01 start"
done
}
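If all goes well, a series of successful-restart messages is printed to the screen.
Running qhost should then list the nodes under their new names.
Run a rocks command such as rocks sync config, or run one command on all compute nodes with rocks run host ls. If the output looks normal, the Rocks management scripts are working as well.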
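The visual check of the qhost listing can be automated. The sketch below assumes only that the host name is the first column of each line, either as a short name or with a domain suffix; `missing_nodes` is a hypothetical helper that prints every expected name (node1 through nodeN) it does not find.

```shell
# Hypothetical helper: read a host listing on stdin and print each expected
# node name (node1..nodeN) that is absent from the first column.
missing_nodes() {
    count=$1
    listing=$(cat)
    i=1
    while [ "$i" -le "$count" ]; do
        # Strip any domain suffix from column 1, then look for an exact match.
        if ! printf '%s\n' "$listing" |
            awk -v n="node$i" '{ sub(/\..*/, "", $1) } $1 == n { found = 1 } END { exit !found }'
        then
            echo "node$i"
        fi
        i=$((i + 1))
    done
}
```

On the frontend this would be used as `qhost | missing_nodes 10`; empty output means all ten renamed nodes are registered with the qmaster.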
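The complete script follows. The IP addresses must be adapted to your own cluster, and the file names and contents under admin_hosts and the other directories can be changed by hand or with sed: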
#!/bin/bash
#rocksSetName{{{
rocksSetName()
{
rocks set host name compute-0-9 node1
rocks set host name compute-0-0 node2
rocks set host name compute-0-2 node3
rocks set host name compute-0-3 node4
rocks set host name compute-0-12 node5
rocks set host name compute-0-4 node6
rocks set host name compute-0-5 node7
rocks set host name compute-0-6 node8
rocks set host name compute-0-7 node9
rocks set host name compute-0-11 node10
for i in {1..10}
do
rocks set host interface name node$i eth0 node$i
done
}
#}}}
#nodeHostname{{{
nodeHostname()
{
# modify the hostname
for i in {1..10}
do
ssh node$i "hostname node$i.local"
# quote the whole remote command so the remote shell passes the sed script through intact
ssh node$i "sed -i '/^HOSTNAME=/c\\HOSTNAME=node$i.local' /etc/sysconfig/network"
done
# modify the files /etc/hosts
ssh node2 'sed -i "/172.16.255.254/c\172.16.255.254 node2.local node2 " /etc/hosts'
ssh node1 'sed -i "/172.16.255.245/c\172.16.255.245 node1.local node1 " /etc/hosts'
ssh node10 'sed -i "/172.16.255.246/c\172.16.255.246 node10.local node10" /etc/hosts'
ssh node3 'sed -i "/172.16.255.252/c\172.16.255.252 node3.local node3 " /etc/hosts'
ssh node4 'sed -i "/172.16.255.251/c\172.16.255.251 node4.local node4 " /etc/hosts'
ssh node5 'sed -i "/172.16.255.244/c\172.16.255.244 node5.local node5 " /etc/hosts'
ssh node6 'sed -i "/172.16.255.250/c\172.16.255.250 node6.local node6 " /etc/hosts'
ssh node7 'sed -i "/172.16.255.249/c\172.16.255.249 node7.local node7 " /etc/hosts'
ssh node8 'sed -i "/172.16.255.248/c\172.16.255.248 node8.local node8 " /etc/hosts'
ssh node9 'sed -i "/172.16.255.247/c\172.16.255.247 node9.local node9 " /etc/hosts'
for i in {1..10}
do
ssh node$i "mv /opt/gridengine/default/spool/compute* /opt/gridengine/default/spool/node$i"
done
}
#}}}
#restartSge_execd{{{
restartSge_execd()
{
/etc/init.d/sgeqmaster.cecs01 stop
/etc/init.d/sgeqmaster.cecs01 start
for i in {1..10}
do
ssh node$i "/etc/init.d/sgeexecd.cecs01 stop"
ssh node$i "/etc/init.d/sgeexecd.cecs01 start"
done
}
#}}}
rocksSetName
nodeHostname
restartSge_execd