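MOSIX Cluster (Part 1) - Installation
Goal: processes in the cluster migrate automatically between nodes according to load.
Install one RHEL5 machine (192.168.100.5) in VMware.
# Download the MOSIX and kernel sources and prepare to build
# Unpack them into the target directory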
[root@rhel5 ~]# tar xjvf MOSIX-2.24.2.2.tbz -C /usr/src/
[root@rhel5 ~]# tar xzvf linux-2.6.26.tar.gz -C /usr/src/
# Change to the directory that holds the sources
[root@rhel5 ~]# cd /usr/src/
# other/patch-2.6.26 targets the path linux-2.6.26.1, so create a symlink (MOSIX probably did not ship a separate patch for 2.6.26, but 2.6.26 is still supported)
[root@rhel5 src]# ln -s linux-2.6.26/ ./linux-2.6.26.1
# Apply the MOSIX patch to the kernel
[root@rhel5 src]# patch -p0 < /usr/src/mosix-2.24.2.2/other/patch-2.6.26
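To see why the symlink is needed before patching, it can help to list the top-level directory that the patch headers name. The following is only a minimal sketch, assuming the patch uses standard unified-diff "+++ " headers; the function name patch_top_dirs is made up for this example.

```shell
# Print the distinct top-level directories named in the "+++ " header
# lines of a unified diff. patch_top_dirs is a hypothetical helper name.
patch_top_dirs() {
    grep '^+++ ' "$1" | awk '{print $2}' | cut -d/ -f1 | sort -u
}
```

Called as patch_top_dirs /usr/src/mosix-2.24.2.2/other/patch-2.6.26, it should name linux-2.6.26.1 if the patch is laid out as the comment above describes.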
# Change to the kernel source directory and start the build
[root@rhel5 src]# cd linux-2.6.26
# Generate the configuration file
[root@rhel5 linux-2.6.26]# make menuconfig
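# Generate dependencies (a step carried over from 2.4 kernels; the 2.6 build system tracks dependencies automatically, so it is not needed here)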
[root@rhel5 linux-2.6.26]# make dep
# Build the kernel image
[root@rhel5 linux-2.6.26]# make bzImage
# Build the kernel modules
[root@rhel5 linux-2.6.26]# make modules
# Install the kernel modules
[root@rhel5 linux-2.6.26]# make modules_install
# Install the kernel
[root@rhel5 linux-2.6.26]# make install
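# Change to the MOSIX directory
[root@rhel5 linux-2.6.26]# cd ../mosix-2.24.2.2
# Install MOSIX: press Enter at every prompt. Installing is enough for now; just remember to enable the mosix service for the runlevel you normally use. Configuration comes later.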
[root@rhel5 mosix-2.24.2.2]# ./mosix.install
After shutting down, clone rhel5 (192.168.100.5) to create slave (192.168.100.6).
Installation complete.
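MOSIX-2.24.2.2/linux-2.6.26 Cluster (Part 2) - Configuration
Power on rhel5 and slave. During boot, press Enter at the GRUB screen and select the 2.6.26 kernel.
Once slave is up, change its IP address and hostname (it was cloned from rhel5, after all).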
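After booting, it is worth confirming on each node that the patched kernel is the one actually running, since GRUB may still default to the stock RHEL kernel. A minimal sketch follows; is_expected_kernel is a hypothetical helper name, and it only checks that the release string starts with the expected version.

```shell
# Succeed if the kernel release string starts with the expected version.
# is_expected_kernel is a hypothetical helper name, not a MOSIX tool.
is_expected_kernel() {
    release=$1     # e.g. the output of `uname -r`
    expected=$2    # e.g. 2.6.26
    case "$release" in
        "$expected"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_expected_kernel "$(uname -r)" 2.6.26 || echo "not running the 2.6.26 kernel"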
[rhel5]
# Configure MOSIX
[root@rhel5 ~]# mosconf
MOSIX CONFIGURATION
===================
If this is your cluster's file-server and you want to configure MOSIX
for a set of nodes with a common root, please type their common root
directory. Otherwise, if you want to configure the node that you are
running on, just press <ENTER> :-
What would you like to configure?
=================================
1. Which nodes are in this cluster (ESSENTIAL)
2. Authentication (ESSENTIAL)
3. Logical node numbering (recommended)
4. Queueing policies (recommended)
5. Freezing policies
6. Miscellaneous policies
7. Become part of a multi-cluster organizational Grid
Configure what :- 1
There are no nodes in your cluster yet:
=======================================
To add a new set of nodes to your cluster, type 'n'.
To turn on advanced options, type '+'.
For help, type 'h'.
To save and exit, type 'q'. (to abandon all changes and exit, type 'Q')
Option :- n <== add a node
Adding new node(s) to the cluster:
First host-name or IP address :- 192.168.100.5 <== node IP
Number of nodes :- 1 <== number of nodes
Nodes in your cluster:
======================
1. 192.168.100.5
To add a new set of nodes to your cluster, type 'n'.
To modify an entry, type its number.
To delete an entry, type 'd' followed by that entry-number (eg. d1).
To turn on advanced options, type '+'.
For help, type 'h'.
To save and exit, type 'q'. (to abandon all changes and exit, type 'Q')
Option :- n <== add a node
Adding new node(s) to the cluster:
First host-name or IP address :- 192.168.100.6 <== node IP
Number of nodes :- 1 <== number of nodes
Nodes in your cluster:
======================
1. 192.168.100.5
2. 192.168.100.6
To add a new set of nodes to your cluster, type 'n'.
To modify an entry, type its number.
To delete an entry, type 'd' followed by that entry-number (eg. d2).
To turn on advanced options, type '+'.
For help, type 'h'.
To save and exit, type 'q'. (to abandon all changes and exit, type 'Q')
Option :- q <== save and exit
Cluster configuration was saved.
OK to also update the logical node numbers [Y/n]? y
Suggesting to assign '192.168.100.5'
as the central queue manager for the cluster
(but be cautious if you mix 32-bit and 64-bit nodes in the same cluster)
OK to update it now [Y/n]?
What would you like to configure next?
======================================
1. Which nodes are in this cluster
2. Authentication (ESSENTIAL)
3. Logical node numbering
4. Queueing policies
5. Freezing policies
6. Miscellaneous policies
7. Become part of a multi-cluster organizational Grid
q. Exit
Configure what :- 2 <== set the keys
MOSIX Authentication:
=====================
To protect your MOSIX cluster from abuse, preventing unauthorized
persons from gaining control over your computers, you need to set
up a secret cluster-protection key. This key can include any
characters, but must be identical throughout your cluster.
Your secret cluster-protection key: xxxx <== enter the key
Your key is 5 characters long.
(in the future, please consider a longer one)
To allow your users to send batch-jobs to other nodes in the cluster,
you must set up a secret batch-client key. This key can include any
characters, but must match the 'batch-server' key on the node(s) that
can receive batch-jobs from this node.
Your secret batch-client key: xxxx <== enter the key
Your key is 5 characters long.
(in the future, please consider a longer one)
For this node to accept batch jobs,
you must set up a secret batch-server key. This key can include any
characters, but must match the 'batch-client' key on the sending nodes.
To make your batch-server key the same as your batch-client key, type '+'.
Your secret batch-server key: xxxx <== enter the key
Your key is 5 characters long.
(in the future, please consider a longer one)
# Save and exit, then restart the service
[root@rhel5 ~]# service mosix restart
[root@slave ~]# mosconf
....
# Same steps as on rhel5
# Restart the service
[root@slave ~]# service mosix restart
# Check the status
[root@slave ~]# service mosix status
This MOSIX node is: 192.168.100.6 (no features)
Nodes in cluster:
=================
192.168.100.5: proximate
192.168.100.6: proximate
Status: Running Normally (32-bits)
Load: 0.01 (equivalent to about 0.0066 CPU processes)
Speed: 6650 units
CPUS: 1
Frozen: 0
Util: 100%
Avail: YES
Procs: Running 0 MOSIX processes
Accept: Yes, will welcome processes from here
Memory: Available 461MB/503MB
Swap: Available 0.9GB/0.9GB
Daemons:
Master Daemon: Up
MOSIX Daemon : Up
Queue Manager: Up
Remote Daemon: Up
Postal Daemon: Up
Guest processes from other clusters in the grid: 0/8
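The daemon list in the status report can also be checked mechanically. This is a rough sketch that assumes the "Name: State" layout shown in the sample output above; down_daemons is a hypothetical helper name, and any state other than "Up" is simply reported as not up.

```shell
# Read `service mosix status` output on stdin and print the name of every
# daemon whose state is not "Up". Prints nothing when all daemons are up.
# down_daemons is a hypothetical helper; the layout is assumed from the
# sample output in this article.
down_daemons() {
    awk -F: '
        /^Daemons:/ { in_daemons = 1; next }
        in_daemons && /Daemon|Manager/ {
            name = $1;  gsub(/^[ \t]+|[ \t]+$/, "", name)
            state = $2; gsub(/[ \t]/, "", state)
            if (state != "Up") print name
        }
    '
}
```

For example: service mosix status | down_daemons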
# I like to confirm that the ports are actually listening
#TCP/IP ports 249-253 and UDP/IP ports 249-250 must be available for MOSIX
[root@slave ~]# netstat -antu | grep -E "24|25"
tcp 0 0 0.0.0.0:2401 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:249 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:250 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:251 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:252 0.0.0.0:* LISTEN
udp 0 0 0.0.0.0:249 0.0.0.0:*
udp 0 0 0.0.0.0:250 0.0.0.0:*
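The pattern "24|25" also matches unrelated ports such as 2401 and 25, as the output above shows. A tighter filter, sketched here under the assumption that the local address is the fourth column of the netstat output, keeps only local ports 249 to 253; mosix_ports is a made-up name.

```shell
# Read `netstat -antu` output on stdin and print only the lines whose
# local address (4th column) ends in a port from 249 to 253.
# mosix_ports is a hypothetical helper name.
mosix_ports() {
    awk '$4 ~ /:(249|25[0-3])$/'
}
```

For example: netstat -antu | mosix_ports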
# All done, setup is complete
MOSIX-2.24.2.2/linux-2.6.26 Cluster (Part 3) - Application Test
# First open a terminal on both rhel5 and slave and run mon to check the load
[root@rhel5 ~]# mon
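# Both nodes should be idle at this point
# To get a visible effect, write a CPU-hungry script, and it must spawn several processes:
# the smallest unit MOSIX can migrate is a process, not an instruction or a function,
# so a single process cannot be spread across nodes no matter how heavy its load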
[root@rhel5 ~]# cat > a.sh << 'EOF'
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
EOF
[root@rhel5 ~]# chmod +x a.sh
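Instead of pasting the same awk line six times, the script could be generated for any number of processes. A minimal sketch: gen_load_script is a hypothetical helper name, and the "#!/bin/sh" first line is an addition that the original a.sh does not have.

```shell
# Print a load script that starts COUNT busy awk processes in the
# background. gen_load_script is a hypothetical helper; the shebang
# line is an addition not present in the original a.sh.
gen_load_script() {
    count=$1
    printf '%s\n' '#!/bin/sh'
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s\n' "awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &"
        i=$((i + 1))
    done
}
```

For example: gen_load_script 6 > a.sh && chmod +x a.sh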
# Run a.sh on rhel5, which starts 6 processes
[root@rhel5 ~]# mosrun -e ./a.sh
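# Watch the mon display on both nodes: at first the load on rhel5 is high, then the load on slave rises as well
# On rhel5 all 6 awk processes are still listed, but only 3 are running; the other 3 are in state T (stopped), presumably because they have been migrated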
[root@rhel5 ~]# ps aux | grep awk
root 25648 0.6 0.0 0 0 pts/0 T 16:16 0:00 [awk]
root 25650 0.4 0.0 0 0 pts/0 T 16:16 0:00 [awk]
root 25652 32.0 0.7 4168 3812 pts/0 R 16:16 0:37 awk BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}
root 25654 32.0 0.7 4168 3816 pts/0 R 16:16 0:37 awk BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}
root 25656 32.0 0.7 4168 3816 pts/0 R 16:16 0:37 awk BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}
root 25658 1.4 0.0 0 0 pts/0 T 16:16 0:01 [awk]
root 25665 0.0 0.1 3860 624 pts/0 R+ 16:18 0:00 grep awk
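The split between running and stopped awk processes can be tallied rather than read off by eye. A small sketch, assuming the standard ps aux column layout (state in column 8, command in column 11); awk_states is a hypothetical helper name.

```shell
# Read `ps aux` output on stdin and print "STATE COUNT" for awk
# processes, using the first letter of the STAT column. The grep line is
# skipped because its command field is "grep", not "awk".
# awk_states is a hypothetical helper name.
awk_states() {
    awk '$11 == "awk" || $11 == "[awk]" { n[substr($8, 1, 1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

For example: ps aux | awk_states. On the listing above this tallies 3 processes in state R and 3 in state T.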
# On slave, check with top: 3 processes named remoted are clearly using the CPU; these are presumably the migrated processes
top - 16:19:19 up 3:10, 3 users, load average: 2.78, 1.18, 0.44
Tasks: 99 total, 5 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.3%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si
Mem: 515376k total, 423576k used, 91800k free, 107980k buff
Swap: 1048568k total, 0k used, 1048568k free, 234028k cach
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16929 root 20 0 4168 3936 0 R 33.2 0.8 0:48.13 remoted
16925 root 20 0 4168 3932 0 R 32.9 0.8 0:50.57 remoted
16927 root 20 0 4168 3932 0 R 32.9 0.8 0:50.13 remoted
1 root 20 0 2036 664 572 S 0.0 0.1 0:01.36 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migratio
4 root 15 -5 0 0 0 S 0.0 0.0 0:02.00 ksoftirq
############## End of test ##############