Notes on openMosix, and how openMosix differs from a PBS system

openMosix (www.openmosix.org) is an open-source project descended from MOSIX; it is the open-source implementation of MOSIX. HPC clusters are commonly called Beowulf clusters, and a MOSIX-style cluster is something entirely different. openMosix is a kernel patch that balances load at the kernel level. For example, given a ten-node cluster, if we want to encode ten MP3s we only need to submit the work to any one node, and openMosix will automatically spread it across all ten nodes, with process migration taking on the order of seconds. Because openMosix works at the kernel level, applications above it need no modification at all: as long as a job is openMosix-friendly, openMosix schedules it for us automatically, moving work to machines with low CPU load. openMosix also supports auto discovery: when a new node joins the cluster, openMosix detects it automatically and starts scheduling work onto it.
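The "submit to any node and forget" workflow described above can be sketched as an ordinary shell script. No special API is involved, since openMosix migrates plain forked processes; here `encode` is a stand-in for a real CPU-bound encoder such as lame (an assumption for illustration, not a real command):

```shell
#!/bin/sh
# Minimal sketch of "run and forget" on an openMosix node. There is no
# submission step: we simply fork one worker per input file and the kernel
# migrates the processes to idle nodes on its own.
# "encode" is a placeholder for a real encoder like lame.
encode() {
    echo "encoded $1"
}

for f in track1.wav track2.wav track3.wav; do
    encode "$f" &    # each backgrounded worker is a separate, migratable process
done
wait                 # collect all workers, just as on a single SMP machine
```

On a single machine the loop just runs the workers locally; under an openMosix kernel the same unmodified script spreads them across the cluster.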

Installing openMosix is very simple. At present openMosix only supports the 2.4 kernel, not 2.6. All we need is a 2.4 kernel source tree: apply the openMosix patch to it with patch, recompile to produce a new kernel, and boot the system with it. Then switch every machine in the cluster over to the openMosix kernel.
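The install steps above amount to roughly the following shell session; the version numbers and file names below are illustrative, not taken from any particular release:

```shell
# Sketch of an openMosix install, assuming a 2.4 kernel tree and a matching
# openMosix patch (paths and versions here are placeholders).
cd /usr/src/linux-2.4.26
zcat /tmp/openMosix-2.4.26-1.gz | patch -p1   # apply the kernel patch
make menuconfig                               # enable the openMosix options
make dep bzImage modules modules_install
cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.26-om
# update lilo/grub, reboot into the new kernel,
# then repeat on every node of the cluster
```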

Note that openMosix cannot turn a serial application into a parallel one! openMosix merely assembles a cluster into what is logically one SMP machine and distributes tasks transparently. The official openMosix FAQ is a classic: it explains what openMosix can and cannot do, which gcc version to use when installing (i.e. when compiling the new kernel), and so on, in great and practical detail:

Attachment 1

Before getting to the openMosix/PBS comparison below, one more interesting thing is worth mentioning. On the openMosix website there is a project called CHeckPOint (CHPO), which implements checkpointing very well; quite impressive, but the software is for MOSIX-style clusters only, so HPC clusters cannot use it.


At this point a question arose for me: how is openMosix different from a PBS system? Users submit jobs to PBS, and PBS schedules them across the cluster, which sounds a lot like openMosix. So I posted the question to the openMosix mailing list; the replies are below, and they boil down to these points:

1. With a PBS system, the user must define the job logic and submit it with PBS commands such as qsub. With openMosix all of this is completely transparent. For applications whose job behaviour we cannot modify or customize, PBS is powerless.

2. PBS systems handle interactive programs rather awkwardly; openMosix has no such problem.

3. Once PBS has dispatched a job, it leaves it alone: even if CPU load on that node later rises, the job keeps running where it was placed. openMosix is different; if it finds a node's CPU load too high while another node in the cluster has free CPU, it migrates the job again. In other words, openMosix is scheduling jobs at every moment.
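To make point 1 concrete, here is what the explicit PBS side of the comparison might look like; the script contents and resource line are illustrative only, and under openMosix none of this ceremony exists -- you would just run the program directly:

```shell
# job.pbs -- a minimal, illustrative PBS job script
#PBS -N encode
#PBS -l nodes=1
cd $PBS_O_WORKDIR
./encode.sh track1.wav      # the user must spell out the job logic

# and the job must be handed to PBS explicitly:
#   qsub job.pbs
#   qstat            # then poll the queue for its status
```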

Here are the replies from the list, for reference:

Not sure exactly what a PBS is but from what you describe it would
run on top of an openMosix cluster. OpenMosix creates what is called a
Single System Image. This way your jobs do not have to know anything
about a cluster, they do not have to be submitted to a scheduler, you
just run and forget. The cluster will automatically shift the load
around to get best cpu per job usage. A scheduler on the other hand
requires either the jobs to understand how to do parallel work or for
the submitter to pre-split the job to make use of the cluster. That is
a bit of a simple explanation but I think it should give you a good
idea of the difference.

================================================================

While having similar aims, the way batch systems like the grid engine
achieve them is quite different from the openMosix approach. Roughly
speaking, batch systems start jobs on free nodes by "ssh-ing" to the
given node (or do something a bit more clever but still somehow
equivalent). This is also the reason why they have to bring their own job
management tools (e.g. qstat, qsub etc.) -- jobs just have to be
processes on your local node. This also limits the kind of jobs
you can use with batch systems, as they're usually unable to execute
interactive or X applications.

Contrary to that, openMosix operates at a much lower level. It is a
kernel patch that allows processes running locally on a node to be
migrated to another node transparently during runtime. The last
part is quite important, as it has some interesting consequences:

* If there is a load inequality among the cluster nodes, it can be
equalized much more smoothly by simply migrating some jobs to the idling
machines within seconds. Batch systems can only equalize load by
starting new jobs, which isn't as elegant and, more importantly, will
fail if the queue is empty.

* You don't have to use special job management tools, as openMosix
can migrate nearly every process on your node (ok, there are some
limitations: no multithreading, no shared memory, but for oM's use
case this is a weak limitation).

* oM also works with interactive and X applications. For instance
if you have a graphical fractal generator which is creating high
load on your login machine, oM could easily migrate it to an idling
machine without you noticing it.

So, oM is, despite some limitations, way more elegant. Give it a try.

================================================================

By using the openMosix kernel alongside other clustering apps, a more
generalised beowulf style cluster can be built to cater for all types of
use.

I have used PBS and found it tricky to set up jobs to run quickly across
nodes, but that would NOT mean that you cannot use PBS alongside openMosix.

If a job you schedule for a particular node is openMosix friendly, then
openMosix could cause that particular job to migrate on to a faster free
node, and if your particular job spawns sub processes that are openMosix
friendly, then each one of those processes could in fact migrate in order to
get 100% CPU usage from all the openMosix nodes in your network.

ie
PBS spawns 10 openMosix friendly processes for 1 node on the network,
openMosix would migrate each of those processes to a different node.
If one node is then used for something else, then openMosix could migrate
the process again to find the maximum CPU use for that process.
Without openMosix, PBS would only allow you to set the same 10 processes to
run across 10 nodes and stay where they run.

Quote from Andreas Schäfer:
So, oM is, despite some limitations, way more elegant. Give it a try.

*Yes, and use it alongside PBS and other clustering apps.*

I also use DSH from within cron...
My cron scheduler runs on a designated master node, jobs are set using
crontab -e from any node and DSH is used to run them.

eg:

0 * * * * dsh -c -m 192.168.1.20 -m 192.168.1.21 /home/mydir/myscript.sh
Would cause the script 'myscript.sh' to run hourly on nodes 192.168.1.20 and
192.168.1.21 concurrently.
(please note that /home and /var/spool/cron/ are available from NFS )
If myscript.sh contains oM friendly processes, then those too will migrate
across the network to other nodes.