Setting Up a Cluster Environment with Open MPI

Preparation

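First prepare two machines, for example test1 and test2, and configure them so that each can log in to the other over SSH without a password (Linux SSH passwordless login).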

Edit the /etc/hosts file on both machines and add an entry for each machine, for example:

172.17.0.2  test1
172.17.0.3  test2
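Before moving on, it can help to confirm that every node name really has an entry in the hosts file on each machine. The following is a minimal sketch, not part of the original setup; the function name missing_hosts is hypothetical, and it only inspects the file passed to it rather than performing real name resolution.

```shell
# Print each given host name that has no entry in the hosts file $1.
# Returns 0 when every name is present, 1 otherwise.
# (missing_hosts is a hypothetical helper name used for illustration.)
missing_hosts() {
    file=$1
    shift
    status=0
    for name in "$@"; do
        # Look for the name as a whole field after the address column,
        # skipping comment lines and trailing comments.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break
                    if ($i == n) found = 1
                }
            }
            END { exit !found }
        ' "$file"; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return $status
}
```

For example, `missing_hosts /etc/hosts test1 test2` prints nothing when both entries exist. Passwordless login itself can be checked separately with something like `ssh -o BatchMode=yes test2 true`, which fails instead of prompting when key-based login is not set up.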

Installing Open MPI

Download

$ wget -c https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.3.tar.gz

Install

$ tar zxvf openmpi-1.10.3.tar.gz
$ cd openmpi-1.10.3
$ ./configure --prefix=/opt/openmpi
$ make
$ sudo make install

Setting PATH and LD_LIBRARY_PATH

Run the following commands manually in the current shell:

$ export PATH=$PATH:/opt/openmpi/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib/

Also add the following lines to ~/.bashrc, so that the environment is set up automatically when mpiexec starts processes on the remote machine.

PATH=$PATH:/opt/openmpi/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib/
export PATH LD_LIBRARY_PATH
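Because ~/.bashrc may be sourced more than once, the plain assignments above can add the same directory repeatedly, and an unset LD_LIBRARY_PATH leaves an empty leading entry, which the dynamic loader treats as the current directory. The sketch below is one possible way to avoid both; the helper name path_append is hypothetical and not part of Open MPI.

```shell
# Print $1 with directory $2 appended, unless $2 is already present.
# $1 = current colon-separated value (may be empty), $2 = directory to add.
# (path_append is a hypothetical helper name used for illustration.)
path_append() {
    case ":$1:" in
        *":$2:"*)
            # Already present: return the value unchanged.
            printf '%s\n' "$1"
            ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$1:$2"
            else
                # Empty value: avoid producing a leading colon.
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

PATH=$(path_append "$PATH" /opt/openmpi/bin)
LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" /opt/openmpi/lib)
export PATH LD_LIBRARY_PATH
```

One caveat, stated as an assumption about your system: the default ~/.bashrc on some distributions returns early when the shell is not interactive. If that applies, these lines probably need to sit above that check, since the remote shell that mpiexec starts over SSH is non-interactive.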

Testing

$ cd examples     (inside the source directory)

$ make

$ mpirun -np 10 hello_c
$ mpirun -np 10 ring_c

$ mpirun -np 3 printenv

Running MPI jobs on the cluster

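First create a list of the cluster machines that specifies the number of slots on each machine, for example a file named hosts with the following content: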

test1    slots=2
test2    slots=2
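To choose a sensible value for -np, the slots declared in the machine list can be added up. This is a rough sketch under two assumptions: the helper name total_slots is hypothetical, and a host line without slots= is counted as one slot, which may not match the default of every Open MPI version.

```shell
# Sum the slots= values in a machine list file given as $1.
# (total_slots is a hypothetical helper name used for illustration.)
total_slots() {
    awk '
        /^[ \t]*(#|$)/ { next }     # skip comment lines and blank lines
        {
            n = 1                   # assumption: one slot when slots= is absent
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/) n = substr($i, 7) + 0
            total += n
        }
        END { print total + 0 }
    ' "$1"
}
```

With the two-line file above, `total_slots hosts` reports 4, which matches the -np 4 used in the next step.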

Run the MPI job

$ mpiexec --hostfile hosts -np 4 hello_c
Hello, world, I am 1 of 4, (Open MPI v1.10.3, package: Open MPI jhadmin@test1 Distribution, ident: 1.10.3, repo rev: v1.10.2-251-g9acf492, Jun 14, 2016, 124)
Hello, world, I am 0 of 4, (Open MPI v1.10.3, package: Open MPI jhadmin@test1 Distribution, ident: 1.10.3, repo rev: v1.10.2-251-g9acf492, Jun 14, 2016, 124)
Hello, world, I am 2 of 4, (Open MPI v1.10.3, package: Open MPI jhadmin@test2 Distribution, ident: 1.10.3, repo rev: v1.10.2-251-g9acf492, Jun 14, 2016, 124)
Hello, world, I am 3 of 4, (Open MPI v1.10.3, package: Open MPI jhadmin@test2 Distribution, ident: 1.10.3, repo rev: v1.10.2-251-g9acf492, Jun 14, 2016, 124)

Problems and notes

If running the command "mpiexec --hostfile hosts -np 4 hello_c" produces the following error:

bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

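you can use the following command to check whether the environment variables are set correctly. In my case, PATH and LD_LIBRARY_PATH were set incorrectly at first, which is what caused the error above.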

$ mpiexec --hostfile hosts -np 4 printenv

Check the ~/.bashrc file and set the correct paths.
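Scanning full printenv output by eye is error-prone, so a small filter can flag the PATH values that lack the Open MPI directory. This is only a sketch; the function name missing_dir is hypothetical, and it expects one PATH value per input line.

```shell
# Read PATH values (one per line) on stdin and report those that lack directory $1.
# Returns 0 when every line contains the directory, 1 otherwise.
# (missing_dir is a hypothetical helper name used for illustration.)
missing_dir() {
    status=0
    while IFS= read -r line; do
        case ":$line:" in
            *":$1:"*)
                # Directory is present in this PATH value.
                ;;
            *)
                printf 'missing %s in: %s\n' "$1" "$line"
                status=1
                ;;
        esac
    done
    return $status
}
```

A possible use, assuming passwordless SSH to test2 already works, is `ssh test2 'echo $PATH' | missing_dir /opt/openmpi/bin`, which shows the PATH seen by the same kind of non-interactive shell that mpiexec relies on. If fixing ~/.bashrc is not an option, mpirun also accepts `--prefix /opt/openmpi`, and the error text above mentions the configure option --enable-orterun-prefix-by-default for the same purpose.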


When reposting, please credit this article with a link.
Article link: http://blog.csdn.net/kongxx/article/details/52227572
