Summary of MPI installation and configuration problems

I. Problems encountered when configuring MPICH2 on IA64:

 

(1) /home/zhxue/mpich2-1.1.1p1/configure -prefix=/opt/app/mpich2/ 2>&1 | tee c.txt

 

configure: error:
The nemesis channel was selected yet no native atomic primitives are
available on this platform.  OpenPA can emulate atomic primitives using
locks by specifying --with-atomic-primitives=no but performance will be
very poor.  This override should only be specified for correctness
testing purposes.
configure: error: /home/zhxue/mpich2-1.1.1p1/src/mpid/ch3/channels/nemesis/configure failed for channels/nemesis
configure: error: Configure of src/mpid/ch3 failed!

 

The solution is described here:

http://trac.mcs.anl.gov/projects/mpich2/ticket/764
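For a quick correctness test only, the error message itself names an override, --with-atomic-primitives=no. The sketch below just assembles that configure command from the paths used above and prints it; it is not taken from the ticket, DRY_RUN is a hypothetical switch introduced here, and as the message warns, performance with emulated atomics will be very poor.

```shell
#!/bin/sh
# Sketch only: build the configure command with the override quoted in the
# error message. SRC and PREFIX default to the paths used in item (1).
SRC=${SRC:-/home/zhxue/mpich2-1.1.1p1}
PREFIX=${PREFIX:-/opt/app/mpich2}

configure_cmd() {
    # --with-atomic-primitives=no comes verbatim from the configure error text
    printf '%s/configure --prefix=%s --with-atomic-primitives=no\n' "$SRC" "$PREFIX"
}

# DRY_RUN (hypothetical, default 1) prints the command instead of running it.
if [ "${DRY_RUN:-1}" = "1" ]; then
    configure_cmd
else
    # Paths are assumed to contain no spaces, so word splitting is safe here.
    $(configure_cmd) 2>&1 | tee c.txt
fi
```

Run it once as-is to inspect the command, then with DRY_RUN=0 to actually configure and capture the log in c.txt.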

 

(2) When I run the following command, it prints an error message:

 

 [root@c2402 root]# mpdboot -n 2 -f /opt/app/mpd.hosts
mpdboot_c2402 (handle_mpd_output 415): failed to connect to mpd on c2403

 

It fails because the firewall blocks mpd. You can set a port range in your mpd.conf file, open that range in /etc/sysconfig/iptables, and then run service iptables restart. The mpd.conf file looks like the following:

 
MPD_PORT_RANGE=55000:56000
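The matching firewall entry can be derived from the same range. This is a minimal sketch: mpd_range_to_rule is a helper name made up here, and the chain name is an assumption (INPUT by default; some older Red Hat setups use RH-Firewall-1-INPUT instead), so check the existing /etc/sysconfig/iptables before pasting the line in.

```shell
#!/bin/sh
# Turn an MPD_PORT_RANGE value such as 55000:56000 into a rule line in the
# format used by /etc/sysconfig/iptables.
# The chain (second argument) is an assumption; it defaults to INPUT.
mpd_range_to_rule() {
    range=$1
    chain=${2:-INPUT}
    low=${range%%:*}
    high=${range##*:}
    # Both ends must be non-empty and purely numeric.
    for port in "$low" "$high"; do
        case $port in
            ''|*[!0-9]*) echo "bad range: $range" >&2; return 1 ;;
        esac
    done
    # The range must be in ascending order.
    if [ "$low" -gt "$high" ]; then
        echo "bad range: $range" >&2
        return 1
    fi
    printf '%s\n' "-A $chain -p tcp -m tcp --dport $low:$high -j ACCEPT"
}

mpd_range_to_rule 55000:56000
```

The printed line has to go above any REJECT or DROP rule in the same chain, otherwise it never matches.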

 

 

 

 

If you encounter a "no port" error message, make sure Python 2.6 or later is installed.

 

 

In addition, mpdboot itself starts mpd on the other nodes listed in mpd.hosts. If you have already run mpd & by hand on those nodes, mpdboot will report an error.

 

 

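II. Problems encountered when configuring MVAPICH2 on Mellanox hardware:

(1) It keeps reporting that the NIC driver cannot be found

I later installed OFED, which installs the NIC driver, MVAPICH2 and everything else in one go. It has to be installed on the 2.6.18 kernel, though; the installation did not succeed on other kernels.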

 

(2) Nodes cannot communicate with each other

On every node, run:

service openibd start

service opensmd start

and that fixes it. Before opensmd was started, each node could only communicate with itself and never with the other node.

 

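(3) Works as root but not as a non-root user

After passwordless login between the nodes was configured, jobs ran as root but not as the user zhxue.

I eventually found the real cause:

Add the following to /etc/security/limits.conf: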

 

#begin by zhxue


*               soft    memlock          unlimited
*               hard    memlock          unlimited

#end by zhxue
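To confirm that the new limit is actually in effect for the non-root user, it helps to check it from a fresh login session (limits.conf is applied at login, so an already open shell keeps its old value). A minimal sketch; memlock_ok is just a helper name made up here:

```shell
#!/bin/sh
# Succeeds when a memlock limit, as printed by `ulimit -l`, is "unlimited".
memlock_ok() {
    [ "$1" = "unlimited" ]
}

# Read the locked-memory limit of the current shell.
current=$(ulimit -l)
if memlock_ok "$current"; then
    echo "memlock: unlimited"
else
    echo "memlock: $current (limits.conf change not in effect for this session)"
fi
```

Running it as zhxue on each node (for example through ssh) shows whether the processes started remotely will also see the raised limit.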

 

 

The following command then runs successfully:

[zhxue@mpi002 /]$ /usr/mpi/gcc/mvapich2-1.6/bin/mpiexec   -np 50  -hosts mpi002,mpi006  /home/zhxue/mpiprog/cpi

 

Process 2 of 50 is on mpi002
Process 6 of 50 is on mpi002
Process 11 of 50 is on mpi006
Process 10 of 50 is on mpi002

...

Process 17 of 50 is on mpi006
Process 16 of 50 is on mpi002
pi is approximately 3.1415926544231274, Error is 0.0000000008333343
wall clock time = 0.830070

 

 

 

 

 

(4) mpirun_rsh: works on node mpi006 (local installation) but not on node mpi002 (a diskless system that is an exact copy of mpi006)

 

 

[zhxue@mpi006 mvapich2-1.6]$ /usr/mpi/gcc/mvapich-1.2.0/bin/mpirun_rsh  -hostfile /home/zhxue/mpiprog/mpi.hosts  -np 2   /home/zhxue/mpiprog/cpi
Process 0 of 1 is on mpi006
Process 0 of 1 is on mpi002
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000500
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000524

 

 

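However, this result is not one job split into several processes across the nodes: each node reports "Process 0 of 1", so two independent single-process runs were started.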

 

[zhxue@mpi002 /]$ /usr/mpi/gcc/mvapich-1.2.0/bin/mpirun_rsh  -hostfile /home/zhxue/mpiprog/mpi.hosts  -np 2   /home/zhxue/mpiprog/cpi

Child exited abnormally!
Killing remote processes...Signal 15 received.
DONE

 

 

 

 

I wanted to debug this, so I ran:

[root@mpi002 mpiprog]# /usr/mpi/gcc/mvapich-1.2.0/bin/mpirun_rsh -debug -hostfile /home/zhxue/mpiprog/mpi.hosts  -np 2   /home/zhxue/mpiprog/cpi

 

debug enabled !
RSH/SSH command failed!: No such file or directory
RSH/SSH command failed!: No such file or directory

Child exited abnormally!
Killing remote processes...DONE

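However, running the same command on mpi006 fails too, with exactly the same output as above. I was nearly misled by this; the -debug option is not much help.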

 

 

(5) The "unable to change wdir" problem

 

[root@mpi002 ~]# su zhxue
[zhxue@mpi002 root]$ pwd
/root
[zhxue@mpi002 root]$ /usr/mpi/gcc/mvapich2-1.6/bin/mpiexec   -np 10  -hosts mpi002,mpi006  /home/zhxue/mpiprog/cpi
[proxy:0:0@mpi002] launch_procs (./pm/pmiserv/pmip_cb.c:665): unable to change wdir to /root (Permission denied)
Killed

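Solution: after su zhxue the working directory is still /root, which zhxue is not allowed to enter, so change to a directory the user can access before launching: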

 [zhxue@mpi002 root]$ cd /home/zhxue
[zhxue@mpi002 ~]$ /usr/mpi/gcc/mvapich2-1.6/bin/mpiexec   -np 2  -hosts mpi002,mpi006  /home/zhxue/mpiprog/cpi
Process 0 of 2 is on mpi002
Process 1 of 2 is on mpi006
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000290
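The same check can be done before launching. This is a rough sketch under the assumption that the directory layout and permissions are the same on every node (the test only looks at the local machine); pick_launch_dir is a helper name made up here.

```shell
#!/bin/sh
# Keep the first directory if the current user can enter and read it,
# otherwise fall back to the second one (typically the home directory).
pick_launch_dir() {
    if [ -d "$1" ] && [ -x "$1" ] && [ -r "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

launch_dir=$(pick_launch_dir "$PWD" "${HOME:-/tmp}")
echo "would launch from: $launch_dir"
# Then, for example:
# cd "$launch_dir" && /usr/mpi/gcc/mvapich2-1.6/bin/mpiexec -np 2 -hosts mpi002,mpi006 /home/zhxue/mpiprog/cpi
```

In the /root case above, the check fails for zhxue and the launch directory falls back to the home directory, which is what the manual cd does.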

 

 

 
