Problems encountered while setting up MPI parallel computing, and their solutions
1,[root@localhost ~]# mpdtrace
configuration file /etc/mpd.conf is accessible by others
change permissions to allow read and write access only by you
Solution:
[root@localhost ~]# chmod 600 /etc/mpd.conf
2,[root@localhost ~]# mpdboot -n 1 -f mpd.hosts
mpdboot_localhost.localdomain (handle_mpd_output 414): from mpd on localhost.localdomain, invalid port info:
no_port
Solution:
This is caused by the permissions on mpd.conf and related files; they must be set to 600.
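Problems 1 and 2 both come down to the mode of mpd.conf, so the check and the fix can be combined. The following is only a minimal sketch: ensure_private is a hypothetical helper name, and it assumes GNU stat (the -c %a option) is available on the node.

```shell
#!/bin/sh
# Hypothetical helper: make a file readable and writable by its owner only.
ensure_private() {
    conf="$1"
    if [ ! -f "$conf" ]; then
        echo "missing: $conf" >&2
        return 1
    fi
    # stat -c %a prints the octal mode (GNU coreutils; an assumption about the system).
    mode=$(stat -c %a "$conf") || return 1
    if [ "$mode" != "600" ]; then
        chmod 600 "$conf" || return 1
        echo "$conf: mode $mode -> 600"
    fi
}

# Example (run as the user that owns the file):
#   ensure_private /etc/mpd.conf
```

Running it before mpdboot or mpdtrace should avoid both error messages shown above, provided the permissions were the only problem.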
3,[root@localhost ~]# mpdtrace
mpdroot: perror msg: No such file or directory
mpdroot: cannot connect to local mpd at: /tmp/mpd2.console_root
probable cause: no mpd daemon on this machine
possible cause: unix socket /tmp/mpd2.console_root has been removed
mpdtrace (__init__ 1204): forked process failed; status=255
Solution:
The mpd daemon is not running. Start it with: mpdboot -n 1 -f mpd.hosts
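The error message names the console socket that mpdtrace looks for, so its presence can be checked before deciding whether to start mpd. This is a rough sketch under assumptions: the helper names are made up for illustration, the default path is the one from the error output above (it applies to the root user), and a socket file left behind by a dead daemon would still pass this check.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the mpd console socket exists.
# The default path is taken from the mpdtrace error message for user root.
mpd_console_exists() {
    [ -S "${1:-/tmp/mpd2.console_root}" ]
}

# Hypothetical helper: start mpd only when no console socket is found.
start_mpd_if_needed() {
    if mpd_console_exists "$1"; then
        echo "mpd console socket found, not starting mpd again"
    else
        mpdboot -n 1 -f mpd.hosts
    fi
}

# Example:
#   start_mpd_if_needed && mpdtrace
```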
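4, During testing, an mpd process often fails to connect to, or communicate with, a particular node. When this happens, first check whether mpd can be started successfully on that node by itself. If it can, the problem usually lies in the firewall configuration.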
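To narrow down which node is affected, it can help to walk through the hosts file and test each node in turn. The sketch below is an assumption-laden illustration: list_hosts and check_hosts are hypothetical names, the hosts file is assumed to hold one host per line with an optional ":ncpus" suffix and "#" comments, and ping (with the Linux -c and -W options) only shows basic reachability. A node that answers ping can still have the ports mpd uses blocked by its firewall.

```shell
#!/bin/sh
# Hypothetical helper: print the host names from an mpd.hosts style file.
# Assumed format: one host per line, optional ":ncpus" suffix, "#" comments.
list_hosts() {
    awk '{ sub(/#.*/, ""); if (NF) { split($1, a, ":"); print a[1] } }' "$1"
}

# Hypothetical helper: report basic reachability for every host in the file.
check_hosts() {
    for h in $(list_hosts "$1"); do
        if ping -c 1 -W 2 "$h" >/dev/null 2>&1; then
            echo "$h: reachable"
        else
            echo "$h: NOT reachable"
        fi
    done
}

# Example:
#   check_hosts mpd.hosts
```

If every node is reachable and mpd starts on each one alone, the firewall rules on the failing node are the next thing to inspect, for example with iptables -L -n.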
5,[root@localhost examples]# mpiexec -n 5 ./cpi
mpiexec_localhost.localdomain (mpiexec 392): no msg recvd from mpd when expecting ack of request
[root@localhost examples]# mpiexec -n 5 ./cpi
Process 3 of 5 is on localhost.localdomain
Process 4 of 5 is on localhost.localdomain
Process 0 of 5 is on localhost.localdomain
Process 1 of 5 is on localhost.localdomain
Process 2 of 5 is on localhost.localdomain
pi is approximately 3.1415926544231230, Error is 0.0000000008333298
wall clock time = 0.005338
[root@localhost examples]#
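Solution: This is probably a transient condition, such as the system being busy. As the output above shows, the same command sometimes fails and sometimes succeeds, and rerunning it was enough here.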
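Because the failure is intermittent, the rerun can be automated. The following is a minimal sketch, not part of MPICH: retry is a hypothetical helper that runs a command up to a given number of times with a pause between attempts. It only hides the symptom and does not explain why mpd failed to acknowledge the request.

```shell
#!/bin/sh
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Example: try the job up to 3 times, waiting 2 seconds between tries.
#   retry 3 2 mpiexec -n 5 ./cpi
```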