CentOS 7: Configuring a 5-Node MPI Cluster

I. Environment

Hostname    IP             Memory   OS           Disk       CPU cores
hw-master   10.2.152.230   512 GB   CentOS 7.5   2 x 6 TB   64
hw-node01   10.2.152.231   512 GB   CentOS 7.5   2 x 6 TB   64
hw-node02   10.2.152.232   512 GB   CentOS 7.5   2 x 6 TB   64
hw-node03   10.2.152.233   512 GB   CentOS 7.5   2 x 6 TB   64
hw-node04   10.2.152.234   512 GB   CentOS 7.5   2 x 6 TB   64

II. Installing MPICH

(1) Download the source code: http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz

(2) Build and install

$:mkdir -p /home/share/mpich
$:tar -xzvf mpich-3.1.4.tar.gz -C /home/share/mpich
$:cd /home/share/mpich/mpich-3.1.4
$:./configure --prefix=/usr/local/mpich
$:make && make install
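The build commands above can be wrapped in one function so every node is built the same way. This is a sketch, assuming bash, the tarball in the current directory, and the same paths as above; the `run` helper and the `DRY_RUN` variable are hypothetical additions that print each command instead of executing it, so the sequence can be reviewed first. `make -j"$(nproc)"` only spreads the compile over the available cores; the installed result is the same as with a plain `make`.

```bash
#!/usr/bin/env bash
# Sketch: build and install MPICH 3.1.4 with the paths used in this guide.
# run/DRY_RUN are hypothetical helpers: DRY_RUN=1 prints commands only.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The body is a subshell ( ... ), so the cd does not change the caller's
# directory; set -e stops at the first failing step.
install_mpich() (
  set -e
  tarball=${1:-mpich-3.1.4.tar.gz}
  src_root=/home/share/mpich
  prefix=/usr/local/mpich
  run mkdir -p "$src_root"
  run tar -xzvf "$tarball" -C "$src_root"
  run cd "$src_root/mpich-3.1.4"
  run ./configure --prefix="$prefix"
  run make -j"$(nproc)"
  run make install
)
```

Usage: `DRY_RUN=1 install_mpich` to preview the steps, then `install_mpich` as root to build.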

(3) After installation, add the MPICH paths to /etc/profile, then run source /etc/profile:

PATH=$PATH:/usr/local/mpich/bin
MANPATH=$MANPATH:/usr/local/mpich/man
export PATH MANPATH

(4) Single-node test

  • Copy the examples directory from the source tree into the installation directory:
$:cd /home/share/mpich/mpich-3.1.4
$:cp -r examples/ /usr/local/mpich
  • Run the test:
$:cd /usr/local/mpich
$:mpirun -np 10 ./examples/cpi
Output:
Process 1 of 10 is on hw-master
Process 2 of 10 is on hw-master
Process 3 of 10 is on hw-master
Process 4 of 10 is on hw-master
Process 5 of 10 is on hw-master
Process 6 of 10 is on hw-master
Process 7 of 10 is on hw-master
Process 8 of 10 is on hw-master
Process 9 of 10 is on hw-master
Process 0 of 10 is on hw-master
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.000058

III. Configuring the cluster

(1) Configure passwordless SSH login from the master node

$:ssh-keygen -t rsa
$:ssh-copy-id root@10.2.152.230
$:ssh-copy-id root@10.2.152.231
$:ssh-copy-id root@10.2.152.232
$:ssh-copy-id root@10.2.152.233
$:ssh-copy-id root@10.2.152.234
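Running `ssh-copy-id` once per node works, but a loop keeps the node list in one place and also lets you confirm afterwards that no node still asks for a password. A sketch in bash, assuming the key pair from `ssh-keygen` already exists, root logins (`SSH_USER` is a hypothetical override) and the IPs from the table; `run` and `DRY_RUN` are a hypothetical preview helper defined in the script.

```bash
#!/usr/bin/env bash
# Sketch: push the master's public key to every node, then verify key login.
# Assumptions: root account (override with SSH_USER) and the table's IPs.
SSH_USER=${SSH_USER:-root}
NODES=(10.2.152.230 10.2.152.231 10.2.152.232 10.2.152.233 10.2.152.234)

# Hypothetical helper: DRY_RUN=1 prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

copy_keys() {
  local ip
  for ip in "${NODES[@]}"; do
    run ssh-copy-id "$SSH_USER@$ip"
  done
}

# BatchMode=yes makes ssh fail instead of prompting, so a node that still
# wants a password is reported as FAILED and the function returns 1.
check_keys() {
  local ip failed=0
  for ip in "${NODES[@]}"; do
    if run ssh -o BatchMode=yes "$SSH_USER@$ip" hostname; then
      echo "OK      $ip"
    else
      echo "FAILED  $ip"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `copy_keys` (enter each node's password once), then `check_keys` should print OK for all five nodes.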

(2) Disable the firewall and SELinux on every node

Disable the firewall
Start: 			systemctl start firewalld
Stop: 			systemctl stop firewalld
Check status: 		systemctl status firewalld
Disable at boot:	systemctl disable firewalld
Enable at boot: 	systemctl enable firewalld
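Disable SELinux
Check the current mode
	$: getenforce
	    Disabled means SELinux is off; Enforcing means it is on
	$:/usr/sbin/sestatus  -v
	    SELinux status:            disabled
Disable temporarily (until the next reboot)
	setenforce 1   //switch SELinux to enforcing mode (on)
	setenforce 0   //switch SELinux to permissive mode (effectively off: violations are only logged)
Disable permanently
	$:vi  /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled
	A reboot is required for this setting to take effect.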
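Both changes are needed on every node, because MPI processes on different hosts open network connections to each other. A sketch that applies them from hw-master over the SSH access set up in step (1), assuming root logins and the node IPs from the table; `selinux_config_disabled`, `disable_on_all_nodes`, `run` and `DRY_RUN` are hypothetical names. `setenforce 0` prints an error on a node where SELinux is already disabled; the remote commands are joined with `;` so the remaining steps still run.

```bash
#!/usr/bin/env bash
# Sketch: stop firewalld and disable SELinux on all nodes from hw-master.
# Assumes passwordless root SSH (step (1)) and this cluster's IPs.
NODES=(10.2.152.230 10.2.152.231 10.2.152.232 10.2.152.233 10.2.152.234)

# Hypothetical helper: DRY_RUN=1 prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Rewrite the SELINUX= line of a config file (default /etc/selinux/config).
# SELINUXTYPE= is untouched: the pattern needs '=' right after SELINUX.
selinux_config_disabled() {
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "${1:-/etc/selinux/config}"
}

# Run on each node: firewall off now and at boot, SELinux permissive now
# and disabled from the next reboot.
REMOTE_CMD='systemctl stop firewalld; systemctl disable firewalld; setenforce 0; sed -i "s/^SELINUX=.*/SELINUX=disabled/" /etc/selinux/config'

disable_on_all_nodes() {
  local ip
  for ip in "${NODES[@]}"; do
    run ssh "root@$ip" "$REMOTE_CMD"
  done
}
```

Each node still has to be rebooted before `getenforce` reports Disabled.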

(3) Edit the /etc/hosts file on every node

$:cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.2.152.230	hw-master
10.2.152.231	hw-node01
10.2.152.232	hw-node02
10.2.152.233	hw-node03
10.2.152.234	hw-node04
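Keeping the five entries in one list and appending only the missing ones makes it safe to run the same step on every node, or to run it twice. A sketch in bash, assuming the IP/hostname pairs from the table; `CLUSTER_HOSTS`, `hosts_entries` and `add_missing_entries` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: add this cluster's entries to a hosts file without duplicating them.
# The list mirrors the environment table.
CLUSTER_HOSTS=(
  "10.2.152.230 hw-master"
  "10.2.152.231 hw-node01"
  "10.2.152.232 hw-node02"
  "10.2.152.233 hw-node03"
  "10.2.152.234 hw-node04"
)

# Print one "IP<TAB>hostname" line per node, the layout used above.
hosts_entries() {
  local entry ip name
  for entry in "${CLUSTER_HOSTS[@]}"; do
    read -r ip name <<< "$entry"
    printf '%s\t%s\n' "$ip" "$name"
  done
}

# Append only lines not already present as an exact match (grep -x -F).
add_missing_entries() {
  local file=${1:-/etc/hosts}
  hosts_entries | while IFS= read -r line; do
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}
```

Usage: try it on a copy first (`cp /etc/hosts /tmp/hosts && add_missing_entries /tmp/hosts`), then run `add_missing_entries` as root on each node. An existing entry written with spaces instead of a tab is not recognised and would be added again.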

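(4) On the master node, create a servers file in /home/share/mpich that lists each machine's hostname and the number of processes to start on it: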

$:cd /home/share/mpich
$:cat servers 
hw-master:2
hw-node01:2
hw-node02:2
hw-node03:2
hw-node04:2
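The `host:count` lines form the machine file that MPICH's default Hydra launcher reads through `mpiexec -f`. Generating the file puts the per-host process count in one place, for example to raise it to 64 to match each node's core count. A sketch assuming the hostnames above; `write_servers` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: write the machine file for `mpiexec -f`, one "host:count" line per
# node. The defaults (count 2) reproduce the servers file shown above.
write_servers() {
  local out=${1:-/home/share/mpich/servers} count=${2:-2} host
  : > "$out"   # truncate, so re-running does not append duplicates
  for host in hw-master hw-node01 hw-node02 hw-node03 hw-node04; do
    printf '%s:%s\n' "$host" "$count" >> "$out"
  done
}
```

With the default count and `mpiexec -n 10`, each host receives 2 ranks, which is the placement seen in the output of step (6).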

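(5) On all 5 nodes, copy the pi-computing executable /usr/local/mpich/examples/cpi into /home/share/mpich; mpiexec starts ./cpi at the same path on every node, so it must exist there on each one.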
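The copy can be done for all nodes from hw-master in one loop. A sketch assuming root SSH, MPICH installed under /usr/local/mpich on every node with the examples directory copied there as in section II step (4), and the hostnames from /etc/hosts; `distribute_cpi`, `run` and `DRY_RUN` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: put cpi at the same path on every node, running the copy on each
# node itself (each node uses its own /usr/local/mpich/examples/cpi).
NODES=(hw-master hw-node01 hw-node02 hw-node03 hw-node04)
SRC=/usr/local/mpich/examples/cpi
DEST_DIR=/home/share/mpich

# Hypothetical helper: DRY_RUN=1 prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

distribute_cpi() {
  local host
  for host in "${NODES[@]}"; do
    run ssh "root@$host" "mkdir -p $DEST_DIR && cp $SRC $DEST_DIR/"
  done
}
```

Usage: `DRY_RUN=1 distribute_cpi` to preview, then `distribute_cpi`.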

(6) On the master node, run the following from /home/share/mpich:

$:mpiexec -n 10 -f servers ./cpi
Output:
Process 6 of 10 is on hw-node03
Process 0 of 10 is on hw-master
Process 7 of 10 is on hw-node03
Process 1 of 10 is on hw-master
Process 8 of 10 is on hw-node04
Process 9 of 10 is on hw-node04
Process 2 of 10 is on hw-node01
Process 3 of 10 is on hw-node01
Process 4 of 10 is on hw-node02
Process 5 of 10 is on hw-node02
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.002362

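Each of the five nodes ran two of the 10 processes and the computed value of pi is correct, so the cluster is working.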
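With 10 processes the placement can be checked by eye; for larger runs, such as 64 processes per node, a short filter can count how many ranks landed on each host. A sketch assuming the `Process i of N is on HOST` lines printed by cpi; `ranks_per_host` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: count ranks per host from cpi's output on stdin.
# Other lines (the pi estimate, wall clock time) do not match and are ignored.
ranks_per_host() {
  awk '/^Process [0-9]+ of [0-9]+ is on / { count[$NF]++ }
       END { for (h in count) print h, count[h] }' | sort
}
```

Usage: `mpiexec -n 10 -f servers ./cpi | ranks_per_host` should list every host from the servers file with a count of 2.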

 
