Submitting MPI Jobs with Slurm

First, prepare an MPI program; here a simple one is written in Python with the mpi4py library.

helloworld.py

#!/usr/bin/env python
"""
Parallel Hello World
"""

from mpi4py import MPI
import sys
import time

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write("Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))
time.sleep(300)  # keep the processes alive for a while so they can be inspected
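
mpi4py must be importable on every compute node. A minimal install sketch, assuming pip and an MPI implementation (Open MPI, judging by the orted processes seen later) are already available on the nodes:

$ pip install mpi4py   # run in the Python environment the job will use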

Slurm job submission script

helloworld.sh

#!/bin/sh

#SBATCH -o /apps/mpi/myjob.out
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
mpirun python /apps/mpi/helloworld.py
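
The script above only sets the output file and the node/task layout. Other commonly used directives (job name, partition, time limit) can be added the same way; the following is an illustrative sketch, where the job name and time limit are not part of the original script and the partition name simply matches the one shown by squeue below:

#!/bin/sh

#SBATCH -J helloworld            # job name shown by squeue
#SBATCH -p control               # partition (queue) to submit to
#SBATCH -o /apps/mpi/myjob.out   # file for stdout/stderr
#SBATCH --nodes=2                # allocate 2 nodes
#SBATCH --ntasks-per-node=2      # 2 MPI tasks per node, 4 ranks in total
#SBATCH -t 00:10:00              # wall-clock time limit
mpirun python /apps/mpi/helloworld.py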

Submit the MPI job

$ sbatch helloworld.sh
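
When the job ID is needed by a wrapper script, sbatch's --parsable option prints only the ID; a small sketch:

$ JOBID=$(sbatch --parsable helloworld.sh)
$ echo "$JOBID"   # prints the numeric job ID, e.g. 40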

Viewing MPI job information

Check the MPI job status

$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                40   control hellowor  jhadmin  R       3:06      2 centos6x[1-2]
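
On a busy cluster the output can be narrowed with squeue's standard filters, for example by job ID or by user:

$ squeue -j 40        # only job 40
$ squeue -u jhadmin   # only jobs owned by jhadmin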

View detailed MPI job information

$ scontrol show jobs
JobId=40 JobName=helloworld.sh
   UserId=jhadmin(500) GroupId=jhadmin(500) MCS_label=N/A
   Priority=4294901724 Nice=0 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:05:01 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2016-09-12T04:27:00 EligibleTime=2016-09-12T04:27:00
   StartTime=2016-09-12T04:27:00 EndTime=2016-09-12T04:32:01 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=control AllocNode:Sid=centos6x1:2239
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=centos6x[1-2]
   BatchHost=centos6x1
   NumNodes=2 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,node=2
   Socks/Node=* NtasksPerN:B:S:C=2:0:*:* CoreSpec=*
   MinCPUsNode=2 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/apps/mpi/helloworld.sh
   WorkDir=/apps/mpi
   StdErr=/apps/mpi/myjob.out
   StdIn=/dev/null
   StdOut=/apps/mpi/myjob.out
   Power=
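
scontrol only reports jobs still known to the controller. Once the job has aged out, the same information can be queried from the accounting database, assuming Slurm accounting (slurmdbd) is enabled:

$ sacct -j 40 --format=JobID,JobName,Partition,State,ExitCode,Elapsed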

MPI output (the srun message about cpu binding is only a warning; all four ranks ran across the two nodes)

$ cat /apps/mpi/myjob.out 
srun: cluster configuration lacks support for cpu binding
Hello, World! I am process 0 of 4 on centos6x1.
Hello, World! I am process 1 of 4 on centos6x1.
Hello, World! I am process 2 of 4 on centos6x2.
Hello, World! I am process 3 of 4 on centos6x2.

Job process information

centos6x1

$ pstree -apl 6290
slurmstepd,6290    
  ├─slurm_script,6294 /tmp/slurmd/job00040/slurm_script
  │   └─mpirun,6295 python /apps/mpi/helloworld.py
  │       ├─python,6306 /apps/mpi/helloworld.py
  │       │   └─{python},6309
  │       ├─python,6307 /apps/mpi/helloworld.py
  │       │   └─{python},6308
  │       ├─srun,6297 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid37944
  │       │   ├─srun,6300 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid37944
  │       │   ├─{srun},6301
  │       │   ├─{srun},6302
  │       │   └─{srun},6303
  │       └─{mpirun},6296
  ├─{slurmstepd},6292
  └─{slurmstepd},6293

centos6x2

$ pstree -apl 4655
slurmstepd,4655  
  ├─orted,4660 -mca orte_ess_jobid 3794403328 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri"3794403
  │   ├─python,4663 /apps/mpi/helloworld.py
  │   │   └─{python},4665
  │   └─python,4664 /apps/mpi/helloworld.py
  │       └─{python},4666
  ├─{slurmstepd},4657
  ├─{slurmstepd},4658
  └─{slurmstepd},4659
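
Since helloworld.py sleeps for 300 seconds only to keep the ranks visible, the job can be cancelled once the inspection is done:

$ scancel 40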

Another way to submit an MPI job

$ salloc -n 8 mpiexec python /apps/mpi/helloworld.py
...
Hello, World! I am process 1 of 8 on centos6x1.
Hello, World! I am process 0 of 8 on centos6x1.
Hello, World! I am process 3 of 8 on centos6x1.
Hello, World! I am process 2 of 8 on centos6x1.
Hello, World! I am process 4 of 8 on centos6x2.
Hello, World! I am process 6 of 8 on centos6x2.
Hello, World! I am process 7 of 8 on centos6x2.
Hello, World! I am process 5 of 8 on centos6x2.
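
Depending on how Slurm and the MPI library were built, the ranks can also be launched directly by srun instead of going through mpirun/mpiexec; a sketch, assuming a suitable PMI plugin (for example PMI2) is available in this installation:

$ srun -n 8 --mpi=pmi2 python /apps/mpi/helloworld.py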

Job process information

centos6x1

$ pstree -apl 8212
salloc,8212 -n 8 mpiexec python /apps/mpi/helloworld.py
  ├─mpiexec,8216 python /apps/mpi/helloworld.py
  │   ├─python,8227 /apps/mpi/helloworld.py
  │   │   └─{python},8231
  │   ├─python,8228 /apps/mpi/helloworld.py
  │   │   └─{python},8232
  │   ├─python,8229 /apps/mpi/helloworld.py
  │   │   └─{python},8233
  │   ├─python,8230 /apps/mpi/helloworld.py
  │   │   └─{python},8234
  │   ├─srun,8218 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid36682
  │   │   ├─srun,8221 --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=1 --nodelist=centos6x2 --ntasks=1 orted -mca orte_ess_jobid36682
  │   │   ├─{srun},8222
  │   │   ├─{srun},8223
  │   │   └─{srun},8224
  │   └─{mpiexec},8217
  └─{salloc},8213

centos6x2

$ pstree -apl 6356
slurmstepd,6356  
  ├─orted,6369 -mca orte_ess_jobid 3668246528 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri"3668246
  │   ├─python,6372 /apps/mpi/helloworld.py
  │   │   └─{python},6376
  │   ├─python,6373 /apps/mpi/helloworld.py
  │   │   └─{python},6378
  │   ├─python,6374 /apps/mpi/helloworld.py
  │   │   └─{python},6377
  │   └─python,6375 /apps/mpi/helloworld.py
  │       └─{python},6379
  ├─{slurmstepd},6366
  ├─{slurmstepd},6367
  └─{slurmstepd},6368
