Submitting MPI jobs with the bsub command on DeepComp 7000 (深腾7000)

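Small jobs run on the thick-node queues, which are configured as follows:

All three queues use identical nodes: 16 sockets x 4 cores (64 cores in total), Xeon X7350 2.93GHz, 512 GB of memory.
x64_small: 2 nodes, 1-8 cores, 6 hours
x64_3950: 5 nodes, 1-64 cores, 6 hours
x64_3950_long: 11 nodes, 1-64 cores, 144 hours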

x64_small is meant for small jobs.
If you need to run somewhat larger jobs, consider x64_3950 or x64_3950_long; in practice these two queues tend to be less busy than x64_small.
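
The thick-node limits listed above can be turned into a quick check of which queues will accept a given job. This is only a sketch: the function name eligible_queues is made up for illustration, the limits are copied from the list above, and the wall time is assumed to be given in whole hours.

```shell
#!/bin/bash
# Print the thick-node queues whose core and wall-time limits admit a job.
# Usage: eligible_queues CORES HOURS   (both integers)
# The limits below are taken from the queue list in this post.
eligible_queues() {
    local cores=$1 hours=$2
    (( cores >= 1 && cores <= 8  && hours <= 6   )) && echo x64_small
    (( cores >= 1 && cores <= 64 && hours <= 6   )) && echo x64_3950
    (( cores >= 1 && cores <= 64 && hours <= 144 )) && echo x64_3950_long
    return 0   # printing nothing means no thick-node queue fits
}

eligible_queues 4 2      # a small, short job fits all three queues
eligible_queues 32 100   # a long job only fits x64_3950_long
```

When several queues are printed, the advice above suggests trying x64_3950 or x64_3950_long first.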

Medium and large jobs run on the blade queue, which requires at least 64 cores.


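Example: a job with a run-time limit of 1 minute that needs 2 CPU cores, using 1 core per node, submitted to the x64_small queue, with standard output written to zlt.out and standard error to zlt.err; the program to run is comm: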

[scwangj@LB270108 zjl]$ bsub -W 1 -a intelmpi -n 2 -R "span[ptile=1]" -q x64_small -o zlt.out -e zlt.err mpirun.lsf ./comm
Job <78607> is submitted to queue <x64_small>.
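
To make the role of each option in the command above easier to see, here is a small sketch that assembles the same command line from named parameters and prints it instead of running it. The helper name bsub_cmd and its argument order are made up for illustration; the bsub options themselves are the ones used above.

```shell
#!/bin/bash
# Print (do not run) a bsub command line for an Intel MPI job.
# Usage: bsub_cmd MINUTES CORES PER_NODE QUEUE NAME PROGRAM
#   -W  run-time limit (minutes, or hours:minutes)
#   -a  application-specific submission wrapper (intelmpi here)
#   -n  total number of cores requested
#   -R  resource requirement; span[ptile=N] places at most N cores per node
#   -q  target queue
#   -o / -e  files for standard output / standard error
bsub_cmd() {
    local minutes=$1 cores=$2 ptile=$3 queue=$4 name=$5 prog=$6
    printf 'bsub -W %s -a intelmpi -n %s -R "span[ptile=%s]" -q %s -o %s.out -e %s.err mpirun.lsf %s\n' \
        "$minutes" "$cores" "$ptile" "$queue" "$name" "$name" "$prog"
}

# Reproduces the command shown above.
bsub_cmd 1 2 1 x64_small zlt ./comm
```

Printing the command first is a cheap way to review it before pasting it into the terminal.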


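To submit several jobs at once, write a bash script, submit.sh (this is not strictly necessary; chaining the commands on one line works just as well):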

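#!/bin/bash
# Submit one job per core count.
# The quotes around span[...] keep the shell from treating it as a glob pattern.
for i in 50 60 70 80 90 100
do
    bsub -W 6 -a intelmpi -n $i -R "span[ptile=1]" -q x64_blades -o $i.out -e $i.err ./matrix
done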
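
A variant of this loop that can be dry-run before anything is submitted might look like the sketch below. The names submit_all, DRY_RUN and MIN_BLADE_CORES are made up for illustration. Two things here are assumptions rather than part of the original script: core counts below the blade queue's 64-core minimum are skipped, and the program is launched through mpirun.lsf as in the single-job example.

```shell
#!/bin/bash
# Submit one blade-queue job per core count given on the command line.
# With DRY_RUN set to a non-empty value the bsub commands are only printed.
MIN_BLADE_CORES=64   # from the queue notes: blade jobs need at least 64 cores

submit_all() {
    local n
    for n in "$@"; do
        if (( n < MIN_BLADE_CORES )); then
            echo "skip $n: below the ${MIN_BLADE_CORES}-core minimum" >&2
            continue
        fi
        # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
        ${DRY_RUN:+echo} bsub -W 6 -a intelmpi -n "$n" -R "span[ptile=1]" \
            -q x64_blades -o "$n.out" -e "$n.err" mpirun.lsf ./matrix
    done
}

# Print what would be submitted, without calling bsub.
DRY_RUN=1 submit_all 50 60 70 80 90 100
```

Calling submit_all without DRY_RUN set runs the same bsub commands for real.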


Checking the jobs:

[scwangj@LB270210 zl]$ bjobs -u scwangj
JOBID    USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
82504    scwangj PEND  x64_blades lb270210                ./matrix   Jul  4 12:47
82505    scwangj PEND  x64_blades lb270210                ./matrix   Jul  4 12:47
82506    scwangj PEND  x64_blades lb270210                ./matrix   Jul  4 12:47
82507    scwangj PEND  x64_blades lb270210                ./matrix   Jul  4 12:47
82487    scwangj PEND  x64_small  lb270210                ./matrix   Jul  4 12:35
[scwangj@LB270210 zl]$ 
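
When many jobs are queued, a per-state count is often easier to read than the full listing. The sketch below assumes the default bjobs column layout shown above (STAT in the third column); count_by_stat is a made-up helper name.

```shell
#!/bin/bash
# Count jobs per state from default-format bjobs output read on stdin.
# Only lines that start with a numeric job ID are counted, so the header
# and any wrapped EXEC_HOST continuation lines are ignored.
count_by_stat() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 3 { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Intended usage on the cluster:
#   bjobs -u scwangj | count_by_stat
```

For the listing above this would report five pending jobs.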


Appendix:

[scwangj@v3903 20x20x100]$ cat submit.sh 
#!/bin/bash
for i in 1 2 3 4 5 6 7 8
do
    bsub -W 5:40 -a intelmpi -n $i -R span[ptile=2] -q x64_small -o $i.out -e $i.err mpirun.lsf ./simple
done
[scwangj@v3903 20x20x100]$ cd ..
[scwangj@v3903 ddm]$ ls
10.err  18.err  1.err  20x20x100  2.out  9.out  bsubmpi   ddm.sh  fluid.grd   serial  solveuss.F  solvewss.F  stagsimple.F  submit.log  tdma.F    uc.fun  variable.mod
10.out  18.out  1.out  2.err      9.err  a.sh   bsub.txt  del.sh  ppoisson.F  simple  solvevss.F  s.sh        submit2.sh    submit.sh   time.dat  uc.nam
[scwangj@v3903 ddm]$ cat submit.sh 
#!/bin/bash
for i in 1 2 3 4 5 6 7 8
do
    bsub -W 5:40 -a intelmpi -n $i -R span[ptile=2] -q x64_small -o $i.out -e $i.err mpirun.lsf ./simple
done
[scwangj@v3903 ddm]$ cat submit2.sh 
#!/bin/bash
for i in  9 10 11 12 13 14 15 16 17 18
do
    bsub -W 5:40 -a intelmpi -n $i -R span[ptile=9] -q x64_3950 -o $i.out -e $i.err mpirun.lsf ./simple
done
[scwangj@v3903 ddm]$ bjobs -u scwangj
JOBID    USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
82725    scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul  4 20:39
                                              2*t3601
82726    scwangj RUN   x64_3950   v3903       9*t3802     * ./simple Jul  4 20:39
                                              3*t4102
82727    scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul  4 20:39
                                              4*t3802
82728    scwangj RUN   x64_3950   v3903       9*t3601     * ./simple Jul  4 20:39
                                              5*t4102
82729    scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul  4 20:39
                                              6*t3802
82730    scwangj RUN   x64_3950   v3903       9*t3601     * ./simple Jul  4 20:39
                                              7*t4102
82731    scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul  4 20:39
                                              8*t3802
82745    scwangj RUN   x64_3950   v3903       9*t3701     * ./simple Jul  4 20:41
82634    scwangj RUN   x64_small  v3903       t4601       * ./simple Jul  4 16:59
82635    scwangj RUN   x64_small  v3903       1*t4601     * ./simple Jul  4 16:59
                                              1*t3701
82710    scwangj RUN   x64_small  v3903       t4601       * ./simple Jul  4 20:37
82711    scwangj RUN   x64_small  v3903       2*t3701     * ./simple Jul  4 20:37
82746    scwangj PEND  x64_3950   v3903                   * ./simple Jul  4 20:41
82619    scwangj PEND  x64_small  lb270210                * ./matrix Jul  4 16:53
[scwangj@v3903 ddm]$ 
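
In the listing above, the EXEC_HOST column shows how span[ptile=9] spreads a job: up to 9 cores land on one node and the remainder on another (for example 9*t3701 plus 2*t3601 for an 11-core job). The sketch below adds up the cores per running job from that column. It assumes the default bjobs layout shown above, where extra hosts wrap onto their own lines and a host without an N* prefix counts as one core; slots_per_job is a made-up helper name.

```shell
#!/bin/bash
# Sum the cores listed in EXEC_HOST for each job in default-format bjobs
# output read on stdin. Jobs that are not running are reported with 0.
slots_per_job() {
    awk '
    # "9*t3701" -> 9, plain "t4601" -> 1
    function slots(h,   p) { return (split(h, p, "[*]") == 2) ? p[1] + 0 : 1 }
    function flush() { if (id != "") print id, n }
    # A line starting with a numeric job ID begins a new job.
    $1 ~ /^[0-9]+$/ && NF > 1 {
        flush(); id = $1
        n = ($3 == "RUN") ? slots($6) : 0
        next
    }
    # A one-field line is a wrapped EXEC_HOST entry of the current job.
    NF == 1 && id != "" { n += slots($1) }
    END { flush() }
    '
}

# Intended usage on the cluster:
#   bjobs -u scwangj | slots_per_job
```

Comparing these totals with the -n values in submit.sh and submit2.sh is a quick way to confirm each job received the cores it asked for.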

