Submitting Interactive Jobs with Slurm

Contents

  • Requesting CPUs
  • Requesting GPUs

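Requesting CPUs

By default, users cannot log in directly to the cluster's compute nodes. For programs that need interactive work, log in to the cluster, allocate a node with the salloc command, and then ssh to the allocated node: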

[jessy@workstation ~]$ salloc
salloc: Granted job allocation 684
salloc: Waiting for resource configuration
salloc: Nodes cpu1 are ready for job
[jessy@workstation ~]$ ssh cpu1
Warning: Permanently added 'cpu1,192.168.0.3' (ECDSA) to the list of known hosts.
Last login: Tue Dec  6 20:02:41 2022 from 192.168.0.1
[jessy@cpu1 ~]$ exit
logout
Connection to cpu1 closed.
[jessy@workstation ~]$ exit
exit
salloc: Relinquishing job allocation 684
[jessy@workstation ~]$ 

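When the computation is finished, leave the node with the exit command. Note that you need to run exit twice: the first exit takes you from the compute node back to the login node, and the second exit releases the allocated resources.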
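In the transcript above, the node name (cpu1) is read off the salloc output by hand. As a minimal sketch, it can also be taken from the environment, assuming the shell was started by salloc on the login node and that Slurm exports SLURM_JOB_NODELIST there (sites that configure salloc differently may behave otherwise). The function name first_alloc_node is hypothetical.

```shell
# Hypothetical helper: print the first hostname of the current allocation.
# Assumes it runs inside the shell started by salloc, where Slurm sets
# SLURM_JOB_NODELIST (for example "cpu1" or "cpu[1-2]").
first_alloc_node() {
    if [ -z "${SLURM_JOB_NODELIST:-}" ]; then
        echo "SLURM_JOB_NODELIST is not set: not inside a salloc session?" >&2
        return 1
    fi
    # scontrol expands compressed lists such as cpu[1-2] to one name per line.
    scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1
}
```

Inside the allocation, `ssh "$(first_alloc_node)"` would then replace typing the node name manually.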

Requesting GPUs

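There are likewise two ways to request a GPU:

  1. Submit a batch job with sbatch
    I covered this approach in another article: https://blog.csdn.net/qq_43718758/article/details/128129733
  2. Run the job interactively with salloc
    This article focuses on how to request a GPU with salloc.

[jessy@workstation ~]$ salloc -N 1 -n 1 --gres=gpu:1 -t 1:00:00 -p gpu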
salloc: Granted job allocation 689
salloc: Waiting for resource configuration
salloc: Nodes gpu1 are ready for job
[jessy@workstation ~]$ ssh gpu1
Warning: Permanently added 'gpu1,192.168.0.4' (ECDSA) to the list of known hosts.
[jessy@gpu1 ~]$ nvidia-smi
Wed Dec  7 17:03:44 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 308...  Off  | 00000000:3E:00.0 Off |                  N/A |
| 30%   30C    P8    10W / 350W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[jessy@gpu1 ~]$ 

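Run: salloc -N 1 -n 1 --gres=gpu:1 -t 1:00:00 -p gpu
This requests one node (-N 1), one task (-n 1), one GPU (--gres=gpu:1) and a one-hour time limit (-t 1:00:00) on the gpu partition (-p gpu).
Run: ssh gpu1 to log in to the GPU node.
At this point, running nvidia-smi shows the graphics card.
As before, you need to run exit twice to leave: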

[jessy@gpu1 ~]$ exit
logout
Connection to gpu1 closed.
[jessy@workstation ~]$ exit
exit
salloc: Relinquishing job allocation 691
[jessy@workstation ~]$ 
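After the second exit, it can be worth confirming that no allocation is still held, since a forgotten allocation keeps the GPU reserved until its time limit runs out. A minimal sketch, assuming squeue is on the PATH of the login node; the function name show_my_jobs is hypothetical.

```shell
# Hypothetical helper: list the current user's jobs in the Slurm queue.
# A listing with no job rows means the allocation has been released.
show_my_jobs() {
    if command -v squeue >/dev/null 2>&1; then
        # -u restricts the listing to one user's jobs.
        squeue -u "${USER:-$(id -un)}"
    else
        echo "squeue not found: run this on a Slurm login node" >&2
        return 1
    fi
}
```

If a job is still listed, `scancel <jobid>` with the job ID from that listing releases it.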

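Here is a variant of the GPU request command. It asks for two tasks (-n 2) and omits -t, so the partition's default time limit applies: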

salloc -N 1 -n 2 --gres=gpu:1 -p gpu
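The variants differ only in their option values, so they can be generated from a few parameters. The helper below is a sketch for illustration, not part of Slurm: gpu_salloc_cmd is a hypothetical name, it only prints the command instead of running it, and the partition name gpu is specific to this cluster.

```shell
# Hypothetical helper: print (not run) a single-node salloc GPU request.
# Arguments: tasks, GPUs, time limit, partition. All are optional.
gpu_salloc_cmd() {
    tasks="${1:-1}"
    gpus="${2:-1}"
    limit="${3:-}"          # empty: fall back to the partition default
    partition="${4:-gpu}"   # assumed partition name, site-specific
    cmd="salloc -N 1 -n $tasks --gres=gpu:$gpus"
    if [ -n "$limit" ]; then
        cmd="$cmd -t $limit"
    fi
    echo "$cmd -p $partition"
}

gpu_salloc_cmd 1 1 1:00:00   # the one-hour request used earlier
gpu_salloc_cmd 2             # the two-task variant without a time limit
```

Printing the command first makes it easy to check the request before pasting it into the login shell.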
