NVIDIA当“No running processes found”时,为什么我的CUDA GPU-Util~70%?——开启Persistence

问题:

在配置了2个Tesla K80卡的系统之后,我注意到在运行时nvidia-smi4个GPU中的一个处于高负荷状态,尽管“没有找到正在运行的进程”。为什么会发生这种情况,我该如何纠正?

以下是来自的输出nvidia-smi

➜  compute-0-1: ~/> nvidia-smi
Mon Sep 26 14:48:00 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.77                 Driver Version: 361.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:05:00.0     Off |                    0 |
| N/A   34C    P0    57W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:06:00.0     Off |                    0 |
| N/A   26C    P0    76W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:85:00.0     Off |                    0 |
| N/A   33C    P0    60W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:86:00.0     Off |                    0 |
| N/A   24C    P0    74W / 149W |      0MiB / 11441MiB |     71%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

回答:

这个nvidia论坛解决了这个问题。要更正此问题,请启用持久性模式(Persistence Mode):

sudo nvidia-smi -pm 1

运行此命令后,nvidia-smi结果如下:

➜  compute-0-1: ~/> nvidia-smi            Mon Sep 26 14:55:21 2016    
Mon Sep 26 14:55:21 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.77                 Driver Version: 361.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:05:00.0     Off |                    0 |
| N/A   36C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:06:00.0     Off |                    0 |
| N/A   28C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:85:00.0     Off |                    0 |
| N/A   37C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:86:00.0     Off |                    0 |
| N/A   27C    P8    72W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


转载自:
https://serverfault.com/questions/809038/why-is-my-cuda-gpu-util-70-when-there-are-no-running-processes-found

你可能感兴趣的:(系统)