[Xiaoweige's AI Journey] nvidia-smi: Fixing Slow Startup with nvidia-persistenced

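When I first started working with NVIDIA GPUs, I was completely lost.

Installing Ubuntu and the NVIDIA driver, I hit one pitfall after another:

the login loop that never reaches the desktop, and the Tesla K80's compute capability problems, which led me to upgrade to the GeForce RTX 2080 Ti I use now.

Things got clearer once I understood the nvidia-smi command.

For the details of nvidia-smi, see:

[Xiaoweige's AI Journey] nvidia-smi: GPU Status Monitoring Commands Explained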

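Now to the main topic.

With more than four GPUs in the machine, nvidia-smi becomes very slow and laggy when printing its information. This hurts most on servers that exist only for compute.

I tried pulling the eight Tesla K80 cards one at a time and running nvidia-smi with 8, 7, 6, 5, 4, 3, 2, and 1 cards installed. The speed was the same every time.

Each run took 4 to 5 minutes, and sometimes it never produced any output and the machine simply hung.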
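
To put a number on the slowdown, one option is to time the command before and after any change. This is a minimal sketch: `elapsed_seconds` is a helper name made up for illustration and is not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# elapsed_seconds is a hypothetical helper: it runs the given command,
# discards its output, and prints the wall-clock time in whole seconds.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Only time nvidia-smi when it is actually installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi took $(elapsed_seconds nvidia-smi) s"
fi
```

Running it once before and once after changing the persistence setting gives a rough comparison. The resolution is one second, which is enough for delays measured in minutes.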

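For example:

When TensorFlow training or inference calls into CUDA to use the GPU, the call often times out and reports an error.

After installing CUDA, cuDNN, and the NVIDIA driver, I hit this error while using the MXNet framework:

[err] Enabling Persistence-M mode - Check failed: err == CUBLAS_STATUS_SUCCESS (1 vs. 0) : Create cublas handle failed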

terminate called after throwing an instance of 'dmlc::Error'
  what():  [16:42:29] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:115: Check failed: err == CUBLAS_STATUS_SUCCESS (1 vs. 0) : Create cublas handle failed
Stack trace:
  *************
  [bt] (6) ~/miniconda3/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f8622101678]
  [bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f86731206ba]
  [bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8672e5641d]
Aborted (core dumped)

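Solution

At first I thought the driver had not been installed properly, but nvidia-smi did display the GPU information. The only odd thing was that Persistence-M was Off. I switched it to On, and the error went away.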

sudo nvidia-smi -pm 1
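Back to the main topic. The following command speeds up nvidia-smi startup: with persistence mode on, the driver keeps device state loaded while no client is attached, so nvidia-smi does not have to reinitialize the GPUs on every call.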
sudo nvidia-persistenced --persistence-mode
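
To confirm that the setting took effect on every card, nvidia-smi can be queried in CSV form. This is a sketch that assumes the `persistence_mode` query field reports `Enabled` or `Disabled` for each GPU. `gpus_without_persistence` is a made-up helper name.

```shell
#!/usr/bin/env bash
# gpus_without_persistence is a hypothetical helper. It reads CSV lines of the
# form "index, persistence_mode" on stdin and prints the index of every GPU
# whose persistence mode is anything other than "Enabled".
gpus_without_persistence() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

# Query the real GPUs only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | gpus_without_persistence
fi
```

Empty output means every GPU reports persistence mode as enabled. Any index printed is a card that still needs attention.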

------------------------------------------------------------------

Below is the explanation of nvidia-persistenced, taken from its man page:

NAME

       nvidia-persistenced - A daemon to maintain persistent software state in the NVIDIA driver.

DESCRIPTION

       The nvidia-persistenced utility is used to enable persistent software state in the NVIDIA
       driver. When persistence mode is enabled, the daemon prevents the driver from releasing
       device state when the device is not in use. This can improve the startup time of new
       clients in this scenario.

OPTIONS

       -v, --version
              Print the utility version and exit.

       -h, --help
              Print usage information for the command line options and exit.

       -V, --verbose
              Controls how much information is printed. By default, nvidia-persistenced will only
              print errors and warnings to syslog for unexpected events, as well as startup and
              shutdown notices. Specifying this flag will cause nvidia-persistenced to also print
              notices to syslog on state transitions, such as when persistence mode is enabled or
              disabled, and informational messages on startup and exit.

       -u USERNAME, --user=USERNAME
              Runs nvidia-persistenced with the user permissions of the user specified by the
              USERNAME argument. This user must have write access to the
              /var/run/nvidia-persistenced directory. If this directory does not exist,
              nvidia-persistenced will attempt to create it prior to changing the process user
              and group IDs.

       --persistence-mode, --no-persistence-mode
              By default, nvidia-persistenced starts with persistence mode disabled for all
              devices. Use '--persistence-mode' to force persistence mode on for all devices on
              startup.

       --nvidia-cfg-path=PATH
              The nvidia-cfg library is used to communicate with the NVIDIA kernel module to
              query and manage GPUs in the system. This library is required by
              nvidia-persistenced. This option tells nvidia-persistenced where to look for this
              library (in case it cannot find it on its own). This option should normally not be
              needed.

NOTES

       When installed by nvidia-installer, sample init scripts to start the daemon for some of
       the more prevalent init systems are installed as the compressed tarball
       /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2. These init scripts
       should be customized to the user's distribution and installed in the proper location by
       the user to run nvidia-persistenced on system initialization.

       Once the init script is installed so that the daemon is running, users should not normally
       need to manually interact with nvidia-persistenced: the NVIDIA management utilities, such
       as nvidia-smi, can communicate with it automatically as necessary to manage persistence
       mode.

       The daemon does not require root privileges to run, and may safely be run as an
       unprivileged user, given that its runtime directory, /var/run/nvidia-persistenced, is
       created for and owned by that user prior to starting the daemon. nvidia-persistenced also
       requires read and write access to the NVIDIA character device files. If the permissions
       of the device files have been altered through any of the NVreg_DeviceFileUID,
       NVreg_DeviceFileGID, or NVreg_DeviceFileMode NVIDIA kernel module options,
       nvidia-persistenced will need to run as a suitable user.

       If the daemon is started with root privileges, the --user option may be used instead to
       indicate that the daemon should drop its privileges and run as the specified user after
       setting up its runtime directory. Using this option may cause the daemon to be unable to
       remove the /var/run/nvidia-persistenced directory when it is killed, if the specified user
       does not have write permissions to /var/run. In this case, directory removal should be
       handled by a post-execution script. See the sample init scripts provided in
       /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2 for examples of this
       behavior.

       The daemon indirectly utilizes nvidia-modprobe via the nvidia-cfg library to load the
       NVIDIA kernel module and create the NVIDIA character device files after the daemon has
       dropped its root privileges, if it had any to begin with. If nvidia-modprobe is not
       installed, the daemon may not be able to start properly if it is not run with root
       privileges.

       The source code to nvidia-persistenced is available here:
       〈ftp://download.nvidia.com/XFree86/nvidia-persistenced/〉

EXAMPLES

       nvidia-persistenced
              Starts the NVIDIA Persistence Daemon with persistence mode disabled for all NVIDIA
              devices.

       nvidia-persistenced --persistence-mode
              Starts the NVIDIA Persistence Daemon with persistence mode enabled for all NVIDIA
              devices.

       nvidia-persistenced --user=foo
              Starts the NVIDIA Persistence Daemon so that it will run as user 'foo'.
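
The man page says the daemon can run without root as long as its runtime directory already exists and belongs to the target user. Those steps could be sketched as follows. The account name `foo` is borrowed from the example above, and `run` and `start_persistenced_as_user` are helper names invented here. The sketch also assumes the NVIDIA device files still have their default permissions. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical settings: RUN_USER is the unprivileged account to run as.
RUN_USER="${RUN_USER:-foo}"
RUN_DIR="/var/run/nvidia-persistenced"
DRY_RUN="${DRY_RUN:-1}"

# run prints the command in dry-run mode and executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the runtime directory, hand it to RUN_USER, then start the daemon
# as that user with persistence mode enabled for all devices.
start_persistenced_as_user() {
    run sudo mkdir -p "$RUN_DIR"
    run sudo chown "$RUN_USER" "$RUN_DIR"
    run sudo -u "$RUN_USER" nvidia-persistenced --persistence-mode
}

start_persistenced_as_user
```

When the daemon is started as root instead, the `--user` option described above covers the same ground: the daemon sets up the directory itself and then drops privileges.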

 
