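When I first started working with NVIDIA graphics cards, I was completely lost.
I ran into plenty of pitfalls installing Ubuntu and the NVIDIA driver:
the login loop that keeps you from reaching the desktop, and the limited compute capability of the Tesla K80, which I have since replaced with a GeForce RTX 2080 Ti.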
Things got easier once I understood the nvidia-smi command (its details are covered elsewhere).
Now to the main topic.
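On a machine with more than four cards, nvidia-smi can be extremely slow to print its information, which is especially painful on a server used purely for compute.
I tried removing the 8 Tesla K80 cards one at a time (8 cards, 7, then 6/5/4/3/2/1) and ran nvidia-smi after each step. The speed was the same every time.
It took 4 to 5 minutes, and sometimes it printed nothing at all and the machine simply hung.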
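To put a number on the slowness described above, the call can be timed. The following is a minimal sketch; `elapsed_seconds` is a hypothetical helper name introduced here, not part of any NVIDIA tool.

```shell
# Print how many whole seconds a command takes to finish.
# elapsed_seconds is a hypothetical helper, not an NVIDIA utility.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1      # run the command, discard its output
    end=$(date +%s)
    echo $(( end - start ))
}

# Only call nvidia-smi when it is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi took $(elapsed_seconds nvidia-smi) s"
fi
```

Running it once before and once after changing the persistence setting gives a rough before/after comparison.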
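For example:
When TensorFlow calls CUDA to use the GPU during training or inference, the call often times out and an error is reported.
After installing CUDA, cuDNN and the NVIDIA driver, the following error appeared when using the MXNet framework: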
terminate called after throwing an instance of 'dmlc::Error'
what(): [16:42:29] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:115: Check failed: err == CUBLAS_STATUS_SUCCESS (1 vs. 0) : Create cublas handle failed
Stack trace:
*************
[bt] (6) ~/miniconda3/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f8622101678]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f86731206ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8672e5641d]
Aborted (core dumped)
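Solution
At first I assumed the driver had not been installed correctly, but nvidia-smi did display the GPU information. The only thing that puzzled me was why Persistence-M was Off. I tried switching it to On, and that fixed the problem.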
nvidia-smi -pm 1
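To see which GPUs still report persistence mode as off, the state can be queried per device. This is a minimal sketch; `disabled_gpus` is a hypothetical helper name, and it assumes nvidia-smi prints rows of the form `0, Enabled` for this query.

```shell
# List the indices of GPUs whose persistence mode is reported as Disabled.
# Reads "index, persistence_mode" CSV rows on stdin.
# disabled_gpus is a hypothetical helper, not an NVIDIA utility.
disabled_gpus() {
    awk -F', *' '$2 == "Disabled" { print $1 }'
}

# Only query the driver when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | disabled_gpus
fi
```

An empty result means every GPU already has persistence mode on. A single device can be switched with `nvidia-smi -i <index> -pm 1`, which normally needs root.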
Now the key part: the following command speeds up how quickly nvidia-smi loads.
sudo nvidia-persistenced --persistence-mode
------------------------------------------------------------------
Below is the description of nvidia-persistenced from its man page:
nvidia-persistenced - A daemon to maintain persistent software state in the NVIDIA driver.
The nvidia-persistenced utility is used to enable persistent software state in the NVIDIA driver. When persistence mode is enabled, the daemon prevents the driver from releasing device state when the device is not in use. This can improve the startup time of new clients in this scenario.
OPTIONS
-v, --version
    Print the utility version and exit.
-h, --help
    Print usage information for the command line options and exit.
-V, --verbose
    Controls how much information is printed. By default, nvidia-persistenced will only print errors and warnings to syslog for unexpected events, as well as startup and shutdown notices. Specifying this flag will cause nvidia-persistenced to also print notices to syslog on state transitions, such as when persistence mode is enabled or disabled, and informational messages on startup and exit.
-u USERNAME, --user=USERNAME
    Runs nvidia-persistenced with the user permissions of the user specified by the USERNAME argument. This user must have write access to the /var/run/nvidia-persistenced directory. If this directory does not exist, nvidia-persistenced will attempt to create it prior to changing the process user and group IDs.
--persistence-mode, --no-persistence-mode
    By default, nvidia-persistenced starts with persistence mode disabled for all devices. Use '--persistence-mode' to force persistence mode on for all devices on startup.
--nvidia-cfg-path=PATH
    The nvidia-cfg library is used to communicate with the NVIDIA kernel module to query and manage GPUs in the system. This library is required by nvidia-persistenced. This option tells nvidia-persistenced where to look for this library (in case it cannot find it on its own). This option should normally not be needed.
When installed by nvidia-installer, sample init scripts to start the daemon for some of the more prevalent init systems are installed as the compressed tarball /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2. These init scripts should be customized to the user's distribution and installed in the proper location by the user to run nvidia-persistenced on system initialization.
Once the init script is installed so that the daemon is running, users should not normally need to manually interact with nvidia-persistenced: the NVIDIA management utilities, such as nvidia-smi, can communicate with it automatically as necessary to manage persistence mode.
The daemon does not require root privileges to run, and may safely be run as an unprivileged user, given that its runtime directory, /var/run/nvidia-persistenced, is created for and owned by that user prior to starting the daemon. nvidia-persistenced also requires read and write access to the NVIDIA character device files. If the permissions of the device files have been altered through any of the NVreg_DeviceFileUID, NVreg_DeviceFile_GID, or NVreg_DeviceFileMode NVIDIA kernel module options, nvidia-persistenced will need to run as a suitable user.
If the daemon is started with root privileges, the --user option may be used instead to indicate that the daemon should drop its privileges and run as the specified user after setting up its runtime directory. Using this option may cause the daemon to be unable to remove the /var/run/nvidia-persistenced directory when it is killed, if the specified user does not have write permissions to /var/run. In this case, directory removal should be handled by a post-execution script. See the sample init scripts provided in /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2 for examples of this behavior.
The daemon indirectly utilizes nvidia-modprobe via the nvidia-cfg library to load the NVIDIA kernel module and create the NVIDIA character device files after the daemon has dropped its root privileges, if it had any to begin with. If nvidia-modprobe is not installed, the daemon may not be able to start properly if it is not run with root privileges.
The source code to nvidia-persistenced is available here: ftp://download.nvidia.com/XFree86/nvidia-persistenced/
EXAMPLES
nvidia-persistenced
    Starts the NVIDIA Persistence Daemon with persistence mode disabled for all NVIDIA devices.
nvidia-persistenced --persistence-mode
    Starts the NVIDIA Persistence Daemon with persistence mode enabled for all NVIDIA devices.
nvidia-persistenced --user=foo
    Starts the NVIDIA Persistence Daemon so that it will run as user 'foo'.
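The man page text above says the daemon is meant to be started at system initialization, so after a reboot it is worth checking that it is actually running. The sketch below is one way to do that; `persistenced_status` is a hypothetical helper name, and the match pattern assumes the daemon's command line begins with its executable name, optionally preceded by a directory.

```shell
# Print "running" if an nvidia-persistenced process is found, else "not running".
# persistenced_status is a hypothetical helper, not an NVIDIA utility.
# ASSUMPTION: the command line starts with the executable name (optionally with
# a directory prefix), so processes that merely mention the name are ignored.
persistenced_status() {
    if pgrep -f '^([^ ]*/)?nvidia-persistenced( |$)' >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

# Only report when the daemon binary is installed on this machine.
if command -v nvidia-persistenced >/dev/null 2>&1; then
    echo "nvidia-persistenced is $(persistenced_status)"
fi
```

How the daemon is started at boot depends on the init system. Some distributions ship their own service definition with the driver package, but its name and the flags it passes (for example whether it uses --persistence-mode) are assumptions to verify locally.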