nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
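The version information above looks like the output of `nvcc -V` followed by a grep over the cuDNN header. As a minimal sketch, the three defines can be combined into a single number with the CUDNN_VERSION formula shown above. The header path /usr/local/cuda/include/cudnn.h is an assumption (the post does not say where the header was read from), and the function name cudnn_version_from_header is hypothetical.

```shell
#!/usr/bin/env bash
# Compute CUDNN_VERSION from a cuDNN header using the formula
# CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL.
# cudnn_version_from_header is a hypothetical helper name.
cudnn_version_from_header() {
    local header="$1" major minor patch
    major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3; exit}' "$header")
    minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3; exit}' "$header")
    patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3; exit}' "$header")
    # Fail if any of the three defines is missing.
    [ -n "$major" ] && [ -n "$minor" ] && [ -n "$patch" ] || return 1
    echo $(( major * 1000 + minor * 100 + patch ))
}

# Assumed header location; adjust to the local installation.
header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version_from_header "$header"
fi
```

For the values shown above (7, 6, 5) the formula gives 7605.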
Execution output:
resnet50 (Model) (None, 2048) 23587712
_________________________________________________________________
dense (Dense) (None, 1024) 2098176
_________________________________________________________________
dropout (Dropout) (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 25,687,938
Trainable params: 25,634,818
Non-trainable params: 53,120
_________________________________________________________________
Train for 2 steps, validate for 1 steps
Epoch 1/50
2020-10-12 16:46:46.980388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 16:46:47.348629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 16:46:48.700629: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720395: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720496: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node model/resnet50/conv1_conv/Conv2D}}]]
2020-10-12 16:46:48.733878: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2020-10-12 16:46:48.734055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2020-10-12 16:46:48.734080: W tensorflow/core/profiler/lib/profiler_session.cc:192] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
1/2 [==============>...............] - ETA: 9s
2020-10-12 16:46:48.735185: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 0 kernel records, 0 memcpy records.
2020-10-12 16:46:48.735327: E tensorflow/core/platform/default/device_tracer.cc:70] CUPTI error: CUPTI could not be loaded or symbol could not be found.
Traceback (most recent call last):
File "train.py", line 168, in <module>
trainer.train()
File "train.py", line 131, in train
callbacks=callbacks)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
total_epochs=epochs)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
batch_outs = execution_function(iterator)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
distributed_function(input_fn))
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
result = self._call(*args, **kwds)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
return self._stateless_fn(*args, **kwds)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
self.captured_inputs)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/resnet50/conv1_conv/Conv2D (defined at /home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_23211]
Function call stack:
distributed_function
Error messages:
2020-10-12 16:46:48.700629: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720395: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720496: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node model/resnet50/conv1_conv/Conv2D}}]]
2020-10-12 16:46:48.734055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2020-10-12 16:46:48.734080: W tensorflow/core/profiler/lib/profiler_session.cc:192] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
2020-10-12 16:46:48.735327: E tensorflow/core/platform/default/device_tracer.cc:70] CUPTI error: CUPTI could not be loaded or symbol could not be found.
Traceback (most recent call last):
File "train.py", line 168, in <module>
trainer.train()
File "train.py", line 131, in train
callbacks=callbacks)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
total_epochs=epochs)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
batch_outs = execution_function(iterator)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
distributed_function(input_fn))
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
result = self._call(*args, **kwds)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
return self._stateless_fn(*args, **kwds)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
self.captured_inputs)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/resnet50/conv1_conv/Conv2D (defined at /home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_23211]
After some searching online, I found that fixing the following error resolved all of the problems:
W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64
Problem analysis
Solution process:
Check the system environment variables:
$ more /etc/profile
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
if [ "$PS1" ]; then
if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
# The file bash.bashrc already sets the default PS1.
# PS1='\h:\w\$ '
if [ -f /etc/bash.bashrc ]; then
. /etc/bash.bashrc
fi
else
if [ "`id -u`" -eq 0 ]; then
PS1='# '
else
PS1='$ '
fi
fi
fi
if [ -d /etc/profile.d ]; then
for i in /etc/profile.d/*.sh; do
if [ -r $i ]; then
. $i
fi
done
unset i
fi
export PATH=/home/ubuntu/anaconda3/bin:$PATH
export PATH=$PATH:/usr/local/cuda
export LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64
export JAVA_HOME=/home/ubuntu/downloads/jdk1.8.0_121
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH
Note on the Ubuntu more command: more displays a text file in the terminal one screen at a time, which is useful for large files such as logs, and lets the user page through the file. It can also be placed after a pipe to page through the long output of another command.
The file contains two lines that set LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH
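The first of these lines references `$LA_LIBRARY_PATH`, which looks like a typo for `$LD_LIBRARY_PATH`. Assuming that variable is unset (the post does not show it being defined anywhere), it expands to an empty string. The sketch below replays the two exports in a subshell to show the resulting value; replay_profile_exports is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Replay the two export lines from /etc/profile in a subshell, starting
# from an empty environment, to show the value they produce.
# replay_profile_exports is a hypothetical helper name.
replay_profile_exports() {
    (
        # Tolerate the unset variable even if the caller enabled "set -u".
        set +u
        unset LA_LIBRARY_PATH LD_LIBRARY_PATH
        # LA_LIBRARY_PATH is unset, so this expands to ":/usr/local/cuda/lib64".
        export LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64
        export LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH
        echo "$LD_LIBRARY_PATH"
    )
}

replay_profile_exports
```

This prints /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64, which matches the start of the LD_LIBRARY_PATH reported in the dlerror message. The dynamic linker treats the empty entry between the two colons as the current directory. The additional trailing /usr/local/cuda/lib64 in the log presumably comes from another startup file that the post does not show.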
Listing both directories shows that neither of them contains libcupti.so.10.0:
$ cd /home/ubuntu/downloads/libpng/lib
(tf_v2) ubuntu@VM-0-12-ubuntu:~/downloads/libpng/lib$ ll
total 1932
drwxrwxr-x 3 ubuntu ubuntu 4096 Apr 10 2020 ./
drwxrwxr-x 6 ubuntu ubuntu 4096 Apr 10 2020 ../
-rw-r--r-- 1 ubuntu ubuntu 1252498 Apr 10 2020 libpng15.a
-rwxr-xr-x 1 ubuntu ubuntu 951 Apr 10 2020 libpng15.la*
lrwxrwxrwx 1 ubuntu ubuntu 19 Apr 10 2020 libpng15.so -> libpng15.so.15.30.0*
lrwxrwxrwx 1 ubuntu ubuntu 19 Apr 10 2020 libpng15.so.15 -> libpng15.so.15.30.0*
-rwxr-xr-x 1 ubuntu ubuntu 706016 Apr 10 2020 libpng15.so.15.30.0*
lrwxrwxrwx 1 ubuntu ubuntu 10 Apr 10 2020 libpng.a -> libpng15.a
lrwxrwxrwx 1 ubuntu ubuntu 11 Apr 10 2020 libpng.la -> libpng15.la*
lrwxrwxrwx 1 ubuntu ubuntu 11 Apr 10 2020 libpng.so -> libpng15.so*
drwxrwxr-x 2 ubuntu ubuntu 4096 Apr 10 2020 pkgconfig/
(tf_v2) ubuntu@VM-0-12-ubuntu:~/downloads/libpng/lib$ cd /usr/local/cuda/lib64
(tf_v2) ubuntu@VM-0-12-ubuntu:/usr/local/cuda/lib64$ ll
total 3057452
drwxr-xr-x 3 root root 4096 Jun 14 00:49 ./
drwxr-xr-x 19 root root 4096 Jun 14 00:48 ../
lrwxrwxrwx 1 root root 19 Jun 14 00:47 libaccinj64.so -> libaccinj64.so.10.0*
lrwxrwxrwx 1 root root 23 Jun 14 00:47 libaccinj64.so.10.0 -> libaccinj64.so.10.0.130*
-rwxr-xr-x 1 root root 7407024 Jun 14 00:47 libaccinj64.so.10.0.130*
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libcublas.so -> libcublas.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libcublas.so.10.0 -> libcublas.so.10.0.130*
-rwxr-xr-x 1 root root 70796360 Jun 14 00:47 libcublas.so.10.0.130*
-rw-r--r-- 1 root root 88190630 Jun 14 00:47 libcublas_static.a
-rw-r--r-- 1 root root 695156 Jun 14 00:47 libcudadevrt.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libcudart.so -> libcudart.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libcudart.so.10.0 -> libcudart.so.10.0.130*
-rwxr-xr-x 1 root root 495736 Jun 14 00:47 libcudart.so.10.0.130*
-rw-r--r-- 1 root root 955082 Jun 14 00:47 libcudart_static.a
-rwxr-xr-x 1 root root 391622760 Jun 14 00:49 libcudnn.so*
-rwxr-xr-x 1 root root 391622760 Jun 14 00:49 libcudnn.so.7*
-rwxr-xr-x 1 root root 391622760 Jun 14 00:49 libcudnn.so.7.6.5*
-rw-r--r-- 1 root root 390446312 Jun 14 00:49 libcudnn_static.a
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libcufft.so -> libcufft.so.10.0*
lrwxrwxrwx 1 root root 20 Jun 14 00:47 libcufft.so.10.0 -> libcufft.so.10.0.145*
-rwxr-xr-x 1 root root 103177128 Jun 14 00:47 libcufft.so.10.0.145*
-rw-r--r-- 1 root root 123979550 Jun 14 00:47 libcufft_static.a
-rw-r--r-- 1 root root 109454136 Jun 14 00:47 libcufft_static_nocallback.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libcufftw.so -> libcufftw.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libcufftw.so.10.0 -> libcufftw.so.10.0.145*
-rwxr-xr-x 1 root root 561192 Jun 14 00:47 libcufftw.so.10.0.145*
-rw-r--r-- 1 root root 33250 Jun 14 00:47 libcufftw_static.a
lrwxrwxrwx 1 root root 18 Jun 14 00:47 libcuinj64.so -> libcuinj64.so.10.0*
lrwxrwxrwx 1 root root 22 Jun 14 00:47 libcuinj64.so.10.0 -> libcuinj64.so.10.0.130*
-rwxr-xr-x 1 root root 7792472 Jun 14 00:47 libcuinj64.so.10.0.130*
-rw-r--r-- 1 root root 31954 Jun 14 00:47 libculibos.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libcurand.so -> libcurand.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libcurand.so.10.0 -> libcurand.so.10.0.130*
-rwxr-xr-x 1 root root 60806128 Jun 14 00:47 libcurand.so.10.0.130*
-rw-r--r-- 1 root root 60723962 Jun 14 00:47 libcurand_static.a
lrwxrwxrwx 1 root root 19 Jun 14 00:47 libcusolver.so -> libcusolver.so.10.0*
lrwxrwxrwx 1 root root 23 Jun 14 00:47 libcusolver.so.10.0 -> libcusolver.so.10.0.130*
-rwxr-xr-x 1 root root 139257368 Jun 14 00:47 libcusolver.so.10.0.130*
-rw-r--r-- 1 root root 72147850 Jun 14 00:47 libcusolver_static.a
lrwxrwxrwx 1 root root 19 Jun 14 00:47 libcusparse.so -> libcusparse.so.10.0*
lrwxrwxrwx 1 root root 23 Jun 14 00:47 libcusparse.so.10.0 -> libcusparse.so.10.0.130*
-rwxr-xr-x 1 root root 59078736 Jun 14 00:47 libcusparse.so.10.0.130*
-rw-r--r-- 1 root root 67262190 Jun 14 00:47 libcusparse_static.a
-rw-r--r-- 1 root root 12722350 Jun 14 00:47 liblapack_static.a
-rw-r--r-- 1 root root 967976 Jun 14 00:47 libmetis_static.a
lrwxrwxrwx 1 root root 15 Jun 14 00:47 libnppc.so -> libnppc.so.10.0*
lrwxrwxrwx 1 root root 19 Jun 14 00:47 libnppc.so.10.0 -> libnppc.so.10.0.130*
-rwxr-xr-x 1 root root 553320 Jun 14 00:47 libnppc.so.10.0.130*
-rw-r--r-- 1 root root 26216 Jun 14 00:47 libnppc_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnppial.so -> libnppial.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnppial.so.10.0 -> libnppial.so.10.0.130*
-rwxr-xr-x 1 root root 10556304 Jun 14 00:47 libnppial.so.10.0.130*
-rw-r--r-- 1 root root 14040866 Jun 14 00:47 libnppial_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnppicc.so -> libnppicc.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnppicc.so.10.0 -> libnppicc.so.10.0.130*
-rwxr-xr-x 1 root root 3956688 Jun 14 00:47 libnppicc.so.10.0.130*
-rw-r--r-- 1 root root 4450012 Jun 14 00:47 libnppicc_static.a
lrwxrwxrwx 1 root root 18 Jun 14 00:47 libnppicom.so -> libnppicom.so.10.0*
lrwxrwxrwx 1 root root 22 Jun 14 00:47 libnppicom.so.10.0 -> libnppicom.so.10.0.130*
-rwxr-xr-x 1 root root 1348432 Jun 14 00:47 libnppicom.so.10.0.130*
-rw-r--r-- 1 root root 949372 Jun 14 00:47 libnppicom_static.a
lrwxrwxrwx 1 root root 18 Jun 14 00:47 libnppidei.so -> libnppidei.so.10.0*
lrwxrwxrwx 1 root root 22 Jun 14 00:47 libnppidei.so.10.0 -> libnppidei.so.10.0.130*
-rwxr-xr-x 1 root root 7215416 Jun 14 00:47 libnppidei.so.10.0.130*
-rw-r--r-- 1 root root 9207224 Jun 14 00:47 libnppidei_static.a
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libnppif.so -> libnppif.so.10.0*
lrwxrwxrwx 1 root root 20 Jun 14 00:47 libnppif.so.10.0 -> libnppif.so.10.0.130*
-rwxr-xr-x 1 root root 47194064 Jun 14 00:47 libnppif.so.10.0.130*
-rw-r--r-- 1 root root 51287500 Jun 14 00:47 libnppif_static.a
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libnppig.so -> libnppig.so.10.0*
lrwxrwxrwx 1 root root 20 Jun 14 00:47 libnppig.so.10.0 -> libnppig.so.10.0.130*
-rwxr-xr-x 1 root root 25033264 Jun 14 00:47 libnppig.so.10.0.130*
-rw-r--r-- 1 root root 27175822 Jun 14 00:47 libnppig_static.a
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libnppim.so -> libnppim.so.10.0*
lrwxrwxrwx 1 root root 20 Jun 14 00:47 libnppim.so.10.0 -> libnppim.so.10.0.130*
-rwxr-xr-x 1 root root 6197800 Jun 14 00:47 libnppim.so.10.0.130*
-rw-r--r-- 1 root root 6150298 Jun 14 00:47 libnppim_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnppist.so -> libnppist.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnppist.so.10.0 -> libnppist.so.10.0.130*
-rwxr-xr-x 1 root root 16604560 Jun 14 00:47 libnppist.so.10.0.130*
-rw-r--r-- 1 root root 18732154 Jun 14 00:47 libnppist_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnppisu.so -> libnppisu.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnppisu.so.10.0 -> libnppisu.so.10.0.130*
-rwxr-xr-x 1 root root 544592 Jun 14 00:47 libnppisu.so.10.0.130*
-rw-r--r-- 1 root root 10690 Jun 14 00:47 libnppisu_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnppitc.so -> libnppitc.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnppitc.so.10.0 -> libnppitc.so.10.0.130*
-rwxr-xr-x 1 root root 2884112 Jun 14 00:47 libnppitc.so.10.0.130*
-rw-r--r-- 1 root root 3145290 Jun 14 00:47 libnppitc_static.a
lrwxrwxrwx 1 root root 15 Jun 14 00:47 libnpps.so -> libnpps.so.10.0*
lrwxrwxrwx 1 root root 19 Jun 14 00:47 libnpps.so.10.0 -> libnpps.so.10.0.130*
-rwxr-xr-x 1 root root 8424408 Jun 14 00:47 libnpps.so.10.0.130*
-rw-r--r-- 1 root root 9580914 Jun 14 00:47 libnpps_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnvblas.so -> libnvblas.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnvblas.so.10.0 -> libnvblas.so.10.0.130*
-rwxr-xr-x 1 root root 596080 Jun 14 00:47 libnvblas.so.10.0.130*
lrwxrwxrwx 1 root root 18 Jun 14 00:47 libnvgraph.so -> libnvgraph.so.10.0*
lrwxrwxrwx 1 root root 22 Jun 14 00:47 libnvgraph.so.10.0 -> libnvgraph.so.10.0.130*
-rwxr-xr-x 1 root root 88921848 Jun 14 00:47 libnvgraph.so.10.0.130*
-rw-r--r-- 1 root root 186926198 Jun 14 00:47 libnvgraph_static.a
lrwxrwxrwx 1 root root 17 Jun 14 00:47 libnvjpeg.so -> libnvjpeg.so.10.0*
lrwxrwxrwx 1 root root 21 Jun 14 00:47 libnvjpeg.so.10.0 -> libnvjpeg.so.10.0.130*
-rwxr-xr-x 1 root root 1089608 Jun 14 00:47 libnvjpeg.so.10.0.130*
-rw-r--r-- 1 root root 1001070 Jun 14 00:47 libnvjpeg_static.a
lrwxrwxrwx 1 root root 25 Jun 14 00:47 libnvrtc-builtins.so -> libnvrtc-builtins.so.10.0*
lrwxrwxrwx 1 root root 29 Jun 14 00:47 libnvrtc-builtins.so.10.0 -> libnvrtc-builtins.so.10.0.130*
-rwxr-xr-x 1 root root 4612768 Jun 14 00:47 libnvrtc-builtins.so.10.0.130*
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libnvrtc.so -> libnvrtc.so.10.0*
lrwxrwxrwx 1 root root 20 Jun 14 00:47 libnvrtc.so.10.0 -> libnvrtc.so.10.0.130*
-rwxr-xr-x 1 root root 20332456 Jun 14 00:47 libnvrtc.so.10.0.130*
lrwxrwxrwx 1 root root 18 Jun 14 00:47 libnvToolsExt.so -> libnvToolsExt.so.1*
lrwxrwxrwx 1 root root 22 Jun 14 00:47 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0*
-rwxr-xr-x 1 root root 37240 Jun 14 00:47 libnvToolsExt.so.1.0.0*
lrwxrwxrwx 1 root root 14 Jun 14 00:47 libOpenCL.so -> libOpenCL.so.1*
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libOpenCL.so.1 -> libOpenCL.so.1.1*
-rwxr-xr-x 1 root root 27096 Jun 14 00:47 libOpenCL.so.1.1*
drwxr-xr-x 2 root root 4096 Jun 14 00:47 stubs/
Note on the Ubuntu ll command: ll lists detailed information about the files and folders in the current directory (on Ubuntu it is by default an alias for ls -alF).
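Instead of listing each directory by hand, the same check can be sketched as a loop over the entries of LD_LIBRARY_PATH. This is only an illustration of the manual steps above; dirs_containing_lib is a hypothetical helper name, and the check only looks at the directories in the variable, not at the linker cache or the system default paths.

```shell
#!/usr/bin/env bash
# Print every directory in a colon-separated search path that contains
# the given library file. Returns non-zero if none does.
# dirs_containing_lib is a hypothetical helper name.
dirs_containing_lib() {
    local lib="$1" search_path="$2" dir found=1
    local -a dirs
    # Split on ':' without changing the caller's IFS.
    IFS=':' read -r -a dirs <<< "$search_path"
    for dir in "${dirs[@]}"; do
        # Skip empty entries such as the one produced by "::".
        [ -n "$dir" ] || continue
        if [ -e "$dir/$lib" ]; then
            echo "$dir"
            found=0
        fi
    done
    return "$found"
}

dirs_containing_lib libcupti.so.10.0 "${LD_LIBRARY_PATH:-}" \
    || echo "libcupti.so.10.0 not found in LD_LIBRARY_PATH"
```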
Eventually I found libcupti.so.10.0 under /usr/local/cuda-10.0/extras/CUPTI/lib64/:
(base) ubuntu@VM-0-12-ubuntu:~$ cd /usr/local/cuda-10.0/extras/CUPTI/lib64/
(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ ll
total 6072
drwxr-xr-x 2 root root 4096 Jun 14 00:47 ./
drwxr-xr-x 5 root root 4096 Jun 14 00:47 ../
lrwxrwxrwx 1 root root 16 Jun 14 00:47 libcupti.so -> libcupti.so.10.0*
lrwxrwxrwx 1 root root 20 Jun 14 00:47 libcupti.so.10.0 -> libcupti.so.10.0.130*
-rwxr-xr-x 1 root root 6207480 Jun 14 00:47 libcupti.so.10.0.130*
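The post does not say how this directory was located. One possible way, sketched here under that caveat, is to search the toolkit tree with find and print the directories that hold a matching file. locate_lib_dirs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Search a directory tree for files or symlinks whose name starts with the
# given library name, and print the directories that hold them.
# locate_lib_dirs is a hypothetical helper name.
locate_lib_dirs() {
    local root="$1" lib="$2"
    find "$root" -name "${lib}*" \( -type f -o -type l \) 2>/dev/null \
        | sed 's|/[^/]*$||' \
        | sort -u
}

# /usr/local/cuda-10.0 is the toolkit root seen in this post.
if [ -d /usr/local/cuda-10.0 ]; then
    locate_lib_dirs /usr/local/cuda-10.0 libcupti.so
fi
```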
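Because I currently do not have root privileges, I add the environment variable for the current user only. First, check the current user's environment configuration: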
(tf_v2) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ more ~/.bash_profile
if test -f .bashrc ; then
source .bashrc
fi
(tf_v2) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ more /etc/profile
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
if [ "$PS1" ]; then
if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
# The file bash.bashrc already sets the default PS1.
# PS1='\h:\w\$ '
if [ -f /etc/bash.bashrc ]; then
. /etc/bash.bashrc
fi
else
if [ "`id -u`" -eq 0 ]; then
PS1='# '
else
PS1='$ '
fi
fi
fi
if [ -d /etc/profile.d ]; then
for i in /etc/profile.d/*.sh; do
if [ -r $i ]; then
. $i
fi
done
unset i
fi
export PATH=/home/ubuntu/anaconda3/bin:$PATH
export PATH=$PATH:/usr/local/cuda
export LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64
export JAVA_HOME=/home/ubuntu/downloads/jdk1.8.0_121
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH
Edit ~/.bashrc with vi:
(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ vi ~/.bashrc
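Add a new line. The $ before LD_LIBRARY_PATH is required; without it the literal text LD_LIBRARY_PATH becomes the first entry and the previously configured directories are dropped:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/extras/CUPTI/lib64
Verify that it was added:
(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ source ~/.bashrc
(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ echo $LD_LIBRARY_PATH
The output recorded at the time came from a line entered without the $:
LD_LIBRARY_PATH:/usr/local/cuda-10.0/extras/CUPTI/lib64
With the $ in place, the existing entries are printed first, followed by the CUPTI directory.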
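As an alternative to editing the file in vi, the line can be appended from the shell. The sketch below adds it only when it is not already present, so running it repeatedly does not pile up duplicates. It writes `$LD_LIBRARY_PATH` with the leading `$`, so the existing entries are kept when the file is sourced. add_lib_dir_to_rc is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Append an LD_LIBRARY_PATH export for the given directory to an rc file,
# unless the exact line is already there.
# add_lib_dir_to_rc is a hypothetical helper name.
add_lib_dir_to_rc() {
    local dir="$1" rc_file="$2"
    # Single quotes keep $LD_LIBRARY_PATH literal so that it is expanded
    # when the rc file is sourced, not when this function runs.
    local line='export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:'"$dir"
    if ! grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example call (commented out so the sketch has no side effects):
# add_lib_dir_to_rc /usr/local/cuda-10.0/extras/CUPTI/lib64 ~/.bashrc
```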
From now on, after opening each new server session, run the following command to load the user environment variables before training the model:
$ source ~/.bashrc
With that, the problem is solved!