Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0...

Environment

  • Ubuntu: 16.04
  • CUDA: 10.0
  • CUDNN: 7.6.5
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
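For reference, version information like the above can be gathered with commands along these lines. This is a minimal sketch. It assumes the cuDNN header is at /usr/local/cuda/include/cudnn.h: cuDNN 7.x keeps its version macros in cudnn.h, while cuDNN 8 moved them to cudnn_version.h. The helper name cudnn_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cuDNN version from a cudnn.h header as MAJOR.MINOR.PATCH.
# Assumes the cuDNN 7.x layout, where the macros are plain "#define NAME VALUE" lines.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      if (major == "") exit 1            # macros not found in this header
      printf "%s.%s.%s\n", major, minor, patch
    }
  ' "$header"
}

# Print the CUDA toolkit version if nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
fi

# Print the cuDNN version if the header is at the assumed location.
if [ -f /usr/local/cuda/include/cudnn.h ]; then
  cudnn_version /usr/local/cuda/include/cudnn.h
fi
```

Given a header containing the three #define lines shown above, cudnn_version prints 7.6.5.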

Training output:

resnet50 (Model)             (None, 2048)              23587712  
_________________________________________________________________
dense (Dense)                (None, 1024)              2098176   
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 2050      
=================================================================
Total params: 25,687,938
Trainable params: 25,634,818
Non-trainable params: 53,120
_________________________________________________________________
Train for 2 steps, validate for 1 steps
Epoch 1/50
2020-10-12 16:46:46.980388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-10-12 16:46:47.348629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-12 16:46:48.700629: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720395: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720496: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node model/resnet50/conv1_conv/Conv2D}}]]
2020-10-12 16:46:48.733878: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2020-10-12 16:46:48.734055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2020-10-12 16:46:48.734080: W tensorflow/core/profiler/lib/profiler_session.cc:192] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
1/2 [==============>...............] - ETA: 9s2020-10-12 16:46:48.735185: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 0 kernel records, 0 memcpy records.
2020-10-12 16:46:48.735327: E tensorflow/core/platform/default/device_tracer.cc:70] CUPTI error: CUPTI could not be loaded or symbol could not be found.
Traceback (most recent call last):
  File "train.py", line 168, in <module>
    trainer.train()
  File "train.py", line 131, in train
    callbacks=callbacks)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node model/resnet50/conv1_conv/Conv2D (defined at /home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_23211]

Function call stack:
distributed_function

Error messages:

2020-10-12 16:46:48.700629: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720395: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-10-12 16:46:48.720496: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node model/resnet50/conv1_conv/Conv2D}}]]


2020-10-12 16:46:48.734055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64
2020-10-12 16:46:48.734080: W tensorflow/core/profiler/lib/profiler_session.cc:192] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.


2020-10-12 16:46:48.735327: E tensorflow/core/platform/default/device_tracer.cc:70] CUPTI error: CUPTI could not be loaded or symbol could not be found.
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node model/resnet50/conv1_conv/Conv2D (defined at /home/ubuntu/anaconda3/envs/tf_v2/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_23211]

After some searching online, I found that resolving the following error fixed all of the problems:

W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64:/usr/local/cuda/lib64

Problem analysis

Resolution:

Check the system-wide environment variables:

$ more /etc/profile
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).

if [ "$PS1" ]; then
  if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
    # The file bash.bashrc already sets the default PS1.
    # PS1='\h:\w\$ '
    if [ -f /etc/bash.bashrc ]; then
      . /etc/bash.bashrc
    fi
  else
    if [ "`id -u`" -eq 0 ]; then
      PS1='# '
    else
      PS1='$ '
    fi
  fi
fi

if [ -d /etc/profile.d ]; then
  for i in /etc/profile.d/*.sh; do
    if [ -r $i ]; then
      . $i
    fi
  done
  unset i
fi
export PATH=/home/ubuntu/anaconda3/bin:$PATH
export PATH=$PATH:/usr/local/cuda
export LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64
export JAVA_HOME=/home/ubuntu/downloads/jdk1.8.0_121
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH

ubuntu more command: the more command displays a text file in the terminal one screen at a time, which is useful for large files such as logs. It also lets the user page through the text, and it can be placed after a pipe to page through long output from another command.

Two lines set LD_LIBRARY_PATH (the first one expands $LA_LIBRARY_PATH, a typo for $LD_LIBRARY_PATH):

export LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH
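Replaying these two lines in a throwaway subshell shows where the double colon in the LD_LIBRARY_PATH printed by TensorFlow comes from: LA_LIBRARY_PATH is never set, so the first line produces a value that begins with a colon, and the second line then prepends the libpng directory plus another colon. This is only an illustration; the function name profile_ld_path is hypothetical, and the final, duplicated /usr/local/cuda/lib64 entry in the log is added somewhere else and is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replay the two LD_LIBRARY_PATH lines from /etc/profile
# in a subshell so the caller's environment is left untouched.
profile_ld_path() {
  (
    unset LA_LIBRARY_PATH LD_LIBRARY_PATH
    LD_LIBRARY_PATH=$LA_LIBRARY_PATH:/usr/local/cuda/lib64            # LA_ typo: expands to ""
    LD_LIBRARY_PATH=/home/ubuntu/downloads/libpng/lib:$LD_LIBRARY_PATH
    echo "$LD_LIBRARY_PATH"
  )
}

profile_ld_path   # prints /home/ubuntu/downloads/libpng/lib::/usr/local/cuda/lib64
```

Changing LA_ to LD_ in that line would avoid the empty entry, but editing /etc/profile requires root.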

Neither of these directories contains libcupti.so.10.0:

$ cd /home/ubuntu/downloads/libpng/lib
(tf_v2) ubuntu@VM-0-12-ubuntu:~/downloads/libpng/lib$ ll
total 1932
drwxrwxr-x 3 ubuntu ubuntu    4096 Apr 10  2020 ./
drwxrwxr-x 6 ubuntu ubuntu    4096 Apr 10  2020 ../
-rw-r--r-- 1 ubuntu ubuntu 1252498 Apr 10  2020 libpng15.a
-rwxr-xr-x 1 ubuntu ubuntu     951 Apr 10  2020 libpng15.la*
lrwxrwxrwx 1 ubuntu ubuntu      19 Apr 10  2020 libpng15.so -> libpng15.so.15.30.0*
lrwxrwxrwx 1 ubuntu ubuntu      19 Apr 10  2020 libpng15.so.15 -> libpng15.so.15.30.0*
-rwxr-xr-x 1 ubuntu ubuntu  706016 Apr 10  2020 libpng15.so.15.30.0*
lrwxrwxrwx 1 ubuntu ubuntu      10 Apr 10  2020 libpng.a -> libpng15.a
lrwxrwxrwx 1 ubuntu ubuntu      11 Apr 10  2020 libpng.la -> libpng15.la*
lrwxrwxrwx 1 ubuntu ubuntu      11 Apr 10  2020 libpng.so -> libpng15.so*
drwxrwxr-x 2 ubuntu ubuntu    4096 Apr 10  2020 pkgconfig/


(tf_v2) ubuntu@VM-0-12-ubuntu:~/downloads/libpng/lib$ cd /usr/local/cuda/lib64
(tf_v2) ubuntu@VM-0-12-ubuntu:/usr/local/cuda/lib64$ ll
total 3057452
drwxr-xr-x  3 root root      4096 Jun 14 00:49 ./
drwxr-xr-x 19 root root      4096 Jun 14 00:48 ../
lrwxrwxrwx  1 root root        19 Jun 14 00:47 libaccinj64.so -> libaccinj64.so.10.0*
lrwxrwxrwx  1 root root        23 Jun 14 00:47 libaccinj64.so.10.0 -> libaccinj64.so.10.0.130*
-rwxr-xr-x  1 root root   7407024 Jun 14 00:47 libaccinj64.so.10.0.130*
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libcublas.so -> libcublas.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libcublas.so.10.0 -> libcublas.so.10.0.130*
-rwxr-xr-x  1 root root  70796360 Jun 14 00:47 libcublas.so.10.0.130*
-rw-r--r--  1 root root  88190630 Jun 14 00:47 libcublas_static.a
-rw-r--r--  1 root root    695156 Jun 14 00:47 libcudadevrt.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libcudart.so -> libcudart.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libcudart.so.10.0 -> libcudart.so.10.0.130*
-rwxr-xr-x  1 root root    495736 Jun 14 00:47 libcudart.so.10.0.130*
-rw-r--r--  1 root root    955082 Jun 14 00:47 libcudart_static.a
-rwxr-xr-x  1 root root 391622760 Jun 14 00:49 libcudnn.so*
-rwxr-xr-x  1 root root 391622760 Jun 14 00:49 libcudnn.so.7*
-rwxr-xr-x  1 root root 391622760 Jun 14 00:49 libcudnn.so.7.6.5*
-rw-r--r--  1 root root 390446312 Jun 14 00:49 libcudnn_static.a
lrwxrwxrwx  1 root root        16 Jun 14 00:47 libcufft.so -> libcufft.so.10.0*
lrwxrwxrwx  1 root root        20 Jun 14 00:47 libcufft.so.10.0 -> libcufft.so.10.0.145*
-rwxr-xr-x  1 root root 103177128 Jun 14 00:47 libcufft.so.10.0.145*
-rw-r--r--  1 root root 123979550 Jun 14 00:47 libcufft_static.a
-rw-r--r--  1 root root 109454136 Jun 14 00:47 libcufft_static_nocallback.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libcufftw.so -> libcufftw.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libcufftw.so.10.0 -> libcufftw.so.10.0.145*
-rwxr-xr-x  1 root root    561192 Jun 14 00:47 libcufftw.so.10.0.145*
-rw-r--r--  1 root root     33250 Jun 14 00:47 libcufftw_static.a
lrwxrwxrwx  1 root root        18 Jun 14 00:47 libcuinj64.so -> libcuinj64.so.10.0*
lrwxrwxrwx  1 root root        22 Jun 14 00:47 libcuinj64.so.10.0 -> libcuinj64.so.10.0.130*
-rwxr-xr-x  1 root root   7792472 Jun 14 00:47 libcuinj64.so.10.0.130*
-rw-r--r--  1 root root     31954 Jun 14 00:47 libculibos.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libcurand.so -> libcurand.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libcurand.so.10.0 -> libcurand.so.10.0.130*
-rwxr-xr-x  1 root root  60806128 Jun 14 00:47 libcurand.so.10.0.130*
-rw-r--r--  1 root root  60723962 Jun 14 00:47 libcurand_static.a
lrwxrwxrwx  1 root root        19 Jun 14 00:47 libcusolver.so -> libcusolver.so.10.0*
lrwxrwxrwx  1 root root        23 Jun 14 00:47 libcusolver.so.10.0 -> libcusolver.so.10.0.130*
-rwxr-xr-x  1 root root 139257368 Jun 14 00:47 libcusolver.so.10.0.130*
-rw-r--r--  1 root root  72147850 Jun 14 00:47 libcusolver_static.a
lrwxrwxrwx  1 root root        19 Jun 14 00:47 libcusparse.so -> libcusparse.so.10.0*
lrwxrwxrwx  1 root root        23 Jun 14 00:47 libcusparse.so.10.0 -> libcusparse.so.10.0.130*
-rwxr-xr-x  1 root root  59078736 Jun 14 00:47 libcusparse.so.10.0.130*
-rw-r--r--  1 root root  67262190 Jun 14 00:47 libcusparse_static.a
-rw-r--r--  1 root root  12722350 Jun 14 00:47 liblapack_static.a
-rw-r--r--  1 root root    967976 Jun 14 00:47 libmetis_static.a
lrwxrwxrwx  1 root root        15 Jun 14 00:47 libnppc.so -> libnppc.so.10.0*
lrwxrwxrwx  1 root root        19 Jun 14 00:47 libnppc.so.10.0 -> libnppc.so.10.0.130*
-rwxr-xr-x  1 root root    553320 Jun 14 00:47 libnppc.so.10.0.130*
-rw-r--r--  1 root root     26216 Jun 14 00:47 libnppc_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnppial.so -> libnppial.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnppial.so.10.0 -> libnppial.so.10.0.130*
-rwxr-xr-x  1 root root  10556304 Jun 14 00:47 libnppial.so.10.0.130*
-rw-r--r--  1 root root  14040866 Jun 14 00:47 libnppial_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnppicc.so -> libnppicc.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnppicc.so.10.0 -> libnppicc.so.10.0.130*
-rwxr-xr-x  1 root root   3956688 Jun 14 00:47 libnppicc.so.10.0.130*
-rw-r--r--  1 root root   4450012 Jun 14 00:47 libnppicc_static.a
lrwxrwxrwx  1 root root        18 Jun 14 00:47 libnppicom.so -> libnppicom.so.10.0*
lrwxrwxrwx  1 root root        22 Jun 14 00:47 libnppicom.so.10.0 -> libnppicom.so.10.0.130*
-rwxr-xr-x  1 root root   1348432 Jun 14 00:47 libnppicom.so.10.0.130*
-rw-r--r--  1 root root    949372 Jun 14 00:47 libnppicom_static.a
lrwxrwxrwx  1 root root        18 Jun 14 00:47 libnppidei.so -> libnppidei.so.10.0*
lrwxrwxrwx  1 root root        22 Jun 14 00:47 libnppidei.so.10.0 -> libnppidei.so.10.0.130*
-rwxr-xr-x  1 root root   7215416 Jun 14 00:47 libnppidei.so.10.0.130*
-rw-r--r--  1 root root   9207224 Jun 14 00:47 libnppidei_static.a
lrwxrwxrwx  1 root root        16 Jun 14 00:47 libnppif.so -> libnppif.so.10.0*
lrwxrwxrwx  1 root root        20 Jun 14 00:47 libnppif.so.10.0 -> libnppif.so.10.0.130*
-rwxr-xr-x  1 root root  47194064 Jun 14 00:47 libnppif.so.10.0.130*
-rw-r--r--  1 root root  51287500 Jun 14 00:47 libnppif_static.a
lrwxrwxrwx  1 root root        16 Jun 14 00:47 libnppig.so -> libnppig.so.10.0*
lrwxrwxrwx  1 root root        20 Jun 14 00:47 libnppig.so.10.0 -> libnppig.so.10.0.130*
-rwxr-xr-x  1 root root  25033264 Jun 14 00:47 libnppig.so.10.0.130*
-rw-r--r--  1 root root  27175822 Jun 14 00:47 libnppig_static.a
lrwxrwxrwx  1 root root        16 Jun 14 00:47 libnppim.so -> libnppim.so.10.0*
lrwxrwxrwx  1 root root        20 Jun 14 00:47 libnppim.so.10.0 -> libnppim.so.10.0.130*
-rwxr-xr-x  1 root root   6197800 Jun 14 00:47 libnppim.so.10.0.130*
-rw-r--r--  1 root root   6150298 Jun 14 00:47 libnppim_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnppist.so -> libnppist.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnppist.so.10.0 -> libnppist.so.10.0.130*
-rwxr-xr-x  1 root root  16604560 Jun 14 00:47 libnppist.so.10.0.130*
-rw-r--r--  1 root root  18732154 Jun 14 00:47 libnppist_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnppisu.so -> libnppisu.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnppisu.so.10.0 -> libnppisu.so.10.0.130*
-rwxr-xr-x  1 root root    544592 Jun 14 00:47 libnppisu.so.10.0.130*
-rw-r--r--  1 root root     10690 Jun 14 00:47 libnppisu_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnppitc.so -> libnppitc.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnppitc.so.10.0 -> libnppitc.so.10.0.130*
-rwxr-xr-x  1 root root   2884112 Jun 14 00:47 libnppitc.so.10.0.130*
-rw-r--r--  1 root root   3145290 Jun 14 00:47 libnppitc_static.a
lrwxrwxrwx  1 root root        15 Jun 14 00:47 libnpps.so -> libnpps.so.10.0*
lrwxrwxrwx  1 root root        19 Jun 14 00:47 libnpps.so.10.0 -> libnpps.so.10.0.130*
-rwxr-xr-x  1 root root   8424408 Jun 14 00:47 libnpps.so.10.0.130*
-rw-r--r--  1 root root   9580914 Jun 14 00:47 libnpps_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnvblas.so -> libnvblas.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnvblas.so.10.0 -> libnvblas.so.10.0.130*
-rwxr-xr-x  1 root root    596080 Jun 14 00:47 libnvblas.so.10.0.130*
lrwxrwxrwx  1 root root        18 Jun 14 00:47 libnvgraph.so -> libnvgraph.so.10.0*
lrwxrwxrwx  1 root root        22 Jun 14 00:47 libnvgraph.so.10.0 -> libnvgraph.so.10.0.130*
-rwxr-xr-x  1 root root  88921848 Jun 14 00:47 libnvgraph.so.10.0.130*
-rw-r--r--  1 root root 186926198 Jun 14 00:47 libnvgraph_static.a
lrwxrwxrwx  1 root root        17 Jun 14 00:47 libnvjpeg.so -> libnvjpeg.so.10.0*
lrwxrwxrwx  1 root root        21 Jun 14 00:47 libnvjpeg.so.10.0 -> libnvjpeg.so.10.0.130*
-rwxr-xr-x  1 root root   1089608 Jun 14 00:47 libnvjpeg.so.10.0.130*
-rw-r--r--  1 root root   1001070 Jun 14 00:47 libnvjpeg_static.a
lrwxrwxrwx  1 root root        25 Jun 14 00:47 libnvrtc-builtins.so -> libnvrtc-builtins.so.10.0*
lrwxrwxrwx  1 root root        29 Jun 14 00:47 libnvrtc-builtins.so.10.0 -> libnvrtc-builtins.so.10.0.130*
-rwxr-xr-x  1 root root   4612768 Jun 14 00:47 libnvrtc-builtins.so.10.0.130*
lrwxrwxrwx  1 root root        16 Jun 14 00:47 libnvrtc.so -> libnvrtc.so.10.0*
lrwxrwxrwx  1 root root        20 Jun 14 00:47 libnvrtc.so.10.0 -> libnvrtc.so.10.0.130*
-rwxr-xr-x  1 root root  20332456 Jun 14 00:47 libnvrtc.so.10.0.130*
lrwxrwxrwx  1 root root        18 Jun 14 00:47 libnvToolsExt.so -> libnvToolsExt.so.1*
lrwxrwxrwx  1 root root        22 Jun 14 00:47 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0*
-rwxr-xr-x  1 root root     37240 Jun 14 00:47 libnvToolsExt.so.1.0.0*
lrwxrwxrwx  1 root root        14 Jun 14 00:47 libOpenCL.so -> libOpenCL.so.1*
lrwxrwxrwx  1 root root        16 Jun 14 00:47 libOpenCL.so.1 -> libOpenCL.so.1.1*
-rwxr-xr-x  1 root root     27096 Jun 14 00:47 libOpenCL.so.1.1*
drwxr-xr-x  2 root root      4096 Jun 14 00:47 stubs/

ubuntu ll command: ll is an alias for ls -alF defined in Ubuntu's default ~/.bashrc; it lists detailed information about the files and folders in the current directory.
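Instead of changing into each directory by hand, the same check can be scripted. This is a minimal sketch; check_ld_path is a hypothetical name. It only tests whether a file with the given name exists in each LD_LIBRARY_PATH entry. That approximates the dynamic loader's search but is not identical to it, because the loader also consults the ld.so cache and the default library directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every LD_LIBRARY_PATH entry that contains the given library.
# Returns 0 if at least one entry contains it, 1 otherwise.
check_ld_path() {
  local lib="$1" dir found=1
  local IFS=':'                       # split LD_LIBRARY_PATH on colons
  for dir in $LD_LIBRARY_PATH; do
    [ -n "$dir" ] || continue         # skip empty entries such as the one between "::"
    if [ -e "$dir/$lib" ]; then
      echo "found: $dir/$lib"
      found=0
    fi
  done
  return "$found"
}

check_ld_path libcupti.so.10.0 || echo "libcupti.so.10.0 is not in any LD_LIBRARY_PATH entry"
```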

I eventually found libcupti.so.10.0 under /usr/local/cuda-10.0/extras/CUPTI/lib64/:

(base) ubuntu@VM-0-12-ubuntu:~$ cd /usr/local/cuda-10.0/extras/CUPTI/lib64/
(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ ll
total 6072
drwxr-xr-x 2 root root    4096 Jun 14 00:47 ./
drwxr-xr-x 5 root root    4096 Jun 14 00:47 ../
lrwxrwxrwx 1 root root      16 Jun 14 00:47 libcupti.so -> libcupti.so.10.0*
lrwxrwxrwx 1 root root      20 Jun 14 00:47 libcupti.so.10.0 -> libcupti.so.10.0.130*
-rwxr-xr-x 1 root root 6207480 Jun 14 00:47 libcupti.so.10.0.130*
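When it is unclear which directory holds a library, searching the CUDA installation tree is usually quicker than listing directories one by one. This is a minimal sketch. It assumes the toolkit is installed under /usr/local/cuda-10.0, as in this post, and find_lib_under is a hypothetical name. The -H option makes find follow the starting path if that path is a symlink (for example /usr/local/cuda), but not symlinks found below it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every file or symlink under ROOT whose name matches PATTERN.
find_lib_under() {
  local root="$1" pattern="$2"
  # -H: follow ROOT itself if it is a symlink (e.g. /usr/local/cuda -> cuda-10.0).
  find -H "$root" -name "$pattern" 2>/dev/null
}

if [ -d /usr/local/cuda-10.0 ]; then
  find_lib_under /usr/local/cuda-10.0 'libcupti.so*'
fi
```

On the machine in this post, that search would be expected to list the libcupti.so* entries shown in the listing above.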

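Since I do not have root privileges on this machine, I added the environment variable for the current user only. First, check the current user's profile: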

(tf_v2) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ more ~/.bash_profile 
if test -f .bashrc ; then
source .bashrc
fi

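Open ~/.bashrc with vi:

(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ vi ~/.bashrc

and append a new line. The $ in front of LD_LIBRARY_PATH is required; without it, the existing search path is replaced by the literal text "LD_LIBRARY_PATH" instead of being extended:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/extras/CUPTI/lib64

Verify that the entry was added:

(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ source ~/.bashrc
(base) ubuntu@VM-0-12-ubuntu:/usr/local/cuda-10.0/extras/CUPTI/lib64$ echo $LD_LIBRARY_PATH

The printed value should end with :/usr/local/cuda-10.0/extras/CUPTI/lib64. If it starts with the literal text LD_LIBRARY_PATH:, the $ is missing from the export line.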
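The line can also be appended from the shell instead of editing ~/.bashrc by hand. The sketch below is a hypothetical helper; append_ld_path is not a standard command. It skips the append when the directory is already mentioned in the file, so running it twice does not add a duplicate entry. It also writes ${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:} so that no empty entry is created when LD_LIBRARY_PATH happens to be unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an LD_LIBRARY_PATH export for DIR to RCFILE, once.
append_ld_path() {
  local rcfile="$1" dir="$2"
  # Plain substring match: assume the entry is already there if DIR appears anywhere in the file.
  if grep -Fq -- "$dir" "$rcfile" 2>/dev/null; then
    return 0
  fi
  # Single quotes keep $LD_LIBRARY_PATH literal, so it is expanded each time RCFILE is sourced.
  # ${LD_LIBRARY_PATH:+...:} adds the old value and a colon only when the variable is non-empty.
  printf 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}%s\n' "$dir" >> "$rcfile"
}

# Example (not run here): append_ld_path ~/.bashrc /usr/local/cuda-10.0/extras/CUPTI/lib64
```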

From then on, after opening each new server session, I run the following command to load the user environment variables before training the model:

$ source ~/.bashrc

With that, the problem was solved!

 

References

  • Tensorflow CUDA - CUPTI error: CUPTI could not be loaded or symbol could not be found
  • could not dlopen DSO: libcupti.so.9.0 #336
