[In-Depth Tutorial] Setting Up an Ubuntu 16.04 Environment (CUDA + cuDNN + NCCL + OpenCV + Caffe)

Table of Contents

  1. Overview
  2. CUDA
  3. cuDNN
  4. NCCL
  5. OpenCV
  6. Caffe

Overview

Server environment: Ubuntu 16.04, Linux media50 4.10.0-30-generic Wed Aug 2 02:13:56 UTC 2017 x86_64 GNU/Linux

Versions used: CUDA 8.0 + cuDNN 6.0 + NCCL 2.3.7-1 + OpenCV 3.3.0 + Caffe 1.0


CUDA

  • CUDA Archive: https://developer.nvidia.com/cuda-toolkit-archive

CUDA Toolkit 8.0 GA2: https://developer.nvidia.com/cuda-80-ga2-download-archive

Select the .run file that matches your OS and CPU architecture and download it

  • Installation Guide Linux :: CUDA Toolkit Documentation: https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html

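Install the NVIDIA graphics driver

  • NVIDIA driver downloads: https://www.nvidia.cn/Download/index.aspx?lang=cn

Uninstall any old driver version (optional):

sudo apt-get remove 'nvidia*' # requires sudo; the quotes stop the shell from expanding the pattern
dpkg -l | grep ^rc | cut -d' ' -f3 | sudo xargs dpkg --purge # purge all leftover configuration files (state rc)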
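
Before purging, it can be useful to preview which packages are affected. The sketch below is one way to do that; the function name list_rc_packages is hypothetical, and it uses awk instead of the cut call above so that column spacing does not matter:

```shell
# List packages left in the "rc" state (removed, configuration files remaining).
# Reads `dpkg -l` output on stdin and prints one package name per line.
# NOTE: list_rc_packages is an illustrative helper, not a dpkg tool.
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Usage (read-only, nothing is removed):
#   dpkg -l | list_rc_packages
```

If the list looks right, the purge command above can then be run as is.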

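Make the .run installer executable:

sudo chmod a+x NVIDIA-Linux-${***}.run

Run it with the following options to avoid the login-loop problem: Ref[5]

sudo ./NVIDIA-Linux-x86_64-375.20.run --no-x-check --no-nouveau-check --no-opengl-files
# --no-x-check       do not abort if an X server is running
# --no-nouveau-check do not abort if the nouveau driver is in use
# --no-opengl-files  install only the driver files, not the OpenGL files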

Verify that the NVIDIA driver was installed successfully:

nvidia-smi

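Install CUDA

Run the .run file:

sudo sh cuda8.0_${***}_linux.run # requires sudo

Hold the space bar until the license text reaches 100%, then type accept to accept the terms;
Type no to skip installing the NVIDIA driver, since it is already installed;
Type y to install the CUDA 8.0 Toolkit, then press Enter to accept the default install path;
Type y to use the sudo command, then enter the password;
Type n to skip creating the symbolic link at /usr/local/cuda (or type y to create it);
Type y to install the CUDA 8.0 samples, which can be used to check the installation, then press Enter to accept the default path;
Wait for the CUDA installation to finish. Ref[4]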

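Once the installation is complete, open the profile file:

sudo gedit /etc/profile # requires sudo

Append the following at the end (make sure there are no spaces around the = sign):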

export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
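
Because the same export lines are added to ~/.bashrc later in this tutorial, the entries can end up duplicated, and prepending to an empty LD_LIBRARY_PATH leaves a trailing colon. A minimal sketch of a guard against both, assuming a POSIX shell; prepend_once is a hypothetical helper name and not part of CUDA:

```shell
# Prepend a directory to a colon-separated list unless it is already present.
#   $1: directory to add
#   $2: current value of the list (may be empty)
# Prints the new value. NOTE: prepend_once is an illustrative helper.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already listed: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # add, without a trailing colon
    esac
}

# Example use in /etc/profile or ~/.bashrc (commented out here):
#   export PATH="$(prepend_once /usr/local/cuda-8.0/bin "$PATH")"
#   export LD_LIBRARY_PATH="$(prepend_once /usr/local/cuda-8.0/lib64 "$LD_LIBRARY_PATH")"
```

The plain export lines above work as well; the helper only keeps the variables tidy when the file is sourced more than once.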

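[Optional] Create a symlink from /usr/local/cuda to cuda-8.0 (skip this if you answered y at the installer's symlink prompt):

sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda # requires sudo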

Test the CUDA installation

cd /usr/local/cuda-8.0/samples
sudo make all -j8 # requires sudo
cd bin/x86_64/linux/release
./deviceQuery

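Or, to build only deviceQuery:

cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
sudo make -j8 # requires sudo
./deviceQuery

If GPU information is printed, ending with Result = PASS, the installation succeeded

Check the installed CUDA version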

cat /usr/local/cuda-8.0/version.txt
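
For use in scripts, the version number can be extracted from that file. A small sketch, assuming version.txt contains a line of the form CUDA Version 8.0.61 (the patch number on your system may differ); cuda_version is a hypothetical helper name:

```shell
# Print the version number found in a CUDA version.txt file.
#   $1: path to version.txt
# ASSUMPTION: the file holds a line like "CUDA Version 8.0.61".
# NOTE: cuda_version is an illustrative helper, not an NVIDIA tool.
cuda_version() {
    sed -n 's/^CUDA Version \([0-9][0-9.]*\).*/\1/p' "$1"
}

# Usage:
#   cuda_version /usr/local/cuda-8.0/version.txt
```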

cuDNN

  • cuDNN Archive: https://developer.nvidia.com/rdp/cudnn-archive

Direct download without registration: http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/cudnn-8.0-linux-x64-v6.0.tgz (v6.0 for CUDA 8.0)

  • cuDNN Installation Guide :: Deep Learning SDK Documentation: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

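Install cuDNN

Extract the archive:

tar zxvf cudnn-8.0-linux-x64-v${*}.tgz

Copy the files into the CUDA directory:

# Copy the header cudnn.h into /usr/local/cuda-8.0/include/
sudo cp cuda/include/cudnn.h /usr/local/cuda-8.0/include/
# Copy the libraries libcudnn* into /usr/local/cuda-8.0/lib64/ (-P keeps the symlinks as symlinks)
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64/
# Both commands require sudo

If the files show a lock icon: Ref[4]

sudo chmod 777 ${file} # grant 777 permissions to remove the lock, then copy again (add -R to include subfolders)

Make the files readable by all users: Ref[6]

sudo chmod a+r /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/lib64/libcudnn*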

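Register the library path with the dynamic linker

Create and edit the linker configuration file:

sudo gedit /etc/ld.so.conf.d/cuda.conf # requires sudo

Add the following line: Ref[4]

/usr/local/cuda-8.0/lib64

Finally, refresh the dynamic linker cache:

sudo ldconfig -v # requires sudo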

Set the environment variables

Edit the .bashrc file:

gedit ~/.bashrc

Add the following lines: Ref[6]

export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

Reload the environment (or open a new terminal):

source ~/.bashrc

Check the installed cuDNN version

cat /usr/local/cuda-8.0/include/cudnn.h | grep CUDNN_MAJOR -A 2
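
The three #define lines printed by that command can be folded into a single version string. A sketch, assuming cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL each on its own line, as the grep above suggests; cudnn_version is a hypothetical helper name:

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCHLEVEL.
#   $1: path to cudnn.h
# ASSUMPTION: the header contains lines such as "#define CUDNN_MAJOR 6".
# NOTE: cudnn_version is an illustrative helper, not an NVIDIA tool.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Usage:
#   cudnn_version /usr/local/cuda-8.0/include/cudnn.h
```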

NCCL

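Running Caffe on multiple GPUs requires NVIDIA NCCL

  • NVIDIA/nccl on GitHub: https://github.com/NVIDIA/nccl/releases

git clone the repository, or download the tar.gz source archive and extract it

Install NCCL

From the NCCL root directory: Ref[13]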

cd nccl-2.3.7-1
make CUDA_HOME=/usr/local/cuda-8.0 src.build -j8
# Install tools to create debian packages
sudo apt install build-essential devscripts debhelper
# Build NCCL deb package
make pkg.debian.build
# List the generated packages
ls build/pkg/deb/

This should show libnccl2_2.3.7-1+cuda8.0_amd64.deb and libnccl-dev_2.3.7-1+cuda8.0_amd64.deb

Install the deb packages with dpkg:

cd build/pkg/deb/
# requires sudo
sudo dpkg --install libnccl2_2.3.7-1+cuda8.0_amd64.deb
sudo dpkg --install libnccl-dev_2.3.7-1+cuda8.0_amd64.deb

Test NCCL

  • NVIDIA/nccl-tests on GitHub: https://github.com/nvidia/nccl-tests

git clone the repository, or download the zip source archive and extract it

From the nccl-tests-master root directory: Ref[14]

cd nccl-tests-master
make CUDA_HOME=/usr/local/cuda-8.0 NCCL_HOME=/usr -j8
# See Ref[14] for the full list of run-time options
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 2

If the test program runs without errors, NCCL was installed successfully


OpenCV

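  • opencv/opencv on GitHub: https://github.com/opencv/opencv/releases

  • opencv/opencv_contrib on GitHub: https://github.com/opencv/opencv_contrib/releases

Download the tar.gz source archives of both, in the same version

After extracting, place opencv_contrib inside the opencv directory to simplify installation

Uninstall the old OpenCV version (optional)

Go to the build folder of the old source tree and run make uninstall: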

cd ${OPENCV/SOURCE/PATH}
cd build
make uninstall

cd ..
rm -r build
sudo rm -r /usr/local/include/opencv2 /usr/local/include/opencv /usr/include/opencv /usr/include/opencv2 /usr/local/share/opencv /usr/local/share/OpenCV /usr/share/opencv /usr/share/OpenCV /usr/local/bin/opencv* /usr/local/lib/libopencv* # requires sudo

Further cleanup:

# Remove everything OpenCV-related under /usr Ref[3]
cd /usr/
find . -name "*opencv*" | xargs sudo rm -r

# Remove the Python-related packages
sudo apt-get remove opencv-doc opencv-data python-opencv # requires sudo

Install OpenCV

For CUDA 8.0 and later, edit line 45 of modules/gpu/src/graphcuts.cpp or modules/cudalegacy/src/graphcuts.cpp (whichever exists in your OpenCV version), replacing the first line below with the second:

#if !defined (HAVE_CUDA) || defined (CUDA_DISABLER)

// Ref[1]
#if !defined (HAVE_CUDA) || defined (CUDA_DISABLER) || (CUDART_VERSION >= 8000)
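
The same edit can be applied non-interactively. A sketch using sed, assuming the line appears in the file exactly as shown above; patch_graphcuts is a hypothetical helper name, and a .bak copy of the original file is kept:

```shell
# Append the CUDART_VERSION guard to the preprocessor line in graphcuts.cpp.
#   $1: path to graphcuts.cpp
# ASSUMPTION: the unpatched line reads exactly
#   #if !defined (HAVE_CUDA) || defined (CUDA_DISABLER)
# Running the function twice leaves the file unchanged, because the
# pattern only matches the unpatched line.
# NOTE: patch_graphcuts is an illustrative helper, not part of OpenCV.
patch_graphcuts() {
    sed -i.bak \
        's/^\(#if !defined (HAVE_CUDA) || defined (CUDA_DISABLER)\)[[:space:]]*$/\1 || (CUDART_VERSION >= 8000)/' \
        "$1"
}

# Usage (path depends on the OpenCV version):
#   patch_graphcuts modules/cudalegacy/src/graphcuts.cpp
```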

Build and install from source:

cd ${OPENCV/SOURCE/PATH}
mkdir build
cd build

# WITHOUT opencv_contrib Ref[2]
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D CUDA_GENERATION=Kepler ..
# WITH opencv_contrib Ref[7]
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D CUDA_GENERATION=Kepler -D OPENCV_EXTRA_MODULES_PATH=${OPENCV/CONTRIB/SOURCE/PATH}/modules -D WITH_EIGEN=OFF ..
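# If cmake reports that ippicv or protobuf is missing, download the matching version and place it in the directory named in the error message

make -j8 # -jN runs N parallel jobs
sudo make install # requires sudo

# Refresh the dynamic linker cache
sudo ldconfig -v

Fixing build errors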

  • make fails with fatal error: opencv2/xfeatures2d/cuda.hpp (or nonfree.hpp): No such file or directory

Copy the contents of

${OPENCV/CONTRIB/SOURCE/PATH}/modules/xfeatures2d/include/opencv2

into

${OPENCV/SOURCE/PATH}/modules/features2d/include/opencv2

  • make fails with fatal error: boostdesc_bgm_**.i (or vgg_generated_***.i): No such file or directory

Download the seven boostdesc_bgm_**.i files: Ref[8]

According to https://github.com/opencv/opencv_contrib/blob/master/modules/xfeatures2d/cmake/download_boostdesc.cmake,
the download URL is https://raw.githubusercontent.com/opencv/opencv_3rdparty/34e4206aef44d50e6bbcd0ab06354b52e7466d26/boostdesc_bgm.i;
replace the file name at the end to download each file in turn
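
The seven URLs can also be generated rather than edited by hand. The sketch below assumes the file names listed in download_boostdesc.cmake for the commit in the URL above; check them against that file before downloading. boostdesc_urls is a hypothetical helper name:

```shell
# Print the download URL of each boostdesc file, one per line.
# ASSUMPTION: the seven file names below match download_boostdesc.cmake;
# verify them against that file for your opencv_contrib version.
# NOTE: boostdesc_urls is an illustrative helper, not part of OpenCV.
boostdesc_urls() {
    base="https://raw.githubusercontent.com/opencv/opencv_3rdparty/34e4206aef44d50e6bbcd0ab06354b52e7466d26"
    for name in boostdesc_bgm.i boostdesc_bgm_bi.i boostdesc_bgm_hd.i \
                boostdesc_binboost_064.i boostdesc_binboost_128.i \
                boostdesc_binboost_256.i boostdesc_lbgm.i; do
        printf '%s/%s\n' "$base" "$name"
    done
}

# Usage (downloads into the current directory):
#   boostdesc_urls | xargs -n 1 wget
```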

Download the four vgg_generated_***.i files:

opencv/opencv_3rdparty: https://github.com/opencv/opencv_3rdparty/tree/contrib_xfeatures2d_vgg_20160317
or
opencv-dlco: https://github.com/cbalint13/opencv-dlco/tree/master/workspace/opencv

Once downloaded, place all of these files in

${OPENCV/SOURCE/PATH}/modules/features2d/include

then run make again

Check the installed OpenCV version

pkg-config --modversion opencv

Caffe

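  • BVLC/caffe on GitHub: https://github.com/BVLC/caffe/releases

git clone the repository, or download the tar.gz source archive and extract it

Edit Makefile.config

Enter the caffe directory and create Makefile.config as a copy of Makefile.config.example:

cd caffe-1.0
cp Makefile.config.example Makefile.config
gedit Makefile.config

Following the comments in the file, adjust Makefile.config to your setup. Typical changes are shown below: Ref[9]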

  • Enable cuDNN
# USE_CUDNN := 1
↓
USE_CUDNN := 1
  • Use OpenCV 3
# OPENCV_VERSION := 3
↓
OPENCV_VERSION := 3
  • Change the Python lib path
PYTHON_LIB := /usr/lib
↓
PYTHON_LIB := /usr/lib/x86_64-linux-gnu
  • Enable the Python layer interface
# WITH_PYTHON_LAYER := 1
↓
WITH_PYTHON_LAYER := 1
  • Extend the include and lib paths
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
↓
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
  • Enable NCCL Ref[15]
# USE_NCCL := 1
↓
USE_NCCL := 1
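
The settings that only need to be uncommented (USE_CUDNN, OPENCV_VERSION, WITH_PYTHON_LAYER, USE_NCCL) can also be switched on with sed instead of an editor. A sketch, assuming each one appears in Makefile.config as a line of the form # NAME := value; uncomment_setting is a hypothetical helper name:

```shell
# Uncomment one "# NAME := value" line in a Makefile.config.
#   $1: path to Makefile.config
#   $2: variable name, e.g. USE_CUDNN
# A .bak copy of the file is kept. Lines that are already uncommented,
# and lines for other variables, are left alone.
# NOTE: uncomment_setting is an illustrative helper, not part of Caffe.
uncomment_setting() {
    sed -i.bak "s/^#[[:space:]]*\($2[[:space:]]*:=.*\)\$/\1/" "$1"
}

# Usage:
#   for name in USE_CUDNN OPENCV_VERSION WITH_PYTHON_LAYER USE_NCCL; do
#       uncomment_setting Makefile.config "$name"
#   done
```

The PYTHON_LIB, INCLUDE_DIRS and LIBRARY_DIRS changes replace values rather than uncomment them, so they still need to be edited separately.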

Edit other configuration files

Edit the Makefile in the caffe directory:

NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
↓
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)

LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
↓
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial

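This is needed because Ubuntu 16.04 moved some headers and libraries, HDF5 in particular: its headers now live in /usr/include/hdf5/serial and its libraries carry the hdf5_serial name, so the paths and library names must be updated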
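
Both Makefile edits can be scripted. A sketch with sed, assuming the two lines appear in the Makefile exactly as shown above; patch_caffe_makefile is a hypothetical helper name, and running it twice leaves the file unchanged:

```shell
# Apply the two Makefile edits described above.
#   $1: path to the Caffe Makefile
# ASSUMPTION: the NVCCFLAGS and LIBRARIES lines match the "before"
# versions shown in this section. A .bak copy of the file is kept.
# NOTE: patch_caffe_makefile is an illustrative helper, not part of Caffe.
patch_caffe_makefile() {
    sed -i.bak \
        -e '/^NVCCFLAGS += -ccbin=/ s/+= -ccbin=/+= -D_FORCE_INLINES -ccbin=/' \
        -e '/^LIBRARIES += / s/ hdf5_hl hdf5\([[:space:]]*\)$/ hdf5_serial_hl hdf5_serial\1/' \
        "$1"
}

# Usage, from the caffe root directory:
#   patch_caffe_makefile Makefile
```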

Edit host_config.h in the /usr/local/cuda/include/ directory:

#error-- unsupported GNU version! gcc versions later than 5 are not supported!
↓
// #error-- unsupported GNU version! gcc versions later than 5 are not supported!

Edit CMakeLists.txt in the caffe directory: Ref[15]

caffe_option(USE_NCCL "Build Caffe with NCCL library support" OFF)
↓
caffe_option(USE_NCCL "Build Caffe with NCCL library support" ON)

Build and install Caffe

From the caffe root directory: Ref[11]

cd caffe-1.0

# Build the Python interface (optional)
sudo pip install -r python/requirements.txt # requires sudo
make pycaffe -j8

# Build and test Caffe
make all -j8
make test -j8
make runtest -j8

If every test shows RUN ... OK, runtest has succeeded

Fixing build errors

  • Conflict between the old and new OpenCV versions during make:

/usr/bin/ld: warning: libopencv_imgproc.so.3.3, needed by /usr/local/lib/libopencv_imgcodecs.so, may conflict with libopencv_imgproc.so.2.4
/usr/bin/ld: warning: libopencv_core.so.3.3, needed by /usr/local/lib/libopencv_imgcodecs.so, may conflict with libopencv_core.so.2.4
/usr/bin/ld: .build_release/examples/cpp_classification/classification.o: undefined reference to symbol ‘_ZN2cv6String10deallocateEv’
//usr/local/lib/libopencv_core.so.3.3: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

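Solution: remove the distribution's OpenCV development packages so that only the OpenCV 3.3 build in /usr/local is linked: Ref[16] Ref[17]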

sudo apt-get autoremove libopencv-dev libopencv-core-dev

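Verification: MNIST classification test

See Ref[12] for a detailed description and explanation:

# Change to the Caffe root directory
cd $CAFFE_ROOT
# Download and extract the MNIST dataset
./data/mnist/get_mnist.sh # or download manually from http://yann.lecun.com/exdb/mnist/
# Convert it to LMDB format
./examples/mnist/create_mnist.sh
# Train the network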
./examples/mnist/train_lenet.sh

Training runs for 10000 iterations, and the final accuracy can reach 99.07%
Four new files appear in CAFFE_ROOT/examples/mnist/: lenet_iter_10000.solverstate, lenet_iter_10000.caffemodel, lenet_iter_5000.solverstate and lenet_iter_5000.caffemodel
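
A quick scripted check for these four files, as a sketch; check_lenet_snapshots is a hypothetical helper name:

```shell
# Report whether the four LeNet snapshot files exist in a directory.
#   $1: directory to check, e.g. "$CAFFE_ROOT/examples/mnist"
# Returns 0 if all four files exist, 1 otherwise; missing files are
# listed on stderr.
# NOTE: check_lenet_snapshots is an illustrative helper, not part of Caffe.
check_lenet_snapshots() {
    missing=0
    for iter in 5000 10000; do
        for ext in solverstate caffemodel; do
            f="$1/lenet_iter_${iter}.${ext}"
            if [ ! -f "$f" ]; then
                echo "missing: $f" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}

# Usage:
#   check_lenet_snapshots "$CAFFE_ROOT/examples/mnist" && echo "all snapshots present"
```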

  • create_mnist.sh fails with: Unable to open file data/mnist/train-images-idx3-ubyte

Edit the file names in create_mnist.sh so that they match the downloaded files (the difference is usually . versus -)
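
Alternatively, the downloaded files can be renamed instead of editing the script. This is a different fix from the one above, sketched under the assumption that the files use a dot where create_mnist.sh expects a hyphen (for example train-images.idx3-ubyte); fix_mnist_names is a hypothetical helper name:

```shell
# Rename MNIST files such as train-images.idx3-ubyte to
# train-images-idx3-ubyte, the form create_mnist.sh looks for.
#   $1: directory that holds the extracted MNIST files
# ASSUMPTION: the only mismatch is "." versus "-" before "idx".
# Files that already use the hyphenated name are left alone.
# NOTE: fix_mnist_names is an illustrative helper, not part of Caffe.
fix_mnist_names() {
    for f in "$1"/*.idx[13]-ubyte; do
        [ -e "$f" ] || continue                      # glob matched nothing
        mv "$f" "${f%.idx[13]-ubyte}-${f##*.}"       # swap the last "." for "-"
    done
}

# Usage, from the Caffe root directory:
#   fix_mnist_names data/mnist
```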



References

[1] GraphCut deprecated in CUDA 7.5 and removed in 8.0: https://github.com/opencv/opencv/pull/6510/files

[2] Fix for Unsupported gpu architecture 'compute_11': https://blog.csdn.net/sysuwuhongpeng/article/details/45485719

[3] Uninstalling opencv2.4.9 and installing opencv3.2.0+contrib on Ubuntu16.04: https://www.cnblogs.com/mar-q/p/7490271.html

[4] Installing cuda and cudnn on Ubuntu16.04: https://blog.csdn.net/cdwxx1234/article/details/75121562

[5] Installing the NVIDIA graphics driver and CUDA: https://blog.csdn.net/qq_28413479/article/details/76377184

[6] The right way to install cudnn (most online tutorials get it wrong): https://blog.csdn.net/lucifer_zzq/article/details/76675239

[7] Problem configuring opencv3.10 on ubuntu, Eigen/Eigenvalues: No such file or directory: https://blog.csdn.net/kekong0713/article/details/53674067

[8] fatal error: boostdesc_bgm.i: No such file or directory: https://github.com/opencv/opencv_contrib/issues/1301

[9] Installing and configuring Caffe on Ubuntu 16.04, an illustrated guide: https://www.linuxidc.com/Linux/2016-12/138870p2.htm

[10] Caffe installation steps on Ubuntu16.04 (very detailed): https://blog.csdn.net/yhaolpz/article/details/71375762

[11] Caffe | Installation: http://caffe.berkeleyvision.org/installation.html

[12] Caffe | LeNet MNIST Tutorial: http://caffe.berkeleyvision.org/gathered/examples/mnist.html

[13] GitHub - NVIDIA/nccl: Optimized primitives for collective multi-GPU communication: https://github.com/NVIDIA/nccl

[14] GitHub - NVIDIA/nccl-tests: NCCL Tests: https://github.com/nvidia/nccl-tests

[15] Caffe troubleshooting: changing USE_NCCL=1 has no effect when building for multiple GPUs with cmake: https://blog.csdn.net/u011394059/article/details/73732707

[16] Pitfalls of building caffe with Ubuntu16.04+cuda8.0+opencv3.0.0: https://blog.csdn.net/w113691/article/details/80583246

[17] Caffe with OpenCV 3 and NVIDIA DIGITS: OpenCV version conflict (2.4 vs 3.0): https://stackoverrun.com/cn/q/9405758
