[Acceleration] Speeding up OpenCV DNN inference with a GPU

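    • Introduction
    • Install the required libraries
    • Check the CUDA version and GPU model
    • Download OpenCV and OpenCV Contrib
    • Prepare the virtual environment
    • Build and install
    • GPU inference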

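Introduction

I have recently been using the OpenCV HED model for edge detection. Inference on a 1000x1000 image takes about 6 seconds, so I wanted to speed it up with a GPU. Until now I had always installed opencv-python from PyPI, but that build has no GPU support, so OpenCV has to be compiled from source.

I followed the tutorial How to use OpenCV’s “dnn” module with NVIDIA GPUs, CUDA, and cuDNN.
The pyimagesearch site has many articles on this topic.

My environment: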

Ubuntu 18.04
cuda 10.0 
cudnn 7.6.5 
GTX1080Ti
g++7
gcc7
cmake 3.14.1

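Also make sure the terminal can reach the download servers: cmake downloads some files during configuration, and those sites are blocked by the firewall.
I also tried CUDA 9.0 with gcc5/g++5 and a Titan X card, and every attempt failed.

The rough procedure for installing OpenCV 4.2.0 is as follows.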

Install the required libraries

The official OpenCV tutorial lists them in detail.

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install build-essential cmake unzip pkg-config
$ sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
$ sudo apt-get install libv4l-dev libxvidcore-dev libx264-dev
$ sudo apt-get install libgtk-3-dev
$ sudo apt-get install libatlas-base-dev gfortran
$ sudo apt-get install python3-dev
$ sudo apt-get install python3.8-dev # for Python 3.8

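Check the CUDA version and GPU model

$ nvcc -V # shows the CUDA version

You also need the card's Compute Capability, which is listed on the NVIDIA website.
For example, the GTX Titan X is 5.2, but the dnn module in OpenCV 4.2.0 requires at least 5.3, so even if the build and install succeed, inference fails with an error at the end.
A 10-series card is the safest choice.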
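
The check described above can be written down as a small sketch. The minimum of 5.3 is the figure quoted in this post for OpenCV 4.2.0; the function names are hypothetical and only illustrate the comparison.

```python
# Minimum Compute Capability quoted in this post for the OpenCV 4.2.0 dnn CUDA backend.
MIN_DNN_CUDA_CAPABILITY = (5, 3)


def parse_capability(text):
    """Turn a string such as '6.1' into the tuple (6, 1)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def supports_dnn_cuda(capability, minimum=MIN_DNN_CUDA_CAPABILITY):
    """Return True if the card's Compute Capability meets the minimum."""
    return parse_capability(capability) >= minimum


if __name__ == "__main__":
    # Values taken from this post: GTX Titan X is 5.2, CUDA_ARCH_BIN for the GTX 1080 Ti is 6.1.
    for name, cap in [("GTX Titan X", "5.2"), ("GTX 1080 Ti", "6.1")]:
        print(name, cap, "ok" if supports_dnn_cuda(cap) else "too old")
```

Comparing integer tuples rather than the raw strings keeps the check correct once the major version reaches two digits.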

Download OpenCV and OpenCV Contrib

Just copy the commands from the tutorial linked at the beginning.

$ cd ~
$ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.2.0.zip
$ wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.2.0.zip
$ unzip opencv.zip
$ unzip opencv_contrib.zip
$ mv opencv-4.2.0 opencv
$ mv opencv_contrib-4.2.0 opencv_contrib

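Afterwards the opencv and opencv_contrib directories contain their respective sources.

Prepare the virtual environment

$ conda create -n py38 python=3.8 numpy # py38 is the environment name; numpy must be included
$ conda activate py38 # switch to this environment
$ sudo mv /usr/bin/python2.7 /usr/bin/pythonNO-temp # temporarily rename the system Python 2.7
# remember to rename it back when you are done
$ sudo mv /usr/bin/pythonNO-temp /usr/bin/python2.7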

Build and install

opencv and opencv_contrib were unpacked earlier; first enter the opencv directory.

$ cd opencv
$ mkdir build
$ cd build

Then run cmake. CUDA_ARCH_BIN is the Compute Capability looked up on the NVIDIA website (6.1 for the GTX 1080 Ti), and OPENCV_EXTRA_MODULES_PATH is the modules directory of the unpacked opencv_contrib. CMAKE_INSTALL_PREFIX points at the prefix of the active Python environment. Nothing may follow a trailing backslash, not even a space or a comment, or the command is cut short at that line.

$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D INSTALL_C_EXAMPLES=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D WITH_CUDA=ON \
    -D WITH_CUDNN=ON \
    -D OPENCV_DNN_CUDA=ON \
    -D ENABLE_FAST_MATH=1 \
    -D CUDA_FAST_MATH=1 \
    -D CUDA_ARCH_BIN=6.1 \
    -D WITH_CUBLAS=1 \
    -D OPENCV_EXTRA_MODULES_PATH=/PATH/TO/opencv_contrib/modules \
    -D BUILD_NEW_PYTHON_SUPPORT=ON \
    -D BUILD_opencv_python3=ON \
    -D HAVE_opencv_python3=ON \
    -D PYTHON_DEFAULT_EXECUTABLE=$(which python) \
    -D PYTHON_EXECUTABLE=$(which python) \
    -D BUILD_opencv_python2=OFF \
    -D CMAKE_INSTALL_PREFIX=$(python3 -c "import sys; print(sys.prefix)") \
    -D PYTHON3_EXECUTABLE=$(which python3) \
    -D PYTHON3_INCLUDE_DIRS=$(python3 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
    -D PYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
    -D BUILD_EXAMPLES=ON ..

Check that all four paths in the Python 3 section of the cmake output are filled in, for example:

--   Python 3:
--     Interpreter:         /path/to/bin/python3 (ver 3.5.3)
--     Libraries:           /path/to/libpython3.5m.so (ver 3.5.3)
--     numpy:               /path/to/python3.5/site-packages/numpy/core/include (ver 1.18.1)
--     install path:        /path/to/python-3.5
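
The cmake command above asks the interpreter for its prefix, include directory and packages directory through distutils, which works on Python 3.8 but was removed from the standard library in Python 3.12. As a hedged sketch, the same kind of information can be read from the sysconfig module. The function name python_build_paths is hypothetical, and on some distributions the directories sysconfig reports may differ from what distutils returned, so compare the values with the cmake output before relying on them.

```python
import sys
import sysconfig


def python_build_paths():
    """Collect the interpreter paths that the cmake command needs."""
    paths = sysconfig.get_paths()
    return {
        "prefix": sys.prefix,                 # candidate for CMAKE_INSTALL_PREFIX
        "executable": sys.executable,         # candidate for PYTHON3_EXECUTABLE
        "include_dir": paths["include"],      # candidate for PYTHON3_INCLUDE_DIRS
        "packages_path": paths["purelib"],    # candidate for PYTHON3_PACKAGES_PATH
    }


if __name__ == "__main__":
    for name, value in python_build_paths().items():
        print(f"{name}: {value}")
```

Run it inside the activated environment; if a printed path points at the system Python instead of the conda environment, cmake would most likely have picked up the wrong interpreter as well.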

When cmake finishes, run:

$ make -j8
$ sudo make install

This installs the GPU-enabled build of OpenCV.
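
Before moving on, it can be worth confirming that the installed module really was built with CUDA and cuDNN. The sketch below parses the text returned by cv2.getBuildInformation(). It assumes the summary contains lines of the form "NVIDIA CUDA: YES (...)" and "cuDNN: YES (...)", which is how the build summary is usually laid out; the helper names are hypothetical.

```python
import re


def parse_build_flags(build_info):
    """Report whether the build summary marks CUDA and cuDNN as enabled."""
    flags = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        # Look for a line such as "  NVIDIA CUDA:      YES (ver 10.0, ...)".
        match = re.search(r"^\s*" + re.escape(key) + r":\s*(\S+)",
                          build_info, re.MULTILINE)
        flags[key] = bool(match) and match.group(1).upper() == "YES"
    return flags


def check_installed_opencv():
    """Parse the build summary of the OpenCV module that Python imports."""
    import cv2  # imported here so the parser itself works without OpenCV
    return parse_build_flags(cv2.getBuildInformation())
```

Calling check_installed_opencv() in the py38 environment should give True for both keys. If either is False, Python is probably still importing a CPU-only build, for example a leftover opencv-python wheel.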

GPU inference

Take the OpenCV HED model as an example:

import cv2
import os


class CropLayer(object):
    def __init__(self, params, blobs):
        # initialize our starting and ending (x, y)-coordinates of
        # the crop
        self.startX = 0
        self.startY = 0
        self.endX = 0
        self.endY = 0

    def getMemoryShapes(self, inputs):
        # the crop layer will receive two inputs -- we need to crop
        # the first input blob to match the shape of the second one,
        # keeping the batch size and number of channels
        (inputShape, targetShape) = (inputs[0], inputs[1])
        (batchSize, numChannels) = (inputShape[0], inputShape[1])
        (H, W) = (targetShape[2], targetShape[3])

        # compute the starting and ending crop coordinates
        self.startX = int((inputShape[3] - targetShape[3]) / 2)
        self.startY = int((inputShape[2] - targetShape[2]) / 2)
        self.endX = self.startX + W
        self.endY = self.startY + H

        # return the shape of the volume (we'll perform the actual
        # crop during the forward pass)
        return [[batchSize, numChannels, H, W]]

    def forward(self, inputs):
        # use the derived (x, y)-coordinates to perform the crop
        return [inputs[0][:, :, self.startY:self.endY,
                self.startX:self.endX]]


class HED:
    def __init__(self,
                 model_dir,
                 cuda=True):
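        """
        Load the HED edge-detection network.
        :param model_dir: directory holding the Caffe model and its prototxt
        :param cuda: run inference on the GPU through the CUDA backend
        """
        assert os.path.exists(model_dir), 'model_dir does not exist.'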
        proto_path = os.path.join(model_dir, 'deploy.prototxt')
        model_path = os.path.join(model_dir, 'hed_pretrained_bsds.caffemodel')
        net = cv2.dnn.readNetFromCaffe(proto_path, model_path)
        cv2.dnn_registerLayer('Crop', CropLayer)
        if cuda:
            # run the network with the CUDA backend on the GPU
            net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
        self.net = net

    def process(self, img):
        """
        Detect edges in a BGR image.
        :param img: BGR image
        :return:
            ret: ndarray (uint8 edge map)
        """
        (H, W) = img.shape[:2]
        img = cv2.bilateralFilter(img, 7, 30, 4)
        blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(W, H),
                                     mean=(104.00698793, 116.66876762, 122.67891434),
                                     swapRB=False, crop=False)
        self.net.setInput(blob)
        ret = self.net.forward()
        ret = cv2.resize(ret[0, 0], (W, H))
        ret = (255 * ret).astype('uint8')
        return ret

Adding the following two lines to the code is all it takes to run inference on the GPU:

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

Inference time drops from 3.2 s to about 0.2 s.
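
To reproduce this kind of comparison, a small timing helper is enough. The sketch below is not the measurement code behind the numbers above; time_call is a hypothetical helper that runs a few warm-up calls first, since the first pass through the CUDA backend tends to include one-off initialization, and then averages the remaining runs.

```python
import time


def time_call(fn, *args, warmup=1, repeats=5):
    """Return the average wall-clock seconds per call of fn(*args)."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats
```

With two HED instances, one built with cuda=False and one with cuda=True, calling time_call(hed.process, img) on the same image gives directly comparable per-image times; hed and img here stand for an instance of the class above and an image loaded with cv2.imread.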
