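I have recently been using the HED algorithm in OpenCV for edge detection. Inference on a 1000x1000 image takes about 6 seconds, so I wanted to speed it up with the GPU. Until now I had always installed opencv-python from PyPI, but that build has no GPU support, so OpenCV has to be compiled and installed from source.
I followed the tutorial How to use OpenCV’s “dnn” module with NVIDIA GPUs, CUDA, and cuDNN on pyimagesearch; that site has many articles on the subject.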
My environment:
Ubuntu 18.04
cuda 10.0
cudnn 7.6.5
GTX1080Ti
g++7
gcc7
cmake 3.14.1
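Also make sure the terminal can reach blocked sites (for example through a proxy): cmake downloads some files during configuration, and the sites hosting them are blocked.
I also tried CUDA 9.0 with gcc5/g++5, and a Titan X card; both attempts failed.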
Installing OpenCV 4.2.0
The rough procedure is as follows.
The official OpenCV tutorial lists the steps in detail.
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install build-essential cmake unzip pkg-config
$ sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
$ sudo apt-get install libv4l-dev libxvidcore-dev libx264-dev
$ sudo apt-get install libgtk-3-dev
$ sudo apt-get install libatlas-base-dev gfortran
$ sudo apt-get install python3-dev
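$ sudo apt-get install python3.8-dev # for Python 3.8
$ nvcc -V # shows the CUDA version
You also need the card's Compute Capability, which is listed on the NVIDIA website.
For example, the GTX Titan X is 5.2, but the dnn module in OpenCV 4.2.0 requires at least 5.3. Even if the build and install succeed, inference fails with an error at the end.
A 10-series card is the safer choice.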
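The compute-capability check above can be written down as a small comparison. This is only a minimal sketch: the 5.3 minimum is the figure quoted above for OpenCV 4.2.0 dnn, and the names `parse_capability` and `meets_dnn_cuda_minimum` are made up for illustration.

```python
# Minimum compute capability quoted above for OpenCV 4.2.0 dnn.
MIN_COMPUTE_CAPABILITY = (5, 3)


def parse_capability(text):
    """Turn a string such as "6.1" into a comparable tuple (6, 1)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_dnn_cuda_minimum(text):
    """Return True if the given compute capability is at least 5.3."""
    return parse_capability(text) >= MIN_COMPUTE_CAPABILITY


if __name__ == "__main__":
    # Values from the text: GTX 1080 Ti is 6.1, GTX Titan X is 5.2.
    for name, cc in [("GTX 1080 Ti", "6.1"), ("GTX Titan X", "5.2")]:
        print(name, cc, "ok" if meets_dnn_cuda_minimum(cc) else "too old")
```

Comparing integer tuples instead of floats avoids surprises with values such as "5.10".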
Just copy the commands from the tutorial linked at the beginning:
$ cd ~
$ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.2.0.zip
$ wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.2.0.zip
$ unzip opencv.zip
$ unzip opencv_contrib.zip
$ mv opencv-4.2.0 opencv
$ mv opencv_contrib-4.2.0 opencv_contrib
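Afterwards, opencv and opencv_contrib each contain their respective source trees.
$ conda create -n py38 python=3.8 numpy # py38 is the environment name; numpy must be included
$ conda activate py38 # switch to this environment
$ sudo mv /usr/bin/python2.7 /usr/bin/pythonNO-temp
# remember to rename it back when you are done
$ sudo mv /usr/bin/pythonNO-temp /usr/bin/python2.7
We extracted opencv and opencv_contrib earlier; first go into the opencv folder: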
$ cd opencv
$ mkdir build
$ cd build
Then run cmake. Set CUDA_ARCH_BIN to the compute capability looked up on the NVIDIA website, and point OPENCV_EXTRA_MODULES_PATH at the modules directory of the extracted opencv_contrib.
$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D INSTALL_C_EXAMPLES=OFF \
-D OPENCV_ENABLE_NONFREE=ON \
-D WITH_CUDA=ON \
-D WITH_CUDNN=ON \
-D OPENCV_DNN_CUDA=ON \
-D ENABLE_FAST_MATH=1 \
-D CUDA_FAST_MATH=1 \
-D CUDA_ARCH_BIN=6.1 \
-D WITH_CUBLAS=1 \
-D OPENCV_EXTRA_MODULES_PATH=/PATH/TO/opencv_contrib/modules \
-D BUILD_NEW_PYTHON_SUPPORT=ON \
-D BUILD_opencv_python3=ON \
-D HAVE_opencv_python3=ON \
-D PYTHON_DEFAULT_EXECUTABLE=$(which python) \
-D PYTHON_EXECUTABLE=$(which python) \
-D BUILD_opencv_python2=OFF \
-D CMAKE_INSTALL_PREFIX=$(python3 -c "import sys; print(sys.prefix)") \
-D PYTHON3_EXECUTABLE=$(which python3) \
-D PYTHON3_INCLUDE_DIRS=$(python3 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-D PYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
-D BUILD_EXAMPLES=ON ..
Check that the Python 3 section of the cmake output shows all four paths filled in, for example:
-- Python 3:
-- Interpreter: /path/to/bin/python3 (ver 3.5.3)
-- Libraries: /path/to/libpython3.5m.so (ver 3.5.3)
-- numpy: /path/to/python3.5/site-packages/numpy/core/include (ver 1.18.1)
-- install path: /path/to/python-3.5
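To cross-check those paths against the active environment, the interpreter can report the same locations itself. The following is a minimal sketch using only the standard library. It uses `sysconfig` rather than the `distutils` calls in the cmake command, on the assumption that both report the same directories inside a conda environment; the helper name `python_build_paths` is made up.

```python
import sys
import sysconfig


def python_build_paths():
    """Collect the interpreter locations that cmake's Python 3 section should show."""
    paths = sysconfig.get_paths()
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "include": paths["include"],        # directory containing Python.h
        "site-packages": paths["purelib"],  # where the cv2 module is expected to land
    }


if __name__ == "__main__":
    for key, value in python_build_paths().items():
        print(f"{key:>14}: {value}")
```

Run it inside the activated py38 environment; if the printed directories point somewhere other than what cmake reports, cmake has probably picked up a different interpreter.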
When configuration finishes, run:
$ make -j8
$ sudo make install
This installs the GPU-enabled build of OpenCV.
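A quick way to confirm that Python imports the new build, and that CUDA support made it in, is to inspect the build report. This is a minimal sketch: `cv2.getBuildInformation()` and `cv2.cuda.getCudaEnabledDeviceCount()` are existing OpenCV calls, but the helper `build_flag_enabled` and the "NVIDIA CUDA: YES" line layout it looks for are assumptions about how the report is formatted.

```python
def build_flag_enabled(build_info, key):
    """Return True if a line of the build report reads `key: YES ...`."""
    for line in build_info.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name.strip() == key:
            return value.strip().upper().startswith("YES")
    return False


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable from this environment")
    else:
        info = cv2.getBuildInformation()
        print("OpenCV version:", cv2.__version__)
        print("CUDA in build :", build_flag_enabled(info, "NVIDIA CUDA"))
        print("cuDNN in build:", build_flag_enabled(info, "cuDNN"))
        # Number of CUDA devices OpenCV can see; 0 means a CPU-only build
        # or no usable GPU.
        print("CUDA devices  :", cv2.cuda.getCudaEnabledDeviceCount())
```

If the version printed is not 4.2.0, the interpreter is most likely still importing an opencv-python wheel installed from PyPI.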
Taking the OpenCV HED model as an example:
import os

import cv2


class CropLayer(object):
    def __init__(self, params, blobs):
        # initialize our starting and ending (x, y)-coordinates of
        # the crop
        self.startX = 0
        self.startY = 0
        self.endX = 0
        self.endY = 0

    def getMemoryShapes(self, inputs):
        # the crop layer will receive two inputs -- we need to crop
        # the first input blob to match the shape of the second one,
        # keeping the batch size and number of channels
        (inputShape, targetShape) = (inputs[0], inputs[1])
        (batchSize, numChannels) = (inputShape[0], inputShape[1])
        (H, W) = (targetShape[2], targetShape[3])
        # compute the starting and ending crop coordinates
        self.startX = int((inputShape[3] - targetShape[3]) / 2)
        self.startY = int((inputShape[2] - targetShape[2]) / 2)
        self.endX = self.startX + W
        self.endY = self.startY + H
        # return the shape of the volume (we'll perform the actual
        # crop during the forward pass)
        return [[batchSize, numChannels, H, W]]

    def forward(self, inputs):
        # use the derived (x, y)-coordinates to perform the crop
        return [inputs[0][:, :, self.startY:self.endY,
                          self.startX:self.endX]]


class HED:
    def __init__(self, model_dir, cuda=True):
        """
        Load the HED edge detection network.
        :param model_dir: directory containing the Caffe model and its prototxt
        :param cuda: if True, run inference through the CUDA backend
        """
        assert os.path.exists(model_dir), 'model_dir does not exist.'
        proto_path = os.path.join(model_dir, 'deploy.prototxt')
        model_path = os.path.join(model_dir, 'hed_pretrained_bsds.caffemodel')
        net = cv2.dnn.readNetFromCaffe(proto_path, model_path)
        cv2.dnn_registerLayer('Crop', CropLayer)
        if cuda:
            # run inference on the GPU through the CUDA backend
            net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
        self.net = net

    def process(self, img):
        """
        Detect edges in a BGR image.
        :param img: BGR image
        :return:
            ret: ndarray, uint8 edge map with the same height and width as img
        """
        (H, W) = img.shape[:2]
        img = cv2.bilateralFilter(img, 7, 30, 4)
        blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(W, H),
                                     mean=(104.00698793, 116.66876762, 122.67891434),
                                     swapRB=False, crop=False)
        self.net.setInput(blob)
        ret = self.net.forward()
        ret = cv2.resize(ret[0, 0], (W, H))
        ret = (255 * ret).astype('uint8')
        return ret
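The centre-crop arithmetic in CropLayer.getMemoryShapes can be checked without loading the network. The sketch below repeats the same computation on plain shape tuples; the function name `center_crop_box` and the example shapes are made up for illustration.

```python
def center_crop_box(input_shape, target_shape):
    """Return (startX, startY, endX, endY) for cropping an NCHW input
    down to the spatial size of the target, centred as in CropLayer."""
    height, width = target_shape[2], target_shape[3]
    start_x = int((input_shape[3] - target_shape[3]) / 2)
    start_y = int((input_shape[2] - target_shape[2]) / 2)
    return start_x, start_y, start_x + width, start_y + height


if __name__ == "__main__":
    # Hypothetical shapes: a 1x1x536x536 blob cropped to a 500x500 target.
    print(center_crop_box((1, 1, 536, 536), (1, 1, 500, 500)))
    # → (18, 18, 518, 518)
```

The 36 extra pixels in each dimension are split evenly, 18 on each side, which is what the slice in `forward` then applies.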
Adding the following two lines to the code is all it takes to run inference on the GPU:
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
Inference time drops from 3.2 s to about 0.2 s.
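To reproduce a comparison like this, it helps to time several forward passes and leave the first ones out, since the first call on a new backend may include one-off initialisation. This is a minimal sketch using only the standard library; `average_seconds` is a made-up helper, and the commented usage assumes the HED class above plus a hypothetical model directory and image path.

```python
import time


def average_seconds(fn, repeats=5, warmup=1):
    """Call fn() `warmup` times untimed, then return the mean time of `repeats` calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere. For the real measurement:
    #   hed = HED("models/hed", cuda=True)   # hypothetical model directory
    #   img = cv2.imread("input.jpg")        # hypothetical image
    #   average_seconds(lambda: hed.process(img))
    print(f"{average_seconds(lambda: sum(range(100000))):.6f} s per call")
```

Running it once with cuda=True and once with cuda=False on the same image gives a like-for-like comparison.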