Background
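Train an ENet model with Caffe, then convert it to ncnn.
- Code: https://github.com/TimoSaemann/ENet
The GPU in my machine is an NVIDIA Quadro P600.
Whether Vulkan works depends on the graphics driver version.
Version tested and working:
Driver Version: 440.36
Version that does not work:
Driver Version: 440.82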
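After downloading the Vulkan SDK package, extract it with tar -zxvf vulkanxxx, then go into x86_64/bin and run
./vulkaninfo
Required dependency library:
libGLX_nvidia.so.0
This library cannot be found with driver version 440.82, which is why that version does not work.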
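ncnn always goes through this step when converting a model. The idea is very simple: some of the computation is done offline ahead of time, which improves inference efficiency a little.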
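The text does not name the exact operation, so the following is only an illustration of the general idea of doing arithmetic offline. One common example is folding a batch-normalization layer into the preceding convolution so that no separate normalization runs at inference time. The names `ConvParams` and `fold_batchnorm` are hypothetical and are not part of ncnn; this is a minimal sketch, not ncnn's actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one convolution's parameters (not an ncnn type).
struct ConvParams {
    std::vector<float> weight; // out_channels * weights_per_channel, channel-major
    std::vector<float> bias;   // one bias per output channel
};

// Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
// convolution itself, so the normalization costs nothing at run time.
ConvParams fold_batchnorm(const ConvParams& conv,
                          const std::vector<float>& gamma,
                          const std::vector<float>& beta,
                          const std::vector<float>& mean,
                          const std::vector<float>& var,
                          float eps)
{
    const std::size_t out_channels = conv.bias.size();
    const std::size_t per_channel = conv.weight.size() / out_channels;
    ConvParams folded = conv;
    for (std::size_t c = 0; c < out_channels; ++c) {
        // Per-channel scale that the batch norm would apply.
        const float scale = gamma[c] / std::sqrt(var[c] + eps);
        for (std::size_t k = 0; k < per_channel; ++k)
            folded.weight[c * per_channel + k] *= scale;
        folded.bias[c] = (conv.bias[c] - mean[c]) * scale + beta[c];
    }
    return folded;
}
```

The folded convolution produces the same output as convolution followed by batch normalization, with one layer fewer to execute.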
install g++ cmake protobuf
(optional) download and install vulkan-sdk from https://vulkan.lunarg.com/sdk/home
$ wget https://sdk.lunarg.com/sdk/download/1.1.92.1/linux/vulkansdk-linux-x86_64-1.1.92.1.tar.gz?Human=true -O vulkansdk-linux-x86_64-1.1.92.1.tar.gz
$ tar -xf vulkansdk-linux-x86_64-1.1.92.1.tar.gz
# setup env
$ export VULKAN_SDK=`pwd`/1.1.92.1/x86_64
$ cd <ncnn-root-dir>
$ mkdir -p build
$ cd build
# cmake option NCNN_VULKAN for enabling vulkan
$ cmake -DNCNN_VULKAN=ON ..
$ make -j4
$ make install
Required header files
#include <gpu.h>
#include <net.h>
Required statements
// initialize when app starts
ncnn::create_gpu_instance();// line1
// enable vulkan compute feature before loading
ncnn::Net net;
net.opt.use_vulkan_compute = 1;// line2
// some choices of vulkan
net.opt.num_threads = 1;
net.opt.use_fp16_packed = false;
net.opt.use_fp16_storage = false;
net.opt.use_fp16_arithmetic = false;
net.opt.use_int8_storage = false;
net.opt.use_int8_arithmetic = false;
// deinitialize when app exits
ncnn::destroy_gpu_instance();// line3
# ncnn
include_directories(/home/surui/Downloads/software/ncnn-master/build/install/include/ncnn)
link_directories(/home/surui/Downloads/software/ncnn-master/build/install/lib)
# ncnn vulkan
include_directories(/home/surui/Downloads/software/vulkansdk-linux-x86_64-1.1.92.1/x86_64/include)
link_directories(/home/surui/Downloads/software/vulkansdk-linux-x86_64-1.1.92.1/x86_64/lib)
Write the header file
#include <string>
#include <opencv2/core.hpp>
#include <net.h>
class Segmentation {
public:
Segmentation(const std::string& param_path, const std::string& model_path);
~Segmentation();
cv::Mat segment(const cv::Mat& img);
protected:
ncnn::Net model; // ncnn model
int resizeWidth;
int resizeHeight;
bool isResize;
};
Write the source file
Segmentation::Segmentation(const std::string &param_path, const std::string &model_path)
{
// initialize when app starts
ncnn::create_gpu_instance();
// enable vulkan compute feature before loading
model.opt.use_vulkan_compute = 1;
model.load_param(param_path.c_str());
model.load_model(model_path.c_str());
resizeWidth = 360;
resizeHeight = 480;
}
Segmentation::~Segmentation(){
// release the network's GPU resources before tearing down the Vulkan instance
model.clear();
ncnn::destroy_gpu_instance();
}
cv::Mat Segmentation::segment(const cv::Mat &img)
{
ncnn::Mat inputMat;
inputMat = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR, img.cols, img.rows, resizeWidth, resizeHeight);
ncnn::Extractor extractor = model.create_extractor();
extractor.set_num_threads(6);
extractor.input("data", inputMat);
ncnn::Mat outputMat;
extractor.extract("deconv6_0_0", outputMat);
cv::Mat predMask = cv::Mat::zeros(cv::Size(outputMat.w, outputMat.h), CV_8UC1);
// two-class segmentation: pick the higher-scoring channel per pixel
ncnn::Mat chn_0 = outputMat.channel(0);
ncnn::Mat chn_1 = outputMat.channel(1);
for(int i = 0; i < outputMat.h; ++i)
{
const float* pCh0 = chn_0.row(i);
const float* pCh1 = chn_1.row(i);
uchar *Mask = predMask.ptr<uchar>(i);
for(int j = 0; j < outputMat.w; ++j){
Mask[j] = pCh0[j] > pCh1[j] ? 0 : 255;
}
}
return predMask;
}
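The per-pixel decision in the loop above can be tried on its own, without ncnn or OpenCV. The sketch below assumes the two score maps are already available as flat float arrays of equal length; `two_class_mask` is a hypothetical helper name, not something from the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turn two planar score maps into a 0/255 mask.
// A pixel is background (0) only when channel 0 scores strictly higher;
// ties go to the foreground (255), matching the loop in segment().
std::vector<std::uint8_t> two_class_mask(const std::vector<float>& ch0,
                                         const std::vector<float>& ch1)
{
    std::vector<std::uint8_t> mask(ch0.size(), 0);
    for (std::size_t i = 0; i < ch0.size() && i < ch1.size(); ++i)
        mask[i] = ch0[i] > ch1[i] ? 0 : 255;
    return mask;
}
```

For more than two classes the same idea would become a per-pixel argmax over all channels.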
ENet latency : 1090.02 ms
ENet latency : 1097.74 ms
ENet latency : 1092.15 ms
[0 Quadro P600] queueC=2[8] queueT=1[2] buglssc=0
[0 Quadro P600] fp16p=1 fp16s=1 fp16a=0 int8s=1 int8a=1
ENet latency : 32.948 ms
ENet latency : 30.621 ms
ENet latency : 36.082 ms
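The speedup on the ENet segmentation model is obvious: about 1090 ms per inference in the first three runs, versus roughly 30 to 36 ms in the runs where the Quadro P600 is reported by Vulkan, which is on the order of a 30x improvement.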
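The post does not show how the latency figures were collected. One possible way to time a single inference call is sketched below using std::chrono; `measure_latency_ms` is a hypothetical helper and may differ from what was actually used to produce the numbers above.

```cpp
#include <chrono>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double measure_latency_ms(F&& run_once)
{
    const auto start = std::chrono::steady_clock::now();
    run_once();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the call to segment() in a lambda and passing it to this helper would yield one latency sample per call. The first GPU run is often slower because of pipeline setup, so several runs are worth recording.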
Q:
vkCreateInstance failed -9
A:
apt install mesa-vulkan-drivers