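Today I came across a slide deck, "OpenCV3_CVPR_2015_Speed", saw the set of figures below, and decided to dig into OpenCV's OpenCL support.
As the figures show, OpenCL speeds the algorithms up noticeably!
So I downloaded the OpenCV 3.2 source and built it with VS2013 on Windows 10 (64-bit). The CMake configuration was as follows:
Test environment: Windows 10 (64-bit) + AMD GPU + OpenCV 3.2
Two slides from the deck above give a rough idea of how opencv-opencl is used.
OKAY, with the environment set up I was pretty excited: following the sample code above, a small program with a plain version and a CL version should be enough to measure the speed-up!
demo1 (runs grayscale conversion, Gaussian blur and Canny once) is as follows: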
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
using namespace cv;
using namespace cv::ocl;
using namespace std;
int main(int argc, char **argv){
    // Test Normal OpenCV
    cv::Mat f = cv::imread("C:\\Users\\KayChan\\Desktop\\kj_color_detect\\4.bmp", 1);
    cv::Mat gray;
    double t = 0.0;
    t = (double)cv::getTickCount();
    for (int i = 0; i < 1; i++){
        cv::cvtColor(f, gray, cv::COLOR_BGR2GRAY);
        cv::GaussianBlur(gray, gray, cv::Size(7, 7), 1.5);
        // Note: the upper Canny threshold is the loop index i, so the work differs per iteration
        cv::Canny(gray, gray, 0, i);
    }
    t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    std::cout << "cpu time:" << t << std::endl;
    // Test OpenCL
    cv::UMat uf = cv::imread("C:\\Users\\KayChan\\Desktop\\kj_color_detect\\4.bmp", 1).getUMat(cv::ACCESS_READ);
    cv::UMat ugray;
    t = 0.0;
    t = (double)cv::getTickCount();
    // Note: the first UMat calls may also pay for OpenCL initialization and kernel compilation
    for (int i = 0; i < 1; i++){
        cv::cvtColor(uf, ugray, cv::COLOR_BGR2GRAY);
        cv::GaussianBlur(ugray, ugray, cv::Size(7, 7), 1.5);
        cv::Canny(ugray, ugray, 0, i);
    }
    // Note: queued OpenCL work may still be running when the timer stops here
    t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    std::cout << "gpu time:" << t << std::endl;
    return 0;
}
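Output:
With the loop count in the code above changed to 100, the output:
With the loop count changed to 500, the output:
Here is the problem: something looks off. The CL version takes 0.724482 s for 1 iteration and 0.779403 s for 100, yet 500 iterations take 6.68255 s... What is going on? That should not happen; the jump seems far too big!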
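One possible reading of these numbers, offered as an assumption rather than a diagnosis: a single timed pass lumps one-off start-up costs together with the steady-state cost, and a timer that stops before queued device work has finished can under-report. A minimal sketch of a harness that separates the two is shown below. The names `bench`, `BenchResult` and the `sync` callback are hypothetical helpers, not OpenCV APIs, and the code uses only the C++ standard library.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Average cost of one timed iteration, plus how many iterations were timed.
struct BenchResult {
    double seconds_per_iter;
    std::size_t iterations;
};

// Hypothetical helper (not part of OpenCV): runs `work` `warmup` times
// untimed, then times `iters` further runs. `sync` is called once after the
// warm-up and once before the clock is stopped, so that any asynchronously
// queued work is included in the measurement.
inline BenchResult bench(const std::function<void()>& work,
                         std::size_t warmup,
                         std::size_t iters,
                         const std::function<void()>& sync = [] {}) {
    for (std::size_t i = 0; i < warmup; ++i) work();
    sync();  // drain whatever the warm-up left pending

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) work();
    sync();  // wait for queued work before reading the clock
    const auto stop = std::chrono::steady_clock::now();

    const double total = std::chrono::duration<double>(stop - start).count();
    const double per_iter = iters ? total / static_cast<double>(iters) : 0.0;
    return BenchResult{per_iter, iters};
}
```

With OpenCV, `work` would wrap the cvtColor / GaussianBlur / Canny pipeline from demo1, and `sync` could be a lambda calling `cv::ocl::finish()` for the UMat version. Comparing `seconds_per_iter` for the Mat and UMat variants after a few warm-up runs should give a fairer picture than one cold pass.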
demo2 (tests OpenCV's grayscale template matching) is as follows:
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
using namespace cv;
using namespace cv::ocl;
using namespace std;
void runMatchGrayUseCpu(int method);
void runMatchGrayUseGpu(int method);
int main(int argc, char **argv){
    int method = cv::TM_SQDIFF_NORMED;
    runMatchGrayUseCpu(method);
    runMatchGrayUseGpu(method);
    return 0;
}
void runMatchGrayUseCpu(int method){
    // Note: the timed region includes reading both images from disk
    double t = 0.0;
    t = (double)cv::getTickCount();
    // 1.get src image
    cv::Mat src = cv::imread("C:\\Users\\KayChan\\Desktop\\kj_color_detect\\11.bmp", 1);
    // 2.get template image
    cv::Mat tmp = cv::imread("C:\\Users\\KayChan\\Desktop\\testimage\\tmp.png", 1);
    // 3.gray image (imread returns BGR data, hence COLOR_BGR2GRAY)
    cv::Mat gray_src, gray_tmp;
    if (src.channels() == 1) gray_src = src;
    else cv::cvtColor(src, gray_src, cv::COLOR_BGR2GRAY);
    if (tmp.channels() == 1) gray_tmp = tmp;
    else cv::cvtColor(tmp, gray_tmp, cv::COLOR_BGR2GRAY);
    // 4.match
    int result_cols = gray_src.cols - gray_tmp.cols + 1;
    int result_rows = gray_src.rows - gray_tmp.rows + 1;
    // cv::Mat takes (rows, cols, type)
    cv::Mat result = cv::Mat(result_rows, result_cols, CV_32FC1);
    cv::matchTemplate(gray_src, gray_tmp, result, method);
    cv::Point point;
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, cv::Mat());
    switch (method){
    case cv::TM_SQDIFF:
    case cv::TM_SQDIFF_NORMED:
        // squared-difference methods: the best match is the minimum
        point = minLoc;
        break;
    case cv::TM_CCORR:
    case cv::TM_CCOEFF:
    case cv::TM_CCORR_NORMED:
    case cv::TM_CCOEFF_NORMED:
    default:
        // correlation-based methods: the best match is the maximum
        point = maxLoc;
        break;
    }
    t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    std::cout << "======Test Match Template Use CPU======" << std::endl;
    std::cout << "CPU time :" << t << " second" << std::endl;
    std::cout << "obj.x :" << point.x << " obj.y :" << point.y << std::endl;
    std::cout << " " << std::endl;
}
void runMatchGrayUseGpu(int method){
    // Note: the timed region includes disk reads and creating the UMats from the loaded images
    double t = 0.0;
    t = (double)cv::getTickCount();
    // 1.get src image
    cv::UMat src = cv::imread("C:\\Users\\KayChan\\Desktop\\kj_color_detect\\11.bmp", 1).getUMat(cv::ACCESS_RW);
    // 2.get template image
    cv::UMat tmp = cv::imread("C:\\Users\\KayChan\\Desktop\\testimage\\tmp.png", 1).getUMat(cv::ACCESS_RW);
    // 3.gray image (imread returns BGR data, hence COLOR_BGR2GRAY)
    cv::UMat gray_src, gray_tmp;
    if (src.channels() == 1) gray_src = src;
    else cv::cvtColor(src, gray_src, cv::COLOR_BGR2GRAY);
    if (tmp.channels() == 1) gray_tmp = tmp;
    else cv::cvtColor(tmp, gray_tmp, cv::COLOR_BGR2GRAY);
    // 4.match
    int result_cols = gray_src.cols - gray_tmp.cols + 1;
    int result_rows = gray_src.rows - gray_tmp.rows + 1;
    // cv::UMat takes (rows, cols, type)
    cv::UMat result = cv::UMat(result_rows, result_cols, CV_32FC1);
    cv::matchTemplate(gray_src, gray_tmp, result, method);
    cv::Point point;
    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc, cv::Mat());
    switch (method){
    case cv::TM_SQDIFF:
    case cv::TM_SQDIFF_NORMED:
        // squared-difference methods: the best match is the minimum
        point = minLoc;
        break;
    case cv::TM_CCORR:
    case cv::TM_CCOEFF:
    case cv::TM_CCORR_NORMED:
    case cv::TM_CCOEFF_NORMED:
    default:
        // correlation-based methods: the best match is the maximum
        point = maxLoc;
        break;
    }
    t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    std::cout << "======Test Match Template Use OpenCL======" << std::endl;
    std::cout << "OpenCL time :" << t << " second" << std::endl;
    std::cout << "obj.x :" << point.x << " obj.y :" << point.y << std::endl;
    std::cout << " " << std::endl;
}
The results are as follows:
All I can say is: what the hell, it is actually slower than the CPU... and by more than an order of magnitude!
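As a side note on step 4 of demo2: the result map of template matching has one entry per template placement, so its size is (image rows - template rows + 1) by (image cols - template cols + 1), and OpenCV matrix constructors take rows before cols. The sketch below spells out that arithmetic and the min/max rule with the standard library only. `Size2`, `matchResultSize` and `bestIsMinimum` are hypothetical names made up for illustration. The method ids assume the cv::TemplateMatchModes numbering, where TM_SQDIFF is 0 and TM_SQDIFF_NORMED is 1.

```cpp
#include <stdexcept>

// Plain rows/cols pair (hypothetical stand-in for cv::Size, which stores width/height).
struct Size2 {
    int rows;
    int cols;
};

// Size of the score map for an img_rows x img_cols image and a
// tpl_rows x tpl_cols template: one score per valid template placement.
inline Size2 matchResultSize(int img_rows, int img_cols, int tpl_rows, int tpl_cols) {
    if (tpl_rows <= 0 || tpl_cols <= 0 || tpl_rows > img_rows || tpl_cols > img_cols) {
        throw std::invalid_argument("template must be non-empty and fit inside the image");
    }
    return Size2{img_rows - tpl_rows + 1, img_cols - tpl_cols + 1};
}

// True when the best match is the minimum of the score map.
// Assumes the cv::TemplateMatchModes numbering: 0 = TM_SQDIFF, 1 = TM_SQDIFF_NORMED.
inline bool bestIsMinimum(int method) {
    return method == 0 || method == 1;
}
```

For a 640x480 image (480 rows, 640 cols) and a 100x50 template (50 rows, 100 cols), `matchResultSize(480, 640, 50, 100)` yields 431 rows by 541 cols.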
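At first I thought OCL simply was not enabled, although according to the docs UMat should not need to be switched on explicitly the way 2.x did. I went ahead and enabled it the 2.x way anyway, adding a piece of code as follows: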
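#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
using namespace cv;
using namespace cv::ocl;
using namespace std;
void runMatchGrayUseCpu(int method);
void runMatchGrayUseGpu(int method);
int main(int argc, char **argv){
    // newly added
    // launch OpenCL environment...
    std::vector<cv::ocl::PlatformInfo> plats;
    cv::ocl::getPlatfomsInfo(plats);  // "Platfoms" is the actual spelling in the OpenCV API
    const cv::ocl::PlatformInfo *platform = &plats[0];
    cout << "Platform name: " << platform->name().c_str() << endl;
    cv::ocl::Device current_device;
    platform->getDevice(current_device, 0);
    cout << "Device name: " << current_device.name().c_str() << endl;
    // ... (the rest of this listing is cut off in the source)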
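The results are as follows:
Damn, see that? It sped up Mat by nearly 10x (from over 1 second down to 0.1 second)... and did nothing for UMat.
I wonder what you all make of this. Since the official claim is that UMat supports OpenCL, it should not be this bad... so most likely I am doing something wrong somewhere.
I did find some possible causes this evening, though I am not sure yet whether they are what is behind this. Still thinking it over; I will share once I have tracked down the catch.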