Deep learning is all the rage right now, but it still does not perform well on mobile devices; without GPU hardware support in particular, efficiency becomes a critical factor. Even Caffe2, which Facebook recently open-sourced and which is billed as lightweight and highly modular, still has few tutorials, so I will get familiar with it some other time.
Background of this project: static hand gesture recognition. There is currently no good ready-made solution for either the dataset or the recognition itself, so this post focuses on finding a workable way to run deep learning on a mobile device. Prerequisites touched on in this post: C++, Android development, building Android .so libraries on Linux, the NDK, and Eclipse.
1. Choosing a deep learning framework: Caffe, TensorFlow, MXNet, tiny-cnn. Detection can run offline or online.
Since I have been training the gesture models with Caffe on the PC side and am most familiar with it, I chose to port Caffe.
tiny-cnn: a neural network library written in pure C++11; its stability and accuracy still need to be verified.
TensorFlow: the source tree is too large and the build is complicated.
caffe-mobile, https://github.com/solrex/caffe-mobile.git: few dependencies, well suited to mobile.
caffe-android-lib, https://github.com/sh1r0/caffe-android-lib.git: too many libraries have to be built, so compilation is tedious.
2. Getting started
Requires API level >= 16.
After trying several libraries I settled on caffe-mobile. An earlier version did not support some layers, which caused errors when parsing a ResNet prototxt; after I filed an issue with the author, the updated code finally loaded the model and ran inference successfully.
Build environment: Ubuntu 14.04 + caffe-mobile.
Build process: the build downloads two dependencies, protobuf and OpenBLAS. Follow the "build android" part of the README.md. Note that the API level is set in the build scripts, and here is a pitfall: the API level used for the source and for the dependencies must be the same, otherwise the build fails (this is exactly why my earlier builds did not go through). Choose the target ABI that matches the board you want to run on; armeabi, armeabi-v7a and 64-bit ARM (arm64-v8a) are currently supported. Building for armeabi-v7a produces libcaffe.a, libproto.a and libcaffejni.so. The upstream tutorial says that in the end you only use libcaffejni.so, meaning you do not need to write any JNI code yourself. In our project, however, the JNI layer has to implement other functionality as well, so we only use the libcaffe.a library. For convenience, change STATIC to SHARED in the script that generates caffe.a, so that the core Caffe library is built as a shared library instead.
Development tools: Eclipse on Windows. Environment: NDK + OpenCV + Caffe.
NDK: see http://blog.chinaunix.net/uid-26524139-id-3376699.html for an introduction.
Configuration: since this post is aimed at beginners, I will go through the steps in detail.
A: Configuring OpenCV in Eclipse
If you have not downloaded the OpenCV4Android SDK yet, get it from https://sourceforge.net/projects/opencvlibrary/files/opencv-android/; I use 2.4.10. The official documentation is at http://docs.opencv.org/2.4/doc/tutorials/introduction/android_binary_package/O4A_SDK.html. After downloading, extract it to a path that contains only English characters, to avoid errors caused by Chinese characters in the path; the same applies to the NDK below.
Then import the OpenCV project into Eclipse: in the import dialog, select the SDK directory and click OK.
B: Configuring the NDK. Download it from https://developer.android.google.cn/ndk/downloads/index.html and extract it.
At this point you can create a new Android project or open an existing one.
Switch to the C/C++ perspective, open the project properties, and under Android click Add, select the opencv library project, and import it as a library.
The header include directories for the JNI code (the Caffe headers under jni/include and the OpenCV SDK's native headers) are set up in the Android.mk shown below.
C: Writing the JNI code
Copy the libcaffe.so generated above into the jni directory and create an include folder there; copy Caffe's include headers into it. If the build complains about missing headers, copy them over from the git source tree. You may also see errors about missing CUDA headers; this happens because the CPU-only macro has not been set, and defining that macro fixes it. One last reminder: the JNI entry-point functions must match the Java package name, otherwise loading the .so will fail; a sketch of the matching Java side follows.
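To make that naming rule concrete, here is a minimal sketch of the Java class implied by the JNI function names in caffe_jni.cpp below (package org.opencv.samples.facedetect, class DetectionBasedTracker). The System.loadLibrary names correspond to the LOCAL_MODULE entries in my Android.mk; the loading order and anything not dictated by the JNI signatures are assumptions for illustration, not code from the original project.
package org.opencv.samples.facedetect;

public class DetectionBasedTracker {
    static {
        // Load the prebuilt Caffe library first, then our JNI wrapper
        // (the names match LOCAL_MODULE in the Android.mk below).
        System.loadLibrary("caffe");
        System.loadLibrary("DetectionBasedTracker");
    }

    // The corresponding C++ symbols must be named
    // Java_org_opencv_samples_facedetect_DetectionBasedTracker_<method>;
    // otherwise the natives fail to resolve (UnsatisfiedLinkError at call time).
    public static native long nativeInitial(String modelDirPath);
    public static native void nativeDetect(long nativeObj, long matAddrRgba, long facesRectAddr);
}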
Finally, here is my Android.mk:
LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_MODULE := caffe
LOCAL_SRC_FILES := libcaffe.so
include $(PREBUILT_SHARED_LIBRARY)
include $(CLEAR_VARS)
#OPENCV_CAMERA_MODULES:=off
#OPENCV_INSTALL_MODULES:=off
OPENCV_LIB_TYPE:=STATIC
include C:/Works/OpenCV-2.4.10-android-sdk/sdk/native/jni/OpenCV.mk
LOCAL_MODULE := DetectionBasedTracker
LOCAL_SHARED_LIBRARIES += caffe
LOCAL_LDLIBS += -llog -ldl
LOCAL_C_INCLUDES += $(LOCAL_PATH)/include
LOCAL_SRC_FILES := caffe_jni.cpp
#caffe_mobile.cpp
include $(BUILD_SHARED_LIBRARY)
caffe_jni.cpp:
#include <jni.h>
#include <android/log.h>

// CPU_ONLY must be defined before the Caffe headers so that the CUDA code paths
// (and the missing cuda headers) are excluded.
#define CPU_ONLY
#define USE_NEON_MATH
#include "caffe/caffe.hpp"
#include "caffe_jni.h"
//#include "caffe_mobile.h"

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>

#include <algorithm>
#include <fstream>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include <ctime>
#define LOG_TAG "FaceDetection/DetectionBasedTracker"
#define LOGD(...) ((void)__android_log_print(ANDROID_LOG_DEBUG, LOG_TAG, __VA_ARGS__))
#define LOGF(...) ((void)__android_log_print(ANDROID_LOG_FATAL, LOG_TAG, __VA_ARGS__))
#define LOGI(...) ((void)__android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__))
//using namespace std;
using namespace cv;
using namespace caffe;
using std::string;
/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;
class Classifier {
public:
Classifier();
string model_file;
string trained_file;
string mean_file;
string label_file;
void LoadModel();
std::vector<Prediction> Classify(const cv::Mat& img, int N = 5);
private:
void SetMean(const string& mean_file);
std::vector<float> Predict(const cv::Mat& img);
void WrapInputLayer(std::vector<cv::Mat>* input_channels);
void Preprocess(const cv::Mat& img,
std::vector<cv::Mat>* input_channels);
private:
shared_ptr<Net<float> > net_;
cv::Size input_geometry_;
int num_channels_;
cv::Mat mean_;
std::vector<string> labels_;
};
Classifier::Classifier()
{
}
Classifier Caffeclassifier; // global Caffe classifier instance
void Classifier::LoadModel()
{
#ifdef CPU_ONLY
Caffe::set_mode(Caffe::CPU);
#else
Caffe::set_mode(Caffe::GPU);
#endif
LOGF("1,Load Model !");
/* Load the network. */
net_.reset(new Net<float>(model_file, TEST));
LOGF("8888888888,Load Model !");
net_->CopyTrainedLayersFrom(trained_file);
LOGF("111,Load Model !");
LOGF("2,Load Model %d !" , net_->input_blobs().size());
Blob<float>* input_layer = net_->input_blobs()[0];
LOGF("22,Load Model !");
num_channels_ = input_layer->channels();
input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
LOGF("3,Load Model !");
/* Load the binaryproto mean file. */
SetMean(mean_file);
LOGF("4,Load Model !");
/* Load labels. */
std::ifstream labels(label_file.c_str());
string line;
while (std::getline(labels, line))
labels_.push_back(string(line));
LOGF("5,Load Model !");
Blob<float>* output_layer = net_->output_blobs()[0];
LOGF("6,Load Model !");
}
static bool PairCompare(const std::pair<float, int>& lhs,
const std::pair<float, int>& rhs) {
return lhs.first > rhs.first;
}
/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
std::vector<std::pair<float, int> > pairs;
for (size_t i = 0; i < v.size(); ++i)
pairs.push_back(std::make_pair(v[i], static_cast<int>(i)));
std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);
std::vector<int> result;
for (int i = 0; i < N; ++i)
result.push_back(pairs[i].second);
return result;
}
/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const cv::Mat& img, int N) {
std::vector<float> output = Predict(img);
N = std::min<int>(labels_.size(), N);
std::vector<int> maxN = Argmax(output, N);
std::vector<Prediction> predictions;
for (int i = 0; i < N; ++i) {
int idx = maxN[i];
predictions.push_back(std::make_pair(labels_[idx], output[idx]));
}
return predictions;
}
/* Load the mean file in binaryproto format. */
void Classifier::SetMean(const string& mean_file) {
/*BlobProto blob_proto;
LOGF("5,Load Model %s" , mean_file.c_str());
ReadProtoFromBinaryFileOrDie(mean_file.c_str(), &blob_proto);
LOGF("7,Load Model fail!");
// Convert from BlobProto to Blob
Blob<float> mean_blob;
LOGF("7,Load Model %d" , num_channels_);
mean_blob.FromProto(blob_proto);
LOGF("8,Load Model %d" , num_channels_);
// The format of the mean file is planar 32-bit float BGR or grayscale.
std::vector<cv::Mat> channels;
float* data = mean_blob.mutable_cpu_data();
for (int i = 0; i < num_channels_; ++i) {
// Extract an individual channel.
cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, data);
channels.push_back(channel);
data += mean_blob.height() * mean_blob.width();
}
// Merge the separate channels into a single image.
cv::Mat mean;
cv::merge(channels, mean);*/
/* Instead of loading the binaryproto, fill the mean image with
 * fixed per-channel BGR mean values. */
cv::Scalar channel_mean;
channel_mean[0] = 127.311;
channel_mean[1] = 127.67;
channel_mean[2] = 130.743;
mean_ = cv::Mat(input_geometry_, CV_32FC3, channel_mean);
}
std::vector<float> Classifier::Predict(const cv::Mat& img) {
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(1, num_channels_,
input_geometry_.height, input_geometry_.width);
/* Forward dimension change to all layers. */
net_->Reshape();
std::vector<cv::Mat> input_channels;
WrapInputLayer(&input_channels);
Preprocess(img, &input_channels);
net_->Forward();
/* Copy the output layer to a std::vector */
Blob<float>* output_layer = net_->output_blobs()[0];
const float* begin = output_layer->cpu_data();
const float* end = begin + output_layer->channels();
return std::vector<float>(begin, end);
}
/* Wrap the input layer of the network in separate cv::Mat objects
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
Blob<float>* input_layer = net_->input_blobs()[0];
int width = input_layer->width();
int height = input_layer->height();
float* input_data = input_layer->mutable_cpu_data();
for (int i = 0; i < input_layer->channels(); ++i) {
cv::Mat channel(height, width, CV_32FC1, input_data);
input_channels->push_back(channel);
input_data += width * height;
}
}
void Classifier::Preprocess(const cv::Mat& img,
std::vector<cv::Mat>* input_channels) {
/* Convert the input image to the input image format of the network. */
cv::Mat sample;
if (img.channels() == 3 && num_channels_ == 1)
cv::cvtColor(img, sample, cv::COLOR_BGR2GRAY);
else if (img.channels() == 4 && num_channels_ == 1)
cv::cvtColor(img, sample, cv::COLOR_BGRA2GRAY);
else if (img.channels() == 4 && num_channels_ == 3)
cv::cvtColor(img, sample, cv::COLOR_BGRA2BGR);
else if (img.channels() == 1 && num_channels_ == 3)
cv::cvtColor(img, sample, cv::COLOR_GRAY2BGR);
else
sample = img;
cv::Mat sample_resized;
if (sample.size() != input_geometry_)
cv::resize(sample, sample_resized, input_geometry_);
else
sample_resized = sample;
cv::Mat sample_float;
if (num_channels_ == 3)
sample_resized.convertTo(sample_float, CV_32FC3);
else
sample_resized.convertTo(sample_float, CV_32FC1);
cv::Mat sample_normalized;
cv::subtract(sample_float, mean_, sample_normalized);
/* This operation will write the separate BGR planes directly to the
* input layer of the network because it is wrapped by the cv::Mat
* objects in input_channels. */
cv::split(sample_normalized, *input_channels);
CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
== net_->input_blobs()[0]->cpu_data())
<< "Input channels are not wrapping the input layer of the network.";
}
Prediction TestModel(Mat img)
{
// call Caffe for the second-stage recognition
double time1 = clock();
resize(img , img , cv::Size(100,100));
std::vector<Prediction> predictions = Caffeclassifier.Classify(img);
double time2 = clock();
double delay = (double) (time2 - time1) / CLOCKS_PER_SEC;
Prediction p = predictions[0];
LOGF("caffe predict , res=%s , prob=%f , time = %fs" , p.first.c_str() , p.second , delay);
return p;
}
JNIEXPORT jlong JNICALL Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeInitial
(JNIEnv * jenv, jclass, jstring jFilePath) //jbyteArray arrycascade
{
const char* jnamestr = jenv->GetStringUTFChars(jFilePath, NULL);
string strFilePath(jnamestr);
Caffeclassifier.model_file = strFilePath + "/deploy.prototxt";
Caffeclassifier.trained_file = strFilePath + "/caffemodel.caffemodel";
Caffeclassifier.mean_file = strFilePath + "/handnet_mean.binaryproto";
Caffeclassifier.label_file = strFilePath + "/synset_words.txt";
LOGF("3,%s" , Caffeclassifier.model_file.c_str());
Caffeclassifier.LoadModel();
LOGF("3,Load caffe Model success!");
return 1;
}
JNIEXPORT void JNICALL Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeDetect
(JNIEnv * jenv, jclass, jlong thiz, jlong imageRgba, jlong facesrect)
{
Mat temp = *((Mat*) imageRgba);
Mat matcolororiginal;
cvtColor(temp, matcolororiginal, CV_RGBA2BGR);
int timebegin = 0;
int timeend = 0;
timebegin = clock();
// classify each incoming frame
Prediction p = TestModel(matcolororiginal);
timeend = clock();
double delaytime = (double) (timeend - timebegin) / CLOCKS_PER_SEC;
string result_text = format("Time=%f s ", delaytime);
putText(matcolororiginal, result_text, Point(0, 20*(1+1)), FONT_HERSHEY_PLAIN, 3.0, CV_RGB(0,255,0), 2.0);
result_text = format("res=%s , prob=%f", p.first.c_str() , p.second);
putText(matcolororiginal, result_text, Point(0, 50*(1+1)), FONT_HERSHEY_PLAIN, 3.0, CV_RGB(0,255,0), 2.0);
cvtColor(matcolororiginal, temp, CV_BGR2RGBA);
}
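For completeness, here is a rough sketch of how the Java side might drive these two natives from the OpenCV4Android camera callback. Only the native signatures and the trick of passing Mat.getNativeObjAddr() as a jlong come from the code above; the activity name, the fields, the model directory and the initialization timing are illustrative assumptions, and OpenCV loader/camera-view setup is omitted.
import org.opencv.android.CameraBridgeViewBase;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;

// Sketch of the camera activity: only the parts that touch the natives are shown.
public class GestureActivity extends android.app.Activity
        implements CameraBridgeViewBase.CvCameraViewListener2 {

    private long mNativeObj = 0;
    private MatOfRect mRects;

    public void onCameraViewStarted(int width, int height) {
        mRects = new MatOfRect();
        // The directory must already contain deploy.prototxt, caffemodel.caffemodel,
        // handnet_mean.binaryproto and synset_words.txt (see nativeInitial above).
        String modelDir = getFilesDir().getAbsolutePath();
        mNativeObj = DetectionBasedTracker.nativeInitial(modelDir);
    }

    public void onCameraViewStopped() { }

    public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
        Mat rgba = inputFrame.rgba();
        // Pass the native Mat addresses; nativeDetect runs the classifier and
        // draws the predicted label and timing onto the frame.
        DetectionBasedTracker.nativeDetect(mNativeObj, rgba.getNativeObjAddr(), mRects.getNativeObjAddr());
        return rgba;
    }
}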
3. Summary: ways to run a Caffe model on Android (borrowed from a colleague's weekly report):
a. Candidates: the OpenCV DNN module, tiny-cnn, caffe-android-lib, caffe-mobile.
b. The OpenCV DNN module runs successfully on a PC; it is 7-8x slower than native Caffe, with accuracy on par with native Caffe.
c. tiny-cnn runs successfully on the RK3288; it is faster than native Caffe, but its accuracy is far below native Caffe.
d. caffe-android-lib is Caffe's native Android build; speed is average, it runs on Android 6.0 but fails to build for 4.4.
e. caffe-mobile is a stripped-down Caffe runtime; it has already been ported to the RK3288 and is fairly fast: about 60 ms for a simple network and about 600 ms for a complex one.
In my tests on the RK3288, its accuracy matches native Caffe and it runs reasonably fast.
PS: this is my first blog post; I hope it can help someone who needs it.