A Beginner's Journey: Compiling Caffe for Mobile and JNI Development

Deep learning is very popular right now, but its performance on mobile is still limited, especially without GPU hardware support, where efficiency becomes a key concern. Even Facebook's recently open-sourced Caffe2, which claims to be lightweight and highly modular, does not yet have many tutorials, so I will come back to it later.

The project behind this post is static hand-gesture recognition; at present there is no good off-the-shelf solution for either the dataset or the recognition itself. Given that, this post focuses on finding a suitable way to run deep learning on mobile. Background knowledge involved: C++, Android development, building Android .so libraries on Linux, the NDK, and Eclipse.

1. Choosing a deep learning framework: Caffe, TensorFlow, MXNet, tiny-cnn. Detection scheme: offline vs. online.

Since I have been training gesture models with Caffe on the PC and am most familiar with it, I chose to port Caffe.

tiny-cnn: a neural network library written in pure C++11; its stability and accuracy still need to be verified.

TensorFlow: the source tree is too large and the build is complicated.

caffe-mobile (https://github.com/solrex/caffe-mobile.git): few dependencies, well suited to mobile.

caffe-android-lib (https://github.com/sh1r0/caffe-android-lib.git): too many libraries to build, so compilation is rather tedious.

2. Getting started

Requires Android API level >= 16.

After trying several libraries, I settled on caffe-mobile. An earlier version did not support some layers, which caused errors when reading a ResNet prototxt; after I filed an issue with the author, the code was updated and the model could finally be loaded and recognition ran successfully.

Build environment: Ubuntu 14.04 + caffe-mobile.

Build process: the build downloads two dependencies, protobuf and OpenBLAS. Follow the "Build Android" section of the README.md. Note that the API level is configured in the build script, and here is a pitfall: the Caffe source and its dependency libraries must be built against the same API level, otherwise compilation fails; this was exactly why my earlier builds kept failing. Pick the target that matches the board you want to run on; armeabi, armeabi-v7a and arm64-v8a are currently supported. Building for armeabi-v7a produces libcaffe.a, libproto.a and libcaffejni.so. The upstream tutorial says that in the end you only need libcaffejni.so, meaning you would not have to write any JNI code of your own. In our case, however, the JNI layer has to implement additional functionality, so we only use the libcaffe.a library. For convenience, change "static" to "shared" in the script that builds caffe.a, so that the Caffe core library is built as a shared library (libcaffe.so) instead.


Development tools: Eclipse on Windows. Environment setup: NDK + OpenCV + Caffe.

NDK: for an introduction, see http://blog.chinaunix.net/uid-26524139-id-3376699.html.

Configuration: since this post is aimed at beginners, the steps are described in detail.

A: Configuring OpenCV in Eclipse

If you have not downloaded the OpenCV4Android SDK yet, get it from https://sourceforge.net/projects/opencvlibrary/files/opencv-android/ (I used 2.4.10). The official documentation is at http://docs.opencv.org/2.4/doc/tutorials/introduction/android_binary_package/O4A_SDK.html. After downloading, extract it to a path containing no non-ASCII characters to avoid path errors; the same applies to the NDK below.

Then import the OpenCV library project into Eclipse: in the import dialog, select the extracted SDK directory and click OK.


B: Configuring the NDK. Download it from https://developer.android.google.cn/ndk/downloads/index.html and extract it after downloading.

With that in place, you can create a new Android project or open an existing one.

Switch to the C/C++ perspective, open the project's Properties dialog, go to Android, click Add, and select the opencv library project to import it as a library.


The JNI header include directories should contain the Caffe headers (the jni/include folder created below) and the OpenCV headers; in the Android.mk shown later these are picked up via LOCAL_C_INCLUDES and OpenCV.mk.



C: Writing the JNI code

Copy the libcaffe.so generated above into the jni directory, create an include folder there, and copy Caffe's include headers into it; if the build complains about missing headers, copy them from the Git source tree. It may also complain about missing CUDA headers; that happens when the CPU-only macro is not set, so simply define CPU_ONLY (before including any Caffe header, or via the compiler flags). One last reminder: the JNI interface function names must match the Java package and class name, otherwise loading the .so will fail.
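As an illustration, with the package and class used in the code below (org.opencv.samples.facedetect.DetectionBasedTracker), the Java side would declare the native methods roughly as follows. This is a minimal sketch: the explicit loadLibrary("caffe") call is an assumption (helpful on older Android versions that do not resolve .so dependencies automatically), and the exported C symbol must be Java_<package>_<Class>_<method> with the dots replaced by underscores.

package org.opencv.samples.facedetect;

public class DetectionBasedTracker {
    static {
        // Load the prebuilt Caffe core library first, then our own JNI library.
        System.loadLibrary("caffe");
        System.loadLibrary("DetectionBasedTracker");
    }

    // Maps to Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeInitial
    public static native long nativeInitial(String modelDirPath);

    // Maps to Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeDetect
    public static native void nativeDetect(long thiz, long matAddrRgba, long matAddrRects);
}

Because the C++ functions take jclass as their second parameter, the Java methods must be declared static.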

Finally, here is my Android.mk:

LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_MODULE    := caffe
LOCAL_SRC_FILES := libcaffe.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
#OPENCV_CAMERA_MODULES:=off
#OPENCV_INSTALL_MODULES:=off
OPENCV_LIB_TYPE:=STATIC
include  C:/Works/OpenCV-2.4.10-android-sdk/sdk/native/jni/OpenCV.mk
LOCAL_MODULE := DetectionBasedTracker
LOCAL_SHARED_LIBRARIES += caffe
LOCAL_LDLIBS     += -llog -ldl
LOCAL_C_INCLUDES += $(LOCAL_PATH)/include

LOCAL_SRC_FILES  := caffe_jni.cpp
#caffe_mobile.cpp

include $(BUILD_SHARED_LIBRARY)
And the corresponding caffe_jni.cpp:
#include <jni.h>
#include <android/log.h>

// CPU_ONLY must be seen before any Caffe header (or be passed as -DCPU_ONLY in the
// build flags), otherwise caffe.hpp pulls in CUDA headers that are not available on Android.
#define CPU_ONLY
#define USE_NEON_MATH

#include "caffe/caffe.hpp"
#include "caffe_jni.h"
//#include "caffe_mobile.h"

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>

#include <algorithm>
#include <fstream>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include <time.h>

#define LOG_TAG "FaceDetection/DetectionBasedTracker"
#define LOGD(...) ((void)__android_log_print(ANDROID_LOG_DEBUG, LOG_TAG, __VA_ARGS__))
#define LOGF(...) ((void)__android_log_print(ANDROID_LOG_FATAL, LOG_TAG, __VA_ARGS__))
#define LOGI(...) ((void)__android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__))

//using namespace std;
using namespace cv;
using namespace caffe;
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier();
  string model_file;
  string trained_file;
  string mean_file;
  string label_file;

  void LoadModel();
  std::vector<Prediction> Classify(const cv::Mat& img, int N = 5);

 private:
  void SetMean(const string& mean_file);

  std::vector<float> Predict(const cv::Mat& img);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels);

  void Preprocess(const cv::Mat& img,
                  std::vector<cv::Mat>* input_channels);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  cv::Mat mean_;
  std::vector<string> labels_;
};

Classifier::Classifier()
{
}

Classifier Caffeclassifier; // global Caffe classifier instance

void Classifier::LoadModel()
{
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif
  LOGF("LoadModel: loading network definition");
  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));

  LOGF("LoadModel: loading trained weights");
  net_->CopyTrainedLayersFrom(trained_file);

  LOGF("LoadModel: %d input blob(s)", (int)net_->input_blobs().size());
  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Set the mean image. */
  SetMean(mean_file);

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  LOGF("LoadModel: done, %d output channels, %d labels",
       output_layer->channels(), (int)labels_.size());
}
static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], static_cast<int>(i)));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const cv::Mat& img, int N) {
  std::vector<float> output = Predict(img);

  N = std::min<int>(labels_.size(), N);
  std::vector<int> maxN = Argmax(output, N);
  std::vector<Prediction> predictions;
  for (int i = 0; i < N; ++i) {
    int idx = maxN[i];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }

  return predictions;
}

/* Set the mean image used for input normalization. */
void Classifier::SetMean(const string& mean_file) {
 /*BlobProto blob_proto;
  LOGF("SetMean: reading mean file %s", mean_file.c_str());
  ReadProtoFromBinaryFileOrDie(mean_file.c_str(), &blob_proto);

  // Convert from BlobProto to Blob<float>
  Blob<float> mean_blob;
  mean_blob.FromProto(blob_proto);

 //  The format of the mean file is planar 32-bit float BGR or grayscale.
  std::vector<cv::Mat> channels;
  float* data = mean_blob.mutable_cpu_data();
  for (int i = 0; i < num_channels_; ++i) {
    // Extract an individual channel.
    cv::Mat channel(mean_blob.height(), mean_blob.width(), CV_32FC1, data);
    channels.push_back(channel);
    data += mean_blob.height() * mean_blob.width();
  }

  // Merge the separate channels into a single image.
  cv::Mat mean;
  cv::merge(channels, mean);*/

  /* Parsing the binaryproto is disabled above; instead, use hard-coded
   * per-channel mean values for this model and create a mean image
   * filled with them. */
	cv::Scalar channel_mean;
	channel_mean[0] = 127.311;
	channel_mean[1] = 127.67;
	channel_mean[2] = 130.743;
	mean_ = cv::Mat(input_geometry_, CV_32FC3, channel_mean);
}

std::vector<float> Classifier::Predict(const cv::Mat& img) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(1, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels);

  Preprocess(img, &input_channels);

  net_->Forward();

  /* Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels(); ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const cv::Mat& img,
                            std::vector<cv::Mat>* input_channels) {
  /* Convert the input image to the input image format of the network. */
  cv::Mat sample;
  if (img.channels() == 3 && num_channels_ == 1)
    cv::cvtColor(img, sample, cv::COLOR_BGR2GRAY);
  else if (img.channels() == 4 && num_channels_ == 1)
    cv::cvtColor(img, sample, cv::COLOR_BGRA2GRAY);
  else if (img.channels() == 4 && num_channels_ == 3)
    cv::cvtColor(img, sample, cv::COLOR_BGRA2BGR);
  else if (img.channels() == 1 && num_channels_ == 3)
    cv::cvtColor(img, sample, cv::COLOR_GRAY2BGR);
  else
    sample = img;

  cv::Mat sample_resized;
  if (sample.size() != input_geometry_)
    cv::resize(sample, sample_resized, input_geometry_);
  else
    sample_resized = sample;

  cv::Mat sample_float;
  if (num_channels_ == 3)
    sample_resized.convertTo(sample_float, CV_32FC3);
  else
    sample_resized.convertTo(sample_float, CV_32FC1);

  cv::Mat sample_normalized;
  cv::subtract(sample_float, mean_, sample_normalized);

  /* This operation will write the separate BGR planes directly to the
   * input layer of the network because it is wrapped by the cv::Mat
   * objects in input_channels. */
  cv::split(sample_normalized, *input_channels);

  CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
        == net_->input_blobs()[0]->cpu_data())
    << "Input channels are not wrapping the input layer of the network.";
}

Prediction TestModel(Mat img)
{
	// Run the Caffe classifier on the frame (second-stage recognition)
	double time1 = clock();
	resize(img , img , cv::Size(100,100));
	std::vector<Prediction> predictions = Caffeclassifier.Classify(img);
	double time2 = clock();
	double delay = (double) (time2 - time1) / CLOCKS_PER_SEC;
	Prediction p = predictions[0];
	LOGF("caffe predict , res=%s , prob=%f , time = %fs" , p.first.c_str() , p.second , delay);
	return p;
}

JNIEXPORT jlong JNICALL Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeInitial
(JNIEnv * jenv, jclass, jstring jFilePath)	//jbyteArray arrycascade
{
	const char* jnamestr = jenv->GetStringUTFChars(jFilePath, NULL);
	string strFilePath(jnamestr);
	jenv->ReleaseStringUTFChars(jFilePath, jnamestr);

	Caffeclassifier.model_file = strFilePath + "/deploy.prototxt";
	Caffeclassifier.trained_file = strFilePath + "/caffemodel.caffemodel";
	Caffeclassifier.mean_file = strFilePath + "/handnet_mean.binaryproto";
	Caffeclassifier.label_file = strFilePath + "/synset_words.txt";

	LOGF("nativeInitial: model file %s" , Caffeclassifier.model_file.c_str());
	Caffeclassifier.LoadModel();

	LOGF("nativeInitial: Caffe model loaded successfully");

	return 1;
}

JNIEXPORT void JNICALL Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeDetect
(JNIEnv * jenv, jclass, jlong thiz, jlong imageRgba, jlong facesrect)
{
	Mat temp = *((Mat*) imageRgba);
	Mat matcolororiginal;
	cvtColor(temp, matcolororiginal, CV_RGBA2BGR);
	int timebegin = 0;
	int timeend = 0;
	timebegin = clock();
	// Classify the current frame
	Prediction p  = TestModel(matcolororiginal);
	timeend = clock();
	double delaytime = (double) (timeend - timebegin) / CLOCKS_PER_SEC;
	string result_text = format("Time=%f s ", delaytime);
	putText(matcolororiginal, result_text, Point(0, 20*(1+1)), FONT_HERSHEY_PLAIN, 3.0, CV_RGB(0,255,0), 2.0);
	result_text = format("res=%s , prob=%f", p.first.c_str() , p.second);
	putText(matcolororiginal, result_text, Point(0, 50*(1+1)), FONT_HERSHEY_PLAIN, 3.0, CV_RGB(0,255,0), 2.0);

	// Write the annotated frame back into the Java-side Mat
	cvtColor(matcolororiginal, temp, CV_BGR2RGBA);

}
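To round this out, here is a rough sketch of how these two natives might be called from the Java side. The activity code, the step of copying the model files into the app's files directory, and the camera-callback wiring are assumptions for illustration, not taken from my actual project:

import org.opencv.android.CameraBridgeViewBase;
import org.opencv.core.Mat;

// After copying deploy.prototxt, caffemodel.caffemodel, handnet_mean.binaryproto and
// synset_words.txt into the app's files directory (e.g. from the assets), initialise once:
String modelDir = getFilesDir().getAbsolutePath();
DetectionBasedTracker.nativeInitial(modelDir);

// OpenCV camera callback: hand each frame to the native side by its Mat address.
@Override
public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
    Mat rgba = inputFrame.rgba();
    // thiz and facesrect are not used by the JNI code above, so 0 is passed for both
    DetectionBasedTracker.nativeDetect(0, rgba.getNativeObjAddr(), 0);
    return rgba;
}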


3. Summary: ways to run a Caffe model on Android (borrowed from a colleague's weekly report):

a. Candidates: the OpenCV DNN module, tiny-cnn, caffe-android-lib, caffe-mobile.

b. The OpenCV DNN module runs successfully on the PC; it is 7-8x slower than native Caffe, with accuracy comparable to native Caffe.

c. tiny-cnn runs successfully on the RK3288 and is faster than native Caffe, but its accuracy is far below native Caffe.

d. caffe-android-lib is a native Android build of Caffe; speed is average, it runs on Android 6.0, but the build for 4.4 fails.

e. caffe-mobile is a trimmed-down Caffe runtime; it has already been ported to the RK3288 and is fairly fast: roughly 60 ms for a simple network and 600 ms for a complex one.

In my tests on the RK3288, its accuracy matches native Caffe and it runs reasonably fast.


P.S. This is my first blog post; I hope it can help someone who needs it.
