Multi-task learning with caffe2

Preface

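caffe2 is the successor to caffe. It is not compatible with caffe, and it addresses many of caffe's problems, such as the lack of native multi-machine training, while also improving mobile support. In short, caffe is no longer being updated and has been tagged 1.0, so switch to caffe2 soon (but TensorFlow may be the better choice).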

Problem description

A digression

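Why might TensorFlow be the better choice? Is caffe2 not good enough? Actually caffe2 is very powerful; it is just that its community feels far less active than TensorFlow's. It is maintained mainly by people at Facebook, the repository has relatively few issues, many of them get no reply or fix, and a lot of pull requests never get merged. In short, my personal feeling is that, for ease of use, anyone who has not yet committed to it should think carefully first.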

Multi-task support in caffe

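Multi-task learning is a very common requirement, whether the goal is to improve the model through joint learning or to cut inference time. caffe supports several data formats for training; the most efficient is LMDB/LevelDB, which stores key-value pairs. The key does not matter much as long as it is unique, and the value is a serialized Datum message.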

The protobuf definition of Datum is as follows:

message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}

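As you can see, label is a single value, so there is no way to store multiple labels, and the Data layer hard-codes this assumption in many places, which makes it tedious to change. You can write your own data layer instead; in an earlier project I wrote a Python layer to support multi-task training. But that is never as efficient as caffe's native layers, and once a Python layer is involved, multi-GPU training is no longer supported.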

Multi-task support in caffe2

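caffe2's native and efficient data I/O format is LMDB, as in caffe, and the value is again a serialized string. In caffe2, however, it is no longer a Datum but a new protobuf message called TensorProtos (note the plural), i.e. a list of TensorProto messages. TensorProto is defined as follows: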

message TensorProto {
  // The dimensions in the tensor.
  repeated int64 dims = 1;
  enum DataType {
    UNDEFINED = 0;
    FLOAT = 1;  // float
    INT32 = 2;  // int
    BYTE = 3;  // BYTE, when deserialized, is going to be restored as uint8.
    STRING = 4;  // string
    // Less-commonly used data types.
    BOOL = 5;  // bool
    UINT8 = 6;  // uint8_t
    INT8 = 7;  // int8_t
    UINT16 = 8;  // uint16_t
    INT16 = 9;  // int16_t
    INT64 = 10;  // int64_t
    FLOAT16 = 12;  // caffe2::__f16, caffe2::float16
    DOUBLE = 13;  // double
  }
  optional DataType data_type = 2 [default = FLOAT];
  // For float
  repeated float float_data = 3 [packed = true];
  // For int32, uint8, int8, uint16, int16, bool, and float16
  // Note about float16: in storage we will basically convert float16 byte-wise
  // to unsigned short and then store them in the int32_data field.
  repeated int32 int32_data = 4 [packed = true];
  // For bytes
  optional bytes byte_data = 5;
  // For strings
  repeated bytes string_data = 6;
  // For double
  repeated double double_data = 9 [packed = true];
  // For int64
  repeated int64 int64_data = 10 [packed = true];
  // Optionally, a name for the tensor.
  optional string name = 7;

  // Optionally, a TensorProto can contain the details about the device that
  // it was serialized from. This is useful in cases like snapshotting a whole
  // workspace in a multi-GPU environment.
  optional DeviceOption device_detail = 8;
  // When loading from chunks this is going to indicate where to put data in the
  // full array. When not used full data have to be present
  message Segment {
    required int64 begin = 1;
    required int64 end = 2;
  }
  optional Segment segment = 11;
}

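The new format is much richer and can natively support many formats and tasks, such as segmentation and detection. No particular field is designated for the label, so we have more freedom in how to store it.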
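To make the layout concrete, here is a minimal sketch of what one multi-label record might look like. It uses plain Python dataclasses as stand-ins for the protobuf messages rather than caffe2's generated classes, and it models only a few of the fields. The names `TensorProtoSketch`, `TensorProtosSketch`, `make_record`, and the age/gender values are hypothetical. It assumes the convention that the first tensor holds the image and the second holds the labels.

```python
from dataclasses import dataclass, field
from typing import List


# Stand-ins for the protobuf messages; only the fields used here are modeled.
@dataclass
class TensorProtoSketch:
    dims: List[int] = field(default_factory=list)
    data_type: str = "FLOAT"  # mirrors the DataType enum names
    int32_data: List[int] = field(default_factory=list)
    byte_data: bytes = b""


@dataclass
class TensorProtosSketch:
    protos: List[TensorProtoSketch] = field(default_factory=list)


def make_record(pixels: bytes, height: int, width: int, channels: int,
                labels: List[int]) -> TensorProtosSketch:
    """Pack one image and several integer labels into a single record."""
    if len(pixels) != height * width * channels:
        raise ValueError("pixel buffer does not match the given dimensions")
    image = TensorProtoSketch(dims=[height, width, channels],
                              data_type="BYTE", byte_data=pixels)
    # All labels for this image live in one tensor of length len(labels).
    label = TensorProtoSketch(dims=[len(labels)],
                              data_type="INT32", int32_data=list(labels))
    return TensorProtosSketch(protos=[image, label])


# Hypothetical 2x2 RGB image with an age label and a gender label.
record = make_record(bytes(12), 2, 2, 3, labels=[31, 1])
```

The point of the sketch is only that the label is an ordinary tensor, so nothing in the format limits it to a single value the way Datum's `label` field does.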

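Reading the code, you can see that caffe2 implements an ImageInput operator that reads image and label data efficiently and returns the data and the labels: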


    data, label = brew.image_input(
        model,
        reader, ["data", "label"],
        batch_size=batch_size,
        use_caffe_datum=True,
        mean=128.,
        std=128.,
        scale=256,
        crop=img_size,
        mirror=1
    )

The stock code does not support multiple label outputs, but a small change adds that. The diff is as follows:

diff --git a/caffe2/image/image_input_op.cc b/caffe2/image/image_input_op.cc
index 49ff804..dec218b 100644
--- a/caffe2/image/image_input_op.cc
+++ b/caffe2/image/image_input_op.cc
@@ -75,6 +75,7 @@ The dimension of the output image will always be cropxcrop
     .Arg("db", "Name of the database (if not passed as input)")
     .Arg("db_type", "Type of database (if not passed as input)."
          " Defaults to leveldb")
+    .Arg("label_len", "len of labels, for multi-task or multi-dim regression purpose")
     .Input(0, "reader", "The input reader (a db::DBReader)")
     .Output(0, "data", "Tensor containing the images")
     .Output(1, "label", "Tensor containing the labels");
diff --git a/caffe2/image/image_input_op.h b/caffe2/image/image_input_op.h
index 25ec5e9..4d514b2 100644
--- a/caffe2/image/image_input_op.h
+++ b/caffe2/image/image_input_op.h
@@ -84,6 +84,8 @@ class ImageInputOp final
   bool is_test_;
   bool use_caffe_datum_;
   bool gpu_transform_;
+  bool mean_std_copied_ = false;
+  int label_len_;

   // thread pool for parse + decode
   int num_decode_threads_;
@@ -117,6 +119,7 @@ ImageInputOp<Context>::ImageInputOp(
         num_decode_threads_(OperatorBase::template GetSingleArgument<int>(
               "decode_threads", 4)),
         thread_pool_(std::make_shared<TaskThreadPool>(num_decode_threads_)),
+        label_len_(OperatorBase::template GetSingleArgument<int>("label_len", 1)),
         // output type only supported with CUDA and use_gpu_transform for now
         output_type_(cast::GetCastDataType(this->arg_helper(), "output_type"))
 {
@@ -224,20 +227,6 @@ ImageInputOp<Context>::ImageInputOp(
   LOG(INFO) << "    Outputting images as "
             << OperatorBase::template GetSingleArgument<string>("output_type", "unknown") << ".";

-  if (gpu_transform_) {
-    if (!std::is_same<Context, CUDAContext>::value) {
-      throw std::runtime_error("use_gpu_transform only for GPUs");
-    } else {
-      mean_gpu_.Resize(mean_.size());
-      std_gpu_.Resize(std_.size());
-
-      context_.template Copy<float, CPUContext, Context>(
-        mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
-      context_.template Copy<float, CPUContext, Context>(
-        std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
-    }
-  }
-
   std::mt19937 meta_randgen(time(nullptr));
   for (int i = 0; i < num_decode_threads_; ++i) {
     randgen_per_thread_.emplace_back(meta_randgen());
@@ -247,7 +236,8 @@ ImageInputOp<Context>::ImageInputOp(
       TIndex(crop_),
       TIndex(crop_),
       TIndex(color_ ? 3 : 1));
-  prefetched_label_.Resize(vector<TIndex>(1, batch_size_));
+  //prefetched_label_.Resize(vector<TIndex>(label_len_, batch_size_));
+  prefetched_label_.Resize(TIndex(batch_size_), TIndex(label_len_));
 }
 template <class Context>
@@ -356,14 +346,17 @@ bool ImageInputOp<Context>::GetImageAndLabelAndInfoFromDBValue(

     if (label_proto.data_type() == TensorProto::FLOAT) {
       DCHECK_EQ(label_proto.float_data_size(), 1);
-
-      prefetched_label_.mutable_data<float>()[item_id] =
-          label_proto.float_data(0);
+      for(int t = 0; t < label_len_; t++) {
+      prefetched_label_.mutable_data<float>()[label_len_*item_id+t] =
+          label_proto.float_data(t);
+      }
     } else if (label_proto.data_type() == TensorProto::INT32) {
       DCHECK_EQ(label_proto.int32_data_size(), 1);

-      prefetched_label_.mutable_data<int>()[item_id] =
-          label_proto.int32_data(0);
+      for(int t = 0; t < label_len_; t++) {
+        prefetched_label_.mutable_data<int>()[label_len_*item_id+t] =
+          label_proto.int32_data(t);
+      }
     } else {
       LOG(FATAL) << "Unsupported label type.";
     }
@@ -690,6 +683,16 @@ bool ImageInputOp<Context>::CopyPrefetched() {
     label_output->CopyFrom(prefetched_label_, &context_);
   } else {
     if (gpu_transform_) {
+      if (!mean_std_copied_) {
+        mean_gpu_.Resize(mean_.size());
+        std_gpu_.Resize(std_.size());
+
+        context_.template Copy<float, CPUContext, Context>(
+          mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
+        context_.template Copy<float, CPUContext, Context>(
+          std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
+        mean_std_copied_ = true;
+      }
       // GPU transform kernel allows explicitly setting output type
       if (output_type_ == TensorProto_DataType_FLOAT) {
         TransformOnGPU<uint8_t,float,Context>(prefetched_image_on_device_,
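
The indexing used in the patched loops can be sketched in plain Python, as an illustration rather than the operator's actual code. The label buffer has shape (batch_size, label_len) and is stored row-major, so label `t` of item `item_id` sits at flat offset `label_len * item_id + t`. The helper name `fill_labels` and the sample values are hypothetical.

```python
from typing import List, Sequence


def fill_labels(per_item_labels: Sequence[Sequence[int]],
                label_len: int) -> List[int]:
    """Write each item's labels into a flat, row-major (batch, label_len) buffer."""
    batch_size = len(per_item_labels)
    flat = [0] * (batch_size * label_len)
    for item_id, labels in enumerate(per_item_labels):
        # Guard against records that carry fewer labels than requested.
        if len(labels) < label_len:
            raise ValueError(
                "item %d has fewer than %d labels" % (item_id, label_len))
        for t in range(label_len):
            flat[label_len * item_id + t] = labels[t]
    return flat


# Three items, each with a hypothetical (age, gender) pair.
flat = fill_labels([[31, 1], [24, 0], [58, 1]], label_len=2)
```

One thing worth noting about the diff itself: it leaves the `DCHECK_EQ(..., 1)` lines in place as context. Those assert that each label tensor holds exactly one value, so they would presumably need relaxing in any build where DCHECK is active.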

With this change, the call

    data, label = brew.image_input(
        model,
        reader, ["data","labels"],
        batch_size=batch_size,
        use_caffe_datum=False,
        mean=128.,
        std=128.,
        scale=256,
        crop=img_size,
        mirror=1,
        label_len=2
    )

outputs multiple labels per image. Then add

 model.net.Split("labels", ["label_age", "label_gender"], axis=1)

after which label_age and label_gender can each be used as a separate label when computing the loss.
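
As a rough illustration of what splitting along axis=1 does to a (batch, 2) label tensor, here is a plain-Python sketch. It is not the caffe2 operator; it assumes an equal split into columns of width 1, and the function name `split_columns` and the sample values are hypothetical.

```python
from typing import List, Tuple


def split_columns(labels: List[List[int]]) -> Tuple[List[List[int]], ...]:
    """Split a (batch, k) label matrix into k matrices of shape (batch, 1)."""
    if not labels:
        return ()
    k = len(labels[0])
    if any(len(row) != k for row in labels):
        raise ValueError("all rows must have the same number of labels")
    # Column j of every row becomes its own (batch, 1) matrix.
    return tuple([[row[j]] for row in labels] for j in range(k))


labels = [[31, 1], [24, 0], [58, 1]]
label_age, label_gender = split_columns(labels)
```

Each output keeps the batch dimension and has a trailing dimension of 1, so each task sees one label per image.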

Afterword

I have not updated this blog in a long time, partly because work has been busy and partly because I have gotten a lot lazier (:<).

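P.S. I looked at the caffe2 GitHub repository again today, and it seems multiple labels are now supported natively. Good, so this note is probably not of much use anymore (:>).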
