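Caffe2 is the successor to Caffe and is not compatible with it. It addresses many of Caffe's shortcomings, for example the lack of native multi-machine training, and it improves mobile support. Caffe itself is no longer updated and has been tagged 1.0, so it is time to move to Caffe2 (but TensorFlow may be the better choice).
Why might TensorFlow be the better choice? Is Caffe2 not good enough? Caffe2 is actually very powerful, but its community feels far less active than TensorFlow's. It is maintained mainly by people at Facebook; the repository does not have many issues, few of them get replies or fixes, and many pull requests never get merged. My personal impression is that, for ease of use, anyone who has not yet committed to it should think carefully.
Multi-task training is a very common requirement, whether the goal is to improve the model through joint learning or to cut inference time. Caffe supports several data formats for training; the most efficient are LMDB and LevelDB, which store key-value pairs. The key does not matter as long as it is unique, and the value is a serialized Datum struct.
The protobuf definition of Datum is: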
message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}
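As you can see, label is a single value, so you cannot store multiple labels, and the Data layer hard-codes this assumption in many places, which makes it tedious to change. You can write your own data layer instead; in an earlier project I wrote a Python layer to support multi-task training. But that is never as efficient as Caffe's native path, and once a Python layer is involved, multi-GPU training is no longer supported.
Caffe2's native and efficient data I/O format is LMDB, as in Caffe, and the value is again a serialized string. In Caffe2, however, the value is no longer a Datum but a new protobuf message called TensorProtos (note the plural), which is a list of TensorProto messages. TensorProto is defined as follows: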
message TensorProto {
  // The dimensions in the tensor.
  repeated int64 dims = 1;
  enum DataType {
    UNDEFINED = 0;
    FLOAT = 1; // float
    INT32 = 2; // int
    BYTE = 3; // BYTE, when deserialized, is going to be restored as uint8.
    STRING = 4; // string
    // Less-commonly used data types.
    BOOL = 5; // bool
    UINT8 = 6; // uint8_t
    INT8 = 7; // int8_t
    UINT16 = 8; // uint16_t
    INT16 = 9; // int16_t
    INT64 = 10; // int64_t
    FLOAT16 = 12; // caffe2::__f16, caffe2::float16
    DOUBLE = 13; // double
  }
  optional DataType data_type = 2 [default = FLOAT];
  // For float
  repeated float float_data = 3 [packed = true];
  // For int32, uint8, int8, uint16, int16, bool, and float16
  // Note about float16: in storage we will basically convert float16 byte-wise
  // to unsigned short and then store them in the int32_data field.
  repeated int32 int32_data = 4 [packed = true];
  // For bytes
  optional bytes byte_data = 5;
  // For strings
  repeated bytes string_data = 6;
  // For double
  repeated double double_data = 9 [packed = true];
  // For int64
  repeated int64 int64_data = 10 [packed = true];
  // Optionally, a name for the tensor.
  optional string name = 7;
  // Optionally, a TensorProto can contain the details about the device that
  // it was serialized from. This is useful in cases like snapshotting a whole
  // workspace in a multi-GPU environment.
  optional DeviceOption device_detail = 8;
  // When loading from chunks this is going to indicate where to put data in the
  // full array. When not used full data have to be present
  message Segment {
    required int64 begin = 1;
    required int64 end = 2;
  }
  optional Segment segment = 11;
}
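The new format is much richer and can natively support many data layouts and tasks, such as segmentation and detection. No field is designated for the label, so we have more freedom in how we store it.
Reading the code shows that Caffe2 implements an ImageInput operator for reading images and labels efficiently; the operator returns the data and the label: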
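To make the record layout concrete, here is a minimal sketch of what one database value could look like when several labels are stored together. The classes `TensorProtoSketch` and `TensorProtosSketch` and the helper `make_record` are hypothetical stand-ins written for this note, not the generated Caffe2 protobuf classes; they only mirror a few field names from the definition above. The ordering (image first, label second) and the use of `string_data` for an encoded image are assumptions about how the reader consumes the record.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins that mirror a few TensorProto / TensorProtos fields.
# They are NOT the generated protobuf classes; they only illustrate the layout.
FLOAT, INT32, STRING = 1, 2, 4  # enum values taken from the definition above


@dataclass
class TensorProtoSketch:
    data_type: int = FLOAT
    dims: List[int] = field(default_factory=list)
    float_data: List[float] = field(default_factory=list)
    int32_data: List[int] = field(default_factory=list)
    string_data: List[bytes] = field(default_factory=list)


@dataclass
class TensorProtosSketch:
    protos: List[TensorProtoSketch] = field(default_factory=list)


def make_record(encoded_image: bytes, labels: List[int]) -> TensorProtosSketch:
    """Build one DB value: an encoded image followed by a label vector."""
    # Assumption: the image tensor comes first and holds the encoded bytes.
    image = TensorProtoSketch(data_type=STRING, string_data=[encoded_image])
    # Assumption: all task labels share one INT32 tensor of length len(labels).
    label = TensorProtoSketch(
        data_type=INT32, dims=[len(labels)], int32_data=list(labels)
    )
    return TensorProtosSketch(protos=[image, label])


# Made-up example values: age = 31, gender = 1.
record = make_record(b"<jpeg bytes>", [31, 1])
print(record.protos[1].dims, record.protos[1].int32_data)
```

Because nothing in the message reserves a field for the label, putting every task's label into one repeated field is a choice made here for illustration; separate tensors per task would be another option.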
data, label = brew.image_input(
    model,
    reader, ["data", "label"],
    batch_size=batch_size,
    use_caffe_datum=True,
    mean=128.,
    std=128.,
    scale=256,
    crop=img_size,
    mirror=1
)
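The stock code does not support multiple labels in the output, but a small change adds support. Note that the DCHECK_EQ(..., 1) context lines are left untouched by this diff: they still assert a single label value, so they would need to be relaxed for label_len greater than 1 in builds where DCHECK is active. The diff is below: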
diff --git a/caffe2/image/image_input_op.cc b/caffe2/image/image_input_op.cc
index 49ff804..dec218b 100644
--- a/caffe2/image/image_input_op.cc
+++ b/caffe2/image/image_input_op.cc
@@ -75,6 +75,7 @@ The dimension of the output image will always be cropxcrop
.Arg("db", "Name of the database (if not passed as input)")
.Arg("db_type", "Type of database (if not passed as input)."
" Defaults to leveldb")
+ .Arg("label_len", "len of labels, for multi-task or multi-dim regression purpose")
.Input(0, "reader", "The input reader (a db::DBReader)")
.Output(0, "data", "Tensor containing the images")
.Output(1, "label", "Tensor containing the labels");
diff --git a/caffe2/image/image_input_op.h b/caffe2/image/image_input_op.h
index 25ec5e9..4d514b2 100644
--- a/caffe2/image/image_input_op.h
+++ b/caffe2/image/image_input_op.h
@@ -84,6 +84,8 @@ class ImageInputOp final
bool is_test_;
bool use_caffe_datum_;
bool gpu_transform_;
+ bool mean_std_copied_ = false;
+ int label_len_;
// thread pool for parse + decode
int num_decode_threads_;
@@ -117,6 +119,7 @@ ImageInputOp<Context>::ImageInputOp(
num_decode_threads_(OperatorBase::template GetSingleArgument<int>(
"decode_threads", 4)),
thread_pool_(std::make_shared<TaskThreadPool>(num_decode_threads_)),
+ label_len_(OperatorBase::template GetSingleArgument<int>("label_len", 1)),
// output type only supported with CUDA and use_gpu_transform for now
output_type_(cast::GetCastDataType(this->arg_helper(), "output_type"))
{
@@ -224,20 +227,6 @@ ImageInputOp<Context>::ImageInputOp(
LOG(INFO) << " Outputting images as "
<< OperatorBase::template GetSingleArgument<string>("output_type", "unknown") << ".";
- if (gpu_transform_) {
- if (!std::is_same<Context, CUDAContext>::value) {
- throw std::runtime_error("use_gpu_transform only for GPUs");
- } else {
- mean_gpu_.Resize(mean_.size());
- std_gpu_.Resize(std_.size());
-
- context_.template Copy<float, CPUContext, Context>(
- mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
- context_.template Copy<float, CPUContext, Context>(
- std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
- }
- }
-
std::mt19937 meta_randgen(time(nullptr));
for (int i = 0; i < num_decode_threads_; ++i) {
randgen_per_thread_.emplace_back(meta_randgen());
@@ -247,7 +236,8 @@ ImageInputOp<Context>::ImageInputOp(
TIndex(crop_),
TIndex(crop_),
TIndex(color_ ? 3 : 1));
- prefetched_label_.Resize(vector<TIndex>(1, batch_size_));
+ //prefetched_label_.Resize(vector<TIndex>(label_len_, batch_size_));
+ prefetched_label_.Resize(TIndex(batch_size_), TIndex(label_len_));
}
template <class Context>
@@ -356,14 +346,17 @@ bool ImageInputOp<Context>::GetImageAndLabelAndInfoFromDBValue(
if (label_proto.data_type() == TensorProto::FLOAT) {
DCHECK_EQ(label_proto.float_data_size(), 1);
-
- prefetched_label_.mutable_data<float>()[item_id] =
- label_proto.float_data(0);
+ for(int t = 0; t < label_len_; t++) {
+ prefetched_label_.mutable_data<float>()[label_len_*item_id+t] =
+ label_proto.float_data(t);
+ }
} else if (label_proto.data_type() == TensorProto::INT32) {
DCHECK_EQ(label_proto.int32_data_size(), 1);
- prefetched_label_.mutable_data<int>()[item_id] =
- label_proto.int32_data(0);
+ for(int t = 0; t < label_len_; t++) {
+ prefetched_label_.mutable_data<int>()[label_len_*item_id+t] =
+ label_proto.int32_data(t);
+ }
} else {
LOG(FATAL) << "Unsupported label type.";
}
@@ -690,6 +683,16 @@ bool ImageInputOp<Context>::CopyPrefetched() {
label_output->CopyFrom(prefetched_label_, &context_);
} else {
if (gpu_transform_) {
+ if (!mean_std_copied_) {
+ mean_gpu_.Resize(mean_.size());
+ std_gpu_.Resize(std_.size());
+
+ context_.template Copy<float, CPUContext, Context>(
+ mean_.size(), mean_.data(), mean_gpu_.template mutable_data<float>());
+ context_.template Copy<float, CPUContext, Context>(
+ std_.size(), std_.data(), std_gpu_.template mutable_data<float>());
+ mean_std_copied_ = true;
+ }
// GPU transform kernel allows explicitly setting output type
if (output_type_ == TensorProto_DataType_FLOAT) {
TransformOnGPU<float,Context>(prefetched_image_on_device_,
After this change, the call
data, label = brew.image_input(
    model,
    reader, ["data", "labels"],
    batch_size=batch_size,
    use_caffe_datum=False,
    mean=128.,
    std=128.,
    scale=256,
    crop=img_size,
    mirror=1,
    label_len=2
)
outputs a label blob holding multiple labels per sample. Then split it:
model.net.Split("labels", ["label_age", "label_gender"], axis=1)
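After that, label_age and label_gender can each be used as a separate label for computing a loss.
I have not updated this blog in a long time, partly because work has been busy and partly because I have gotten a lot lazier (:<).
P.S. I looked at the Caffe2 GitHub repository again today, and it seems multiple labels are already supported natively. Good, though that probably makes this note rather useless (:>).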
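The index arithmetic in the patched loop and the effect of the split can be checked without Caffe2. The sketch below is plain Python, not the operator itself: `pack_labels` and `split_columns` are hypothetical helpers that reproduce the `label_len_*item_id+t` write pattern from the diff and then pull out one column per task, which is what splitting a (batch_size, label_len) blob along axis 1 amounts to. The sample values are made up.

```python
from typing import List, Sequence


def pack_labels(per_item: Sequence[Sequence[int]], label_len: int) -> List[int]:
    """Write labels into a flat row-major buffer, as the patched loop does."""
    flat = [0] * (len(per_item) * label_len)
    for item_id, labels in enumerate(per_item):
        for t in range(label_len):
            flat[label_len * item_id + t] = labels[t]
    return flat


def split_columns(
    flat: Sequence[int], batch_size: int, label_len: int
) -> List[List[List[int]]]:
    """Mimic a split along axis=1: one (batch_size, 1) column per task."""
    return [
        [[flat[label_len * i + t]] for i in range(batch_size)]
        for t in range(label_len)
    ]


batch = [(31, 1), (24, 0), (58, 1)]  # made-up (age, gender) pairs
flat = pack_labels(batch, label_len=2)
label_age, label_gender = split_columns(flat, batch_size=3, label_len=2)
print(flat)          # [31, 1, 24, 0, 58, 1]
print(label_age)     # [[31], [24], [58]]
print(label_gender)  # [[1], [0], [1]]
```

Each split output keeps a trailing dimension of size 1. Depending on the loss operator used afterwards, that dimension may need to be dropped first.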