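Caffe2's storage hierarchy, from top to bottom, is Workspace, Blob, and Tensor. A Workspace holds every Blob and every instantiated Net that exists at runtime. A Blob is a wrapper class around an object of arbitrary type, such as a Tensor, a float, or a string. A Tensor is a multi-dimensional array, similar to the Blob in Caffe1. The calls that actually allocate storage live in the Context classes, CPUContext and CUDAContext. The rest of this article walks through Caffe2's storage allocation from the bottom up.
This section focuses on CPU memory management; the GPU side will be added later.
An excerpt of CPUContext: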
class CPUContext final {
 public:
  ......
  // Allocates nbytes of memory and returns the pointer together with its deleter.
  static std::pair<void*, MemoryDeleter> New(size_t nbytes) {
    auto data_and_deleter = GetCPUAllocator()->New(nbytes);
    if (FLAGS_caffe2_report_cpu_memory_usage) {
      reporter_.New(data_and_deleter.first, nbytes);
      data_and_deleter.second = ReportAndDelete;
    }
    return data_and_deleter;
  }
  // Copies n elements of type T from src to dst.
  template <class T, class SrcContext, class DstContext>
  inline void Copy(size_t n, const T* src, T* dst) {
    if (std::is_fundamental<T>::value) {
      CopyBytes<SrcContext, DstContext>(
          n * sizeof(T),
          static_cast<const void*>(src),
          static_cast<void*>(dst));
    } else {
      for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
      }
    }
  }
  ......
 protected:
  static MemoryAllocationReporter reporter_;
  ......
};
CPUAllocator* GetCPUAllocator() {
  return g_cpu_allocator.get();
}
CPUContext has two basic jobs: allocating nbytes bytes of memory, and copying data within one Context or between different Contexts.
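The pointer-plus-deleter pair returned by New is what lets a caller hand ownership to a smart pointer without knowing which allocator produced the memory. The following is a minimal sketch of that pattern, not Caffe2 code: SketchNew, SketchFree and MakeFilledFloats are hypothetical names, and plain malloc/free stand in for the real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Same shape as the MemoryDeleter alias quoted in this article.
using MemoryDeleter = void (*)(void*);

// Hypothetical deleter: wraps free so we have a plain function pointer.
inline void SketchFree(void* ptr) { std::free(ptr); }

// Hypothetical stand-in for CPUContext::New (no alignment, no reporting).
inline std::pair<void*, MemoryDeleter> SketchNew(std::size_t nbytes) {
  return {std::malloc(nbytes), &SketchFree};
}

// Allocates n floats, fills them with value, and returns an owning handle.
// The shared_ptr remembers the deleter, so the buffer is released correctly
// no matter where the last reference goes away.
inline std::shared_ptr<void> MakeFilledFloats(std::size_t n, float value) {
  assert(n > 0 && "need at least one element");
  auto ptr_and_deleter = SketchNew(n * sizeof(float));
  std::shared_ptr<void> owner(ptr_and_deleter.first, ptr_and_deleter.second);
  if (!owner) {
    return owner;  // allocation failed
  }
  float* data = static_cast<float*>(owner.get());
  for (std::size_t i = 0; i < n; ++i) {
    data[i] = value;
  }
  return owner;
}
```

Tensor uses the same idea further down: its data_ member is a std::shared_ptr<void> that is reset with the pointer and deleter returned by Context::New.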
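GetCPUAllocator returns a raw pointer to the CPUAllocator owned by the global unique_ptr g_cpu_allocator. CPUAllocator is the abstract interface class for allocating memory. Caffe2 ships a default host-side allocator, DefaultCPUAllocator, which returns aligned memory of the requested size. You can also plug in your own, more efficient allocator, for example one modeled on the two-level allocation scheme in the STL.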
// A virtual allocator class to do memory allocation and deallocation.
struct CPUAllocator {
  CPUAllocator() {}
  virtual ~CPUAllocator() noexcept {}
  // Returns the start address of the allocated memory and a function pointer that frees it.
  virtual std::pair<void*, MemoryDeleter> New(size_t nbytes) = 0;
  virtual MemoryDeleter GetDeleter() = 0;
};
// MemoryDeleter is a plain function pointer: using MemoryDeleter = void (*)(void*);
static std::unique_ptr<CPUAllocator> g_cpu_allocator(new DefaultCPUAllocator());
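To illustrate what a custom allocator behind this interface could look like, here is a small sketch. It redeclares the interface locally so that it compiles on its own; CountingAllocator is a hypothetical class, not part of Caffe2, and how a real build would install it as the global allocator is not shown in this article, so that step is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

using MemoryDeleter = void (*)(void*);

// Local copy of the interface quoted above, so the sketch is self-contained.
struct CPUAllocator {
  CPUAllocator() {}
  virtual ~CPUAllocator() noexcept {}
  virtual std::pair<void*, MemoryDeleter> New(std::size_t nbytes) = 0;
  virtual MemoryDeleter GetDeleter() = 0;
};

// Hypothetical allocator that counts live blocks. The counter is static
// because MemoryDeleter is a bare function pointer and cannot capture `this`.
// Not thread-safe; a real allocator would need a mutex or an atomic.
class CountingAllocator final : public CPUAllocator {
 public:
  std::pair<void*, MemoryDeleter> New(std::size_t nbytes) override {
    assert(nbytes > 0);
    void* ptr = std::malloc(nbytes);
    if (ptr) {
      ++Counter();
    }
    return {ptr, GetDeleter()};
  }
  MemoryDeleter GetDeleter() override { return &CountingAllocator::Delete; }
  static int live_blocks() { return Counter(); }

 private:
  static int& Counter() {
    static int count = 0;
    return count;
  }
  static void Delete(void* ptr) {
    if (ptr) {
      --Counter();
    }
    std::free(ptr);
  }
};
```

The static counter is a direct consequence of the interface design: since the deleter is a stateless function pointer, any bookkeeping it touches has to be global or static, which is also why MemoryAllocationReporter is a static member of CPUContext.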
Caffe2's default allocator, DefaultCPUAllocator, allocates 32-byte-aligned memory by calling _aligned_malloc on MSVC (posix_memalign on Linux), and releases it by calling _aligned_free on MSVC (free on Linux).
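As a rough sketch of that platform split (not the actual DefaultCPUAllocator source), the allocation and release paths might look like the following. The names SketchAlignedNew, SketchAlignedDelete, IsAligned and kSketchAlignment are made up for this example; the 32-byte value is the one stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Assumed alignment, matching the 32 bytes mentioned in the text.
constexpr std::size_t kSketchAlignment = 32;

// Returns nbytes of memory aligned to kSketchAlignment, or nullptr on failure.
inline void* SketchAlignedNew(std::size_t nbytes) {
#if defined(_MSC_VER)
  return _aligned_malloc(nbytes, kSketchAlignment);
#else
  void* ptr = nullptr;
  if (posix_memalign(&ptr, kSketchAlignment, nbytes) != 0) {
    return nullptr;
  }
  return ptr;
#endif
}

// Releases memory with the function that matches the allocation call.
inline void SketchAlignedDelete(void* ptr) {
#if defined(_MSC_VER)
  _aligned_free(ptr);
#else
  free(ptr);
#endif
}

// True if ptr is a multiple of alignment.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

The important detail is the pairing: memory from _aligned_malloc must go back through _aligned_free, while memory from posix_memalign is released with plain free.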
A typical implementation of aligned_malloc looks like this:
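#include <cstdint>
#include <cstdlib>
void* aligned_malloc(size_t size, size_t alignment) {
  if (alignment == 0 || (alignment & (alignment - 1))) {  // alignment must be a power of two
    return nullptr;
  } else {
    // Over-allocate: room for the saved raw pointer plus worst-case padding.
    void* praw = malloc(sizeof(void*) + size + alignment);
    if (praw) {
      void* pbuf = reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(praw) + sizeof(void*));
      // Round up to the next multiple of alignment strictly after pbuf.
      void* palignedbuf = reinterpret_cast<void*>((reinterpret_cast<uintptr_t>(pbuf) | (alignment - 1)) + 1);
      // Save the raw pointer just before the aligned block so aligned_free can find it.
      (static_cast<void**>(palignedbuf))[-1] = praw;
      return palignedbuf;
    } else {
      return nullptr;
    }
  }
}
void aligned_free(void* palignedmem) {
  free((static_cast<void**>(palignedmem))[-1]);
}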
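The expression (addr | (alignment - 1)) + 1 is the heart of this routine, so a worked example may help. The sketch below isolates it in a helper (NextAlignedAfter and AlignedAfterHeader are names invented here) and checks a few concrete addresses at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Smallest multiple of alignment that is strictly greater than addr.
// Only valid when alignment is a power of two.
constexpr std::uintptr_t NextAlignedAfter(std::uintptr_t addr,
                                          std::uintptr_t alignment) {
  return (addr | (alignment - 1)) + 1;
}

// 0x1008 | 0x1F = 0x101F, and 0x101F + 1 = 0x1020.
static_assert(NextAlignedAfter(0x1008, 32) == 0x1020, "rounds up");
// An already aligned address still moves forward by a full alignment.
static_assert(NextAlignedAfter(0x1020, 32) == 0x1040, "strictly after");
// One byte before a boundary moves forward by exactly one byte.
static_assert(NextAlignedAfter(0x103F, 32) == 0x1040, "minimum step");

// Address the user would receive if the raw block started at raw:
// skip the slot for the saved pointer, then round up.
inline std::uintptr_t AlignedAfterHeader(std::uintptr_t raw,
                                         std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return NextAlignedAfter(raw + sizeof(void*), alignment);
}
```

Because the step is always between 1 and alignment bytes, the extra sizeof(void*) + alignment bytes requested from malloc are enough to hold both the saved pointer and the padding.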
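MemoryAllocationReporter is a thread-safe class that records allocation information: the start address and size of each block, and the total number of bytes currently allocated. There should be only one instance of it, so CPUContext declares it as a static member.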
class MemoryAllocationReporter {
 public:
  ......
 private:
  std::mutex mutex_;
  std::unordered_map<void*, size_t> size_table_;  // Start address of each block and its size.
  size_t allocated_;  // Total bytes currently allocated by Caffe2.
};
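The public part of the class is elided above. Based only on the members shown and on the reporter_.New(ptr, nbytes) call in CPUContext::New, a reporter of this kind could be sketched as follows. AllocationReporterSketch and its Delete and allocated methods are assumptions made for illustration, not the real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical reporter with the same three data members as the excerpt.
class AllocationReporterSketch {
 public:
  // Records a newly allocated block.
  void New(void* ptr, std::size_t nbytes) {
    assert(ptr != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);
    size_table_[ptr] = nbytes;
    allocated_ += nbytes;
  }

  // Forgets a block that is about to be freed; unknown pointers are ignored.
  void Delete(void* ptr) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = size_table_.find(ptr);
    if (it == size_table_.end()) {
      return;
    }
    allocated_ -= it->second;
    size_table_.erase(it);
  }

  // Total number of bytes currently tracked.
  std::size_t allocated() {
    std::lock_guard<std::mutex> guard(mutex_);
    return allocated_;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<void*, std::size_t> size_table_;
  std::size_t allocated_ = 0;
};
```

Every method takes the mutex before touching the table, which is what makes a single shared instance safe to call from several threads.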
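Tensor is the core data structure in Caffe2. Every Op that takes part in matrix computation, such as FCOp, ConvOp, ReluOp, and PoolOp, takes Tensors as its main inputs. A Tensor is a device-specific multi-dimensional array that wraps a contiguous block of memory together with its dimension information, much like the Blob in Caffe1 or the ndarray in numpy.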
template <class Context>
class Tensor {
 protected:
  vector<TIndex> dims_;  // Dimensions of the tensor.
  TIndex size_ = -1;  // Number of elements; the byte count is size_ * meta_.itemsize().
  TypeMeta meta_;  // data_ is untyped, so meta_ records the element type it points to.
  std::shared_ptr<void> data_;  // Owns the buffer that all reads and writes go through.
  ...
 public:
  Tensor() {}  // Creates an empty tensor.
  // Creates a tensor with the given dimensions. Resize does not allocate memory:
  // allocation is deferred (Caffe1 does the same) until the first call to mutable_data.
  explicit Tensor(const vector<int>& dims) { Resize(dims); }
  // Returns a typed const pointer to the buffer, for example data<float>(). It checks
  // that the memory has already been allocated. Like cpu_data or gpu_data on a Caffe1
  // Blob, call it when the data will not be modified.
  template <typename T>
  inline const T* data() const {
    ...  // necessary type checks
    return static_cast<T*>(data_.get());
  }
  // Like mutable_cpu_data and mutable_gpu_data on a Caffe1 Blob.
  template <typename T>
  inline T* mutable_data() {
    if ((size_ == 0 || data_.get()) && IsType<T>()) {
      return static_cast<T*>(data_.get());
    }
    return static_cast<T*>(raw_mutable_data(TypeMeta::Make<T>()));
  }
  // raw_mutable_data is the function that actually frees old memory and allocates new memory.
  inline void* raw_mutable_data(const TypeMeta& meta) {
    // For 0-size tensors it's fine to return any pointer (including nullptr)
    if (meta_ == meta && (data_.get() || size_ == 0)) {
      return data_.get();
    } else {
      bool had_special_dtor = meta_.dtor() != nullptr;
      meta_ = meta;
      CAFFE_ENFORCE_WITH_CALLER(
          size_ >= 0,
          "Tensor is not initialized. You probably need to call Resize() "
          "before calling mutable_data()");
      // We can reuse the existing buffer if the current data does not have
      // a special destructor and the new data doesn't have a special
      // constructor.
      if (size_ == 0 ||
          (meta.ctor() == nullptr && !had_special_dtor &&
           capacity_ >= size_ * meta_.itemsize())) {
        return data_.get();
      }
      // This branch is rarely taken: Tensors are mostly used for numeric work, so meta
      // is usually a fundamental type such as float, int, or double. It is only needed
      // for element types with non-trivial constructors and destructors.
      if (meta.ctor()) {
        // For types that need placement new, we will call it, as well as
        // making sure that when the data is freed, it calls the right
        // destruction procedure.
        auto size = size_;
        auto dtor = meta_.dtor();
        // Calls New on the default CPU allocator or on the CUDA allocator; returns the
        // start address of the buffer and the function pointer that frees it.
        auto ptr_and_deleter = Context::New(size_ * meta_.itemsize());
        auto deleter = ptr_and_deleter.second;  // Function pointer that frees the buffer.
        // The custom deleter mirrors ordinary object destruction: run the destructor
        // first, then free or delete the memory the object occupied.
        data_.reset(
            ptr_and_deleter.first, [size, dtor, deleter](void* ptr) -> void {
              dtor(ptr, size);  // Run the element destructors.
              deleter(ptr);  // Release the memory (free or _aligned_free).
            });  // reset releases the previous buffer and adopts the new one.
        meta_.ctor()(data_.get(), size_);
      } else {  // Ordinary numeric tensors take this branch.
        // For fundamental type, new and delete is easier.
        auto ptr_and_deleter = Context::New(size_ * meta_.itemsize());
        data_.reset(ptr_and_deleter.first, ptr_and_deleter.second);
      }
      capacity_ = size_ * meta_.itemsize();
      return data_.get();
    }
  }
};
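The deferred-allocation behaviour described in the comments can be reduced to a few lines. The sketch below is a deliberately simplified, float-only stand-in (LazyFloatTensor is a hypothetical class, and malloc replaces Context::New): Resize only records the element count, and mutable_data allocates on first use and reuses the buffer while its capacity is sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

class LazyFloatTensor {
 public:
  // Records the element count only; no memory is allocated here.
  void Resize(const std::vector<int>& dims) {
    long count = 1;
    for (int d : dims) {
      count *= d;
    }
    size_ = count;
  }

  // Allocates on first use, and again only when the buffer is too small.
  float* mutable_data() {
    assert(size_ >= 0 && "call Resize() before mutable_data()");
    const std::size_t needed = static_cast<std::size_t>(size_) * sizeof(float);
    if (size_ == 0 || (data_ && capacity_ >= needed)) {
      return static_cast<float*>(data_.get());  // reuse the existing buffer
    }
    data_.reset(std::malloc(needed), &FreeBuffer);  // old buffer is released
    capacity_ = needed;
    ++allocations_;
    return static_cast<float*>(data_.get());
  }

  bool allocated() const { return data_ != nullptr; }
  int allocations() const { return allocations_; }

 private:
  static void FreeBuffer(void* ptr) { std::free(ptr); }

  long size_ = -1;            // element count, -1 until Resize is called
  std::size_t capacity_ = 0;  // bytes held by data_
  int allocations_ = 0;       // how many times memory was actually requested
  std::shared_ptr<void> data_;
};
```

Shrinking a tensor and then asking for its data again returns the same pointer, which is the reuse path guarded by the capacity_ check in raw_mutable_data.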
The comments in the Caffe2 source describe Blob very clearly:
A Blob hosts a pointer as well as its type, and takes charge of deleting it properly when the blob is deallocated or re-allocated with a new type. A blob could contain anything, although the most common case is to contain a Tensor. (From caffe2/core/blob.h)
class Blob {
 public:
  Blob() : meta_(), pointer_(nullptr) {}
  ~Blob() { Reset(); }
  /**
   * @brief Gets the const reference of the stored object. The code checks if
   * the stored object is of the desired type.
   */
  template <class T>
  const T& Get() const {
    CAFFE_ENFORCE(IsType<T>(),
        "wrong type for the Blob instance. Blob contains ",
        meta_.name(), " while caller expects ", TypeMeta::Name<T>());
    return *static_cast<const T*>(pointer_);
  }
  /**
   * @brief Gets a mutable pointer to the stored object.
   *
   * If the current object is not of the right type, a new object is created
   * and the old object is freed. Note that type T should have a default
   * constructor. Otherwise, create the object yourself first, and use
   * Reset().
   */
  template <class T>
  T* GetMutable(bool* is_new_object = nullptr) {
    if (IsType<T>()) {
      if (is_new_object) *is_new_object = false;
      return static_cast<T*>(pointer_);
    } else {
      if (is_new_object) *is_new_object = true;
      VLOG(1) << "Create new mutable object " << TypeMeta::Name<T>();
      return Reset<T>(new T());
    }
  }
  /**
   * Sets the underlying object to the allocated one. The Blob then takes over
   * the ownership of the passed in pointer. If there is already an object in
   * the Blob, the old object is freed.
   *
   * This is used when the underlying class T does not have a default ctor, or
   * complex initializations needs to be done outside the blob.
   */
  template <class T>
  T* Reset(T* allocated) {
    if (pointer_ && destroy_) {
      destroy_(pointer_);
    }
    meta_ = TypeMeta::Make<T>();
    pointer_ = static_cast<void*>(allocated);
    destroy_ = &Destroy<T>;
    return allocated;
  }
  ...
 private:
  /**
   * @brief A destroy call that is used to properly deconstruct objects.
   */
  template <class T>
  static void Destroy(void* pointer) {
    delete static_cast<T*>(pointer);
  }
  typedef void (*DestroyCall)(void *);
  TypeMeta meta_;
  void* pointer_ = nullptr;
  DestroyCall destroy_ = nullptr;
  DISABLE_COPY_AND_ASSIGN(Blob);
};
The Blob implementation is fairly simple: it is essentially a thin wrapper, plus some operations for serialization and deserialization.
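The one non-obvious trick in Blob is how a void* can still be destroyed with the right destructor: Reset stores the address of a function template instantiation next to the pointer. The sketch below isolates that mechanism. ErasedOwner, DestroyAs and Tracked are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>

using DestroyCall = void (*)(void*);

// One instantiation per type; each knows how to delete exactly that type.
template <class T>
void DestroyAs(void* pointer) {
  delete static_cast<T*>(pointer);
}

// Hypothetical type whose destructor is observable through a counter.
struct Tracked {
  explicit Tracked(int* counter) : counter_(counter) {}
  ~Tracked() { ++*counter_; }
  int* counter_;
};

// Minimal owner of "anything": a void* plus the matching destroy function.
struct ErasedOwner {
  ErasedOwner() = default;
  ErasedOwner(const ErasedOwner&) = delete;
  ErasedOwner& operator=(const ErasedOwner&) = delete;
  ~ErasedOwner() {
    if (pointer && destroy) destroy(pointer);
  }

  // Frees the old object with its own destroy call, then adopts the new one.
  template <class T>
  void Reset(T* allocated) {
    assert(allocated != nullptr);
    if (pointer && destroy) destroy(pointer);
    pointer = allocated;
    destroy = &DestroyAs<T>;
  }

  void* pointer = nullptr;
  DestroyCall destroy = nullptr;
};
```

Replacing a Tracked object with a std::string runs the Tracked destructor exactly once, even though the owner only ever stores a void*.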
The following snippet, taken from Caffe2/binaries/tutorial_blob.cc, shows how a Blob is used: the same blob object can hold an int, a float, a double, or even a string.
Blob myblob;
int* myint = myblob.GetMutable<int>();
*myint = 10;
const int& myint_const = myblob.Get<int>();
// const float& myfloat = myblob.Get<float>();  // Wrong: throws because the types do not match.
double* mydouble = myblob.GetMutable<double>();  // Frees the 4-byte int and allocates an 8-byte double.
*mydouble = 3.14;
std::string* pvec = new std::string();
myblob.Reset(pvec); // no need to release pvec, myblob takes ownership.
Workspace is a class that holds all the related objects created during runtime: (1) all blobs, and (2) all instantiated networks. It is the owner of all these objects and deals with the scaffolding logistics.
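The workspace is where almost all Blobs and Nets in Caffe2 live, and in general a Blob can only be requested through it. Every piece of memory is registered under its own name, which serves as its key. Memory is isolated between workspaces. Every Operator constructor takes a Workspace pointer; usually there is only one workspace, so every Operator receives the same pointer. An Operator's inputs and outputs are all stored in the workspace, which makes memory optimization easier.
Core data members of Workspace: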
std::map<string, unique_ptr<Blob> > blob_map_;
std::map<string, unique_ptr<NetBase> > net_map_;
Creating a new Blob requires a name for it. The CreateBlob member function is:
Blob* Workspace::CreateBlob(const string& name) {
  if (HasBlob(name)) {
    VLOG(1) << "Blob " << name << " already exists. Skipping.";
  } else if (forwarded_blobs_.count(name)) {
    // possible if parent workspace deletes forwarded blob
    VLOG(1) << "Blob " << name << " is already forwarded from parent workspace "
            << "(blob " << forwarded_blobs_[name].second << "). Skipping.";
  } else {
    VLOG(1) << "Creating blob " << name;
    blob_map_[name] = unique_ptr<Blob>(new Blob());
  }
  // For a new Blob this returns the object created by new Blob() above;
  // otherwise it returns the existing entry from blob_map_.
  return GetBlob(name);
}
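The key property of CreateBlob is that it is idempotent per name: asking for an existing name returns the existing object. A minimal sketch of that name-to-owner map follows. WorkspaceSketch and BlobSketch are hypothetical stand-ins, and the forwarded-blob branch is left out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical payload type standing in for Blob.
struct BlobSketch {
  int payload = 0;
};

class WorkspaceSketch {
 public:
  // Returns the blob registered under name, creating it if necessary.
  BlobSketch* CreateBlob(const std::string& name) {
    assert(!name.empty() && "blobs are addressed by name");
    auto it = blob_map_.find(name);
    if (it == blob_map_.end()) {
      it = blob_map_
               .emplace(name, std::unique_ptr<BlobSketch>(new BlobSketch()))
               .first;
    }
    return it->second.get();
  }

  // Returns the blob registered under name, or nullptr if there is none.
  BlobSketch* GetBlob(const std::string& name) const {
    auto it = blob_map_.find(name);
    return it == blob_map_.end() ? nullptr : it->second.get();
  }

  bool HasBlob(const std::string& name) const {
    return blob_map_.count(name) != 0;
  }

 private:
  // The map owns every blob; callers only ever hold raw, non-owning pointers.
  std::map<std::string, std::unique_ptr<BlobSketch>> blob_map_;
};
```

Because the map holds unique_ptr values, the workspace is the sole owner, which matches the statement that it "is the owner of all these objects".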
The flow inside an Operator, from creating a Blob to actually allocating memory, is as follows:
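As a hedged reconstruction from the excerpts in this article, the call order is roughly: Workspace::CreateBlob, Blob::GetMutable, Tensor::Resize, Tensor::mutable_data, Tensor::raw_mutable_data, and finally Context::New, which forwards to the allocator. The sketch below only traces that order with hypothetical stand-in types; none of it is Caffe2 code and no real memory is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Records the order in which the stand-in functions below are entered.
static std::vector<std::string>& Trace() {
  static std::vector<std::string> steps;
  return steps;
}

struct ContextSketch {  // stands in for CPUContext
  static void New(std::size_t /*nbytes*/) { Trace().push_back("Context::New"); }
};

struct TensorSketch {  // stands in for Tensor<Context>
  std::size_t size = 0;
  bool allocated = false;
  void Resize(std::size_t n) {
    Trace().push_back("Tensor::Resize");
    size = n;  // shape only, no allocation
  }
  void raw_mutable_data() {
    Trace().push_back("Tensor::raw_mutable_data");
    if (!allocated) {
      ContextSketch::New(size * sizeof(float));
      allocated = true;
    }
  }
  void mutable_data() {
    Trace().push_back("Tensor::mutable_data");
    raw_mutable_data();
  }
};

struct BlobSketch {  // stands in for Blob holding a Tensor
  TensorSketch tensor;
  TensorSketch* GetMutable() {
    Trace().push_back("Blob::GetMutable");
    return &tensor;
  }
};

struct WorkspaceSketch {  // stands in for Workspace
  BlobSketch blob;
  BlobSketch* CreateBlob(const std::string& /*name*/) {
    Trace().push_back("Workspace::CreateBlob");
    return &blob;
  }
};

// What an operator producing one output would do, in order.
inline std::vector<std::string> RunOperatorSketch() {
  Trace().clear();
  WorkspaceSketch ws;
  BlobSketch* out = ws.CreateBlob("Y");
  TensorSketch* y = out->GetMutable();
  y->Resize(6);
  y->mutable_data();
  assert(y->allocated);
  return Trace();
}
```

Memory is requested only at the last step, which is the deferred allocation discussed in the Tensor section.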
That is the whole flow involved when Caffe2 allocates a Blob.