Caffe2核心代码解析系列之四:TypeMeta

介绍

TypeMeta是描述Tensor或Blob等所抽象的数据类型的一种抽象。简言之,它主要用来表示某种类型如T的一些特征像这种类型在整个类型系统里面的Id,它的单个元素的大小ItemSize(类似于sizeof(T),其实正是它的一种wrapper),还有就是此类型T的构造、拷贝及析构函数等。

下面我们先去直接宏观上看TypeMeta的类表示,然后再分别去看它的某些细节实现像Id的实现等。

TypeMeta

本来与TypeMeta相关的代码是放在Caffe2下面的这个地方的。当然现在这地方仍有这么两个文件存在的,不过它们只是个中转,再也不含有实际内容了。

caffe2]$ ls core/typeid
typeid.h        typeid_test.cc

在Caffe2与Pytorch融合的大趋势下,TypeMeta相关的代码现在已经被放在了Aten的目录下面,中间更是实用了大量Aten中已有的内容,像Tensor的改变那样。

ATen]$ ls core/typeid.
typeid.cpp  typeid.h

如下为TypeMeta的一个基本定义及简要注释说明。可以看出它主要提供一种类型T注册的Id,相关的Ctor/Copy/Destructor函数及表示基本元素大小的ItemSize等。

/**
 * TypeMeta is a thin class that allows us to store the type of a container such
 * as a blob, or the data type of a tensor, with a unique run-time id. It also
 * stores some additional data such as the item size and the name of the type
 * for run-time inspection.
 */
class AT_CORE_API TypeMeta {
 public:
  using PlacementNew = void(void*, size_t);
  using TypedCopy = void(const void*, void*, size_t);
  using TypedDestructor = void(void*, size_t);
  /** Create a dummy TypeMeta object. To create a TypeMeta object for a specific
   * type, use TypeMeta::Make().
   */
  TypeMeta() noexcept
      : id_(TypeIdentifier::uninitialized()),
        itemsize_(0),
        ctor_(nullptr),
        copy_(nullptr),
        dtor_(nullptr) {}

我们在定义一个类型T的TypeMeta对象时需要使用Make函数来实现,而非直接调用其构造函数,这亦是为了保证每个类型T拥有的TypeMeta对象的唯一性。

 private:
  // TypeMeta can only be created by Make, making sure that we do not
  // create incorrectly mixed up TypeMeta objects.
  TypeMeta(
      TypeIdentifier i,
      size_t s,
      PlacementNew* ctor,
      TypedCopy* copy,
      TypedDestructor* dtor) noexcept
      : id_(i), itemsize_(s), ctor_(ctor), copy_(copy), dtor_(dtor) {}

除了上面说的,还提供了此类型T所具有的name,可借RTTI等功能使用。比较有意思的是一些static template member functions的使用。使得此类可提供许多基本的静态特定类型功能函数。当然这些函数里都不会用到类非静态成员。

 /**
   * Returns the unique id for the given type T. The id is unique for the type T
   * in the sense that for any two different types, their id are different; for
   * the same type T, the id remains the same over different calls of the
   * function. However, this is not guaranteed over different runs, as the id
   * is generated during run-time. Do NOT serialize the id for storage.
   */
  template 
  AT_CORE_API static TypeIdentifier Id();

  /**
   * Returns the item size of the type. This is equivalent to sizeof(T).
   */
  template 
  static size_t ItemSize() {
    return sizeof(T);
  }

  /**
   * Returns the registered printable name of the type.
   *
   * Works for only the ones registered with CAFFE_KNOWN_TYPE
   */
  template 
  static const char* TypeName() {
    auto it = gTypeNames().find(Id());
    assert(it != gTypeNames().end());
    return it->second.c_str();
  }

  /**
   * Placement new function for the type.
   */
  template 
  static void _Ctor(void* ptr, size_t n) {
    T* typed_ptr = static_cast(ptr);
    for (size_t i = 0; i < n; ++i) {
      new (typed_ptr + i) T;
    }
  }

  template 
  static void _CtorNotDefault(void* /*ptr*/, size_t /*n*/) {
    _ThrowRuntimeTypeLogicError(
        "Type " + std::string(at::demangle_type()) +
        " is not default-constructible.");
  }

以下为TypeMeta的一些私有成员变量。在上面的介绍中已经知道其意了。


 private:
  TypeIdentifier id_;
  size_t itemsize_;
  PlacementNew* ctor_;
  TypedCopy* copy_;
  TypedDestructor* dtor_;
};

以下两个则可用来去判断两个Tensor或Blob是否具有相同的存储元素类型(需知C++是门强类型语言啊!)

inline bool operator==(const TypeMeta& lhs, const TypeMeta& rhs) noexcept {
  return (lhs.id_ == rhs.id_);
}
inline bool operator!=(const TypeMeta& lhs, const TypeMeta& rhs) noexcept {
  return !operator==(lhs, rhs);
}

在Caffe2中,我们可直接使用CAFFE_KNOW_TYPE(T)这个宏语句来注册我们自己的类型T,这是一种类型安全的做法。下面则是此种宏的实现。

#ifdef _MSC_VER
#define CAFFE_KNOWN_TYPE(T)                                               \
  template <>                                                             \
  AT_CORE_EXPORT TypeIdentifier TypeMeta::Id() {                       \
    static const TypeIdentifier type_id = TypeIdentifier::createTypeId(); \
    static TypeNameRegisterer registerer(type_id, #T);                 \
    return type_id;                                                       \
  }
#else // _MSC_VER
#define CAFFE_KNOWN_TYPE(T)                                               \
  template <>                                                             \
  TypeIdentifier TypeMeta::Id() {                                      \
    static const TypeIdentifier type_id = TypeIdentifier::createTypeId(); \
    static TypeNameRegisterer registerer(type_id, #T);                 \
    return type_id;                                                       \
  }
#endif

下面则是一种直接将某些类型设为Preallocated id的做法。当然这里的一些类型都是需要使用的已经知道的基本类型。


/**
 * CAFFE_DECLARE_KNOWN_TYPE and CAFFE_DEFINE_KNOWN_TYPE are used
 * to preallocate ids for types that are queried very often so that they
 * can be resolved at compile time. Please use CAFFE_KNOWN_TYPE() instead
 * for your own types to allocate dynamic ids for them.
 */
#ifdef _MSC_VER
#define CAFFE_DECLARE_KNOWN_TYPE(PreallocatedId, T)       \
  template <>                                             \
  inline AT_CORE_API TypeIdentifier TypeMeta::Id() {   \
    return TypeIdentifier(PreallocatedId);                \
  }
#else // _MSC_VER
#define CAFFE_DECLARE_KNOWN_TYPE(PreallocatedId, T) \
  template <>                                       \
  inline TypeIdentifier TypeMeta::Id() {         \
    return TypeIdentifier(PreallocatedId);          \
  }
#endif

以下这个list显然会随着pytorch的发展而逐渐迭进发展。

struct _CaffeHighestPreallocatedTypeId final {};

CAFFE_DECLARE_KNOWN_TYPE(0, uint8_t)
CAFFE_DECLARE_KNOWN_TYPE(1, int8_t)
CAFFE_DECLARE_KNOWN_TYPE(2, int16_t)
CAFFE_DECLARE_KNOWN_TYPE(3, int)
CAFFE_DECLARE_KNOWN_TYPE(4, int64_t)
CAFFE_DECLARE_KNOWN_TYPE(5, at::Half)
CAFFE_DECLARE_KNOWN_TYPE(6, float)
CAFFE_DECLARE_KNOWN_TYPE(7, double)
CAFFE_DECLARE_KNOWN_TYPE(8, at::ComplexHalf)
CAFFE_DECLARE_KNOWN_TYPE(9, std::complex)
CAFFE_DECLARE_KNOWN_TYPE(10, std::complex)
// 10 = undefined type id

CAFFE_DECLARE_KNOWN_TYPE(12, Tensor)
CAFFE_DECLARE_KNOWN_TYPE(13, std::string)
CAFFE_DECLARE_KNOWN_TYPE(14, bool)
CAFFE_DECLARE_KNOWN_TYPE(15, uint16_t)
CAFFE_DECLARE_KNOWN_TYPE(16, char)
CAFFE_DECLARE_KNOWN_TYPE(17, std::unique_ptr)
CAFFE_DECLARE_KNOWN_TYPE(18, std::unique_ptr>)
CAFFE_DECLARE_KNOWN_TYPE(19, std::vector)
CAFFE_DECLARE_KNOWN_TYPE(20, std::vector)
CAFFE_DECLARE_KNOWN_TYPE(21, std::vector)
CAFFE_DECLARE_KNOWN_TYPE(22, bool*)
CAFFE_DECLARE_KNOWN_TYPE(23, char*)
CAFFE_DECLARE_KNOWN_TYPE(24, int*)

#ifdef CAFFE2_UNIQUE_LONG_TYPEMETA
CAFFE_DECLARE_KNOWN_TYPE(25, long)
CAFFE_DECLARE_KNOWN_TYPE(26, std::vector)
#endif // CAFFE2_UNIQUE_LONG_TYPEMETA

CAFFE_DECLARE_KNOWN_TYPE(27, _CaffeHighestPreallocatedTypeId)

TypeIdentifier

在介绍TypeMeta提供的类型T的基本功能时像ItemSize/Ctor/Dtor/Copy/name等都比较易理解,只有表示说明T的具体类型时,为了保证其唯一性,作者使用了TypeIdentifier来对其做了wrapper,进而提供保证使类型T唯一的equility与hash函数。

下面为TypeIdentifier的一些介绍。Caffe2与Pytorch后端融合后,TypeIdentifier变为了Aten类型IdWrapper的一个子类,显然在这里是为了保证不在一个框架下出现重复的代码。TypeIdentifier只定义了一个operator<函数,为了使得它能被用于map/set中的元素里面。而IdWrapper里面则提供了些功能使得它可以被用在unordered_map/un_ordered_set里面。

/**
 * A type id is a unique id for a given C++ type.
 * You need to register your types using CAFFE_KNOWN_TYPE(MyType) to be able to
 * use TypeIdentifier with custom types. This is for example used to store the
 * dtype of tensors.
 */
class AT_CORE_API TypeIdentifier final : public at::IdWrapper {
 public:
  static TypeIdentifier createTypeId();

  friend std::ostream& operator<<(
      std::ostream& stream,
      TypeIdentifier typeId);
  friend bool operator<(TypeIdentifier lhs, TypeIdentifier rhs);

  // This is 8, because 0 is uint8_t (due to ScalarType BC constraint)
  static constexpr TypeIdentifier uninitialized() {
    return TypeIdentifier(8);
  }

 private:
  constexpr explicit TypeIdentifier(uint16_t id) : IdWrapper(id) {}
  friend class TypeMeta;
};

// Allow usage in std::map / std::set
// TODO Disallow this and rather use std::unordered_map/set everywhere
inline bool operator<(TypeIdentifier lhs, TypeIdentifier rhs) {
  return lhs.underlyingId() < rhs.underlyingId();
}

以下两句保证了它可被用于unordered_map/unordered_set当中。

namespace at {
using DataType = caffe2::TypeIdentifier;
}

AT_DEFINE_HASH_FOR_IDWRAPPER(caffe2::TypeIdentifier)

使用TypeIdentifier成员函数来生成唯一的IdentifierId。

TypeIdentifier TypeIdentifier::createTypeId() {
  static std::atomic counter(
      TypeMeta::Id<_CaffeHighestPreallocatedTypeId>().underlyingId());
  const TypeIdentifier::underlying_type new_value = ++counter;
  if (new_value ==
      std::numeric_limits::max()) {
    throw std::logic_error(
        "Ran out of available type ids. If you need more than 2^16 CAFFE_KNOWN_TYPEs, we need to increase TypeIdentifier to use more than 16 bit.");
  }
  return TypeIdentifier(new_value);
}

IdWrapper

IdWrapper是Aten中一个对象,用于实现不同类型T的唯一表示(一种C++的强类型安全保证)。它是一种类型T的wrapper,提供了其唯一的Id。

/**
 * This template simplifies generation of simple classes that wrap an id
 * in a typesafe way. Namely, you can use it to create a very lightweight
 * type that only offers equality comparators and hashing. Example:
 *
 *   struct MyIdType final : IdWrapper {
 *     constexpr explicit MyIdType(uint32_t id): IdWrapper(id) {}
 *   };
 *
 * Then in the global top level namespace:
 *
 *   AT_DEFINE_HASH_FOR_IDWRAPPER(MyIdType);
 *
 * That's it - equality operators and hash functions are automatically defined
 * for you, given the underlying type supports it.
 */
template 
class AT_CORE_API IdWrapper {
 public:
  using underlying_type = UnderlyingType;
  using concrete_type = ConcreteType;

 protected:
  constexpr explicit IdWrapper(underlying_type id) noexcept(
      noexcept(underlying_type(std::declval())))
      : id_(id) {}

  constexpr underlying_type underlyingId() const
      noexcept(noexcept(underlying_type(std::declval()))) {
    return id_;
  }

以下为保证类型T可用于unordered_map的函数实现。从此我们亦可知道若要使某一类的对象可用于unordered_map当中,那么我们就需要提供比较其equity及计算其Hash值的两个函数。

private:
  friend size_t hash_value(const concrete_type& v) {
    return std::hash()(v.id_);
  }

  // TODO Making operator== noexcept if underlying type is noexcept equality
  // comparable doesn't work with GCC 4.8.
  //      Fix this once we don't need GCC 4.8 anymore.
  friend constexpr bool operator==(
      const concrete_type& lhs,
      const concrete_type& rhs) {
    return lhs.id_ == rhs.id_;
  }

  // TODO Making operator!= noexcept if operator== is noexcept doesn't work with
  // GCC 4.8.
  //      Fix this once we don't need GCC 4.8 anymore.
  friend constexpr bool operator!=(
      const concrete_type& lhs,
      const concrete_type& rhs) {
    return !(lhs == rhs);
  }

最后是Aten中使用的一个utility宏。它是一个template specialization,用于返回我们定义的class类型的Id值的hash value。

#define AT_DEFINE_HASH_FOR_IDWRAPPER(ClassName) \
  namespace std {                               \
  template <>                                   \
  struct hash {                      \
    size_t operator()(ClassName x) const {      \
      return hash_value(x);                     \
    }                                           \
  };                                            \
  }

参考文献

  • https://github.com/pytorch/pytorch

你可能感兴趣的:(Caffe2核心代码解析系列之四:TypeMeta)