Introduction to the Dependency Libraries in MXNet and Their Basic Usage

MXNet is an open-source deep learning framework whose core is implemented in C++. Building it from source requires several other open-source libraries. This post gives a brief introduction to each of MXNet's dependencies:

1. OpenBLAS: short for Open Basic Linear Algebra Subprograms, an open-source, optimized, high-performance multi-core BLAS library covering matrix-matrix, matrix-vector, and vector-vector operations. It is released under the BSD-3-Clause license and can therefore be used commercially; the latest release at the time of writing is 0.3.3. Its source code is hosted on GitHub and is actively maintained by Zhang Xianyi and others.

OpenBLAS was started by the Parallel Software and Computational Science Laboratory of the Institute of Software, Chinese Academy of Sciences, as a high-performance open-source BLAS implementation based on GotoBLAS2 1.13 (BSD version).

BLAS itself is an application programming interface (API) standard for numerical libraries that provide basic linear algebra operations such as vector and matrix multiplication. First published in 1979, it serves as a building block for larger numerical packages such as LAPACK and is widely used in high-performance computing.
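
To illustrate the style of the CBLAS interface that OpenBLAS implements, here is a minimal sketch (my own example, not part of openblas_test.cpp) that multiplies a 2x3 matrix by a 3x2 matrix with cblas_dgemm:

#include <cblas.h>
#include <cstdio>

// Minimal CBLAS example: C = 1.0 * A * B + 0.0 * C, row-major storage.
int test_cblas_dgemm_sketch()
{
	const int M = 2, N = 2, K = 3;
	double A[M * K] = { 1, 2, 3,
	                    4, 5, 6 };   // 2 x 3
	double B[K * N] = { 1, 0,
	                    0, 1,
	                    1, 1 };      // 3 x 2
	double C[M * N] = { 0 };         // 2 x 2 result

	cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
	            M, N, K, 1.0, A, K, B, N, 0.0, C, N);

	for (int i = 0; i < M; ++i) {
		for (int j = 0; j < N; ++j)
			fprintf(stdout, "%.2f ", C[i * N + j]);
		fprintf(stdout, "\n");
	}
	return 0;
}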

The test code is as follows (openblas_test.cpp):

#include "openblas_test.hpp"
#include <cblas.h>
#include <cstdio>
#include <cstdlib>

int test_openblas_1()
{
	int th_model = openblas_get_parallel();
	switch (th_model) {
	case OPENBLAS_SEQUENTIAL:
		printf("OpenBLAS is compiled sequentially.\n");
		break;
	case OPENBLAS_THREAD:
		printf("OpenBLAS is compiled using the normal threading model\n");
		break;
	case OPENBLAS_OPENMP:
		printf("OpenBLAS is compiled using OpenMP\n");
		break;
	}

	int n = 2;
	double* x = (double*)malloc(n*sizeof(double));
	double* upperTriangleResult = (double*)malloc(n*(n + 1)*sizeof(double) / 2);

	// NOTE: the original listing is truncated at this point; the remainder is a
	// minimal, assumed completion that fills x and performs a packed symmetric
	// rank-1 update (cblas_dspr), matching the packed result buffer allocated above.
	for (int j = 0; j < n; ++j) {
		x[j] = j + 1.0;
	}
	for (int j = 0; j < n * (n + 1) / 2; ++j) {
		upperTriangleResult[j] = 0.0;
	}

	cblas_dspr(CblasRowMajor, CblasUpper, n, 1.0, x, 1, upperTriangleResult);

	for (int j = 0; j < n * (n + 1) / 2; ++j) {
		printf("%f ", upperTriangleResult[j]);
	}
	printf("\n");

	free(upperTriangleResult);
	free(x);

	return 0;
}

The execution result is as follows:

(Screenshot: execution result of test_openblas_1)

2. DLPack: consists of a single header file, dlpack.h. DLPack is an open in-memory tensor structure for sharing tensors between frameworks such as TensorFlow, PyTorch, and MXNet without any data copying.

dlpack.h defines two enums and four structs:

Enum DLDeviceType: the supported device types, including CPU, CUDA GPU, OpenCL, Apple GPU, AMD GPU, and others.

Enum DLDataTypeCode: the supported data type codes: signed int, unsigned int, and float.

Struct DLContext: a device context for tensors and operators; its members are the device type and the device id.

Struct DLDataType: the data type of a tensor; its members are the type code (which must be one of the DLDataTypeCode values), the number of bits (which can be 8, 16, or 32), and the number of lanes of the type.

Struct DLTensor: a plain tensor object that does not manage memory; its members are the data pointer (void*), a DLContext, the number of dimensions, a DLDataType, the tensor shape, the tensor strides, and the byte offset to the start of the data.

Struct DLManagedTensor: manages the memory of a DLTensor. (A minimal usage sketch of these structures follows below.)
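
To make the relationship between these structures concrete, here is a minimal sketch (my own illustration, not code from MXNet or DLPack) that describes an existing CPU float buffer as a DLTensor without copying it; the struct, field, and enum names (DLTensor, DLContext, kDLCPU, kDLFloat) are the ones declared in dlpack.h:

#include "dlpack/dlpack.h"  // adjust the include path to wherever dlpack.h lives
#include <cstdint>

// Describe an existing 2 x 3 row-major float buffer as a DLTensor.
// The DLTensor neither copies nor owns the memory; it only points at it.
DLTensor describe_cpu_buffer(float* data, int64_t* shape /* e.g. int64_t shape[2] = {2, 3} */)
{
	DLTensor t;
	t.data = data;                  // raw pointer to the caller's memory
	t.ctx.device_type = kDLCPU;     // device type from DLDeviceType
	t.ctx.device_id = 0;            // device id
	t.ndim = 2;                     // number of dimensions
	t.dtype.code = kDLFloat;        // type code from DLDataTypeCode
	t.dtype.bits = 32;              // 32-bit elements
	t.dtype.lanes = 1;              // scalar (non-vector) elements
	t.shape = shape;                // points at the caller's shape array
	t.strides = nullptr;            // nullptr means compact row-major layout
	t.byte_offset = 0;              // data starts at the beginning of the buffer
	return t;
}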

3. MShadow: short for Matrix Shadow, a lightweight CPU/GPU matrix and tensor template library implemented in C++/CUDA. It is header-only (all files are .h or .cuh), so it can be used by simply including it. Note: if MSHADOW_STAND_ALONE is not added to the preprocessor definitions in the project properties, additional support from CBLAS, MKL, or CUDA is required. If MSHADOW_STAND_ALONE is defined so that no other library is needed, some functions are left unimplemented; for example, in dot_engine-inl.h the function bodies contain the statement: LOG(FATAL) << "Not implemented!";

For the tests here, the MSHADOW_STAND_ALONE macro is left off and only the MSHADOW_USE_CBLAS macro is enabled, as sketched below.
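
A minimal sketch of that configuration is shown below; the macro names are the ones checked in mshadow's headers, and in practice they are usually supplied as preprocessor definitions in the project settings rather than written in source (the values shown are simply the assumption used for these tests):

// Configuration assumed for the tests below: CBLAS backend, no MKL/CUDA,
// and MSHADOW_STAND_ALONE left undefined. Equivalent compiler flags:
//   -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_CUDA=0
#define MSHADOW_USE_CBLAS 1
#define MSHADOW_USE_MKL   0
#define MSHADOW_USE_CUDA  0
#include "mshadow/tensor.h"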

The test code is as follows (mshadow_test.cpp):

#include "mshadow_test.hpp"
#include <cstdio>
#include <iostream>
#include "mshadow/tensor.h"

// reference: mshadow source code: mshadow/guide

int test_mshadow_1()
{
	// initialize tensor engine before using tensor operations
	mshadow::InitTensorEngine<mshadow::cpu>();

	// assume we have a float space
	float data[20];
	// create a 2 x 5 x 2 tensor, from existing space
	mshadow::Tensor<mshadow::cpu, 3> ts(data, mshadow::Shape3(2, 5, 2));
	// take first subscript of the tensor
	mshadow::Tensor<mshadow::cpu, 2> mat = ts[0];
	// Tensor object is only a handle, assignment means they have same data content
	// we can specify the content type of a Tensor; if not specified, it is float by default
	mshadow::Tensor<mshadow::cpu, 2, float> mat2 = mat;
	mat = mshadow::Tensor<mshadow::cpu, 1>(data, mshadow::Shape1(10)).FlatTo2D();

	// shape of matrix, note size order is same as numpy
	fprintf(stdout, "%u X %u matrix\n", mat.size(0), mat.size(1));

	// initialize all element to zero
	mat = 0.0f;
	// assign some values
	mat[0][1] = 1.0f; mat[1][0] = 2.0f;
	// elementwise operations
	mat += (mat + 10.0f) / 10.0f + 2.0f;

	// print out matrix, note: mat2 and mat are handles (pointers) to the same data
	for (mshadow::index_t i = 0; i < mat.size(0); ++i) {
		for (mshadow::index_t j = 0; j < mat.size(1); ++j) {
			fprintf(stdout, "%.2f ", mat2[i][j]);
		}
		fprintf(stdout, "\n");
	}

	mshadow::TensorContainer<mshadow::cpu, 2> lhs(mshadow::Shape2(2, 3)), rhs(mshadow::Shape2(2, 3)), ret(mshadow::Shape2(2, 2));
	lhs = 1.0;
	rhs = 1.0;
	ret = mshadow::expr::implicit_dot(lhs, rhs.T());
	mshadow::VectorDot(ret[0].Slice(0, 1), lhs[0], rhs[0]);
	fprintf(stdout, "vdot=%f\n", ret[0][0]);
	int cnt = 0;
	for (mshadow::index_t i = 0; i < ret.size(0); ++i) {
		for (mshadow::index_t j = 0; j < ret.size(1); ++j) {
			fprintf(stdout, "%.2f ", ret[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n");

	for (mshadow::index_t i = 0; i < lhs.size(0); ++i) {
		for (mshadow::index_t j = 0; j < lhs.size(1); ++j) {
			lhs[i][j] = cnt++;
			fprintf(stdout, "%.2f ", lhs[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n");

	mshadow::TensorContainer<mshadow::cpu, 1> index(mshadow::Shape1(2)), choosed(mshadow::Shape1(2));
	index[0] = 1; index[1] = 2;
	choosed = mshadow::expr::mat_choose_row_element(lhs, index);
	for (mshadow::index_t i = 0; i < choosed.size(0); ++i) {
		fprintf(stdout, "%.2f ", choosed[i]);
	}
	fprintf(stdout, "\n");

	mshadow::TensorContainer<mshadow::cpu, 2> recover_lhs(mshadow::Shape2(2, 3)), small_mat(mshadow::Shape2(2, 3));
	small_mat = -100.0f;
	recover_lhs = mshadow::expr::mat_fill_row_element(small_mat, choosed, index);
	for (mshadow::index_t i = 0; i < recover_lhs.size(0); ++i) {
		for (mshadow::index_t j = 0; j < recover_lhs.size(1); ++j) {
			fprintf(stdout, "%.2f ", recover_lhs[i][j] - lhs[i][j]);
		}
	}
	fprintf(stdout, "\n");

	rhs = mshadow::expr::one_hot_encode(index, 3);

	for (mshadow::index_t i = 0; i < lhs.size(0); ++i) {
		for (mshadow::index_t j = 0; j < lhs.size(1); ++j) {
			fprintf(stdout, "%.2f ", rhs[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n");
	mshadow::TensorContainer<mshadow::cpu, 1> idx(mshadow::Shape1(3));
	idx[0] = 8;
	idx[1] = 0;
	idx[2] = 1;

	mshadow::TensorContainer<mshadow::cpu, 2> weight(mshadow::Shape2(10, 5));
	mshadow::TensorContainer<mshadow::cpu, 2> embed(mshadow::Shape2(3, 5));

	for (mshadow::index_t i = 0; i < weight.size(0); ++i) {
		for (mshadow::index_t j = 0; j < weight.size(1); ++j) {
			weight[i][j] = i;
		}
	}
	embed = mshadow::expr::take(idx, weight);
	for (mshadow::index_t i = 0; i < embed.size(0); ++i) {
		for (mshadow::index_t j = 0; j < embed.size(1); ++j) {
			fprintf(stdout, "%.2f ", embed[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n\n");
	weight = mshadow::expr::take_grad(idx, embed, 10);
	for (mshadow::index_t i = 0; i < weight.size(0); ++i) {
		for (mshadow::index_t j = 0; j < weight.size(1); ++j) {
			fprintf(stdout, "%.2f ", weight[i][j]);
		}
		fprintf(stdout, "\n");
	}

	fprintf(stdout, "upsampling\n");

#ifdef small
#undef small
#endif

	mshadow::TensorContainer<mshadow::cpu, 2> small(mshadow::Shape2(2, 2));
	small[0][0] = 1.0f;
	small[0][1] = 2.0f;
	small[1][0] = 3.0f;
	small[1][1] = 4.0f;
	mshadow::TensorContainer<mshadow::cpu, 2> large(mshadow::Shape2(6, 6));
	large = mshadow::expr::upsampling_nearest(small, 3);
	for (mshadow::index_t i = 0; i < large.size(0); ++i) {
		for (mshadow::index_t j = 0; j < large.size(1); ++j) {
			fprintf(stdout, "%.2f ", large[i][j]);
		}
		fprintf(stdout, "\n");
	}
	small = mshadow::expr::pool<mshadow::red::maximum>(large, small.shape_, 3, 3, 3, 3);
	for (mshadow::index_t i = 0; i < small.size(0); ++i) {
		for (mshadow::index_t j = 0; j < small.size(1); ++j) {
			fprintf(stdout, "%.2f ", small[i][j]);
		}
		fprintf(stdout, "\n");
	}

	fprintf(stdout, "mask\n");
	mshadow::TensorContainer<mshadow::cpu, 2> mask_data(mshadow::Shape2(6, 8));
	mshadow::TensorContainer<mshadow::cpu, 2> mask_out(mshadow::Shape2(6, 8));
	mshadow::TensorContainer<mshadow::cpu, 1> mask_src(mshadow::Shape1(6));

	mask_data = 1.0f;
	for (int i = 0; i < 6; ++i) {
		mask_src[i] = static_cast<float>(i);
	}
	mask_out = mshadow::expr::mask(mask_src, mask_data);
	for (mshadow::index_t i = 0; i < mask_out.size(0); ++i) {
		for (mshadow::index_t j = 0; j < mask_out.size(1); ++j) {
			fprintf(stdout, "%.2f ", mask_out[i][j]);
		}
		fprintf(stdout, "\n");
	}

	// shutdown tensor engine after usage
	mshadow::ShutdownTensorEngine<mshadow::cpu>();

	return 0;
}

// user defined unary operator addone
struct addone {
	// map can be template function
	template<typename DType>
	MSHADOW_XINLINE static DType Map(DType a) {
		return a + static_cast<DType>(1);
	}
};
// user defined binary operator max of two
struct maxoftwo {
	// map can also be normal functions,
	// however, this can only be applied to float tensor
	MSHADOW_XINLINE static float Map(float a, float b) {
		if (a > b) return a;
		else return b;
	}
};

int test_mshadow_2()
{
	// initialize tensor engine before using tensor operations; needed for CuBLAS
	mshadow::InitTensorEngine<mshadow::cpu>();
	// create a stream used by the tensor operations below
	mshadow::Stream<mshadow::cpu> *stream_ = mshadow::NewStream<mshadow::cpu>(0);
	mshadow::Tensor<mshadow::cpu, 2, float> mat = mshadow::NewTensor<mshadow::cpu>(mshadow::Shape2(2, 3), 0.0f, stream_);
	mshadow::Tensor<mshadow::cpu, 2, float> mat2 = mshadow::NewTensor<mshadow::cpu>(mshadow::Shape2(2, 3), 0.0f, stream_);

	mat[0][0] = -2.0f;
	mat = mshadow::expr::F<maxoftwo>(mshadow::expr::F<addone>(mat) + 0.5f, mat2);

	for (mshadow::index_t i = 0; i < mat.size(0); ++i) {
		for (mshadow::index_t j = 0; j < mat.size(1); ++j) {
			fprintf(stdout, "%.2f ", mat[i][j]);
		}
		fprintf(stdout, "\n");
	}

	mshadow::FreeSpace(&mat); mshadow::FreeSpace(&mat2);
	mshadow::DeleteStream(stream_);
	// shutdown tensor engine after usage
	mshadow::ShutdownTensorEngine<mshadow::cpu>();

	return 0;
}

The execution result of test_mshadow_2 is as follows:

(Screenshot: execution result of test_mshadow_2)

4. DMLC-Core: short for Distributed Machine Learning Common Codebase, the base module that supports all DMLC projects; it provides the common building blocks for writing efficient and scalable distributed machine learning libraries.
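
Besides the parameter and registry facilities exercised in the test code below, dmlc-core also provides common utilities such as logging and runtime checks in dmlc/logging.h; a minimal sketch (my own example, not part of dmlc_test.cpp):

#include "dmlc/logging.h"

// CHECK_* aborts with a message when the condition fails; LOG(INFO) writes a log line.
int test_dmlc_logging_sketch(int n)
{
	CHECK_GE(n, 0) << "n must be non-negative, got " << n;
	LOG(INFO) << "processing " << n << " items";
	return 0;
}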

The test code is as follows (dmlc_test.cpp):

#include "dmlc_test.hpp"
#include <cstdio>
#include <string>
#include <map>
#include <functional>
#include "dmlc/parameter.h"
#include "dmlc/registry.h"

// reference: dmlc-core/example and dmlc-core/test

struct MyParam : public dmlc::Parameter<MyParam> {
	float learning_rate;
	int num_hidden;
	int activation;
	std::string name;
	// declare parameters in header file
	DMLC_DECLARE_PARAMETER(MyParam) {
		DMLC_DECLARE_FIELD(num_hidden).set_range(0, 1000)
			.describe("Number of hidden unit in the fully connected layer.");
		DMLC_DECLARE_FIELD(learning_rate).set_default(0.01f)
			.describe("Learning rate of SGD optimization.");
		DMLC_DECLARE_FIELD(activation).add_enum("relu", 1).add_enum("sigmoid", 2)
			.describe("Activation function type.");
		DMLC_DECLARE_FIELD(name).set_default("mnet")
			.describe("Name of the net.");

		// user can also set nhidden besides num_hidden
		DMLC_DECLARE_ALIAS(num_hidden, nhidden);
		DMLC_DECLARE_ALIAS(activation, act);
	}
};

// register it in cc file
DMLC_REGISTER_PARAMETER(MyParam);

int test_dmlc_parameter()
{
	int argc = 4;
	char* argv[4] = {
#ifdef _DEBUG
		"E:/GitCode/MXNet_Test/lib/dbg/x64/ThirdPartyLibrary_Test.exe",
#else
		"E:/GitCode/MXNet_Test/lib/rel/x64/ThirdPartyLibrary_Test.exe",
#endif
		"num_hidden=100",
		"name=aaa",
		"activation=relu"
	};

	MyParam param;
	std::map<std::string, std::string> kwargs;
	for (int i = 0; i < argc; ++i) {
		char name[256], val[256];
		if (sscanf(argv[i], "%[^=]=%[^\n]", name, val) == 2) {
			kwargs[name] = val;
		}
	}
	fprintf(stdout, "Docstring\n---------\n%s", MyParam::__DOC__().c_str());

	fprintf(stdout, "start to set parameters ...\n");
	param.Init(kwargs);
	fprintf(stdout, "-----\n");
	fprintf(stdout, "param.num_hidden=%d\n", param.num_hidden);
	fprintf(stdout, "param.learning_rate=%f\n", param.learning_rate);
	fprintf(stdout, "param.name=%s\n", param.name.c_str());
	fprintf(stdout, "param.activation=%d\n", param.activation);

	return 0;
}

namespace tree {
	struct Tree {
		virtual void Print() = 0;
		virtual ~Tree() {}
	};

	struct BinaryTree : public Tree {
		virtual void Print() {
			printf("I am binary tree\n");
		}
	};

	struct AVLTree : public Tree {
		virtual void Print() {
			printf("I am AVL tree\n");
		}
	};
	// registry to get the trees
	struct TreeFactory
		: public dmlc::FunctionRegEntryBase<TreeFactory, std::function<Tree*()> > {
	};

#define REGISTER_TREE(Name)                                             \
  DMLC_REGISTRY_REGISTER(::tree::TreeFactory, TreeFactory, Name)        \
  .set_body([]() { return new Name(); } )

	DMLC_REGISTRY_FILE_TAG(my_tree);

}  // namespace tree

// usually this sits on a separate file
namespace dmlc {
	DMLC_REGISTRY_ENABLE(tree::TreeFactory);
}

namespace tree {
	// Register the trees, can be in separate files
	REGISTER_TREE(BinaryTree)
		.describe("This is a binary tree.");

	REGISTER_TREE(AVLTree);

	DMLC_REGISTRY_LINK_TAG(my_tree);
}

int test_dmlc_registry()
{
	// construct a binary tree
	tree::Tree *binary = dmlc::Registry<tree::TreeFactory>::Find("BinaryTree")->body();
	binary->Print();
	// construct an AVL tree
	tree::Tree *avl = dmlc::Registry<tree::TreeFactory>::Find("AVLTree")->body();
	avl->Print();

	delete binary;
	delete avl;

	return 0;
}

The execution result of test_dmlc_parameter is as follows:

(Screenshot: execution result of test_dmlc_parameter)

5. TVM: a compiler stack for deep learning systems. It aims to close the gap between deep learning frameworks and performance- and efficiency-oriented hardware backends, and works together with the frameworks to provide end-to-end compilation for different backends. Besides dlpack and dmlc-core, TVM also depends on HalideIR. Moreover, compiling TVM produced a pile of C2440 and C2664 errors, i.e. errors about being unable to convert one type to another. Since building the MXNet source currently only requires the files in the c_api, core, and pass directories under nnvm/src of the TVM source tree, debugging the TVM library is deferred until later.

6. OpenCV: optional; for the build process see: https://blog.csdn.net/fengbingchun/article/details/84030309

7. CUDA and cuDNN: optional; for the build process see: https://blog.csdn.net/fengbingchun/article/details/53892997

GitHub:  https://github.com/fengbingchun/MXNet_Test
