This article is reposted from: http://wiki.jikexueyuan.com/project/tensorflow-zh/how_tos/adding_an_op.html
TensorFlow's built-in ops live in the https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops directory; the common math ops are defined in math_ops.cc.
If the existing library does not cover the operation you want, you can write a custom one. To make a custom Op compatible with the existing library, you need to do the following:
First install TensorFlow, either with pip install tensorflow or by compiling and installing the latest TensorFlow following the build-from-source guide. A custom Op is added inside the TensorFlow source tree. Start by defining the Op's interface and registering it with the TensorFlow system. At registration time you specify the Op's name, its inputs (types and names), its outputs (types and names), and the documentation for any attributes the Op requires.
To make this concrete, let's create a simple Op as an example. The Op takes two int32 tensors as input and outputs their custom sum; the only difference from an ordinary element-wise sum is that the first element of the result is set to 0. Create the file tensorflow/core/user_ops/my_add.cc and call the REGISTER_OP macro to define the Op's interface:
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"
using namespace tensorflow;
REGISTER_OP("MyAdd")
.Input("x: int32")
.Input("y: int32")
.Output("z: int32")
.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
c->set_output(0, c->input(0));
c->set_output(0, c->input(1));
return Status::OK();
});
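With the interface registered, the next step is to implement the Op's kernel: subclass OpKernel and override its Compute() method. The kernel below reads the two input tensors, allocates an output of the same shape, writes the element-wise sum, and zeroes the first element. It can go in the same my_add.cc file (the compile commands below build my_add.cc on its own, so both the registration and the kernel have to live there).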
#include "tensorflow/core/framework/op_kernel.h"
using namespace tensorflow;
class MyAddOp : public OpKernel {
public:
explicit MyAddOp(OpKernelConstruction* context) : OpKernel(context) {}
void Compute(OpKernelContext* context) override {
// Grab the input tensor
const Tensor& a = context->input(0);
const Tensor& b = context->input(1);
auto A = a.flat();
auto B = b.flat();
// Create an output tensor
Tensor* output_tensor = NULL;
OP_REQUIRES_OK(context, context->allocate_output(0, a.shape(),
&output_tensor));
auto output_flat = output_tensor->flat();
// Set all but the first element of the output tensor to 0.
const int N = A.size();
for (int i = 1; i < N; i++) {
output_flat(i) = A(i)+B(i);
}
output_flat(0) = 0;
}
};
REGISTER_KERNEL_BUILDER(Name("MyAdd").Device(DEVICE_CPU), MyAddOp);
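As a small hardening sketch of my own (not part of the original code), Compute() could also verify that the two inputs have the same shape before summing them, using the OP_REQUIRES macro in the same way as the official adding-an-op example; the fragment below would go right after the two context->input() calls:

    // Hypothetical extra check: reject inputs whose shapes do not match.
    OP_REQUIRES(context, a.shape().IsSameSize(b.shape()),
                errors::InvalidArgument("x and y must have the same shape"));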
Then, in the tensorflow/core/user_ops/ directory, run the following commands to compile the .so file:
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared my_add.cc -o my_add.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
If the host GCC version is 5.x, change the last compile command to:
g++ -std=c++11 -shared my_add.cc -o my_add.so -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
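The -D_GLIBCXX_USE_CXX11_ABI=0 flag is needed because the official TensorFlow pip packages of that era were built with GCC 4 and the old C++ ABI; without it, objects compiled with GCC 5's new ABI will not link cleanly against the prebuilt library.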
Other variants of the compile commands, taken from the point-net tf_op:
/usr/local/cuda-8.0/bin/nvcc tf_grouping_g.cu -o tf_grouping_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /usr/local/lib/python2.7/dist-packages/tensorflow/include -I /usr/local/cuda-8.0/include -I /usr/local/lib/python2.7/dist-packages/tensorflow/include/external/nsync/public -lcudart -L /usr/local/cuda-8.0/lib64/ -L/usr/local/lib/python2.7/dist-packages/tensorflow -ltensorflow_framework -O2 -D_GLIBCXX_USE_CXX11_ABI=0
Compile script for Python 3.5:
#!/bin/bash
CUDA_ROOT=/usr/local/cuda-9.2
TF_ROOT=/home/user/.local/lib/python3.5/site-packages/tensorflow
/usr/local/cuda-9.2/bin/nvcc -std=c++11 -c -o tf_sampling_g.cu.o tf_sampling_g.cu -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
#TF 1.8
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I ${TF_ROOT}/include -I ${CUDA_ROOT}/include -I ${TF_ROOT}/include/external/nsync/public -lcudart -L ${CUDA_ROOT}/lib64/ -L ${TF_ROOT} -ltensorflow_framework -O2 #-D_GLIBCXX_USE_CXX11_ABI=0
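Alternatively, if you are working inside a TensorFlow source checkout, the op can be built with Bazel. A minimal sketch of the BUILD rule, assuming the tf_custom_op_library macro from //tensorflow:tensorflow.bzl as used in the official adding-an-op guide, placed in tensorflow/core/user_ops/BUILD:

load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")

tf_custom_op_library(
    name = "my_add.so",
    srcs = ["my_add.cc"],
)

Then build the target: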
bazel build -c opt //tensorflow/core/user_ops:my_add.so
If the host GCC version is 5.x, change the build command to:
bazel build -c opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/core/user_ops:my_add.so
The resulting my_add.so will be under the tensorflow/bazel-bin/tensorflow/core/user_ops directory.
You can now use the Op wherever you need it:
import tensorflow as tf

so_file = 'your_add_so_file_path/my_add.so'

class MyAddTest(tf.test.TestCase):
    def testMyAdd(self):
        my_add_module = tf.load_op_library(so_file)
        with self.test_session():
            result = my_add_module.my_add([5, 4, 3, 2, 1], [1, 2, 3, 4, 5])
            self.assertAllEqual(result.eval(), [0, 6, 6, 6, 6])

if __name__ == "__main__":
    # tf.test.main()
    my_add_module = tf.load_op_library(so_file)
    out = my_add_module.my_add([5, 4, 3, 2, 1], [1, 2, 3, 4, 5])
    sess = tf.Session()
    result = sess.run(out)
    print(result)
    # output: [0, 6, 6, 6, 6]
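Note that tf.load_op_library exposes each op under the snake_case version of its registered name, which is why the Op registered as "MyAdd" is called here as my_add_module.my_add.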
References:
Official documentation on adding a new op: https://www.tensorflow.org/extend/adding_an_op
CMake version: https://github.com/cgtuebingen/tf_custom_op
Bazel version: https://github.com/qixuxiang/bazel_tf_op
A few notes on using the Tensorflow C++ API: https://jacobgil.github.io/deeplearning/tensorflow-cpp
Loading a TensorFlow graph with the C++ API: https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f
CMake version that does not require the TensorFlow source tree: https://github.com/PatWie/tensorflow_inference/tree/master/custom_op
Writing a new Tensorflow operation: https://medium.com/@taxfromdk/writing-a-new-tensorflow-operation-including-c-cuda-forward-gradient-and-grad-check-3c46708351e7