Companion video
See also: Introduction to GPUs and CUDA
grid -> block -> thread
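The hierarchy above determines how each thread finds its own piece of work. As a rough illustration in plain Python (not CUDA), the sketch below emulates how a thread in a 1D launch could derive a global index from its block index, the block size, and its index inside the block, and how many blocks are needed to cover a given number of items. The function names `blocks_needed`, `global_index`, and `covered_items` are made up for this illustration.

```python
def blocks_needed(n_items, threads_per_block):
    """Number of blocks required so that blocks * threads_per_block >= n_items."""
    return (n_items + threads_per_block - 1) // threads_per_block


def global_index(block_idx, block_dim, thread_idx):
    """Emulates blockIdx.x * blockDim.x + threadIdx.x for a 1D launch."""
    return block_idx * block_dim + thread_idx


def covered_items(n_items, threads_per_block):
    """Lists the item indices that the emulated grid would process."""
    items = []
    for block_idx in range(blocks_needed(n_items, threads_per_block)):
        for thread_idx in range(threads_per_block):
            i = global_index(block_idx, threads_per_block, thread_idx)
            if i < n_items:  # threads past the end of the data do nothing
                items.append(i)
    return items
```

The bounds check matters because the last block usually contains more threads than there are remaining items.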
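feats: (N, 8, F)
N is the number of cubes, 8 is the number of vertices (feature points) of each cube, and F is the length of the feature vector stored at each vertex. F is presumably 3 here, although the shape allows any F.
points: (N, 3)
N is the number of query points, one per cube, and 3 is the (x, y, z) coordinate of each point.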
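The interpolation of each point is independent of the others, and for a given point each of the F features is also computed independently, so the work can be parallelized along both of these dimensions.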
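To make that independence concrete, here is a minimal CPU reference in plain Python, written as a sketch rather than as the tutorial's actual kernel. It assumes that point coordinates are already normalized to [0, 1] inside their cube and that vertex k sits at the corner whose (x, y, z) bits are the binary digits of k; both conventions are assumptions for this illustration, as is the name `trilinear_reference`.

```python
def trilinear_reference(feats, points):
    """feats: N x 8 x F nested lists; points: N x 3 with coordinates in [0, 1]."""
    out = []
    for n, (x, y, z) in enumerate(points):      # independent across points
        num_feats = len(feats[n][0])
        row = []
        for f in range(num_feats):              # independent across features
            value = 0.0
            for k in range(8):
                # Assumed vertex order: bit 2 -> x, bit 1 -> y, bit 0 -> z.
                i, j, l = (k >> 2) & 1, (k >> 1) & 1, k & 1
                weight = ((x if i else 1 - x)
                          * (y if j else 1 - y)
                          * (z if l else 1 - z))
                value += weight * feats[n][k][f]
            row.append(value)
        out.append(row)
    return out
```

No iteration of the two outer loops reads anything written by another iteration, which is why each (n, f) pair could be handed to its own GPU thread.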
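The .cu file
Since we want to use CUDA, the code has to live in a .cu (CUDA) source file. Nothing is parallelized here yet; this is only a demo of how the function gets called.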
#include <torch/extension.h>
#define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
torch::Tensor trilinear_fw_cu(
torch::Tensor feats,
torch::Tensor points
){
return feats;
}
Create a header file utils.h under ./include that declares the functions implemented on the CUDA side.
#include <torch/extension.h>
#define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
torch::Tensor trilinear_fw_cu(
torch::Tensor feats,
torch::Tensor points
);
Create the C++ bridge
#include "utils.h"
torch::Tensor trilinear_interpolation(
torch::Tensor feats,
torch::Tensor points
){
CHECK_INPUT(feats);
CHECK_INPUT(points);
// call the GPU (CUDA) function
return trilinear_fw_cu(feats, points);
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m){
m.def("trilinear", &trilinear_interpolation,"test");
}
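Compared with the build script of the earlier C++-only example, the following changes:
glob is used to collect the source files;
include_dirs is added so that the headers under ./include are found;
CppExtension is replaced by CUDAExtension.
If the installation fails, go to site-packages and manually delete the previously installed package.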
import glob
import os.path as osp
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension
ROOT_DIR = osp.dirname(osp.abspath(__file__))
include_dirs = [osp.join(ROOT_DIR, "include")]
sources = glob.glob('*.cpp')+glob.glob('*.cu')
setup(
name='cppcuda_tutorial',
version='1.0',
author='kwea123',
author_email='[email protected]',
description='cppcuda_tutorial',
long_description='cppcuda_tutorial',
ext_modules=[
CUDAExtension(
name='cppcuda_tutorial',
sources=sources,
include_dirs=include_dirs,
extra_compile_args={'cxx': ['-O2'],
'nvcc': ['-O2']}
)
],
cmdclass={
'build_ext': BuildExtension
}
)
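One detail of the script above: `glob.glob('*.cpp')` is resolved against the current working directory, so it finds the sources only when the build is launched from the project folder. A possible variant, sketched below, anchors the patterns at a given directory instead. `collect_sources` is a hypothetical helper, not part of setuptools or PyTorch.

```python
import glob
import os.path as osp


def collect_sources(root_dir, patterns=("*.cpp", "*.cu")):
    """Returns the sorted paths under root_dir that match any of the patterns."""
    sources = []
    for pattern in patterns:
        sources.extend(glob.glob(osp.join(root_dir, pattern)))
    return sorted(sources)
```

In the build script this would be used as `sources = collect_sources(ROOT_DIR)`.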
Running the following test script:
import torch
#from torch.utils.cpp_extension import load
#cppcuda = load(name="test", sources=['interpolation.cpp'], verbose=False,extra_cflags=["-O2"])
import cppcuda_tutorial
feats = torch.ones(2)
point = torch.ones(2)
out = cppcuda_tutorial.trilinear(feats, point)
print(out)
raises an error:
dell/pytorch_c++_cuda/example_2/test.py
Traceback (most recent call last):
File "/home/dell/pytorch_c++_cuda/example_2/test.py", line 8, in <module>
out = cppcuda_tutorial.trilinear(feats, point)
RuntimeError: feats must be a CUDA tensor
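The error comes from the CHECK_CUDA macro: the tensors were created on the CPU. After creating them on the GPU instead, the script becomes: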
import torch
#from torch.utils.cpp_extension import load
#cppcuda = load(name="test", sources=['interpolation.cpp'], verbose=False,extra_cflags=["-O2"])
import cppcuda_tutorial
feats = torch.ones(2,device="cuda")
point = torch.ones(2,device="cuda")
out = cppcuda_tutorial.trilinear(feats, point)
print(out)
It runs successfully!
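The same two conditions that the CHECK_INPUT macro enforces on the C++ side can also be checked from Python before calling the extension, which may surface the problem a little earlier. The sketch below is a hypothetical helper (`check_input` is not part of the extension); it relies only on the `is_cuda` attribute and the `is_contiguous()` method that `torch.Tensor` provides, so it does not need to import torch itself.

```python
def check_input(x, name):
    """Raises RuntimeError unless x is a contiguous CUDA tensor."""
    if not x.is_cuda:
        raise RuntimeError(f"{name} must be a CUDA tensor")
    if not x.is_contiguous():
        raise RuntimeError(f"{name} must be contiguous")
    return x
```

For example, `check_input(feats, "feats")` could be called right before `cppcuda_tutorial.trilinear(feats, point)`.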