PyTorch + C++/CUDA extension: Lesson 2

Companion video

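1. Overview of GPU parallelism

For background, see: Overview of GPU and CUDA
grid -> block -> thread: a kernel launch creates one grid, the grid is split into blocks, and each block contains many threads.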
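To make the grid -> block -> thread hierarchy concrete, here is a small pure-Python model of a 1-D launch. It is only a sketch that runs on the CPU; the function names and the sizes (10 items, 4 threads per block) are made up for illustration and do not come from the course.

```python
# Pure-Python model of CUDA's 1-D indexing; no GPU is involved.
def global_thread_ids(grid_dim, block_dim):
    """Enumerate (block_idx, thread_idx, global_id) for a 1-D launch."""
    ids = []
    for block_idx in range(grid_dim):        # blocks within the grid
        for thread_idx in range(block_dim):  # threads within one block
            # Same formula a kernel uses: blockIdx.x * blockDim.x + threadIdx.x
            ids.append((block_idx, thread_idx, block_idx * block_dim + thread_idx))
    return ids


def blocks_needed(n_items, block_dim):
    """Ceiling division: the fewest blocks whose threads cover n_items."""
    return (n_items + block_dim - 1) // block_dim


if __name__ == "__main__":
    n_items, block_dim = 10, 4  # illustrative sizes (assumption)
    grid_dim = blocks_needed(n_items, block_dim)
    for block_idx, thread_idx, gid in global_thread_ids(grid_dim, block_dim):
        status = "works" if gid < n_items else "idle (out of range)"
        print(f"block {block_idx} thread {thread_idx} -> item {gid}: {status}")
```

Because the number of items is rarely a multiple of the block size, the last block usually contains a few threads with nothing to do, which is why a real kernel checks its global index against the data size before touching memory.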

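2. Parallelism analysis of the algorithm

feats: (N, 8, F)
N is the number of cubes, 8 is the number of feature points per cube (its eight vertices), and F is the length of the feature vector stored at each feature point. I would guess F = 3 here, although nothing below requires a particular value.

points: (N, 3)
N is the number of query points, one per cube, and 3 is the (x, y, z) coordinate of each point.

The interpolation of each point is independent of every other point, and within one point each of the F feature values is computed independently as well, so the work can be parallelized along these two dimensions (N and F).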
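The independence argument is easier to see in a plain CPU reference. The sketch below is not the course's kernel: it assumes point coordinates are already normalized to [0, 1] inside each cube and that vertex k sits at corner (i, j, l) with k = 4*i + 2*j + l. Both conventions are assumptions chosen for illustration.

```python
def trilinear_one(feats_n, point_n, f):
    """Interpolate feature channel f for one point inside its cube.

    feats_n: 8 vertex feature vectors; point_n: (x, y, z) in [0, 1].
    The vertex ordering k = 4*i + 2*j + l is an assumption of this sketch.
    """
    x, y, z = point_n
    value = 0.0
    for k in range(8):
        i, j, l = (k >> 2) & 1, (k >> 1) & 1, k & 1
        # Weight of a vertex is the product of per-axis linear weights.
        w = (x if i else 1 - x) * (y if j else 1 - y) * (z if l else 1 - z)
        value += w * feats_n[k][f]
    return value


def trilinear(feats, points):
    """feats: N x 8 x F nested lists, points: N x 3. Returns N x F."""
    n_points, n_feats = len(feats), len(feats[0][0])
    out = [[0.0] * n_feats for _ in range(n_points)]
    # Every (n, f) pair reads only its own inputs and writes only out[n][f],
    # so each pair could be handled by its own GPU thread.
    for n in range(n_points):
        for f in range(n_feats):
            out[n][f] = trilinear_one(feats[n], points[n], f)
    return out


if __name__ == "__main__":
    # One cube whose vertex k carries the features [k, 10.0].
    feats = [[[float(k), 10.0] for k in range(8)]]
    print(trilinear(feats, [(0.5, 0.5, 0.5)]))
```

The two nested loops over n and f are exactly the two directions mentioned above; the inner loop over the 8 vertices is a reduction and stays sequential inside each unit of work.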

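3. Algorithm

1. Writing the .cu file

Since we are using CUDA, we need a .cu (CUDA) source file. Nothing is parallelized here yet; this is only a demo that the function can be called end to end.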

#include <torch/extension.h>

#define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)


torch::Tensor trilinear_fw_cu(
    torch::Tensor feats,
    torch::Tensor points
){
    return feats;
}

2. C++ files

Create a header file utils.h under ./include that declares the functions implemented on the CUDA side.

#include <torch/extension.h>

#define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)


torch::Tensor trilinear_fw_cu(
    torch::Tensor feats,
    torch::Tensor points
);

Create the C++ bridge file

#include "utils.h"

torch::Tensor trilinear_interpolation(
    torch::Tensor feats,
    torch::Tensor points
){
    CHECK_INPUT(feats);
    CHECK_INPUT(points);
    // Call the CUDA function
    return trilinear_fw_cu(feats, points);
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m){
    m.def("trilinear", &trilinear_interpolation,"test");
}

3. setup.py

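Compared with the earlier C++-only build, the following has changed:

  • How sources are listed: there are now several source files, so they are collected with the glob module
  • A header file was introduced, so include_dirs is added
  • CUDA is now used, so CppExtension is replaced by CUDAExtension

If the installation fails, manually delete the installed package from site-packages.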

import glob
import os.path as osp
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension


ROOT_DIR = osp.dirname(osp.abspath(__file__))
include_dirs = [osp.join(ROOT_DIR, "include")]

sources = glob.glob('*.cpp')+glob.glob('*.cu')


setup(
    name='cppcuda_tutorial',
    version='1.0',
    author='kwea123',
    author_email='[email protected]',
    description='cppcuda_tutorial',
    long_description='cppcuda_tutorial',
    ext_modules=[
        CUDAExtension(
            name='cppcuda_tutorial',
            sources=sources,
            include_dirs=include_dirs,
            extra_compile_args={'cxx': ['-O2'],
                                'nvcc': ['-O2']}
        )
    ],
    cmdclass={
        'build_ext': BuildExtension
    }
)
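To check which files the two glob patterns in setup.py actually pick up, a throwaway experiment like the one below can help. It is only a sketch: collect_sources is a hypothetical helper, and apart from interpolation.cpp the file names are invented. Note that a bare pattern such as '*.cpp' is resolved against the current working directory, so the setup.py above expects to be run from the project folder, whereas this helper takes the folder explicitly.

```python
import glob
import os.path as osp
import tempfile


def collect_sources(root_dir):
    """Return the sorted .cpp and .cu files directly under root_dir."""
    found = []
    for pattern in ("*.cpp", "*.cu"):
        found.extend(glob.glob(osp.join(root_dir, pattern)))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical project layout used only for this demonstration.
        for name in ("interpolation.cpp", "interpolation_kernel.cu", "README.md"):
            open(osp.join(root, name), "w").close()
        print([osp.basename(p) for p in collect_sources(root)])
```

Files that match neither pattern, such as the README here, are simply ignored, which is what makes glob convenient once a project has more than a couple of source files.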

4. Test file

Running the following script:

import torch
#from torch.utils.cpp_extension import load
#cppcuda = load(name="test", sources=['interpolation.cpp'], verbose=False,extra_cflags=["-O2"])
import cppcuda_tutorial
feats = torch.ones(2)
point = torch.ones(2)

out = cppcuda_tutorial.trilinear(feats, point)

print(out)

produces this error:

dell/pytorch_c++_cuda/example_2/test.py
Traceback (most recent call last):
  File "/home/dell/pytorch_c++_cuda/example_2/test.py", line 8, in <module>
    out = cppcuda_tutorial.trilinear(feats, point)
RuntimeError: feats must be a CUDA tensor

The inputs have to live on the GPU, so the script is changed to:

import torch
#from torch.utils.cpp_extension import load
#cppcuda = load(name="test", sources=['interpolation.cpp'], verbose=False,extra_cflags=["-O2"])
import cppcuda_tutorial
feats = torch.ones(2,device="cuda")
point = torch.ones(2,device="cuda")
out = cppcuda_tutorial.trilinear(feats, point)
print(out)

It runs successfully!
