Building, Installing, and Testing TensorMask 0.1 & Detectron2 0.6 on Windows (2022.8.7)

  • Dependencies
  • Detectron2
  • TensorMask
  • Summary
  • References

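While surveying instance segmentation methods, I came across the TensorMask model, which requires the Detectron2 framework to be installed first. Detectron2's installation guide, INSTALL.md, has no instructions for Windows and assumes a Linux gcc & g++ toolchain. This post records how I built it on Windows.

First, my hardware: a Y9000P laptop bought in March 2022, with an i7-12700H CPU and an Nvidia RTX 3060 6GB GPU.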

Dependencies

I used a Python 3.8.8 virtual environment under Anaconda3 4.12.0, with CUDA 11.3, cuDNN 8.2.0, PyTorch 1.10.2+cu113, and Torchvision 0.11.3+cu113 installed.
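
Before building anything, it can help to confirm which package versions the active environment actually contains. The following is a minimal sketch using only the Python standard library; the function name installed_versions and the list of package names are illustrative choices, not part of Detectron2.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names chosen to match the environment described above.
    report = installed_versions(["torch", "torchvision", "setuptools"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated virtual environment; a package reported as not installed, or a torch build without the +cu113 suffix, is worth fixing before compiling.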

First install the MSVC C++ build tools. I used Visual Studio Enterprise 2019 and installed the "Desktop development with C++" workload through the Visual Studio Installer.

Microsoft Visual Studio Community 2022 also works; choose whichever suits you.

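After installation, add INSTALL_PATH\2019\Enterprise\VC\Auxiliary\Build\ to the PATH environment variable. For example, with Visual Studio installed under the Softwares folder on drive D:, the entry is D:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\.

After confirming, press Win+R, type cmd to open a Command Prompt, and run vcvars64 followed by cl to verify the setup. It is configured correctly if vcvars64 reports that the x64 environment was initialized and cl prints the Microsoft C/C++ compiler version banner.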
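
The same check can be scripted. This is a small sketch, assuming Python is started from the Command Prompt in which vcvars64 was already run, so that it inherits the compiler's PATH. The tool list (cl, nvcc, ninja) is my assumption based on the tools that appear in the build log later in this post, and find_tools is a hypothetical helper name.

```python
import shutil


def find_tools(tools=("cl", "nvcc", "ninja")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If cl is reported as not found, vcvars64 has most likely not been run in the current prompt.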

Detectron2

Clone the source with Git (or download and extract the ZIP archive from GitHub), then run the following commands in order:

pip install opencv-python
git clone https://github.com/facebookresearch/detectron2.git detectron2
cd detectron2
SET DISTUTILS_USE_SDK=1
vcvars64
pip install -e .

The installation is complete when you see the following output:

Installing collected packages: termcolor, tensorboard-plugin-wit, pywin32, pyasn1, mypy-extensions, antlr4-python3-runtime, zipp, urllib3, tomli, tensorboard-data-server, tabulate, six, rsa, pyyaml, pyparsing, pyasn1-modules, protobuf, portalocker, platformdirs, pathspec, oauthlib, MarkupSafe, kiwisolver, idna, future, fonttools, cycler, colorama, cloudpickle, charset-normalizer, cachetools, absl-py, yacs, werkzeug, tqdm, requests, python-dateutil, pydot, packaging, omegaconf, importlib-resources, importlib-metadata, grpcio, google-auth, fairscale, click, timm, requests-oauthlib, matplotlib, markdown, iopath, hydra-core, black, pycocotools, google-auth-oauthlib, fvcore, tensorboard, detectron2
  Running setup.py develop for detectron2
Successfully installed detectron2-0.6

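Try running the demo: copy 000000439715.jpg from COCO 2017 into the demo folder and run the command below. To save the result, add the --output argument; otherwise the result is displayed with OpenCV's imshow.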

cd demo
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input 000000439715.jpg  --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl 

TensorMask

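From the detectron2 root (run cd .. first if you are still in demo), enter the projects/TensorMask directory and run the usual install command: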

cd projects\TensorMask
pip install -e .

As expected, this fails with an error:

note: This error originates from a subprocess, and is likely not a problem with pip.

Scrolling up reveals this error:

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [91 lines of output]
    running develop
    running egg_info
    writing tensormask.egg-info\PKG-INFO
    writing dependency_links to tensormask.egg-info\dependency_links.txt
    writing top-level names to tensormask.egg-info\top_level.txt
    reading manifest file 'tensormask.egg-info\SOURCES.txt'
    writing manifest file 'tensormask.egg-info\SOURCES.txt'
    running build_ext
    building 'tensormask._C' extension
    Emitting ninja build file D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\build.ninja...
    Compiling objects...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    [1/1] D:\Softwares\CUDA\v11.3\bin\nvcc --generate-dependencies-with-compile --dependency-output D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -ID:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\torch\csrc\api\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\TH -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\THC -ID:\Softwares\CUDA\v11.3\include -ID:\Softwares\Anaconda3\envs\detectron2\include -ID:\Softwares\Anaconda3\envs\detectron2\Include "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include" "-ID:\Windows Kits\10\include\10.0.19041.0\ucrt" "-ID:\Windows Kits\10\include\10.0.19041.0\shared" "-ID:\Windows Kits\10\include\10.0.19041.0\um" "-ID:\Windows Kits\10\include\10.0.19041.0\winrt" "-ID:\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu -o D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
    FAILED: D:/Workspace/exps/detectron2/projects/TensorMask/build/temp.win-amd64-3.8/Release/Workspace/exps/detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.obj
    D:\Softwares\CUDA\v11.3\bin\nvcc --generate-dependencies-with-compile --dependency-output D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -ID:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\torch\csrc\api\include -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\TH -ID:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\include\THC -ID:\Softwares\CUDA\v11.3\include -ID:\Softwares\Anaconda3\envs\detectron2\include -ID:\Softwares\Anaconda3\envs\detectron2\Include "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-ID:\Softwares\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include" "-ID:\Windows Kits\10\include\10.0.19041.0\ucrt" "-ID:\Windows Kits\10\include\10.0.19041.0\shared" "-ID:\Windows Kits\10\include\10.0.19041.0\um" "-ID:\Windows Kits\10\include\10.0.19041.0\winrt" "-ID:\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu -o D:\Workspace\exps\detectron2\projects\TensorMask\build\temp.win-amd64-3.8\Release\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.obj 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
                argument types are: (int64_t, long)

    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of overloaded function "std::min" matches the argument list
                argument types are: (<error-type>, long)

    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
                argument types are: (int64_t, long)

    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of overloaded function "std::min" matches the argument list
                argument types are: (<error-type>, long)

    4 errors detected in the compilation of "D:/Workspace/exps/detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu".
    SwapAlign2Nat_cuda.cu
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\utils\cpp_extension.py", line 1717, in _run_ninja_build
        subprocess.run(
      File "D:\Softwares\Anaconda3\envs\detectron2\lib\subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

Note that the fix often suggested online, changing ['ninja', '-v'] to ['ninja', '-V'] or ['ninja', '--version'], does not solve this problem. The real cause is here:

D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
                argument types are: (int64_t, long)

    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(438): error: no instance of overloaded function "std::min" matches the argument list
                argument types are: (<error-type>, long)

    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of function template "at::cuda::ATenCeilDiv" matches the argument list
                argument types are: (int64_t, long)

    D:\Workspace\exps\detectron2\projects\TensorMask\tensormask\layers\csrc\SwapAlign2Nat\SwapAlign2Nat_cuda.cu(495): error: no instance of overloaded function "std::min" matches the argument list
                argument types are: (<error-type>, long)

    4 errors detected in the compilation of "D:/Workspace/exps/detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu".

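The failure comes from a type mismatch in detectron2/projects/TensorMask/tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu. It is not an integer-versus-floating-point conflict: both arguments are integers, but numel() returns int64_t while the literals 512L and 4096L have type long. Under MSVC, long is 32 bits, so the two types differ, the at::cuda::ATenCeilDiv template cannot deduce a single argument type, and the std::min error follows from it. Open the file and change the types on lines 438 and 495:

// line 438
//  dim3 grid(std::min(at::cuda::ATenCeilDiv(Y.numel(), 512L), 4096L));
dim3 grid(std::min(at::cuda::ATenCeilDiv((int)Y.numel(), 512), 4096));
// line 495
// dim3 grid(std::min(at::cuda::ATenCeilDiv(gY.numel(), 512L), 4096L));
dim3 grid(std::min(at::cuda::ATenCeilDiv((int)gY.numel(), 512), 4096));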
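
To see why the same source compiles on Linux but not under MSVC, it helps to look at the width of the C long type on the current platform, and at what the patched lines compute. The sketch below uses Python's ctypes for the first part and re-implements the grid-size arithmetic in plain Python for the second. The names ceil_div and grid_size are illustrative only, and the defaults 512 and 4096 are taken from the two patched lines.

```python
import ctypes


def ceil_div(a, b):
    """Integer ceiling division: the smallest n such that n * b >= a."""
    return (a + b - 1) // b


def grid_size(numel, threads=512, max_blocks=4096):
    """Blocks needed to cover numel elements, capped at max_blocks."""
    return min(ceil_div(numel, threads), max_blocks)


# Width of the C types on the platform running this script.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
int64_bits = ctypes.sizeof(ctypes.c_int64) * 8

if __name__ == "__main__":
    # 64-bit Windows typically reports 32 for long; 64-bit Linux reports 64.
    print(f"C long: {long_bits} bits, int64_t: {int64_bits} bits")
    print("grid for 1000 elements:", grid_size(1000))
    print("grid for 10,000,000 elements:", grid_size(10_000_000))
```

Where long and int64_t have the same width, the literal 512L already matches the type of numel(); where they differ, the template sees two different integer types. The (int) cast used in the patch narrows numel() to 32 bits, which is only safe for tensors with fewer than 2^31 elements. Casting the literals to int64_t instead would avoid the narrowing, though I have not tested that variant here.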

Re-run pip install -e . and the build and install now succeed:

Obtaining file:///D:/Workspace/exps/detectron2/projects/TensorMask
  Preparing metadata (setup.py) ... done
Installing collected packages: tensormask
  Running setup.py develop for tensormask
Successfully installed tensormask-0.1

Try running TensorMask's training script, train_net.py:

python train_net.py --config-file configs/tensormask_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 1

Unsurprisingly, it fails again:

Traceback (most recent call last):
  File "train_net.py", line 63, in <module>
    launch(
  File "d:\workspace\exps\detectron2\detectron2\engine\launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 55, in main
    trainer = Trainer(cfg)
  File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 378, in __init__
    data_loader = self.build_train_loader(cfg)
  File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 547, in build_train_loader
    return build_detection_train_loader(cfg)
  File "d:\workspace\exps\detectron2\detectron2\config\config.py", line 207, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
  File "d:\workspace\exps\detectron2\detectron2\config\config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "d:\workspace\exps\detectron2\detectron2\data\build.py", line 344, in _train_loader_from_config
    dataset = get_detection_dataset_dicts(
  File "d:\workspace\exps\detectron2\detectron2\data\build.py", line 241, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "d:\workspace\exps\detectron2\detectron2\data\build.py", line 241, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "d:\workspace\exps\detectron2\detectron2\data\catalog.py", line 58, in get
    return f()
  File "d:\workspace\exps\detectron2\detectron2\data\datasets\coco.py", line 500, in <lambda>
    DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name))
  File "d:\workspace\exps\detectron2\detectron2\data\datasets\coco.py", line 69, in load_coco_json
    coco_api = COCO(json_file)
  File "D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\pycocotools\coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'datasets\\coco/annotations/instances_train2017.json'

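The error says the COCO annotation file instances_train2017.json cannot be found. The path is relative, so in the directory where you run train_net.py, create a directory junction named datasets that points to your local dataset folder (the folder that contains coco):

REM D:\Workspace\data is where I keep my datasets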
mklink /j datasets D:\Workspace\data
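
Once the junction exists, a quick check of the expected layout can save another failed training run. The sketch below assumes the standard COCO 2017 layout that Detectron2's built-in datasets use (annotations plus train2017 and val2017 under coco) and reads the dataset root from the DETECTRON2_DATASETS environment variable, falling back to datasets. The helper name missing_coco_paths is hypothetical.

```python
import os
from pathlib import Path


def missing_coco_paths(root=None):
    """Return the expected COCO 2017 paths that do not exist under root."""
    if root is None:
        # Same default that Detectron2 uses for its built-in datasets.
        root = os.getenv("DETECTRON2_DATASETS", "datasets")
    coco = Path(root) / "coco"
    expected = [
        coco / "annotations" / "instances_train2017.json",
        coco / "annotations" / "instances_val2017.json",
        coco / "train2017",
        coco / "val2017",
    ]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_coco_paths()
    if missing:
        print("Missing dataset paths:")
        for path in missing:
            print("  ", path)
    else:
        print("COCO 2017 layout looks complete.")
```

Run it from the same directory as train_net.py so that the relative datasets path resolves the same way it does during training.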

Run the training command again, and it fails yet again:

Traceback (most recent call last):
  File "train_net.py", line 63, in <module>
    launch(
  File "d:\workspace\exps\detectron2\detectron2\engine\launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 55, in main
    trainer = Trainer(cfg)
  File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 396, in __init__
    self.register_hooks(self.build_hooks())
  File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 463, in build_hooks
    ret.append(hooks.PeriodicWriter(self.build_writers(), period=20))
  File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 475, in build_writers
    return default_writers(self.cfg.OUTPUT_DIR, self.max_iter)
  File "d:\workspace\exps\detectron2\detectron2\engine\defaults.py", line 248, in default_writers
    TensorboardXWriter(output_dir),
  File "d:\workspace\exps\detectron2\detectron2\utils\events.py", line 145, in __init__
    from torch.utils.tensorboard import SummaryWriter
  File "D:\Softwares\Anaconda3\envs\detectron2\lib\site-packages\torch\utils\tensorboard\__init__.py", line 4, in <module>
    LooseVersion = distutils.version.LooseVersion
AttributeError: module 'distutils' has no attribute 'version'

The cause is that the setuptools version is too new. My environment had 61.2.0, so I force-installed 59.5.0 [3] and ran the training command again.

Training now starts, which completes this installation guide.
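
To find out in advance whether an environment is likely to hit the distutils error, the installed setuptools version can be compared with the version that worked here. This is a rough sketch: it treats 59.5.0, the version I downgraded to, as the newest known-good release, which is an assumption based on this one environment rather than a documented boundary. parse_version and is_newer_than_known_good are illustrative helper names.

```python
import re
from importlib import metadata

# Assumption: 59.5.0 is the version that worked in the setup described above.
KNOWN_GOOD = (59, 5, 0)


def parse_version(text):
    """Turn a version string such as '61.2.0' into a 3-tuple of ints."""
    parts = [int(number) for number in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def is_newer_than_known_good(version_text):
    """True if version_text is strictly newer than KNOWN_GOOD."""
    return parse_version(version_text) > KNOWN_GOOD


if __name__ == "__main__":
    try:
        installed = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        print("setuptools is not installed")
    else:
        if is_newer_than_known_good(installed):
            print(f"setuptools {installed} is newer than 59.5.0;")
            print("consider: pip install setuptools==59.5.0")
        else:
            print(f"setuptools {installed} should be fine")
```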

Summary

2022.8.7

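I was tormented by detectron2 for several days before this. After reading through a lot of material, I found that many proposed solutions amount to changing the command to ['ninja', '-V']. After that change the build reports LINK : fatal error LNK1181: cannot open input file ..., which is only a symptom: the ninja compile step did not succeed, so the compiled object files were never produced.
So you have to trace back upward to find where the error originates. In the end the problem was at lines 438 and 495 of tensormask/layers/csrc/SwapAlign2Nat/SwapAlign2Nat_cuda.cu, where mismatched data types break compilation. The likely reason is that on Linux with gcc & g++, int64_t and long are the same 64-bit integer type and are interchangeable, whereas under MSVC on Windows long is a 32-bit integer, so the types must be made consistent explicitly.

Tracing this error to its root made me re-examine myself: under time pressure I had forgotten why I started learning in the first place. When you find an error, trace it to its root and propose a solution.

Along the way I kept making time to read up on C++. If you learned Python from scratch you may never have touched C++, and so may not find the right fix. I am sharing this to mark finding my way back to the original purpose of study and research, and to help others who are stuck building on Windows. When I have time, I will post an update on training and validating detectron2 with a custom dataset.

If you have questions, reply in the comments or send me a private message; I usually answer right away. I am a beginner too, haha.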

References

[1] The latest Detectron 2 (0.6) installation process for 2022 (Lenovo Y9000K laptop + Anaconda + Win 11 + RTX 3070)

[2] Detectron2 0.2.1 installation (Windows 10)

[3] Solution for AttributeError: module 'distutils' has no attribute 'version'
