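Notes on setting up Mask2Former in a conda environment on Ubuntu 20.04

I recommend reading the whole post before following the installation steps.

Code: GitHub - facebookresearch/Mask2Former: Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

I. Setting up the environment

1. Create the virtual environment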

conda create -n mask2former python=3.8

conda activate mask2former

2. Install PyTorch

On the PyTorch website, find the PyTorch build that matches your CUDA version

# CUDA 11.3
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge

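PyTorch installed with this command caused problems later (described below), so I switched to a different install command, which solved them. I recommend installing directly with the command below:

# CUDA 11.3
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113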

3. Install opencv-python

pip install -U opencv-python

4. Install detectron2

If these commands cannot download the repositories, download them directly from the corresponding web pages.

# under your working directory
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git 
pip install git+https://github.com/mcordts/cityscapesScripts.git

5. Install Mask2Former

cd ..
git clone git@github.com:facebookresearch/Mask2Former.git
cd Mask2Former 
pip install -r requirements.txt 
cd mask2former/modeling/pixel_decoder/ops 
sh make.sh

II. Preparing the dataset

ADE20K dataset

The dataset folder is laid out as follows

ADEChallengeData2016/ 
    images/ 
    annotations/ 
    objectInfo150.txt 
    # 1. Download the instance annotations
    annotations_instance/
    # 2. Generated by prepare_ade20k_sem_seg.py
    annotations_detectron2/
    # 3. Generated by prepare_ade20k_pan_seg.py
    ade20k_panoptic_{train,val}.json
    ade20k_panoptic_{train,val}/
    # 4. Generated by prepare_ade20k_ins_seg.py
    ade20k_instance_{train,val}.json

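Generate the files the dataset needs by following the steps above in order. Because I keep the dataset outside the project folder, I also had to change the paths in the various .py files.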
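Before training, the layout above can be checked with a short script. This is only a minimal sketch: the function name `missing_ade20k_entries` and the example path are hypothetical, and the expected names are simply copied from the listing above.

```python
from pathlib import Path

# Entries copied from the directory listing above.
EXPECTED_ENTRIES = [
    "images",
    "annotations",
    "objectInfo150.txt",
    "annotations_instance",
    "annotations_detectron2",
    "ade20k_panoptic_train.json",
    "ade20k_panoptic_val.json",
    "ade20k_panoptic_train",
    "ade20k_panoptic_val",
    "ade20k_instance_train.json",
    "ade20k_instance_val.json",
]


def missing_ade20k_entries(dataset_dir):
    """Return the expected files/directories that do not exist under dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Hypothetical location; replace with where ADEChallengeData2016 actually lives.
    for name in missing_ade20k_entries("datasets/ADEChallengeData2016"):
        print("missing:", name)
```

An empty result means every entry from the listing is present; it says nothing about whether the contents are complete.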

The instance annotations can be downloaded from the MIT Scene Parsing Benchmark, or with the following command:

wget http://sceneparsing.csail.mit.edu/data/ChallengeData2017/annotations_instance.tar

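Then run the following to combine the semantic and instance annotations into panoptic annotations:

python datasets/prepare_ade20k_pan_seg.py

And run the following to extract the instance annotations: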

python datasets/prepare_ade20k_ins_seg.py

III. Training

Multi-GPU training:

python train_net.py --num-gpus 2 --config-file configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml

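Taking the ADE20K dataset as an example:

The dataset path is set in the corresponding file under /home/dell/liyan/Mask2Former-main/mask2former/data/datasets/; the last two lines of that file set the dataset path.

Training details and the problems I ran into will be added later.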

IV. Problems encountered during installation

1. Running sh make.sh fails with:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
Traceback (most recent call last):
  File "setup.py", line 76, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 54, in get_extensions
    raise NotImplementedError('No CUDA runtime is found. Please set FORCE_CUDA=1 or test it by running torch.cuda.is_available().')

Add the following to the .bashrc file (open a new terminal or run source ~/.bashrc for it to take effect):

export FORCE_CUDA="1"

Running sh make.sh again then produces:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
running build
running build_py
running build_ext
building 'MultiScaleDeformableAttention' extension
Traceback (most recent call last):
  File "setup.py", line 69, in <module>
    setup(
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
    self.run_command(cmd_name)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 709, in build_extensions
    build_ext.build_extensions(self)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
    super(build_ext, self).build_extension(ext)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 525, in unix_wrap_ninja_compile
    cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 424, in unix_cuda_flags
    cflags + _get_cuda_arch_flags(cflags))
  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1562, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range
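Why FORCE_CUDA alone does not get the build through can be illustrated with a simplified model. This is a sketch of the logic as I read it from the last traceback frame, not the actual torch.utils.cpp_extension source, and `build_arch_flags` is a hypothetical name. The assumption is that the architecture list is filled from the GPUs PyTorch can see, so with no usable CUDA runtime the list is empty and indexing its last element fails.

```python
def build_arch_flags(visible_capabilities):
    """Simplified model: one arch string per visible GPU, '+PTX' added to the last."""
    arch_list = sorted({f"{major}.{minor}" for major, minor in visible_capabilities})
    arch_list[-1] += "+PTX"  # raises IndexError when no GPU is visible
    return arch_list


if __name__ == "__main__":
    # One visible GPU with compute capability 8.6 (an arbitrary example value).
    print(build_arch_flags([(8, 6)]))
    try:
        build_arch_flags([])  # what a PyTorch build without a usable CUDA runtime sees
    except IndexError as exc:
        print("IndexError:", exc)
```

Under that reading, FORCE_CUDA only skips the first check; the build still needs a PyTorch installation for which torch.cuda.is_available() is True.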

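Check whether CUDA is available: run the following Python commands in a terminal to test whether CUDA is usable on your system:

import torch
print(torch.cuda.is_available())

This uses PyTorch to check whether CUDA is available. If it returns True, CUDA is usable from your Python environment. If it returns False, PyTorch cannot use CUDA: either the installed PyTorch build has no CUDA support or CUDA is not set up correctly.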
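A slightly more detailed check can tell a CPU-only PyTorch build apart from a CUDA build that cannot find a GPU. The sketch below is only an illustration: `describe_cuda` is a hypothetical helper, and it relies on `torch.version.cuda` being None for CPU-only builds.

```python
def describe_cuda(build_cuda, cuda_available):
    """Turn the two values PyTorch reports into a one-line diagnosis."""
    if build_cuda is None:
        return "CPU-only PyTorch build: reinstall a CUDA build"
    if not cuda_available:
        return f"built for CUDA {build_cuda}, but no usable GPU/driver was found"
    return f"CUDA {build_cuda} build, GPU available"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch", torch.__version__)
        print(describe_cuda(torch.version.cuda, torch.cuda.is_available()))
```

Run it inside the activated mask2former environment so that it inspects the same PyTorch that make.sh uses.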

I switched to a different installation channel:

# CUDA 11.3
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

The installation succeeded, and the check above now prints True.

2. A problem with the PIL library showed up at runtime

conda install pillow

This fixed it.

3. Error during training

AttributeError: module 'numpy' has no attribute 'typeDict'

First attempt: downgrading numpy to 1.21 led to a new error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Second attempt: upgrading numpy to 1.22 led to another error:

ImportError: numpy.core.multiarray failed to import

Installing numpy 1.23 resolved it:

conda install numpy==1.23
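The sequence above suggests that numpy 1.23.x is what worked in this environment (np.typeDict is gone from 1.24 onward, and 1.21 and 1.22 produced the import errors shown). A small check along those lines is sketched below; the accepted range is an assumption drawn from this one setup, and `numpy_version_ok` is a hypothetical helper.

```python
def numpy_version_ok(version, low=(1, 23), high=(1, 24)):
    """True if low <= (major, minor) < high for a version string like '1.23.5'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return low <= (major, minor) < high


if __name__ == "__main__":
    try:
        import numpy
    except ImportError as exc:
        print("numpy could not be imported:", exc)
    else:
        verdict = "ok" if numpy_version_ok(numpy.__version__) else "outside 1.23.x"
        print(numpy.__version__, verdict)
```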

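4. After the previous problem was solved

ImportError: /home/abc/liyan/detectron2-main/detectron2/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7reshapeEN3c108ArrayRefIlEE

Fix: the compiled extension was presumably built against the previously installed PyTorch. Open a terminal in the detectron2-main folder, activate the virtual environment, delete the build directory, and reinstall:

rm -r build
pip install -e .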

5. Error during training

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ADEChallengeData2016/ade20k_instance_train.json'

The cause is a wrong dataset path. Changing the paths in the .py files under Mask2Former-main/mask2former/data/datasets to absolute paths fixed it.
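An alternative to editing the registration files may be the DETECTRON2_DATASETS environment variable. To my knowledge the registration modules read it in their last two lines and fall back to "datasets", but treat that as an assumption and check it against your own copy of the code. The sketch below mirrors that lookup and reports whether the file named in the error exists; `instance_json_path` is a hypothetical helper.

```python
import os


def instance_json_path(env=None):
    """Build the path from the error message, using DETECTRON2_DATASETS as root if set."""
    env = os.environ if env is None else env
    # Assumed lookup; verify it against the last lines of the registration files.
    root = env.get("DETECTRON2_DATASETS", "datasets")
    return os.path.join(root, "ADEChallengeData2016", "ade20k_instance_train.json")


if __name__ == "__main__":
    path = instance_json_path()
    print(path, "exists" if os.path.isfile(path) else "is missing")
```

If the assumption holds, setting the variable to the absolute path of the folder that contains ADEChallengeData2016 before training would avoid touching the source files.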

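6. After the previous problem was solved, a new one appeared

  File "/home/abc/anaconda3/envs/mask2former/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 240, in __init__
    assert prefetch_factor > 0
TypeError: '>' not supported between instances of 'NoneType' and 'int'

Cause: None and int cannot be compared, and printing prefetch_factor showed that it was None. Some people attribute this to a mismatch between the installed detectron2 and the torch version. Someone raised the issue on the detectron2 GitHub, and the fix given there was to install PyTorch 2.1.0, but my CUDA version is too old for such a recent PyTorch. I then searched the detectron2-main folder for prefetch_factor and found that build.py under /detectron2-main/detectron2/data sets prefetch_factor to None. I changed the value to 2 and trained again, and the error disappeared. That the error disappeared does not mean it is truly solved; whether it is remains to be verified.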
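Instead of hard-coding a value, one option is to pass prefetch_factor only when it is set, so that the DataLoader falls back to its own default (2 in the PyTorch 1.x releases I am aware of, which matches the value used above). This is only a sketch: `dataloader_kwargs` is a hypothetical helper, not part of detectron2 or PyTorch.

```python
def dataloader_kwargs(num_workers, prefetch_factor=None, **extra):
    """Collect DataLoader keyword arguments, omitting prefetch_factor when it is None."""
    kwargs = {"num_workers": num_workers, **extra}
    if prefetch_factor is not None:
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


if __name__ == "__main__":
    # Intended use: torch.utils.data.DataLoader(dataset, **dataloader_kwargs(4))
    print(dataloader_kwargs(4))
    print(dataloader_kwargs(4, prefetch_factor=2, batch_size=8))
```

Leaving the argument out keeps the same call valid on PyTorch versions whose default is None as well as on older ones that assert a positive integer.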

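7. After the previous problem was solved, a new one appeared:

dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
Could not load library libcudnn_cnn_train.so.8. Error: /home/abc/anaconda3/envs/mask2former/bin/../lib/libcudnn_ops_train.so.8: undefined symbol: _Z20traceback_iretf_implPKcRKN5cudnn16InternalStatus_tEb, version libcudnn_ops_infer.so.8
Please make sure libcudnn_cnn_train.so.8 is in your library path!
Aborted (core dumped)

Re-create the symbolic links

Searching the filesystem for libcudnn_cnn_train.so.8 turned up two copies, one in the anaconda virtual environment and one under /usr/. The link in the anaconda environment pointed to version 8.9.1, while the one under /usr pointed to 8.2.0. The cuDNN version installed on this machine is 8.2.0, so I figured the links in the anaconda environment should also point to 8.2.0. After changing these two links the error went away. No error does not mean nothing is wrong; I will deal with further problems if they come up.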
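The comparison of link targets described above can be automated. The sketch below lists every libcudnn file in a few directories together with the real file its symlink resolves to; the directory list and the name `resolve_library_links` are assumptions for illustration, so adjust them to your machine.

```python
import glob
import os


def resolve_library_links(directories, pattern="libcudnn*.so*"):
    """Map every matching library path to the real file its symlink chain ends at."""
    resolved = {}
    for directory in directories:
        for path in sorted(glob.glob(os.path.join(directory, pattern))):
            resolved[path] = os.path.realpath(path)
    return resolved


if __name__ == "__main__":
    # Hypothetical search locations; adjust to your conda env and system library dirs.
    candidates = [
        os.path.join(os.environ.get("CONDA_PREFIX", ""), "lib"),
        "/usr/lib/x86_64-linux-gnu",
        "/usr/local/cuda/lib64",
    ]
    for link, target in resolve_library_links(candidates).items():
        print(link, "->", target)
```

Two entries for the same library name that resolve to different version suffixes point to the kind of mismatch described here.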

8. Error during training

ERROR [11/03 14:48:57 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/abc/.local/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 47, in tf
    from tensorboard.compat import notf  # pylint: disable=g-import-not-at-top
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/home/abc/.local/lib/python3.8/site-packages/tensorboard/compat/__init__.py)

This error was followed by a second one, apparently caused by a missing library. Once that library was installed, this error disappeared as well, so I cannot give a specific fix for it.

9. Another error during training

  File "/home/abc/.local/lib/python3.8/site-packages/scipy/optimize/_hungarian.py", line 93, in linear_sum_assignment
    raise ValueError("matrix contains invalid numeric entries")
ValueError: matrix contains invalid numeric entries

To be updated.
