ViT-pytorch reproduction

Code repository:

https://github.com/jeonsworld/ViT-pytorch

1. Error

Traceback (most recent call last):
  File "train.py", line 16, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/mnt/public/users/lig/anaconda/envs/vit4/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    LooseVersion = distutils.version.LooseVersion
AttributeError: module 'distutils' has no attribute 'version'
The diagnosis found online is correct: this is a setuptools version problem, and switching to an older setuptools fixes it:
pip uninstall setuptools
conda install setuptools==58.0.4
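To confirm the downgrade took effect before running train.py again, a small Python check can compare the installed setuptools version with 58.0.4, the version that worked for me. This is only a sketch: the tiny version parser is my own simplification (it ignores pre-release tags; packaging.version is the proper tool), and treating anything newer than 58.0.4 as suspect is an assumption based on this one environment, not a documented cut-off. The helper names are hypothetical.

```python
KNOWN_GOOD = "58.0.4"  # the setuptools version that fixed the error in this post


def parse_version(version):
    """Turn '58.0.4' into (58, 0, 4).

    Simplification: only the leading digits of each dot-separated piece are
    kept, so '60.0.0rc1' becomes (60,) after trailing zeros are dropped.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Drop trailing zeros so that '58.0' and '58.0.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def newer_than_known_good(installed, known_good=KNOWN_GOOD):
    """True if the installed setuptools is newer than the version that worked here."""
    return parse_version(installed) > parse_version(known_good)


if __name__ == "__main__":
    try:
        import setuptools
    except ImportError:
        print("setuptools is not installed in this environment")
    else:
        installed = setuptools.__version__
        if newer_than_known_good(installed):
            print("setuptools {} is newer than {}; consider downgrading".format(installed, KNOWN_GOOD))
        else:
            print("setuptools {} should be fine".format(installed))
```

Run it inside the activated environment (e.g. vit4) so it sees the same setuptools that torch will use.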

2. Error

  File "train.py", line 17, in <module>
    from apex import amp
ModuleNotFoundError: No module named 'apex'

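After some searching, it looked like this might be a Python version problem: the original environment was Python 2.7, so I created a new Python 3.7 environment (I first used 3.6, but a later step complained that at least 3.7 was required).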

conda create -n vit4 python=3.7
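Since the 3.6 problem only surfaced later, a guard at the top of a script can make the requirement explicit up front. The (3, 7) minimum is the one I ran into in this post; the helper name is hypothetical. It deliberately avoids f-strings and type annotations so that even an old interpreter can parse the file and print the message instead of a syntax error.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum reported in this post after the 3.6 environment failed


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare only (major, minor), so micro releases do not matter."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_ok():
        sys.exit("Python {}.{} is too old; need {}.{}+".format(
            sys.version_info[0], sys.version_info[1], MIN_PYTHON[0], MIN_PYTHON[1]))
    print("Python version OK")
```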

Install apex in this environment:

conda activate vit4
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Or (the flags in brackets are optional): python setup.py install [--cuda_ext] [--cpp_ext]
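Importing apex successfully does not by itself show whether the compiled extensions were built, so a quick way to check the result of the install is to look for the compiled modules too. This is a sketch under an assumption: the module names apex_C (from --cpp_ext) and amp_C (from --cuda_ext) are what I understand apex's setup.py to produce, and they may differ between apex versions. importlib.util.find_spec is used so nothing actually gets imported.

```python
import importlib.util

# Assumption: names based on apex's setup.py at the time of writing; they may change.
EXPECTED_MODULES = {
    "apex": "Python package",
    "apex_C": "C++ extension (--cpp_ext)",
    "amp_C": "CUDA extension (--cuda_ext)",
}


def check_apex_build(modules=EXPECTED_MODULES):
    """Return {module_name: found?} without importing the modules themselves."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_apex_build().items():
        status = "found" if found else "MISSING"
        print("{:<8} {:<30} {}".format(name, EXPECTED_MODULES[name], status))
```

If apex shows as found but amp_C is missing, the CUDA extension was most likely not built, which points back at the build flags or the CUDA setup rather than at the Python environment.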

Running the code produced another error:

AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'
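This error means the installed torch does not provide the private function torch.distributed._all_gather_base that the code being run (most likely the freshly built apex) tries to call. Before reinstalling anything, a quick check can print the installed torch version and whether that attribute exists. A minimal sketch, assuming torch may or may not import cleanly in the current environment; the function name is hypothetical. It also catches OSError, because a broken torch/CUDA install can fail at import time with exactly that kind of error.

```python
import importlib


def has_all_gather_base():
    """Return (torch_version_or_None, attribute_present).

    If torch cannot be imported (missing, or broken shared libraries),
    returns (None, False).
    """
    try:
        torch = importlib.import_module("torch")
        dist = importlib.import_module("torch.distributed")
    except (ImportError, OSError):
        return None, False
    return torch.__version__, hasattr(dist, "_all_gather_base")


if __name__ == "__main__":
    version, present = has_all_gather_base()
    if version is None:
        print("torch could not be imported in this environment")
    else:
        print("torch {}: _all_gather_base {}".format(version, "present" if present else "missing"))
```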

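I reinstalled torch (I had pinned a version before), but the error persisted, and now apex would not even install. This is most likely a mismatch between CUDA versions: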

OSError: /mnt/public/users/lig/anaconda/envs/vit6/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Reinstalling the pinned torch==1.6 fixed the first problem. I then downloaded torch 1.8 instead, matching my CUDA version, 11.1.
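Because the root cause here was a mismatch between the CUDA version torch was built for and the local CUDA toolkit (11.1 in my case), it helps to compare the two before building apex again. The sketch below parses the "release X.Y" part of `nvcc --version` output and compares it with torch.version.cuda. Assumptions: nvcc is on PATH, and requiring an exact major.minor match is my own simplification of "compatible". The function names are hypothetical. subprocess.run is called with stdout=PIPE and universal_newlines so it also works on Python 3.6.

```python
import re
import subprocess

NVCC_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if not found."""
    match = NVCC_RELEASE.search(nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_torch_cuda(torch_cuda):
    """Turn torch.version.cuda (e.g. '11.1') into (11, 1); None for CPU-only builds."""
    if not torch_cuda:
        return None
    pieces = torch_cuda.split(".")
    if len(pieces) < 2:
        return None
    return int(pieces[0]), int(pieces[1])


def cuda_versions_match(torch_cuda, nvcc_output):
    """Simplification: require the same major.minor on both sides."""
    torch_ver = parse_torch_cuda(torch_cuda)
    return torch_ver is not None and torch_ver == parse_nvcc_release(nvcc_output)


if __name__ == "__main__":
    try:
        import torch
        nvcc_output = subprocess.run(
            ["nvcc", "--version"], stdout=subprocess.PIPE,
            universal_newlines=True, check=True,
        ).stdout
    except (ImportError, OSError, subprocess.CalledProcessError) as exc:
        print("could not run the check: {}".format(exc))
    else:
        print("torch CUDA:", torch.version.cuda)
        print("nvcc release:", parse_nvcc_release(nvcc_output))
        print("match:", cuda_versions_match(torch.version.cuda, nvcc_output))
```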


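It still looked like an apex problem. I tried the fix given for the fourth problem in the CSDN blog post by weixin_59726951, "apex安装常见的三个报错并成功解决(亲测有效)" (roughly: "Three common apex installation errors, all solved (tested myself)"), and it worked.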
