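Creating a virtual environment with Anaconda inside Docker on Linux

For common conda commands and the basic commands for creating an environment, see "Conda environment setup and activation"
and "Common operations on local conda environments".

  • Preface
    This post walks through using conda inside Docker on Linux, taking the setup of the MLD-TResNet-L-AAM model as the example. For the paper notes, see: Multi-label classification paper notes | Combining Metric Learning and Attention Heads…MLD-TResNet-L-AAM/GAT+AAM)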

1. Install Anaconda

apt-get update
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.11-Linux-x86_64.sh 
or
curl -O  https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.11-Linux-x86_64.sh
bash Anaconda3-2021.11-Linux-x86_64.sh
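
Inside a container the installer's interactive prompts can be inconvenient. The sketch below runs the same installer in batch mode using its `-b` (no prompts) and `-p` (install prefix) options. The helper name `install_anaconda` and the default prefix `/opt/conda` are assumptions made for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch: run the Anaconda installer without prompts (handy in Docker).
# install_anaconda and the /opt/conda default are illustrative choices.
install_anaconda() {
    local installer="$1"
    local prefix="${2:-/opt/conda}"
    if [ ! -f "$installer" ]; then
        echo "installer not found: $installer" >&2
        return 1
    fi
    # -b: batch mode (no prompts); -p: installation prefix
    bash "$installer" -b -p "$prefix" || return 1
    # Make conda visible in the current shell session
    export PATH="$prefix/bin:$PATH"
}

# Usage:
#   install_anaconda Anaconda3-2021.11-Linux-x86_64.sh /opt/conda
```

Batch mode does not modify the shell startup files, which is why the sketch prepends the prefix to `PATH` itself.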

2. Upgrade conda

conda update conda

3. Install CUDA

apt-get install gcc g++
sh cuda_10.2.89_440.33.01_linux.run

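Install cuDNN

tar zxvf cudnn-10.2-linux-x64-v7.6.5.32.tgz
# a directory needs the execute bit to be entered; 666 would remove it
chmod 755 /usr/local/cuda/include
# this .tgz archive unpacks into a directory named cuda/
cp cuda/include/cudnn*.h /usr/local/cuda/include
# -P keeps the libcudnn symlinks as symlinks
cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*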

Verification for older cuDNN versions (the version installed here is an older one)

cat /usr/local/cuda/include/cudnn.h  | grep CUDNN_MAJOR -A 2

Verification for newer cuDNN versions

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
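
The two checks can be folded into one helper that reads whichever header is present. This is a minimal sketch: `cudnn_version` is a hypothetical function name, and it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` on separate `#define` lines, as the grep commands suggest.

```shell
#!/usr/bin/env bash
# cudnn_version (hypothetical helper): print MAJOR.MINOR.PATCH read from the cuDNN headers.
cudnn_version() {
    local include_dir="${1:-/usr/local/cuda/include}"
    local header
    if [ -f "$include_dir/cudnn_version.h" ]; then
        header="$include_dir/cudnn_version.h"   # newer layout
    elif [ -f "$include_dir/cudnn.h" ]; then
        header="$include_dir/cudnn.h"           # older layout
    else
        echo "no cuDNN header found in $include_dir" >&2
        return 1
    fi
    # Pick the three version macros and join them with dots
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Usage:
#   cudnn_version                      # reads /usr/local/cuda/include
#   cudnn_version /path/to/include
```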

4. Install the environment

conda create --name torchreid python=3.7
conda activate torchreid
pip install -r requirements.txt
  • Problem 1
Collecting inplace_abn
  Using cached inplace-abn-1.1.0.tar.gz (137 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-div2jd3n/inplace-abn_810be5ed00194c44a5bcc754bf5501e0/setup.py", line 4, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
  • Solution
pip install --upgrade setuptools
python -m pip install --upgrade pip
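
The traceback shows the `setup.py` of inplace_abn importing torch, so if the error persists after the upgrade it may help to install torch before the rest of the requirements. The sketch below is one way to do that, assuming `requirements.txt` lists torch on a line of its own. `torch_requirements` and `install_torch_first` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# torch_requirements (hypothetical): print the requirement lines that name torch itself,
# e.g. "torch", "torch==1.10.0" or "torch>=1.7", but not "torchvision".
torch_requirements() {
    grep -E '^torch([=<>~!;[:space:]]|$)' "$1"
}

# install_torch_first (hypothetical): install torch alone, then everything else.
install_torch_first() {
    local req="${1:-requirements.txt}"
    local torch_only
    torch_only=$(mktemp) || return 1
    torch_requirements "$req" > "$torch_only"
    # Pass 1: torch only, so later source builds can import it
    if [ -s "$torch_only" ]; then
        python -m pip install -r "$torch_only"
    fi
    local status=$?
    rm -f "$torch_only"
    [ "$status" -eq 0 ] || return "$status"
    # Pass 2: the full requirements file
    python -m pip install -r "$req"
}

# Usage:
#   install_torch_first requirements.txt
```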
  • Problem 2
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
  • Solution
apt install libgl1-mesa-glx
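
To confirm that the library is visible to the dynamic linker after the install, the output of `ldconfig -p` can be searched. This is a small sketch and `lib_listed` is a hypothetical helper name. On newer distribution releases the package may be named `libgl1` instead, which is an assumption to verify against the base image in use.

```shell
#!/usr/bin/env bash
# lib_listed (hypothetical): read `ldconfig -p` style output on stdin and
# succeed if the given shared library name appears in it.
lib_listed() {
    grep -qF -- "$1 "
}

# Usage inside the container:
#   ldconfig -p | lib_listed libGL.so.1 && echo "libGL.so.1 found"
#   # if it is missing, a non-interactive install avoids the confirmation prompt:
#   apt-get update && apt-get install -y libgl1-mesa-glx
```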

5. Training command

python tools/main.py --config-file configs/EfficientNetV2_small_gcn.yml --gpu-num 1 custom_datasets.roots "['datasets/COCO/train.json', 'datasets/COCO/val.json']" data.save_dir ./out/
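
Because the command points at two annotation files, a quick existence check before launching can save a failed run. The sketch below is only an illustration and `check_paths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_paths (hypothetical): report every argument that is not an existing file.
check_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -f "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the annotation files named in the training command:
#   check_paths datasets/COCO/train.json datasets/COCO/val.json || echo "fix the paths first"
```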

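  • Problem 3
Couldn't apply path mapping to the remote file.
  • Solution
    In my case this happened because the remote files had not finished syncing; waiting a moment fixed it.

(There were also network problems along the way. I downloaded the files manually and hard-coded the paths to get things running for now.)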

  • Problem 4
AttributeError: module 'torch' has no attribute 'frombuffer'

Not solved yet.
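
One possible cause, not confirmed for this setup: `torch.frombuffer` was added in PyTorch 1.10, so an older torch in the environment would raise exactly this error when another package calls it. The sketch below compares the installed version against 1.10.0. `version_ge` and `torch_supports_frombuffer` are hypothetical helper names, and the comparison relies on `sort -V`.

```shell
#!/usr/bin/env bash
# version_ge (hypothetical): succeed if version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# torch_supports_frombuffer (hypothetical): succeed if the given torch version is at least 1.10.0.
torch_supports_frombuffer() {
    local v="${1%%+*}"   # drop a local build suffix such as +cu102
    version_ge "$v" "1.10.0"
}

# Usage:
#   v=$(python -c 'import torch; print(torch.__version__)')
#   torch_supports_frombuffer "$v" || echo "torch $v predates torch.frombuffer"
```

If the check fails, either upgrading torch or pinning the calling package to an older release might resolve the error; which one fits depends on the CUDA version installed above.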
