Installing CUDA + PyTorch on WSL2 + Ubuntu: Notes on a Failed Environment Setup

0. Preface

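A couple of days ago I tried to install CUDA + PyTorch on WSL2 with Ubuntu 20.04, but kept running into all kinds of problems while installing CUDA. On a classmate's advice, I moved to Ubuntu 18.04 to try again.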

1. Install Ubuntu 18.04

As before, download and install Ubuntu 18.04 directly from the Windows Store.

2. Configure a China-based apt mirror

https://blog.csdn.net/wangyijieonline/article/details/105360138

sudo vim /etc/apt/sources.list

Replace its contents with the following (bionic is the codename of Ubuntu 18.04; the focal entries found in many guides are for 20.04):

deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
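Rather than hand-editing ten lines, the list can be generated from the release codename, which avoids pasting entries meant for a different Ubuntu release. A minimal sketch; the function name gen_aliyun_sources is hypothetical, and the backup step in the usage note is an added precaution, not something from the linked post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Aliyun apt mirror entries for one Ubuntu codename.
gen_aliyun_sources() {
  local codename="$1"
  local mirror="http://mirrors.aliyun.com/ubuntu/"
  local components="main restricted universe multiverse"
  local suite
  # Same five suites as the hand-written list: base, security, updates, proposed, backports
  for suite in "$codename" "$codename-security" "$codename-updates" \
               "$codename-proposed" "$codename-backports"; do
    echo "deb $mirror $suite $components"
    echo "deb-src $mirror $suite $components"
  done
}
```

Usage (lsb_release -cs prints the codename of the running release, bionic on 18.04):

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
gen_aliyun_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list > /dev/null
sudo apt-get update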

3. Install CUDA

https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-drivers

https://blog.csdn.net/xautzxc/article/details/107610353

Install directly following the official documentation:

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

sudo apt-get update

sudo apt-get install -y cuda-toolkit-11-0

[Screenshot 1]

Configure the environment variables (no sudo is needed to edit your own ~/.bashrc):

vim ~/.bashrc

Add the following lines:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64
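Every source ~/.bashrc appends these directories again, so after several reinstall attempts PATH fills up with duplicates. A minimal sketch of a guard that adds a directory only if it is not already listed; path_prepend is a hypothetical helper name, and it prepends rather than appends (a deliberate change from the lines above) so that the CUDA nvcc is found before any other copy on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to a colon-separated variable
# (PATH, LD_LIBRARY_PATH, ...) unless it is already listed.
path_prepend() {
  local var="$1" dir="$2"
  local current="${!var}"            # indirect expansion: value of the named variable
  case ":$current:" in
    *":$dir:"*) ;;                   # already present: leave it unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

path_prepend PATH /usr/local/cuda/bin
path_prepend LD_LIBRARY_PATH /usr/local/cuda/lib64
```

With the function and the two calls in ~/.bashrc, running source ~/.bashrc repeatedly leaves both variables unchanged after the first time.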

Reload the environment variables and check nvcc:

source ~/.bashrc
nvcc -V

[Screenshot 2]

Verify the installation with the deviceQuery sample:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
sudo ./deviceQuery

It reports an error; verification failed.

[Screenshot 3]
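deviceQuery ends its output with a "Result = PASS" or "Result = FAIL" line, so a scripted check only needs to look for that line. A minimal sketch, assuming the sample has already been built with make as above; check_device_query is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read deviceQuery output on stdin and report the verdict.
# Returns 0 only if the "Result = PASS" line is present.
check_device_query() {
  local output
  output="$(cat)"
  if grep -q '^Result = PASS' <<<"$output"; then
    echo "deviceQuery: PASS"
    return 0
  fi
  # On failure, echo any CUDA error lines the sample printed to stderr
  grep -iE 'cudaGetDeviceCount|error' <<<"$output" >&2
  echo "deviceQuery: FAIL" >&2
  return 1
}
```

Usage: ./deviceQuery | check_device_query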

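This is probably a mismatch between the driver version and the CUDA version. The driver is the dedicated WSL2 driver and cannot be changed, so the only option was to uninstall CUDA and reinstall CUDA 10.1.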
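The mismatch hypothesis can be checked rather than guessed: the driver reports the newest CUDA version it supports (the "CUDA Version" field in the header of nvidia-smi on the Windows side), and nvcc --version reports the toolkit release. A minimal sketch comparing the two; the helper names are hypothetical, and the driver value is passed in by hand as an assumption, since how to read it from inside WSL2 depends on the driver build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y" from `nvcc --version` output on stdin,
# e.g. "Cuda compilation tools, release 11.0, V11.0.221" -> "11.0".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Succeeds if version $1 >= version $2 (GNU sort's version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# $1: highest CUDA version the driver supports (read manually, an assumption)
# $2: toolkit release, e.g. from: nvcc --version | nvcc_release
check_driver_supports_toolkit() {
  if version_ge "$1" "$2"; then
    echo "OK: driver supports CUDA $1, toolkit is $2"
  else
    echo "MISMATCH: toolkit $2 is newer than the driver's CUDA $1" >&2
    return 1
  fi
}
```

Usage: check_driver_supports_toolkit 11.1 "$(nvcc --version | nvcc_release)"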

sudo apt-get purge --auto-remove cuda-toolkit-11-0
sudo rm -rf /usr/local/cuda-11.0
sudo apt-get install -y cuda-toolkit-10-1

[Screenshot 4]

Still not solved, so I switched to a different installation method: the runfile installer.

Log file location when the CUDA installation fails: /var/log/nvidia-installer.log

wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run
sudo sh cuda_11.0.3_450.51.06_linux.run
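The runfile bundles a Linux display driver (the 450.51.06 in its file name), while NVIDIA's CUDA on WSL guide says not to install a Linux display driver inside WSL2, because the GPU is provided through the Windows driver. In the interactive menu that means deselecting the driver; non-interactively, the runfile's documented --silent --toolkit options install only the toolkit. A minimal sketch; the wrapper and its DRY_RUN switch are hypothetical conveniences, not part of the installer.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: install only the CUDA toolkit from a runfile,
# skipping the bundled Linux driver (WSL2 uses the Windows driver instead).
install_cuda_toolkit_only() {
  local runfile="$1"
  local -a cmd=(sudo sh "$runfile" --silent --toolkit)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"        # only show what would run
    return 0
  fi
  [ -f "$runfile" ] || { echo "runfile not found: $runfile" >&2; return 1; }
  "${cmd[@]}"
}
```

Usage: DRY_RUN=1 install_cuda_toolkit_only cuda_11.0.3_450.51.06_linux.run prints the command; drop DRY_RUN=1 to run it.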

Update the environment variables:

export PATH=/usr/local/cuda-11.0/bin:/usr/local/cuda-11.0/nsight-compute-2020.1.2${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Same old problem, so I tried yet another CUDA version.

Uninstall:

cd /usr/local/cuda/bin/
sudo ./cuda-uninstaller

Then install CUDA 10.1:

wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run

Same result. Just as I was about to lose my patience, I stumbled on something unexpected:

[Screenshot 5]

It turns out the latest CUDA release comes in a dedicated WSL-Ubuntu edition, so I tried once more.

[Screenshot 6]

export PATH=/usr/local/cuda-11.1/bin:/usr/local/cuda-11.1/nsight-compute-2020.2.1${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
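When copying export lines between attempts, it is easy to end up with directories from two different CUDA versions in one variable (for example a cuda-11.1/bin entry next to a cuda-11.0/... entry). A minimal sketch that lists the versions referenced by PATH and LD_LIBRARY_PATH so a mix shows up immediately; the function names are hypothetical, and only the /usr/local/cuda-X.Y/... layout used in this post is recognized.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print each distinct CUDA version referenced by
# /usr/local/cuda-X.Y/... entries in a colon-separated path list.
cuda_versions_in() {
  printf '%s\n' "$1" | tr ':' '\n' \
    | sed -n 's|^/usr/local/cuda-\([0-9][0-9.]*\)/.*|\1|p' \
    | sort -u
}

# Fail (and warn) if PATH and LD_LIBRARY_PATH reference more than one version.
check_cuda_paths() {
  local versions
  versions="$(cuda_versions_in "$PATH:$LD_LIBRARY_PATH")"
  if [ "$(printf '%s\n' "$versions" | grep -c .)" -gt 1 ]; then
    echo "WARNING: mixed CUDA versions on the search paths: ${versions//$'\n'/ }" >&2
    return 1
  fi
  echo "CUDA paths consistent: ${versions:-none}"
}
```

Usage: run check_cuda_paths after source ~/.bashrc.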

Still failed.

Driver information on the Windows side:

[Screenshot 7]
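Besides the Windows driver, GPU support in WSL2 also depends on the WSL2 kernel version. A minimal sketch that compares the running kernel release against a minimum; the minimum is a parameter, and the value 4.19.121 in the usage note is the one NVIDIA's CUDA on WSL guide gave around the time of this post, so treat it as an assumption and check the current guide. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: is this a WSL kernel at least as new as $2?
# $1: kernel release string, e.g. "$(uname -r)" -> "4.19.128-microsoft-standard"
# $2: minimum version (an assumed value, see text)
wsl_kernel_at_least() {
  local release="$1" minimum="$2"
  case "$release" in
    *microsoft*|*Microsoft*) ;;
    *) echo "not a WSL kernel: $release" >&2; return 1 ;;
  esac
  local version="${release%%-*}"     # strip the "-microsoft-standard" suffix
  [ "$(printf '%s\n%s\n' "$version" "$minimum" | sort -V | head -n1)" = "$minimum" ]
}
```

Usage: wsl_kernel_at_least "$(uname -r)" 4.19.121 && echo "kernel OK". A WSL1 kernel string such as 4.4.0-...-Microsoft also fails this check, since 4.4.0 is below the minimum.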

4. Install Docker in WSL

https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2

sudo apt -y install docker.io

[Screenshot]

Along the way, vim reported: "/ect/hosts" E212: Can't open file for writing

sudo service docker start
sudo docker run hello-world

[Screenshot 8]

sudo nvidia-docker version

[Screenshot 9]

Test GPU Compute

sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

[Screenshot]
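When the container does reach the GPU, the nbody -benchmark run ends with throughput lines of the form "= ... billion interactions per second" and "= ... single-precision GFLOP/s at 20 flops per interaction". A minimal sketch that extracts the GFLOP/s figure so runs can be compared or checked in a script; the function name is hypothetical and the line format is assumed from the sample's usual output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nbody -benchmark output on stdin and print the
# GFLOP/s value; returns 1 if no such line exists (e.g. no GPU was visible).
nbody_gflops() {
  local value
  value="$(sed -n 's|^= *\([0-9][0-9.]*\) [a-z-]*-precision GFLOP/s.*|\1|p' | head -n1)"
  [ -n "$value" ] || return 1
  echo "$value"
}
```

Usage: sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark | nbody_gflops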

nvidia-container-cli -d /dev/tty info

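It still did not pass the test, so I am shelving this for now and will set up the PyTorch environment on a Windows 10 + Ubuntu dual-boot system instead.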
