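A couple of days ago I tried to install CUDA + PyTorch on WSL2 with Ubuntu 20.04, but the CUDA installation kept running into all sorts of problems, so on a classmate's advice I switched to 18.04 to try again.
As before, I downloaded and installed Ubuntu 18.04 directly from the Windows Store.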
https://blog.csdn.net/wangyijieonline/article/details/105360138
sudo vim /etc/apt/sources.list
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
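The suite names in sources.list have to match the installed release: Ubuntu 18.04 is "bionic" and 20.04 is "focal". Here is a small sketch that generates the same ten mirror lines for whatever codename is passed in, so they do not have to be edited by hand. The helper name gen_sources is made up for this post and is not part of apt.

```shell
# Print Aliyun mirror entries for a given Ubuntu codename.
# gen_sources is a hypothetical helper name, not an apt command.
gen_sources() {
    codename="$1"
    # "" is the base suite; the others are the usual pocket suffixes.
    for suite in "" -security -updates -proposed -backports; do
        for kind in deb deb-src; do
            echo "$kind http://mirrors.aliyun.com/ubuntu/ ${codename}${suite} main restricted universe multiverse"
        done
    done
}

# Possible usage (back up the old file first):
# sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
# gen_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list
```

lsb_release -cs prints the codename of the running system, so the generated file should always match the release that is actually installed.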
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-drivers
https://blog.csdn.net/xautzxc/article/details/107610353
Install directly by following the official documentation:
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update
sudo apt-get install -y cuda-toolkit-11-0
Configure the environment variables
vim ~/.bashrc
Add the following lines
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64
Reload the environment variables
source ~/.bashrc
nvcc -V
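One thing to watch with plain `export PATH=$PATH:...` lines: every `source ~/.bashrc` appends the same directory again, and when LD_LIBRARY_PATH starts out empty the result begins with a colon, which the dynamic linker treats as the current directory. Below is a minimal sketch of a helper that avoids both. path_append is a name I made up for illustration.

```shell
# path_append LIST DIR
# Hypothetical helper: print LIST with DIR appended, unless DIR is
# already one of the colon-separated entries.
path_append() {
    list="$1"
    dir="$2"
    case ":$list:" in
        *":$dir:"*) echo "$list" ;;            # already present, leave unchanged
        *) echo "${list:+$list:}$dir" ;;       # append; no leading colon if LIST is empty
    esac
}

# Possible usage in ~/.bashrc:
# export PATH="$(path_append "$PATH" /usr/local/cuda/bin)"
# export LD_LIBRARY_PATH="$(path_append "$LD_LIBRARY_PATH" /usr/local/cuda/lib64)"
```

With this in place, sourcing ~/.bashrc several times leaves the variables unchanged after the first run.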
Prepare to verify the installation
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
sudo ./deviceQuery
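An error is reported and the verification fails.
The likely cause is a mismatch between the driver version and the CUDA version. The driver is the one built specifically for WSL2 and cannot be changed, so the only option is to uninstall and reinstall with CUDA 10.1.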
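If the mismatch guess is right, it can be checked before reinstalling anything: the toolkit release reported by nvcc should not be newer than the highest CUDA version the driver supports. The sketch below compares the two. Both helper names are hypothetical, and the driver-side number is assumed to be typed in by hand, for example from the "CUDA Version" field that nvidia-smi shows on the Windows side.

```shell
# version_le A B: succeed when version A <= version B.
# Relies on GNU sort -V (version sort), which Ubuntu ships.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# nvcc_release: read `nvcc -V` style text on stdin and print the
# release number, e.g. "11.0".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Possible usage (driver_max is an assumption, entered manually):
# toolkit="$(nvcc -V | nvcc_release)"
# driver_max="11.0"
# if version_le "$toolkit" "$driver_max"; then
#     echo "toolkit $toolkit is within what the driver supports"
# else
#     echo "toolkit $toolkit is newer than the driver supports"
# fi
```

This only rules a version mismatch in or out. It says nothing about other WSL2-specific causes.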
sudo apt-get purge --auto-remove cuda-toolkit-11-0
sudo rm -rf /usr/local/cuda-11.0
sudo apt-get install -y cuda-toolkit-10-1
Still not solved, so I switched to a different installation method.
Log file for failed CUDA installations: /var/log/nvidia-installer.log
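The installer log can be long, so here is a rough sketch for pulling out only the lines that mention an error or a failure. The function name show_installer_errors is made up, and the search pattern is just a guess at what is worth looking at.

```shell
# show_installer_errors [LOGFILE]
# Hypothetical helper: print the last 20 lines (with line numbers)
# that contain "error" or "fail", case-insensitively.
show_installer_errors() {
    log="${1:-/var/log/nvidia-installer.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -n -i -E 'error|fail' "$log" | tail -n 20
}

# Possible usage:
# show_installer_errors
# show_installer_errors /path/to/another.log
```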
wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run
sudo sh cuda_11.0.3_450.51.06_linux.run
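A note on the runfile: it bundles a Linux display driver (the 450.51.06 in the file name), and NVIDIA's WSL guide says not to install a Linux driver inside WSL because the GPU driver comes from Windows. The runfile installer accepts --silent and --toolkit to install only the toolkit. The sketch below just prints the command so it can be checked before running it. runfile_cmd is a hypothetical helper name.

```shell
# runfile_cmd RUNFILE
# Hypothetical helper: print (do not run) a toolkit-only install
# command for a CUDA runfile, skipping the bundled driver.
runfile_cmd() {
    echo "sudo sh $1 --silent --toolkit"
}

# Possible usage:
# runfile_cmd cuda_11.0.3_450.51.06_linux.run
# then copy and run the printed command once it looks right
```

I cannot say whether this alone would have fixed the failure described here. It only removes the bundled driver as a possible cause.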
Update the environment variables:
export PATH=/usr/local/cuda-11.0/bin:/usr/local/cuda-11.0/nsight-compute-2020.1.2${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Same problem as before, so I tried yet another CUDA version.
Uninstall:
cd /usr/local/cuda/bin/
sudo ./cuda-uninstaller
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
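The result was still the same. Just as I was about to lose my temper, I noticed something unexpected:
the latest CUDA release has a dedicated WSL-Ubuntu build, so I tried once more.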
export PATH=/usr/local/cuda-11.1/bin:/usr/local/cuda-11.1/nsight-compute-2020.2.1${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Still failed.
Driver information on the Windows side
https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2
sudo apt -y install docker.io
Error from vim when saving (note that the path as typed is "/ect/hosts", not "/etc/hosts"):
"/ect/hosts" E212: Can't open file for writing
sudo service docker start
sudo docker run hello-world
sudo nvidia-docker version
sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
nvidia-container-cli -d /dev/tty info
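The test still did not pass, so I am shelving this for now and will set up the PyTorch environment on a Windows 10 + Ubuntu dual-boot system instead.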