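Contents
1. Install wget
2. Install Anaconda
3. Install the NVIDIA driver, CUDA, and cuDNN
3.1 Install the GPU driver
3.2 Install CUDA
3.3 Install cuDNN
4. Install pip
5. Install MXNet
6. Set up matplotlib
7. Install PyTorch
8. Install TensorFlow
9. Configure Jupyter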
yum -y install wget
yum -y install setup
yum -y install perl
wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
yum -y install bzip2
bash Anaconda3-5.1.0-Linux-x86_64.sh
Whenever the installer prompts you, press Enter or type yes.
Then reload the shell environment:
source ~/.bashrc
Find the driver that matches your GPU at https://www.nvidia.com/Download/Find.aspx?lang=cn and download it, for example:
wget http://cn.download.nvidia.com/tesla/410.104/NVIDIA-Linux-x86_64-410.104.run
Blacklist nouveau (it appears to conflict with the NVIDIA driver):
cd /lib/modprobe.d/
sudo vim dist-blacklist.conf
# uncomment this line (remove the leading #)
blacklist nvidiafb
# add these two lines
blacklist nouveau
options nouveau modeset=0
Rebuild the initramfs image:
sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)
Switch the default run level to text mode:
sudo systemctl set-default multi-user.target
reboot
Check whether nouveau is disabled. If the command prints nothing, it is disabled:
lsmod | grep nouveau
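The same check can be scripted. The following is a minimal sketch, not part of the original steps: it reads /proc/modules (the file that lsmod formats) and reports whether a module named exactly nouveau is loaded. The function names `loaded_modules` and `module_loaded` are made up for this illustration.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def module_loaded(name, proc_modules_path="/proc/modules"):
    """True if the kernel module `name` is currently loaded."""
    path = Path(proc_modules_path)
    if not path.exists():  # not Linux, or /proc is unavailable
        return False
    return name in loaded_modules(path.read_text())


if __name__ == "__main__":
    print("nouveau loaded:", module_loaded("nouveau"))
```

Unlike `grep`, this matches the module name exactly rather than as a substring.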
Install the driver:
chmod +x NVIDIA-Linux-x86_64-410.104.run
sudo ./NVIDIA-Linux-x86_64-410.104.run -no-nouveau-check -no-opengl-files
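At one step the installer asks whether to install the 32-bit compatibility libraries.
Choose No.
If you see the following message: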
nvidia-installer was forced to guess the X library path '/usr/lib64'
and X module path '/usr/lib64/xorg/modules'; these paths were not
queryable from the system. If X fails to find the NVIDIA X driver
module, please install the `pkg-config` utility and the X.Org
SDK/development package for your distribution and reinstall the
driver.
you can ignore it.
After the installation finishes, restore the graphical run level and reboot:
sudo systemctl set-default graphical.target
reboot
Test it:
nvidia-smi
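If you want to read the driver version from a script instead of the nvidia-smi table, a sketch like the one below may help. It assumes nvidia-smi is on PATH and uses its `--query-gpu` and `--format=csv,noheader` options; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        # split on the last comma so a comma inside the GPU name is harmless
        name, _, driver = line.rpartition(",")
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Return [(name, driver_version), ...], or None if nvidia-smi fails."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools are not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version",
         "--format=csv,noheader"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```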
Older CUDA versions are available at https://developer.nvidia.com/cuda-toolkit-archive
Using CUDA 10.0 as an example:
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run
chmod a+x cuda_10.0.130_410.48_linux.run
sudo ./cuda_10.0.130_410.48_linux.run --no-opengl-libs
During installation, answer the prompts as follows:
Description
Do you accept the previously read EULA?
accept/decline/quit: accept # accept the license
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n # do not install the driver
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y # install the CUDA Toolkit
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]: # install to the default directory
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y # create a symbolic link to the install directory
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y # copy the samples
Enter CUDA Samples Location
[ default is /root ]:
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ..
Final output:
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in /root, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-10.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_3093.log
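In short, the warning says that this installer did not install a driver (expected, because the driver was installed in the previous step). What remains is to set the environment variables:
vim ~/.bashrc
Append the following at the end of the file: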
export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.0/bin:$PATH
Then reload the shell environment:
source ~/.bashrc
Test it.
Check the version:
nvcc -V
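If `nvcc` is not found, the usual cause is that the exports above did not take effect. The sketch below is an illustration only: it checks whether PATH and LD_LIBRARY_PATH contain the CUDA 10.0 directories used in this guide. The function name `missing_cuda_paths` and its `cuda_home` parameter are assumptions made for the example.

```python
import os


def missing_cuda_paths(environ, cuda_home="/usr/local/cuda-10.0"):
    """Return {variable: directory} for CUDA directories that are not set."""
    wanted = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = {}
    for var, directory in wanted.items():
        # split the colon-separated list, ignoring empty entries
        entries = [e.rstrip("/") for e in environ.get(var, "").split(":") if e]
        if directory not in entries:
            missing[var] = directory
    return missing


if __name__ == "__main__":
    print(missing_cuda_paths(os.environ) or "CUDA paths look fine")
```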
Run the samples.
If both report Result = PASS, the installation most likely succeeded.
# Build and run the deviceQuery sample:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
# Build and run the bandwidthTest sample:
cd ../bandwidthTest
make
./bandwidthTest
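To check both samples without reading the output by eye, something along these lines could work. It is a sketch under the assumption that each sample prints a line reading exactly `Result = PASS` on success; `sample_passed` and `run_sample` are hypothetical helper names.

```python
import subprocess


def sample_passed(output):
    """True if the sample output contains a 'Result = PASS' line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_sample(binary_path):
    """Run a compiled sample binary and report whether it passed."""
    result = subprocess.run(
        [binary_path], stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0 and sample_passed(result.stdout)


if __name__ == "__main__":
    base = "/usr/local/cuda-10.0/samples/1_Utilities"
    for name in ("deviceQuery", "bandwidthTest"):
        try:
            print(name, run_sample("{0}/{1}/{1}".format(base, name)))
        except OSError:
            print(name, "binary not found; run make first")
```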
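Older cuDNN versions: https://developer.nvidia.com/rdp/cudnn-archive
Latest version: https://developer.nvidia.com/rdp/cudnn-download
Downloading requires a login, so you can download the archive on your local machine and transfer it to the server through Xshell.
Install: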
tar -xzvf cudnn-10.0-linux-x64-v7.5.0.56.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
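To confirm which cuDNN version was copied, you can read the version macros from the header. The sketch below assumes the cuDNN 7.x layout, where cudnn.h defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL`; the function name `cudnn_version` is made up for this example.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % macro
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # macro not found in this header
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header = Path("/usr/local/cuda/include/cudnn.h")
    if header.exists():
        print(cudnn_version(header.read_text(errors="replace")))
    else:
        print("cudnn.h not found")
```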
yum -y install epel-release
yum -y install python-pip
pip install --upgrade pip
yum install -y zip unzip
mkdir d2l-zh && cd d2l-zh
curl https://zh.d2l.ai/d2l-zh-1.0.zip -o d2l-zh.zip
unzip d2l-zh.zip && rm d2l-zh.zip
Edit environment.yml:
vim environment.yml
Taking CUDA Version: 10.0 as an example (the value shown by nvidia-smi),
append -cu100 to the mxnet entry.
After the change, the file reads:
name: gluon
dependencies:
- python=3.6
- pip:
  - mxnet-cu100==1.5.0
  - d2lzh==0.8.11
  - jupyter==1.0.0
  - matplotlib==2.2.2
  - pandas==0.23.4
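The -cu100 suffix is simply the CUDA version with the dot removed. As a small illustration (the function name `mxnet_package` is hypothetical, and not every CUDA version necessarily has a published MXNet build), the mapping can be written as:

```python
def mxnet_package(cuda_version=None):
    """Map a CUDA version string such as '10.0' to an MXNet pip package name."""
    if not cuda_version:
        return "mxnet"  # no CUDA version given: CPU-only package
    major, minor = cuda_version.split(".")[:2]
    return "mxnet-cu{}{}".format(int(major), int(minor))


if __name__ == "__main__":
    print(mxnet_package("10.0"))
```

For CUDA 10.0 this gives the `mxnet-cu100` name used in environment.yml.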
Install:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
conda env create -f environment.yml
Activate the environment:
source activate gluon
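Once the environment is active, a quick way to see whether MXNet can reach the GPU is to ask it how many GPUs it sees. This is a sketch, assuming an MXNet version that provides `mx.context.num_gpus()` (the 1.5.0 build used here should); the wrapper name `mxnet_gpu_count` is made up for the example.

```python
def mxnet_gpu_count():
    """Return the number of GPUs MXNet can see, or None if MXNet cannot load."""
    try:
        import mxnet as mx
    except (ImportError, OSError):
        # MXNet is not installed, or its CUDA libraries could not be loaded
        return None
    return mx.context.num_gpus()


if __name__ == "__main__":
    count = mxnet_gpu_count()
    if count is None:
        print("MXNet could not be imported in this environment")
    else:
        print("GPUs visible to MXNet:", count)
```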
yum install -y freetype freetype-devel python-freetype
yum install -y libpng libpng-devel python-pypng
pip install matplotlib
yum install -y python-matplotlib
When writing a Python script, put this line at the very top:
#!/usr/bin/env python
If matplotlib raises an error on a machine without a display, you may need to add:
plt.switch_backend('agg')
Run it:
chmod a+x hello.py
./hello.py
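For reference, a complete hello.py might look like the sketch below. It is an illustration rather than the original script: it selects the Agg backend with `matplotlib.use` before importing pyplot, draws a small plot off-screen, and saves it as a PNG. The function name `save_demo_plot` and the output file name are assumptions.

```python
#!/usr/bin/env python
import os
import tempfile


def save_demo_plot(path):
    """Draw a small line plot off-screen and write it to `path`."""
    try:
        import matplotlib
    except ImportError:
        return False  # matplotlib is not installed in this environment
    matplotlib.use("Agg")  # non-interactive backend; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    fig.savefig(path)
    plt.close(fig)
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "hello.png")
    print("saved" if save_demo_plot(target) else "matplotlib missing", target)
```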
pip install torch torchvision
Test it:
#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import torch
print(torch.cuda.is_available())
An output of True means CUDA is available.
pip install tensorflow-gpu==1.14.0
"""
If you see
ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
"""
pip install -U --ignore-installed wrapt enum34 simplejson netaddr
# then install again
pip install tensorflow-gpu==1.14.0
Test code.
It should print a 3x3 matrix of zeros:
#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import tensorflow as tf
a=tf.zeros([3,3])
with tf.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(a))
Start Python
and enter:
from notebook.auth import passwd
passwd()
Enter the password you want to use for Jupyter.
You will get back a string of the form sha1:xxxxxx
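For the curious, the string has the form algorithm:salt:digest. The sketch below reproduces that format with the standard library, on the assumption that the classic notebook hashes the passphrase followed by a random 12-character hex salt with SHA-1. It is only an illustration of the format; keep using `passwd()` to generate the real value. `make_hash` and `check_hash` are hypothetical names.

```python
import hashlib
import secrets


def make_hash(passphrase, salt=None):
    """Build an 'sha1:<salt>:<digest>' string from a passphrase."""
    if salt is None:
        salt = secrets.token_hex(6)  # 6 random bytes = 12 hex characters
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return "sha1:{}:{}".format(salt, digest)


def check_hash(hashed, passphrase):
    """True if `passphrase` reproduces the digest stored in `hashed`."""
    algorithm, salt, digest = hashed.split(":", 2)
    data = (passphrase + salt).encode("utf-8")
    candidate = hashlib.new(algorithm, data).hexdigest()
    return secrets.compare_digest(candidate, digest)


if __name__ == "__main__":
    hashed = make_hash("my-password")
    print(hashed, check_hash(hashed, "my-password"))
```

The salt is why hashing the same password twice gives two different strings.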
# switch to the Python environment you want to use
source activate xxx
jupyter notebook --generate-config --allow-root
The command prints the path of the generated config file.
Open that path in vim,
for example:
vim /root/.jupyter/jupyter_notebook_config.py
Find the following settings and change them (they are commented out with # by default, so remove the # before editing):
c.NotebookApp.allow_root = True
c.NotebookApp.ip = '*'
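c.NotebookApp.password = 'sha1:...' # replace with the string you just obtained
c.NotebookApp.port = 8888 # port; remember to open it
c.NotebookApp.notebook_dir = '/root/d2l-zh' # the directory Jupyter should start in, for example where the MXNet code was downloaded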
c.NotebookApp.open_browser = False
Start Jupyter:
jupyter notebook --allow-root
Then open a browser and visit ip:8888.
The password is the one you just set.
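If the page does not load, it helps to check from another machine whether the port is reachable at all. A minimal sketch using only the standard library follows; `port_open` is a made-up helper name, and the host value is a placeholder to replace with your server's address.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; use the server's IP when testing remotely
    print(port_open("127.0.0.1", 8888))
```

A result of False from a remote machine while the notebook is running usually points to a firewall or security-group rule blocking the port.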