Setting up CUDA + MXNet + Jupyter + PyTorch + TensorFlow 1.14 on CentOS 7

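Contents

1. Install wget

2. Install Anaconda

3. Install the NVIDIA driver, CUDA, and cuDNN

3.1 Install the GPU driver

3.2 Install CUDA

3.3 Install cuDNN

4. Install pip

5. Install MXNet

6. Fix matplotlib

7. Install PyTorch

8. Install TensorFlow

9. Configure Jupyter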


1. Install wget

yum -y install wget
yum -y install setup 
yum -y install perl

2. Install Anaconda

wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
yum -y install bzip2
bash Anaconda3-5.1.0-Linux-x86_64.sh

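When the installer prompts for input, press Enter to accept the defaults and answer yes.

Then reload the shell environment: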

source ~/.bashrc

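3. Install the NVIDIA driver, CUDA, and cuDNN

3.1 Install the GPU driver

Find the driver that matches your GPU at https://www.nvidia.com/Download/Find.aspx?lang=cn and download it, for example: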

wget http://cn.download.nvidia.com/tesla/410.104/NVIDIA-Linux-x86_64-410.104.run

Blacklist nouveau (the open-source driver appears to conflict with the NVIDIA driver):

cd /lib/modprobe.d/
sudo vim dist-blacklist.conf
# remove the leading # from this line (uncomment it)
blacklist nvidiafb

# add these two lines
blacklist nouveau
options nouveau modeset=0

Rebuild the initramfs image:

sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)

Switch the default target to text mode and reboot:

sudo systemctl set-default multi-user.target
reboot

Check that nouveau is disabled. If the command prints nothing, the module is no longer loaded:

lsmod | grep nouveau
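
The same check can be scripted. lsmod reads its data from /proc/modules, so the sketch below looks for a module whose name is exactly nouveau in that file. The helper name `module_loaded` is made up for this example. Unlike the grep above, it ignores lines where nouveau only appears in another module's "used by" column.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if a kernel module called `name` is listed in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first column of /proc/modules is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    text = proc_modules.read_text() if proc_modules.exists() else ""
    print("nouveau loaded:", module_loaded("nouveau", text))
```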

Install the driver:

chmod +x NVIDIA-Linux-x86_64-410.104.run
sudo ./NVIDIA-Linux-x86_64-410.104.run -no-nouveau-check -no-opengl-files

At one step the installer asks whether to install the 32-bit compatibility libraries. Choose No.

If the installer prints the following warning, you can ignore it:

    nvidia-installer was forced to guess the X library path '/usr/lib64'
    and X module path '/usr/lib64/xorg/modules'; these paths were not
    queryable from the system.  If X fails to find the NVIDIA X driver
    module, please install the `pkg-config` utility and the X.Org
    SDK/development package for your distribution and reinstall the
    driver.

After the installation finishes, restore the graphical target and reboot:

sudo systemctl set-default graphical.target
reboot

Verify that the driver works:

nvidia-smi

3.2 Install CUDA

Earlier releases are listed at https://developer.nvidia.com/cuda-toolkit-archive

Using CUDA 10.0 as the example:

wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run 
chmod a+x cuda_10.0.130_410.48_linux.run 
sudo ./cuda_10.0.130_410.48_linux.run --no-opengl-libs

Answer the installer prompts as follows:

Description
Do you accept the previously read EULA?
accept/decline/quit: accept # accept the EULA

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n # do not install the driver (it was installed in step 3.1)

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y # install the CUDA Toolkit

Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:  # press Enter to use the default directory

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y # create a symlink to the install directory

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y # copy the samples

Enter CUDA Samples Location
 [ default is /root ]: 

Installing the CUDA Toolkit in /usr/local/cuda-10.0 ..

The installer ends with this summary:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /root, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_3093.log

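In short, the warning only says that this installer did not install the driver, which is expected because the driver was installed in step 3.1. What remains is to set the environment variables:

vim ~/.bashrc

Append the following at the end of the file: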

export CUDA_HOME=/usr/local/cuda-10.0
 
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
 
export PATH=/usr/local/cuda-10.0/bin:$PATH

Then reload the environment:

source ~/.bashrc
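
As a quick sanity check after reloading, the sketch below reports which of the two directories named in the installer summary are still missing from PATH and LD_LIBRARY_PATH. The function name `missing_cuda_entries` is made up for this example, and the default install prefix /usr/local/cuda-10.0 is assumed.

```python
import os


def missing_cuda_entries(environ, cuda_home="/usr/local/cuda-10.0"):
    """Return (variable, directory) pairs the CUDA summary asks for but that are absent.

    `environ` is any mapping of environment variables, e.g. os.environ.
    """
    wanted = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in wanted.items():
        # Split the colon-separated list and ignore trailing slashes.
        entries = [e.rstrip("/") for e in environ.get(var, "").split(":") if e]
        if directory not in entries:
            missing.append((var, directory))
    return missing


if __name__ == "__main__":
    problems = missing_cuda_entries(os.environ)
    if not problems:
        print("PATH and LD_LIBRARY_PATH both contain the CUDA directories")
    for var, directory in problems:
        print("{} does not contain {}".format(var, directory))
```

An empty result only means the variables are set in the current shell. It does not prove that the toolkit itself works, which is what the checks that follow are for.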

Verify the installation.

Check the compiler version:

nvcc -V

Run the samples.

If both report Result = PASS, the installation should be working.

# build and run the device query sample, deviceQuery:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
 
# build and run the bandwidth test sample, bandwidthTest:
cd ../bandwidthTest
make
./bandwidthTest
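
If you want to automate the PASS check, a minimal sketch follows. It assumes both samples have already been built with make in the directories used above, and the helper names `sample_passed` and `run_sample` are made up for this example.

```python
import os
import re
import subprocess


def sample_passed(output):
    """True if the sample output contains a line reading 'Result = PASS'."""
    return re.search(r"^\s*Result\s*=\s*PASS\s*$", output, flags=re.MULTILINE) is not None


def run_sample(binary):
    """Run one sample binary and report whether it printed Result = PASS."""
    completed = subprocess.run(
        [binary],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return sample_passed(completed.stdout)


if __name__ == "__main__":
    # Assumed location: the samples directory used in the commands above.
    samples = "/usr/local/cuda-10.0/samples/1_Utilities"
    for name in ("deviceQuery", "bandwidthTest"):
        binary = os.path.join(samples, name, name)
        if os.path.exists(binary):
            print(name, "PASS" if run_sample(binary) else "FAIL")
        else:
            print(name, "has not been built yet")
```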

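3.3 Install cuDNN

Earlier releases: https://developer.nvidia.com/rdp/cudnn-archive

Latest release: https://developer.nvidia.com/rdp/cudnn-download

The download requires logging in, so it is easiest to download the archive on your local machine and transfer it to the server, for example with Xshell.

Install: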

tar -xzvf cudnn-10.0-linux-x64-v7.5.0.56.tgz
 
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
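
To confirm which cuDNN version was copied, you can read the version macros from the header. The sketch below assumes the cuDNN 7.x layout, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h itself. The function name `cudnn_version` is made up for this example.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the CUDNN_* defines, or None if any is absent."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + key + r"\s+(\d+)", header_text, flags=re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header = Path("/usr/local/cuda/include/cudnn.h")
    if header.exists():
        print("cuDNN version:", cudnn_version(header.read_text(errors="replace")))
    else:
        print(header, "not found")
```

For the archive used above (v7.5.0.56) the header should report major 7, minor 5, patch level 0.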

4. Install pip

yum -y install epel-release
yum -y install python-pip
pip install --upgrade pip

5. Install MXNet

yum install -y zip unzip
mkdir d2l-zh && cd d2l-zh
curl https://zh.d2l.ai/d2l-zh-1.0.zip -o d2l-zh.zip
unzip d2l-zh.zip && rm d2l-zh.zip

Edit environment.yml:

vim environment.yml

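Taking CUDA version 10.0 as the example (nvidia-smi shows the CUDA version),

append -cu100 to the mxnet package name.

After the change the file looks like this: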

name: gluon
dependencies:
- python=3.6
- pip:
  - mxnet-cu100==1.5.0
  - d2lzh==0.8.11
  - jupyter==1.0.0
  - matplotlib==2.2.2
  - pandas==0.23.4
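
The -cu100 suffix is simply the CUDA version with the dot removed. A small sketch of that naming rule follows. The helper name `mxnet_package` is made up for this example, and the default version 1.5.0 comes from the file above. A build is not necessarily published on PyPI for every CUDA version, so check that the resulting package exists before relying on it.

```python
def mxnet_package(cuda_version, mxnet_version="1.5.0"):
    """Build the pip requirement for a GPU build of MXNet, e.g. '10.0' -> 'mxnet-cu100==1.5.0'."""
    major, minor = cuda_version.strip().split(".")[:2]
    return "mxnet-cu{}{}=={}".format(major, minor, mxnet_version)


if __name__ == "__main__":
    print(mxnet_package("10.0"))
```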

Configure the mirrors and create the environment:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

conda env create -f environment.yml

Activate the environment:

source activate gluon

6. Fix matplotlib

yum install -y freetype freetype-devel python-freetype
yum install -y libpng libpng-devel python-pypng
pip install matplotlib
yum install -y python-matplotlib

When writing a Python script, put this line at the very top:

#!/usr/bin/env python

If matplotlib raises an error while the script runs, you may need to add:

plt.switch_backend('agg')
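
A likely reason for such errors on a server is that no display is attached, so an interactive backend cannot open a window. The sketch below is one way to handle that, under the assumption that a missing DISPLAY variable means a headless session. It selects the Agg backend before pyplot is imported and saves the figure to a file instead of showing it. The helper name `needs_agg` and the output file name are made up for this example.

```python
import os
import tempfile


def needs_agg(environ):
    """Heuristic: without a DISPLAY variable there is no X server to draw on."""
    return not environ.get("DISPLAY")


if __name__ == "__main__":
    try:
        import matplotlib
    except ImportError:
        matplotlib = None
        print("matplotlib is not installed")

    if matplotlib is not None:
        if needs_agg(os.environ):
            # Pick the file-only backend before pyplot is imported.
            matplotlib.use("Agg")
        import matplotlib.pyplot as plt

        plt.plot([0, 1, 2], [0, 1, 4])
        out_path = os.path.join(tempfile.gettempdir(), "agg_demo.png")
        plt.savefig(out_path)  # write to a file; no window is needed
        print("figure written to", out_path)
```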

Run the script:

chmod a+x hello.py
./hello.py

7. Install PyTorch

pip install torch torchvision

Test it:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import torch

print(torch.cuda.is_available())

If it prints True, PyTorch can use CUDA.

 

8. Install TensorFlow

pip install tensorflow-gpu==1.14.0

# If you see:
# ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
# reinstall the affected packages while ignoring the installed copies:
pip install -U --ignore-installed wrapt enum34 simplejson netaddr

# then install again
pip install tensorflow-gpu==1.14.0

Test code.

It should print a 3x3 matrix of zeros:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import tensorflow as tf

a = tf.zeros([3, 3])
with tf.Session() as sess:
    # No variables are defined here, so this initializer is a harmless no-op.
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(a))

9. Configure Jupyter

Start a Python interpreter

and enter:

from notebook.auth import passwd
passwd()

Enter the password you want to use for Jupyter.

The call returns a hash string of the form sha1:xxxxxx
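
To show what that string contains, here is a sketch of the salted SHA-1 format as the classic notebook package is understood to produce it: the algorithm name, a random 12-character hex salt, and the SHA-1 digest of password plus salt, joined by colons. This is an illustration only. The names `sha1_passwd` and `check_passwd` are made up, newer Jupyter releases may default to a different algorithm, and the value for the config file should still come from passwd().

```python
import hashlib
import secrets


def sha1_passwd(passphrase, salt=None):
    """Build a 'sha1:<salt>:<digest>' string from a password (illustrative only)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters
    digest = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return ":".join(("sha1", salt, digest))


def check_passwd(hashed, passphrase):
    """Verify a password against a string in the 'algorithm:salt:digest' format."""
    algorithm, salt, digest = hashed.split(":", 2)
    candidate = hashlib.new(
        algorithm, passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return secrets.compare_digest(candidate, digest)


if __name__ == "__main__":
    hashed = sha1_passwd("example-password")
    print(hashed)
    print("matches:", check_passwd(hashed, "example-password"))
```

Because the salt is random, hashing the same password twice gives two different strings, and both are valid.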

# switch to the Python environment you want to use
source activate xxx

jupyter notebook --generate-config --allow-root

The command prints the path of the generated config file.

Open that file in vim,

for example:

vim /root/.jupyter/jupyter_notebook_config.py 

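Find the following settings and change them. They should all be commented out with # by default, so remove the leading # before editing.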

c.NotebookApp.allow_root = True

c.NotebookApp.ip = '*'

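c.NotebookApp.password = 'sha1:...' # replace with the hash you just obtained

c.NotebookApp.port = 8888 # port; remember to open it in the firewall

c.NotebookApp.notebook_dir = '/root/d2l-zh'  # directory Jupyter starts in, e.g. where the MXNet code was downloaded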

c.NotebookApp.open_browser = False

Start the server:

jupyter notebook --allow-root

Then open a browser and go to ip:8888

Log in with the password you just set.
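
If the page does not load, it helps to find out whether the port is reachable at all. The sketch below attempts a plain TCP connection. The helper name `port_open` is made up for this example. Run it on the server against 127.0.0.1 to check that Jupyter is listening, then from your own machine against the server's address to check the firewall.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("port 8888 reachable on 127.0.0.1:", port_open("127.0.0.1", 8888))
```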
