Setting up an environment to run PointNet++: Ubuntu 18.04 + CUDA 10.0 + cuDNN 7.4.2 + Anaconda3 + tensorflow-gpu 1.13.1 (the super, super simple version! Up and running with no sweat!)

I've reinstalled this setup several times now! Nobody knows reinstalling better than I do (just kidding).

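I'll assume you've just finished a fresh install of Ubuntu 18.04, so let's dive right in! Please note: here the GPU driver gets installed through the CUDA installer. If you'd rather install the driver separately (for example, by downloading the .run file from the NVIDIA website, or by using ubuntu-drivers devices to install the version the system recommends), please follow another tutorial! But (◔◡◔) after many reinstalls, my take is that since you have to install CUDA anyway, letting the CUDA installer bring in the NVIDIA driver is the simplest route~
Note: sudo only grants temporary root privileges, so we switch to a root shell right at the start.
Here's the overall flow:
CUDA (installs the GPU driver along the way) --> cuDNN --> Anaconda3 --> create the environment --> install tensorflow-gpu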

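  1. Switch the apt mirror (to speed up downloads)
    Get a root shell:
    sudo -s
    Back up the source list:
    cp /etc/apt/sources.list /etc/apt/sources.list.bak
    Replace the contents of the source list:
    gedit /etc/apt/sources.list
    Once the list is open, replace the original contents with the following: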

    # Source-package mirrors are commented out by default to speed up apt update; uncomment them if you need them
    deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
    # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
    deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
    # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
    deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
    # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
    deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
    # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
    
    # Pre-release software source; enabling it is not recommended
    # deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
    # deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
    

    Remember to save the file.
    Refresh the package list:
    apt-get update
    OK, the mirror switch is done!
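
If you're on a different Ubuntu release or prefer another mirror, the list above follows a regular pattern, so it can be generated instead of hand-edited. Here's a minimal sketch of that idea. The function name `build_sources_list` is made up for illustration, and it leaves out the optional `bionic-proposed` entries:

```python
def build_sources_list(mirror="https://mirrors.tuna.tsinghua.edu.cn/ubuntu/",
                       codename="bionic"):
    """Return sources.list text: deb lines enabled, deb-src lines commented out."""
    components = "main restricted universe multiverse"
    lines = []
    # Same four suites as the hand-written list: base, updates, backports, security.
    for suffix in ("", "-updates", "-backports", "-security"):
        suite = codename + suffix
        lines.append(f"deb {mirror} {suite} {components}")
        lines.append(f"# deb-src {mirror} {suite} {components}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the text so it can be reviewed before pasting into /etc/apt/sources.list.
    print(build_sources_list(), end="")
```

The sketch only prints the text; writing it to /etc/apt/sources.list is still a manual step, which keeps the backup from earlier meaningful.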

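  2. Disable the system's built-in GPU driver
    Open the system blacklist file:
    gedit /etc/modprobe.d/blacklist.conf
    Add the following lines to blacklist nouveau! Hmph, we're not playing with it!
    blacklist nouveau
    options nouveau modeset=0
    Then rebuild the initramfs so the change takes effect:
    update-initramfs -u
    Reboot:
    reboot
    Now let's see whether it still dares to show up:
    lsmod | grep nouveau
    OK, no output at all (it's scared, it's scared, haha)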
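
One small caveat with `lsmod | grep nouveau`: grep matches the word anywhere on a line, so a module that merely lists nouveau as a dependent would also show up. A rough sketch of a stricter check, assuming the usual `/proc/modules` layout where each line starts with the module name (the helper name `is_module_loaded` is made up for illustration):

```python
import os


def is_module_loaded(proc_modules_text, module="nouveau"):
    """Return True if `module` is the name (first field) of some line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False


if __name__ == "__main__":
    path = "/proc/modules"  # the file that lsmod reads on Linux
    if os.path.exists(path):
        with open(path) as f:
            loaded = is_module_loaded(f.read())
        print("nouveau is still loaded" if loaded else "nouveau is not loaded")
```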

  3. Install the required dependencies
    Install gcc (remember to be in a root shell):
    apt install build-essential

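  4. Install CUDA (and the GPU driver that comes with it)
    Be good and go download it from the official site~
    --> this way: http://developer.nvidia.com/cuda-downloads
    Go to the directory containing the installer and run the .run file (newbie tip~ type cd followed by a space, drag the folder that holds the .run file into the terminal, and press Enter to jump into that directory; then type ls to list the files in it):
    sh cuda_10.0.130_410.48_linux.run
    Friendly reminder: replace this with the name of your own CUDA file.
    During installation, type accept.
    If you haven't installed a GPU driver yet, you can install it right here as part of the CUDA installation (that's my case):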

    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
    (y)es/(n)o/(q)uit: y
    

    Do not install the OpenGL libraries!

    Do you want to install the OpenGL libraries?
    (y)es/(n)o/(q)uit [ default is yes ]: n
    

    For this one, either y or n works:

    Do you want to run nvidia-xconfig?
    This will update the system X configuration file so that the NVIDIA X driver
    is used. The pre-existing X configuration file will be backed up.
    This option should not be used on systems that require a custom
    X configuration, such as systems with multiple GPU vendors.
    (y)es/(n)o/(q)uit [ default is no ]: n
    

    Answer y or press Enter for the default on the remaining questions, then check the result:

    ===========
    = Summary =
    ===========
    Driver:   Installed
    Toolkit:  Installed in /usr/local/cuda-10.0
    Samples:  Installed in /home/yy, but missing recommended libraries
    

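    After installation, add the environment variables:
    gedit ~/.bashrc
    Append the following lines to the end of the file (remember to change them to your own CUDA version):

    export PATH="/usr/local/cuda-10.0/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
    

    Save the file, then reload it so the changes take effect:
    source ~/.bashrc
    Finally, check the CUDA version and the NVIDIA driver info:
    nvcc -V
    The CUDA version info looks like this: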

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    

    Query the NVIDIA driver info:
    nvidia-smi
    The result looks like this:

    	Wed Aug 12 15:59:46 2020       
    	+-----------------------------------------------------------------------------+
    	| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
    	|-------------------------------+----------------------+----------------------+
    	| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    	| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    	|===============================+======================+======================|
    	|   0  Graphics Device     Off  | 00000000:01:00.0 Off |                  N/A |
    	| N/A   41C    P0    N/A /  N/A |      0MiB /  3020MiB |      1%      Default |
    	+-------------------------------+----------------------+----------------------+                                                         
    	+-----------------------------------------------------------------------------+
    	| Processes:                                                       GPU Memory |
    	|  GPU       PID   Type   Process name                             Usage      |
    	|=============================================================================|
    	|  No running processes found                                                 |
    	+-----------------------------------------------------------------------------+
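
A mistyped directory in PATH or LD_LIBRARY_PATH fails silently: the shell simply skips entries that don't exist. Here's a minimal sketch for spotting such entries after editing ~/.bashrc. The helper name `missing_dirs` is made up for illustration:

```python
import os


def missing_dirs(path_value):
    """Return entries of a path-style variable that are not existing directories."""
    entries = [e for e in path_value.split(os.pathsep) if e]
    return [e for e in entries if not os.path.isdir(e)]


if __name__ == "__main__":
    # Report every entry in the two variables set in ~/.bashrc that does not exist.
    for var in ("PATH", "LD_LIBRARY_PATH"):
        for entry in missing_dirs(os.environ.get(var, "")):
            print(f"{var}: no such directory: {entry}")
```

Run it in a fresh terminal, so the shell has picked up the edited ~/.bashrc. No output means every listed directory exists.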
    
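  5. Install cuDNN
    Download the archive from the official site
    --> this way: https://developer.nvidia.com/rdp/cudnn-archive
    Once it's downloaded, let's extract it (at this point the archive is in your downloads directory):
    First go to the downloads directory, then extract:
    tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
    The extraction output looks like this: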

    cuda/include/cudnn.h
    cuda/NVIDIA_SLA_cuDNN_Support.txt
    cuda/lib64/libcudnn.so
    cuda/lib64/libcudnn.so.7
    cuda/lib64/libcudnn.so.7.4.2
    cuda/lib64/libcudnn_static.a
    

    Next, copy the cuDNN files into the CUDA directory:
    cp -P cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
    cp cuda/include/cudnn.h /usr/local/cuda-10.0/include/

    Give all users read permission (remember to change the path to your own version number):
    chmod a+r /usr/local/cuda-10.0/include/cudnn.h
    chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
    Check the cuDNN version:
    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    The result (mine is 7.4.2):

    #define CUDNN_MAJOR 7
    #define CUDNN_MINOR 4
    #define CUDNN_PATCHLEVEL 2
    --
    #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
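
The same version check can be scripted, which is handy if you want to confirm the version from Python later on. This is a rough sketch, assuming the header contains the three `#define` lines shown above. The function name `cudnn_version` is made up for illustration:

```python
import os
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn.h text, or None if a define is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7" and capture the number.
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)\s*$",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header = "/usr/local/cuda/include/cudnn.h"  # same path as the cat command above
    if os.path.exists(header):
        with open(header) as f:
            print(cudnn_version(f.read()))
```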
    
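  6. Install Anaconda3
    If you haven't downloaded it yet, grab it from the Tsinghua mirror (it's really fast)
    --> this way: https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
    Go to the directory with the downloaded file and run:
    bash Anaconda3-2020.02-Linux-x86_64.sh
    Add Anaconda to the PATH:
    gedit ~/.bashrc
    Append this to the end of .bashrc (remember to change it to your own username):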

    export PATH="/home/yy/anaconda3/bin:$PATH"
    

    Finally, don't forget to reload it:
    source ~/.bashrc

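  7. Create the environment
    Make sure you're in a root shell! Create the environment (tf is just the name I picked; choose whatever you like~):
    conda create -n tf python=3.7
    Activate the environment we just created:
    source activate tf
    Once it's activated, the prompt starts with the environment name~ which means we're now inside the tf environment: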

    root@yy:~# source activate tf
    (tf) root@yy:~#
    
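  8. Install tensorflow-gpu
    Inside the activated environment, run (plain pip is too slow, so I appended the Tsinghua mirror URL):
    pip install tensorflow-gpu==1.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
    When the network is flaky you may get a wall of red ending in a read timed out error like the one below. No worries, just retry a few times until the network cooperates: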

    File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
    File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
    File "/home/yy/anaconda3/envs/tf/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
    File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 442, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
    pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.tuna.tsinghua.edu.cn', port=443): Read timed out.
    

    Once it's installed, start python and enter import tensorflow as tf to test it:

    (tf) root@yy:~# python
    Python 3.7.7 (default, May  7 2020, 21:25:33) 
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    Traceback (most recent call last):
      File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
        from tensorflow.python.pywrap_tensorflow_internal import *
      File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
        _pywrap_tensorflow_internal = swig_import_helper()
      File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
        _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
      File "/home/yy/anaconda3/envs/tf/lib/python3.7/imp.py", line 242, in load_module
        return load_dynamic(name, filename, file)
      File "/home/yy/anaconda3/envs/tf/lib/python3.7/imp.py", line 342, in load_dynamic
        return _load(spec)
    ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
    

    Whoa, an error! Don't panic! First type quit() to exit python,
    then run this on the command line:
    ldconfig /usr/local/cuda-10.0/lib64
    The result:

    >>> quit()
    (tf) root@yy:~# ldconfig /usr/local/cuda-10.0/lib64
    (tf) root@yy:~# python
    Python 3.7.7 (default, May  7 2020, 21:25:33) 
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> 
    

    Phew~ error gone! Now let's check the numpy version:

    >>> import numpy
    >>> numpy.__version__
    '1.19.1'
    

    That version looks too new, so let's downgrade it:
    pip install -U numpy==1.16.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
    And that's everything~
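
The `libcublas.so.10.0` ImportError in step 8 means the dynamic loader couldn't find the CUDA libraries. You can test for that without importing TensorFlow at all. Here's a minimal sketch using the standard ctypes module. The helper name `can_load` is made up for illustration, and the two library names are taken from the error message and from the cuDNN archive listing:

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open the shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Names for CUDA 10.0 and cuDNN 7; adjust them for other versions.
    for name in ("libcublas.so.10.0", "libcudnn.so.7"):
        print(name, "found" if can_load(name) else "NOT found")
```

If a library shows up as not found, rerunning the ldconfig command from step 8 or rechecking LD_LIBRARY_PATH is the first thing to try.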

Let me run PointNet++ to try it out.
A quick test, running just one epoch:

parser.add_argument('--num_point', type=int, default=1024, help='Point Number [default: 1024]')
parser.add_argument('--max_epoch', type=int, default=1, help='Epoch to run [default: 251]')
parser.add_argument('--batch_size', type=int, default=8, help='Batch Size during training [default: 16]')

Very good! No problems at all!

(tf) root@yy:/media/yy/Data/ipython_jupyter/pointnet2123# python train.py
**** EPOCH 000 ****
2020-08-12 17:13:44.277590
---- batch: 050 ----
mean loss: 3.805058
accuracy: 0.127500
 ---- batch: 100 ----
mean loss: 3.299858
accuracy: 0.205000
....... too much output here, omitted .........
 ---- batch: 1200 ----
mean loss: 1.797384
accuracy: 0.492500
2020-08-12 17:18:01.698818
---- EPOCH 000 EVALUATION ----
eval mean loss: 1.345066
eval accuracy: 0.606969
eval avg class acc: 0.502087
Model saved in file: log/model.ckpt
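
For the one-epoch test I edited the defaults in train.py. Since these are ordinary argparse flags, passing them on the command line should work too and leaves the file untouched. Here's a self-contained sketch of that behaviour. `build_parser` is a stand-in written for illustration, not the actual train.py code, and the defaults 251 and 16 are taken from the help strings above:

```python
import argparse


def build_parser():
    """Stand-in parser with the three flags shown above and their documented defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--num_point', type=int, default=1024, help='Point Number [default: 1024]')
    parser.add_argument('--max_epoch', type=int, default=251, help='Epoch to run [default: 251]')
    parser.add_argument('--batch_size', type=int, default=16, help='Batch Size during training [default: 16]')
    return parser


if __name__ == "__main__":
    # Simulates: python train.py --max_epoch 1 --batch_size 8
    flags = build_parser().parse_args(['--max_epoch', '1', '--batch_size', '8'])
    print(flags.num_point, flags.max_epoch, flags.batch_size)
```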
