@author: dassein75325
@date: Nov/4/2018
https://www.geforce.cn/drivers
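Select your graphics card to find the matching driver; the newer the .run file, the better
(otherwise you will later hit: CUDA driver version is insufficient for CUDA runtime version)
I downloaded NVIDIA-Linux-x86_64-410.73.run from the following URL: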
https://www.geforce.cn/drivers/results/139110
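The "driver version is insufficient" error mentioned above boils down to a numeric comparison between the installed driver and the minimum a given CUDA toolkit expects. A minimal sketch of that comparison follows. The helper names are hypothetical, and the threshold 384.81 is an assumption taken from the name of the CUDA 9.0 runfile used later in these notes (cuda_9.0.176_384.81_linux.run).

```python
def parse_driver_version(text):
    """Turn a driver version string such as '410.73' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(installed, required):
    """Return True if the installed driver is at least the required version."""
    return parse_driver_version(installed) >= parse_driver_version(required)


# Assumption: 384.81 is the driver bundled with the CUDA 9.0 runfile,
# used here as the minimum driver version for CUDA 9.0.
MIN_DRIVER_FOR_CUDA_9_0 = "384.81"

if __name__ == "__main__":
    for installed in ("410.73", "375.66"):
        ok = driver_is_sufficient(installed, MIN_DRIVER_FOR_CUDA_9_0)
        print(installed, "OK" if ok else "too old for CUDA 9.0")
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "384.9" as newer than "384.81".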
#for case1: original driver installed by apt-get:
sudo apt-get remove --purge 'nvidia*'
#for case2: original driver installed by runfile:
sudo chmod +x *.run
sudo ./NVIDIA-Linux-x86_64-390.87.run --uninstall
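If the original driver was installed with apt-get, uninstall it with the first method.
If the original driver was installed from a runfile, uninstall it by running that runfile (here NVIDIA-Linux-x86_64-390.87.run) with the --uninstall option.
In fact, installing from a runfile also removes the previous driver, so the manual uninstall can be skipped.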
sudo gedit /etc/modprobe.d/blacklist.conf
Append the following at the end of the file (this disables the third-party nouveau driver; there is no need to revert it afterwards):
blacklist nouveau
options nouveau modeset=0
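Editing blacklist.conf by hand makes it easy to add the same two lines twice on a second attempt. The sketch below shows one way to make the edit idempotent on the file's text. `ensure_lines` is a hypothetical helper, and it only transforms a string; reading and writing /etc/modprobe.d/blacklist.conf (which needs root) is left to the caller.

```python
def ensure_lines(text, required):
    """Return text with each required line appended unless already present."""
    existing = {line.strip() for line in text.splitlines()}
    missing = [line for line in required if line not in existing]
    if not missing:
        return text
    # Make sure the appended lines start on a fresh line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(missing) + "\n"


# The two lines these notes add to blacklist.conf.
NOUVEAU_LINES = ["blacklist nouveau", "options nouveau modeset=0"]

if __name__ == "__main__":
    sample = "# existing entries\nblacklist nouveau\n"
    print(ensure_lines(sample, NOUVEAU_LINES), end="")
```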
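Two methods:
Method 1:
sudo service lightdm stop #this shuts down the graphical interface, but don't worry
Press Ctrl+Alt+F1
to reach the command-line console, then log in with your username and password.
Tip: to get back to the graphical interface, enter on the command line: sudo service lightdm start
and then press Ctrl+Alt+F7.
Method 2: (Method 1 did not get me to a command-line console)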
cd /tmp
sudo rm -rf .X*
This deletes all the X display files in /tmp (they are regenerated on every reboot, so do this from the Ctrl+Alt+F1 console).
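#make the driver runfile executable:
sudo chmod +x NVIDIA-Linux-x86_64-410.73.run
#the options below are very important and must not be omitted:
sudo ./NVIDIA-Linux-x86_64-410.73.run --no-x-check --no-nouveau-check --no-opengl-files
By default the NVIDIA driver installs its own OpenGL files,
while Ubuntu already ships OpenGL libraries that the GUI depends on.
If the NVIDIA driver overwrites them, problems arise whenever the GUI dynamically links against the OpenGL libraries.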
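Driver test:
nvidia-smi #if it lists the GPU information table, the driver is installed correctly
nvidia-settings #if the settings dialog pops up, that also shows the driver is installed correctly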
Sun Nov  4 11:01:24 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
ERROR: Unable to load info from any available system # it is fine if nvidia-settings fails
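The header line of the nvidia-smi output above carries the two numbers that matter for the next steps: the driver version and the CUDA version reported by the driver (the highest CUDA version the driver supports, not the installed toolkit). A small sketch of pulling them out with a regular expression is shown below. `parse_smi_header` is a hypothetical helper, and the pattern assumes the header layout shown above; older drivers that do not print "CUDA Version" simply yield None.

```python
import re

# Pattern for the header line, assuming the layout printed by driver 410.73.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(output):
    """Return a dict with 'smi', 'driver' and 'cuda' versions, or None."""
    match = HEADER_RE.search(output)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "| NVIDIA-SMI 410.73 Driver Version: 410.73 CUDA Version: 10.0 |"
    print(parse_smi_header(sample))
```

In practice the text would come from running nvidia-smi, for example via `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.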
CUDA 9.0 must be used; otherwise this error appears:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
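Problem: the CUDA 9.0 libraries cannot be found.
The main cause of this error: CUDA is not installed, or the installed CUDA version is wrong.
This error comes up often when installing tensorflow, but the official FAQ does not mention it. If the method below does not solve it for you, leave a message in the comments.
tensorflow 1.7 only accepts CUDA 9.0 (9.1 does not work either!) and cuDNN 7.0, so if you installed CUDA 9.1 and cuDNN 7.1 or later, you need to reinstall versions 9.0 and 7.0.
cudatoolkit download page: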
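The ImportError above is raised when the dynamic loader cannot find libcublas.so.9.0. Before reinstalling anything, it can help to ask the loader directly whether a library name resolves. The following is a rough sketch using the standard ctypes module; `can_load` is a hypothetical helper, and "libcudnn.so.7" is listed only as an illustrative second name.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libcublas.so.9.0 is the library named in the ImportError;
    # libcudnn.so.7 is an assumed name added for illustration.
    for name in ("libcublas.so.9.0", "libcudnn.so.7"):
        print(name, "found" if can_load(name) else "NOT found")
```

The result depends on the loader's search path, so run it in the same shell (with the same LD_LIBRARY_PATH) that will run the deep learning framework.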
https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
Choose version:
cuda_9.0.176_384.81_linux.run => cudatoolkit
cuda download page:
https://conda.anaconda.org/pytorch/linux-64
Choose version:
cuda90-1.0-h6433d27_0.tar.bz2 => cuda
Uninstall the previous cuda92 packages:
conda uninstall cudnn
#The following packages will be REMOVED:
# cudnn: 7.2.1-cuda9.2_0
conda uninstall cuda92
# The following packages will be REMOVED:
# cuda92: 1.0-0 pytorch
conda uninstall cudatoolkit
# The following packages will be REMOVED:
# cudatoolkit: 9.2-0
Install:
cd ~/Downloads
sudo sh cuda_9.0.176_384.81_linux.run
# the CUDA toolkit is now installed
After installing the correct version, make sure that your ~/.bashrc (or ~/.zshrc) file
gedit ~/.bashrc
contains the following environment variables:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64
export PATH=$PATH:/usr/local/cuda-9.0/bin
export CUDA_HOME=/usr/local/cuda-9.0
Then reload it:
source ~/.bashrc
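Whether the exports took effect is easy to get wrong, since they only apply to shells that have re-read the file. A small sketch for checking the current environment from Python is given below. `check_cuda_env` and its result keys are hypothetical names; the paths are the /usr/local/cuda-9.0 directories used in the exports above.

```python
import os

CUDA_ROOT = "/usr/local/cuda-9.0"


def path_entries(value):
    """Split a colon-separated search path into its non-empty entries."""
    return [entry for entry in value.split(":") if entry]


def check_cuda_env(env, cuda_root=CUDA_ROOT):
    """Report whether PATH and LD_LIBRARY_PATH include the CUDA directories."""
    return {
        "path_ok": cuda_root + "/bin" in path_entries(env.get("PATH", "")),
        "ld_library_path_ok": cuda_root + "/lib64"
        in path_entries(env.get("LD_LIBRARY_PATH", "")),
    }


if __name__ == "__main__":
    # os.environ reflects the shell that launched this script.
    print(check_cuda_env(os.environ))
```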
conda uninstall cuda90
conda uninstall cudatoolkit
cd ~/Downloads
conda install cuda90-1.0-h6433d27_0.tar.bz2
nvcc -V
Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
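The last line of the nvcc output is the one to check: it should report release 9.0. A minimal sketch for extracting that from the text, assuming the "release X.Y, VX.Y.Z" wording shown above (`parse_nvcc_release` is a hypothetical helper):

```python
import re


def parse_nvcc_release(output):
    """Return (release, full_version) from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+),\s+V([\d.]+)", output)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 9.0, V9.0.176"
    print(parse_nvcc_release(sample))
```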
conda list |grep cuda
Output:
cuda90 1.0 h6433d27_0
cudatoolkit 9.0 h13b8566_0
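The conda list output above is a whitespace-separated table of name, version and build. If you want to check it in a script rather than by eye, a sketch along these lines may help; `parse_conda_list` is a hypothetical helper that only looks at the first two columns.

```python
def parse_conda_list(output):
    """Map package name to version from `conda list` style text."""
    packages = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


if __name__ == "__main__":
    sample = "cuda90 1.0 h6433d27_0\ncudatoolkit 9.0 h13b8566_0\n"
    versions = parse_conda_list(sample)
    print(versions.get("cudatoolkit") == "9.0")
```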
conda uninstall cudnn
#The following packages will be REMOVED:
# cudnn: 7.2.1-cuda9.2_0
This was already done earlier, when uninstalling CUDA.
The reason is the same as for installing CUDA 9.0.
Website:
https://developer.nvidia.com/rdp/cudnn-archive
Choose:
Download cuDNN v7.0.4 (Nov 13, 2017), for CUDA 9.0
cuDNN v7.0.4 Library for OSX
https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.4/prod/9.0_20171031/cudnn-9.0-osx-x64-v7
After downloading you get:
cudnn-9.0-osx-x64-v7.tgz
Extracting it yields a cuda folder:
cd ~/Downloads
tar xvfz cudnn-9.0-osx-x64-v7.tgz
cd ~/Downloads/cuda # enter the cuda folder
sudo cp ~/Downloads/cuda/include/cudnn.h /usr/local/cuda-9.0/include/
sudo cp -d ~/Downloads/cuda/lib/libcudnn* /usr/local/cuda-9.0/lib64/
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h
sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*
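After copying the files, the installed cuDNN version can be read back from the header. The sketch below assumes cudnn.h defines the macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers, which is how cuDNN 7 headers are laid out; `parse_cudnn_version` is a hypothetical helper that works on the header text.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if missing."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Only match lines of the form '#define NAME <integer>'.
        match = re.search(
            r"^\s*#define\s+%s\s+(\d+)" % key, header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    path = "/usr/local/cuda-9.0/include/cudnn.h"
    try:
        with open(path) as header:
            print(parse_cudnn_version(header.read()))
    except OSError:
        print("cudnn.h not found at", path)
```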
conda install yaml # otherwise python setup.py install in mmdetection fails with a fatal error
Download page for the matching pytorch & torchvision packages:
https://conda.anaconda.org/pytorch/linux-64
Choose pytorch (it must be 0.4.1, built for Python 3.6 and CUDA 9.0; otherwise the mmdetection examples fail):
pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2
https://conda.anaconda.org/pytorch/linux-64/pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2
Choose torchvision:
torchvision-0.2.1-py36_1.tar.bz2
https://conda.anaconda.org/pytorch/linux-64/torchvision-0.2.1-py36_1.tar.bz2
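The package file names above encode the Python, CUDA and cuDNN versions each build targets, which is how you pick the right one from the channel listing. A sketch of decoding such a name follows. The pattern is an assumption inferred from the file names in these notes, not an official conda naming rule, and `parse_build` is a hypothetical helper.

```python
import re

# Pattern inferred from names like
# pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2 (an assumption).
BUILD_RE = re.compile(
    r"^(?P<name>[a-z0-9_]+)-(?P<version>[\d.]+)-"
    r"py(?P<py>\d+)"
    r"(?:_cuda(?P<cuda>[\d.]+))?"
    r"(?:_cudnn(?P<cudnn>[\d.]+))?"
    r"(?:_[A-Za-z0-9_]+)?\.tar\.bz2$"
)


def parse_build(filename):
    """Return name, version, py, cuda and cudnn fields, or None."""
    match = BUILD_RE.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in (
        "pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2",
        "torchvision-0.2.1-py36_1.tar.bz2",
    ):
        print(parse_build(name))
```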
cd ~/Downloads
conda install pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2
conda install torchvision-0.2.1-py36_1.tar.bz2
conda list | grep cuda
Output:
# I had previously installed the nccl2_2 build, hence the nccl entry below
# conda install pytorch-0.3.1-py36_cuda9.0.176_cudnn7.0.5_nccl2_2.tar.bz2
# conda uninstall pytorch
# conda uninstall cudnn
# conda uninstall cuda
# conda uninstall cuda90
# only after that did I install cuda & cudatoolkit
cuda90 1.0 h6433d27_0
cudatoolkit 9.0 h13b8566_0
cudnn 7.1.2 cuda9.0_0
nccl 1.3.5 cuda9.0_0
pytorch 0.4.1 py36_cuda9.0.176_cudnn7.1.2_1 file:///home/dassein/Downloads
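The conda listing shows what is installed, but not whether PyTorch can actually reach the GPU. A short sketch for checking that from Python is shown below. `torch_cuda_report` is a hypothetical helper; it returns None when torch cannot be imported, so it is safe to run in any environment.

```python
def torch_cuda_report():
    """Return PyTorch's view of CUDA and cuDNN, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_version": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),
    }


if __name__ == "__main__":
    report = torch_cuda_report()
    print(report if report is not None else "torch is not installed")
```

With the packages listed above, the expectation is that cuda_version starts with 9.0 and cuda_available is True.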
Official repository:
https://github.com/open-mmlab/mmdetection
git clone https://github.com/open-mmlab/mmdetection.git
dassein@pad:~/mmdetection$ cd ~/mmdetection
dassein@pad:~/mmdetection$ conda create -n open-mmlab python=3.6 -y
dassein@pad:~/mmdetection$ source activate open-mmlab
(open-mmlab) dassein@pad:~/mmdetection$ conda install -c pytorch pytorch torchvision -y
(open-mmlab) dassein@pad:~/mmdetection$ conda install cython -y
(open-mmlab) dassein@pad:~/mmdetection$ ./compile.sh
(open-mmlab) dassein@pad:~/mmdetection$ python setup.py install
(open-mmlab) dassein@pad:~/mmdetection$ python test_image.py
Before running python test_image.py, put three images containing people (test.jpg, test1.jpg, test2.jpg) into the mmdetection folder.
Download the faster-rcnn .pth file and put it into the mmdetection folder; the URL is:
https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth
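A truncated download of the checkpoint tends to surface only later, as an obscure loading error. The file name ends in an 8-character hex suffix (3d1b3351); the sketch below assumes that this suffix is the first 8 hex digits of the file's SHA-256, following the PyTorch model zoo naming convention. That is an assumption, not something these notes confirm, and all helper names are hypothetical.

```python
import hashlib
import re


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_suffix(filename):
    """Extract the 8-hex-digit suffix from a name like model-3d1b3351.pth."""
    match = re.search(r"-([0-9a-f]{8})\.pth$", filename)
    return match.group(1) if match else None


def matches_suffix(path):
    """True if the file's SHA-256 starts with the suffix in its name.

    Assumption: the suffix is a SHA-256 prefix (model zoo convention).
    """
    expected = hash_suffix(path)
    return expected is not None and sha256_of_file(path).startswith(expected)


if __name__ == "__main__":
    name = "faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth"
    try:
        print("checksum matches:", matches_suffix(name))
    except OSError:
        print("checkpoint not found:", name)
```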
test_image.py
import mmcv
from mmcv.runner import load_checkpoint
from mmdet.models import build_detector
from mmdet.apis import inference_detector, show_result
cfg = mmcv.Config.fromfile('/home/dassein/mmdetection/configs/faster_rcnn_r50_fpn_1x.py')
cfg.model.pretrained = None
# construct the model and load checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
# _ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
_ = load_checkpoint(model, 'faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
# test a single image
img = mmcv.imread('test.jpg')
result = inference_detector(model, img, cfg)
show_result(img, result)
# test a list of images
imgs = ['test1.jpg', 'test2.jpg']
for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
    print(i, imgs[i])
    show_result(imgs[i], result)
Run
python test_image.py
Then, each time you press a key (for example 0), the next image appears with its labeled bounding boxes, in the order test.jpg, test1.jpg, test2.jpg.
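Since test_image.py expects the three images and the checkpoint to sit in the mmdetection folder, a quick pre-flight check can save a failed run. A minimal sketch, with the file names taken from these notes and `missing_files` as a hypothetical helper:

```python
import os

# Files that test_image.py in these notes reads from the working directory.
REQUIRED_FILES = [
    "test.jpg",
    "test1.jpg",
    "test2.jpg",
    "faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth",
]


def missing_files(directory, names=REQUIRED_FILES):
    """Return the names that do not exist as regular files in directory."""
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    missing = missing_files(os.getcwd())
    print("all files present" if not missing else "missing: %s" % missing)
```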
# the mmcv library is used by mmdetection: import mmcv
# show_result(img, result) can be looked up in the mmcv.pdf manual
# show_result() is in ~/mmcv/visualization/image.py
# show_result() calls mmcv's own imshow()
def imshow(img, win_name='', wait_time=0):
    """Show an image.

    Args:
        img (str or ndarray): The image to be displayed.
        win_name (str): The window name.
        wait_time (int): Value of waitKey param.
    """
    cv2.imshow(win_name, imread(img))
    cv2.waitKey(wait_time)
# wait_time=0 makes cv2.waitKey() block until a key is pressed, so each key press (such as 0) brings up the next image