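For work, I needed to install Ubuntu on a workstation that came with Windows 10 preinstalled. The workstation has two hard drives, so I set one aside for Ubuntu. That way the two systems don't interfere with each other or use each other's disk space. The workstation also has two Nvidia 1080 Ti cards, which cause a few issues during the Ubuntu installation; these are described below.
Setting up a deep learning environment on Ubuntu 18.04
I borrowed a USB stick with Ubuntu 16 already burned onto it, but the installation ran into all kinds of problems: the installer would not start, the hard drive could not be found, and so on.
So I downloaded the 18.04 desktop LTS image instead. From the mirror list on the official site I picked the mirror below, hosted in China by Shanghai Jiao Tong University; the download speed was decent. My work computer had no Chinese input method, so the notes below were written in English.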
--------------------------------------------------------------------------------------------------------------------------
Installing Ubuntu 18.04 LTS desktop (UEFI boot)
Step 1: Download the 18.04 LTS desktop image from http://ftp.sjtu.edu.cn/ubuntu-cd/18.04/
Step 2: Download the UltraISO trial version and burn the image to a fresh USB drive
Step 3: Turn off Secure Boot in the BIOS and Fast Startup in the Windows Control Panel
Step 4: Reboot and press F12 to open the one-time boot options
Step 5 [Trick for Nvidia graphics cards]: Select the second option, *Install Ubuntu, press e, edit the boot parameters to add acpi=off, and press F10 to start the installer
[Trick] If the screen gets stuck at
/dev/sda1 contains a file system with errors, check forced.
(initramfs)_
Enter the command fsck /dev/sda1, answer y when prompted to perform fixes, then run reboot if the system doesn't restart automatically.
If everything went well, Ubuntu 18.04 appears on the boot screen at startup. If it doesn't, your UEFI boot files are not working and you need to download EasyUEFI to repair them. Don't download BCD, because it is not free in a commercial environment.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Setting up Ubuntu 18.04 for a deep learning environment
Important note: before you set up the environment, know exactly which system, graphics card, CUDA, and cuDNN versions you will install, because each one depends on the others. Don't start installing without a plan, or you may end up going back and forth, which is painful!
Prerequisite:
Run nvidia-smi to see whether NVIDIA drivers are already installed (for example, through the Software Center), and take a look at the driver version. Getting a result can be good or bad: if you want to install CUDA 10 but your driver version is older than 400, you unfortunately have to remove the driver completely, download the new one, and reinstall!
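As a hedged sketch of the driver check above, the snippet below reads the driver version through nvidia-smi and compares it with a minimum. It assumes nvidia-smi is on PATH and accepts the --query-gpu=driver_version option; the function names and the default threshold of 400 (taken from the note above) are illustrative only, so look up the exact minimum for your CUDA release.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '418.39' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(driver_version, minimum="400"):
    """Return True if driver_version is at least the given minimum."""
    return parse_version(driver_version) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; the first line is enough here.
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("nvidia-smi not found or no driver loaded")
    elif driver_is_new_enough(version):
        print(version, "meets the assumed minimum")
    else:
        print(version, "is too old; remove it and install a newer driver")
```

The comparison works on integer tuples rather than strings, so a version like 1000.5 would not be sorted before 400.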
Step 1. Figure out which graphics card you have (for me, a GeForce GTX 1080 Ti) and go to the following website to get your driver. This is essential for the rest of the installation to succeed! Don't just copy a command from a random blog to install a random driver. Using apt-get is much easier, but the version might be wrong for your graphics card, system, or CUDA!
*****If you have unfortunately installed the wrong driver, here is how to reinstall**********
Step A. Remove the NVIDIA driver with the following commands
$ sudo apt-get purge nvidia*
$ sudo apt-get autoremove
Step B. Reboot into recovery mode without starting X. X also uses the NVIDIA driver, so if it is running when you try to install, the installer reports that NVIDIA components are loaded and refuses to continue.
In recovery mode, select the root shell and install your downloaded driver from there.
*******************************************************************
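Step 2. Download and install CUDA; for my Ubuntu 18.04, that is CUDA 10.1. (Note: as of 2019/4/7, the latest TensorFlow still only supports CUDA 10.0, so TensorFlow users should use CUDA 10.0, and the driver should not be the newest version but the one shown in the figure above. For PyTorch users, I have tried CUDA 10.1 and it works.)
Download the .run file, cd into the download folder, and run: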
sudo chmod +x cuda_10.1_linux.run
sudo ./cuda_10.1_linux.run --override
sudo apt-get install nvidia-384 nvidia-modprobe
Step 3. Install cuDNN
In my case, I need cuDNN 7.5, the build made for CUDA 10.1.
The installation guide can be found at the following link. Ignore the last step; just copying the files to the corresponding folders is enough.
[Trick] When testing cuDNN, this error may occur:
CUDA driver version is insufficient for CUDA runtime version
Congratulations: this means that some of the versions among your system, graphics card, CUDA driver, CUDA, and cuDNN are inconsistent.
As encouragement: go back to the top of this post and install the environment again. That is what I did, and it is why I recorded the process in this post.
sudo apt install python3-pip
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Optional install:
Sogou Pinyin
Issue: I couldn't open the Sogou web page, so I couldn't download the .deb file. Stuck; I need to get it from my PC.
Step 1: In a terminal, run the command
$ sudo apt remove fcitx* && sudo apt autoremove
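Setting up a deep learning environment on Windows 10
As with setting up the deep learning environment (TensorFlow) on Ubuntu, each TensorFlow version has specific requirements for the CUDA driver, CUDA, and cuDNN versions. (I have to complain here: TensorFlow's version management is a mess, with incompatibilities in both directions. Many people get all the way to the last step, testing TensorFlow code, only to find that TensorFlow won't run: either some CUDA.xx.dll can't be found or a specific module can't be imported.) On Windows 10 I tested two installation methods: installing with Docker, and installing everything step by step myself.
Installing step by step from the bottom up
Installing CUDA
If you install TensorFlow step by step yourself, the recommendation is a major CUDA release such as CUDA 9.0 or CUDA 10.0, though even that is no guarantee against problems. To document how to solve the various problems you may run into, I deliberately installed CUDA 9.2; the installation is described below. I will share the required files on Baidu Netdisk for easy download (to be updated). If you want to download them yourself, the Linux section above gives download links for the driver, CUDA, and cuDNN, but you need to find the right versions yourself.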
1. CUDA Driver (398.82-desktop-win10-64bit-international-whql.exe)
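To install CUDA 9.2 you need the matching CUDA driver; see TABLE 1 above. Because the CUDA 9.2 I downloaded is build 148, the driver version needed on Windows is 398.82. You can check whether the installation succeeded by running nvidia-smi on the command line. If the command is not found, locate the exe in the folder where the CUDA driver was installed and add that folder to the system PATH variable.
If your graphics cards are listed, the installation succeeded. For example, I have two 1080 Ti cards, and both appear in the output.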
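If nvidia-smi is not found, a small script can look for it in a few candidate folders before you edit PATH by hand. This is only a sketch: the folder in ASSUMED_NVSMI_DIRS is an assumption about where drivers of this era place nvidia-smi.exe, and locate_tool is a hypothetical helper name, so adjust both to your machine.

```python
import os
import shutil

# Assumption: the driver installer placed nvidia-smi.exe in this folder.
ASSUMED_NVSMI_DIRS = [r"C:\Program Files\NVIDIA Corporation\NVSMI"]


def locate_tool(name, extra_dirs=()):
    """Return the full path of `name`, searching PATH first, then extra_dirs."""
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        # Restrict the search to this single folder.
        found = shutil.which(name, path=directory)
        if found:
            return found
    return None


if __name__ == "__main__":
    path = locate_tool("nvidia-smi", ASSUMED_NVSMI_DIRS)
    if path is None:
        print("nvidia-smi not found in PATH or in the assumed folders")
    elif shutil.which("nvidia-smi") is None:
        print("Found", path, "; add", os.path.dirname(path), "to PATH")
    else:
        print("nvidia-smi is already on PATH:", path)
```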
2. CUDA (cuda_9.2.148_win10_network.exe)
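Install Visual Studio 2015 before installing CUDA; that version is the safer choice. If you want to use Visual Studio 2017, do not select the Visual Studio related sub-options in the custom installation when installing CUDA 9.2, or the CUDA installation will fail. (I checked several sources online that say this has been "broken for ages", so I am actually not sure that those options can be selected with 2015 either.) To check whether CUDA was installed, run nvcc -V in CMD, which shows the installed CUDA version.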
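The nvcc -V check can also be done from a script, which helps when comparing the installed CUDA release with what a TensorFlow build expects. The sketch below assumes the output contains a line of the form "Cuda compilation tools, release 9.2, V9.2.148"; the function names are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract (release, build), e.g. ('9.2', '9.2.148'), from `nvcc -V` text."""
    match = re.search(r"release\s+(\d+\.\d+),\s*V(\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return match.group(1), match.group(2)


def installed_cuda_release():
    """Run `nvcc -V`; return None if nvcc is missing or the output is unexpected."""
    try:
        result = subprocess.run(
            ["nvcc", "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # nvcc is not on PATH, so CUDA is not installed or PATH is not set.
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release())
```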
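3. cuDNN (cudnn-9.2-windows10-x64-v7.5.1.10)
Copy the files in the extracted folder into the corresponding CUDA folders, and add the paths of those folders to PATH; that completes the cuDNN installation (see the figure above).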
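The copy step can be sketched in a few lines. This is an illustration under assumptions, not the official installer: it assumes the extracted archive contains bin, include, and lib sub-folders that mirror the CUDA toolkit layout, and copy_cudnn is a hypothetical helper. The dry_run flag lets you review the planned copies first.

```python
import shutil
from pathlib import Path

# Assumed archive layout: these sub-folders mirror the CUDA toolkit folder.
CUDNN_SUBDIRS = ("bin", "include", "lib")


def copy_cudnn(cudnn_root, cuda_root, dry_run=False):
    """Copy files under cudnn_root/{bin,include,lib} to the same relative
    location under cuda_root and return the list of destination paths."""
    cudnn_root, cuda_root = Path(cudnn_root), Path(cuda_root)
    copied = []
    for sub in CUDNN_SUBDIRS:
        source_dir = cudnn_root / sub
        if not source_dir.is_dir():
            continue  # skip sub-folders that this archive does not contain
        for src in sorted(source_dir.rglob("*")):
            if not src.is_file():
                continue
            dst = cuda_root / src.relative_to(cudnn_root)
            if not dry_run:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(str(src), str(dst))
            copied.append(dst)
    return copied
```

A call might look like copy_cudnn(r"C:\Downloads\cudnn-9.2-windows10-x64-v7.5.1.10\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2", dry_run=True), where both paths are assumptions about where the archive was extracted and where CUDA was installed. Writing into Program Files usually requires an administrator prompt.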
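Installing tensorflow-gpu
I tried installing the official tensorflow-gpu releases with pip, going through 1.10 to 1.18 one by one, and then tested each install in CMD with python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))". Various errors appeared, such as cuda9.0.dll not found or can't import module. The main cause is that my CUDA is not one of the major releases such as 9.0 or 10.0, so the versions are incompatible. In this situation you may need to build from source, but the steps are tedious. So here is a site where someone has already built TensorFlow wheels for many versions; choose one according to your CUDA version, Python version, and the TensorFlow version you need.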
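Since I have CUDA 9.2 and Python 3.6.6, and any GPU version of TensorFlow would do, I chose
tensorflow_gpu-1.9.0-cp36-cp36m-win_amd64.whl. I downloaded it locally and installed the wheel with pip, after which the TensorFlow test succeeded.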
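The wheel file name itself says which Python it was built for: by the standard wheel naming scheme (PEP 427), the fields are name, version, an optional build tag, Python tag, ABI tag, and platform tag. The sketch below splits the name and checks the Python tag against the running interpreter. The function names are hypothetical, and the check only covers CPython tags such as cp36; it does not verify the platform or the CUDA version.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) not in (5, 6):
        raise ValueError("not a wheel file name: " + filename)
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def matches_running_python(filename):
    """True if the wheel's CPython tag (e.g. 'cp36') matches this interpreter."""
    expected = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return parse_wheel_name(filename)["python_tag"] == expected


if __name__ == "__main__":
    wheel = "tensorflow_gpu-1.9.0-cp36-cp36m-win_amd64.whl"
    print(parse_wheel_name(wheel))
    print("matches this Python:", matches_running_python(wheel))
```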
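Everything in one container
According to the documentation, tensorflow-gpu cannot actually be used with Docker on Windows: starting a Docker container with NVIDIA GPU support requires nvidia-docker, and nvidia-docker currently only works on Linux. Even so, I downloaded the tensorflow-gpu Docker image with Jupyter notebook on Windows and tested it successfully. I am not sure what is going on or whether performance will be lower, because I have not actually trained a model yet. I will stop here for now and update this post once all the testing is done.
To be continued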