Target environment: Ubuntu 14.04 LTS (the version used in the official instructions)
Prerequisites
sudo apt-get install g++
sudo apt-get install git
The official instructions use the following version:
Download page: http://developer.amd.com/tools-and-sdks/archive/amd-core-math-library-acml/acml-downloads-resources/#download
acml-5-3-1-ifort-64bit.tgz
Commands
tar -xzvf ./acml-5-3-1-ifort-64bit.tgz
sudo ./install-acml-5-3-1-ifort-64bit.sh
Set the environment variables:
export ACML_FMA=0
export LD_LIBRARY_PATH=/opt/acml5.3.1/ifort64/lib:/opt/acml5.3.1/ifort64_mp/lib:$LD_LIBRARY_PATH
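The export lines above only affect the current shell. One way to make them permanent is to append them to a startup file, taking care not to add the same line twice when the step is repeated. The helper below is a minimal sketch of that idea; the name append_once is hypothetical, and the choice of ~/.bashrc in the commented example is an assumption about the login shell.

```shell
# append_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# (append_once is a hypothetical helper name, not part of any tool used here.)
append_once() {
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example use (assumes bash reads ~/.bashrc for interactive shells):
# append_once "$HOME/.bashrc" 'export ACML_FMA=0'
# append_once "$HOME/.bashrc" 'export LD_LIBRARY_PATH=/opt/acml5.3.1/ifort64/lib:/opt/acml5.3.1/ifort64_mp/lib:$LD_LIBRARY_PATH'
```

The same helper can be reused for the other export lines in these notes.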
Commands
wget https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.2.tar.gz
tar -xzvf ./openmpi-1.10.2.tar.gz
cd openmpi-1.10.2
./configure --prefix=/usr/local/mpi
make -j all
sudo make install
Set the environment variables
export PATH=/usr/local/mpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/mpi/lib:$LD_LIBRARY_PATH
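After updating PATH it can help to confirm that the newly installed tools are actually found before moving on. The following is a small sketch; missing_tools is a hypothetical helper, and mpirun and mpicc are assumed to be the Open MPI programs of interest.

```shell
# missing_tools NAME...
# Print, one per line, each NAME that does not resolve to a command.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example use after the exports above:
# missing_tools mpirun mpicc
```

An empty result means every listed tool is on PATH.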
First, stop the X Window display manager
sudo stop lightdm
Then disable the nouveau kernel driver
(for Ubuntu 14.04, see http://askubuntu.com/a/451248)
Edit /etc/modprobe.d/blacklist-nouveau.conf (create the file if it does not exist)
and add the following lines:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
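Instead of editing the file by hand, the five lines can be written with a here-document. This is a sketch only: write_nouveau_blacklist is a hypothetical helper, and the update-initramfs step in the comments is an assumption (on Ubuntu the initramfs usually has to be regenerated before a blacklist takes effect at boot).

```shell
# write_nouveau_blacklist FILE
# Write the five blacklist lines listed above to FILE, replacing its contents.
# (write_nouveau_blacklist is a hypothetical helper name.)
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Intended use on the target machine. The destination needs root, so the
# file is written to a temporary location first and then copied into place:
# tmp=$(mktemp) && write_nouveau_blacklist "$tmp"
# sudo cp "$tmp" /etc/modprobe.d/blacklist-nouveau.conf
# sudo update-initramfs -u   # assumption: rebuild the initramfs before rebooting
```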
After rebooting, stop the X Window display manager again
Then install the latest graphics driver
NVIDIA-Linux-x86_64-361.42.run
Install the CUDA 7.5 toolkit
Note: the installer asks whether to install the latest graphics driver; if one is already installed, answer no
cuda_7.5.18_linux.run
Set the environment variables
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
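Test the installation using the CUDA samples
Change to the samples directory
cd ~/NVIDIA_CUDA-7.5_Samples/
and run make
Once the build succeeds, run the following command
~/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery/deviceQuery
If the test succeeds, the output should look similar to this: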
/home/alexey/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro 600"
CUDA Driver Version / Runtime Version 8.0 / 7.5
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1016 MBytes (1065734144 bytes)
( 2) Multiprocessors, ( 48) CUDA Cores/MP: 96 CUDA Cores
GPU Max Clock rate: 1280 MHz (1.28 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 131072 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro 600
Result = PASS
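The only line that matters for a quick check is the final "Result = PASS". A script can test for it rather than relying on reading the full listing. This is a minimal sketch; device_query_passed is a hypothetical helper, and the sample path in the comment assumes the default location used above.

```shell
# device_query_passed
# Read deviceQuery output on stdin; succeed only if it contains a line
# starting with "Result = PASS".
# (device_query_passed is a hypothetical helper name.)
device_query_passed() {
    grep -q '^Result = PASS'
}

# Example use, assuming the samples were built in the default location:
# ~/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery/deviceQuery | device_query_passed \
#     && echo "CUDA looks usable" || echo "deviceQuery did not pass"
```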
sudo chmod +x ./gdk_linux_amd64_352_79_release.run
sudo ./gdk_linux_amd64_352_79_release.run
Accept the default settings
wget https://github.com/NVlabs/cub/archive/1.4.1.zip
unzip ./1.4.1.zip
sudo cp -r cub-1.4.1 /usr/local
wget http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-linux-x64-v4.0-prod.tgz
tar -xzvf ./cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo mkdir /usr/local/cudnn-4.0
sudo cp -r cuda /usr/local/cudnn-4.0
Set the environment variables
export LD_LIBRARY_PATH=/usr/local/cudnn-4.0/cuda/lib64:$LD_LIBRARY_PATH
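A quick sanity check after copying is to confirm that the header and at least one shared library ended up where the export line expects them. The sketch below assumes the archive unpacks to cuda/include/cudnn.h and cuda/lib64/libcudnn.so*; that layout and the helper name cudnn_layout_ok are assumptions, so adjust them if the extracted tree looks different.

```shell
# cudnn_layout_ok DIR
# Succeed if DIR contains include/cudnn.h and at least one lib64/libcudnn.so* file.
# (cudnn_layout_ok is a hypothetical helper; the layout is an assumption.)
cudnn_layout_ok() {
    [ -f "$1/include/cudnn.h" ] || return 1
    for lib in "$1"/lib64/libcudnn.so*; do
        # If nothing matches, the pattern stays literal and -e fails.
        [ -e "$lib" ] && return 0
    done
    return 1
}

# Example use:
# cudnn_layout_ok /usr/local/cudnn-4.0/cuda && echo "cuDNN files found"
```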
This requires a large amount of disk space and time, so it is not installed for now; see the official instructions on GitHub
sudo apt-get install zlib1g-dev
wget http://nih.at/libzip/libzip-1.1.2.tar.gz
tar -xzvf ./libzip-1.1.2.tar.gz
cd libzip-1.1.2
./configure
make -j all
sudo make install
Set the environment variables
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
Without the 1bit-sgd code
git clone https://github.com/Microsoft/cntk
With the 1bit-sgd code
git clone --recursive https://github.com/Microsoft/cntk/
Build the code
cd ~/Repos/cntk
mkdir build/release -p
cd build/release
../../configure --1bitsgd=yes
Omit --1bitsgd=yes to build the version without 1bit-sgd
To build a debug version:
../../configure --with-buildtype=debug
If configure reports no errors, run
make -j all
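The two configure options mentioned above can be combined, for example a debug build that also includes 1bit-sgd. The helper below is a sketch that assembles the flag string from two choices; cntk_configure_flags is a hypothetical name, and it only emits the two flags that appear in these notes.

```shell
# cntk_configure_flags BUILDTYPE ONEBIT
#   BUILDTYPE: "debug" or "release"
#   ONEBIT:    "yes" to include 1bit-sgd, anything else to leave it out
# Print the flags to pass to ../../configure.
# (cntk_configure_flags is a hypothetical helper name.)
cntk_configure_flags() {
    flags=""
    [ "$1" = "debug" ] && flags="--with-buildtype=debug"
    # ${flags:+$flags } adds a separating space only when flags is non-empty
    [ "$2" = "yes" ] && flags="${flags:+$flags }--1bitsgd=yes"
    printf '%s\n' "$flags"
}

# Example use from a separate build directory such as build/debug:
# ../../configure $(cntk_configure_flags debug yes)
```

Keeping one build directory per configuration (build/release, build/debug) lets both variants coexist in the same checkout.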
export PATH=$HOME/Repos/cntk/build/release/bin:$PATH
cntk configFile=../Config/Simple.cntk
cntk configFile=../Config/Simple.cntk deviceId=auto &> out
cat out | grep Builder
Expected output in this case is:
SimpleNetworkBuilder = [
SimpleNetworkBuilder = [
SimpleNetworkBuilder Using GPU 0
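The "Using GPU 0" line is what shows that deviceId=auto selected a GPU. A script can check the captured log for it directly. This is a minimal sketch; used_gpu is a hypothetical helper, and the pattern is based only on the sample output shown above.

```shell
# used_gpu LOGFILE
# Succeed if LOGFILE contains a "Using GPU <number>" line.
# (used_gpu is a hypothetical helper name.)
used_gpu() {
    grep -Eq 'Using GPU [0-9]+' "$1"
}

# Example use with the log file written by the command above:
# used_gpu out && echo "CNTK ran on a GPU" || echo "no GPU line found in the log"
```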
cntk configFile=Config/rnn.cntk currentDirectory=Data deviceId=auto