Installing MXNet:
http://mxnet.io/get_started/setup.html
When troubleshooting, check the GitHub issues first.
If you are running Python on Amazon Linux or Ubuntu, you can use Git Bash scripts to quickly install the MXNet libraries and all dependencies. If you are using other languages or operating systems, skip to Standard Installation.
Quick Installation on Ubuntu:
git clone https://github.com/dmlc/mxnet.git ~/MXNet/mxnet --recursive
cd ~/MXNet/mxnet/setup-utils
bash install-mxnet-ubuntu.sh
Standard Installation:
Minimum Requirements
You must have the following:
Build MXNet on Ubuntu/Debian
On Ubuntu versions 13.10 or later, you need the following dependencies:
* Git (to pull code from GitHub)
* libatlas-base-dev (for linear algebraic operations)
* libopencv-dev (for computer vision operations)
Install these dependencies using the following commands:
sudo apt-get update
sudo apt-get install -y build-essential git libatlas-base-dev libopencv-dev
After you have downloaded and installed the dependencies, use the following commands to pull the MXNet source code from Git and build MXNet:
git clone --recursive https://github.com/dmlc/mxnet
cd mxnet; make -j$(nproc)
As the installation commands show, the packages to install are:
libatlas-base-dev
libopencv-dev
On Ubuntu, if the installation above fails, you can install step by step:
sudo apt-get update
sudo apt-get install -y build-essential git libatlas-base-dev libopencv-dev
git clone --recursive https://github.com/dmlc/mxnet
cd mxnet
make -j4
sudo apt-get install python-numpy # for debian
sudo apt-get install python-setuptools # for debian
cd python; sudo python setup.py install
// Using sudo here is not recommended.
// Note: sudo installs the package into the root user's Python environment. The pitfall: when the current non-root user's python is the Anaconda one, the package is not installed into that environment, so import mxnet fails with "No module named mxnet". After switching to root (Python path: /usr/bin/python), import mxnet works.
After installation, the Python lib directory contains the directory ./python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet.
Test whether the installation succeeded:
import mxnet as mx
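The sudo/Anaconda pitfall noted above can be checked before importing, by asking the running interpreter where (or whether) it would find the module. This is a minimal sketch, assuming Python 3.4+ (the notes themselves use Python 2.7); `locate_module` is a hypothetical helper name, not part of MXNet.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file path a top-level module would load from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Shows which interpreter is running, e.g. /usr/bin/python vs. an Anaconda python.
    print("interpreter:", sys.executable)
    location = locate_module("mxnet")
    if location is None:
        print("mxnet is not installed for this interpreter")
    else:
        print("mxnet found at:", location)
```

Running it once as the normal user and once as root makes it clear which Python environment actually received the package.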
MXNet depends on OpenCV, and installing OpenCV may in turn require many other libraries.
OpenCV dependency problems during installation
sudo apt-get install -y build-essential git libblas-dev libopencv-dev
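Reading package lists... Done
Building dependency tree
Reading state information... Done
build-essential is already the newest version.
build-essential set to manually installed.
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
libopencv-dev : Depends: libopencv-objdetect-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libopencv-highgui-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libopencv-legacy-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libopencv-contrib-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libopencv-videostab-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libopencv-superres-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libopencv-ocl-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libcv-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libhighgui-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
Depends: libcvaux-dev (= 2.4.8+dfsg1-2ubuntu1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
If you hit the error above, OpenCV has to be installed manually; see below.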
Installing MXNet on CentOS
(mainly CentOS 6.5 here; CentOS 7 is somewhat easier to install on):
Issues: https://github.com/dmlc/mxnet/issues/3324
https://github.com/dmlc/mxnet/issues/1303
https://github.com/dmlc/mxnet/issues/1125
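The official installation example uses apt-get, which works on Debian-family Linux but not on CentOS; install-mxnet-ubuntu.sh consists of apt-get commands.
Installing on CentOS is harder than on Ubuntu and documentation is sparse; see the issue: https://github.com/dmlc/mxnet/issues/1303
Install the rz command (if not already installed): yum -y install lrzsz
Problem: the first installation attempt runs into dependency problems. After running bash install-mxnet-ubuntu.sh: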
Setting up Install Process
No package build-essential available.
No package libatlas-base-dev available.
No package libopencv-dev available.
Error: Nothing to do
Problem:
Ubuntu vs. CentOS (apt-get / yum); CentOS >= 6.5 is recommended.
First try installing OpenCV with yum:
sudo yum install atlas-devel opencv
sudo yum install opencv-devel
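Since the Ubuntu script fails on CentOS only because it calls apt-get, the dependency step can be made to pick whichever package manager is present. The sketch below is one way to do that; the package lists are copied from the commands in these notes, while `install_command` and the injectable `which` parameter are hypothetical names introduced for illustration.

```python
import shutil

# Package lists taken from the apt-get and yum commands in these notes.
PACKAGES = {
    "apt-get": ["build-essential", "git", "libatlas-base-dev", "libopencv-dev"],
    "yum": ["gcc", "gcc-c++", "git", "atlas-devel", "opencv", "opencv-devel"],
}


def install_command(which=shutil.which):
    """Return the install argv for the first package manager found, else None."""
    for manager, packages in PACKAGES.items():
        if which(manager) is not None:
            return ["sudo", manager, "install", "-y"] + packages
    return None


if __name__ == "__main__":
    command = install_command()
    if command is None:
        print("neither apt-get nor yum was found on PATH")
    else:
        print(" ".join(command))
```

The sketch only prints the command; passing the list to `subprocess.call` would run it.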
You can try the following procedure:
yum update
yum install -y git
yum install -y opencv opencv-devel atlas-devel
yum install gcc gcc-c++
ldconfig /etc/ld.so.cache
git clone --recursive https://github.com/dmlc/mxnet
cd mxnet
./prepare_mkl.sh
cp make/config.mk .
vim config.mk
+31 ADD_LDFLAGS = -L/usr/lib64/atlas
vim mshadow/make/mshadow.mk
-68 MSHADOW_LDFLAGS += -lcblas
+68 MSHADOW_LDFLAGS += -lsatlas
yum info glib2
yum upgrade glib2
make -j4
[root@xdataimg2 mxnet]# ll lib
total 38920
-rw-r--r-- 1 root root 28637318 Nov 12 12:32 libmxnet.a
-rwxr-xr-x 1 root root 11214217 Nov 12 12:32 libmxnet.so
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip install numpy -i http://pypi.mirrors.ustc.edu.cn/simple --trusted-host pypi.mirrors.ustc.edu.cn
pip install scipy -i http://pypi.mirrors.ustc.edu.cn/simple --trusted-host pypi.mirrors.ustc.edu.cn
cd mxnet/python
python setup.py install
Installing OpenCV from source
Reference: http://blog.csdn.net/kuaile123/article/details/20870731
First install the OpenCV dependencies:
yum install cmake gcc gcc-c++ gtk+-devel gimp-devel gimp-devel-tools gimp-help-browser zlib-devel libtiff-devel libjpeg-devel libpng-devel gstreamer-devel libavc1394-devel libraw1394-devel libdc1394-devel jasper-devel jasper-utils swig Python libtool nasm
sudo yum install opencv-devel
sudo yum install atlas-devel
// or sudo yum install atlas-devel opencv
yum install cmake
Download the required version from the OpenCV download page http://sourceforge.net/projects/opencvlibrary/files/ and extract it.
cd OpenCV-2.4.10
cmake CMakeLists.txt
make && make install
make may fail with:
Linking CXX executable ../../bin/opencv_perf_core
../../lib/libopencv_highgui.so.2.4.10: undefined reference to `png_set_longjmp_fn'
collect2: error: ld returned 1 exit status
G++ version:
Check the g++ version:
g++ --version / g++ -v
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
Clearly this g++ is older than the required version and needs to be upgraded.
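The version comparison can also be done mechanically. The sketch below parses the version line shown above and compares it against a minimum. The minimum of 4.8 is an assumption based on these notes upgrading to GCC 4.8.5, and `parse_gcc_version` and `needs_upgrade` are hypothetical helper names.

```python
import re

# Assumed minimum: these notes upgrade to GCC 4.8.x in order to build MXNet.
MIN_GCC = (4, 8)


def parse_gcc_version(text):
    """Extract (major, minor, patch) from `g++ -v` / `g++ --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def needs_upgrade(text, minimum=MIN_GCC):
    """True when the reported version is older than the minimum."""
    return parse_gcc_version(text)[:len(minimum)] < minimum


if __name__ == "__main__":
    line = "gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)"
    print(parse_gcc_version(line), needs_upgrade(line))  # prints: (4, 4, 7) True
```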
Upgrading GCC/G++ (the two ship together):
Download: http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-4.8.5/
wget http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-4.8.5/gcc-4.8.5.tar.gz
tar -zxvf gcc-4.8.5.tar.gz
cd gcc-4.8.5
./contrib/download_prerequisites
mkdir build
cd build
../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
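Error: configure: error: Building GCC requires GMP 4.2+, MPFR 2.4.0+ and MPC 0.8.0+.
Reference: http://blog.csdn.net/ivanlxf/article/details/19080681
Running the ./contrib/download_prerequisites script automatically downloads the three dependencies (gmp-4.3.2, mpfr-2.4.2, and mpc-0.8.1). They can also be downloaded and installed offline from the following addresses:
(1) Install gmp: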
wget ftp://ftp.gnu.org/gnu/gmp/gmp-4.3.2.tar.bz2
tar -jxf gmp-4.3.2.tar.bz2
cd gmp-4.3.2
mkdir build
cd build
../configure --prefix=/usr/local/gcc/gmp-4.3.2
make && make install
(2) Install mpfr:
wget http://www.mpfr.org/mpfr-2.4.2/mpfr-2.4.2.tar.bz2
tar -jxf mpfr-2.4.2.tar.bz2
cd mpfr-2.4.2
mkdir build
cd build
../configure --prefix=/usr/local/gcc/mpfr-2.4.2 --with-gmp=/usr/local/gcc/gmp-4.3.2
make && make install
(3) Install mpc:
wget http://www.multiprecision.org/mpc/download/mpc-0.8.1.tar.gz
tar zxvf mpc-0.8.1.tar.gz
cd mpc-0.8.1
mkdir build
cd build
../configure --prefix=/usr/local/gcc/mpc-0.8.1 --with-mpfr=/usr/local/gcc/mpfr-2.4.2 --with-gmp=/usr/local/gcc/gmp-4.3.2
make && make install
(4) Add the shared library paths: su to root, edit the /etc/ld.so.conf file, and add the following lines to it:
/usr/local/gcc/gmp-4.3.2/lib
/usr/local/gcc/mpfr-2.4.2/lib
/usr/local/gcc/mpc-0.8.1/lib
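Save and exit, then run the ldconfig command.
Running GCC's configure again still reports the error above, so specify the paths of the three libraries manually: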
../configure --prefix=/usr/local/gcc --enable-threads=posix --disable-checking --enable-languages=c,c++ --disable-multilib --with-gmp=/usr/local/gcc/gmp-4.3.2 --with-mpfr=/usr/local/gcc/mpfr-2.4.2 --with-mpc=/usr/local/gcc/mpc-0.8.1
Once configure passes, run make && make install (this takes a long time).
(5) Remove the old version and set up the new one:
yum remove gcc
yum remove gcc-c++
updatedb
cd /usr/bin # directory containing gcc and g++; check with: which g++
ln -s /usr/local/gcc/bin/gcc gcc
ln -s /usr/local/gcc/bin/g++ g++
Installing Clang
sudo yum install clang
Installing MXNet from source:
git clone --recursive https://github.com/dmlc/mxnet
cd mxnet;
cp make/config.mk .
make -j4
sudo yum install python-numpy # for redhat
cd python; sudo python setup.py install
import mxnet as mx
make error:
/usr/local/include/c++/4.8.0/condition_variable:83:5: note: no known conversion for implicit ‘this’ parameter from ‘const std::condition_variable*’ to ‘std::condition_variable*’
According to https://github.com/dmlc/mxnet/issues/530, this is caused by a GCC version that is too old; upgrade GCC as described above.
Error:
/usr/bin/ld: cannot find -lcblas
collect2: error: ld returned 1 exit status
make: *** [lib/libmxnet.so] Error 1
Make sure cblas and atlas are installed.
Related issue: https://github.com/dmlc/mxnet/issues/1442
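Whether the linker can find -lcblas depends on which BLAS libraries are actually present; the earlier edit of mshadow.mk swapped -lcblas for -lsatlas for that reason. The sketch below looks in the ATLAS directory used in these notes and reports which flag would resolve. It assumes shared libraries named libsatlas.so* or libcblas.so*, and `pick_blas_flag` is a hypothetical helper, not part of the MXNet build.

```python
import glob
import os


def pick_blas_flag(lib_dir="/usr/lib64/atlas"):
    """Return the linker flag for whichever BLAS shared library exists in lib_dir."""
    # Checked in this order: ATLAS's combined library first, then plain cblas.
    for name in ("satlas", "cblas"):
        pattern = os.path.join(lib_dir, "lib%s.so*" % name)
        if glob.glob(pattern):
            return "-l" + name
    return None


if __name__ == "__main__":
    flag = pick_blas_flag()
    print(flag if flag else "no ATLAS/cblas shared library found")
```

If the function returns None, the atlas development package is probably missing or installed under a different directory.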
Error:
checking whether the C compiler works... no
configure: error: in `/root/App/MXNet/mxnet/ps-lite/protobuf-2.5.0':
configure: error: C compiler cannot create executables
See `config.log' for more details
make[1]: *** [/root/App/MXNet/mxnet/deps/include/google/protobuf/message.h] Error 77
make[1]: Leaving directory `/root/App/MXNet/mxnet/ps-lite'
make: *** [PSLITE] Error 2
Run MXNet on YARN:
http://mxnet.io/how_to/cloud.html
From the official documentation:
Use YARN, MPI, SGE
While ssh can be simple for cases when we do not have a cluster scheduling framework. MXNet is designed to be able to port to various platforms. We also provide other scripts in tracker to run on other cluster frameworks, including Hadoop(YARN) and SGE. Your contribution is more than welcomed to provide examples to run MXNet on your favourite distributed platform.
mxnet on yarn:
dmlc-submit --mode
To be tested: dmlc-submit -h
--cluster string, {'mpi', 'yarn', 'local', 'sge'}, default to ${DMLC_SUBMIT_CLUSTER}
Job submission mode.
--num-workers integer, required
Number of workers in the job.
--num-servers integer, default=0
Number of servers in the job.
--worker-cores integer, default=1
Number of cores needed to be allocated for worker job.
--server-cores integer, default=1
Number of cores needed to be allocated for server job.
--worker-memory string, default='1g'
Memory needed for worker job.
--server-memory string, default='1g'
Memory needed for server job.
--jobname string, default=auto specify
Name of the job.
--queue string, default='default'
The submission queue we should submit the job to.
--log-level string, {INFO, DEBUG}
The logging level.
--log-file string, default='None'
Output log to the specific log file, the log is still printed on stderr.
tracker]# ./dmlc-submit --cluster=yarn --num-workers=2 --worker-cores=1 --num-servers=2 ../../example/image-classification/train_mnist.py
source activate ml2
hdfs dfs -put train-* /tmp/mnist
hdfs dfs -chmod -R 777 /tmp/mnist
tools/launch.py -n 2 --launcher yarn python train_mnist.py --data-dir hdfs:///tmp/mnist/
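When submitting repeatedly with different worker and server counts, the command line can be assembled from the options listed above. This is a minimal sketch using only flags that appear in the help text; `dmlc_submit_args` is a hypothetical helper, and treating a worker count below 1 as an error is an assumption drawn from "required".

```python
def dmlc_submit_args(script, cluster="yarn", num_workers=2, num_servers=0,
                     worker_cores=1, server_cores=1, queue=None):
    """Assemble an argv list for ./dmlc-submit from the documented options."""
    if num_workers < 1:
        raise ValueError("--num-workers is required and must be at least 1")
    args = [
        "./dmlc-submit",
        "--cluster=%s" % cluster,
        "--num-workers=%d" % num_workers,
        "--num-servers=%d" % num_servers,
        "--worker-cores=%d" % worker_cores,
        "--server-cores=%d" % server_cores,
    ]
    if queue is not None:
        args.append("--queue=%s" % queue)
    args.append(script)
    return args


if __name__ == "__main__":
    script = "../../example/image-classification/train_mnist.py"
    print(" ".join(dmlc_submit_args(script, num_servers=2)))
```

The resulting list can be passed to `subprocess.call` with `cwd` set to the directory that contains dmlc-submit.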
mxnet on multiple cpus:
http://mxnet.io/how_to/multi_devices.html
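On a single machine you can run directly: python train_mnist.py --network lenet
Prerequisite for multiple machines: MXNet is built and installed on every machine, and the machines can reach each other over ssh.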
cd mxnet/example/image-classification
echo "192.168.177.77" >> hosts # the current machine is 192.168.177.78
../../tools/launch.py -n2 --launcher ssh -H hosts python train_mnist.py --network lenet --kv-store dist_sync
note that:
use launch.py to submit the job
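The hosts file and the launch command above can also be generated from a list of addresses, which helps when the set of machines changes. A minimal sketch: `write_hosts` and `launch_args` are hypothetical helpers, and the flags are exactly those used in the command above.

```python
def write_hosts(path, hosts):
    """Write one host address per line, the file passed to launch.py via -H."""
    with open(path, "w") as handle:
        for host in hosts:
            handle.write(host + "\n")


def launch_args(num_workers, hosts_file, train_args):
    """Assemble the launch.py argv for an ssh-launched, dist_sync training job."""
    return (["../../tools/launch.py", "-n", str(num_workers),
             "--launcher", "ssh", "-H", hosts_file,
             "python", "train_mnist.py"]
            + list(train_args)
            + ["--kv-store", "dist_sync"])


if __name__ == "__main__":
    print(" ".join(launch_args(2, "hosts", ["--network", "lenet"])))
```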
Performance comparison:
Runs on the two machines 77 and 78:
Single machine:
python train_mnist.py --network lenet
19:26:27-20:44:57 78 minutes
Two machines:
../../tools/launch.py -n 2 --launcher ssh -H hosts python train_mnist.py --network lenet --kv-store dist_sync
18:43:51-19:22:53 39 minutes
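The speedup implied by these timestamps can be computed directly. The sketch below assumes Python 3 and that each run started and finished on the same day; `elapsed_seconds` is a hypothetical helper name.

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day wall-clock timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


single = elapsed_seconds("19:26:27", "20:44:57")  # one machine
double = elapsed_seconds("18:43:51", "19:22:53")  # two machines
speedup = single / double

print(single, double, round(speedup, 2))  # prints: 4710 2342 2.01
```

For this run, two machines finish in roughly half the time of one.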