BigDL Installation and Usage Guide

 

0 BigDL Building Guide

0.1 Create Ubuntu VM

Flavor: 4 vCPU, 8 GB memory, 40 GB disk

0.2 Install Anaconda

Download the latest version of Anaconda from:

https://www.anaconda.com/download/#linux

Note: the example below uses Anaconda2-5.0.1 for Linux x86_64.

 

Upload the installer to the Ubuntu VM, and execute the command below to install:

bash /opt/Anaconda/Anaconda2-5.0.1-Linux-x86_64.sh

 

Check version:

anaconda --version

anaconda Command line client (version 1.6.5)

 

0.3 Install Java-JDK

conda install -c bioconda java-jdk

 

Check java version:

java -version

openjdk version "1.8.0_92"

OpenJDK Runtime Environment (Zulu 8.15.0.1-linux64) (build 1.8.0_92-b15)

OpenJDK 64-Bit Server VM (Zulu 8.15.0.1-linux64) (build 25.92-b15, mixed mode)

 

Edit /root/.bashrc and add the following Java environment variables:

export JAVA_HOME=/opt/anaconda2/pkgs/java-jdk-8.0.92-1

export PATH="/opt/anaconda2/bin:$PATH:$JAVA_HOME/bin"
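After running `source /root/.bashrc`, you can sanity-check that the variables are visible to new processes; a minimal Python sketch (the paths follow this guide's install locations and may differ on your system):

```python
import os

# Print the Java-related environment variables so you can confirm the
# .bashrc edits took effect in the current shell. JAVA_HOME shows
# "<not set>" until .bashrc has been sourced.
env_report = {var: os.environ.get(var, "<not set>")
              for var in ("JAVA_HOME", "PATH")}
for var, value in env_report.items():
    print("%s = %s" % (var, value))
```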

0.4 Install Maven

Download the Maven binary package from:

https://maven.apache.org/download.cgi

Unzip it, and then edit /root/.bashrc by adding the following variables (the Maven path below is an example; adjust it to wherever you unpacked your Maven version):

export MAVEN_HOME=/opt/apache-maven

export PATH=$MAVEN_HOME/bin:$PATH

0.5 Install Scala

conda install -c bioconda scala

Note: choose the latest version in conda

 

0.6 Install Spark

conda install -c conda-forge pyspark=2.1.0

 

Check spark:

spark-shell --master local[2]

0.7 Install MXNet

conda install -c anaconda mxnet

 

Check by the following code:

root@bigdl:/opt/anaconda2# python
Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 16 2017, 17:29:19)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
/opt/anaconda2/lib/python2.7/site-packages/urllib3/contrib/pyopenssl.py:46: DeprecationWarning: OpenSSL.rand is deprecated - you should use os.urandom instead
  import OpenSSL.SSL
>>> a = mx.nd.ones((2, 3))
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.]], dtype=float32)
>>> quit()
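The same check can be run non-interactively; a minimal sketch that degrades gracefully when MXNet is absent:

```python
# Non-interactive version of the interactive check above. The arithmetic
# is elementwise: each entry of ones((2, 3)) becomes 1 * 2 + 1 = 3.
try:
    import mxnet as mx
    result = (mx.nd.ones((2, 3)) * 2 + 1).asnumpy().tolist()
    message = "MXNet OK: %s" % result
except ImportError:
    message = "MXNet is not installed"
print(message)
```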

 

Note: during the installation, MKL will be installed automatically.

 

0.8 Install TensorBoard

pip install tensorboard==1.0.0a6

0.9 Install git

conda install -c anaconda git

 

Check version:

git --version

git version 2.15.0

0.10 Install BigDL

0.10.1 Installation of dependencies

Install numpy:

conda install -c anaconda numpy

 

apt-get update

apt-get install -y python-setuptools python-dev

apt-get install -y gcc make

apt-get install -y zip

0.10.2 Installation of BigDL

pip install BigDL==0.4.0

pip installs BigDL under site-packages; then set the following environment variables in /root/.bashrc:

export BIGDL_HOME=/opt/anaconda2/lib/python2.7/site-packages/bigdl

export SPARK_HOME=/opt/anaconda2/lib/python2.7/site-packages/pyspark

export SPARK_DRIVER_MEMORY=2g

0.10.3 Verification with Python

python ${BIGDL_HOME}/models/local_lenet/local_lenet.py

 

……

2018-01-19 17:27:04 INFO  LocalOptimizer$:267 - [Validation] 9728/10000 Throughput is 11575.688088743565 record / sec
2018-01-19 17:27:04 INFO  LocalOptimizer$:267 - [Validation] 9856/10000 Throughput is 12690.642198050957 record / sec
2018-01-19 17:27:04 INFO  LocalOptimizer$:267 - [Validation] 9984/10000 Throughput is 13956.84521651483 record / sec
2018-01-19 17:27:04 INFO  LocalOptimizer$:267 - [Validation] 10000/10000 Throughput is 12896.600697867305 record / sec
2018-01-19 17:27:04 INFO  LocalOptimizer$:276 - Top1Accuracy is Accuracy(correct: 9693, count: 10000, accuracy: 0.9693)

[8 3 2 ..., 5 6 7]

 

The example code below verifies that BigDL can run successfully.

from bigdl.util.common import *
from pyspark import SparkContext
from bigdl.nn.layer import *
import bigdl.version

# create sparkcontext with bigdl configuration
sc = SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
init_engine()  # prepare the bigdl environment
bigdl.version.__version__  # get the current BigDL version
linear = Linear(2, 3)  # try to create a Linear layer
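The snippet above can also be packaged as a standalone smoke-test script; a sketch (assuming BigDL 0.4.0 and pyspark are installed as in the steps above) that reports a clear message when the stack is missing:

```python
# Standalone smoke test for the BigDL stack. Falls back to a diagnostic
# message when bigdl/pyspark are not importable in this environment.
try:
    from pyspark import SparkContext
    from bigdl.util.common import create_spark_conf, init_engine
    from bigdl.nn.layer import Linear
    import bigdl.version

    sc = SparkContext.getOrCreate(
        conf=create_spark_conf().setMaster("local[*]"))
    init_engine()
    status = "BigDL %s ready, Linear layer created: %s" % (
        bigdl.version.__version__, Linear(2, 3) is not None)
except ImportError as exc:
    status = "BigDL/pyspark not available: %s" % exc
print(status)
```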

 

0.10.4 Verification with Scala

Use Interactive Spark Shell:

export BigDL_JAR_PATH=${BIGDL_HOME}/share/lib/bigdl-0.4.0-jar-with-dependencies.jar

spark-shell --properties-file ${BIGDL_HOME}/share/conf/spark-bigdl.conf --jars ${BigDL_JAR_PATH}

 

 

scala> import com.intel.analytics.bigdl.utils.Engine
scala> Engine.init
scala> import com.intel.analytics.bigdl.tensor.Tensor
scala> Tensor[Double](2,2).fill(1.0)

 

Run as a Spark Program

Download CIFAR-10 data from:

https://www.cs.toronto.edu/~kriz/cifar.html

Note: choose the binary version.

For example, upload it to /tmp/cifar and unzip it there. Then run:

# Spark local mode

spark-submit --driver-memory 2G --master local[2] --class com.intel.analytics.bigdl.models.vgg.Train ${BIGDL_HOME}/share/lib/bigdl-0.4.0-jar-with-dependencies.jar -f /tmp/cifar/cifar-10-batches-bin -b 8

 

Run as a Local Java/Scala program

To build bigdl-0.4.0-jar-with-dependencies-and-spark.jar, you first need to download the BigDL source code from https://github.com/intel-analytics/BigDL/tree/branch-0.4

Run the command below to build:

bash make-dist.sh

You can find bigdl-0.4.0-jar-with-dependencies-and-spark.jar at:

/opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar

And then download MNIST data from:

http://yann.lecun.com/exdb/mnist/

and unzip them into /tmp/scala-minist (the path used by the commands below).

Finally run the command below to train LeNet as a local Java/Scala program:

scala -cp /opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar \

com.intel.analytics.bigdl.example.lenetLocal.Train \

-f /tmp/scala-minist \

-c 4 \

-b 8 \

--checkpoint ./model

The above command caches the model in the specified path (--checkpoint). Running the command below uses the trained model to do a validation:

scala -cp /opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar \

com.intel.analytics.bigdl.example.lenetLocal.Test \

-f /tmp/scala-minist \

--model ./model/model.15001 \

-c 4 \

-b 8

Run the command below to predict with the trained model:

scala -cp /opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar \

com.intel.analytics.bigdl.example.lenetLocal.Predict \

-f /tmp/scala-minist \

-c 4 \

--model ./model/model.15001
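Before running Test or Predict, you can inspect the checkpoint directory written by --checkpoint; a minimal sketch (the model.&lt;iteration&gt; naming is inferred from the example above, not documented behavior):

```python
import glob
import os

# List checkpoint files written by `--checkpoint ./model`. Names such
# as model.15001 appear to encode the training iteration at which the
# checkpoint was saved (an inference from the example above).
checkpoints = sorted(glob.glob("./model/model.*"))
print("found %d checkpoint file(s)" % len(checkpoints))
for path in checkpoints:
    print("%s %d bytes" % (path, os.path.getsize(path)))
```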

 

 

0.10.5 Verification of LeNet5 model

Reference:

https://github.com/intel-analytics/BigDL/tree/master/pyspark/bigdl/models/lenet

 

Run the model by:

echo $BIGDL_HOME

echo $SPARK_HOME

export MASTER=local[*]

export PYTHON_API_ZIP_PATH=${BIGDL_HOME}/share/lib/bigdl-0.4.0-python-api.zip

export BigDL_JAR_PATH=${BIGDL_HOME}/share/lib/bigdl-0.4.0-jar-with-dependencies.jar

export PYTHONPATH=${PYTHON_API_ZIP_PATH}:$PYTHONPATH

${SPARK_HOME}/bin/spark-submit \

   --master ${MASTER} \

   --driver-cores 2  \

   --driver-memory 2g  \

   --total-executor-cores 2  \

   --executor-cores 2  \

   --executor-memory 4g \

   --py-files ${PYTHON_API_ZIP_PATH},${BIGDL_HOME}/models/lenet/lenet5.py  \

   --properties-file ${BIGDL_HOME}/share/conf/spark-bigdl.conf \

   --jars ${BigDL_JAR_PATH} \

   --conf spark.driver.extraClassPath=${BigDL_JAR_PATH} \

   --conf spark.executor.extraClassPath=bigdl-0.4.0-jar-with-dependencies.jar \

   ${BIGDL_HOME}/models/lenet/lenet5.py \

   --action train \

   --dataPath /tmp/mnist
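A quick pre-flight check before the spark-submit above is to confirm the MNIST files exist at --dataPath; a sketch (the four file names are the standard MNIST archive names, an assumption about what lenet5.py expects):

```python
import os

# Check that the four standard MNIST archives are present at the path
# passed via --dataPath. Any name printed as MISSING must be downloaded
# before training will work.
DATA_PATH = "/tmp/mnist"
EXPECTED = [
    "train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz",
]
status = {name: os.path.exists(os.path.join(DATA_PATH, name))
          for name in EXPECTED}
for name, present in status.items():
    print("%s %s" % (name, "OK" if present else "MISSING"))
```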

0.10.6 BigDL Tutorials

Download BigDL-Tutorials-branch-0.4.zip from:

https://github.com/intel-analytics/BigDL-Tutorials

Upload it to /opt/intel-analytics and unzip it.

 

Install dependencies:

pip install numpy scipy pandas scikit-learn matplotlib seaborn jupyter wordcloud

 

Start Jupyter by:

jupyter notebook --notebook-dir=/opt/intel-analytics/BigDL-Tutorials-branch-0.4 --ip=* --no-browser --allow-root

 

Open http://{IP}:8888/?token=776c9d895e542e9caed7bf2342f08308632fe58563c17f54 in a web browser, and run the notebooks online, such as notebooks/neural_networks/cnn.ipynb.

 
