Flavor: 4 vCPU, 8 GB memory, 40 GB disk
Download the latest version of Anaconda from:
https://www.anaconda.com/download/#linux
Note: choose the latest version.
Upload it to the Ubuntu VM, and execute the command below to install:
bash /opt/Anaconda/Anaconda2-5.0.1-Linux-x86_64.sh
Check version:
anaconda --version
anaconda Command line client (version 1.6.5)
Install Java:
conda install -c bioconda java-jdk
Check java version:
java -version
openjdk version "1.8.0_92"
OpenJDK Runtime Environment (Zulu 8.15.0.1-linux64) (build 1.8.0_92-b15)
OpenJDK 64-Bit Server VM (Zulu 8.15.0.1-linux64) (build 25.92-b15, mixed mode)
Edit /root/.bashrc and add the following Java environment variables:
export JAVA_HOME=/opt/anaconda2/pkgs/java-jdk-8.0.92-1
export PATH="/opt/anaconda2/bin:$PATH:$JAVA_HOME/bin"
Download the Maven binary package from:
https://maven.apache.org/download.cgi
Unzip it, and then edit /root/.bashrc by adding the following variables:
Install Scala:
conda install -c bioconda scala
Note: choose the latest version in conda.
Install PySpark:
conda install -c conda-forge pyspark=2.1.0
Check Spark:
spark-shell --master local[2]
Install MXNet:
conda install -c anaconda mxnet
Check it with the following code:
root@bigdl:/opt/anaconda2# python
Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 16 2017, 17:29:19)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
/opt/anaconda2/lib/python2.7/site-packages/urllib3/contrib/pyopenssl.py:46: DeprecationWarning: OpenSSL.rand is deprecated - you should use os.urandom instead
import OpenSSL.SSL
>>> a = mx.nd.ones((2, 3))
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.]], dtype=float32)
>>> quit()
Note: during the installation, MKL will be installed automatically.
Install TensorBoard:
pip install tensorboard==1.0.0a6
Install git:
conda install -c anaconda git
Check version:
git --version
git version 2.15.0
Install numpy:
conda install -c anaconda numpy
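To confirm the conda-installed numpy works, a quick check in the same spirit as the MXNet snippet above can be run:

```python
# Quick sanity check that numpy imports and broadcasting works,
# mirroring the MXNet check above.
import numpy as np

a = np.ones((2, 3), dtype=np.float32)
b = a * 2 + 1          # every element becomes 3.0
print(b)
```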
Install system dependencies:
apt-get update
apt-get install -y python-setuptools python-dev
apt-get install -y gcc make
apt-get install -y zip
Install BigDL:
pip install BigDL==0.4.0
Unzip the file, and set the environment variables in /root/.bashrc:
export BIGDL_HOME=/opt/anaconda2/lib/python2.7/site-packages/bigdl
export SPARK_HOME=/opt/anaconda2/lib/python2.7/site-packages/pyspark
export SPARK_DRIVER_MEMORY=2g
Run the local LeNet example:
python ${BIGDL_HOME}/models/local_lenet/local_lenet.py
……
2018-01-19 17:27:04 INFO LocalOptimizer$:267 - [Validation] 9728/10000 Throughput is 11575.688088743565 record / sec
2018-01-19 17:27:04 INFO LocalOptimizer$:267 - [Validation] 9856/10000 Throughput is 12690.642198050957 record / sec
2018-01-19 17:27:04 INFO LocalOptimizer$:267 - [Validation] 9984/10000 Throughput is 13956.84521651483 record / sec
2018-01-19 17:27:04 INFO LocalOptimizer$:267 - [Validation] 10000/10000 Throughput is 12896.600697867305 record / sec
2018-01-19 17:27:04 INFO LocalOptimizer$:276 - Top1Accuracy is Accuracy(correct: 9693, count: 10000, accuracy: 0.9693)
[8 3 2 ..., 5 6 7]
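The Top1Accuracy line in the log above is just correct/count; a small helper can pull those fields out of such a line (the regex is an assumption about the exact wording shown in the log):

```python
# Parse the Top1Accuracy summary from a BigDL validation log line.
# The regex assumes the "Accuracy(correct: ..., count: ..., accuracy: ...)"
# wording seen in the log output above.
import re

line = ("2018-01-19 17:27:04 INFO LocalOptimizer$:276 - "
        "Top1Accuracy is Accuracy(correct: 9693, count: 10000, accuracy: 0.9693)")

m = re.search(r"Accuracy\(correct: (\d+), count: (\d+), accuracy: ([0-9.]+)\)", line)
correct, count, accuracy = int(m.group(1)), int(m.group(2)), float(m.group(3))
print(correct, count, accuracy)   # the reported accuracy equals correct / count
```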
Example code to verify that BigDL can run successfully:
from bigdl.util.common import *
from pyspark import SparkContext
from bigdl.nn.layer import *
import bigdl.version
# create sparkcontext with bigdl configuration
sc = SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
init_engine() # prepare the bigdl environment
bigdl.version.__version__ # Get the current BigDL version
linear = Linear(2, 3) # Try to create a Linear layer
Use Interactive Spark Shell:
export BigDL_JAR_PATH=${BIGDL_HOME}/share/lib/bigdl-0.4.0-jar-with-dependencies.jar
spark-shell --properties-file ${BIGDL_HOME}/share/conf/spark-bigdl.conf --jars ${BigDL_JAR_PATH}
scala> import com.intel.analytics.bigdl.utils.Engine
scala> Engine.init
scala> import com.intel.analytics.bigdl.tensor.Tensor
scala> Tensor[Double](2,2).fill(1.0)
Run as a Spark Program
Download CIFAR-10 data from:
https://www.cs.toronto.edu/~kriz/cifar.html
Note: choose the binary version.
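The binary version stores each image as a fixed-length record: 1 label byte followed by 3072 pixel bytes (the red, green, and blue 32x32 planes, in that order). A sketch of that layout, using a synthetic record instead of a real data_batch file:

```python
# Each record in a CIFAR-10 binary batch file is 3073 bytes:
# 1 label byte, then 3 x 32 x 32 pixel bytes (R plane, G plane, B plane).
RECORD_LEN = 1 + 3 * 32 * 32

def parse_record(record):
    """Split one 3073-byte CIFAR-10 record into (label, pixel bytes)."""
    assert len(record) == RECORD_LEN
    return record[0], record[1:]

# Synthetic record: label 7, all-zero (black) image.
fake = bytes([7]) + bytes(3 * 32 * 32)
label, pixels = parse_record(fake)
print(label, len(pixels))   # 7 3072
```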
For example, upload it to /tmp/cifar and unzip it there. Finally, run it as:
# Spark local mode
spark-submit --driver-memory 2G --master local[2] --class com.intel.analytics.bigdl.models.vgg.Train ${BIGDL_HOME}/share/lib/bigdl-0.4.0-jar-with-dependencies.jar -f /tmp/cifar/cifar-10-batches-bin -b 8
Run as a Local Java/Scala program
Before that, in order to build bigdl-0.4.0-jar-with-dependencies-and-spark.jar, you need to download the source code of BigDL from https://github.com/intel-analytics/BigDL/tree/branch-0.4
Run the command below to build:
bash make-dist.sh
You can find bigdl-0.4.0-jar-with-dependencies-and-spark.jar in the path:
/opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar
And then download MNIST data from:
http://yann.lecun.com/exdb/mnist/
and unzip them in the folder.
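Each unzipped MNIST file begins with a big-endian IDX header, so checking the magic number is a quick way to confirm the files unpacked correctly. This sketch parses a synthetic header rather than a real train-images file:

```python
# MNIST image files start with a 16-byte big-endian header:
# magic (2051 for image files), image count, rows, cols.
# A synthetic header stands in for a real train-images-idx3-ubyte file.
import struct

header = struct.pack(">IIII", 2051, 60000, 28, 28)

magic, count, rows, cols = struct.unpack(">IIII", header[:16])
print(magic, count, rows, cols)   # 2051 60000 28 28
```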
Finally, run the command below to train LeNet as a local Java/Scala program:
scala -cp /opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar \
com.intel.analytics.bigdl.example.lenetLocal.Train \
-f /tmp/scala-minist \
-c 4 \
-b 8 \
--checkpoint ./model
The commands above cache the model in the specified path (--checkpoint). Running the command below will use the trained model to do validation:
scala -cp /opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar \
com.intel.analytics.bigdl.example.lenetLocal.Test \
-f /tmp/scala-minist \
--model ./model/model.15001 \
-c 4 \
-b 8
Run the command below to predict with the trained model:
scala -cp /opt/intel-analytics/BigDL-branch-0.4/spark/dl/target/bigdl-0.4.0-jar-with-dependencies-and-spark.jar \
com.intel.analytics.bigdl.example.lenetLocal.Predict \
-f /tmp/scala-minist \
-c 4 \
--model ./model/model.15001
Reference:
https://github.com/intel-analytics/BigDL/tree/master/pyspark/bigdl/models/lenet
Run the model by:
echo $BIGDL_HOME
echo $SPARK_HOME
export MASTER=local[*]
export PYTHON_API_ZIP_PATH=${BIGDL_HOME}/share/lib/bigdl-0.4.0-python-api.zip
export BigDL_JAR_PATH=${BIGDL_HOME}/share/lib/bigdl-0.4.0-jar-with-dependencies.jar
export PYTHONPATH=${PYTHON_API_ZIP_PATH}:$PYTHONPATH
${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
--driver-cores 2 \
--driver-memory 2g \
--total-executor-cores 2 \
--executor-cores 2 \
--executor-memory 4g \
--py-files ${PYTHON_API_ZIP_PATH},${BIGDL_HOME}/models/lenet/lenet5.py \
--properties-file ${BIGDL_HOME}/share/conf/spark-bigdl.conf \
--jars ${BigDL_JAR_PATH} \
--conf spark.driver.extraClassPath=${BigDL_JAR_PATH} \
--conf spark.executor.extraClassPath=bigdl-0.4.0-jar-with-dependencies.jar \
${BIGDL_HOME}/models/lenet/lenet5.py \
--action train \
--dataPath /tmp/mnist
Download BigDL-Tutorials-branch-0.4.zip from:
https://github.com/intel-analytics/BigDL-Tutorials
Upload it to /opt/intel-analytics and unzip it.
Install dependencies:
pip install numpy scipy pandas scikit-learn matplotlib seaborn jupyter wordcloud
Start Jupyter by:
jupyter notebook --notebook-dir=/opt/intel-analytics/BigDL-Tutorials-branch-0.4 --ip=* --no-browser --allow-root
Open http://{IP}:8888/?token=776c9d895e542e9caed7bf2342f08308632fe58563c17f54 in a web browser (use the token Jupyter prints at startup), and run the notebooks online, such as notebooks/neural_networks/cnn.ipynb.