Install SparkR

SparkR on EC2

Shivaram Venkataraman edited this page on 19 Feb 2015 · 7 revisions


This page describes steps to use SparkR on EC2.

Cluster launch

First, launch an EC2 cluster using Spark's EC2 scripts. Use the EC2 scripts that ship with Spark 0.9.0 or later; SparkR will not work correctly with earlier versions.
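As a rough sketch of the launch step, the helper below only prints a spark-ec2 launch command so it can be reviewed before running it from Spark's ec2 directory. The function name launch_command and the example keypair, key file, and cluster name are hypothetical; -k, -i, and -s (number of slaves) are spark-ec2 options, but check ./spark-ec2 --help for your Spark version.

```shell
# Hypothetical helper: print (do not run) a spark-ec2 launch command.
# Arguments: keypair name, private key file, number of slaves, cluster name.
launch_command() {
  keypair="$1"
  key_file="$2"
  num_slaves="$3"
  cluster_name="$4"
  echo "./spark-ec2 -k ${keypair} -i ${key_file} -s ${num_slaves} launch ${cluster_name}"
}

# Example with placeholder values (assumptions, not real resources).
launch_command my-keypair my-keypair.pem 2 sparkr-cluster
```

Printing the command first makes it easy to confirm the keypair and cluster name before any EC2 instances are started.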

Installing dependencies

Next, log in to the EC2 cluster by running ./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>.

Install SparkR

Now we are ready to install SparkR on the EC2 cluster. To do this, we need to build SparkR against the same Spark version that is running on the cluster (you can find this by running cat /root/spark/RELEASE). Set SPARK_VERSION to that version; the commands below use 1.2.1 as an example. To install SparkR on all your machines, run:

cd /root
git clone https://github.com/amplab-extras/SparkR-pkg.git
cd SparkR-pkg
SPARK_VERSION=1.2.1 ./install-dev.sh
cp -a /root/SparkR-pkg/lib/SparkR /usr/share/R/library/
/root/spark-ec2/copy-dir /root/SparkR-pkg
/root/spark/sbin/slaves.sh cp -a /root/SparkR-pkg/lib/SparkR /usr/share/R/library/
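The SPARK_VERSION value used in the install commands can also be read from the RELEASE file rather than typed by hand. The following is a minimal sketch, assuming the first line of RELEASE has the form "Spark <version> built for ..." (verify this against your own file); spark_version_from_release is a hypothetical helper, not part of Spark or SparkR.

```shell
# Hypothetical helper: print the Spark version recorded in a RELEASE file.
# Assumes the first line starts with "Spark <version>",
# for example "Spark 1.2.1 built for Hadoop 1.0.4".
spark_version_from_release() {
  release_file="${1:-/root/spark/RELEASE}"
  # Print the second field of the first line if the first field is "Spark".
  awk 'NR == 1 && $1 == "Spark" { print $2 }' "$release_file"
}

# Usage on the master node (commented out so the sketch has no side effects):
# SPARK_VERSION="$(spark_version_from_release)" ./install-dev.sh
```

If the helper prints nothing, the file does not match the assumed format and the version should be set manually.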

Launch SparkR

Finally, to launch SparkR and connect to the Spark EC2 cluster, run:

MASTER=spark://<master_hostname>:7077 ./sparkR

where <master_hostname> can be queried using:

cat /root/spark-ec2/cluster-url

You can confirm that you are using the EC2 cluster by opening Spark's Web UI at http://<master_hostname>:8080.
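The Web UI address can be derived from the master address instead of being assembled by hand. This is a small sketch with a hypothetical helper, web_ui_url, which accepts either a bare hostname or a full spark://host:port URL (whether cluster-url holds one or the other is an assumption, so both are handled) and assumes the default Web UI port 8080 mentioned above.

```shell
# Hypothetical helper: build the Spark Web UI URL from a master address.
# Accepts "spark://host:7077", "host:7077", or just "host".
web_ui_url() {
  master="$1"
  host="${master#spark://}"   # drop a leading spark:// if present
  host="${host%%:*}"          # drop a trailing :port if present
  echo "http://${host}:8080"
}

# Usage on the master node (commented out so the sketch has no side effects):
# web_ui_url "$(cat /root/spark-ec2/cluster-url)"
```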
