For the rest of the base environment setup, see the previous post:
http://sofar.blog.51cto.com/353572/1352713
1. Installing Scala
# wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
# tar xvzf scala-2.10.3.tgz -C /usr/local
# cd /usr/local
# ln -s scala-2.10.3 scala
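To confirm the Scala install, something like the following can be used. This is only a sketch: it assumes the /usr/local/scala symlink created above, and putting Scala on PATH is an optional convenience that the steps here do not strictly require.

```shell
# Assumes the /usr/local/scala symlink from the steps above.
export SCALA_HOME=/usr/local/scala
export PATH="$SCALA_HOME/bin:$PATH"

# Print the version only if the scala launcher is actually on PATH.
if command -v scala >/dev/null 2>&1; then
  scala -version
else
  echo "scala not found under $SCALA_HOME/bin" >&2
fi
```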
2. Installing Spark
# wget http://d3kbcqa49mib13.cloudfront.net/spark-0.9.0-incubating.tgz
# tar xvzf spark-0.9.0-incubating.tgz -C /usr/local
# cd /usr/local
# ln -s spark-0.9.0-incubating spark
# cd /usr/local/spark
# export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m"
# mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
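The build takes a while, and the assembly jar it produces is needed later for the YARN test. Below is a minimal sketch for locating that jar, assuming the build writes it somewhere under assembly/target. find_assembly_jar is a hypothetical helper name, not part of Spark.

```shell
# Hypothetical helper: print the first spark-assembly jar found under
# <spark dir>/assembly/target, or nothing if the build has not produced one.
find_assembly_jar() {
  find "$1/assembly/target" -name 'spark-assembly*.jar' 2>/dev/null | head -n 1
}

# Check the install location used in this post.
find_assembly_jar /usr/local/spark
```

If nothing is printed, the Maven build most likely did not finish, and it is worth checking its output before moving on.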
# cd /usr/local/spark/conf
# mv spark-env.sh.template spark-env.sh
# mkdir -p /data/spark/tmp
# vim spark-env.sh
export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
SPARK_LOCAL_DIR="/data/spark/tmp"
SPARK_JAVA_OPTS="-Dspark.storage.blockManagerHeartBeatMs=60000 -Dspark.local.dir=$SPARK_LOCAL_DIR -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$SPARK_HOME/logs/gc.log -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=60"
[Note: apply the same configuration on all other nodes.]
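One way to push the configuration to the other nodes is a small loop over the worker hostnames. This is a sketch only: the hostnames datanode1 to datanode3 are hypothetical placeholders, sync_conf and DRY_RUN are names made up for illustration, and it assumes passwordless SSH and the same /usr/local/spark layout on every node.

```shell
# Hypothetical worker hostnames; replace with the nodes in your cluster.
WORKERS="datanode1 datanode2 datanode3"
SPARK_CONF_DIR=/usr/local/spark/conf

# Copy spark-env.sh to every worker.
# With DRY_RUN set, each scp command is only printed, not executed.
sync_conf() {
  for host in $WORKERS; do
    ${DRY_RUN:+echo} scp "$SPARK_CONF_DIR/spark-env.sh" "$host:$SPARK_CONF_DIR/"
  done
}

# Preview the commands first; run plain `sync_conf` to actually copy.
DRY_RUN=1 sync_conf
```

As far as I know, sbin/start-all.sh starts workers on the hosts listed in conf/slaves, so that file probably needs the same hostnames, one per line.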
## Run on the Master node
# cd /usr/local/spark && sbin/start-all.sh
3. Tests
(1) Local mode
# bin/run-example org.apache.spark.examples.SparkPi local
(2) Standalone cluster mode
# bin/run-example org.apache.spark.examples.SparkPi spark://namenode1:7077
# bin/run-example org.apache.spark.examples.SparkLR spark://namenode1:7077
# bin/run-example org.apache.spark.examples.SparkKMeans spark://namenode1:7077 file:/usr/local/spark/data/kmeans_data.txt 2 1
(3) Cluster mode with HDFS
# hadoop fs -put README.md .
# MASTER=spark://namenode1:7077 bin/spark-shell
scala> val file = sc.textFile("hdfs://namenode1:9000/user/root/README.md")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
scala> count.collect()
scala> :quit
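For a quick sanity check of the Spark result, the same count can be approximated locally. The sketch below splits each line on single spaces like the Scala snippet, but it skips empty tokens, so only the counts for non-empty words are comparable. local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: count non-empty, space-separated words in a file
# and print "word<TAB>count" lines in byte order.
local_wordcount() {
  awk -F'[ ]' '
    { for (i = 1; i <= NF; i++) if ($i != "") count[$i]++ }
    END { for (w in count) printf "%s\t%d\n", w, count[w] }
  ' "$1" | LC_ALL=C sort
}

# Usage (from /usr/local/spark): local_wordcount README.md
```

Comparing a few frequent words against the output of count.collect() should be enough to tell whether the job read the whole file from HDFS.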
(4) YARN mode
# SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar \
bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar \
--class org.apache.spark.examples.SparkPi \
--args yarn-standalone \
--num-workers 3 \
--master-memory 4g \
--worker-memory 2g \
--worker-cores 1