Installing and Compiling Spark

Installation
1. Install the JDK
2. Install Scala
3. Install Hadoop 2.x
4. Install Spark

tar -zxvf scala-2.10.4.tgz -C /opt/modules/
tar -zxvf spark-1.3.0-bin-2.6.0.tgz -C /opt/modules/


# append to /etc/profile, then reload it
export SCALA_HOME=/opt/modules/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/opt/modules/spark-1.3.0-bin-2.6.0
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
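A quick way to confirm the tools are on the PATH after reloading the profile (a minimal check; the exact version output depends on what you installed):

java -version
scala -version
spark-submit --version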


 
Spark Standalone mode (cluster)
conf/spark-env.sh (see the comments in the file for full details)
JAVA_HOME=/opt/modules/jdk1.7
SCALA_HOME=/opt/modules/scala-2.10.4
HADOOP_CONF_DIR=/opt/modules/hadoop-2.6.0/etc/hadoop
SPARK_MASTER_IP=hadoop-master.dragon.org
SPARK_MASTER_PORT=7077 # default: 7077
SPARK_MASTER_WEBUI_PORT=8080 # default: 8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1
conf/slaves (worker node list)
hadoop-master.dragon.org
conf/spark-defaults.conf (default master URL)
spark.master                     spark://hadoop-master.dragon.org:7077


Start:
sbin/start-master.sh
sbin/start-slaves.sh

Stop:
sbin/stop-slaves.sh
sbin/stop-master.sh


bin/spark-shell
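Started bare like this, the shell picks up spark.master from conf/spark-defaults.conf; you can also point it at the standalone master explicitly with the URL configured above:

bin/spark-shell --master spark://hadoop-master.dragon.org:7077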


Word count test
// any one of the following input sources works
var rdd = sc.textFile("hdfs://192.168.192.128:9000/data/input/danc.data")
var rdd = sc.textFile("file:///opt/tools/a.txt")
var rdd = sc.textFile("hdfs://hadoop-00:9000/home910/liyuting/input/a.txt")
var wordcount = rdd.flatMap(x => x.split(" ")).map(x => (x,1)).reduceByKey((a,b) => a+b)
wordcount.collect()
var wordsort = wordcount.sortByKey(false) // sort by key, descending
wordsort.collect()


// equivalent short form using placeholder syntax
val wordcount = rdd.flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_)
wordcount.collect()
// sort by count, descending: swap to (count, word), sort by key, swap back
val wordsort = wordcount.map(x => (x._2,x._1)).sortByKey(false).map(x => (x._2,x._1))
wordsort.collect()


val file = sc.textFile("hdfs://hadoop.master:9000/data/input/wordcount.data")
val count = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count.collect()
count.saveAsTextFile("hdfs://hadoop.master:9000/data/output")
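To inspect the saved result from the command line (assuming the Hadoop client is on the PATH; part-file names may vary):

hdfs dfs -cat /data/output/part-00000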




Compiling Spark
1. Install the JDK
2. Install Maven


/etc/resolv.conf (public DNS so the build can resolve remote repositories)
nameserver 8.8.8.8
nameserver 8.8.4.4


Add inside the <mirrors> section of /opt/modules/apache-maven-3.3.3/conf/settings.xml:
<mirror>
      <id>nexus-osc</id>
      <mirrorOf>*</mirrorOf>
      <name>Nexus osc</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
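The Spark build needs more heap than Maven's default; the Spark 1.x build guide recommends raising it before compiling (MaxPermSize applies to JDK 7, as used here):

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"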


make-distribution.sh
Starting around line 129, comment out the automatic version detection and set the versions by hand:
VERSION=1.6.0
SCALA_VERSION=2.10
SPARK_HADOOP_VERSION=2.6.0
SPARK_HIVE=1 # 1 = bundle Hive support
# with Hive and the JDBC thrift server (in Spark 1.6 the Hive profile is plain -Phive; versioned profiles such as -Phive-0.13.1 belong to older releases)
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver
# without Hive
./make-distribution.sh --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0
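On success the distribution tarball is written to the source root, named from the versions set above (assuming the make-distribution.sh defaults, with no --name override):

ls spark-*.tgz   # e.g. spark-1.6.0-bin-2.6.0.tgz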




Local (single-machine) mode
Install the JDK
Install Scala
./bin/spark-shell --master local[2]
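A non-interactive smoke test of local mode (a sketch that pipes a single Scala expression into the shell; expect 5050.0 in the output):

echo 'println(sc.parallelize(1 to 100).sum())' | ./bin/spark-shell --master local[2]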






export MAVEN_HOME=/opt/modules/apache-maven-3.3.3
export PATH=$PATH:$MAVEN_HOME/bin
export SPARK_HOME=/opt/modules/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
export SCALA_HOME=/opt/modules/scala-2.10.5
export PATH=$PATH:$SCALA_HOME/bin
export JAVA_HOME=/opt/modules/jdk1.7
export PATH=$PATH:$JAVA_HOME/bin

/etc/hosts entry:
192.168.192.137  hadoop-master.dragon.org hadoop-master


Setting up Spark on YARN
cd /opt/modules/spark-1.6.0-bin-hadoop2.6/conf/
vi spark-env.sh
JAVA_HOME=/opt/modules/jdk1.7
SCALA_HOME=/opt/modules/scala-2.10.4
HADOOP_CONF_DIR=/opt/modules/hadoop-2.6.0/etc/hadoop
vi slaves (worker node list)
hadoop-master.dragon.org
http://192.168.192.137:8080/
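Before submitting to YARN, confirm the Hadoop daemons are running (jps ships with the JDK; the expected process list assumes this single-node layout):

jps # expect NameNode, DataNode, ResourceManager and NodeManager among the output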




Running the examples
# local mode with two threads (run-example reads the master from the MASTER environment variable)
MASTER=local[2] ./bin/run-example SparkPi 10


# run on the Spark Standalone cluster
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://hadoop-master:7077 \
  lib/spark-examples-1.6.0-hadoop2.6.0.jar \
  100


# run in yarn-cluster mode on a YARN cluster
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-1.6.0-hadoop2.6.0.jar \
  10

# stop the standalone daemons when done
sbin/stop-all.sh

Note: Spark on YARN supports two run modes, yarn-cluster and yarn-client. Broadly speaking, yarn-cluster is suited to production, while yarn-client is suited to interactive work and debugging, i.e. when you want to see the application's output quickly.
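For comparison, the same example in yarn-client mode (a sketch reusing the jar above) keeps the driver in the local terminal, so the Pi estimate prints directly:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  lib/spark-examples-1.6.0-hadoop2.6.0.jar \
  10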
