Spark 2.0.0 + Hadoop 2.7.3

Reference: http://blog.csdn.net/gamer_gyt/article/details/52045663

Hadoop 2.7.3 in YARN mode has already been set up.

Hadoop install directory: /home/hadoop/hadoop-2.7.3
Java install directory: /home/java/jdk1.8.0_102

Cluster nodes:

10.0.0.172 master172
10.0.0.171 slave171
10.0.0.185 slave185

Install Scala

Download: http://www.scala-lang.org/download/2.11.8.html

Download scala-2.11.8.tgz to /home/hadoop/ and unpack it (the original steps skipped the extraction, but SCALA_HOME below points at the extracted directory):

$ cd /home/hadoop/
$ tar -zxvf scala-2.11.8.tgz

$ vi /etc/profile

Add:

export SCALA_HOME=/home/hadoop/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin

$ source /etc/profile
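
To verify that Scala is now on the PATH, you can check the version; it should print something like the line below:

$ scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL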

Download Spark

Download spark-2.0.0-bin-hadoop2.7.tgz to /home/hadoop/:

$ cd /home/hadoop/

$ tar -zxvf spark-2.0.0-bin-hadoop2.7.tgz
$ mv spark-2.0.0-bin-hadoop2.7 spark

Configure Spark environment variables

$ vi /etc/profile

Add:

export SPARK_HOME=/home/hadoop/spark 
export PATH=$SPARK_HOME/bin:$PATH 

$ source /etc/profile
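
As a quick sanity check (not part of the original steps), spark-submit should now resolve from the PATH and print the 2.0.0 version banner:

$ spark-submit --version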

Configure Spark

$ cd /home/hadoop/spark/conf

$ cp spark-env.sh.template spark-env.sh

$ vi spark-env.sh

Add:

export SCALA_HOME=/home/hadoop/scala-2.11.8    # Scala installed earlier
export JAVA_HOME=/home/java/jdk1.8.0_102       # JDK used by the Spark daemons
export SPARK_MASTER_IP=10.0.0.172              # standalone master runs on master172
export SPARK_WORKER_MEMORY=1g                  # total memory a worker can hand to executors
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.3/etc/hadoop    # lets Spark find the HDFS/YARN configs

Configure slaves

$ cp slaves.template slaves

$ vi slaves

10.0.0.171
10.0.0.185

Distribute to the slave nodes

$ scp -r scala-2.11.8 10.0.0.171:/home/hadoop/
$ scp -r scala-2.11.8 10.0.0.185:/home/hadoop/
$ scp -r spark 10.0.0.171:/home/hadoop/
$ scp -r spark 10.0.0.185:/home/hadoop/

Also set up the /etc/profile file on each slave in the same way.
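
For example, on each slave (shown here for 10.0.0.171; repeat for 10.0.0.185):

$ ssh 10.0.0.171
$ vi /etc/profile      # add the same SCALA_HOME and SPARK_HOME exports as on the master
$ source /etc/profile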

Run a test
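
The settings above describe a standalone cluster, so it can be brought up first from the master using the scripts in Spark's sbin directory (use an explicit path, since Hadoop ships a start-all.sh of its own):

$ cd /home/hadoop/spark
$ sbin/start-all.sh

After this, jps should show a Master process on master172 and a Worker on each slave. Note that spark-shell below runs in local mode by default; add --master spark://10.0.0.172:7077 to attach it to the cluster.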

Open spark-shell

$ cd /home/hadoop/spark
$ bin/spark-shell
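
Because HADOOP_CONF_DIR is set in spark-env.sh, the shell can also be launched on the existing YARN cluster instead of local mode:

$ bin/spark-shell --master yarn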

Type the lines below into the spark-shell; the shell prints a result after each one.
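
The word count assumes some text files already exist under /input on HDFS (the counts below suggest the author had uploaded Hadoop's own configuration files). If the directory is empty, something like this would populate it:

$ /home/hadoop/hadoop-2.7.3/bin/hdfs dfs -mkdir -p /input
$ /home/hadoop/hadoop-2.7.3/bin/hdfs dfs -put /home/hadoop/hadoop-2.7.3/etc/hadoop/* /input/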

scala> val file=sc.textFile("hdfs://10.0.0.172:9000/input/*") 
file: org.apache.spark.rdd.RDD[String] = hdfs://10.0.0.172:9000/input/* MapPartitionsRDD[5] at textFile at <console>:24

scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:26

scala> count.collect() 
res0: Array[(String, Int)] = Array(((default),,1), (JNs,1), (Software,1), (Unless,9), (endpoint.,1), (user?,1), (security.applicationclient.protocol.acl,1), (start,1), (number,5), (getKeyVersion,1), (ApplicationHistoryProtocol,,1), (type,1), (with,28), (State,1), (RefreshUserMappingsProtocol.,1), (JavaKeyStoreProvider,,1), (ACL,,2), (inter-datanode,1), (at,11), ((root,1), (ApplicationClientProtocol,,1), (ResourceCalculator,1), (hot-reloaded,1), (keys,1), (mapreduce.jobhistory.address,1), (History,1), (implementation,1), (security.namenode.protocol.acl,1), (setup,1), ("  ",19), (,9), (Server,1), (allowed.,18), (BASIS,,9), (datanodes,1), (file.,10), (resources,2), (stored.,1), (mapreduce.jobhistory.webapp.address
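
To keep the result on HDFS rather than collecting it into the driver, the RDD can be written back out (the /output/wordcount path is only an example; the target directory must not already exist):

scala> count.saveAsTextFile("hdfs://10.0.0.172:9000/output/wordcount")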

Note: in Hadoop, only the NameNode exposes port 9000; DataNodes do not, so make sure the URI points at the NameNode's address.
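
The URI has to match fs.defaultFS from core-site.xml; one way to double-check it (a suggested check, not from the original post):

$ /home/hadoop/hadoop-2.7.3/bin/hdfs getconf -confKey fs.defaultFS
hdfs://10.0.0.172:9000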
