This article uses spark-1.6.2-bin-hadoop2.6.tgz.
jdk1.7: Linux JDK installation and configuration
scala2.10.6: Linux Scala installation and configuration
hadoop-2.6.5: Hadoop distributed cluster setup
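Before installing Spark it is worth confirming that these prerequisites are in place on every node; a quick check, assuming they were installed as described in the linked posts:
# Verify prerequisite versions (run on each node)
java -version
scala -version
hadoop version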
Hostname    OS       IP address
master      centos7  192.168.32.128
slave01     centos7  192.168.32.131
slave02     centos7  192.168.32.132
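The configuration below refers to the nodes by hostname, so every server is assumed to resolve these names, for example through /etc/hosts entries like:
192.168.32.128 master
192.168.32.131 slave01
192.168.32.132 slave02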
The following steps are performed on the master server; the other servers are configured in the same way.
cd /opt/software
tar -zxvf spark-1.6.2-bin-hadoop2.6.tgz
cp -r spark-1.6.2-bin-hadoop2.6 /usr/local/spark
Spark is now extracted and copied into place.
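A quick sanity check that the copy succeeded:
# The Spark layout should now be under /usr/local/spark
ls /usr/local/spark
# expect directories such as bin, conf, lib and sbin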
Configure the system environment variables:
vi /etc/profile
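A minimal sketch of the entries to append, assuming Spark is installed under /usr/local/spark as above:
# Spark environment variables (appended to /etc/profile)
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin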
Save and exit, then reload the configuration:
source /etc/profile
Change to the configuration directory: cd /usr/local/spark/conf
By default it only contains template files:
log4j.properties.template, spark-env.sh.template, slaves.template, spark-defaults.conf.template
Copy each template to the corresponding configuration file:
log4j.properties, spark-env.sh, slaves, spark-defaults.conf
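For example, from the conf directory above:
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf
cp slaves.template slaves
cp log4j.properties.template log4j.properties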
vi spark-env.sh
export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1G
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_DRIVER_MEMORY=1G
export SPARK_WORKER_CORES=6
vi spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/historyserverforSpark
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.yarn.historyServer.address master:18080
spark.history.fs.logDirectory hdfs://master:9000/historyserverforSpark
vi slaves
master
slave01
slave02
# Create the historyserverforSpark directory on HDFS (used by the event-log settings above)
hadoop fs -mkdir /historyserverforSpark
# List the HDFS root directory to confirm
hadoop fs -ls /
Copy the Spark directory from master to /usr/local on slave01 and slave02:
scp -r /usr/local/spark root@slave01:/usr/local/spark
scp -r /usr/local/spark root@slave02:/usr/local/spark
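These scp commands assume that root on master can already log in to the slaves without a password (typically set up during the Hadoop installation); if not, it can be enabled with, for example:
ssh-copy-id root@slave01
ssh-copy-id root@slave02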
As in 3.1, configure the system environment variables on slave01 and slave02.
# Start the cluster
sbin/start-all.sh
# Stop the cluster
sbin/stop-all.sh
Note that Hadoop ships scripts with the same names, so run the Spark versions from the Spark sbin directory as shown below.
Run the startup commands on the master server.
Change into the /usr/local/spark directory, then:
sbin/start-all.sh
sbin/start-history-server.sh
Use jps to check the processes on each node.
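If everything started correctly, the following Spark processes should appear alongside the Hadoop daemons (master also runs a Worker because it is listed in the slaves file):
# Spark processes expected in the jps output
# master : Master, Worker, HistoryServer
# slave01: Worker
# slave02: Worker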
Check the history server web UI on port 18080 to confirm it started successfully (the Spark master web UI is served on port 8080):
http://192.168.32.128:18080
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.32.128:7077 lib/spark-examples-1.6.2-hadoop2.6.0.jar 10
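On success the SparkPi driver prints a line like "Pi is roughly 3.14..."; since event logging is enabled, the completed run should also show up in the history server and as a file under the event-log directory:
# The finished application appears in the history server UI and in HDFS
hadoop fs -ls /historyserverforSpark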
This completes the Spark distributed cluster setup.