Spark cluster setup

Virtual machine configuration

bigdata-hmaster 192.168.135.112  4 cores, 32 GB RAM
bigdata-hnode1  192.168.135.113  4 cores, 16 GB RAM
bigdata-hnode2  192.168.135.114  4 cores, 16 GB RAM

Commonly used Spark ports:

8080: Master web UI (8081 is the default per-Worker UI)
18080: history server UI; the port is set in the configuration files below

Configure /etc/hosts on all three machines, and make sure the master node can log in to the other two machines over SSH without a password:
192.168.135.112 bigdata-hmaster
192.168.135.113 bigdata-hnode1
192.168.135.114 bigdata-hnode2
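The passwordless SSH login mentioned above can be set up roughly as follows (a sketch run on bigdata-hmaster; assumes OpenSSH's ssh-keygen and ssh-copy-id are available):

```shell
# On bigdata-hmaster: generate a key pair if one does not already exist
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to the two worker nodes (asks for the password once each)
ssh-copy-id bigdata-hnode1
ssh-copy-id bigdata-hnode2

# Verify: these should print the hostnames without any password prompt
ssh bigdata-hnode1 hostname
ssh bigdata-hnode2 hostname
```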

1. Download the installation package; here we use version 3.3.2.

A Scala environment needs to be set up in advance.

Download link: https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz

Unpack the archive and rename the directory:

tar -zxvf spark-3.3.2-bin-hadoop3.tgz
mv spark-3.3.2-bin-hadoop3 spark-3.3.2

2. Edit the configuration files workers, spark-env.sh, and spark-defaults.conf. First create them from the provided templates (in $SPARK_HOME/conf):

cp workers.template workers
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf

workers

bigdata-hnode1
bigdata-hnode2

spark-env.sh

export JAVA_HOME=/usr/local/lib/jdk1.8.0_333
export HADOOP_HOME=/usr/local/lib/hadoop-3.2.4
export HADOOP_CONF_DIR=/usr/local/lib/hadoop-3.2.4/etc/hadoop
export SPARK_CLASSPATH=/usr/local/lib/spark-3.3.2
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://bigdata-hmaster:8020/spark/sparklog -Dspark.history.retainedApplications=30"
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G

spark-defaults.conf

spark.master                     spark://bigdata-hmaster:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://bigdata-hmaster:8020/spark/sparklog
spark.yarn.jars                  hdfs://bigdata-hmaster:8020/spark/sparkjar/*           
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.cores               2
spark.driver.memory              2g
spark.cores.max                  4
spark.yarn.historyServer.address bigdata-hmaster:18080
spark.history.ui.port            18080
spark.executor.extraJavaOptions  -XX:+PrintGCDetails

Distribute the configuration to the other nodes (run from $SPARK_HOME):

scp -r conf bigdata-hnode1:$PWD

scp -r conf bigdata-hnode2:$PWD

Configure the environment variables on all three machines (e.g. in /etc/profile):

export SPARK_HOME=/usr/local/lib/spark-3.3.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SPARK_HOME/yarn

source /etc/profile

3. Rename Spark's start/stop scripts so they do not clash with Hadoop's scripts of the same name (both sbin directories are on the PATH):

cd $SPARK_HOME/sbin
mv start-all.sh spark-start-all.sh
mv stop-all.sh spark-stop-all.sh 

Start the Spark cluster and the history server.

Because spark-defaults.conf above sets spark.eventLog.dir and spark.yarn.jars, HDFS must be running and the corresponding directories must exist before Spark is started:
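For example, the directories can be created like this (a sketch; assumes HDFS is up and the NameNode listens on bigdata-hmaster:8020 as in the settings above):

```shell
# Directory for event logs (spark.eventLog.dir / the history server)
hdfs dfs -mkdir -p /spark/sparklog

# Directory for Spark's jars (spark.yarn.jars), populated from the local install
hdfs dfs -mkdir -p /spark/sparkjar
hdfs dfs -put $SPARK_HOME/jars/* /spark/sparkjar/

# Check that both directories exist
hdfs dfs -ls /spark
```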

spark-start-all.sh

start-history-server.sh
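To confirm the daemons are running, jps can be checked on each node (assumes a JDK whose jps is on the PATH):

```shell
# On bigdata-hmaster: expect Master and HistoryServer processes
jps

# On the workers: expect a Worker process on each
ssh bigdata-hnode1 jps
ssh bigdata-hnode2 jps
```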

Spark master web UI: http://bigdata-hmaster:8080

Spark history server UI: http://bigdata-hmaster:18080
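As a final smoke test, the bundled SparkPi example can be submitted to the standalone master (a sketch; the class and example jar below ship with the Spark 3.3.2 distribution, which is built against Scala 2.12):

```shell
spark-submit \
  --master spark://bigdata-hmaster:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.3.2.jar 100
# On success the driver output contains a line like "Pi is roughly 3.14..."
```

The finished application should also appear in the history server UI, which confirms that event logging to HDFS is working.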
