Server | IP Address | Role |
---|---|---|
Master | 192.168.1.100 | NameNode + ResourceManager + Spark Master |
Worker1 | 192.168.1.101 | DataNode + NodeManager + Spark Worker |
Worker2 | 192.168.1.102 | DataNode + NodeManager + Spark Worker |
Run on all nodes:
sudo apt update
sudo apt install -y openjdk-8-jdk
Verify Java:
java -version
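The JAVA_HOME path used later in this guide assumes the default install location of the Ubuntu openjdk-8-jdk package; it is worth confirming the directory exists on every node:
ls -d /usr/lib/jvm/java-8-openjdk-amd64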
On the Master node, run:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
Then copy the public key id_rsa.pub to all nodes:
ssh-copy-id [email protected]
ssh-copy-id [email protected]
ssh-copy-id [email protected]
Verify:
ssh [email protected]
ssh [email protected]
ssh [email protected]
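To check passwordless login to every node non-interactively in one go, a small loop like this works (a sketch, assuming the same ubuntu user and IPs as above):
for host in 192.168.1.100 192.168.1.101 192.168.1.102; do
  ssh -o BatchMode=yes ubuntu@$host hostname   # BatchMode makes ssh fail instead of prompting for a password
done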
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6 /usr/local/hadoop
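The scp commands below assume /usr/local/hadoop also exists on Worker1 and Worker2. If Hadoop was only downloaded on the Master, it can be copied to the Workers the same way Spark is distributed later (assuming the ubuntu user can write to /usr/local):
scp -r /usr/local/hadoop [email protected]:/usr/local/
scp -r /usr/local/hadoop [email protected]:/usr/local/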
On the Master (192.168.1.100):
vim ~/.bashrc
Add:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
Copy it to the Worker nodes:
scp ~/.bashrc [email protected]:~/
scp ~/.bashrc [email protected]:~/
Then run on all nodes:
source ~/.bashrc
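A quick check that the environment took effect on each node:
echo $JAVA_HOME
hadoop version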
On the Master, configure core-site.xml:
vim $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.100:9000</value>
  </property>
</configuration>
Distribute to the Workers:
scp $HADOOP_HOME/etc/hadoop/core-site.xml [email protected]:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/core-site.xml [email protected]:$HADOOP_HOME/etc/hadoop/
On the Master, configure hdfs-site.xml:
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
Distribute to the Workers:
scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml [email protected]:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml [email protected]:$HADOOP_HOME/etc/hadoop/
On the Master, configure yarn-site.xml:
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.100</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Distribute to the Workers:
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml [email protected]:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml [email protected]:$HADOOP_HOME/etc/hadoop/
vim $HADOOP_HOME/etc/hadoop/workers
Add (in Hadoop 3.x this file is named workers, not slaves):
192.168.1.101
192.168.1.102
On the Master, format the NameNode and start HDFS and YARN:
hdfs namenode -format
start-dfs.sh
start-yarn.sh
If you see the error "ERROR: JAVA_HOME is not set and could not be found", run the following:
sudo vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# After saving, distribute the file to the Workers:
scp $HADOOP_HOME/etc/hadoop/hadoop-env.sh [email protected]:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/hadoop-env.sh [email protected]:$HADOOP_HOME/etc/hadoop/
Verify with jps on each node. The Master should typically show NameNode, SecondaryNameNode, and ResourceManager; the Workers should show DataNode and NodeManager:
jps
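The cluster-wide view can also be checked from the Master:
hdfs dfsadmin -report   # should report 2 live DataNodes
yarn node -list         # should list 2 running NodeManagers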
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
tar -xzf spark-3.4.1-bin-hadoop3.tgz
sudo mv spark-3.4.1-bin-hadoop3 /usr/local/spark
Distribute Spark to the Workers:
scp -r /usr/local/spark [email protected]:/usr/local/
scp -r /usr/local/spark [email protected]:/usr/local/
Edit spark-env.sh (copy it from spark-env.sh.template first if it does not exist):
vim $SPARK_HOME/conf/spark-env.sh
Add:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=192.168.1.100
Distribute to the Workers:
scp $SPARK_HOME/conf/spark-env.sh [email protected]:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-env.sh [email protected]:$SPARK_HOME/conf/
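The history server started in the next step only has applications to show if Spark event logging is enabled. A minimal sketch, using an HDFS directory for the logs (/spark-logs is an arbitrary path; the property names are standard Spark settings):
hdfs dfs -mkdir -p /spark-logs
vim $SPARK_HOME/conf/spark-defaults.conf
# Add:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://192.168.1.100:9000/spark-logs
spark.history.fs.logDirectory hdfs://192.168.1.100:9000/spark-logs
# Distribute to the Workers:
scp $SPARK_HOME/conf/spark-defaults.conf [email protected]:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-defaults.conf [email protected]:$SPARK_HOME/conf/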
On the Master node, start the Spark standalone master and the history server:
start-master.sh
start-history-server.sh
On Worker1 and Worker2 (in Spark 3.x the start script is named start-worker.sh):
start-worker.sh spark://192.168.1.100:7077
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
$SPARK_HOME/examples/jars/spark-examples_2.12-3.4.1.jar 10
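In cluster deploy mode the driver runs inside YARN, so the "Pi is roughly ..." result appears in the application logs rather than in the local console. Assuming YARN log aggregation is enabled, it can be retrieved with yarn logs (the application ID is printed by spark-submit and shown on the 8088 UI); otherwise, open the application's logs from the 8088 UI:
yarn logs -applicationId <application_id> | grep "Pi is roughly"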
Access the UIs:
http://192.168.1.100:8088 (YARN ResourceManager)
http://192.168.1.100:18080 (Spark History Server)