Spark official download page: https://spark.apache.org/downloads.html
Hadoop official download page: https://hadoop.apache.org/releases.html
Versions used here: Spark spark-2.4.3, Hadoop hadoop-2.10.1
Check whether you can SSH into this machine itself; if no password is required, the configuration is correct:
ssh localhost
While setting up the Hadoop environment you may hit a localhost.localdomain: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password) error.
This happens because the SSH service requires public/private key authorization even when logging in to the local machine itself. Generate a key pair with ssh-keygen and append the public key to the authorized_keys file:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
There may also be a permissions error:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Tighten the permissions on the SSH files:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
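After adjusting the permissions, re-test the passwordless login (this is just the same check as above, nothing new is assumed):
ssh localhost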
sudo yum -y install gcc gcc-c++ make openssl-devel gmp-devel mpfr-devel libmpc-devel emacs-filesystem libmpcdevel libaio numactl autoconf automake libtool libffi-devel snappy snappy-devel zlib zlib-devel bzip2 bzip2-devel lz4-devel libasan lsof sysstat telnet psmisc && sudo yum install -y which java-1.8.0-openjdk java-1.8.0-openjdk-devel && sudo yum clean all
Locate and configure JAVA_HOME:
which java
ls -lrt /usr/bin/java
ls -lrt /etc/alternatives/java
Once these commands reveal the OpenJDK installation path, edit /etc/profile and set JAVA_HOME:
export JAVA_HOME=/data/etc/java/jdk1.8.0_291
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
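To apply the new variables to the current shell and confirm the JDK is picked up (standard commands, assuming the path above matches your actual installation):
source /etc/profile
echo $JAVA_HOME
java -version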
tar -xzvf spark-2.4.3-bin-hadoop2.7.tgz
cd spark-2.4.3-bin-hadoop2.7/conf
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
spark.executor.heartbeatInterval 110s
spark.rpc.message.maxSize 1024
spark.hadoop.dfs.replication 1
# Temporary (scratch) file path
spark.local.dir /data/spark_test/temp/spark-tmp
spark.driver.memory 10g
spark.driver.maxResultSize 10g
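Spark will normally create the spark.local.dir directory itself, but creating it up front and checking that the user running Spark can write to it avoids surprises. A sketch, assuming the layout used above:
mkdir -p /data/spark_test/temp/spark-tmp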
vi sbin/start-master.sh
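Presumably the edit here changes the master web UI port from the default 8080 to 19080, to match the address opened below; assuming the stock Spark 2.4.3 sbin/start-master.sh, the fallback value it sets can be changed like this:
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=19080   # stock script defaults this to 8080
fi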
./sbin/start-all.sh
Check this machine's IP:
ifconfig | grep inet
Open the web UI: 127.0.0.1:19080
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<ip:port shown on the web UI> ./examples/jars/spark-examples_2.11-2.4.3.jar
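A concrete form of the submit command, with a placeholder address standing in for the spark:// URL shown at the top of the master web UI (the master RPC port defaults to 7077; 192.168.0.10 is only an example IP):
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.0.10:7077 ./examples/jars/spark-examples_2.11-2.4.3.jar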
tar -xzvf hadoop-2.10.1.tar.gz
cd hadoop-2.10.1
Configure etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/spark_test/temp/hdfs/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/spark_test/temp/hdfs/dfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
        <value>NEVER</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
Configure etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://your-own-ip:19000</value>
    </property>
    <!-- Storage path for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <!-- Point this at a dedicated temp folder -->
        <value>/data/spark_test/temp/hdfs-tmp</value>
    </property>
    <!-- Set the static web user to root -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>
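The HDFS paths above are normally created automatically (namenode -format creates the name directory and the DataNode creates its data directory), but creating the parent directories up front avoids ownership surprises. A sketch, assuming the layout above:
mkdir -p /data/spark_test/temp/hdfs /data/spark_test/temp/hdfs-tmp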
Configure your own JAVA_HOME in the env script (etc/hadoop/hadoop-env.sh):
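A sketch of the line to add to etc/hadoop/hadoop-env.sh, reusing the JAVA_HOME path configured in /etc/profile earlier (adjust to your actual JDK location):
export JAVA_HOME=/data/etc/java/jdk1.8.0_291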
./bin/hdfs namenode -format
./sbin/start-dfs.sh
lsof -i:19000
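To confirm the HDFS daemons are actually up, check the Java processes and ask the NameNode for a cluster report (standard Hadoop commands, no extra assumptions):
jps
./bin/hdfs dfsadmin -report
The jps output should list NameNode, DataNode, and SecondaryNameNode.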
Check this machine's IP:
ifconfig | grep inet
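With fs.defaultFS bound to port 19000 above, HDFS can also be addressed by full URI, and the NameNode web UI listens on its Hadoop 2.x default of port 50070 (assuming dfs.namenode.http-address was left unchanged); for example:
./bin/hdfs dfs -ls hdfs://your-own-ip:19000/
Open http://your-own-ip:50070 in a browser to check the NameNode status page.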