Hadoop-Spark Cluster Installation

Notes

1. Hadoop locates its installation and Java through the JAVA_HOME and HADOOP_HOME environment variables, so both must be set in the environment (see the /etc/profile example in the summary below).

2. With the Hadoop path set, four configuration files need to be edited: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. These files must be encoded as UTF-8; otherwise hdfs namenode -format fails with an "Invalid UTF-8" error.

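A minimal way to check and, if necessary, convert the encoding; it assumes the files live under /opt/hadoop/etc/hadoop and were originally saved as GBK (both assumptions; adjust to your setup):

cd /opt/hadoop/etc/hadoop
file core-site.xml                                           # reports the detected character encoding
iconv -f GBK -t UTF-8 core-site.xml > core-site.xml.utf8     # convert to UTF-8
mv core-site.xml.utf8 core-site.xml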

3. If the ResourceManager fails to start with an "UnResolvedAddress" error, the likely cause is the yarn.resourcemanager.hostname property in yarn-site.xml; set it to the master's hostname and the error goes away.

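A quick sanity check that the configured hostname actually resolves (here "master", the name used in the yarn-site.xml below; run this on every node):

hostname              # the local machine's hostname
getent hosts master   # should print the IP that /etc/hosts maps to "master"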

4."there is no HDFS_NAMENODE_USER defined. Aborting operation."问题
将start-dfs.sh,stop-dfs.sh两个文件顶部添加以下参数

HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

The following also need to be added at the top of start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

5. The key to passwordless login is that the username must be the same on every node; otherwise, starting daemons on the other nodes under the current username will fail.

6. If passwordless login is set up but you still get a "localhost: Permission denied" error, it is because the master node has not appended its own id_rsa.pub to authorized_keys. The command to add it is:

cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys

The three key commands for passwordless login:

    1. ssh-keygen -t rsa
    2. scp id_rsa.pub <user>@<slave-hostname>:.../dirForRsa (a directory on the slave node)
    3. cat .../dirForRsa/id_rsa.pub >> ~/.ssh/authorized_keys (run on the slave node)
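A concrete end-to-end sketch, assuming the cluster user is hadoop (a placeholder) and node0 is one of the worker names used later in this article:

# on the master: generate a key pair (accept the defaults)
ssh-keygen -t rsa

# copy the public key to the worker (ssh-copy-id hadoop@node0 does the same in one step)
ssh hadoop@node0 'mkdir -p ~/keys'
scp ~/.ssh/id_rsa.pub hadoop@node0:~/keys/

# on the worker: append the key to the authorized list
ssh hadoop@node0 'cat ~/keys/id_rsa.pub >> ~/.ssh/authorized_keys'

# also authorize the master against itself to avoid "localhost: Permission denied"
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# verify: this should log in without a password prompt
ssh hadoop@node0 hostname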

7. If start-dfs.sh fails with "localhost: ERROR: Cannot set priority of datanode process 130126", the "localhost" means Hadoop is trying to start the DataNode on the local node itself. The cause is that the workers file under hadoop/etc/hadoop still contains the default entry localhost; replace it with the actual worker hostnames.

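For example, assuming node0 and node1 are the two worker nodes (as in the rest of this article) and Hadoop is installed under /opt/hadoop:

$ cat /opt/hadoop/etc/hadoop/workers
node0
node1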

If the startup succeeds, start-dfs.sh and start-yarn.sh report the daemons being launched on each node, and jps can be used to verify them: the master should show the NameNode and ResourceManager, node0 and node1 should show the DataNode and NodeManager, and the SecondaryNameNode runs on whichever node dfs.namenode.secondary.http-address points to (node1 in the configuration below).

Summary

Hadoop cluster setup:

    1. Configure the environment variables in /etc/profile as follows:
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
export PATH="$PATH:/snap/bin"
export JAVA_HOME=/opt/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

if [ "${PS1-}" ]; then

    2. Configure the four main Hadoop configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
Note that if a SecondaryNameNode is wanted, the corresponding properties must be set in both hdfs-site.xml and core-site.xml.
Reference configurations are given below.
core-site.xml:

<configuration>
    <!-- NameNode RPC address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.113:9000</value>
    </property>
    <!-- base directory for temporary files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-3.1.2/tmp</value>
    </property>
    <!-- checkpoint interval of the secondary namenode, in seconds -->
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <!-- number of block replicas -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster</value>
    </property>
    <!-- NameNode metadata directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/namenode</value>
    </property>
    <!-- SecondaryNameNode checkpoint directories -->
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/secnamenode</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/secnamenode</value>
    </property>
    <!-- DataNode block storage directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/datanode</value>
    </property>
    <!-- NameNode web UI -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value>
    </property>
    <!-- SecondaryNameNode web UI, running on node1 -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
</configuration>

mapred-site.xml:

<configuration>
    <!-- run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml:

<configuration>
    <!-- auxiliary shuffle service required by MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- hostname of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <!-- enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- NodeManager local directories -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///data/hadoop/yarn/namenode</value>
    </property>
</configuration>

    3. Set the workers file (or slaves, in older Hadoop versions).
    4. Configure /etc/hosts on the master and on every slave node so that all hostnames resolve to the right IPs (see the sketch after this list).
    5. Set up passwordless login across the cluster: generate a key pair with ssh-keygen -t rsa, send the public key to each slave with scp, and append it to that node's authorized_keys.
    6. Add the user variables to the top of start-dfs.sh, stop-dfs.sh, start-yarn.sh, and stop-yarn.sh, as described in note 4 above.
    7. Format the NameNode with hdfs namenode -format.
    8. Run start-dfs.sh and start-yarn.sh; the cluster should now be running.
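A sketch of steps 4, 7 and 8. The master IP 192.168.1.113 is the one used in core-site.xml above; the worker IPs are placeholders to adjust to your network.

/etc/hosts entries on every node:

192.168.1.113  master
192.168.1.111  node0
192.168.1.112  node1

Then, on the master:

hdfs namenode -format    # one-time formatting of the NameNode
start-dfs.sh             # start NameNode, SecondaryNameNode and DataNodes
start-yarn.sh            # start ResourceManager and NodeManagers
jps                      # verify the daemons (also run jps on node0 and node1)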

Hadoop + Spark cluster setup

    1. Set up Hadoop as described above.
    2. Download the Scala and Spark packages.
    3. Add the Scala and Spark paths to the environment variables; /etc/profile now looks like this:
export PATH="$PATH:/snap/bin"
export JAVA_HOME=/opt/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SCALA_HOME=/opt/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

    4. Edit the settings in spark/conf/spark-env.sh as follows:
#!/usr/bin/env bash
export JAVA_HOME=/opt/jdk
export SCALA_HOME=/opt/scala
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_MASTER_IP=192.168.1.113
export SPARK_MASTER_HOST=192.168.1.113
export SPARK_LOCAL_IP=192.168.1.112
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_CORES=2
export SPARK_HOME=/opt/spark
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)

    5. Edit spark/conf/slaves as follows:
master
node0
node1
    6. Package the Spark directory with tar -zcvf spark.tar.gz spark, then use scp to send /etc/profile and the archive to the slave nodes. On each slave, set SPARK_LOCAL_IP in spark/conf/spark-env.sh to that node's own IP, copy the profile file to /etc/, and run source /etc/profile so the environment variables take effect (see the sketch at the end of this section).

    7. Start the Spark master with spark/sbin/start-master.sh.
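A sketch of steps 6 and 7, assuming the cluster user is hadoop (a placeholder) and node0 is one of the workers; repeat the worker steps for node1:

# on the master: package Spark and ship it together with /etc/profile
cd /opt
tar -zcvf spark.tar.gz spark
scp /opt/spark.tar.gz /etc/profile hadoop@node0:/tmp/

# on the worker:
ssh hadoop@node0
tar -zxvf /tmp/spark.tar.gz -C /opt     # may need sudo depending on /opt permissions
cp /tmp/profile /etc/profile            # may need sudo
vi /opt/spark/conf/spark-env.sh         # change SPARK_LOCAL_IP to this node's own IP
source /etc/profile
exit

# back on the master: start the Spark master
/opt/spark/sbin/start-master.sh
# optionally, /opt/spark/sbin/start-slaves.sh launches the workers listed in conf/slaves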
