A Detailed Record of Deploying a Fully Distributed Hadoop 3.1.0 Cluster

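This is a fully distributed Hadoop 3.1.0 cluster deployed on three servers; the resulting layout is shown below. The configuration file source is on GitHub.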


# After deployment completes
root@servera:/opt/hadoop/hadoop-3.1.0# jps
14056 SecondaryNameNode
14633 Jps
13706 NameNode
14317 ResourceManager

root@serverb:~# jps
5288 NodeManager
5162 DataNode
5421 Jps


root@serverc:~# jps
4545 NodeManager
4371 DataNode
4678 Jps

As shown above, three machines form the cluster: servera is the master, and the other two are workers.
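The jps listings above can also be checked mechanically. The following is a minimal sketch, not part of the original setup: `check_daemons` is a hypothetical helper, and the role names `master` and `worker` are labels assumed here to match the layout described above. It reads jps output on standard input and reports any expected daemon that is missing.

```shell
#!/bin/sh
# check_daemons ROLE
# Hypothetical helper: reads `jps` output on stdin and reports missing daemons.
# ROLE is "master" or "worker" (labels assumed for this sketch).
check_daemons() {
    role=$1
    case "$role" in
        master) expected="NameNode SecondaryNameNode ResourceManager" ;;
        worker) expected="DataNode NodeManager" ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    input=$(cat)
    missing=""
    for daemon in $expected; do
        # Compare the whole second field, so that "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$input" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

On servera this would be run as `jps | check_daemons master`, and on serverb or serverc as `jps | check_daemons worker`.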

2. Deployment: prerequisites (perform the following on all three machines)

  • 2.1. Configure the hosts file [all three machines]
vim /etc/hosts
10.80.80.110    servera
10.80.80.111    serverb
10.80.80.112    serverc
  • 2.2. Install the JDK [all three machines]

    • Download the JDK
    wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u172-b11/a58eab1ec242421181065cdc37240b08/jdk-8u172-linux-x64.tar.gz
    • Extract
    mkdir -p /opt/java

    tar -zxf jdk-8u172-linux-x64.tar.gz

    mv jdk1.8.0_172/ /opt/java/

    • Configure the Java environment variables
    vim /etc/profile.d/jdk-1.8.sh

    #!/bin/sh
    # Author: wangxiaolei 王小雷
    # Blog: http://blog.csdn.net/dream_an
    # Github: https://github.com/wangxiaoleiai
    # web: www.xiaolei.wang
    # Date: 2018.05
    # Path: /etc/profile.d/

    export JAVA_HOME=/opt/java/jdk1.8.0_172
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH

    # Apply the environment variables (run in the shell after saving the file)
    source /etc/profile

    # Check the Java version (JDK 8 accepts -version; --version only exists from Java 9 on)
    java -version


  • 2.3. Install pdsh and ssh [all three machines]
root@servera:~# apt install ssh pdsh
echo ssh>/etc/pdsh/rcmd_default
  • 2.4. Passwordless SSH login to the machine itself [all three machines]
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost    # answer yes at the first-connection prompt
  • 2.5. Passwordless login from servera to the other machines (master to workers) [one machine only: run on servera]
ssh-copy-id -i ~/.ssh/id_rsa.pub servera
ssh-copy-id -i ~/.ssh/id_rsa.pub serverb
ssh-copy-id -i ~/.ssh/id_rsa.pub serverc
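
Before moving on, it can help to confirm that servera really reaches every node without a password prompt, since the Hadoop start scripts depend on it. The sketch below is an illustration rather than part of the original procedure: `check_passwordless_ssh` and the `SSH_BIN` override are hypothetical names introduced here. It relies on the OpenSSH options `BatchMode=yes`, which makes ssh fail instead of prompting, and `ConnectTimeout`.

```shell
#!/bin/sh
# check_passwordless_ssh HOST...
# Hypothetical helper: attempts a non-interactive login to each HOST and
# runs the no-op command `true` there. SSH_BIN is an assumed override that
# allows substituting another client binary (for example when testing).
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes: never prompt for a password, fail instead.
        # ConnectTimeout=5: give up after five seconds on unreachable hosts.
        if ${SSH_BIN:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return $failed
}
```

On servera, a call such as `check_passwordless_ssh servera serverb serverc` should print `ok` for all three hosts once steps 2.4 and 2.5 are done.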

3. Hadoop 3+ configuration files (configuration file source on GitHub)

Six files under /opt/hadoop/hadoop-3.1.0/etc/hadoop/ need to be configured:

hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, workers

  • 3.1. hadoop-env.sh: add the following
export JAVA_HOME=/opt/java/jdk1.8.0_172/

export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
  • 3.2. core-site.xml
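<configuration>

  <!-- NameNode URI: the master (servera, as defined in /etc/hosts) on port 9000 -->
  <property>
      <name>fs.defaultFS</name>
      <value>hdfs://servera:9000</value>
  </property>

  <!-- Read/write buffer size in bytes (128 KB) -->
  <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
  </property>

</configuration>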
  • 3.3. hdfs-site.xml

<configuration>

<!-- Local directory where the NameNode stores its metadata -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/var/lib/hadoop/hdfs/name/</value>
</property>

<!-- HDFS block size in bytes (256 MB) -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>

<!-- Local directory where each DataNode stores its blocks -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/lib/hadoop/hdfs/data/</value>
</property>

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

</configuration>
  • 3.4. yarn-site.xml

<configuration>

  <!-- The ResourceManager runs on the master -->
  <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>servera</value>
  </property>

  <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
  </property>

</configuration>
  • 3.5. mapred-site.xml
<configuration>

  <!-- Run MapReduce jobs on YARN -->
  <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
  </property>
</configuration>
  • 3.6. workers
serverb
serverc
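
Every name listed in workers has to resolve on the master, which in this setup means it must appear in the /etc/hosts file from step 2.1. As a minimal sketch (the function name `check_workers_in_hosts` is hypothetical and not part of Hadoop), the two files can be cross-checked like this:

```shell
#!/bin/sh
# check_workers_in_hosts WORKERS_FILE HOSTS_FILE
# Hypothetical helper: for each name in WORKERS_FILE, report whether it
# appears as a hostname or alias in HOSTS_FILE.
check_workers_in_hosts() {
    workers_file=$1
    hosts_file=$2
    status=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r worker || [ -n "$worker" ]; do
        # skip blank lines and comment lines
        case "$worker" in
            ''|'#'*) continue ;;
        esac
        # In a hosts line, field 1 is the address and fields 2..NF are names.
        if awk -v w="$worker" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == w) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$worker: found"
        else
            echo "$worker: missing from $hosts_file"
            status=1
        fi
    done < "$workers_file"
    return $status
}
```

With the paths used in this guide, the call would be `check_workers_in_hosts /opt/hadoop/hadoop-3.1.0/etc/hadoop/workers /etc/hosts`.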

4. Copy the Hadoop files to the other nodes, configure the Hadoop environment variables, format HDFS, then start, check, stop, and reset the cluster

  • 4.1. Copy the Hadoop directory configured in step 3 to the same path on the other machines
    /opt/hadoop/hadoop-3.1.0
  • 4.2. Configure the Hadoop environment variables [all three machines]
vim /etc/profile.d/hadoop-3.1.0.sh
#!/bin/sh
# Author:wangxiaolei 王小雷
# Blog: http://blog.csdn.net/dream_an
# Github: https://github.com/wangxiaoleiai
# Date: 201805
# web: www.xiaolei.wang
# Path: /etc/profile.d/

export HADOOP_HOME="/opt/hadoop/hadoop-3.1.0"
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Run this in the shell after saving the file; it is not part of the script
source /etc/profile
  • 4.3. Format HDFS [first deployment only] [use with caution; run on servera only]
/opt/hadoop/hadoop-3.1.0/bin/hdfs namenode -format myClusterName
  • 4.4. Start the cluster [run on servera only]
/opt/hadoop/hadoop-3.1.0/sbin/start-dfs.sh
/opt/hadoop/hadoop-3.1.0/sbin/start-yarn.sh
  • 4.5. Check the running daemons [all three machines]
jps


  • 4.6. Open the web UI at localhost:8088 [localhost here means servera itself; you can use servera's external IP instead; see step 3.4, yarn-site.xml]


  • 4.7. Stop the cluster [run on servera only]
/opt/hadoop/hadoop-3.1.0/sbin/stop-dfs.sh
/opt/hadoop/hadoop-3.1.0/sbin/stop-yarn.sh
  • 4.8. Reset the Hadoop environment [removes the Hadoop HDFS and log files] [use with caution; run on servera only]
rm -rf /opt/hadoop/hadoop-3.1.0/logs/*
rm -rf /var/lib/hadoop/
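
Step 4.1 above names the directory to copy but not a command for doing so. One possible way, offered as a sketch under assumptions rather than the author's method, is a loop over the workers using ssh and scp. The helper name `sync_hadoop_dir` and the `DRY_RUN` switch are hypothetical; the path is the one used throughout this guide.

```shell
#!/bin/sh
# Hypothetical helper, not from the original post.
HADOOP_DIR=/opt/hadoop/hadoop-3.1.0   # path used throughout this guide

# sync_hadoop_dir HOST...
# Copy the configured Hadoop tree from this machine to the same path on
# each HOST. If DRY_RUN is set to a non-empty value, the commands are
# printed instead of executed.
sync_hadoop_dir() {
    parent=$(dirname "$HADOOP_DIR")
    for host in "$@"; do
        # make sure the parent directory exists on the remote side
        ${DRY_RUN:+echo} ssh "$host" mkdir -p "$parent"
        # -r copies recursively, -q suppresses the progress meter
        ${DRY_RUN:+echo} scp -r -q "$HADOOP_DIR" "$host:$parent/"
    done
}
```

Running `DRY_RUN=1 sync_hadoop_dir serverb serverc` on servera first shows the exact commands; dropping `DRY_RUN=1` performs the copy. It relies on the passwordless login set up in step 2.5.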

5. Pitfall encountered: pdsh@servera: servera: connect: Connection refused

root@servera:/opt/hadoop/hadoop-3.1.0# sbin/start-dfs.sh
Starting namenodes on [servera]
pdsh@servera: servera: connect: Connection refused
Starting datanodes
pdsh@servera: serverc: connect: Connection refused
pdsh@servera: serverb: connect: Connection refused
Starting secondary namenodes [servera]
pdsh@servera: servera: connect: Connection refused


  • Fix: run the command from step 2.3 on each machine, so that pdsh uses ssh as its default remote command type
echo ssh>/etc/pdsh/rcmd_default
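
To confirm that the fix is in place before starting the cluster again, the file can be inspected directly. The function below is a hypothetical helper sketched for this purpose; it only checks the file contents and does not query pdsh itself. As a per-session alternative, pdsh also reads the `PDSH_RCMD_TYPE` environment variable.

```shell
#!/bin/sh
# pdsh_default_is_ssh [FILE]
# Hypothetical helper: succeeds if FILE (default /etc/pdsh/rcmd_default)
# is readable and contains exactly "ssh", ignoring whitespace.
pdsh_default_is_ssh() {
    file=${1:-/etc/pdsh/rcmd_default}
    [ -r "$file" ] && [ "$(tr -d '[:space:]' < "$file")" = "ssh" ]
}

# Per-session alternative that needs no root access:
#   export PDSH_RCMD_TYPE=ssh
```

A typical use is `pdsh_default_is_ssh || echo "pdsh is not configured for ssh"`, run on each of the three machines.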

6. References for Hadoop cluster deployment

Hadoop Cluster Setup

A detailed from-scratch record of deploying a fully distributed Hadoop 2.7.3 cluster
