Hadoop 2.7.7 Fully Distributed Installation

Environment preparation:

CentOS 7, 64-bit
JDK 1.8
hadoop-2.7.7.tar.gz (2.7.7 is chosen here so that it matches hbase-2.0.0 later; see the HBase website for the version compatibility matrix)

Server planning

192.168.222.3 hadoop-namenode # this node runs only the namenode service
192.168.222.4 hadoop-yarn # this node runs the resourcemanager service
192.168.222.5 hadoop-datanode1 # data node
192.168.222.6 hadoop-datanode2 # data node
192.168.222.7 hadoop-datanode3 # data node

The installation is shown on hadoop-namenode as the example (the configuration is identical on every node; only which daemons are started on each node differs).
To keep testing simple, the firewall is disabled on all servers; run the following on every server:

systemctl stop firewalld.service # stop firewalld
systemctl disable firewalld.service # keep firewalld from starting at boot
firewall-cmd --state # check the firewall status (shows "not running" when stopped, "running" when active)

Disable SELinux on all servers

vim /etc/selinux/config
SELINUX=disabled
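
Editing /etc/selinux/config only takes effect after a reboot; to turn SELinux off for the current session as well (a standard CentOS step, not spelled out in the original notes), you can additionally run:

setenforce 0 # switch SELinux to permissive mode immediately, no reboot needed
getenforce # verify: prints Permissive now, Disabled after the next reboot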

Set the hostname (on each server use its own name from the planning table; hadoop-namenode is shown here)

vim /etc/hostname
hadoop-namenode

Configure /etc/hosts with the cluster entries on every server:

192.168.222.3 hadoop-namenode
192.168.222.4 hadoop-yarn
192.168.222.5 hadoop-datanode1
192.168.222.6 hadoop-datanode2
192.168.222.7 hadoop-datanode3
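
After /etc/hosts has been updated on every node, a quick sanity check (not in the original write-up) is to confirm that each hostname resolves to the planned IP, for example:

ping -c 1 hadoop-yarn # should answer from 192.168.222.4
ping -c 1 hadoop-datanode1 # should answer from 192.168.222.5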

Passwordless SSH login (do the same with the other nodes' public keys, i.e. append them so that every node ends up holding the public keys of all the other machines)

ssh-keygen -t rsa  # press Enter through every prompt; this generates id_rsa.pub under ~/.ssh, which is then appended to authorized_keys
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
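
One convenient way to exchange the keys between nodes, assuming ssh-copy-id is available and you log in as root (the original text only says to append the public keys manually), is:

# run on every node; each command asks for the target host's password once
ssh-copy-id root@hadoop-namenode
ssh-copy-id root@hadoop-yarn
ssh-copy-id root@hadoop-datanode1
ssh-copy-id root@hadoop-datanode2
ssh-copy-id root@hadoop-datanode3

ssh hadoop-yarn # should now log in without a password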

Install JDK 8 (this guide uses JDK 1.8 throughout; an incompatible JDK version will produce error messages)

[Reference blog post](https://blog.csdn.net/hwm_life/article/details/81699882)
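
The linked post covers the JDK installation in detail; as a minimal sketch, assuming the JDK is unpacked to /usr/local/jdk8 (the path used for JAVA_HOME later in this guide), the environment setup would look like:

vim ~/.bash_profile
export JAVA_HOME=/usr/local/jdk8
export PATH=$PATH:$JAVA_HOME/bin

source ~/.bash_profile
java -version # should report version 1.8.x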

Extract Hadoop into the target directory

tar -zxvf hadoop-2.7.7.tar.gz -C /usr/local # -C specifies the extraction directory

Configure the Hadoop environment variables

vim ~/.bash_profile
export HADOOP_HOME=/usr/local/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin

source ~/.bash_profile # apply the changes immediately; otherwise they only take effect after the next login
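
To confirm the variables were picked up, the hadoop command should now work from any directory:

hadoop version # should print Hadoop 2.7.7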

Configure hadoop-env.sh, mapred-env.sh and yarn-env.sh: add the JAVA_HOME path to each of these three files, as follows

export JAVA_HOME=/usr/local/jdk8
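
Instead of editing the three files one by one, the same line can be appended to all of them in one go (an optional shortcut, not part of the original steps):

cd /usr/local/hadoop-2.7.7/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/usr/local/jdk8' >> $f
done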

Edit core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-namenode:9000</value>
        <description>NameNode address</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.7.7/dfs/tmp</value>
        <description>Base directory for Hadoop's temporary and working data</description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>

Edit hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop-namenode:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-namenode:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of block replicas; usually more than one, set to 1 here for testing</description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop-2.7.7/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop-2.7.7/dfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>16m</value>
    </property>
</configuration>

Edit mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-yarn:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-yarn:19888</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop-2.7.7/etc/hadoop,
            /usr/local/hadoop-2.7.7/share/hadoop/common/*,
            /usr/local/hadoop-2.7.7/share/hadoop/common/lib/*,
            /usr/local/hadoop-2.7.7/share/hadoop/hdfs/*,
            /usr/local/hadoop-2.7.7/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop-2.7.7/share/hadoop/mapreduce/*,
            /usr/local/hadoop-2.7.7/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop-2.7.7/share/hadoop/yarn/*,
            /usr/local/hadoop-2.7.7/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>

Edit yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-yarn</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop-yarn:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop-yarn:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop-yarn:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop-yarn:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop-yarn:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>6</value>
        <description>Ratio of virtual memory to physical memory allowed for each task</description>
    </property>
</configuration>

Copy the modified installation to the other servers with scp, using hadoop-yarn as the example:

scp -r hadoop-2.7.7 hadoop-yarn:/usr/local
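
To push the installation (together with the updated /etc/hosts and ~/.bash_profile) to all remaining nodes in one go, a small loop over the host list from the planning table can be used, for example:

cd /usr/local
for host in hadoop-yarn hadoop-datanode1 hadoop-datanode2 hadoop-datanode3; do
    scp -r hadoop-2.7.7 $host:/usr/local
    scp /etc/hosts $host:/etc/hosts
    scp ~/.bash_profile $host:~/
done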

Format the NameNode on hadoop-namenode

cd /usr/local/hadoop-2.7.7
./bin/hdfs namenode -format

Start the namenode on hadoop-namenode

./sbin/hadoop-daemon.sh start namenode

Start the resourcemanager and nodemanager on hadoop-yarn

./sbin/yarn-daemon.sh start resourcemanager
./sbin/yarn-daemon.sh start nodemanager

Start the datanode and nodemanager on hadoop-datanode1, hadoop-datanode2 and hadoop-datanode3

./sbin/hadoop-daemon.sh start datanode
./sbin/yarn-daemon.sh start nodemanager

Check the started processes on each node with the jps command.
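Given the daemons started above, jps should roughly report the following on each node (process IDs omitted):

# hadoop-namenode
NameNode
# hadoop-yarn
ResourceManager
NodeManager
# hadoop-datanode1 / 2 / 3
DataNode
NodeManager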
Verify the cluster installation with one of the bundled examples.

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 1 2

If the job runs and prints a result, the installation is correct.
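
Besides the pi example, a quick HDFS round trip with the standard hdfs dfs commands (not part of the original post) also confirms that the NameNode and DataNodes can talk to each other:

./bin/hdfs dfs -mkdir -p /test
echo "hello hadoop" > /tmp/hello.txt
./bin/hdfs dfs -put /tmp/hello.txt /test/
./bin/hdfs dfs -cat /test/hello.txt # should print: hello hadoop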
View the cluster status through the web UIs:

hadoop-namenode:50070 # HDFS NameNode web UI
hadoop-yarn:8088 # YARN ResourceManager web UI
