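Setting Up a Distributed Hadoop Environment

Distributed Hadoop Installation

Notes

  1. This guide uses three machines (hadoop1, hadoop2, hadoop3), all running CentOS 6.
  2. Unless stated otherwise, every step is performed on all three machines. The exceptions are formatting the NameNode and starting or stopping the cluster, which are each run on a single node.
  3. Some configuration files can be completed on one machine and then copied to the others with scp to save work.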

1. Set the hostname and host mappings

[root@hadoop1 ~]# vim /etc/sysconfig/network

[root@hadoop1 ~]# vim /etc/hosts
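
The post does not show what goes into these two files. As a minimal sketch: on CentOS 6 the hostname is set with a HOSTNAME=hadoop1 line in /etc/sysconfig/network (hadoop2 and hadoop3 on the other machines), and /etc/hosts needs one line per node. The IP addresses below are hypothetical placeholders, and hosts_entries is only an illustrative helper name.

```shell
# Print the host mappings for the three nodes.
# ASSUMPTION: the 192.168.1.x addresses are placeholders; use your machines' real IPs.
hosts_entries() {
  cat <<'EOF'
192.168.1.101 hadoop1
192.168.1.102 hadoop2
192.168.1.103 hadoop3
EOF
}
```

As root on each machine, `hosts_entries >> /etc/hosts` appends the mappings without touching the existing localhost lines.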

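2. Create the hadoop user

[root@hadoop1 ~]# useradd hadoop

[root@hadoop1 ~]# passwd hadoop

[root@hadoop1 ~]# vim /etc/sudoers

In /etc/sudoers, a line such as the following (typically placed below the root entry) lets the hadoop user run sudo:
hadoop  ALL=(ALL)  ALL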

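3. Passwordless SSH login

Switch to the hadoop user and generate a key pair
[hadoop@hadoop1 ~]$ ssh-keygen
Copy the public key to every node, including the local one
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop1
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop2
[hadoop@hadoop1 ~]$ ssh-copy-id hadoop3

Verify that SSH no longer asks for a password
[hadoop@hadoop1 ~]$ ssh hadoop1
[hadoop@hadoop1 ~]$ ssh hadoop2
[hadoop@hadoop1 ~]$ ssh hadoop3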
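
The manual check above can also be scripted. This is a sketch rather than part of the original procedure: it relies on OpenSSH's BatchMode option, which makes ssh fail instead of prompting for a password. The function name and the 5-second timeout are illustrative choices.

```shell
# Nodes taken from the cluster plan in this post.
CLUSTER_HOSTS="hadoop1 hadoop2 hadoop3"

# Try a no-op command on every node without allowing a password prompt.
# Returns non-zero if any node still requires interactive authentication.
check_passwordless_ssh() {
  failed=0
  for host in $CLUSTER_HOSTS; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      failed=1
    fi
  done
  return "$failed"
}
```

Run `check_passwordless_ssh` as the hadoop user on each machine; a FAILED line usually means ssh-copy-id has not been run for that target yet.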

4. Install the JDK

[hadoop@hadoop1 ~]$ tar -zxvf jdk-8u161-linux-x64.tar.gz
[hadoop@hadoop1 ~]$ mv jdk1.8.0_161 jdk8
Switch to root
[root@hadoop1 ~]# vim /etc/profile

Append the following to the end of the file
#Java
export JAVA_HOME=/home/hadoop/jdk8
export PATH=$PATH:$JAVA_HOME/bin

Switch back to the hadoop user and reload the profile
[hadoop@hadoop1 ~]$ source /etc/profile

Verify
[hadoop@hadoop1 ~]$ java -version

5. Install Hadoop

1. Cluster layout

host      HDFS                           YARN
hadoop1   namenode, datanode             nodemanager
hadoop2   datanode, secondarynamenode    nodemanager
hadoop3   datanode                       resourcemanager, nodemanager

Download and unpack Hadoop, then add the Hadoop variables to /etc/profile (see the environment variable settings in the next step) and reload it
[hadoop@hadoop1 ~]$ wget https://www-us.apache.org/dist/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
[hadoop@hadoop1 ~]$ tar -zxvf hadoop-2.7.6.tar.gz
[root@hadoop1 ~]# vim /etc/profile
[hadoop@hadoop1 ~]$ source /etc/profile

Verify
[hadoop@hadoop1 ~]$ hadoop version

2. Configuration files

1. Hadoop environment variables (the complete block at the end of /etc/profile)

#Java
export JAVA_HOME=/home/hadoop/jdk8
export HADOOP_HOME=/home/hadoop/hadoop-2.7.6
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

2. Edit the configuration files

Configuration directory

hadoop-2.7.6/etc/hadoop/
[hadoop@hadoop1 ~]$ cd hadoop-2.7.6/etc/hadoop/
1.hadoop-env.sh
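
The post leaves this step empty. The usual change is to hard-code JAVA_HOME in hadoop-env.sh, because daemons started over SSH do not necessarily inherit the login environment. The helper below is a sketch under that assumption. set_hadoop_java_home is a made-up name, and `sed -i` is used in its GNU form, as found on CentOS.

```shell
# Set (or append) the JAVA_HOME export in a hadoop-env.sh file.
#   $1 = path to hadoop-env.sh, $2 = JDK directory
set_hadoop_java_home() {
  env_file="$1"
  java_home="$2"
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Replace the existing export line in place.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
  else
    # No export line present: append one.
    echo "export JAVA_HOME=$java_home" >> "$env_file"
  fi
}
```

With the paths used in this post, the call would be `set_hadoop_java_home /home/hadoop/hadoop-2.7.6/etc/hadoop/hadoop-env.sh /home/hadoop/jdk8`.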

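2.core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoopdata/tmp</value>
    </property>
</configuration>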
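3.hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop2:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/hadoopdata/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hadoopdata/data</value>
    </property>
</configuration>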
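4.yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop3</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>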
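5.mapred-site.xml

Hadoop 2.7.6 ships only mapred-site.xml.template, so copy it to mapred-site.xml before editing.

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>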
6.slaves
hadoop1
hadoop2
hadoop3
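
As the notes at the top suggest, the files edited above can be completed on one machine and pushed to the others with scp. The loop below sketches that idea. It assumes Hadoop is unpacked at the same path on every node and that passwordless SSH for the hadoop user is already working. push_config and the CONF_FILES list are illustrative names, not part of Hadoop.

```shell
# ASSUMPTION: identical install path on every node.
CONF_DIR="${CONF_DIR:-/home/hadoop/hadoop-2.7.6/etc/hadoop}"
CONF_FILES="hadoop-env.sh core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml slaves"

# Copy each edited configuration file to the same location on the given hosts.
push_config() {
  for host in "$@"; do
    for f in $CONF_FILES; do
      scp "$CONF_DIR/$f" "$host:$CONF_DIR/$f" || return 1
    done
  done
}
```

After editing on hadoop1, `push_config hadoop2 hadoop3` run as the hadoop user brings the other two nodes in line.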

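3. Format the NameNode

Run this once, on hadoop1 only. In Hadoop 2.x the hdfs command replaces the deprecated hadoop namenode form.

[hadoop@hadoop1 hadoop-2.7.6]$ hdfs namenode -format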
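
Formatting a NameNode that is already in use gives it a new cluster ID, which existing DataNodes then refuse to join. A small guard can prevent accidental re-formatting. This is a sketch that assumes the dfs.namenode.name.dir value from hdfs-site.xml above and relies on the current/VERSION file that formatting creates. safe_format_namenode is a hypothetical helper name.

```shell
# ASSUMPTION: matches dfs.namenode.name.dir in hdfs-site.xml.
NAME_DIR="${NAME_DIR:-/home/hadoop/hadoopdata/name}"

# Format the NameNode only if its metadata directory has not been initialised yet.
safe_format_namenode() {
  if [ -f "$NAME_DIR/current/VERSION" ]; then
    echo "NameNode already formatted at $NAME_DIR; refusing to format again" >&2
    return 1
  fi
  hdfs namenode -format
}
```

Calling `safe_format_namenode` on hadoop1 behaves like the plain format command the first time and exits with an error afterwards.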

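4. Starting and stopping

Start HDFS on the NameNode host:

[hadoop@hadoop1 hadoop-2.7.6]$ start-dfs.sh

Note: YARN should be started on the ResourceManager node, because start-yarn.sh launches the ResourceManager on the machine where it is run.

[hadoop@hadoop3 hadoop-2.7.6]$ start-yarn.sh

To stop the cluster, run stop-yarn.sh on hadoop3 and stop-dfs.sh on hadoop1.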
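
Once both scripts have run, the JDK's jps tool shows which daemons are up on each node. The sketch below compares that output with the cluster layout given earlier. The function names are illustrative, and jps has to be run as the hadoop user so that it can see the daemons.

```shell
# Daemons each node should run, per the cluster layout in this post.
expected_daemons() {
  case "$1" in
    hadoop1) echo "NameNode DataNode NodeManager" ;;
    hadoop2) echo "DataNode SecondaryNameNode NodeManager" ;;
    hadoop3) echo "DataNode ResourceManager NodeManager" ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# Report any expected daemon that jps does not list on this machine.
#   $1 = name of the node the function is being run on
check_daemons() {
  wanted="$(expected_daemons "$1")" || return 2
  running="$(jps | awk '{print $2}')"   # jps prints "<pid> <name>" per JVM
  missing=0
  for daemon in $wanted; do
    if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
      echo "missing on $1: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_daemons hadoop1` on hadoop1 prints nothing and returns 0 when NameNode, DataNode and NodeManager are all running.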

6. Time synchronization

[root@hadoop1 ~]# date
Thu Jan 17 00:10:16 EST 2019    (the VPS is hosted overseas, so it reports US Eastern time)
[root@hadoop1 ~]# rm -rf /etc/localtime
[root@hadoop1 ~]# ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@hadoop1 ~]# yum -y install ntpdate ntp
[root@hadoop1 ~]# ntpdate time.google.com
17 Jan 13:11:43 ntpdate[1691]: step time server 216.239.35.0 offset 2.120207 sec
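
After ntpdate has run on every node, the clocks can be compared roughly from one machine. This sketch assumes passwordless SSH is in place. clock_skew is a made-up helper, and the result includes the SSH round-trip time, so it is only good to about a second.

```shell
# Print the approximate clock difference, in seconds, between a remote node and this one.
#   $1 = remote host name
clock_skew() {
  remote="$(ssh -o BatchMode=yes "$1" date +%s)" || return 1
  local_now="$(date +%s)"
  echo $((remote - local_now))
}
```

Running `clock_skew hadoop2` and `clock_skew hadoop3` on hadoop1 should print values close to 0 once all three nodes are synchronized.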
