Building a Hadoop 3.1.1 Fully Distributed Cluster in VirtualBox (Part 2): Installing and Configuring Hadoop on CentOS 7

I. Installing and Configuring the Java Runtime

1. Install OpenJDK 1.8

#install the JDK
#yum install java-1.8.0-openjdk-devel.x86_64

Note: a quick tour of the yum package manager:
(1) Search for a package (method 1)
#yum list | grep telnet-server
(2) Search for a package (method 2)
#yum search MySQL
(3) Install MySQL (called MariaDB in CentOS 7)
#yum install mariadb-server.x86_64
(4) Install telnet
#yum install telnet.*
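
Before setting JAVA_HOME in the next step, it helps to confirm where yum actually put the JDK (the version suffix on your system may differ from the one shown below):

#readlink -f $(which java)        #follows the /etc/alternatives symlink chain to the real JDK path
#ls /usr/lib/jvm/                 #lists the installed JDKs; the matching directory becomes JAVA_HOME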

2. Configure the OpenJDK 1.8 runtime environment

#vi ~/.bashrc (or ~/.bash_profile, or /etc/profile)

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin 
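
Reload the file and confirm the variables took effect:

#source ~/.bashrc
#echo $JAVA_HOME
#java -version        #should report openjdk version "1.8.0_191"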

II. Installing and Configuring the Hadoop Runtime

1. Install Hadoop

Download the Hadoop 3.1.1 binary release (see the hadoop-3.1.1.tar.gz download page; for curl usage, see a Linux curl command reference) and extract it into the /home/hadoop/hadoop-3.1.1 directory:

#curl -o /home/hadoop/hadoop-3.1.1.tar.gz http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
#tar -xvf /home/hadoop/hadoop-3.1.1.tar.gz -C /home/hadoop
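
A quick sanity check that the archive unpacked where expected:

#ls /home/hadoop/hadoop-3.1.1                     #should contain bin, etc, sbin, share, ...
#/home/hadoop/hadoop-3.1.1/bin/hadoop version     #should print Hadoop 3.1.1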

2. Configure the Hadoop environment variables (optional)

#vi /etc/profile (or /root/.bash_profile, or /root/.bashrc)
export HADOOP_INSTALL=/home/hadoop/hadoop-3.1.1
export PATH=${HADOOP_INSTALL}/bin:${HADOOP_INSTALL}/sbin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_INSTALL}
export HADOOP_COMMON_HOME=${HADOOP_INSTALL} 
export HADOOP_HDFS_HOME=${HADOOP_INSTALL}
export YARN_HOME=${HADOOP_INSTALL}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_INSTALL}/lib:${HADOOP_INSTALL}/lib/native"
#source /etc/profile
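
After sourcing the file, the hadoop command should resolve from any directory:

#which hadoop         #should print /home/hadoop/hadoop-3.1.1/bin/hadoop
#hadoop version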

III. Creating the Other VMs by Cloning the First VM in VirtualBox Manager

1. Cloning procedure: omitted.

2. Network configuration: after starting each VM, set its hostname (hn2, hn3) and IP address (hn2: 192.168.56.101, hn3: 192.168.56.102) according to the plan. The hostname is changed in the /etc/hostname file (details omitted; a hostnamectl sketch follows the list below). Then add the following entries to the local name-resolution file /etc/hosts:

192.168.56.100 hn1
192.168.56.101 hn2
192.168.56.102 hn3
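
The hostname itself can be set without editing /etc/hostname by hand; a minimal sketch using systemd's hostnamectl (run on each clone with its own name), followed by a resolution check from hn1:

#hostnamectl set-hostname hn2        #on the second VM; use hn3 on the third
#hostname                            #confirm the new name
#ping -c 1 hn2                       #from hn1, confirms the /etc/hosts entry resolves and the host is reachable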

 

3. Network plan:

hn1 (192.168.56.100): NameNode, Secondary NameNode, ResourceManager
hn2 (192.168.56.101), hn3 (192.168.56.102): DataNode, NodeManager

IV. Configuring the SSH Environment

Run the following three commands on hn1:

#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#chmod 0600 ~/.ssh/authorized_keys
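
A quick local test that the key pair works (the first connection will still ask to confirm the host key, but there should be no password prompt):

#ssh localhost hostname        #should print hn1 without asking for a password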

From hn1, log in to hn2 and then hn3:

[root@hn1]#ssh [email protected]          #log in to hn2

    Run the same three SSH setup commands as on hn1 (see above; omitted).

    Note: after entering the password you are on hn2; run hostname to confirm the current host.

[root@hn2]#exit                              #log out of the remote session on hn2

[root@hn1]#ssh [email protected]          #log in to hn3

    Run the same three SSH setup commands as on hn1 (see above; omitted).

[root@hn3]#exit                              #log out of the remote session on hn3

[root@hn1]#scp [email protected]:/root/.ssh/id_rsa.pub /root/id_rsa_hn2.pub
[root@hn1]#scp [email protected]:/root/.ssh/id_rsa.pub /root/id_rsa_hn3.pub
[root@hn1]#cat /root/id_rsa_hn2.pub >> /root/.ssh/authorized_keys
[root@hn1]#cat /root/id_rsa_hn3.pub >> /root/.ssh/authorized_keys 
[root@hn1]#scp /root/.ssh/authorized_keys [email protected]:/root/.ssh/authorized_keys
[root@hn1]#scp /root/.ssh/authorized_keys [email protected]:/root/.ssh/authorized_keys

All three machines can now log in to one another over SSH. A quick check from hn1 that every hop is passwordless (first connections may still ask to confirm a host key):
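
[root@hn1]#for h in hn1 hn2 hn3; do ssh root@$h hostname; done        #should print hn1, hn2, hn3 with no password prompts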

V. Hadoop Configuration Files

Reference: ClusterSetup

The main configuration files live in the ${HADOOP_HOME}/etc/hadoop directory: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers.

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hn1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hn1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hn1:50090</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hn1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hn1:19888</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hn1</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///home/hadoop/yarn/nm</value>
    </property>
</configuration>

Contents of workers (in Hadoop 2.x this was the slaves file):

hn2
hn3

VI. Hadoop Runtime Environment Configuration

Edit the following scripts in the ${HADOOP_HOME}/sbin directory: start-dfs.sh, stop-dfs.sh, start-yarn.sh, and stop-yarn.sh.

1. Add the following environment variables at the top of start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

2. Add the following environment variables at the top of start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

VII. Distributing the Hadoop Configuration to the Other VMs

#scp ${HADOOP_HOME}/etc/hadoop/* [email protected]:/home/hadoop/hadoop-3.1.1/etc/hadoop/
#scp ${HADOOP_HOME}/etc/hadoop/* [email protected]:/home/hadoop/hadoop-3.1.1/etc/hadoop/
#scp ${HADOOP_HOME}/sbin/* [email protected]:/home/hadoop/hadoop-3.1.1/sbin
#scp ${HADOOP_HOME}/sbin/* [email protected]:/home/hadoop/hadoop-3.1.1/sbin
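
Equivalently, both copies can be scripted in one loop over the worker hosts (same paths as above; hostnames work because of the /etc/hosts entries and SSH keys set up earlier):

#for h in hn2 hn3; do scp ${HADOOP_HOME}/etc/hadoop/* root@$h:/home/hadoop/hadoop-3.1.1/etc/hadoop/; scp ${HADOOP_HOME}/sbin/* root@$h:/home/hadoop/hadoop-3.1.1/sbin/; done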

VIII. Starting Hadoop and File Management

1. File management (apart from the format command, these hdfs dfs commands need the cluster from step 2 to be running)

#${HADOOP_HOME}/bin/hdfs namenode -format            #must be run once before starting Hadoop for the first time
#${HADOOP_HOME}/bin/hdfs dfs -chmod ugo+rwx /        #grant read/write/execute on the root directory to all users; tighten to fit your actual needs
#${HADOOP_HOME}/bin/hdfs dfs -mkdir /user            #create the /user directory under the root
#${HADOOP_HOME}/bin/hdfs dfs -rm -f -r /user         #delete the /user directory and its contents
#${HADOOP_HOME}/bin/hdfs dfs -rm -f -r /user/input   #delete the /user/input directory and its contents
#${HADOOP_HOME}/bin/hdfs dfs -mkdir -p /user/input   #recreate /user/input (the -put below needs it to exist)
#${HADOOP_HOME}/bin/hdfs dfs -put ${HADOOP_HOME}/etc/hadoop/*.xml /user/input/   #upload files
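
With the XML files uploaded to /user/input and the cluster running (step 2 below), the bundled grep example makes a convenient end-to-end smoke test; the jar path below is the standard location inside the 3.1.1 distribution:

#${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar grep /user/input /user/output 'dfs[a-z.]+'
#${HADOOP_HOME}/bin/hdfs dfs -cat /user/output/part-r-00000        #inspect the matched lines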

2. Startup

${HADOOP_HOME}/sbin/start-all.sh
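
After startup, running jps on each node should show the daemons from the plan in section III:

#jps        #on hn1: NameNode, SecondaryNameNode, ResourceManager (plus Jps)
#jps        #on hn2 and hn3: DataNode, NodeManager (plus Jps)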

IX. Web Management Pages

On the Windows 7 host, open Firefox (IE reports JavaScript errors). YARN ResourceManager UI: http://192.168.56.100:8088

HDFS NameNode UI: http://192.168.56.100:50070/

 

 
