Steps to Set Up a Fully Distributed Hadoop Cluster


The following applies to versions before CentOS 7, such as CentOS 6.4 and CentOS 6.5.

1. Prepare multiple machines:

   Three machines are used in this walkthrough:

   IP                 Hostname
   192.168.140.131    hadoop-chenxiang01
   192.168.140.132    hadoop-chenxiang02
   192.168.140.133    hadoop-chenxiang03

 

2. Change the hostname (on all 3 machines):

   hostname <new-hostname>        (takes effect immediately, lost on reboot)

   vi /etc/sysconfig/network      (permanent)
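
   On CentOS 6, the permanent hostname lives in /etc/sysconfig/network; for the first machine the file would look roughly like this:

      NETWORKING=yes
      HOSTNAME=hadoop-chenxiang01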

3. Map hostnames to IP addresses (on all 3 machines):

   vi /etc/hosts

      192.168.140.131    hadoop-chenxiang01    localhost01
      192.168.140.132    hadoop-chenxiang02    localhost02
      192.168.140.133    hadoop-chenxiang03    localhost03

   The hostnames are only for convenience; you can use the raw IP addresses instead.
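
   A quick check that the mapping works, from any of the three machines:

      ping -c 1 hadoop-chenxiang02
      ping -c 1 hadoop-chenxiang03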

4. Create the directory /opt/app under /opt to hold the cluster software (on all 3 machines):

      mkdir /opt/app

   Grant ownership of the directory to the working user:

      chown -R chenxiang:chenxiang /opt/app
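
   To confirm the ownership change took effect:

      ls -ld /opt/app      # owner and group should now be chenxiang chenxiang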

5. Install the JDK (this walkthrough uses JDK 7; install on all 3 machines)

   1) Most Linux distributions ship with an open-source JDK; uninstall it first:

      * List the installed Java packages:  rpm -qa | grep java

      * Remove each package from the previous step:  rpm -e --nodeps <package-name>

      (A loop combining these two commands is sketched at the end of this step.)

   2) Extract the JDK to /opt/modules, then configure the environment variables:  vi /etc/profile

      * Append the following at the bottom of the file:

           export JAVA_HOME=/opt/modules/jdk    (the JDK install path)
           export PATH=$PATH:$JAVA_HOME/bin

      * Apply the changes:  source /etc/profile

      * Verify the installation:  java -version
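
   A minimal sketch that combines the two removal commands above into one loop, assuming every package matched by "java" should go:

      # remove every bundled OpenJDK package reported by rpm -qa
      for pkg in $(rpm -qa | grep java); do
          rpm -e --nodeps "$pkg"
      done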

6. Install Hadoop

   Extract hadoop-2.5.0.tar.gz to /opt/app:

      tar -xzvf hadoop-2.5.0.tar.gz -C /opt/app

7. Plan the machines and services

                   192.168.140.131      192.168.140.132      192.168.140.133
   HDFS            nameNode
                   dataNode             dataNode             dataNode
                                                             secondaryNameNode
   YARN                                 resourceManager
                   nodeManager          nodeManager          nodeManager
   MapReduce       jobHistoryServer

8. Configuration

   1) HDFS

      * hadoop-env.sh

         Set the JDK:  export JAVA_HOME=/opt/modules/jdk

      * core-site.xml

         (1) Point the default filesystem at the host running the NameNode:

                <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://hadoop-chenxiang01:8020</value>
                </property>

         (2) Change Hadoop's default temporary-file directory (first create data/tmp under the Hadoop install directory):

                <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/opt/app/hadoop-2.5.0/data/tmp</value>
                </property>

         (3) Enable the trash mechanism (retention time in minutes):

                <property>
                    <name>fs.trash.interval</name>
                    <value>420</value>
                </property>
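
         Assembled, core-site.xml would look roughly like this; the properties must sit inside the file's top-level <configuration> element, and the same applies to hdfs-site.xml, yarn-site.xml, and mapred-site.xml below:

                <?xml version="1.0"?>
                <configuration>
                    <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://hadoop-chenxiang01:8020</value>
                    </property>
                    <property>
                        <name>hadoop.tmp.dir</name>
                        <value>/opt/app/hadoop-2.5.0/data/tmp</value>
                    </property>
                    <property>
                        <name>fs.trash.interval</name>
                        <value>420</value>
                    </property>
                </configuration>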

                

      * hdfs-site.xml

         (1) Specify the host running the SecondaryNameNode:

                <property>
                    <name>dfs.namenode.secondary.http-address</name>
                    <value>hadoop-chenxiang03:50090</value>
                </property>

      * slaves

         (1) List the hosts running DataNodes (on all three machines):

                hadoop-chenxiang01
                hadoop-chenxiang02
                hadoop-chenxiang03

   2) YARN

      * yarn-env.sh

         Set the JDK:  export JAVA_HOME=/opt/modules/jdk

      * yarn-site.xml

         (1) Specify the host running the ResourceManager:

                <property>
                    <name>yarn.resourcemanager.hostname</name>
                    <value>hadoop-chenxiang02</value>
                </property>

         (2) Enable the shuffle auxiliary service that MapReduce needs:

                <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
                </property>

         (3) Memory available to each NodeManager (MB):

                <property>
                    <name>yarn.nodemanager.resource.memory-mb</name>
                    <value>4096</value>
                </property>

         (4) CPU cores available to each NodeManager:

                <property>
                    <name>yarn.nodemanager.resource.cpu-vcores</name>
                    <value>4</value>
                </property>

         (5) Enable log aggregation:

                <property>
                    <name>yarn.log-aggregation-enable</name>
                    <value>true</value>
                </property>

         (6) How long aggregated logs are retained, in seconds (64088 s is roughly 17.8 hours):

                <property>
                    <name>yarn.log-aggregation.retain-seconds</name>
                    <value>64088</value>
                </property>
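
         Assembled, yarn-site.xml would look roughly like this:

                <?xml version="1.0"?>
                <configuration>
                    <property>
                        <name>yarn.resourcemanager.hostname</name>
                        <value>hadoop-chenxiang02</value>
                    </property>
                    <property>
                        <name>yarn.nodemanager.aux-services</name>
                        <value>mapreduce_shuffle</value>
                    </property>
                    <property>
                        <name>yarn.nodemanager.resource.memory-mb</name>
                        <value>4096</value>
                    </property>
                    <property>
                        <name>yarn.nodemanager.resource.cpu-vcores</name>
                        <value>4</value>
                    </property>
                    <property>
                        <name>yarn.log-aggregation-enable</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>yarn.log-aggregation.retain-seconds</name>
                        <value>64088</value>
                    </property>
                </configuration>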

                   

      * slaves

         (1) List the hosts running NodeManagers (on all three machines). In Hadoop 2.x this is the same etc/hadoop/slaves file as in the HDFS section; one list serves both the DataNodes and the NodeManagers:

                hadoop-chenxiang01
                hadoop-chenxiang02
                hadoop-chenxiang03

   3) MapReduce

      * mapred-env.sh

         Set the JDK:  export JAVA_HOME=/opt/modules/jdk

      * mapred-site.xml

         (1) Run MapReduce on YARN:

                <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
                </property>

         (2) Host and port of the JobHistoryServer:

                <property>
                    <name>mapreduce.jobhistory.address</name>
                    <value>hadoop-chenxiang01:10020</value>
                </property>

         (3) Web UI address of the JobHistoryServer:

                <property>
                    <name>mapreduce.jobhistory.webapp.address</name>
                    <value>hadoop-chenxiang01:19888</value>
                </property>
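
         Hadoop 2.5.0 ships only a mapred-site.xml.template, so copy it first; assembled, the file would look roughly like this:

                cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

                <?xml version="1.0"?>
                <configuration>
                    <property>
                        <name>mapreduce.framework.name</name>
                        <value>yarn</value>
                    </property>
                    <property>
                        <name>mapreduce.jobhistory.address</name>
                        <value>hadoop-chenxiang01:10020</value>
                    </property>
                    <property>
                        <name>mapreduce.jobhistory.webapp.address</name>
                        <value>hadoop-chenxiang01:19888</value>
                    </property>
                </configuration>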

                   

9. Distribute the Hadoop installation to every node

   1) Set up passwordless SSH login

      Given the machine and service plan above, .131 and .132 need passwordless login to the other machines (start-dfs.sh is run from the NameNode host and start-yarn.sh from the ResourceManager host).

      (1) Generate the key pair

         * In the .ssh folder under the user's home directory, generate the public and private keys:

              [root@localhost .ssh]# ssh-keygen -t rsa      (press Enter 4 times; do this on 131 and 132)

         * Copy the generated key to every machine you need to connect to:

              [root@localhost .ssh]# ssh-copy-id <hostname>      (all 3 remote hostnames, including the machine itself)

         Both steps above are performed on 131 and 132.

         Test that you can reach each remote host:  ssh <hostname>
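
         A minimal sketch of the key-distribution step, run once on 131 and once on 132 (hostnames as configured in step 3):

              # generate the key pair, then push the public key to all three nodes
              ssh-keygen -t rsa
              for host in hadoop-chenxiang01 hadoop-chenxiang02 hadoop-chenxiang03; do
                  ssh-copy-id "$host"
              done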

   2) Distribute Hadoop to the other nodes

      From the Hadoop install directory on 131, copy the whole directory to the corresponding directory on the other two machines:  scp -r ./hadoop-2.5.0/ <user>@<hostname>:/opt/app

      Note: before distributing, delete share/doc (and everything under it) inside the Hadoop directory. It contains nothing but documentation, and removing it makes the copy much faster.
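
      Concretely, from /opt/app on 131 (the chenxiang user and hostnames are the ones set up earlier):

           rm -rf ./hadoop-2.5.0/share/doc
           scp -r ./hadoop-2.5.0/ chenxiang@hadoop-chenxiang02:/opt/app
           scp -r ./hadoop-2.5.0/ chenxiang@hadoop-chenxiang03:/opt/app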

At this point the whole cluster environment is set up, and you can start it and test.

[root@localhost hadoop-2.5.0]# bin/hdfs namenode -format      (format the NameNode; run on 131, before the first start only)

[root@localhost hadoop-2.5.0]# sbin/start-dfs.sh      (run on 131; starts the NameNode, DataNodes, and SecondaryNameNode)

[root@localhost hadoop-2.5.0]# sbin/start-yarn.sh      (run on 132, the ResourceManager host; starts the ResourceManager and NodeManagers)

[root@localhost hadoop-2.5.0]# jps      (list the running Java processes)
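
Note that the JobHistoryServer planned in step 7 is not started by start-dfs.sh or start-yarn.sh; in Hadoop 2.x it is started separately on its host, here hadoop-chenxiang01:

[root@localhost hadoop-2.5.0]# sbin/mr-jobhistory-daemon.sh start historyserver

If everything came up, jps on each node should show roughly the processes planned in step 7:

   hadoop-chenxiang01: NameNode, DataNode, NodeManager, JobHistoryServer
   hadoop-chenxiang02: ResourceManager, DataNode, NodeManager
   hadoop-chenxiang03: SecondaryNameNode, DataNode, NodeManager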
