Hadoop Fully Distributed Cluster Setup Guide

This guide uses CentOS 7 as an example.

Configure one master machine first; the other two slave nodes (node1 and node2) can simply be cloned from the master later.

Create the hadoop user

New users must be created (or deleted) as the root user:

[root@master hadoop]# useradd test

[root@master hadoop]# passwd test     // set a password for the new user

Setting up the cluster

1. Prepare the required packages: hadoop-….tar.gz and jdk-….tar.gz

 

2. Extract them into /home/hadoop/bigdata (i.e. ~/bigdata)
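A minimal sketch of this step, assuming the archives are hadoop-2.7.7.tar.gz and jdk-8u181-linux-x64.tar.gz (match your actual file names) and that the extracted directories are renamed to the short names used by the /etc/profile entries below:

mkdir -p ~/bigdata
tar -zxvf hadoop-2.7.7.tar.gz -C ~/bigdata/
tar -zxvf jdk-8u181-linux-x64.tar.gz -C ~/bigdata/
mv ~/bigdata/hadoop-2.7.7 ~/bigdata/hadoop    # becomes HADOOP_HOME=/home/hadoop/bigdata/hadoop
mv ~/bigdata/jdk1.8.0_181 ~/bigdata/jdk       # becomes JAVA_HOME=/home/hadoop/bigdata/jdk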

 

3. Edit the environment variables (vim /etc/profile), then the Hadoop local environment settings and the Hadoop configuration files

/etc/profile:

#JAVA       

export JAVA_HOME=/home/hadoop/bigdata/jdk

export PATH=${JAVA_HOME}/bin:$PATH

export JRE_HOME=${JAVA_HOME}/jre

#HADOOP

export HADOOP_HOME=/home/hadoop/bigdata/hadoop

export PATH=${HADOOP_HOME}/bin:$PATH

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"

 

 

hadoop-env.sh:

sudo vim bigdata/hadoop/etc/hadoop/hadoop-env.sh   — in this file, set JAVA_HOME explicitly (replace the default export JAVA_HOME=${JAVA_HOME} line, which does not resolve reliably when the daemons are started over SSH):

export JAVA_HOME=/home/hadoop/bigdata/jdk

 

 

Modify core-site.xml (for a fully distributed cluster, fs.defaultFS must point to the master host, not localhost):

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/bigdata/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/bigdata/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/bigdata/hadoop/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
    </property>
</configuration>
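Once the environment variables are in effect (step 4), these values can be sanity-checked with hdfs getconf; the expected output below assumes the settings above:

[hadoop@master hadoop]$ hdfs getconf -confKey dfs.replication
1
[hadoop@master hadoop]$ hdfs getconf -confKey dfs.blocksize
268435456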

 

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
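Hadoop 2.x ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template before editing:

[hadoop@master hadoop]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml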

 

4. Reload the environment variables so they take effect: source /etc/profile

Verify:

[hadoop@master ~]$ java -version

java version "1.8.0_181"

Java(TM) SE Runtime Environment (build 1.8.0_181-b13)

Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

[hadoop@master ~]$ hadoop version

Hadoop 2.7.7

Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac

Compiled by stevel on 2018-07-18T22:47Z

Compiled with protoc 2.5.0

From source with checksum 792e15d20b12c74bd6f19a1fb886490

This command was run using /home/hadoop/bigdata/hadoop/share/hadoop/common/hadoop-common-2.7.7.jar

 

 

 

5. Set up passwordless SSH login, so that you can switch freely between machines with commands like ssh master / ssh node1.

SSH is not installed by default on some systems, so before enabling it, check whether the openssh and rsync packages are installed. rsync is a remote data synchronization tool that can quickly sync files between multiple hosts over a LAN/WAN:

[hadoop@master ~]$ rpm -qa | grep openssh

openssh-clients-7.4p1-16.el7.x86_64

openssh-server-7.4p1-16.el7.x86_64

openssh-7.4p1-16.el7.x86_64

[hadoop@master ~]$ rpm -qa | grep rsync

rsync-3.1.2-4.el7.x86_64

If any of them are missing: sudo yum install -y openssh-clients openssh-server rsync

SSH installation and configuration:

[hadoop@master ~]$ sudo vim /etc/ssh/sshd_config

Make sure the public-key login settings are present and uncommented in this file (add them if they are missing); a typical set is shown below.
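The original lines are not reproduced in this document; the options typically enabled for key-based login on CentOS 7 are the following (an assumption — check them against your own sshd_config):

PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys
#RSAAuthentication yes    (often listed in older guides; deprecated in OpenSSH 7.4, so it can be left out)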

 

[hadoop@master ~]$ hostname

master


[hadoop@master ~]$ ssh-keygen -t rsa -P ''

// Press Enter at any prompt; -P '' (with a space before the empty quotes) means the passphrase is empty

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Created directory '/home/hadoop/.ssh'.

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

SHA256:gQ6c9NzxdQ8Amc36e8yo0UT/d8xj6ZSJEef2+g3pqg8 hadoop@master

The key's randomart image is:

+---[RSA 2048]----+

|    .   . .*o.o  |

|   o + o oo.o. o |

|    + + o .... ..|

|     o   ... .+  |

|      . S  ....o |

|           o. +==|

|          E .*oBB|

|           oo.Bo=|

|          o+ooooo|

+----[SHA256]-----+

 

The public-key file path specified in sshd_config: authorized_keys

 

The two files below are a key pair (private key and public key):

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

 

Note: this would be WRONG (it appends the public key to the private key file): cat ~/.ssh/id_rsa.pub >> ~/.ssh/id_rsa

 

The authorized_keys file does not exist yet, but the redirection below creates it automatically.

Append the generated public key to the file specified in sshd_config, ~/.ssh/authorized_keys:

[hadoop@master ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[hadoop@master ~]$ chmod 600 ~/.ssh/authorized_keys

[hadoop@master ~]$ sudo systemctl restart sshd.service  // CentOS 7 uses systemctl instead of the older service commands

[hadoop@master ~]$ ssh master

The authenticity of host 'master (fe80::d514:6ab7:d788:8294%ens33)' can't be established.

ECDSA key fingerprint is SHA256:E30LEGLP/hSoJ5Er+gjh5INzhzIf5OCCTmZad5+7yaU.

ECDSA key fingerprint is MD5:13:45:bc:c5:63:bd:87:55:b2:c3:72:f4:85:5c:91:18.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'master,fe80::d514:6ab7:d788:8294%ens33' (ECDSA) to the list of known hosts.

Last login: Fri Mar 29 19:05:41 2019 from 192.168.40.1

[hadoop@master ~]$
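If ssh master still prompts for a password, also make sure ~/.ssh itself is mode 700, since sshd ignores keys in directories with looser permissions. In this guide the node machines are cloned from the master later, so the key pair and authorized_keys are carried over automatically; if you build the nodes separately instead, copy the public key to each of them, for example (node1/node2 assume the /etc/hosts mapping added in step 6):

[hadoop@master ~]$ chmod 700 ~/.ssh
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node1
[hadoop@master ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node2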

 

6. Network configuration

① Change the IP address to a static one:

[hadoop@master ~]$ ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 00:0c:29:f7:f4:11 brd ff:ff:ff:ff:ff:ff

    inet 192.168.40.160/24 brd 192.168.40.255 scope global noprefixroute dynamic ens33

       valid_lft 1218sec preferred_lft 1218sec

    inet6 fe80::d514:6ab7:d788:8294/64 scope link noprefixroute

       valid_lft forever preferred_lft forever

The NIC in use is ens33. Find the gateway address through your virtualization software's network settings (original screenshot omitted).

[hadoop@master bigdata]$ sudo cp /etc/sysconfig/network-scripts/ifcfg-ens33 /etc/sysconfig/network-scripts/ifcfg-ens33-beifen

[hadoop@master bigdata]$ sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE=Ethernet #Ethernet interface

BOOTPROTO=static #obtain the IP address statically

IPADDR=192.168.40.160 #IP address of this machine

NETMASK=255.255.255.0 #subnet mask

NETWORK=192.168.40.0 #network address (the "subnet address" in the VM software)

NAME=ens33 #connection name (matches the device below)

DEVICE=ens33 #NIC device name

ONBOOT=yes #bring the interface up at boot

DNS1=114.114.114.114 #DNS server

GATEWAY=192.168.40.2 #gateway

[hadoop@master bigdata]$ sudo systemctl restart network

[hadoop@master bigdata]$ ping www.baidu.com

PING www.a.shifen.com (111.13.100.92) 56(84) bytes of data.

64 bytes from 111.13.100.92 (111.13.100.92): icmp_seq=1 ttl=128 time=52.4 ms

64 bytes from 111.13.100.92 (111.13.100.92): icmp_seq=2 ttl=128 time=50.9 ms

 

7. Time synchronization

① Make the time zone consistent on all machines:

[hadoop@master bigdata]$ date

Sat Mar 30 09:35:54 CST 2019

[hadoop@master bigdata]$ tzselect

Please identify a location so that time zone rules can be set correctly.

Please select a continent or ocean.

 1) Africa

 2) Americas

 3) Antarctica

 4) Arctic Ocean

 5) Asia

 6) Atlantic Ocean

 7) Australia

 8) Europe

 9) Indian Ocean

10) Pacific Ocean

11) none - I want to specify the time zone using the Posix TZ format.

#? 5

Please select a country.

 1) Afghanistan        18) Israel            35) Palestine
 2) Armenia            19) Japan             36) Philippines
 3) Azerbaijan         20) Jordan            37) Qatar
 4) Bahrain            21) Kazakhstan        38) Russia
 5) Bangladesh         22) Korea (North)     39) Saudi Arabia
 6) Bhutan             23) Korea (South)     40) Singapore
 7) Brunei             24) Kuwait            41) Sri Lanka
 8) Cambodia           25) Kyrgyzstan        42) Syria
 9) China              26) Laos              43) Taiwan
10) Cyprus             27) Lebanon           44) Tajikistan
11) East Timor         28) Macau             45) Thailand
12) Georgia            29) Malaysia          46) Turkmenistan
13) Hong Kong          30) Mongolia          47) United Arab Emirates
14) India              31) Myanmar (Burma)   48) Uzbekistan
15) Indonesia          32) Nepal             49) Vietnam
16) Iran               33) Oman              50) Yemen
17) Iraq               34) Pakistan

#? 9

Please select one of the following time zone regions.

1) Beijing Time

2) Xinjiang Time

#? 1

 

The following information has been given:

 

       China

       Beijing Time

 

Therefore TZ='Asia/Shanghai' will be used.

Local time is now: Sat Mar 30 09:37:10 CST 2019.

Universal Time is now:  Sat Mar 30 01:37:10 UTC 2019.

Is the above information OK?

1) Yes

2) No

#? 1

 

You can make this change permanent for yourself by appending the line

       TZ='Asia/Shanghai'; export TZ

to the file '.profile' in your home directory; then log out and log in again.

 

Here is that TZ value again, this time on standard output so that you

can use the /usr/bin/tzselect command in shell scripts:

Asia/Shanghai

 

[hadoop@master bigdata]$ hwclock -w

hwclock: Sorry, only the superuser can change the Hardware Clock.

[hadoop@master bigdata]$ sudo hwclock -w

 

② Configure the master as the time-sync (NTP) server

[hadoop@master bigdata]$ rpm -qa | grep ntp

fontpackages-filesystem-1.44-8.el7.noarch

ntp-4.2.6p5-28.el7.centos.x86_64

ntpdate-4.2.6p5-28.el7.centos.x86_64

[hadoop@master bigdata]$ sudo vim /etc/ntp.conf   // append the following two lines

server 127.127.1.0

fudge 127.127.1.0 stratum 10

[hadoop@master bigdata]$ sudo systemctl stop firewalld

==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===

Authentication is required to manage system services or units.

Authenticating as: root

Password:

==== AUTHENTICATION COMPLETE ===

[hadoop@master bigdata]$ sudo systemctl restart ntpd

==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===

Authentication is required to manage system services or units.

Authenticating as: root

Password:

==== AUTHENTICATION COMPLETE ===

[hadoop@master bigdata]$ watch ntpq -p

(screenshot of the ntpq -p output omitted)

 

③ Configure the slave (node) machines

Now you can clone the master to create the two node machines.

The clones keep the same account and password as the original:

hadoop / 123456

On each clone, the hostname (master → node1 / node2) and the IP address need to be changed (original screenshot omitted).

Changes on each cloned machine (see the sketch after this list):

Change the hostname: sudo hostnamectl set-hostname node1

Change the IP: sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33 (use ip addr to check which NIC to edit)

…and the related configuration files
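Under the address plan used in the /etc/hosts mapping below, the only per-node difference in ifcfg-ens33 is the IP address (a sketch; adapt to your own network):

# on node1
IPADDR=192.168.40.40
# on node2
IPADDR=192.168.40.41
# keep NETMASK, GATEWAY and DNS1 the same as on the master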

The hostname mapping must be updated on all machines:

sudo vim /etc/hosts

192.168.40.160 master

192.168.40.40 node1

192.168.40.41 node2

After the changes, ping each hostname from every machine and ping www.baidu.com to confirm connectivity.

 

Run the following on all machines (in Xshell you can send keyboard input to all sessions at once):

[hadoop@master ~]$ sudo vim /etc/crontab

Append a line at the end so that the time is synced automatically on a schedule (original screenshot omitted; see the sketch below).
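The exact line was only visible in the omitted screenshot; a commonly used form for /etc/crontab (which, unlike a per-user crontab, has a user field before the command) is the following — the 10-minute interval and the ntpdate path are assumptions:

*/10 * * * * hadoop /usr/sbin/ntpdate master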

 

On all machines:

[hadoop@master hadoop]$ vim etc/hadoop/slaves

node1

node2

 

 

Make sure the firewall is stopped and disabled on every machine:

[hadoop@master hadoop]$ sudo systemctl stop firewalld.service

[hadoop@master hadoop]$ sudo systemctl disable firewalld.service

Format HDFS and start the cluster on the master only:

[hadoop@master hadoop]$ hdfs namenode -format

[hadoop@master hadoop]$ sbin/start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

19/03/30 22:00:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [master]

master: starting namenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-namenode-master.out

node2: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-node2.out

node1: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-node1.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out

19/03/30 22:01:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-resourcemanager-master.out

node2: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-node2.out

node1: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-node1.out

[hadoop@master hadoop]$ jps

7984 SecondaryNameNode

7687 NameNode

8135 ResourceManager

8392 Jps
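On node1 and node2, jps should show the DataNode and NodeManager processes started above (the PIDs here are only illustrative):

[hadoop@node1 ~]$ jps
3012 DataNode
3145 NodeManager
3289 Jps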

The logs directory only appears after start-all.sh has been run.

[hadoop@master hadoop]$ hdfs dfsadmin -report

19/03/30 22:02:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Configured Capacity: 36477861888 (33.97 GB)

Present Capacity: 30252290048 (28.17 GB)

DFS Remaining: 30252281856 (28.17 GB)

DFS Used: 8192 (8 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

Missing blocks (with replication factor 1): 0

 

-------------------------------------------------

Live datanodes (2):

 

Name: 192.168.40.40:50010 (node1)

Hostname: node1

Decommission Status : Normal

Configured Capacity: 18238930944 (16.99 GB)

DFS Used: 4096 (4 KB)

Non DFS Used: 3112660992 (2.90 GB)

DFS Remaining: 15126265856 (14.09 GB)

DFS Used%: 0.00%

DFS Remaining%: 82.93%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sat Mar 30 22:02:14 CST 2019

 

Name: 192.168.40.41:50010 (node2)

Hostname: node2

Decommission Status : Normal

Configured Capacity: 18238930944 (16.99 GB)

DFS Used: 4096 (4 KB)

Non DFS Used: 3112910848 (2.90 GB)

DFS Remaining: 15126016000 (14.09 GB)

DFS Used%: 0.00%

DFS Remaining%: 82.93%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sat Mar 30 22:02:16 CST 2019
