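A fully distributed Hadoop cluster uses a master/slave architecture.
Because Hadoop is written in Java, it needs a Java environment; as developers, we install the JDK.
JDK installation tutorial: http://t.csdn.cn/6qJKg
Download the Hadoop installation package
Hadoop website: http://hadoop.apache.org/
Hadoop download URL for this version: http://archive.apache.org/dist/hadoop/core/hadoop-3.3.1/hadoop-3.3.1.tar.gz
Install the JDK first, then clone the virtual machine to create the other nodes.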
Node 192.168.184.136
[[email protected] ~]# hostnamectl set-hostname master
[[email protected] ~]# reboot
[root@master ~]#
[root@master ~]# vi /etc/hosts
192.168.184.136 master hadoop.master.com
192.168.184.137 slave01 hadoop.slave01.com
192.168.184.138 slave02 hadoop.slave02.com
192.168.184.139 slave03 hadoop.slave03.com
Node 192.168.184.137
[[email protected] ~]# hostnamectl set-hostname slave01
[[email protected] ~]# reboot
[root@slave01 ~]#
[root@slave01 ~]# vi /etc/hosts
192.168.184.136 master hadoop.master.com
192.168.184.137 slave01 hadoop.slave01.com
192.168.184.138 slave02 hadoop.slave02.com
192.168.184.139 slave03 hadoop.slave03.com
Node 192.168.184.138
[[email protected] ~]# hostnamectl set-hostname slave02
[[email protected] ~]# reboot
[root@slave02 ~]#
[root@slave02 ~]# vi /etc/hosts
192.168.184.136 master hadoop.master.com
192.168.184.137 slave01 hadoop.slave01.com
192.168.184.138 slave02 hadoop.slave02.com
192.168.184.139 slave03 hadoop.slave03.com
Node 192.168.184.139
[[email protected] ~]# hostnamectl set-hostname slave03
[[email protected] ~]# reboot
[root@slave03 ~]#
[root@slave03 ~]# vi /etc/hosts
192.168.184.136 master hadoop.master.com
192.168.184.137 slave01 hadoop.slave01.com
192.168.184.138 slave02 hadoop.slave02.com
192.168.184.139 slave03 hadoop.slave03.com
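Because the same four mappings go into /etc/hosts on every node, the edit can also be scripted instead of typed in vi. The following is only a minimal sketch: the function name add_cluster_hosts is hypothetical, and it takes the target file as an argument so that it can be tried on a scratch file first.

```shell
# Hypothetical helper: append the cluster's host mappings to a hosts file,
# skipping any line that is already present (safe to run more than once).
add_cluster_hosts() {
    local hosts_file="$1"
    local line
    while IFS= read -r line; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF -- "$line" "$hosts_file" 2>/dev/null || printf '%s\n' "$line" >> "$hosts_file"
    done <<'EOF'
192.168.184.136 master hadoop.master.com
192.168.184.137 slave01 hadoop.slave01.com
192.168.184.138 slave02 hadoop.slave02.com
192.168.184.139 slave03 hadoop.slave03.com
EOF
}

# Usage (as root on each node): add_cluster_hosts /etc/hosts
```

Running it a second time leaves the file unchanged, which is convenient when the script is replayed on a cloned machine.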
# The firewall must be stopped on all four virtual machines
systemctl stop firewalld
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:83y1SQv5wiYdQFuHuTvnWeSRGfhZJYLjs+za+mdkt+4 root@master
The key's randomart image is:
+---[RSA 2048]----+
| ..oo+ o|
| .ooo+ o.|
| .o. .. *|
| o... *.|
| S. o+.oo.|
| +oo=*o+o|
| .+o*==+ |
| ..+o.+ |
| o+oo oE |
+----[SHA256]-----+
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub master
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub slave01
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub slave02
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub slave03
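The four ssh-copy-id calls above differ only in the host name, so they can be written as a loop. This is a sketch, not part of the original steps: the function name copy_key_to_nodes and the DRY_RUN switch are conventions invented for this example.

```shell
# Node names as defined in /etc/hosts
NODES="master slave01 slave02 slave03"

# Copy the master's public key to every node.
# With DRY_RUN=1 the commands are only printed, not executed.
copy_key_to_nodes() {
    local node
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i /root/.ssh/id_rsa.pub $node"
        else
            ssh-copy-id -i /root/.ssh/id_rsa.pub "$node"
        fi
    done
}

# Usage on the master: copy_key_to_nodes
```

Each real call still asks for the root password of the target node once.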
Verify that passwordless SSH login works:
[root@master ~]# ssh slave01
Last login: Tue May 23 16:01:11 2023 from master
[root@slave01 ~]# exit
logout
Connection to slave01 closed.
[root@master ~]# ssh slave02
Last login: Tue May 23 15:57:48 2023 from slave01
[root@slave02 ~]# exit
logout
Connection to slave02 closed.
[root@master ~]# ssh slave03
Last login: Tue May 23 15:58:01 2023 from slave01
[root@slave03 ~]# exit
logout
Connection to slave03 closed.
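The manual logins above can be replaced by a non-interactive check. The sketch below relies on the ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password. The function name check_passwordless and the SSH_BIN override (useful for trying the function without a real cluster) are assumptions of this example.

```shell
# Try a no-op command on each node without allowing password prompts.
# Prints one status line per node; returns non-zero if any node failed.
check_passwordless() {
    local ssh_bin="${SSH_BIN:-ssh}"
    local node failed=0
    for node in "$@"; do
        if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage on the master: check_passwordless slave01 slave02 slave03
```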
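Hadoop only needs to be installed on the master node; once it is installed, the files can be copied to the other nodes with the scp command. The user running Hadoop must have read and write permission on the /data/hadoop directory.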
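The commands for preparing /data/hadoop are not shown in this article. One possible way, as a sketch only: create the sub-directories that the configuration files in this article point to (temp, dfs/name, dfs/data) and give the owner read and write access. The function name prepare_data_dirs is hypothetical.

```shell
# Create the Hadoop data directories under a base directory and make them
# readable and writable for their owner.
prepare_data_dirs() {
    local base="$1"
    mkdir -p "$base/temp" "$base/dfs/name" "$base/dfs/data"
    # X adds execute permission only to directories (and already-executable files)
    chmod -R u+rwX "$base"
}

# Usage (as root on each node): prepare_data_dirs /data/hadoop
```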
[root@master ~]# cd /usr/local
The installation package hadoop-3.3.1.tar.gz is placed in the /usr/local directory.
[root@master local]# ls
hadoop-3.3.1.tar.gz jdk1.8.0_291 jdk-8u291-linux-x64.tar.gz
[root@master local]# tar -zxvf hadoop-3.3.1.tar.gz
[root@master ~]# vi /etc/profile
# Append the following lines at the end of the file
export HADOOP_HOME=/usr/local/hadoop-3.3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
[root@master ~]# source /etc/profile
[root@master ~]# hadoop version
# Output like the following means the installation succeeded
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /usr/local/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar
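If hadoop version reports "command not found" instead, the usual cause is that the PATH line in /etc/profile was not picked up. A small check, written as a sketch (the function names path_contains and check_hadoop_path are made up for this example):

```shell
# Return success if directory $1 is one of the entries in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report whether both Hadoop script directories are on PATH.
check_hadoop_path() {
    local dir rc=0
    for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
        if path_contains "$dir"; then
            echo "on PATH: $dir"
        else
            echo "missing from PATH: $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Usage after "source /etc/profile": check_hadoop_path
```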
[root@master ~]# cd /usr/local/hadoop-3.3.1/etc/hadoop/
[root@master hadoop]# ll
total 176
-rw-r--r--. 1 1000 1000 9213 Jun 15 2021 capacity-scheduler.xml
-rw-r--r--. 1 1000 1000 1335 Jun 15 2021 configuration.xsl
-rw-r--r--. 1 1000 1000 2567 Jun 15 2021 container-executor.cfg
-rw-r--r--. 1 1000 1000 774 Jun 15 2021 core-site.xml
-rw-r--r--. 1 1000 1000 3999 Jun 15 2021 hadoop-env.cmd
-rw-r--r--. 1 1000 1000 16654 Jun 15 2021 hadoop-env.sh
-rw-r--r--. 1 1000 1000 3321 Jun 15 2021 hadoop-metrics2.properties
-rw-r--r--. 1 1000 1000 11765 Jun 15 2021 hadoop-policy.xml
-rw-r--r--. 1 1000 1000 3414 Jun 15 2021 hadoop-user-functions.sh.example
-rw-r--r--. 1 1000 1000 683 Jun 15 2021 hdfs-rbf-site.xml
-rw-r--r--. 1 1000 1000 775 Jun 15 2021 hdfs-site.xml
-rw-r--r--. 1 1000 1000 1484 Jun 15 2021 httpfs-env.sh
-rw-r--r--. 1 1000 1000 1657 Jun 15 2021 httpfs-log4j.properties
-rw-r--r--. 1 1000 1000 620 Jun 15 2021 httpfs-site.xml
-rw-r--r--. 1 1000 1000 3518 Jun 15 2021 kms-acls.xml
-rw-r--r--. 1 1000 1000 1351 Jun 15 2021 kms-env.sh
-rw-r--r--. 1 1000 1000 1860 Jun 15 2021 kms-log4j.properties
-rw-r--r--. 1 1000 1000 682 Jun 15 2021 kms-site.xml
-rw-r--r--. 1 1000 1000 13700 Jun 15 2021 log4j.properties
-rw-r--r--. 1 1000 1000 951 Jun 15 2021 mapred-env.cmd
-rw-r--r--. 1 1000 1000 1764 Jun 15 2021 mapred-env.sh
-rw-r--r--. 1 1000 1000 4113 Jun 15 2021 mapred-queues.xml.template
-rw-r--r--. 1 1000 1000 758 Jun 15 2021 mapred-site.xml
drwxr-xr-x. 2 1000 1000 24 Jun 15 2021 shellprofile.d
-rw-r--r--. 1 1000 1000 2316 Jun 15 2021 ssl-client.xml.example
-rw-r--r--. 1 1000 1000 2697 Jun 15 2021 ssl-server.xml.example
-rw-r--r--. 1 1000 1000 2681 Jun 15 2021 user_ec_policies.xml.template
-rw-r--r--. 1 1000 1000 10 Jun 15 2021 workers
-rw-r--r--. 1 1000 1000 2250 Jun 15 2021 yarn-env.cmd
-rw-r--r--. 1 1000 1000 6329 Jun 15 2021 yarn-env.sh
-rw-r--r--. 1 1000 1000 2591 Jun 15 2021 yarnservice-log4j.properties
-rw-r--r--. 1 1000 1000 690 Jun 15 2021 yarn-site.xml
# Check this machine's hostname
[root@master hadoop]# hostname
master
[root@master hadoop]# vi core-site.xml
Add the following:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/temp</value>
    </property>
</configuration>
fs.defaultFS: the HDFS file system address used to communicate with the NameNode.
hadoop.tmp.dir: the directory where the Hadoop cluster stores temporary files while it is running.
[root@master hadoop]# vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_291
[root@master hadoop]# vi yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_291
[root@master hadoop]# vi hdfs-site.xml
Add the following:
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>192.168.184.136:50070</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
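dfs.namenode.name.dir: the directory where the NameNode stores its data.
dfs.datanode.data.dir: the directory where each DataNode stores its data.
dfs.replication: the number of replicas HDFS keeps.
dfs.namenode.secondary.http-address: the HTTP address of the SecondaryNameNode.
dfs.http.address: the address of the web management UI for accessing HDFS over HTTP.
[root@master hadoop]# vi mapred-site.xml
Add the following: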
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
mapreduce.framework.name: set to yarn, which means the MapReduce framework uses YARN for resource scheduling.
[root@master hadoop]# vi yarn-site.xml
Add the following:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.184.136:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.184.136:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.184.136:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.184.136:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.184.136:8088</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
[root@master hadoop]# vi workers
# Remove localhost
# List the worker hostnames of your cluster
slave01
slave02
slave03
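The workers file is just one hostname per line, so it can also be generated rather than edited by hand. A minimal sketch, with write_workers as a hypothetical helper name:

```shell
# Overwrite the given workers file with one hostname per line.
write_workers() {
    local file="$1"
    shift
    printf '%s\n' "$@" > "$file"
}

# Usage on the master:
# write_workers /usr/local/hadoop-3.3.1/etc/hadoop/workers slave01 slave02 slave03
```

Because the file is overwritten, the default localhost entry disappears in the same step.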
[root@master ~]# cd /usr/local/hadoop-3.3.1/sbin/
[root@master sbin]# vi start-dfs.sh
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@master sbin]# vi stop-dfs.sh
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
[root@master sbin]# vi start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
[root@master sbin]# vi stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
The Hadoop installation path and configuration must be identical on every node. On the master node, distribute the installation with the scp command:
[root@master ~]# scp -r /usr/local/hadoop-3.3.1 root@slave01:/usr/local
[root@master ~]# scp -r /usr/local/hadoop-3.3.1 root@slave02:/usr/local
[root@master ~]# scp -r /usr/local/hadoop-3.3.1 root@slave03:/usr/local
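With more workers, typing one scp per node gets tedious. The sketch below reads the hostnames from the workers file instead. The function name distribute_hadoop and the DRY_RUN switch are assumptions of this example, and skipping comment and blank lines is a choice made here for convenience.

```shell
# Copy the Hadoop installation to every host listed in a workers file.
# With DRY_RUN=1 the scp commands are only printed.
distribute_hadoop() {
    local workers_file="$1"
    local src="/usr/local/hadoop-3.3.1"
    local node
    # Drop comments and blank lines, then handle one hostname per line
    sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$workers_file" |
    while IFS= read -r node; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r $src root@$node:/usr/local"
        else
            # </dev/null keeps scp from consuming the loop's input
            scp -r "$src" "root@$node:/usr/local" < /dev/null
        fi
    done
}

# Usage on the master:
# distribute_hadoop /usr/local/hadoop-3.3.1/etc/hadoop/workers
```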
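[root@master hadoop]# hdfs namenode -format
Formatting succeeded if the output contains a message saying that the storage directory has been successfully formatted.
To open the web pages by hostname from a Windows machine, edit C:\Windows\System32\drivers\etc\hosts
hosts is a system file without an extension; its job is to map commonly used domain names to their IP addresses. Add the following entries: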
192.168.184.136 master hadoop.master.com
192.168.184.137 slave01 hadoop.slave01.com
192.168.184.138 slave02 hadoop.slave02.com
192.168.184.139 slave03 hadoop.slave03.com
[root@master hadoop]# start-dfs.sh
[root@master hadoop]# start-yarn.sh
[root@master hadoop]# jps
4150 SecondaryNameNode
5053 Jps
3886 NameNode
3498 ResourceManager
# Check the processes on the three worker nodes
[root@slave01 hadoop]# jps
1376 DataNode
1656 Jps
7655 NodeManager
[root@slave02 hadoop]# jps
1378 DataNode
1610 Jps
7659 NodeManager
[root@slave03 hadoop]# jps
1380 DataNode
1661 Jps
7658 NodeManager
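Instead of reading the jps output by eye, the expected daemons can be checked by name. A sketch, assuming the jps output format shown above (PID followed by the main class name); missing_daemons is a hypothetical helper.

```shell
# Print every expected daemon name that does not appear in the jps output.
# $1 is the captured jps output, the remaining arguments are daemon names.
missing_daemons() {
    local jps_output="$1"
    local name
    shift
    for name in "$@"; do
        # Compare the second column exactly, so "NameNode" does not
        # match "SecondaryNameNode"
        printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            echo "$name"
    done
}

# Usage on the master:  missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
# Usage on a worker:    missing_daemons "$(jps)" DataNode NodeManager
```

No output means every listed daemon is running.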
http://master:50070
http://master:8088/cluster
Finally, turn the firewall back on and allow the relevant ports. Do not forget this step (see the firewall commands).
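For firewalld, opening the ports could look like the sketch below. It only prints the commands so they can be reviewed first. The port list contains the master-side ports configured in this article and nothing else; the worker nodes listen on further ports that are not covered here. The names MASTER_PORTS and firewall_commands are made up for this example.

```shell
# Master-side ports taken from the configuration files in this article
MASTER_PORTS="8020 50070 50090 10020 19888 8030 8031 8032 8033 8088"

# Print the commands that start firewalld and open each port permanently.
firewall_commands() {
    local port
    echo "systemctl start firewalld"
    for port in $MASTER_PORTS; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    # Permanent rules only take effect after a reload
    echo "firewall-cmd --reload"
}

# Review the output first, then run it as root on the master:
# firewall_commands | sh
```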