Hadoop is an open-source distributed computing and storage framework developed and maintained by the Apache Software Foundation. It provides reliable, scalable application-layer computing and storage for very large clusters: it lets you process large data sets across clusters of machines with a simple programming model, and it scales from a single machine up to thousands of machines.
Hadoop is written in Java, so it can be deployed on machines with many different hardware platforms. Its core components are the Hadoop Distributed File System (HDFS) and MapReduce.
Hadoop Distributed File System, abbreviated HDFS.
The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later.
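As a sketch of how this is tuned (the value below is just the 2.x default written out explicitly, not a setting from this deployment), the block size is controlled by dfs.blocksize in hdfs-site.xml:
<property>
<name>dfs.blocksize</name>
<value>134217728</value> <!-- 128 MB; a suffixed form such as 128m also works -->
</property>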
The SecondaryNameNode has two roles: backing up the image, and periodically merging the edit log into the image; together these make up the checkpoint process.
Image backup: periodically backs up the fsimage file.
Checkpoint: periodically merges the edit log into the image so the edit log does not grow without bound; the merge interval and transaction threshold are configured via dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns in hdfs-site.xml. The SNN keeps a copy of the merged namespace image, which can be used to recover data if the NameNode fails completely; the SNN is therefore not a standby NameNode and does not take over automatically when the NameNode fails. The flow is as follows:
The SecondaryNameNode usually runs on a separate machine, because the merge consumes a lot of CPU and memory. Its data lags behind the NameNode's, so if the NameNode fails completely some data loss occurs. The usual practice is to copy the metadata backed up on NFS to the SecondaryNameNode and promote it to be the new NameNode.
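For reference, the checkpoint trigger conditions mentioned above are ordinary hdfs-site.xml settings; the values below are the usual defaults, shown only as a sketch:
<property>
<name>dfs.namenode.checkpoint.period</name>
<value>3600</value> <!-- run a checkpoint at least every hour (seconds) -->
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>1000000</value> <!-- or after this many uncheckpointed transactions -->
</property>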
A DataNode normally reads data directly from disk, but frequently used blocks can be cached in memory.
By default each block is cached on only one DataNode, though this can be configured per file. Job schedulers can take advantage of the cache: MapReduce, for example, can run tasks on the nodes where a block is cached. Users or applications can send cache directives to the NameNode (which file to cache and for how long), and cache pools are used to manage the permissions and resources of a group of cached items, as sketched below.
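For illustration, caching is managed through the hdfs cacheadmin command; the pool name and TTL below are hypothetical:
[root@master hadoop]# ./bin/hdfs cacheadmin -addPool hotdata
[root@master hadoop]# ./bin/hdfs cacheadmin -addDirective -path /input/word.txt -pool hotdata -ttl 1d
[root@master hadoop]# ./bin/hdfs cacheadmin -listDirectives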
The NameNode's memory limits the number of files it can track; HDFS Federation provides a way to scale NameNodes horizontally. In federation, each NameNode manages part of the namespace; for example, one NameNode manages the files under /user while another manages the files under /share.
Each NameNode manages one namespace volume; together the volumes make up the file system metadata. Each NameNode also maintains a block pool that holds the block-to-node mapping and related information. The NameNodes are independent of each other, so the failure of one does not make the files managed by the others unavailable.
Clients use a mount table to map file paths to NameNodes. The mount table is a layer on top of the NameNode group; this layer is itself a Hadoop file system implementation, accessed through the viewfs: scheme.
See the official documentation for details.
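A minimal mount-table sketch in core-site.xml, matching the /user and /share example above (the cluster name and NameNode hosts are assumptions for illustration):
<property>
<name>fs.defaultFS</name>
<value>viewfs://ClusterX</value>
</property>
<property>
<name>fs.viewfs.mounttable.ClusterX.link./user</name>
<value>hdfs://nn1:9000/user</value>
</property>
<property>
<name>fs.viewfs.mounttable.ClusterX.link./share</name>
<value>hdfs://nn2:9000/share</value>
</property>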
When reading and writing, the NameNode takes the distance between nodes into account when allocating DataNodes. HDFS does not measure distance by bandwidth, because in practice it is hard to measure the bandwidth between two machines accurately.
Hadoop organizes the machines into a tree topology and uses the number of hops up to the closest common ancestor as the distance; this is effectively a distance measure. The following example illustrates how the distance is computed:
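For instance, with a topology of data centers (d), racks (r), and nodes (n) — the names here are hypothetical — the rule works out as:
distance(/d1/r1/n1, /d1/r1/n1) = 0   # same node
distance(/d1/r1/n1, /d1/r1/n2) = 2   # different nodes on the same rack
distance(/d1/r1/n1, /d1/r2/n3) = 4   # different racks in the same data center
distance(/d1/r1/n1, /d2/r3/n4) = 6   # different data centers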
Note: the topology of a Hadoop cluster must be configured manually; if it is not, Hadoop assumes all nodes sit on the same rack in the same data center!
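As a sketch (the script path is an assumption, not part of this deployment), rack awareness is enabled by pointing core-site.xml at a topology script that maps an IP address or hostname to a rack path; hosts the script cannot map fall into /default-rack:
<property>
<name>net.topology.script.file.name</name>
<value>/middleware/hadoop/etc/hadoop/topology.sh</value>
</property>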
Metadata is persisted in two forms: the namespace image (fsimage) and the edit log. Their storage path is determined by the hadoop.tmp.dir parameter in core-site.xml.
The name directory under the dfs directory holds the edits and fsimage files. Both are binary and cannot be read directly (because of the data volume, HDFS does not store them in a human-readable encoding), but they can be inspected with the tools below.
$ hdfs oev -i <edits_file> -o <edits.xml>
$ hdfs oiv -p XML -i <fsimage_file> -o <fsimage.xml>
# hdfs oev dumps an edit log to XML and hdfs oiv dumps an fsimage to XML for inspection; the output paths and names are up to you
Hadoop 1.x architecture
This architecture has the following problems:
Hadoop 2.x architecture
YARN splits the JobTracker's responsibilities, separating resource management from job scheduling and monitoring into independent processes: a global resource manager (ResourceManager) and a per-application manager (ApplicationMaster). The ResourceManager and NodeManagers handle the allocation and management of compute resources, while the ApplicationMaster drives the execution of the application.
Comparison of the two architectures
Under YARN the cluster becomes a generic resource-management platform plus a generic application-computation platform, which avoids the single point of failure and the poor resource utilization of the old architecture, and running applications are no longer limited to the MapReduce model.
YARN splits the JobTracker into four components, plus failure handling.
Responsible for global resource management and task scheduling; it treats the whole cluster as a pool of compute resources and is only concerned with allocation, not with the applications themselves or their fault tolerance.
Resource management
Task scheduling
Internal structure
Manages the Containers on a single node.
Internal structure
NodeStatusUpdater: registers with the ResourceManager at startup, reports the node's available resources and communication port, and maintains the status afterwards;
ContainerManager: accepts RPC requests (start, stop) and performs resource localization (downloading the resources an application needs to the node and sharing them where appropriate):
PUBLIC: /filecache
PRIVATE: /usercache/<user>/filecache
APPLICATION: /usercache/<user>/appcache/<app-id>/ (deleted after the application finishes)
ContainersLauncher: starts or terminates Containers;
ContainerMonitor: monitors Container execution and resource usage;
ContainerExecutor: interacts with the underlying operating system to launch the programs to run.
Handles resource management and task monitoring for a single application.
Functional description
The ApplicationMaster can be written in any language; it communicates with the ResourceManager and NodeManagers via Protocol Buffers. The work that used to be done by a single global JobTracker is now done by one ApplicationMaster per job, which scales much better: a large number of jobs can no longer turn the JobTracker into a bottleneck. Putting the job logic into a separate ApplicationMaster also adds flexibility: each job can use its own processing model instead of being tied to MapReduce.
A typical MapReduce job derives the number of map tasks from the number of blocks (splits), and each map or reduce task usually occupies one Container; data locality is obtained from the HDFS block split information.
Task failure
A runtime exception or a JVM exit is reported to the ApplicationMaster;
Hung tasks are detected through heartbeats (timeout) and are checked several times (configurable) before being declared dead;
If the task failure rate of a job exceeds the configured limit, the job is considered failed;
Failed tasks or jobs are rerun by the ApplicationMaster.
ApplicationMaster failure
NodeManager failure
ResourceManager failure
The idea behind MapReduce is "divide and conquer": Map does the dividing, Reduce does the combining.
It is a distributed computation model: a Map function turns a set of key/value pairs into a new set of key/value pairs, and a concurrent Reduce function guarantees that all mapped pairs sharing the same key are grouped together.
map: (K1, V1) → list(K2, V2)
combine: (K2, list(V2)) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
MapReduce first reads the input data, runs the Map phase, then the Reduce phase, and finally writes the result back to files; the Map output types must therefore match the Reduce input types.
The Mapper is responsible for "dividing", i.e. breaking a complex task into a number of "simple tasks". "Simple" has three aspects:
The Reducer aggregates the results of the map phase. How many Reducers are needed is up to the user and can be set via the mapred.reduce.tasks parameter in mapred-site.xml; the default is 1.
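In Hadoop 2.x and later the same setting is called mapreduce.job.reduces (mapred.reduce.tasks is the deprecated 1.x name); a sketch with an illustrative value, which can also be passed per job with -D mapreduce.job.reduces=2:
<property>
<name>mapreduce.job.reduces</name>
<value>2</value>
</property>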
The InputFormat decides how the data is read and splits it into InputSplits; each InputSplit is handled by one Map task, and a RecordReader feeds the contents of the InputSplit to the Map.
Decides the format of the input data, which can be files, a database, and so on.
Represents a logical split; it does not actually store data, it only describes how the data is to be split. A split carries location information, which helps data locality; each InputSplit is processed by a single Map task.
Breaks an InputSplit into individual key/value records.
Reads each key/value pair from the InputSplit and processes it.
Sorts the Map output and transfers it to the Reduce side. The Map output is not written straight to disk; a buffer is used for pre-sorting. The Map side may run the Combiner, compress the output, and partition and sort it by key, to keep the output as small as possible. When each Map finishes it notifies the framework, and the Reduce can then fetch and process its data.
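Two standard mapred-site.xml knobs relate to this pre-sorting stage; the values below are illustrative, not taken from this deployment:
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>100</value> <!-- size of the in-memory sort buffer on the map side, in MB -->
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value> <!-- compress map output to reduce shuffle traffic -->
</property>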
[root@master hadoop]# ll
total 116
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 bin
drwxr-xr-x 3 1024 1024 4096 Jul 29 2022 etc
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 include
drwxr-xr-x 3 1024 1024 4096 Jul 29 2022 lib
drwxr-xr-x 4 1024 1024 4096 Jul 29 2022 libexec
-rw-rw-r-- 1 1024 1024 24707 Jul 29 2022 LICENSE-binary
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 licenses-binary
-rw-rw-r-- 1 1024 1024 15217 Jul 17 2022 LICENSE.txt
-rw-rw-r-- 1 1024 1024 29473 Jul 17 2022 NOTICE-binary
-rw-rw-r-- 1 1024 1024 1541 Apr 22 2022 NOTICE.txt
-rw-rw-r-- 1 1024 1024 175 Apr 22 2022 README.txt
drwxr-xr-x 3 1024 1024 4096 Jul 29 2022 sbin
drwxr-xr-x 4 1024 1024 4096 Jul 29 2022 share
[root@master hadoop]# ll etc/hadoop/
total 180
-rw-r--r-- 1 1024 1024 9213 Jul 29 2022 capacity-scheduler.xml
-rw-r--r-- 1 1024 1024 1335 Jul 29 2022 configuration.xsl
-rw-r--r-- 1 1024 1024 2567 Jul 29 2022 container-executor.cfg
-rw-r--r-- 1 1024 1024 970 Jul 11 14:08 core-site.xml # core site configuration
-rw-r--r-- 1 1024 1024 3999 Jul 29 2022 hadoop-env.cmd
-rw-r--r-- 1 1024 1024 16721 Jul 11 14:15 hadoop-env.sh # Hadoop environment variables
-rw-r--r-- 1 1024 1024 3321 Jul 29 2022 hadoop-metrics2.properties
-rw-r--r-- 1 1024 1024 11765 Jul 29 2022 hadoop-policy.xml
-rw-r--r-- 1 1024 1024 3414 Jul 29 2022 hadoop-user-functions.sh.example
-rw-r--r-- 1 1024 1024 683 Jul 29 2022 hdfs-rbf-site.xml
-rw-r--r-- 1 1024 1024 1040 Jul 11 14:02 hdfs-site.xml # HDFS configuration
-rw-r--r-- 1 1024 1024 1484 Jul 29 2022 httpfs-env.sh
-rw-r--r-- 1 1024 1024 1657 Jul 29 2022 httpfs-log4j.properties
-rw-r--r-- 1 1024 1024 620 Jul 29 2022 httpfs-site.xml
-rw-r--r-- 1 1024 1024 3518 Jul 29 2022 kms-acls.xml
-rw-r--r-- 1 1024 1024 1351 Jul 29 2022 kms-env.sh
-rw-r--r-- 1 1024 1024 1860 Jul 29 2022 kms-log4j.properties
-rw-r--r-- 1 1024 1024 682 Jul 29 2022 kms-site.xml
-rw-r--r-- 1 1024 1024 13700 Jul 29 2022 log4j.properties
-rw-r--r-- 1 1024 1024 951 Jul 29 2022 mapred-env.cmd
-rw-r--r-- 1 1024 1024 1764 Jul 29 2022 mapred-env.sh # MapReduce environment variables
-rw-r--r-- 1 1024 1024 4113 Jul 29 2022 mapred-queues.xml.template
-rw-r--r-- 1 1024 1024 758 Jul 29 2022 mapred-site.xml # MapReduce configuration
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 shellprofile.d
-rw-r--r-- 1 1024 1024 2316 Jul 29 2022 ssl-client.xml.example
-rw-r--r-- 1 1024 1024 2697 Jul 29 2022 ssl-server.xml.example
-rw-r--r-- 1 1024 1024 2681 Jul 29 2022 user_ec_policies.xml.template
-rw-r--r-- 1 1024 1024 10 Jul 29 2022 workers
-rw-r--r-- 1 1024 1024 2250 Jul 29 2022 yarn-env.cmd
-rw-r--r-- 1 1024 1024 6329 Jul 29 2022 yarn-env.sh # YARN environment variables
-rw-r--r-- 1 1024 1024 2591 Jul 29 2022 yarnservice-log4j.properties
-rw-r--r-- 1 1024 1024 690 Jul 29 2022 yarn-site.xml # YARN configuration
If there is no mapred-site.xml file under etc/hadoop/ but there is a mapred-site.xml.template, copy the template and edit the copy.
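For example (run from the Hadoop installation directory):
[root@master hadoop]# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml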
Official release downloads: https://hadoop.apache.org/releases.html — download the binary package.
For a single-machine deployment there is little need for YARN; deploying HDFS alone is enough for a test environment.
[root@master tar]# tar -xf hadoop-3.3.4.tar.gz -C /middleware/
[root@master tar]# cd /middleware/
[root@master middleware]# ln -s hadoop-3.3.4/ hadoop
[root@master middleware]# cd hadoop
[root@master hadoop]# ll
total 116
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 bin
drwxr-xr-x 3 1024 1024 4096 Jul 29 2022 etc
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 include
drwxr-xr-x 3 1024 1024 4096 Jul 29 2022 lib
drwxr-xr-x 4 1024 1024 4096 Jul 29 2022 libexec
-rw-rw-r-- 1 1024 1024 24707 Jul 29 2022 LICENSE-binary
drwxr-xr-x 2 1024 1024 4096 Jul 29 2022 licenses-binary
-rw-rw-r-- 1 1024 1024 15217 Jul 17 2022 LICENSE.txt
-rw-rw-r-- 1 1024 1024 29473 Jul 17 2022 NOTICE-binary
-rw-rw-r-- 1 1024 1024 1541 Apr 22 2022 NOTICE.txt
-rw-rw-r-- 1 1024 1024 175 Apr 22 2022 README.txt
drwxr-xr-x 3 1024 1024 4096 Jul 29 2022 sbin
drwxr-xr-x 4 1024 1024 4096 Jul 29 2022 share
[root@master hadoop]# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/middleware/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
[root@master hadoop]# source etc/hadoop/hadoop-env.sh
[root@master hadoop]# ./bin/hadoop version
Hadoop 3.3.4
Source code repository https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb
Compiled by stevel on 2022-07-29T12:32Z
Compiled with protoc 3.7.1
From source with checksum fb9dd8918a7b8a5b430d61af858f6ec
This command was run using /middleware/hadoop-3.3.4/share/hadoop/common/hadoop-common-3.3.4.jar
[root@master hadoop]# vim etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/middleware/hadoop/</value>
</property>
</configuration>
[root@master hadoop]# vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>
[root@master hadoop]# bin/hdfs namenode -format
~
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/9.134.244.180
************************************************************/
[root@master hadoop]# tree dfs/
dfs/
└── name
└── current
├── fsimage_0000000000000000000
├── fsimage_0000000000000000000.md5
├── seen_txid
└── VERSION
2 directories, 4 files
[root@master hadoop]# ./sbin/start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
Last login: Tue Jul 11 13:55:17 CST 2023 from 10.95.19.138 on pts/0
localhost: ssh: connect to host localhost port 22: Connection refused
Starting datanodes
Last login: Tue Jul 11 14:10:44 CST 2023 on pts/0
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [master]
Last login: Tue Jul 11 14:10:44 CST 2023 on pts/0
master: ssh: connect to host master port 22: Connection refused
[root@master hadoop]# vim etc/hadoop/hadoop-env.sh
export HADOOP_SSH_OPTS="-p 36000"
[root@master hadoop]# ./sbin/start-dfs.sh
Environment preparation: install or upgrade the JDK, and set up passwordless SSH across the cluster.
# Generate a key pair (keep pressing Enter to accept the defaults)
[root@master hadoop]# ssh-keygen -t rsa -b 4096 # -t is the key algorithm, -b is the key length
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): # location of the key file
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:FbNOpySF2pNAo2vR9L53bm91fSgJf/MYnHBwTenf35o [email protected]
The key's randomart image is:
+---[RSA 4096]----+
| .+ .+ oo|
| +.o.. +. ...|
| o .+o.= .o. |
| o..+*.o. .. |
| o S.oo = o+|
| . . + B B|
| . . .o *=|
| . o. o.+|
| ..oE. |
+----[SHA256]-----+
# Each node distributes its public key to the others; the first copy requires password login. With three machines that is six copies in total (a node does not need to copy to itself)
[root@master hadoop]# ssh-copy-id master
[root@master hadoop]# ssh-copy-id node1
[root@master hadoop]# ssh-copy-id node2
# Configure hosts; replace the IPs with the addresses of your environment
[root@master hadoop]# vim /etc/hosts
# [hadoop_cluster]
$ip1 master
$ip2 node1
$ip3 node2
# Extraction steps omitted
[root@master hadoop]# vim etc/hadoop/hadoop-env.sh
export JAVA_HOME=/middleware/jdk
export HADOOP_SSH_OPTS="-p 36000"
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
[root@master hadoop]# vim etc/hadoop/core-site.xml
<configuration>
<!-- Default file system URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<!-- Base directory for HDFS data and metadata -->
<property>
<name>hadoop.tmp.dir</name>
<value>/middleware/hadoop/</value>
</property>
</configuration>
[root@master hadoop]# vim etc/hadoop/hdfs-site.xml
<configuration>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Host and port of the secondary namenode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:50090</value>
</property>
<!-- Host and port of the NameNode web UI -->
<property>
<name>dfs.namenode.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
[root@master hadoop]# vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@master hadoop]# vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8034</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
</configuration>
[root@master hadoop]# vim workers
master
node1
node2
# Format HDFS
[root@master hadoop]# ./bin/hdfs namenode -format
~
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/9.134.244.180
************************************************************/
# Start the services
[root@master hadoop]# ./sbin/start-dfs.sh
Starting namenodes on [master]
Last login: Fri Sep 8 12:30:14 CST 2023 from 10.91.28.109 on pts/2
Starting datanodes
Last login: Fri Sep 8 15:24:19 CST 2023 on pts/1
node2: WARNING: /middleware/hadoop-3.3.4/logs does not exist. Creating.
node1: WARNING: /middleware/hadoop-3.3.4/logs does not exist. Creating.
Starting secondary namenodes [node1]
Last login: Fri Sep 8 15:24:21 CST 2023 on pts/1
# Verify the processes
[root@master hadoop]# jps
806826 Jps
801889 DataNode
801667 NameNode
[root@node1 middleware]# jps
1109584 Jps
1095393 DataNode
1095563 SecondaryNameNode
[root@node2 /middleware]# jps
29733 Jps
12063 DataNode
HDFS web UI: http://9.134.244.180:50090; the port is set by the dfs.namenode.http-address parameter in hdfs-site.xml (see section 2.1).
Common HDFS shell command options:
-ls # list a directory
-mkdir # create a directory
-rmdir # remove an empty directory
-rm # remove a file or directory
-r # used with -rm to remove a non-empty directory recursively
-moveFromLocal # move from the local file system to HDFS
-copyFromLocal # copy from the local file system to HDFS
-copyToLocal # copy from HDFS to the local file system
-appendToFile # append a local file to the end of an HDFS file
-cat # print a file's contents
-cp # copy a file within HDFS
-mv # move a file within HDFS
-get # same as -copyToLocal
-put # same as -copyFromLocal
-getmerge # merge several files and download them as one
-tail # show the end of a file
-chgrp/-chmod/-chown # change a file's group, permissions, or owner
-du # show the size of a directory
-setrep # set the replication factor of a file in HDFS (effective replicas are limited by the number of DataNodes)
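For instance, a hypothetical run that changes a file's replication factor to 2 and waits (-w) for the replicas to be adjusted (the path is illustrative):
[root@master hadoop]# ./bin/hdfs dfs -setrep -w 2 /path/to/file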
# Upload/download test
[root@master hadoop]# echo "this is test!" > ./aaa.txt
[root@master hadoop]# ./bin/hdfs dfs -put ./aaa.txt /
[root@master hadoop]# ./bin/hdfs dfs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 14 2023-09-11 14:29 /aaa.txt
HDFS block data is stored under the path given by the dfs.datanode.data.dir parameter (an hdfs-site.xml setting); when it is not set, it defaults to dfs/data under the hadoop.tmp.dir configured in core-site.xml.
[root@master hadoop]# tree dfs/data/current/BP-1096417423-9.134.244.180-1694157712400/current/finalized/subdir0/subdir0/
dfs/data/current/BP-1096417423-9.134.244.180-1694157712400/current/finalized/subdir0/subdir0/
├── blk_1073741825
├── blk_1073741825_1001.meta
0 directories, 2 files
[root@master hadoop]# cd dfs/data/current/BP-1096417423-9.134.244.180-1694157712400/current/finalized/subdir0/subdir0/
[root@master subdir0]# cat blk_1073741825
this is test!
[root@node1 hadoop]# cd dfs/data/current/BP-1096417423-9.134.244.180-1694157712400/current/finalized/subdir0/subdir0/
[root@node1 subdir0]# cat blk_1073741825
this is test!
[root@node2 hadoop]# cd dfs/data/current/BP-1096417423-9.134.244.180-1694157712400/current/finalized/subdir0/subdir0/
[root@node2 subdir0]# cat blk_1073741825
this is test!
# All three machines hold identical data because dfs.replication in hdfs-site.xml is set to 3 (which is also the default)
# Configure MapReduce and YARN, then distribute the files to node1 and node2
[root@master hadoop]# vim etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@master hadoop]# vim etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
</configuration>
[root@master hadoop]# ./sbin/start-yarn.sh
Starting resourcemanager
Last login: Mon Sep 11 14:35:42 CST 2023 on pts/2
Starting nodemanagers
Last login: Mon Sep 11 15:19:31 CST 2023 on pts/1
# Verify the processes
[root@master hadoop]# jps
809885 Jps
809508 NodeManager
809302 ResourceManager
801889 DataNode
801667 NameNode
[root@node1 hadoop]# jps
1095393 DataNode
1117415 Jps
1095563 SecondaryNameNode
1116957 NodeManager
[root@node2 /middleware]# jps
6407 NodeManager
6909 Jps
12063 DataNode
YARN web UI: http://9.134.244.180:8088; the port comes from the yarn.resourcemanager.webapp.address parameter in yarn-site.xml (see section 2.4).
# Prepare a test: upload a text file to HDFS
[root@master hadoop]# ./bin/hdfs dfs -mkdir -p /input
[root@master hadoop]# vim word.txt
I’m LiHua , a Chinese student taking summer course in your university . I’m writing to ask for help . I came here last month and found my courses interesting .But I have some difficulty with note-taking and I have no idea of how to use the library . I was told the learning center provides help for students and I’m anxious to get help from you. I have no class on Tuesdays mornings and Friday afternoons . Please let me know which day is ok with you. You may email or phone me . Here are my email address and phone number :[email protected] ; 1234567.
[root@master hadoop]# ./bin/hdfs dfs -put ./word.txt /input
# Use YARN to run a word count over the file. Input (stored in HDFS): /input/word.txt; output: /output/count.txt
[root@master hadoop]# ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/word.txt /output/count.txt
2023-09-11 15:52:39,678 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master/9.134.244.180:8032
2023-09-11 15:52:39,932 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1694418178447_0001
2023-09-11 15:52:42,335 INFO input.FileInputFormat: Total input files to process : 1
2023-09-11 15:52:42,762 INFO mapreduce.JobSubmitter: number of splits:1
2023-09-11 15:52:45,028 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1694418178447_0001
2023-09-11 15:52:45,029 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-09-11 15:52:45,141 INFO conf.Configuration: resource-types.xml not found
2023-09-11 15:52:45,141 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-09-11 15:52:45,326 INFO impl.YarnClientImpl: Submitted application application_1694418178447_0001
2023-09-11 15:52:45,359 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1694418178447_0001/
2023-09-11 15:52:45,360 INFO mapreduce.Job: Running job: job_1694418178447_0001
2023-09-11 15:52:50,413 INFO mapreduce.Job: Job job_1694418178447_0001 running in uber mode : false
2023-09-11 15:52:50,414 INFO mapreduce.Job: map 0% reduce 0%
2023-09-11 15:52:50,426 INFO mapreduce.Job: Job job_1694418178447_0001 failed with state FAILED due to: Application application_1694418178447_0001 failed 2 times due to AM Container for appattempt_1694418178447_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2023-09-11 15:52:49.413]Exception from container-launch.
Container id: container_1694418178447_0001_02_000001
Exit code: 1
[2023-09-11 15:52:49.446]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your <HADOOP_HOME>/etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
[2023-09-11 15:52:49.447]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your <HADOOP_HOME>/etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1694418178447_0001 Then click on links to logs of each attempt.
. Failing the application.
2023-09-11 15:52:50,440 INFO mapreduce.Job: Counters: 0
# Cause of the failure: missing MapReduce configuration
[root@master hadoop]# ./bin/hadoop classpath # replace the ':' separators in the output below with ','
/middleware/hadoop-3.3.4/etc/hadoop:/middleware/hadoop-3.3.4/share/hadoop/common/lib/*:/middleware/hadoop-3.3.4/share/hadoop/common/*:/middleware/hadoop-3.3.4/share/hadoop/hdfs:/middleware/hadoop-3.3.4/share/hadoop/hdfs/lib/*:/middleware/hadoop-3.3.4/share/hadoop/hdfs/*:/middleware/hadoop-3.3.4/share/hadoop/mapreduce/*:/middleware/hadoop-3.3.4/share/hadoop/yarn:/middleware/hadoop-3.3.4/share/hadoop/yarn/lib/*:/middleware/hadoop-3.3.4/share/hadoop/yarn/*
# Append the configuration to mapred-site.xml and distribute it to node1 and node2
[root@master hadoop]# vim etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/middleware/hadoop-3.3.4/etc/hadoop,
/middleware/hadoop-3.3.4/share/hadoop/common/lib/*,
/middleware/hadoop-3.3.4/share/hadoop/common/*,
/middleware/hadoop-3.3.4/share/hadoop/hdfs,
/middleware/hadoop-3.3.4/share/hadoop/hdfs/lib/*,
/middleware/hadoop-3.3.4/share/hadoop/hdfs/*,
/middleware/hadoop-3.3.4/share/hadoop/mapreduce/*,
/middleware/hadoop-3.3.4/share/hadoop/yarn,
/middleware/hadoop-3.3.4/share/hadoop/yarn/lib/*,
/middleware/hadoop-3.3.4/share/hadoop/yarn/*</value>
</property>
# Restart the services
[root@master hadoop]# ./sbin/stop-yarn.sh && ./sbin/start-yarn.sh
Stopping nodemanagers
Last login: Mon Sep 11 15:42:57 CST 2023 on pts/1
master: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
node1: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
Last login: Mon Sep 11 16:04:48 CST 2023 on pts/1
Starting resourcemanager
Last login: Mon Sep 11 16:04:56 CST 2023 on pts/1
Starting nodemanagers
Last login: Mon Sep 11 16:04:58 CST 2023 on pts/1
# Run the job again
[root@master hadoop]# ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/word.txt /output/count.txt
2023-09-11 16:06:15,116 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master/9.134.244.180:8032
2023-09-11 16:06:15,361 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1694419501011_0001
2023-09-11 16:06:15,826 INFO input.FileInputFormat: Total input files to process : 1
2023-09-11 16:06:16,259 INFO mapreduce.JobSubmitter: number of splits:1
2023-09-11 16:06:16,591 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1694419501011_0001
2023-09-11 16:06:16,591 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-09-11 16:06:16,706 INFO conf.Configuration: resource-types.xml not found
2023-09-11 16:06:16,707 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-09-11 16:06:16,904 INFO impl.YarnClientImpl: Submitted application application_1694419501011_0001
2023-09-11 16:06:16,945 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1694419501011_0001/
2023-09-11 16:06:16,945 INFO mapreduce.Job: Running job: job_1694419501011_0001
2023-09-11 16:06:25,035 INFO mapreduce.Job: Job job_1694419501011_0001 running in uber mode : false
2023-09-11 16:06:25,035 INFO mapreduce.Job: map 0% reduce 0%
2023-09-11 16:06:30,080 INFO mapreduce.Job: map 100% reduce 0%
2023-09-11 16:06:35,100 INFO mapreduce.Job: map 100% reduce 100%
2023-09-11 16:06:36,110 INFO mapreduce.Job: Job job_1694419501011_0001 completed successfully
2023-09-11 16:06:36,191 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=915
FILE: Number of bytes written=555667
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=654
HDFS: Number of bytes written=601
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2216
Total time spent by all reduces in occupied slots (ms)=2858
Total time spent by all map tasks (ms)=2216
Total time spent by all reduce tasks (ms)=2858
Total vcore-milliseconds taken by all map tasks=2216
Total vcore-milliseconds taken by all reduce tasks=2858
Total megabyte-milliseconds taken by all map tasks=2269184
Total megabyte-milliseconds taken by all reduce tasks=2926592
Map-Reduce Framework
Map input records=1
Map output records=106
Map output bytes=980
Map output materialized bytes=915
Input split bytes=98
Combine input records=106
Combine output records=77
Reduce input groups=77
Reduce shuffle bytes=915
Reduce input records=77
Reduce output records=77
Spilled Records=154
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=99
CPU time spent (ms)=1150
Physical memory (bytes) snapshot=595423232
Virtual memory (bytes) snapshot=5482131456
Total committed heap usage (bytes)=546832384
Peak Map Physical memory (bytes)=325435392
Peak Map Virtual memory (bytes)=2638544896
Peak Reduce Physical memory (bytes)=269987840
Peak Reduce Virtual memory (bytes)=2843586560
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=556
File Output Format Counters
Bytes Written=601
# Job finished
[root@master hadoop]# ./bin/hdfs dfs -ls /output/count.txt
Found 2 items
-rw-r--r-- 3 root supergroup 0 2023-09-11 16:06 /output/count.txt/_SUCCESS # success marker
-rw-r--r-- 3 root supergroup 601 2023-09-11 16:06 /output/count.txt/part-r-00000 # job output
[root@master hadoop]# ./bin/hdfs dfs -cat /output/count.txt/part-r-00000
, 1
. 5
.But 1
1234567. 1
:[email protected] 1
~
# The rest of the output is omitted here
The new job job_1694419501011_0001 completed successfully; the web UI shows it as follows:
Click a job to view its details. Attempt: an execution attempt of the Application; Logs: the attempt's logs; Node: the node that ran the attempt.
Attempt details:
These metrics are important for tuning the performance and resource utilization of a Hadoop cluster. For example, if Num Node Local Containers is too small, some nodes' resources may be under-utilized; if Num Off Switch Containers is too large, the cluster's network topology may need to be optimized.
The YARN resource-control flow is as follows:
A YARN job execution example:
[root@master hadoop]# ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/word.txt /output/count.txt
2023-09-12 11:03:02,112 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master/9.134.244.180:8032 # the client connects to the ResourceManager
2023-09-12 11:03:02,385 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0001 # the RM returns the HDFS staging path for the job
# listing of the staging directory captured at the time
[root@master bin]# ./hdfs dfs -ls /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002
Found 6 items
-rw-r--r-- 10 root supergroup 280990 2023-09-12 11:23 /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002/job.jar
-rw-r--r-- 10 root supergroup 105 2023-09-12 11:23 /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002/job.split
-rw-r--r-- 3 root supergroup 34 2023-09-12 11:23 /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002/job.splitmetainfo
-rw-r--r-- 3 root supergroup 236995 2023-09-12 11:23 /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002/job.xml
-rw-r--r-- 3 root supergroup 0 2023-09-12 11:23 /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002/job_1694487467398_0002_1.jhist
-rw-r--r-- 3 root supergroup 273861 2023-09-12 11:23 /tmp/hadoop-yarn/staging/root/.staging/job_1694487467398_0002/job_1694487467398_0002_1_conf.xml
2023-09-12 11:03:03,491 INFO input.FileInputFormat: Total input files to process : 1 # total number of input files to process
2023-09-12 11:03:03,993 INFO mapreduce.JobSubmitter: number of splits:1
2023-09-12 11:03:04,356 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1694487467398_0001 # job tokens
2023-09-12 11:03:04,356 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-09-12 11:03:04,477 INFO conf.Configuration: resource-types.xml not found
2023-09-12 11:03:04,477 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-09-12 11:03:04,667 INFO impl.YarnClientImpl: Submitted application application_1694487467398_0001 # the submitted application
2023-09-12 11:03:04,703 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1694487467398_0001/
# URL for tracking the job in the web UI
2023-09-12 11:03:04,704 INFO mapreduce.Job: Running job: job_1694487467398_0001 # the running job
2023-09-12 11:03:12,785 INFO mapreduce.Job: Job job_1694487467398_0001 running in uber mode : false
2023-09-12 11:03:12,785 INFO mapreduce.Job: map 0% reduce 0% # MapReduce progress
2023-09-12 11:03:16,824 INFO mapreduce.Job: map 100% reduce 0% # Reduce starts once Map completes
2023-09-12 11:03:21,846 INFO mapreduce.Job: map 100% reduce 100% # both Map and Reduce complete
2023-09-12 11:03:22,855 INFO mapreduce.Job: Job job_1694487467398_0001 completed successfully # job completion marker
2023-09-12 11:03:22,934 INFO mapreduce.Job: Counters: 54
File System Counters # file system I/O statistics
FILE: Number of bytes read=915
FILE: Number of bytes written=555663
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=654
HDFS: Number of bytes written=601
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters # job-level statistics
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2280
Total time spent by all reduces in occupied slots (ms)=1981
Total time spent by all map tasks (ms)=2280
Total time spent by all reduce tasks (ms)=1981
Total vcore-milliseconds taken by all map tasks=2280
Total vcore-milliseconds taken by all reduce tasks=1981
Total megabyte-milliseconds taken by all map tasks=2334720
Total megabyte-milliseconds taken by all reduce tasks=2028544
Map-Reduce Framework # MapReduce framework statistics
Map input records=1
Map output records=106
Map output bytes=980
Map output materialized bytes=915
Input split bytes=98
Combine input records=106
Combine output records=77
Reduce input groups=77
Reduce shuffle bytes=915
Reduce input records=77
Reduce output records=77
Spilled Records=154
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=72
CPU time spent (ms)=880
Physical memory (bytes) snapshot=553959424
Virtual memory (bytes) snapshot=5285777408
Total committed heap usage (bytes)=591921152
Peak Map Physical memory (bytes)=327335936
Peak Map Virtual memory (bytes)=2640453632
Peak Reduce Physical memory (bytes)=226623488
Peak Reduce Virtual memory (bytes)=2645323776
Shuffle Errors # shuffle error statistics
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters # input format statistics
Bytes Read=556
File Output Format Counters # output format statistics
Bytes Written=601
#!/bin/bash
read -p "please input 'hdfs/yarn/all/help':" a1
cd /middleware/hadoop/sbin # Hadoop installation path
case ${a1:-help} in
hdfs )
read -p "please input 'start/stop/restart':" b2
case ${b2:-help} in
start )
bash start-dfs.sh
;;
stop )
bash stop-dfs.sh
;;
restart )
bash stop-dfs.sh && bash start-dfs.sh
;;
* )
echo "ERROR,please input 'start/stop/restart'!"
;;
esac
;;
yarn )
read -p "please input 'start/stop/restart':" c3
case ${c3:-help} in
start )
bash start-yarn.sh
;;
stop )
bash stop-yarn.sh
;;
restart )
bash stop-yarn.sh && bash start-yarn.sh
;;
* )
echo "ERROR,please input 'start/stop/restart'!"
;;
esac
;;
all )
read -p "please input 'start/stop/restart':" d4
case ${d4:-help} in
start )
bash start-all.sh
;;
stop )
bash stop-all.sh
;;
restart )
bash stop-all.sh && bash start-all.sh
;;
* )
echo "ERROR,please input 'start/stop/restart'!"
;;
esac
;;
* )
echo "ERROR,please input 'hdfs/yarn/all/help'!"
;;
esac
cd - > /dev/null 2>&1
Scenario: simulate a NameNode failure and recover by importing the SecondaryNameNode's fsimage data into the NameNode.
[root@master hadoop]# kill -9 19514 # the NameNode process
[root@node1 hadoop]# ll dfs/
total 8
drwx------ 3 root root 4096 Sep 11 14:35 data
drwxr-xr-x 3 root root 4096 Sep 11 14:35 namesecondary
# Back up the NameNode data
[root@master hadoop]# mkdir /tmp/nn_backup/
[root@master hadoop]# mv dfs/name/* /tmp/nn_backup/
# Copy the SNN data to the NameNode
[root@node1 hadoop]# scp -r -P 36000 dfs/namesecondary/* master:/middleware/hadoop/dfs/name/
# Import the metadata checkpoint
[root@master hadoop]# ./bin/hadoop namenode -importCheckpoint
# Start HDFS
[root@master hadoop]# ./sbin/start-dfs.sh
# After importing the fsimage it may take a while for all block replication and state updates to finish; during that time HDFS may not provide a fully normal service
When an HDFS cluster starts up it stays briefly in safe mode, during which important (write) operations are not allowed; once startup is complete it leaves safe mode automatically.
# Check
[root@master hadoop-1]# ./bin/hdfs dfsadmin -safemode get
Safe mode is OFF
# Enter safe mode
[root@master hadoop-1]# ./bin/hdfs dfsadmin -safemode enter
Safe mode is ON
# Leave safe mode
[root@master hadoop-1]# ./bin/hdfs dfsadmin -safemode leave
Safe mode is OFF
# Wait until safe mode is off
[root@master hadoop-1]# ./bin/hdfs dfsadmin -safemode wait
Safe mode is OFF
A newly added node has no blocks stored on it, and long-running HDFS clusters often end up with unevenly distributed data, with some nodes holding far more than others, so the cluster's load is unbalanced overall. In these cases HDFS storage needs to be rebalanced.
# The default balancer bandwidth is fairly low; raise it to 64 MB/s here
[root@master hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 67108864
# The default balancer threshold is 10% (each node's storage utilization may differ from the cluster-wide utilization by at most 10%)
[root@master hadoop]# ./sbin/start-balancer.sh -threshold 5
# Configure passwordless login; see the environment preparation in section 3.3
[root@master hadoop]# ssh-keygen -t rsa -b 4096
# Send the key to the other nodes and copy their keys back to this node, six exchanges in total (three outbound, three inbound)
[root@master hadoop]# ssh-copy-id master
# Append to hosts
[root@node3 hadoop]# vim /etc/hosts
~
$ip4 node3
# Append to workers
[root@node3 hadoop]# vim workers
~
node3
# Start the DataNode and NodeManager on node3
[root@node3 hadoop]# hadoop-daemon.sh start datanode
[root@node3 hadoop]# yarn-daemon.sh start nodemanager
# The YARN NodeManager list can be checked with the command below; the HDFS DataNode list is on the DataNodes page of the web UI at ip:50090
[root@master hadoop]# ./bin/yarn node -list
2023-09-12 14:28:47,291 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master/9.134.244.180:8032
Total Nodes:4
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
master:41295 RUNNING master:8042 0
node2:38853 RUNNING node2:8042 0
node1:41665 RUNNING node1:8042 0
node3:41573 RUNNING node3:8042 0
# After node3 joins as a DataNode, existing blocks are not moved to it immediately; run the balancer to redistribute the data
[root@node3 hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 104857600
[root@node3 hadoop]# ./bin/hdfs balancer -threshold 1 # the cluster holds little data, so the threshold is set to 1; see the balancing steps in section 4.3 and wait for the rebalancing to finish
Shrinking the cluster is not as simple as adding a node: the data on the node being removed must be moved off first. Hadoop provides a decommissioning mechanism, but it requires the dfs.hosts.exclude property to be configured beforehand in hdfs-site.xml on the NameNode. The file that property points to is the exclude ("blacklist") file; hosts listed in it are excluded from the cluster by the NameNode, and an empty file excludes nothing. It is therefore best to set this property when installing Hadoop; if it was not configured initially, add it and restart the NameNode.
# On the NameNode host, create the exclude file under etc/hadoop and add the hostnames of the nodes to be decommissioned
[root@master hadoop]# vim hdfs-site.xml
~
<property>
<name>dfs.hosts.exclude</name>
<value>/middleware/hadoop/etc/hadoop/excludes</value>
</property>
# Restart the NameNode so the new property takes effect, then list the hosts to decommission in the exclude file
[root@master hadoop]# vim /middleware/hadoop/etc/hadoop/excludes
node3
# If the replication factor is 3 and only 3 or fewer nodes remain online, decommissioning cannot complete; lower the replication factor first
# Refresh the NameNode and the ResourceManager
[root@master hadoop]# ./bin/hdfs dfsadmin -refreshNodes
[root@master hadoop]# ./bin/yarn rmadmin -refreshNodes
# Wait until the node's state becomes "decommissioned" (all blocks re-replicated), then stop the node's DataNode and NodeManager
# Stop the node3 processes
[root@node3 hadoop]# ./sbin/hadoop-daemon.sh stop datanode
[root@node3 hadoop]# ./sbin/yarn-daemon.sh stop nodemanager
Files deleted in HDFS are not removed immediately; they are first moved to the trash (HDFS trash path: /user/root/.Trash/) and are cleared automatically after some time.
In core-site.xml, fs.trash.interval=0 disables the trash; a value greater than 0 enables it and is the retention time in minutes. fs.trash.checkpoint.interval defaults to 0, in which case it takes the same value as fs.trash.interval; it is the trash checkpoint interval in minutes and must satisfy fs.trash.checkpoint.interval <= fs.trash.interval.
[root@master hadoop]# vim core-site.xml
<!-- Enable the HDFS trash; deleted data can be recovered from the trash within this many minutes -->
<property>
<name>fs.trash.interval</name>
<value>10</value>
</property>
[root@master hadoop]# ./sbin/stop-dfs.sh && ./sbin/start-dfs.sh
# Test deletion
[root@master hadoop]# ./bin/hdfs dfs -put ./aaa.txt /delete.txt
[root@master hadoop]# ./bin/hdfs dfs -rm /delete.txt
2023-09-11 19:37:43,757 INFO fs.TrashPolicyDefault: Moved: 'hdfs://master:9000/delete.txt' to trash at: hdfs://master:9000/user/root/.Trash/Current/delete.txt
[root@master hadoop]# ./bin/hdfs dfs -ls /user/root/.Trash/Current/
Found 1 items
-rw-r--r-- 3 root supergroup 14 2023-09-11 19:37 /user/root/.Trash/Current/delete.txt
# Restore the data
[root@master hadoop]# ./bin/hadoop fs -mv /user/root/.Trash/Current/delete.txt /
[root@master hadoop]# ./bin/hdfs dfs -ls /delete.txt
-rw-r--r-- 3 root supergroup 14 2023-09-11 19:37 /delete.txt
# Empty the trash
[root@master bin]# ./hadoop fs -expunge
2023-09-12 11:02:07,077 INFO fs.TrashPolicyDefault: TrashPolicyDefault#deleteCheckpoint for trashRoot: hdfs://master:9000/user/root/.Trash
2023-09-12 11:02:07,077 INFO fs.TrashPolicyDefault: TrashPolicyDefault#deleteCheckpoint for trashRoot: hdfs://master:9000/user/root/.Trash
2023-09-12 11:02:07,087 INFO fs.TrashPolicyDefault: TrashPolicyDefault#createCheckpoint for trashRoot: hdfs://master:9000/user/root/.Trash
2023-09-12 11:02:07,094 INFO fs.TrashPolicyDefault: Created trash checkpoint: /user/root/.Trash/230912110207
A snapshot is an image of the entire HDFS file system, or of one directory, at a point in time. **A snapshot is not a simple copy of the data; it only records the differences!** The main uses of HDFS snapshots are data recovery, data backup, and testing.
Snapshots can be created for the whole file system or for a specific directory, but only after snapshots have been enabled on that directory.
# Allow snapshots on a directory
[root@master hadoop]# ./hdfs dfsadmin -allowSnapshot $path
# Disallow snapshots on a directory
[root@master hadoop]# ./hdfs dfsadmin -disallowSnapshot $path
# Create a snapshot of a directory
[root@master hadoop]# ./hdfs dfs -createSnapshot $path
# Create a snapshot with a given name
[root@master hadoop]# ./hdfs dfs -createSnapshot $path $name
# Rename a snapshot
[root@master hadoop]# ./hdfs dfs -renameSnapshot $path $name1 $name2
# List all snapshottable directories owned by the current user
[root@master hadoop]# ./hdfs lsSnapshottableDir
# Show the differences between two snapshots of a directory
[root@master hadoop]# ./hdfs snapshotDiff $path $snapshot1 $snapshot2
# Delete a snapshot
[root@master hadoop]# ./hdfs dfs -deleteSnapshot $path $name
[root@master hadoop]# ./bin/hdfs dfsadmin -allowSnapshot /input
Allowing snapshot on /input succeeded
[root@master hadoop]# ./bin/hdfs dfs -createSnapshot /input
Created snapshot /input/.snapshot/s20230912-144829.624
[root@master hadoop]# ./bin/hdfs dfs -createSnapshot /input mysnap1
Created snapshot /input/.snapshot/mysnap1
[root@master hadoop]# ./bin/hdfs dfs -renameSnapshot /input mysnap1 mysnap2
Renamed snapshot mysnap1 to mysnap2 under hdfs://master:9000/input
[root@master hadoop]# ./bin/hdfs lsSnapshottableDir
drwxr-xr-x 0 root supergroup 0 2023-09-12 14:49 2 65536 /input
[root@master hadoop]# echo 222 > 2.txt
[root@master hadoop]# ./bin/hadoop fs -appendToFile 2.txt /input/1.txt
[root@master hadoop]# ./bin/hadoop fs -cat /input/1.txt
222
[root@master hadoop]# ./bin/hdfs dfs -createSnapshot /input mysnap3
Created snapshot /input/.snapshot/mysnap3
[root@master hadoop]# ./bin/hadoop fs -put 2.txt /input
[root@master hadoop]# ./bin/hdfs dfs -createSnapshot /input mysnap4
Created snapshot /input/.snapshot/mysnap4
[root@master hadoop]# ./bin/hdfs snapshotDiff /input mysnap2 mysnap4
Difference between snapshot mysnap2 and snapshot mysnap4 under directory /input:
M .
+ ./1.txt
+ ./2.txt
# Legend:
# + The file/directory has been created.
# - The file/directory has been deleted.
# M The file/directory has been modified.
# R The file/directory has been renamed.
[root@master hadoop]# ./bin/hdfs dfs -deleteSnapshot /input mysnap4
Deleted snapshot mysnap4 under hdfs://master:9000/input
# A directory that has snapshots cannot be deleted (not even with -rm -r), which to some extent also protects the data
[root@master hadoop]# ./bin/hadoop fs -rm -r /input
rm: Failed to move to trash: hdfs://master:9000/input: The directory /input cannot be deleted since /input is snapshottable and already has snapshots
The Java version is too new and is incompatible with parts of the current Hadoop code, which breaks some functionality. The error is shown below:
Downgrade Java. This environment used TencentJDK 17 (Oracle JDK was not used because the VM is a test machine and licensing was an issue); downgrade it to TencentJDK 8.
[root@node2 ~]# tar -xf /root/TencentKona8.0.15.b2_jdk_linux-x86_64_8u382.tar.gz -C /middleware/
[root@node2 ~]# cd /middleware/
# The JDK was deployed behind a symlink, so downgrading only requires updating the symlink
export JAVA_HOME=/middleware/jdk
export PATH=${JAVA_HOME}/bin:$PATH
export CLASSPATH=.:${JAVA_HOME}/lib
[root@node2 /middleware]# unlink jdk
[root@node2 /middleware]# ln -s TencentKona-8.0.15-382/ jdk
[root@node2 /middleware]# source /etc/profile
[root@node2 /middleware]# java -version
openjdk version "1.8.0_382"
OpenJDK Runtime Environment (Tencent Kona 8.0.15) (build 1.8.0_382-b2)
OpenJDK 64-Bit Server VM (Tencent Kona 8.0.15) (build 25.382-b2, mixed mode, sharing)
# Restart the services
[root@master hadoop]# ./sbin/stop-dfs.sh && ./sbin/start-dfs.sh
Log in again (or refresh the page) in the browser and the data is displayed correctly.
After uploading a file, using "Head the file" or "Tail the file" in the web UI fails to show the content, with the following error:
This happens because the browser is redirected to http://node2:9864/webhdfs/v1/aaa.txt, and the local machine cannot resolve the hostname node2. Fix it by adding the IP addresses of master, node1, and node2 to the local hosts file.
macOS: /etc/hosts
Windows: C:\Windows\System32\drivers\etc\hosts
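For example, append entries like the following (the master IP is the one used in this environment; replace the placeholders with the actual IPs of node1 and node2):
9.134.244.180 master
$ip2 node1
$ip3 node2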