HDFS stays in safe mode because reported blocks have not reached 0.9990 of total blocks

After a node failure and an HDFS restart, the NameNode reports the following in its log:

“The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.”

Why does this happen, and how can it be fixed?

On why the NameNode stays in safe mode:

At startup time, the namenode reads its namespace from disk (the
FSImage and edits files). This includes all the HDFS filenames and
block lists that it should know, but not the mappings of block
replicas to datanodes. Then it waits in safe mode for all or most of
the datanodes to send their Initial Block Reports, which let the
namenode build its map of which blocks have replicas in which
datanodes. It keeps waiting until dfs.namenode.safemode.threshold-pct
of the blocks that it knows about from FSImage have been reported from
at least dfs.namenode.replication.min (default 1) datanodes [so that’s
a third config parameter I didn’t mention earlier]. If this threshold
is achieved, it will post a log that it is ready to leave safe mode,
wait for dfs.namenode.safemode.extension seconds, then automatically
leave safe mode and generate replication requests for any
under-replicated blocks (by default, those with replication < 3).

and

If it doesn’t reach the “safe replication for all known blocks”
threshold, then it will not leave safe mode automatically. It logs
the condition and waits for an admin to decide what to do, because
generally it means whole datanodes or sets of datanodes did not come
up or are not able to communicate with the namenode. Hadoop wants a
human to look at the situation before hadoop starts trying to madly
generate re-replication commands for under-replicated blocks, and
deleting blocks with zero replicas available.

By Matthew Foley.
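
The figures in the log message can be checked against the threshold rule quoted above. The following is a minimal sketch, not the NameNode's actual code: the function name `additional_blocks_needed` is hypothetical, and it assumes the reported count must be strictly greater than floor(threshold × total blocks), which is one rounding that reproduces the logged value of 5071. With 1968810 of 1975856 blocks reported, only about 99.64% of blocks are accounted for, short of the 99.90% threshold.

```python
from decimal import Decimal


def additional_blocks_needed(reported: int, total: int, threshold: str) -> int:
    """Blocks still missing before the reported count exceeds threshold * total.

    Assumption: the reported count must be strictly greater than
    floor(threshold * total); the NameNode's exact rounding may differ.
    """
    # Decimal avoids binary floating-point error in threshold * total.
    floor_target = int(Decimal(threshold) * total)
    return max(0, floor_target + 1 - reported)


# Values taken from the log message in the question.
needed = additional_blocks_needed(reported=1968810, total=1975856, threshold="0.9990")
print(needed)  # 5071, matching the log message
```

The threshold is passed as a string so that `Decimal` sees exactly 0.9990 rather than its nearest binary float.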

If you are sure that the missing blocks will never be reported, you can force the NameNode to leave safe mode with:

hdfs dfsadmin -safemode leave

You may then run hdfs fsck <path> -move (which moves corrupted files to /lost+found) or hdfs fsck <path> -delete (which deletes them), but only if you are sure you will not need the affected files any more.
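
Before forcing the NameNode out of safe mode, it can help to confirm its current state from a script. The sketch below wraps hdfs dfsadmin -safemode get, which prints a line such as "Safe mode is ON" or "Safe mode is OFF". The function names `safemode_is_on`, `safemode_get_output`, and `report_safemode` are hypothetical, and the sketch assumes the hdfs command is on the PATH of the user running it.

```python
import subprocess


def safemode_is_on(output: str) -> bool:
    """Interpret the text printed by `hdfs dfsadmin -safemode get`."""
    return "safe mode is on" in output.lower()


def safemode_get_output() -> str:
    """Run `hdfs dfsadmin -safemode get` and return what it printed."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with an error
    )
    return result.stdout


def report_safemode() -> None:
    """Print a short hint based on the current safe mode state."""
    if safemode_is_on(safemode_get_output()):
        print("NameNode is in safe mode; check the DataNodes before forcing it out.")
    else:
        print("NameNode has left safe mode.")
```

Calling `report_safemode()` on a cluster node prints one of the two messages. The script deliberately does not run the leave command itself, since the quoted explanation above stresses that a person should look at the situation first.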

Answered by Eric Z Ma.

Source: https://www.systutorials.com/hdfs-stays-in-safe-mode-because-of-reported-blocks-not-reaching-0-9990-of-total-blocks/
