Hadoop SecondaryNameNode: How It Works, Common Commands, and Common Settings

1. SNN (SecondaryNameNode)

1.1 How the SecondaryNameNode works

[Figure: SecondaryNameNode checkpoint workflow, original image 1606651196(1).jpg]

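1. When the SecondaryNameNode starts a checkpoint, the NameNode stops writing to the current edits files 515-516 and temporarily records read/write operations in a new edits file, 517.

2. The SecondaryNameNode downloads the NameNode's fsimage 514 and edits files 515-516 to its local disk.

3. The SecondaryNameNode loads fsimage 514 into memory, replays the contents of edits 515-516 in memory from start to finish, and creates a new fsimage file, 516.

4. The SecondaryNameNode pushes the new fsimage 516 to the NameNode.

5. The NameNode receives fsimage 516.ckpt and rolls it to fsimage 516. The new edits file 517.new is rolled to edits 517, which is now the latest edits file.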
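The numbering used in the five steps above can be traced with a small shell sketch. It only mirrors the bookkeeping of this example (fsimage 514, edits 515-516); the function name `checkpoint_plan` and its output wording are made up for illustration and are not part of Hadoop.

```shell
# Hypothetical helper: given the transaction ID covered by the current fsimage
# and the last transaction ID in the finalized edits, print what a checkpoint
# would merge and what it would produce.
checkpoint_plan() {
  cp_image=$1      # txid covered by the current fsimage, e.g. 514
  cp_last_edit=$2  # last txid in the finalized edits, e.g. 516
  echo "replay edits $((cp_image + 1))-${cp_last_edit} on top of fsimage ${cp_image}"
  echo "new fsimage ${cp_last_edit}"
  echo "namenode keeps writing to edits starting at $((cp_last_edit + 1))"
}

checkpoint_plan 514 516
```

With the values from the example, the three lines describe steps 3, 4 and 5: edits 515-516 are replayed, fsimage 516 is produced, and new operations land in edits 517.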

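1.2 Why the SecondaryNameNode is worth learning

The SNN workflow mostly comes up in interviews, but you should still understand it, because it gives you a basic grasp of how HDFS is implemented under the hood.

In production we do not use the SecondaryNameNode; we use HDFS HA (hot standby) instead.

There are two NameNodes:

an active NN and a standby NN (the hot standby).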

2. hadoop commands

2.1 Where the hadoop commands come from

[hadoop@ruozedata001 bin]$ ./hadoop

Usage: hadoop [--config confdir] COMMAND

       where COMMAND is one of:

  fs                   run a generic filesystem user client

  version              print the version

  jar <jar>            run a jar file

  checknative [-a|-h]  check native hadoop and compression libraries availability

  distcp <srcurl> <desturl> copy file or directories recursively

  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive

  classpath            prints the class path needed to get the

  credential           interact with credential providers

                       Hadoop jar and the required libraries

  daemonlog            get/set the log level for each daemon

  s3guard              manage data on S3

  trace                view and modify Hadoop tracing settings

 or

  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

2.2 Common compression formats in Hadoop

hadoop, zlib, snappy, lz4, bzip2, openssl (the entries reported by hadoop checknative)

2.3 Checking whether compression is supported

[hadoop@ruozedata001 bin]$ hadoop  checknative

20/11/28 20:51:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Native library checking:

hadoop:  false

zlib:    false

snappy:  false

lz4:     false

bzip2:   false

openssl: false
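To turn this check into something a script can act on, the output can be filtered for entries reported as `false`. This is a minimal sketch: `missing_native_libs` is a hypothetical helper, and it assumes the `name:  true|false` line layout shown in the transcript above.

```shell
# Reads `hadoop checknative` output on stdin and prints the names of the
# libraries reported as unavailable (lines of the form "name:  false").
missing_native_libs() {
  awk '$1 ~ /:$/ && $2 == "false" { sub(/:$/, "", $1); print $1 }'
}

# Demo on a few lines copied from the transcript above. On a real node the
# input would come from something like:
#   hadoop checknative 2>/dev/null | missing_native_libs
printf '%s\n' \
  'Native library checking:' \
  'hadoop:  false' \
  'zlib:    false' \
  'snappy:  false' | missing_native_libs
```

An empty result would mean every listed library was loaded; any name printed points at a library that still needs a native build.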

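Building Hadoop with native library support: https://blog.csdn.net/u010452388/article/details/99691421

The build involves Maven.

If a command or program throws an exception at run time, check the classpath: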

[hadoop@ruozedata001 bin]$ hadoop classpath

/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/etc/hadoop:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/yarn/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/yarn/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce/*:/home/hadoop/app/hadoop/contrib/capacity-scheduler/*.jar

[hadoop@ruozedata001 bin]$

http://cn.voidcc.com/question/p-tenieuea-bex.html

3. hdfs commands

3.1 Where the hdfs commands come from

[hadoop@ruozedata001 bin]$ ./hdfs

Usage: hdfs [--config confdir] COMMAND

       where COMMAND is one of:

  dfs                  run a filesystem command on the file systems supported in Hadoop.

  namenode -format     format the DFS filesystem

  secondarynamenode    run the DFS secondary namenode

  namenode             run the DFS namenode

  journalnode          run the DFS journalnode

  zkfc                 run the ZK Failover Controller daemon

  datanode             run a DFS datanode

  dfsadmin             run a DFS admin client

  diskbalancer         Distributes data evenly among disks on a given node

  haadmin              run a DFS HA admin client

  fsck                 run a DFS filesystem checking utility

  balancer             run a cluster balancing utility

  jmxget               get JMX exported values from NameNode or DataNode.

  mover                run a utility to move block replicas across

                       storage types

  oiv                  apply the offline fsimage viewer to an fsimage

  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage

  oev                  apply the offline edits viewer to an edits file

  fetchdt              fetch a delegation token from the NameNode

  getconf              get config values from configuration

  groups               get the groups which users belong to

  snapshotDiff         diff two snapshots of a directory or diff the

                       current directory contents with a snapshot

  lsSnapshottableDir   list all snapshottable dirs owned by the current user

                                                Use -help to see options

  portmap              run a portmap service

  nfs3                 run an NFS version 3 gateway

  cacheadmin           configure the HDFS cache

  crypto               configure HDFS encryption zones

  storagepolicies      list/get/set block storage policies

  version              print the version

Most commands print help when invoked w/o parameters.

[hadoop@ruozedata001 bin]$

3.2 A friendly tip

hadoop fs and hdfs dfs are equivalent.

The two scripts end up running exactly the same class.

In the hadoop script (hadoop fs):

    # the core commands

    if [ "$COMMAND" = "fs" ] ; then

      CLASS=org.apache.hadoop.fs.FsShell

In the hdfs script (hdfs dfs):

    elif [ "$COMMAND" = "dfs" ] ; then

      CLASS=org.apache.hadoop.fs.FsShell

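hdfs dfs commands:

Usage: hadoop fs [generic options]

    [-cat [-ignoreCrc] <src> ...]

    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]

    [-chown [-R] [OWNER][:[GROUP]] PATH...]

    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]  (equivalent to put)

    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]  (equivalent to get)

    [-put [-f] [-p] [-l] <localsrc> ... <dst>]

    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]

    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]

    [-du [-s] [-h] [-x] <path> ...]

    [-find <path> ... <expression> ...]

    [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]

    [-mkdir [-p] <path> ...]

    [-mv <src> ... <dst>]  【Not recommended in production: if something goes wrong during the move, the data may end up incomplete. Use cp instead, verify the copy, and only then delete the source.】

    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]  【-skipTrash is not recommended】

    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]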
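The copy, verify, delete sequence recommended for -mv above could be wrapped roughly as follows. This is a minimal sketch, not an established tool: `safe_move` is a made-up name, both paths are assumed to be plain files, and the script assumes that `hdfs dfs -checksum` prints the checksum as the last whitespace-separated field of its output.

```shell
# Hypothetical helper: copy src to dst, compare checksums, and delete the
# source only if they match. Deletion goes through the trash (no -skipTrash).
safe_move() {
  sm_src=$1
  sm_dst=$2
  hdfs dfs -cp "$sm_src" "$sm_dst" || return 1
  # Assumption: the checksum is the last field printed by -checksum.
  sm_src_sum=$(hdfs dfs -checksum "$sm_src" | awk '{print $NF}')
  sm_dst_sum=$(hdfs dfs -checksum "$sm_dst" | awk '{print $NF}')
  if [ -n "$sm_src_sum" ] && [ "$sm_src_sum" = "$sm_dst_sum" ]; then
    hdfs dfs -rm "$sm_src"
  else
    echo "checksum mismatch, keeping $sm_src" >&2
    return 1
  fi
}

# Example call on a cluster (paths are illustrative):
#   safe_move /1.log /backup/1.log
```

If the copy fails or the checksums differ, the source is left untouched, which is the point of preferring this sequence over a direct move.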

3.3 dfsadmin administrative commands

[hadoop@ruozedata001 bin]$ hdfs dfsadmin

Usage: hdfs dfsadmin

Note: Administrative commands can only be run as the HDFS superuser.

        [-report [-live] [-dead] [-decommissioning]]

        [-safemode <enter | leave | get | wait>]   【safe mode】

        [-saveNamespace]

        [-rollEdits]

        [-restoreFailedStorage true|false|check]

        [-refreshNodes]

        [-setQuota  ...]

        [-clrQuota ...]

        [-setSpaceQuota  ...]

        [-clrSpaceQuota ...]

        [-finalizeUpgrade]

        [-rollingUpgrade []]

        [-refreshServiceAcl]

        [-refreshUserToGroupsMappings]

        [-refreshSuperUserGroupsConfiguration]

        [-refreshCallQueue]

        [-refresh   [arg1..argn]

        [-reconfig   ]

        [-printTopology]

        [-refreshNamenodes datanode_host:ipc_port]

        [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]

        [-setBalancerBandwidth ]

        [-fetchImage ]

        [-allowSnapshot ]

        [-disallowSnapshot ]

        [-shutdownDatanode  [upgrade]]

        [-getDatanodeInfo ]

        [-metasave filename]

        [-triggerBlockReport [-incremental] ]

        [-listOpenFiles [-blockingDecommission] [-path ]]

        [-help [cmd]]

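3.4 Shell script wrapper: alerting on the HA failover state

Advanced-class topic: wrap the commands below in a shell script that reads the HA state and raises an alert when a failover has happened.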

[hadoop@ruozedata001 bin]$ hdfs haadmin

Usage: DFSHAAdmin [-ns <nameserviceId>]

[-transitionToActive <serviceId> [--forceactive]]

[-transitionToStandby <serviceId>]

[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]

[-getServiceState <serviceId>]

[-checkHealth <serviceId>]

getconf get config values from configuration
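One possible shape for such an alert script is sketched below. It is only an illustration: `ha_alert` is a hypothetical function, `nn1` and `nn2` are made-up service IDs, and the state strings are assumed to be `active` and `standby` as printed by `hdfs haadmin -getServiceState`.

```shell
# Reads lines of "<serviceId> <state>" on stdin and prints an alert when the
# NameNode expected to be active is not, or when no NameNode is active.
ha_alert() {
  ha_expected=$1
  awk -v want="$ha_expected" '
    $2 == "active" { active = $1 }
    END {
      if (active == "")        print "ALERT: no active NameNode"
      else if (active != want) print "ALERT: failover detected, active is " active
      else                     print "OK: " want " is active"
    }'
}

# On a real HA cluster the input could be collected like this
# (nn1 and nn2 are hypothetical service IDs):
#   for id in nn1 nn2; do
#     echo "$id $(hdfs haadmin -getServiceState "$id")"
#   done | ha_alert nn1
printf 'nn1 standby\nnn2 active\n' | ha_alert nn1
```

The ALERT line is where a mail or chat notification would be hooked in; the sketch only prints it.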

Health check

[hadoop@ruozedata001 bin]$ hdfs fsck /

20/11/28 21:14:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Connecting to namenode via http://ruozedata001:50070/fsck?ugi=hadoop&path=%2F

FSCK started by hadoop (auth:SIMPLE) from /192.168.0.3 for path / at Sat Nov 28 21:14:58 CST 2020

.

/1.log:  Under replicated BP-1245831-192.168.0.3-1605965291938:blk_1073741868_1044. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).

.

/2.log:  Under replicated BP-1245831-192.168.0.3-1605965291938:blk_1073741869_1045. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).

.....................................Status: HEALTHY

 Total size:    257432 B

 Total dirs:    19

 Total files:   39

 Total symlinks:                0

 Total blocks (validated):      37 (avg. block size 6957 B)

 Minimally replicated blocks:   37 (100.0 %)

 Over-replicated blocks:        0 (0.0 %)

 Under-replicated blocks:       2 (5.4054055 %)

 Mis-replicated blocks:         0 (0.0 %)

 Default replication factor:    2

 Average block replication:     1.0

 Corrupt blocks:                0

 Missing replicas:              2 (5.1282053 %)

 Number of data-nodes:          1

 Number of racks:               1

FSCK ended at Sat Nov 28 21:14:58 CST 2020 in 4 milliseconds

The filesystem under path '/' is HEALTHY

[hadoop@ruozedata001 bin]$
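For monitoring, the few fields that matter can be pulled out of the fsck report. The sketch below assumes the report layout shown in the transcript above; `fsck_summary` is a hypothetical helper name.

```shell
# Reads `hdfs fsck <path>` output on stdin and prints a one-line summary
# with the overall status, under-replicated block count and corrupt block count.
fsck_summary() {
  awk '
    /Under-replicated blocks:/   { under = $3 }
    /Corrupt blocks:/            { corrupt = $3 }
    /^The filesystem under path/ { status = $NF }
    END { print "status=" status " under_replicated=" under " corrupt=" corrupt }'
}

# Demo on lines copied from the report above. On a cluster:
#   hdfs fsck / 2>/dev/null | fsck_summary
printf '%s\n' \
  ' Under-replicated blocks:       2 (5.4054055 %)' \
  ' Corrupt blocks:                0' \
  "The filesystem under path '/' is HEALTHY" | fsck_summary
```

Here the two under-replicated blocks are expected: the replication factor is 2 but the cluster has a single DataNode.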

4. Safe mode

[hadoop@ruozedata001 bin]$ hdfs dfsadmin -safemode get 【start HDFS first】

20/11/28 21:37:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Safe mode is OFF

[hadoop@ruozedata001 bin]$

OFF: safe mode is disabled; reads and writes both work.

ON: safe mode is enabled; writes fail, reads still work.

[hadoop@ruozedata001 bin]$ hdfs dfs -put 3.log  /

20/11/28 21:39:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

put: Cannot create file/3.log._COPYING_. Name node is in safe mode.

[hadoop@ruozedata001 bin]$

[hadoop@ruozedata001 bin]$

[hadoop@ruozedata001 bin]$ hdfs dfs -ls /

20/11/28 21:40:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 6 items

-rw-r--r-- 2 hadoop supergroup 4 2020-11-25 22:31 /1.log

-rw-r--r-- 2 hadoop supergroup 4 2020-11-25 22:33 /2.log

drwxr-xr-x - hadoop supergroup 0 2020-11-28 19:09 /system

drwx------ - hadoop supergroup 0 2020-11-22 19:52 /tmp

drwxr-xr-x - hadoop supergroup 0 2020-11-21 21:50 /user

drwxr-xr-x - hadoop supergroup 0 2020-11-22 19:52 /wordcount

[hadoop@ruozedata001 bin]$ hdfs dfs -cat /1.log

20/11/28 21:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

123

[hadoop@ruozedata001 bin]$

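4.1 Entering safe mode passively

Sooner or later you will see the words "safe mode" in the HDFS logs. Do not panic:

it means your HDFS cluster has a problem and has put itself into a protective mode.

Usually you should first try to leave safe mode manually with hdfs dfsadmin -safemode leave 【try this first】

4.2 Entering safe mode deliberately for maintenance

During the maintenance window this guarantees that no new data enters HDFS.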
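A deliberate maintenance window can be scripted by entering safe mode, running the maintenance step, and leaving safe mode again. The following is a minimal sketch: `with_safemode` is a hypothetical wrapper, and it uses the -safemode option listed in section 3.3 with its enter and leave arguments.

```shell
# Hypothetical wrapper: run a maintenance command while HDFS is in safe mode,
# then leave safe mode again and report the command's exit status.
with_safemode() {
  hdfs dfsadmin -safemode enter || return 1
  "$@"
  ws_status=$?
  # Leave safe mode even if the maintenance command failed.
  hdfs dfsadmin -safemode leave || return 1
  return "$ws_status"
}

# Example call on a cluster (the maintenance script name is illustrative):
#   with_safemode ./nightly_maintenance.sh
```

Leaving safe mode regardless of the command's result avoids a cluster that silently stays read-only after a failed maintenance job.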

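5. Trash

5.1 Setting the trash retention time

In hdfs-site.xml (the property itself is defined in core-default.xml, so core-site.xml is the more usual place for it):

<property>
        <name>fs.trash.interval</name>
        <value>10080</value>
</property>

The value is in minutes: 7 × 24 × 60 = 10080, i.e. 7 days.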

[hadoop@ruozedata001 ~]$ hdfs dfs -rm /1.log

20/11/28 21:59:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

20/11/28 21:59:48 INFO fs.TrashPolicyDefault: Moved: 'hdfs://ruozedata001:9000/1.log' to trash at: hdfs://ruozedata001:9000/user/hadoop/.Trash/Current/1.log

hdfs://ruozedata001:9000/user/hadoop/.Trash/Current/1.log is the file's location in the trash.

[hadoop@ruozedata001 ~]$ hdfs dfs -rm -skipTrash /2.log

20/11/28 22:00:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Deleted /2.log

[hadoop@ruozedata001 ~]$

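【In production the trash must be enabled, with a retention period that is as long as practical, for example 7 days.】

【Whenever you delete something, never use -skipTrash. Let the file go to the trash, just in case.】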
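Because a deleted file is only moved, it can be restored by moving it back. The sketch below builds the trash location following the pattern in the log line above; `trash_path` is a hypothetical helper, and it assumes the file is still under `Current` (older trash contents may sit in a differently named checkpoint directory).

```shell
# Builds the trash location of a deleted HDFS path for a given user,
# following the /user/<user>/.Trash/Current/<path> pattern seen in the log.
trash_path() {
  echo "/user/$1/.Trash/Current$2"
}

trash_path hadoop /1.log
# To restore the file on a cluster, one could move it back:
#   hdfs dfs -mv "$(trash_path hadoop /1.log)" /1.log
```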

6. Balancing data across nodes

[hadoop@ruozedata001 sbin]$ sh ./start-balancer.sh

[hadoop@ruozedata001 sbin]$ cat /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-balancer-ruozedata001.log

2020-11-28 22:07:35,135 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: namenodes  = [hdfs://ruozedata001:9000]

2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]

2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: included nodes = []

2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: excluded nodes = []

2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: source nodes = []

2020-11-28 22:07:35,242 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2020-11-28 22:07:36,086 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)

2020-11-28 22:07:36,086 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)

2020-11-28 22:07:36,087 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)

2020-11-28 22:07:36,087 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)

2020-11-28 22:07:36,090 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)

2020-11-28 22:07:36,103 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.0.3:50010

2020-11-28 22:07:36,104 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 over-utilized: []

2020-11-28 22:07:36,104 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 underutilized: []

[hadoop@ruozedata001 sbin]$

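threshold = 10.0

The goal is for every node to satisfy: |node disk usage - average disk usage| < 10%

Average: (90% + 60% + 80%) / 3 = 230% / 3 ≈ 76%

Node 1: 90% - 76% = 14%, which is 4% over the threshold

Node 2: 60% - 76% = -16%, i.e. 16% below the average and outside the threshold

Node 3: 80% - 76% = 4%, within the threshold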
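The three-node example can be reproduced with integer arithmetic in the shell. This is a simplified illustration of the threshold rule only, not the balancer's actual classification logic; `balance_report` and its output wording are hypothetical.

```shell
# Usage: balance_report <threshold> <usage1> <usage2> ...
# Prints each node's deviation from the (integer) average usage and whether
# it falls outside the threshold.
balance_report() {
  br_threshold=$1
  shift
  br_sum=0
  for br_u in "$@"; do
    br_sum=$((br_sum + br_u))
  done
  br_avg=$((br_sum / $#))   # 230 / 3 = 76 with integer division
  for br_u in "$@"; do
    br_dev=$((br_u - br_avg))
    if [ "$br_dev" -gt "$br_threshold" ]; then
      br_state="over-utilized"
    elif [ "$br_dev" -lt "-$br_threshold" ]; then
      br_state="under-utilized"
    else
      br_state="balanced"
    fi
    echo "usage=${br_u}% avg=${br_avg}% deviation=${br_dev}% ${br_state}"
  done
}

balance_report 10 90 60 80
```

With a threshold of 10, the first node is flagged as over-utilized (+14%), the second as under-utilized (-16%), and the third as balanced (+4%).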

【In production, write a scheduled script that runs this every night during off-peak hours.】

./start-balancer.sh

Parameter dfs.datanode.balance.bandwidthPerSec: raise it from 10m to 50m.

It controls the bandwidth available to the balancing operation.
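The dfsadmin option -setBalancerBandwidth listed in section 3.3 takes the bandwidth in bytes per second, so a value such as 50m has to be converted first. A minimal sketch, with `mb_to_bytes` as a hypothetical helper and 1 MB taken as 1024 × 1024 bytes:

```shell
# Converts a bandwidth given in megabytes per second to bytes per second.
mb_to_bytes() {
  echo $(($1 * 1024 * 1024))
}

mb_to_bytes 50
# On a cluster the result could be passed to dfsadmin, for example:
#   hdfs dfsadmin -setBalancerBandwidth "$(mb_to_bytes 50)"
```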

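If production has only 3 machines and 3 replicas, is this scheduled script of any use? No: every block already has one replica on each of the three nodes, so there is nothing for the balancer to move.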

7. Balancing multiple disks on a single node

7.1 Configure hdfs-site.xml

<property>

        <name>dfs.datanode.data.dir</name>

        <value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>

</property>

/data01 100G

/data02 200G

/data03 490G

[hadoop@ruozedata001 sbin]$ hdfs diskbalancer

usage: hdfs diskbalancer [command] [options]

DiskBalancer distributes data evenly between different disks on a

datanode. DiskBalancer operates by generating a plan, that tells datanode

how to move data between disks. Users can execute a plan by submitting it

to the datanode.

To get specific help on a particular command please run

hdfs diskbalancer -help <command>.

    --help <arg>   valid commands are plan | execute | query | cancel |

                   report

[hadoop@ruozedata001 sbin]$

Apache Hadoop 2.x: not supported; dfs.disk.balancer.enabled does not appear in hdfs-default.xml.

https://hadoop.apache.org/docs/r2.10.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Apache Hadoop 3.x: supported; dfs.disk.balancer.enabled is present and defaults to true.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

CDH Hadoop 2.x: supported; dfs.disk.balancer.enabled is present and defaults to false.

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

How do you run it?

Documentation: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html

<property>

        <name>dfs.disk.balancer.enabled</name>

        <value>true</value>

</property>

[hadoop@ruozedata001 hadoop]$ hdfs diskbalancer -plan ruozedata001

20/11/28 22:37:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

20/11/28 22:37:02 INFO planner.GreedyPlanner: Starting plan for Node : ruozedata001:50020

20/11/28 22:37:02 INFO planner.GreedyPlanner: Compute Plan for Node : ruozedata001:50020 took 1 ms

20/11/28 22:37:03 INFO command.Command: No plan generated. DiskBalancing not needed for node: ruozedata001 threshold used: 10.0

hdfs diskbalancer -execute ruozedata001.plan.json    (executes the plan)

hdfs diskbalancer -query ruozedata001
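In a scheduled script, the execute step only makes sense when the plan step actually produced a plan. The sketch below checks for the "No plan generated" message seen in the transcript above; `plan_generated` is a hypothetical helper, and it is an assumption that this message is the reliable signal and that it may arrive on stderr (hence the `2>&1` in the usage comment).

```shell
# Reads `hdfs diskbalancer -plan <node>` output on stdin and succeeds only
# if the "No plan generated" message is absent.
plan_generated() {
  ! grep -q 'No plan generated'
}

# Demo on the message from the transcript above. On a cluster:
#   hdfs diskbalancer -plan ruozedata001 2>&1 | plan_generated
if printf '%s\n' 'INFO command.Command: No plan generated. DiskBalancing not needed for node: ruozedata001 threshold used: 10.0' | plan_generated; then
  echo "plan available: run -execute, then -query"
else
  echo "disks already balanced: nothing to execute"
fi
```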

In production

【In production, write a scheduled script that runs this every night during off-peak hours.】

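8. Summary: how to troubleshoot

1. Analyze the problem yourself first; you must find the log and the ERROR entries in it.

2. Search Baidu or Google.

3. Ask your teacher, your colleagues, or people in your chat groups.

4. Check the Apache issue tracker.

5. Import the source code into IDEA and debug it.

6. How to find the log files:

From the configuration file: for example, MySQL's my.cnf leads you to data/hostname.err

The logs folder under the current directory

/var/log

ps -ef, to inspect the process's command line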

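Homework:

1. Write up the SNN.

2. Go through the hadoop and hdfs commands.

3. Write up the four topics above.

4. Publish your notes on your blog.

5. Build Hadoop with compression support.

Reference material: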

https://www.bilibili.com/video/BV1Tb411c7nW

https://www.bilibili.com/video/BV1Dv411k7uJ
