4 HDFS Common Commands 2018-05-24

1. The jps command: view processes

[hadoop@hadoop003 ~]$ jps
2034 NameNode
2148 DataNode
2633 NodeManager
5129 Jps
2521 ResourceManager
2364 SecondaryNameNode

View processes with their full class names:

[hadoop@hadoop003 ~]$ jps -l
2034 org.apache.hadoop.hdfs.server.namenode.NameNode
5170 sun.tools.jps.Jps
2148 org.apache.hadoop.hdfs.server.datanode.DataNode
2633 org.apache.hadoop.yarn.server.nodemanager.NodeManager
2521 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
2364 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode

View details for a specific process (here PID 2148, the DataNode):

[hadoop@hadoop003 ~]$ ps -el|grep 2148
0 S   515  2148     1  0  80   0 - 690720 futex_ ?       00:00:42 java

If jps still lists a process that has already exited (a stale entry), use jps plus ps to confirm the process is really gone, then delete the stale PID file that jps reads from /tmp/hsperfdata_<username>:

[hadoop@hadoop003 ~]$ rm -f /tmp/hsperfdata_hadoop/2148
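A minimal sketch of the check before removing anything (2148 is the DataNode PID from the jps output above; grep -v grep drops the grep process itself from the result):

[hadoop@hadoop003 ~]$ ps -ef | grep 2148 | grep -v grep
# No output here means the process is truly gone and the jps entry is stale,
# so the corresponding file under /tmp/hsperfdata_hadoop can be removed safely.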

2. hadoop and hdfs filesystem commands

hdfs dfs is equivalent to hadoop fs.
List the contents of the HDFS root directory:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-05-23 12:27 /lizhigangdir001
drwx------   - hadoop supergroup          0 2018-05-17 12:33 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-05-17 12:33 /user
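As a quick illustration of the equivalence noted above, the hadoop fs form produces the same listing:

[hadoop@hadoop003 hadoop]$ bin/hadoop fs -ls /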

Create a directory (-p also creates any missing parent directories):

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -mkdir -p /lizhigangdir001/001

Create a local file and write 000000 into it:

[hadoop@hadoop003 hadoop]$ echo "000000">lizhigang.log

Upload lizhigang.log to /lizhigangdir001/001/:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -put lizhigang.log /lizhigangdir001/001

View the contents of /lizhigangdir001/001/lizhigang.log:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -cat /lizhigangdir001/001/lizhigang.log

Download lizhigang.log from /lizhigangdir001/001 to the local /tmp/ directory:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -get /lizhigangdir001/001/lizhigang.log /tmp/

Download lizhigang.log from /lizhigangdir001/001 to /tmp/, renaming it lizhigang1.log:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -get /lizhigangdir001/001/lizhigang.log /tmp/lizhigang1.log

[-moveFromLocal <localsrc> ... <dst>]   upload (the local source is deleted after the copy)
[-moveToLocal <src> <localdst>]         download
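A minimal -moveFromLocal sketch (lizhigang2.log is a hypothetical local file used only for illustration); unlike -put, the local source is removed after the upload:

[hadoop@hadoop003 hadoop]$ echo "111111" > lizhigang2.log
[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -moveFromLocal lizhigang2.log /lizhigangdir001/001
[hadoop@hadoop003 hadoop]$ ls lizhigang2.log    # the local copy is gone after the move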

Delete lizhigang.log from /lizhigangdir001/001; the deleted file is moved to the trash (when fs.trash.interval is enabled) and can be recovered within the retention interval:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -rm -f /lizhigangdir001/001/lizhigang.log

Delete lizhigang.log from /lizhigangdir001/001, skipping the trash; the file cannot be recovered:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -rm -f -skipTrash /lizhigangdir001/001/lizhigang.log
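With the trash enabled, a file removed without -skipTrash can be restored from the trash directory; a minimal sketch, assuming the default trash location under the user's HDFS home directory:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -ls /user/hadoop/.Trash/Current/lizhigangdir001/001
[hadoop@hadoop003 hadoop]$ bin/hdfs dfs -mv /user/hadoop/.Trash/Current/lizhigangdir001/001/lizhigang.log /lizhigangdir001/001/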

3. hdfs dfsadmin
View the cluster storage report:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfsadmin -report
Configured Capacity: 39900024832 (37.16 GB)
Present Capacity: 23986335744 (22.34 GB)
DFS Remaining: 23986073600 (22.34 GB)
DFS Used: 262144 (256 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.137.200:50010 (hadoop003)
Hostname: hadoop003
Decommission Status : Normal
Configured Capacity: 39900024832 (37.16 GB)
DFS Used: 262144 (256 KB)
Non DFS Used: 13886844928 (12.93 GB)
DFS Remaining: 23986073600 (22.34 GB)
DFS Used%: 0.00%
DFS Remaining%: 60.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri May 25 09:38:02 CST 2018
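For quick monitoring, the report can be filtered from a script; a minimal sketch that keeps only the usage and live-node lines:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfsadmin -report | grep -E "DFS Used%|DFS Remaining|Live datanodes"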

Safe mode: enter, leave, get (query the current state), wait (block until safe mode is left):

[hadoop@hadoop003 hadoop]$ bin/hdfs dfsadmin -safemode [ enter | leave | get | wait ]
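A minimal usage sketch: query the current state, enter safe mode before maintenance (HDFS becomes read-only), and leave it afterwards:

[hadoop@hadoop003 hadoop]$ bin/hdfs dfsadmin -safemode get
[hadoop@hadoop003 hadoop]$ bin/hdfs dfsadmin -safemode enter
[hadoop@hadoop003 hadoop]$ bin/hdfs dfsadmin -safemode leave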

4. hadoop fsck
Check the health of the entire filesystem:

[hadoop@hadoop003 hadoop]$ bin/hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://hadoop003:50070/fsck?ugi=hadoop&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.137.200 for path / at Fri May 25 09:45:02 CST 2018
...Status: HEALTHY
 Total size:    194589 B
 Total dirs:    13
 Total files:   3
 Total symlinks:                0
 Total blocks (validated):      3 (avg. block size 64863 B)
 Minimally replicated blocks:   3 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Fri May 25 09:45:02 CST 2018 in 7 milliseconds
The filesystem under path '/' is HEALTHY
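fsck can also target a specific path and print per-file detail; a minimal sketch using the standard -files, -blocks, and -locations options (the hdfs fsck form avoids the deprecation warning shown above):

[hadoop@hadoop003 hadoop]$ bin/hdfs fsck /lizhigangdir001 -files -blocks -locations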

Print the Hadoop classpath:

[hadoop@hadoop003 hadoop]$ bin/hadoop classpath
/opt/software/hadoop-2.8.1/etc/hadoop:/opt/software/hadoop-2.8.1/share/hadoop/common/lib/*:/opt/software/hadoop-2.8.1/share/hadoop/common/*:/opt/software/hadoop-2.8.1/share/hadoop/hdfs:/opt/software/hadoop-2.8.1/share/hadoop/hdfs/lib/*:/opt/software/hadoop-2.8.1/share/hadoop/hdfs/*:/opt/software/hadoop-2.8.1/share/hadoop/yarn/lib/*:/opt/software/hadoop-2.8.1/share/hadoop/yarn/*:/opt/software/hadoop-2.8.1/share/hadoop/mapreduce/lib/*:/opt/software/hadoop-2.8.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
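A common use of this output is building the classpath for a client program; a minimal sketch (MyHdfsClient is a hypothetical class used only for illustration):

[hadoop@hadoop003 hadoop]$ export CLASSPATH=$(bin/hadoop classpath)
[hadoop@hadoop003 hadoop]$ java -cp "$CLASSPATH:." MyHdfsClient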

5. start-balancer.sh
1. Disk usage is unevenly distributed across multiple machines?
Solutions:
1.1 Without adding new machines, rebalance the existing DataNodes:

       [hadoop@rzdatahadoop002 ~]$ hdfs dfsadmin -setBalancerBandwidth 52428800
       Balancer bandwidth is set to 52428800
       [hadoop@rzdatahadoop002 ~]$ 
       [hadoop@rzdatahadoop002 sbin]$ ./start-balancer.sh
       which is equivalent to
       [hadoop@rzdatahadoop002 sbin]$ hdfs balancer
       Apache Hadoop cluster: schedule it with a shell script every night during the business low-traffic window (see the cron sketch below)
       CDH cluster: can be skipped
       http://blog.itpub.net/30089851/viewspace-2052138/
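A minimal scheduling sketch for the Apache Hadoop case above, assuming a nightly low-traffic window at 01:00; the -threshold value and the log path are illustrative:

       [hadoop@rzdatahadoop002 ~]$ crontab -e
       # run the balancer at 01:00 every night; -threshold 10 means DataNodes should end up
       # within 10% of the cluster's average utilization
       0 1 * * * /opt/software/hadoop-2.8.1/sbin/start-balancer.sh -threshold 10 >> /tmp/balancer.log 2>&1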

1.2 When adding new machines: the existing machines' disks are, say, 450G used out of 500G, while the new machines come with 5T disks. During the business low-traffic window, first add the new machines to HDFS as DataNodes; then decommission the old DataNodes one at a time and wait for HDFS to repair the blocks back to 3 replicas (this is the most network- and IO-intensive step, and also the riskiest).

2. Multiple disks on a single machine are unevenly used?
2.1 Whether or not new disks are added, when the disks on one node are unbalanced, use the HDFS Disk Balancer (see the follow-up sketch after the commands below):

    https://hadoop.apache.org/docs/r3.0.0-alpha2/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html
    hdfs diskbalancer -plan node1.mycluster.com
    hdfs diskbalancer -execute /system/diskbalancer/nodename.plan.json

    Available in Apache Hadoop 3.x and CDH 5.12+.
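After -execute, progress can be checked and a run can be aborted; a minimal sketch using the -query and -cancel options (the hostname and plan file are the same examples as above):

    hdfs diskbalancer -query node1.mycluster.com
    hdfs diskbalancer -cancel /system/diskbalancer/nodename.plan.json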
