In the previous two chapters we covered the three ways of installing Hadoop (Standalone mode / Pseudo-Distributed mode / Cluster mode). In this chapter we walk through some basic operations using the HDFS commands; the official reference is the Hadoop Shell Commands documentation.
It is assumed that a Hadoop cluster has already been installed and started. From the web UI we can see the directory tree of our HDFS file system.
The operations used most on any file system are create, delete, read, and update, together with the permission system; these have always been the basic commands for working with a file system. Their basic usage is shown below:
ls
# default file system (path only)
localhost:current Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:15:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 Sean supergroup 2 2019-03-25 11:55 /1.log
drwx------ - Sean supergroup 0 2019-03-25 12:11 /tmp
drwxr-xr-x - Sean supergroup 0 2019-03-25 13:16 /user
# full URI
localhost:current Sean$ hadoop fs -ls hdfs://localhost:9000/
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:16:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 Sean supergroup 2 2019-03-25 11:55 hdfs://localhost:9000/1.log
drwx------ - Sean supergroup 0 2019-03-25 12:11 hdfs://localhost:9000/tmp
drwxr-xr-x - Sean supergroup 0 2019-03-25 13:16 hdfs://localhost:9000/user
put
# upload a file
localhost:current Sean$ hadoop fs -put hello2019.sh /
# list the uploaded file
localhost:current Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 4 items
-rw-r--r-- 1 Sean supergroup 2 2019-03-25 11:55 /1.log
-rw-r--r-- 1 Sean supergroup 10 2019-03-30 16:19 /hello2019.sh
drwx------ - Sean supergroup 0 2019-03-25 12:11 /tmp
drwxr-xr-x - Sean supergroup 0 2019-03-25 13:16 /user
# manually concatenate the blocks (the original file can be reconstructed)
cat blk_1073741983 >> tmp.file
cat blk_1073741984 >> tmp.file
The default block size is 128 MB; a file larger than that is split into multiple blocks.
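To check the block size and block layout of a file yourself, two commands are handy (the path follows the earlier example and may differ on your machine):
# print block size (%o), replication factor (%r) and name (%n)
hadoop fs -stat "%o %r %n" /hello2019.sh
# list the blocks of the file and where their replicas are stored
hdfs fsck /hello2019.sh -files -blocks -locations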
cat
# view the file through hadoop
localhost:current Sean$ hadoop fs -cat /hello2019.sh
hello2019
# view the block file directly on the local file system
localhost:current Sean$ cat finalized/subdir0/subdir0/blk_1073741983
hello2019
localhost:current Sean$ pwd
/Users/Sean/Software/hadoop/current/tmp/dfs/data/current/BP-586017156-127.0.0.1-1553485799471/current
get
localhost:current Sean$ hadoop fs -get /hello2019.sh
localhost:current Sean$ ls
VERSION dfsUsed finalized hello2019.sh rbw
localhost:current Sean$ cat hello2019.sh
hello2019
mkdir
localhost:current Sean$ hadoop fs -mkdir -p /wordcount/input
localhost:current Sean$ hadoop fs -ls /wordcount
Found 1 items
drwxr-xr-x - Sean supergroup 0 2019-03-30 16:40 /wordcount/input
Running hadoop fs with no arguments prints all of the available file system commands.
localhost:mapreduce Sean$ hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
help
Prints the help manual for the commands.
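-help also accepts a command name to show the full usage of a single command, for example:
# show the detailed usage of the ls command
hadoop fs -help ls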
mkdir
Create a directory: hadoop fs -mkdir -p /abc/acc
moveFromLocal / moveToLocal
Move a file from local to HDFS (the local source file is deleted): hadoop fs -moveFromLocal abc.txt /
Move a file from HDFS to local (the HDFS source file is deleted): hadoop fs -moveToLocal /abc.txt .
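A minimal check that moveFromLocal really removes the local source (the file name here is made up for illustration):
echo test > move-me.txt
hadoop fs -moveFromLocal move-me.txt /
ls move-me.txt               # should now report 'No such file or directory'
hadoop fs -ls /move-me.txt   # the file lives on HDFS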
appendToFile
Append a local file to a file already on HDFS: hadoop fs -appendToFile abc.txt /hello2019.txt
localhost:mapreduce Sean$ echo xxoo >> hello.txt
localhost:mapreduce Sean$ hadoop fs -appendToFile hello.txt /hello2019.sh
localhost:mapreduce Sean$ hadoop fs -cat /hello2019.sh
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
hello2019
xxoo
cat
Display a file: hadoop fs -cat /hello2019.sh. If the file is long, page it with hadoop fs -cat /hello2019.sh | more, or use hadoop fs -tail /hello2019.sh.
tail
Show the end of a file: hadoop fs -tail /hello2019.sh
text
Print the content of a file as text: hadoop fs -text /hello2019.sh.
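Unlike cat, text can also decode inputs such as gzip-compressed files based on their extension. A small sketch (the file name is made up):
echo hello2019 > demo.txt && gzip demo.txt
hadoop fs -put demo.txt.gz /
hadoop fs -text /demo.txt.gz   # prints the decompressed content
hadoop fs -cat /demo.txt.gz    # would print the raw compressed bytes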
chgrp / chmod / chown
chgrp changes a file's group; chmod changes its permissions; chown changes its owner and group.
hadoop fs -chmod 666 /hello2019.txt
hadoop fs -chown someuser:somegrp /hello2019.txt
localhost:mapreduce Sean$ hadoop fs -chmod 777 /hello2019.sh
localhost:mapreduce Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 17:09:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r-- 1 Sean supergroup 2 2019-03-25 11:55 /1.log
-rwxrwxrwx 1 Sean supergroup 15 2019-03-30 16:55 /hello2019.sh
drwx------ - Sean supergroup 0 2019-03-25 12:11 /tmp
drwxr-xr-x - Sean supergroup 0 2019-03-25 13:16 /user
drwxr-xr-x - Sean supergroup 0 2019-03-30 16:43 /wordcount
# Hadoop has no built-in user accounts, so you can chown to a user and group that were never created.
localhost:mapreduce Sean$ hadoop fs -chown hellokitty:hello /hello2019.sh
localhost:mapreduce Sean$ hadoop fs -ls /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 17:10:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r-- 1 Sean supergroup 2 2019-03-25 11:55 /1.log
-rwxrwxrwx 1 hellokitty hello 15 2019-03-30 16:55 /hello2019.sh
drwx------ - Sean supergroup 0 2019-03-25 12:11 /tmp
drwxr-xr-x - Sean supergroup 0 2019-03-25 13:16 /user
drwxr-xr-x - Sean supergroup 0 2019-03-30 16:43 /wordcount
copyFromLocal / copyToLocal
Copy from local to HDFS; copy from HDFS to local.
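They behave like put and get. For example, assuming hello2019.sh is in the current local directory as in the earlier sections:
# local -> HDFS
hadoop fs -copyFromLocal hello2019.sh /copy-demo.sh
# HDFS -> local
hadoop fs -copyToLocal /copy-demo.sh ./copy-demo.sh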
cp
Copy within HDFS: hadoop fs -cp /hello2019.sh /a/hello2019.sh
mv
Move within HDFS: hadoop fs -mv /hello2019.sh /a/
get
Download to the local file system, similar to copyToLocal: hadoop fs -get /hello.sh
getmerge
Download and merge multiple files into one local file: hadoop fs -getmerge /wordcount/output/* hellomerge.sh
localhost:mapreduce Sean$ hadoop fs -getmerge /wordcount/output/* hellomerge.sh
localhost:mapreduce Sean$ cat hellomerge.sh
2019 1
able 1
cat 2
hello 1
kitty 1
pitty 2
put
Upload a file to HDFS, similar to copyFromLocal: hadoop fs -put hello2019.sh /
rm
Delete: hadoop fs -rm -r /hello2019.sh
# -r means recursive
localhost:mapreduce Sean$ hadoop fs -rm -r /1.log
Deleted /1.log
localhost:mapreduce Sean$ hadoop fs -ls /
Found 4 items
-rwxrwxrwx 1 hellokitty hello 15 2019-03-30 16:55 /hello2019.sh
drwx------ - Sean supergroup 0 2019-03-25 12:11 /tmp
drwxr-xr-x - Sean supergroup 0 2019-03-25 13:16 /user
drwxr-xr-x - Sean supergroup 0 2019-03-30 16:43 /wordcount
rmdir
Remove an empty directory: hadoop fs -rmdir /abbc
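rmdir only removes empty directories; a quick illustration (the empty directory name is made up):
hadoop fs -mkdir /empty-dir
hadoop fs -rmdir /empty-dir       # succeeds: the directory is empty
hadoop fs -rmdir /wordcount       # fails: the directory is not empty
hadoop fs -rmdir --ignore-fail-on-non-empty /wordcount   # suppresses the error; the directory is kept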
df
Report the capacity and usage of the file system: hadoop fs -df -h /
localhost:mapreduce Sean$ hadoop fs -df -h /
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Filesystem Size Used Available Use%
hdfs://localhost:9000 465.7 G 5.9 M 169.1 G 0%
du
hadoop fs -du -s -h /abc/d
# -s summarize, -h human-readable units
localhost:mapreduce Sean$ hadoop fs -du -s -h /wordcount/
86 /wordcount
localhost:mapreduce Sean$ hadoop fs -du -s -h hdfs://localhost:9000/*
15 hdfs://localhost:9000/hello2019.sh
4.7 M hdfs://localhost:9000/tmp
266.0 K hdfs://localhost:9000/user
86 hdfs://localhost:9000/wordcount
count
Count the directories, files, and bytes under a path: hadoop fs -count /aaa/
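The output has four columns: directory count, file count, content size in bytes, and the path name. For example:
hadoop fs -count -h /wordcount
# columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME  (-h prints sizes with units)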
setrep
Set the replication factor of a file.
localhost:mapreduce Sean$ hadoop fs -setrep 3 /wordcount/input/hello2019.sh
Replication 3 set: /wordcount/input/hello2019.sh
If the cluster has only 3 DataNodes but you set the replication factor to 10, you will not actually get 10 replicas. The value you set is only the replication factor recorded in the NameNode's metadata; the real number of replicas depends on how many DataNodes are available.
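One way to see the difference between the requested and the actual replication (assuming the single-DataNode setup used throughout this chapter):
# the second column of -ls is the replication factor recorded by the NameNode
hadoop fs -ls /wordcount/input
# fsck reports how many replicas actually exist and flags under-replicated blocks
hdfs fsck /wordcount/input/hello2019.sh -files -blocks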
To produce some output to inspect, run the wordcount example that ships with Hadoop:
localhost:mapreduce Sean$ hadoop jar hadoop-mapreduce-examples-2.7.5.jar wordcount /wordcount/input/ /wordcount/output
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/03/30 16:43:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/03/30 16:43:31 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/03/30 16:43:32 INFO input.FileInputFormat: Total input paths to process : 1
19/03/30 16:43:32 INFO mapreduce.JobSubmitter: number of splits:1
19/03/30 16:43:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553933297569_0001
19/03/30 16:43:33 INFO impl.YarnClientImpl: Submitted application application_1553933297569_0001
19/03/30 16:43:33 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1553933297569_0001/
19/03/30 16:43:33 INFO mapreduce.Job: Running job: job_1553933297569_0001
19/03/30 16:43:43 INFO mapreduce.Job: Job job_1553933297569_0001 running in uber mode : false
19/03/30 16:43:43 INFO mapreduce.Job: map 0% reduce 0%
19/03/30 16:43:48 INFO mapreduce.Job: map 100% reduce 0%
19/03/30 16:43:54 INFO mapreduce.Job: map 100% reduce 100%
19/03/30 16:43:54 INFO mapreduce.Job: Job job_1553933297569_0001 completed successfully
19/03/30 16:43:54 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=74
FILE: Number of bytes written=243693
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=157
HDFS: Number of bytes written=44
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3271
Total time spent by all reduces in occupied slots (ms)=3441
Total time spent by all map tasks (ms)=3271
Total time spent by all reduce tasks (ms)=3441
Total vcore-milliseconds taken by all map tasks=3271
Total vcore-milliseconds taken by all reduce tasks=3441
Total megabyte-milliseconds taken by all map tasks=3349504
Total megabyte-milliseconds taken by all reduce tasks=3523584
Map-Reduce Framework
Map input records=7
Map output records=8
Map output bytes=74
Map output materialized bytes=74
Input split bytes=115
Combine input records=8
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=74
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=123
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=306184192
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=42
File Output Format Counters
Bytes Written=44
The output directory must not already exist, otherwise the job fails. After this test run, the output directory contains two files: _SUCCESS and part-r-00000. The first is a marker indicating the job completed successfully; the second holds the actual results.
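Listing the output directory should show both files:
hadoop fs -ls /wordcount/output
# expected entries: /wordcount/output/_SUCCESS and /wordcount/output/part-r-00000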
localhost:mapreduce Sean$ hadoop fs -cat /wordcount/output/part-r-00000
2019 1
able 1
cat 2
hello 1
kitty 1
pitty 2
localhost:mapreduce Sean$ ls
hadoop-mapreduce-client-app-2.7.5.jar hadoop-mapreduce-client-hs-plugins-2.7.5.jar hadoop-mapreduce-examples-2.7.5.jar
hadoop-mapreduce-client-common-2.7.5.jar hadoop-mapreduce-client-jobclient-2.7.5-tests.jar lib
hadoop-mapreduce-client-core-2.7.5.jar hadoop-mapreduce-client-jobclient-2.7.5.jar lib-examples
hadoop-mapreduce-client-hs-2.7.5.jar hadoop-mapreduce-client-shuffle-2.7.5.jar sources
This directory contains many more example programs that you can explore on your own.
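Running the jar without arguments prints the list of bundled example programs, and any of them can be launched the same way as wordcount. For instance:
# list the available example programs (wordcount, pi, grep, ...)
hadoop jar hadoop-mapreduce-examples-2.7.5.jar
# estimate pi with 2 map tasks and 5 samples per map
hadoop jar hadoop-mapreduce-examples-2.7.5.jar pi 2 5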
Files stored in HDFS are ultimately stored on local disks; HDFS is simply a distributed file system layered on top of them. Let's look at where the file we stored earlier actually lives. Note that on my machine the DataNode and the NameNode are installed on the same host, so the directory tree looks like this:
localhost:tmp Sean$ tree
.
├── dfs
│ ├── data
│ │ ├── current
│ │ │ ├── BP-586017156-127.0.0.1-1553485799471
│ │ │ │ ├── current
│ │ │ │ │ ├── VERSION
│ │ │ │ │ ├── dfsUsed
│ │ │ │ │ ├── finalized
│ │ │ │ │ │ └── subdir0
│ │ │ │ │ │ └── subdir0
│ │ │ │ │ │ ├── blk_1073741825
│ │ │ │ │ │ ├── blk_1073741825_1001.meta
│ │ │ │ │ └── rbw
│ │ │ │ ├── scanner.cursor
│ │ │ │ └── tmp
│ │ │ └── VERSION
│ │ └── in_use.lock
│ ├── name
│ │ ├── current
│ │ │ ├── VERSION
│ │ │ ├── edits_0000000000000000001-0000000000000000118
│ │ │ ├── edits_inprogress_0000000000000001233
│ │ │ ├── fsimage_0000000000000001230
│ │ │ ├── fsimage_0000000000000001230.md5
│ │ │ ├── fsimage_0000000000000001232
│ │ │ ├── fsimage_0000000000000001232.md5
│ │ │ └── seen_txid
│ │ └── in_use.lock
│ └── namesecondary
│ ├── current
│ │ ├── VERSION
│ │ ├── edits_0000000000000000001-0000000000000000118
│ │ ├── edits_0000000000000000119-0000000000000000943
│ │ ├── edits_0000000000000001231-0000000000000001232
│ │ ├── fsimage_0000000000000001230
│ │ ├── fsimage_0000000000000001230.md5
│ │ ├── fsimage_0000000000000001232
│ │ └── fsimage_0000000000000001232.md5
│ └── in_use.lock
└── nm-local-dir
├── filecache
├── nmPrivate
└── usercache
hdfs dfsadmin -report
Check the cluster status.
localhost:Desktop Sean$ hdfs dfsadmin -report
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
19/04/03 15:51:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 500068036608 (465.72 GB)
Present Capacity: 182055092460 (169.55 GB)
DFS Remaining: 182048903168 (169.55 GB)
DFS Used: 6189292 (5.90 MB)
DFS Used%: 0.00%
Under replicated blocks: 27
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 500068036608 (465.72 GB)
DFS Used: 6189292 (5.90 MB)
Non DFS Used: 313001643796 (291.51 GB)
DFS Remaining: 182048903168 (169.55 GB)
DFS Used%: 0.00%
DFS Remaining%: 36.40%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Apr 03 15:51:21 CST 2019
[1]. Hadoop Shell Commands
[2]. An introduction to the hadoop and hdfs commands in Hadoop
[3]. Hadoop Study Notes 4: Common HDFS Commands