This post is part of the 【大数据技术●降龙十八掌】 series of articles; see the series index 大数据技术●降龙十八掌 for the full list.
You can run hadoop fs -help to see the documentation for the HDFS shell commands. Most of them behave much like their Linux shell counterparts.
The general command format is:
bin/hadoop command [genericOptions] [commandOptions]
where command is the subcommand, genericOptions are the generic options, and commandOptions are the options specific to that subcommand.
The individual shell commands are analyzed in detail below.
-appendToFile <localsrc> ... <dst> : Appends the contents of all the given local files to the
given dst file. The dst file will be created if it does
not exist. If <localsrc> is -, then the input is read
from stdin.
appendToFile appends local files to a file on HDFS. The leading arguments are local file paths and the final argument is the HDFS file path; if the target HDFS file does not exist, it is created first. Several local source paths may be given, so multiple local files can be appended into one HDFS file in a single command.
Example:
hadoop fs -appendToFile ~/myfiles/file1 ~/myfiles/file2 /myhdfsfiles/hdfs1
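The help text above notes that when the local source is -, input is read from stdin. A hedged sketch: the hadoop line is commented and reuses the HDFS path from the example above, while cat stands in for the hadoop client so the stdin-pipe mechanics can be checked without a cluster (the scratch file name is arbitrary):

```shell
# Append a line produced on stdin to an HDFS file; '-' tells
# appendToFile to read from standard input instead of a local file:
#   echo "one more line" | hadoop fs -appendToFile - /myhdfsfiles/hdfs1
# The same stdin-pipe mechanics, demonstrated locally with cat:
echo "one more line" | cat - >> /tmp/appendToFile_demo.txt
tail -n 1 /tmp/appendToFile_demo.txt
rm -f /tmp/appendToFile_demo.txt
```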
-cat [-ignoreCrc] <src> ... : Fetch all files that match the file pattern <src>
and display their content on stdout.
Displays the HDFS files at the given paths on stdout; several files can be shown at once, and <src> may be a glob pattern.
With the -ignoreCrc option, a file is displayed even if its CRC check fails (checksum verification is skipped).
Example:
hadoop fs -cat hdfs://ClusterTest/myhdfsfiles/hdfs1
hdfs://ClusterTest/myhdfsfiles/wc.txt
hadoop fs -cat -ignoreCrc hdfs://ClusterTest/myhdfsfiles/*.txt
-checksum <src> ... : Dump checksum information for files that match the file
pattern to stdout. Note that this requires a round-trip
to a datanode storing each block of the file, and thus is not
efficient to run on a large number of files. The checksum of a
file depends on its content, block size and the checksum
algorithm and parameters used for creating the file.
Prints the checksum information for the files.
Example:
hadoop fs -checksum hdfs://ClusterTest/myhdfsfiles/wc.txt
-chgrp [-R] GROUP PATH... : This is equivalent to -chown ... :GROUP ...
Changes the group a file belongs to. With -R, the change is applied recursively through the directory tree. The caller must be the file's owner or a superuser.
Example:
hadoop fs -chgrp -R centosg hdfs://ClusterTest/myhdfsfiles/wc.txt
-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH... : Changes permissions of a file.
This works similar to shell's chmod with a few exceptions.
-R modifies the files recursively. This is the only option
currently supported.
MODE Mode is same as mode used for chmod shell command.
Only letters recognized are 'rwxXt'. E.g. +t,a+r,g-w,+rwx,o=r
OCTALMODE Mode specified in 3 or 4 digits. If 4 digits, the first may
be 1 or 0 to turn the sticky bit on or off, respectively. Unlike the
shell command, it is not possible to specify only part of the mode.
E.g. 754 is same as u=rwx,g=rx,o=r
If none of 'augo' is specified, 'a' is assumed and unlike
shell command, no umask is applied.
Changes the permissions of a file. With -R, the change is applied recursively through the directory tree. The caller must be the file's owner or a superuser.
Example:
hadoop fs -chmod -R 777 hdfs://ClusterTest/myhdfsfiles
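Because hadoop fs -chmod mirrors the shell's chmod, the octal-to-symbolic mapping in the help text (e.g. 754 is the same as u=rwx,g=rx,o=r) can be checked against the local chmod. A minimal local sketch:

```shell
# Octal mode 754 sets u=rwx, g=rx, o=r, exactly as the help text states.
tmp=$(mktemp)
chmod 754 "$tmp"
ls -l "$tmp" | cut -c1-10   # -rwxr-xr--
rm -f "$tmp"
```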
-chown [-R] [OWNER][:[GROUP]] PATH...: Changes owner and group of a file.
This is similar to shell's chown with a few exceptions.
-R modifies the files recursively. This is the only option
currently supported.
If only owner or group is specified then only owner or
group is modified.
The owner and group names may only consist of digits, alphabet,
and any of [-_./@a-zA-Z0-9]. The names are case sensitive.
WARNING: Avoid using '.' to separate user name and group though
Linux allows it. If user names have dots in them and you are
using local file system, you might see surprising results since
shell command 'chown' is used for local files.
Changes the owner of a file. With -R, the change is applied recursively through the directory tree. The caller must be the file's owner or a superuser.
Example:
hadoop fs -chown -R centos:supergroup hdfs://ClusterTest/myhdfsfiles
-copyFromLocal [-f] [-p] <localsrc> ... <dst> : Identical to the -put command.
Copies local files to HDFS.
Example:
hadoop fs -copyFromLocal ~/myfiles/* hdfs://ClusterTest/myhdfsfiles
-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst> : Identical to the -get command.
Copies files from HDFS to the local filesystem.
Example:
hadoop fs -copyToLocal hdfs://ClusterTest/myhdfsfiles/wc.txt ~/wx.txt.b
-count [-q] <path> ... : Count the number of directories, files and bytes under the paths
that match the specified file pattern. The output columns are:
DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME or
QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA
DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME
Counts the number of directories, files, and bytes under the paths matching the given file pattern.
Output columns with -count: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
(number of directories, number of files, total size, path name).
Output columns with -count -q: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
(name quota, remaining name quota, space quota, remaining space quota, number of directories, number of files, total size, path name).
Examples:
hadoop fs -count hdfs://ClusterTest/
hadoop fs -count -q hdfs://ClusterTest/
-cp [-f] [-p] <src> ... <dst> : Copy files that match the file pattern <src> to a
destination. When copying multiple files, the destination
must be a directory. Passing -p preserves access and
modification times, ownership and the mode. Passing -f
overwrites the destination if it already exists.
This command can copy multiple files into a directory.
-f overwrites the destination if it already exists.
-p preserves file attributes [topax] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no argument, it preserves timestamps, ownership, and permission. If -pa is specified, permission is also preserved, because an ACL is a superset of the permission bits. Whether raw namespace extended attributes are preserved is determined by whether -p is used.
Example:
hadoop fs -cp -f hdfs://ClusterTest/myhdfsfiles/* hdfs://ClusterTest/bk
-createSnapshot <snapshotDir> [<snapshotName>] : Create a snapshot on a directory
Creates a snapshot of a directory.
-deleteSnapshot <snapshotDir> <snapshotName> : Delete a snapshot from a directory
Deletes a snapshot of a directory.
-df [-h] [<path> ...] : Shows the capacity, free and used space of the filesystem.
If the filesystem has multiple partitions, and no path to a
particular partition is specified, then the status of the root
partitions will be shown.
-h Formats the sizes of files in a human-readable fashion
rather than a number of bytes.
Shows the capacity, free space, and used space of the filesystem; the -h option makes the sizes human-readable.
Example:
hadoop fs -df -h hdfs://ClusterTest/myhdfsfiles
-du [-s] [-h] <path> ... : Show the amount of space, in bytes, used by the files that
match the specified file pattern. The following flags are optional:
-s Rather than showing the size of each individual file that
matches the pattern, shows the total (summary) size.
-h Formats the sizes of files in a human-readable fashion
rather than a number of bytes.
Note that, even without the -s option, this only shows size summaries
one level deep into a directory.
The output is in the form
size name(full path)
Shows the sizes of the files and subdirectories contained in the given directory, or the file's size if the path is a file.
-s shows an aggregate summary of file lengths rather than listing each file.
-h formats the sizes to be human-readable.
Examples:
hadoop fs -du -h hdfs://ClusterTest
hadoop fs -du -h -s hdfs://ClusterTest/
-expunge: Empty the Trash
Empties the trash.
Example:
hadoop fs -expunge
-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst> : Copy files that match the file pattern <src>
to the local name. <src> is kept. When copying multiple
files, the destination must be a directory. Passing
-p preserves access and modification times,
ownership and the mode.
Copies HDFS files to the local filesystem.
-ignoreCrc also copies files whose CRC check fails.
-crc copies the files along with their CRC checksum files.
-p preserves file attributes.
Example:
hadoop fs -get -ignoreCrc -crc hdfs://ClusterTest/myhdfsfiles/file1 ~
-getfacl [-R] <path> : Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.
-R: List the ACLs of all files and directories recursively.
<path>: File or directory to list.
Displays the ACLs of files and directories. If a directory has a default ACL, getfacl also displays the default ACL.
-R: recursively lists the ACLs of all files and directories.
Example:
hadoop fs -getfacl -R hdfs://ClusterTest/myhdfsfiles/
-getmerge [-nl] <src> <localdst> : Get all the files in the directories that
match the source file pattern and merge and sort them to only
one file on local fs. <src> is kept.
-nl Add a newline character at the end of each file.
Merges the source files matching the pattern into a single file on the local filesystem.
-nl adds a newline character at the end of each file.
Example:
hadoop fs -getmerge hdfs://ClusterTest/myhdfsfiles/file* ~/files
-help [cmd ...]: Displays help for given command or all commands if none is specified.
Displays help for the given command, or for all commands if none is specified.
Examples:
hadoop fs -help
hadoop fs -help text
-ls [-d] [-h] [-R] [<path> ...] : List the contents that match the specified file pattern. If
path is not specified, the contents of /user/<currentUser>
will be listed. Directory entries are of the form
permissions - userid groupid size_of_directory(in bytes) modification_date(yyyy-MM-dd HH:mm) directoryName
and file entries are of the form
permissions number_of_replicas userid groupid size_of_file(in bytes) modification_date(yyyy-MM-dd HH:mm) fileName
-d Directories are listed as plain files.
-h Formats the sizes of files in a human-readable fashion
rather than a number of bytes.
-R Recursively list the contents of directories.
-d: directories are listed as plain files.
-h: file sizes are formatted to be human-readable (e.g. 67108864 is shown as 64.0m).
-R: subdirectories are listed recursively.
For a file, the entry has the form: permissions, number of replicas, user ID, group ID, file size, modification date and time, file name.
For a directory, the entry has the form: permissions, user ID, group ID, size, modification date and time, directory name.
Example:
hadoop fs -ls -h -R hdfs://ClusterTest/
-mkdir [-p] <path> ... : Create a directory in specified location.
-p Do not fail if the directory already exists
Creates directories.
-p: no error is reported if the directory already exists.
Example:
hadoop fs -mkdir -p hdfs://ClusterTest/myhdfsfiles
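The -p flag follows the same convention as the local mkdir -p: creating a directory that already exists succeeds with exit status 0, which is why the example above can be rerun safely. A local sketch of the same semantics:

```shell
# mkdir -p is idempotent: the second call does not fail.
dir=$(mktemp -d)/a/b
mkdir -p "$dir"
mkdir -p "$dir" && echo "ok: already exists, no error"
```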
-moveFromLocal <localsrc> ... <dst> : Same as -put, except that the source is
deleted after it's copied.
Moves files from the local filesystem to HDFS; in effect it copies the file and then deletes the local copy.
Example:
hadoop fs -moveFromLocal ~/file1 hdfs://ClusterTest/myhdfsfiles
-mv <src> ... <dst> : Move files that match the specified file pattern <src>
to a destination <dst>. When moving multiple files, the
destination must be a directory.
Moves files; all files matching the pattern can be moved into a single directory. When moving multiple files, the destination must be a directory.
Example:
hadoop fs -mv hdfs://ClusterTest/myhdfsfiles/* hdfs://ClusterTest/bk2
-put [-f] [-p] <localsrc> ... <dst> : Copy files from the local file system
into fs. Copying fails if the file already
exists, unless the -f flag is given. Passing
-p preserves access and modification times,
ownership and the mode. Passing -f overwrites
the destination if it already exists.
Copies local files to HDFS.
-f overwrites the destination file if it already exists.
-p preserves the source file's attributes.
Example:
hadoop fs -put -f -p ~/myfiles/file1 hdfs://ClusterTest/myhdfsfiles
-renameSnapshot <snapshotDir> <oldName> <newName> : Rename a snapshot from oldName to newName
Renames a snapshot.
-rm [-f] [-r|-R] [-skipTrash] <src> ... : Delete all files that match the specified file pattern.
Equivalent to the Unix command "rm <src>"
-skipTrash option bypasses trash, if enabled, and immediately
deletes <src>
-f If the file does not exist, do not display a diagnostic
message or modify the exit status to reflect an error.
-[rR] Recursively deletes directories
Deletes the files matching the given pattern.
-f: no error is reported if the file does not exist.
-r|-R: deletes directories recursively.
-skipTrash: bypasses the trash and deletes immediately.
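As with the local rm, the -f flag makes a missing path a non-error and keeps the exit status at 0, which is what makes it safe in scripts. A local sketch of the same convention (the file name is an arbitrary path assumed not to exist):

```shell
# Without -f, removing a missing file is an error; with -f it is not.
rm /tmp/no_such_file_hopefully 2>/dev/null || echo "plain rm: error"
rm -f /tmp/no_such_file_hopefully && echo "rm -f: exit status 0"
```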
-rmdir [--ignore-fail-on-non-empty] <dir> ... : Removes the directory entry specified by each directory argument, provided it is empty.
Removes directories; only empty directories can be removed.
--ignore-fail-on-non-empty: when removing multiple directories with a wildcard, do not report failure for directories that still contain files.
Example:
hadoop fs -rmdir --ignore-fail-on-non-empty hdfs://ClusterTest/myhdfsfiles
Note that the option starts with two dashes.
-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] : Sets Access Control Lists (ACLs) of files and directories.
Options:
-b :Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility with permission bits.
-k :Remove the default ACL.
-R :Apply operations to all files and directories recursively.
-m :Modify ACL. New entries are added to the ACL, and existing entries are retained.
-x :Remove specified ACL entries. Other ACL entries are retained.
--set :Fully replace the ACL, discarding all existing entries. The <acl_spec> must include entries for user, group, and others for compatibility with permission bits.
<acl_spec>: Comma separated list of ACL entries.
<path>: File or directory to modify.
Sets the ACLs of files and directories.
-b: removes all entries except the base ACL entries; the user, group, and others entries are retained for compatibility with the permission bits.
-k: removes the default ACL.
-R: applies the operation recursively to all files and directories.
-m: modifies the ACL; new entries are added and existing entries are retained.
-x: removes the specified ACL entries; other entries are retained.
--set: fully replaces the ACL, discarding all existing entries; the <acl_spec> must include entries for user, group, and others for compatibility with the permission bits.
<acl_spec>: a comma-separated list of ACL entries.
<path>: the file or directory to modify.
Examples:
hadoop fs -setfacl -m user:hadoop:rw- /file
hadoop fs -setfacl -x user:hadoop /file
hadoop fs -setfacl -b /file
hadoop fs -setfacl -k /dir
hadoop fs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file
hadoop fs -setfacl -R -m user:hadoop:r-x /dir
hadoop fs -setfacl -m default:user:hadoop:r-x /dir
-setrep [-R] [-w] <rep> <path> ... : Set the replication level of a file. If <path> is a directory
then the command recursively changes the replication factor of
all files under the directory tree rooted at <path>.
The -w flag requests that the command wait for the replication
to complete. This can potentially take a very long time.
The -R flag is accepted for backwards compatibility. It has no effect.
Changes the replication factor of a file. If the path is a directory, the command recursively changes the replication factor of every file under that directory tree.
-w waits for the replication to complete; this can take a very long time.
-R is accepted only for backwards compatibility and has no effect.
Example:
hadoop fs -setrep -w 3 hdfs://ClusterTest/myhdfsfiles/wc.txt
-stat [format] <path> ... : Print statistics about the file/directory at <path>
in the specified format. Format accepts filesize in blocks (%b), type (%F), group name of owner (%g),
filename (%n), block size (%o), replication (%r), user name of owner (%u), modification date (%y, %Y)
Prints statistics about the file/directory in the specified format. The format accepts the file size (%b), type (%F), group of the owner (%g), name (%n), block size (%o), replication (%r), user name of the owner (%u), and modification date (%y, %Y). %y shows the UTC date as "yyyy-MM-dd HH:mm:ss" and %Y shows milliseconds since January 1, 1970 UTC. If no format is specified, %y is used by default.
Example:
hadoop fs -stat hdfs://ClusterTest/myhdfsfiles/wc.txt
-tail [-f] <file> : Show the last 1KB of the file.
-f Shows appended data as the file grows.
Displays the last kilobyte of the file on stdout.
-f keeps the output open and appends data as the file grows, like Unix tail -f.
Example:
hadoop fs -tail -f hdfs://ClusterTest/myhdfsfiles/wc.txt
-test -[defsz] <path> : Answer various questions about <path>, with result via exit status.
-d return 0 if <path> is a directory.
-e return 0 if <path> exists.
-f return 0 if <path> is a file.
-s return 0 if file <path> is greater than zero bytes in size.
-z return 0 if file <path> is zero bytes in size.
else, return 1.
-d: returns 0 if the path is a directory.
-e: returns 0 if the path exists.
-f: returns 0 if the path is a file.
-s: returns 0 if the file is larger than zero bytes.
-z: returns 0 if the file is zero bytes in size.
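hadoop fs -test reports its answer only through the exit status, just like the shell's own test builtin, so it is normally used inside a conditional. A hedged sketch: the hadoop line is commented (the HDFS path is illustrative), and the local [ -d ] line demonstrates the same exit-status convention:

```shell
# Typical scripting pattern around hadoop fs -test:
#   if hadoop fs -test -d hdfs://ClusterTest/myhdfsfiles; then
#       echo "directory exists"
#   fi
# The same exit-status convention with the local test builtin:
dir=$(mktemp -d)
if [ -d "$dir" ]; then echo "directory exists"; fi
rmdir "$dir"
```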
-text [-ignoreCrc] <src> ... : Takes a source file and outputs the file in text format.
The allowed formats are zip and TextRecordInputStream and Avro.
Takes a source file and outputs it in text format. The allowed formats are zip, TextRecordInputStream, and Avro.
-touchz <path> ... : Creates a file of zero length
at <path> with current time as the timestamp of that <path>.
An error is returned if the file exists with non-zero length
Creates a zero-length file; an error is returned if the file already exists with non-zero length.
-usage [cmd ...] : Displays the usage for given command or all commands if none is
specified.
Displays the usage for a single command, or for all commands if none is specified.
Example:
hadoop fs -usage tail