[Eighteen Palms: Internal Skills] Palm 5: The HDFS Shell

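This post is part of the [Big Data Technology: Eighteen Dragon-Subduing Palms] series. For the full table of contents, see: Big Data Technology: Eighteen Dragon-Subduing Palms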


Posts in this series:
[Eighteen Palms: Internal Skills] Palm 5: HDFS Basics
[Eighteen Palms: Internal Skills] Palm 5: The HDFS Shell

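You can run hadoop fs -help to view the documentation for the HDFS shell commands. Most HDFS shell commands resemble their Linux shell counterparts.

The general format of a command is: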

bin/hadoop command [genericOptions] [commandOptions]

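command is the command to run, genericOptions are the generic options shared by Hadoop commands, and commandOptions are the options specific to that command.

Example:
In hadoop fs -ls /input, the command is fs. No genericOptions are given here (generic options are flags such as -D, -conf, or -fs, placed right after fs). The commandOptions are -ls together with the path argument /input.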
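To show where a generic option sits relative to the command options, here is a minimal sketch. The helper name put_with_replication and the paths in the usage line are hypothetical; -D property=value is one of Hadoop's generic options, and dfs.replication is used only as an illustrative property.

```shell
# Hypothetical helper: upload a local file with a per-command replication factor.
# -D is a generic option, so it goes right after "fs" and before the -put subcommand.
# Arguments: $1 = replication factor, $2 = local source, $3 = HDFS destination.
put_with_replication() {
    hadoop fs -D dfs.replication="$1" -put "$2" "$3"
}
```

A call might look like put_with_replication 2 ~/myfiles/file1 /myhdfsfiles.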

The sections below go through each shell command in detail.

1、 appendToFile

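-appendToFile <localsrc> ... <dst>: Appends the contents of all the given local files to the
        given dst file. The dst file will be created if it does
        not exist. If <localsrc> is -, then the input is read
        from stdin.

appendToFile appends local files to a file on HDFS. The leading arguments are local file paths and the last argument is the HDFS file path. If the HDFS file does not exist, it is created first. Several local paths can be given, so multiple local files can be appended to one HDFS file in a single command.

Example: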

hadoop fs -appendToFile ~/myfiles/file1 ~/myfiles/file2 /myhdfsfiles/hdfs1

2、cat

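-cat [-ignoreCrc] <src> ...:    Fetch all files that match the file pattern <src>
        and display their content on stdout.

Prints the contents of the HDFS files at the given paths. Several files can be printed at once, and a path may be a file pattern such as *.txt.

The -ignoreCrc option skips checksum verification while the files are read.

Examples:

hadoop fs -cat hdfs://ClusterTest/myhdfsfiles/hdfs1 hdfs://ClusterTest/myhdfsfiles/wc.txt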

hadoop fs -cat -ignoreCrc hdfs://ClusterTest/myhdfsfiles/*.txt

3、checksum

-checksum <src> ...:    Dump checksum information for files that match the file
        pattern <src> to stdout. Note that this requires a round-trip
        to a datanode storing each block of the file, and thus is not
        efficient to run on a large number of files. The checksum of a
        file depends on its content, block size and the checksum
        algorithm and parameters used for creating the file.

Returns the checksum information of a file.

Example:

hadoop fs -checksum hdfs://ClusterTest/myhdfsfiles/wc.txt

4、chgrp

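-chgrp [-R] GROUP PATH...:   This is equivalent to -chown ... :GROUP ...

Changes the group a file belongs to. With -R, the change is applied recursively through the directory tree. The user running the command must be the file's owner or a superuser.

Example:

hadoop fs -chgrp -R centosg hdfs://ClusterTest/myhdfsfiles/wc.txt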

5、chmod

-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...:   Changes permissions of a file.
            This works similar to shell's chmod with a few exceptions.

        -R  modifies the files recursively. This is the only option
            currently supported.

        MODE    Mode is same as mode used for chmod shell command.
            Only letters recognized are 'rwxXt'. E.g. +t,a+r,g-w,+rwx,o=r

        OCTALMODE Mode specified in 3 or 4 digits. If 4 digits, the first may
            be 1 or 0 to turn the sticky bit on or off, respectively. Unlike
            shell command, it is not possible to specify only part of the mode.
            E.g. 754 is same as u=rwx,g=rx,o=r

            If none of 'augo' is specified, 'a' is assumed and unlike
            shell command, no umask is applied.

Changes the permissions of a file. With -R, the change is applied recursively through the directory tree. The user running the command must be the file's owner or a superuser.

Example:

hadoop fs -chmod -R 777 hdfs://ClusterTest/myhdfsfiles

6、chown

-chown [-R] [OWNER][:[GROUP]] PATH...:  Changes owner and group of a file.
            This is similar to shell's chown with a few exceptions.

            -R  modifies the files recursively. This is the only option
            currently supported.

            If only owner or group is specified then only owner or
            group is modified.

            The owner and group names may only consist of digits, alphabet,
            and any of [-_./@a-zA-Z0-9]. The names are case sensitive.

            WARNING: Avoid using '.' to separate user name and group though
            Linux allows it. If user names have dots in them and you are
            using local file system, you might see surprising results since
            shell command 'chown' is used for local files.

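Changes the owner (and optionally the group) of a file. With -R, the change is applied recursively through the directory tree. Changing the owner requires a superuser.

Example: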

hadoop fs -chown -R centos:supergroup hdfs://ClusterTest/myhdfsfiles

7、copyFromLocal

-copyFromLocal [-f] [-p] <localsrc> ... <dst>:  Identical to the -put command.

Copies local files to HDFS.

Example:

hadoop fs -copyFromLocal ~/myfiles/* hdfs://ClusterTest/myhdfsfiles

8、copyToLocal

-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>: Identical to the -get command.

Copies files from HDFS to the local file system.

Example:

hadoop fs -copyToLocal hdfs://ClusterTest/myhdfsfiles/wc.txt ~/wx.txt.b

9、count

-count [-q] <path> ...: Count the number of directories, files and bytes under the paths
        that match the specified file pattern.  The output columns are:
        DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME or
        QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA
              DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME

Counts the directories, files, and bytes under the given paths.

Output columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
(directory count, file count, total size in bytes, path name)

Output columns with -count -q: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
(name quota, remaining name quota, space quota, remaining space quota, directory count, file count, total size in bytes, path name)

Examples:

hadoop fs -count hdfs://ClusterTest/

hadoop fs -count -q hdfs://ClusterTest/

10、cp

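-cp [-f] [-p] <src> ... <dst>:  Copy files that match the file pattern <src> to a
        destination.  When copying multiple files, the destination
        must be a directory. Passing -p preserves access and
        modification times, ownership and the mode. Passing -f
        overwrites the destination if it already exists.

This command can copy multiple files into one directory.
-f overwrites the destination if it already exists.
-p preserves file attributes. In newer Hadoop versions it takes an optional argument, -p[topax] (timestamps, ownership, permission, ACL, XAttr). With no argument, -p preserves timestamps, ownership, and permission. With -pa, permission is preserved as well, because the ACL is a superset of the permission bits. Whether raw-namespace extended attributes are preserved is independent of the -p flag.
Example: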

hadoop fs -cp -f hdfs://ClusterTest/myhdfsfiles/* hdfs://ClusterTest/bk

11、 createSnapshot

-createSnapshot <snapshotDir> [<snapshotName>]: Create a snapshot on a directory

Creates a snapshot of a directory.
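As a sketch of how this might be used: a directory has to be made snapshottable by an administrator before a snapshot can be taken. The helper name snapshot_dir and the arguments in the usage line are hypothetical.

```shell
# Hypothetical helper: allow snapshots on a directory, then take one.
# Arguments: $1 = HDFS directory, $2 = snapshot name.
# "hdfs dfsadmin -allowSnapshot" is an admin operation needed once per directory.
snapshot_dir() {
    hdfs dfsadmin -allowSnapshot "$1" &&
        hadoop fs -createSnapshot "$1" "$2"
}
```

For instance, snapshot_dir /myhdfsfiles s1 would leave the snapshot under /myhdfsfiles/.snapshot/s1.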

12、deleteSnapshot

-deleteSnapshot <snapshotDir> <snapshotName>:    Delete a snapshot from a directory

Deletes a snapshot of a directory.
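A minimal sketch, assuming the directory already has snapshots: list them through the .snapshot path, then delete one by name. The helper name delete_snapshot is hypothetical.

```shell
# Hypothetical helper: show the snapshots of a directory, then delete one.
# Arguments: $1 = snapshottable HDFS directory, $2 = snapshot name.
delete_snapshot() {
    hadoop fs -ls "$1/.snapshot" &&
        hadoop fs -deleteSnapshot "$1" "$2"
}
```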

13、df

-df [-h] [<path> ...]:  Shows the capacity, free and used space of the filesystem.
        If the filesystem has multiple partitions, and no path to a
        particular partition is specified, then the status of the root
        partitions will be shown.
          -h   Formats the sizes of files in a human-readable fashion
               rather than a number of bytes.

Shows the capacity, free space, and used space of the file system. The -h option prints the sizes in a human-readable format.

Example:

hadoop fs -df -h hdfs://ClusterTest/myhdfsfiles

14、du

-du [-s] [-h] <path> ...:   Show the amount of space, in bytes, used by the files that
        match the specified file pattern. The following flags are optional:
          -s   Rather than showing the size of each individual file that
               matches the pattern, shows the total (summary) size.
          -h   Formats the sizes of files in a human-readable fashion
               rather than a number of bytes.

        Note that, even without the -s option, this only shows size summaries
        one level deep into a directory.
        The output is in the form
            size    name(full path)

Shows the sizes of the files and directories under the given directory. If the path is a file, only the size of that file is shown.
-s shows one aggregate total instead of the size of each individual file.
-h prints the sizes in a human-readable format.
Examples:

hadoop fs -du -h hdfs://ClusterTest

hadoop fs -du -h -s hdfs://ClusterTest/

15、expunge

-expunge:   Empty the Trash

Empties the trash.

Example:

hadoop fs -expunge

16、get

-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>: Copy files that match the file pattern <src>
        to the local name.  <src> is kept.  When copying multiple
        files, the destination must be a directory. Passing
        -p preserves access and modification times,
        ownership and the mode.

Copies files from HDFS to the local file system.

-ignoreCrc copies files even if they fail the CRC check.

-crc copies the files together with their CRC information.

-p preserves the file attributes.

Example:

hadoop fs -get -ignoreCrc -crc hdfs://ClusterTest/myhdfsfiles/file1 ~

17、getfacl

-getfacl [-R] <path>:   Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.
        -R: List the ACLs of all files and directories recursively.
        <path>: File or directory to list.

Displays the access control lists (ACLs) of files and directories. If a directory has a default ACL, getfacl displays the default ACL as well.

-R: lists the ACLs of all files and directories recursively.

Example:

hadoop fs -getfacl -R hdfs://ClusterTest/myhdfsfiles/

18、getmerge

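-getmerge [-nl] <src> <localdst>:   Get all the files in the directories that
        match the source file pattern and merge and sort them to only
        one file on local fs. <src> is kept.
          -nl   Add a newline character at the end of each file.

Merges the files in the source directories that match the file pattern into a single file on the local file system.

The -nl option adds a newline character at the end of each file.

Example: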

hadoop fs -getmerge hdfs://ClusterTest/myhdfsfiles/file* ~/files

19、help

-help [cmd ...]:    Displays help for given command or all commands if none is specified.

Displays help for the given commands, or for all commands if none is specified.
Examples:
hadoop fs -help
hadoop fs -help text

20、ls

-ls [-d] [-h] [-R] [<path> ...]:    List the contents that match the specified file pattern. If
        path is not specified, the contents of /user/<currentUser>
        will be listed. Directory entries are of the form
            permissions - userid groupid size_of_directory(in bytes) modification_date(yyyy-MM-dd HH:mm) directoryName
        and file entries are of the form
            permissions number_of_replicas userid groupid size_of_file(in bytes) modification_date(yyyy-MM-dd HH:mm) fileName
          -d  Directories are listed as plain files.
          -h  Formats the sizes of files in a human-readable fashion
              rather than a number of bytes.
          -R  Recursively list the contents of directories.

-d: lists directories as plain files.
-h: prints file sizes in a human-readable format (for example, 67108864 is shown as 64.0m).
-R: recursively lists the contents of subdirectories.
For a file, each entry has the form: permissions, number of replicas, user ID, group ID, file size, modification date, modification time, file name.
For a directory, each entry has the form: permissions, - (in place of the replica count), user ID, group ID, size, modification date, modification time, directory name.

Example:

hadoop fs -ls -h -R hdfs://ClusterTest/

21、mkdir

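-mkdir [-p] <path> ...: Create a directory in specified location.
          -p  Do not fail if the directory already exists

Creates directories.

With -p, the command does not fail if the directory already exists. As with Unix mkdir -p, missing parent directories are created as well.

Example: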

hadoop fs -mkdir -p hdfs://ClusterTest/myhdfsfiles

22、moveFromLocal

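-moveFromLocal <localsrc> ... <dst>:    Same as -put, except that the source is
        deleted after it's copied.

Moves files from the local file system to HDFS. In practice, the local files are deleted once the copy has completed.

Example: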

hadoop fs -moveFromLocal ~/file1 hdfs://ClusterTest/myhdfsfiles

23、mv

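-mv <src> ... <dst>:    Move files that match the specified file pattern <src>
        to a destination <dst>.  When moving multiple files, the
        destination must be a directory.

Moves files. All files that match a file pattern can be moved into one directory. When moving multiple files, the destination must be a directory.

Example: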

hadoop fs -mv hdfs://ClusterTest/myhdfsfiles/* hdfs://ClusterTest/bk2

24、put

-put [-f] [-p] <localsrc> ... <dst>:    Copy files from the local file system
        into fs. Copying fails if the file already
        exists, unless the -f flag is given. Passing
        -p preserves access and modification times,
        ownership and the mode. Passing -f overwrites
        the destination if it already exists.

Copies local files to HDFS.

With -f, the destination file is overwritten if it already exists.

With -p, the attributes of the source file are preserved.

Example:

hadoop fs -put -f -p ~/myfiles/file1 hdfs://ClusterTest/myhdfsfiles

25、renameSnapshot

-renameSnapshot <snapshotDir> <oldName> <newName>:    Rename a snapshot from oldName to newName

Renames a snapshot.
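A minimal sketch of the argument order, assuming the directory already has a snapshot with the old name. The helper name rename_snapshot is hypothetical.

```shell
# Hypothetical helper: rename a snapshot of a snapshottable directory.
# Arguments: $1 = HDFS directory, $2 = current snapshot name, $3 = new snapshot name.
rename_snapshot() {
    hadoop fs -renameSnapshot "$1" "$2" "$3"
}
```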

26、rm

-rm [-f] [-r|-R] [-skipTrash] <src> ...:    Delete all files that match the specified file pattern.
        Equivalent to the Unix command "rm <src>"
        -skipTrash option bypasses trash, if enabled, and immediately
        deletes <src>
          -f     If the file does not exist, do not display a diagnostic
                 message or modify the exit status to reflect an error.
          -[rR]  Recursively deletes directories

Deletes the files that match the given file pattern.

-f: does not report an error if the file does not exist.

-r or -R: deletes directories recursively.

-skipTrash: bypasses the trash and deletes immediately.
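The flags above can be combined as in the following sketch. The helper name purge_dir and its path guard are hypothetical additions, included because a recursive delete that bypasses the trash cannot be undone.

```shell
# Hypothetical helper: recursively delete an HDFS directory without using the trash.
# -r recurses into directories, -f suppresses the error if the path is missing,
# -skipTrash deletes immediately instead of moving the data to the trash.
purge_dir() {
    case "$1" in
        ""|"/") echo "refusing to delete '$1'" >&2; return 1 ;;
    esac
    hadoop fs -rm -r -f -skipTrash "$1"
}
```

A call such as purge_dir hdfs://ClusterTest/bk2 would remove that directory and everything under it.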

27、rmdir

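-rmdir [--ignore-fail-on-non-empty] <dir> ...:  Removes the directory entry specified by each directory argument, provided it is empty.

Removes directories. Only empty directories can be removed.

--ignore-fail-on-non-empty: when several directories are removed with a wildcard, a directory that still contains files does not cause a failure to be reported.

Example:

hadoop fs -rmdir --ignore-fail-on-non-empty hdfs://ClusterTest/myhdfsfiles

Note that this option starts with two hyphens (--).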

28、setfacl

-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]: Sets Access Control Lists (ACLs) of files and directories.
        Options:
        -b :Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility with permission bits.
        -k :Remove the default ACL.
        -R :Apply operations to all files and directories recursively.
        -m :Modify ACL. New entries are added to the ACL, and existing entries are retained.
        -x :Remove specified ACL entries. Other ACL entries are retained.
        --set :Fully replace the ACL, discarding all existing entries. The <acl_spec> must include entries for user, group, and others for compatibility with permission bits.
        <acl_spec>: Comma separated list of ACL entries.
        <path>: File or directory to modify.

Sets the access control lists (ACLs) of files and directories.

-b: removes all but the base ACL entries. The entries for user, group, and others are kept for compatibility with the permission bits.

-k: removes the default ACL.
-R: applies the operation to all files and directories recursively.
-m: modifies the ACL. New entries are added and existing entries are kept.
-x: removes the specified ACL entries. Other ACL entries are kept.
--set: fully replaces the ACL, discarding all existing entries. The acl_spec must include entries for user, group, and others for compatibility with the permission bits.
acl_spec: a comma-separated list of ACL entries.
path: the file or directory to modify.
Examples:

hadoop fs -setfacl -m user:hadoop:rw- /file
hadoop fs -setfacl -x user:hadoop /file
hadoop fs -setfacl -b /file
hadoop fs -setfacl -k /dir
hadoop fs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file
hadoop fs -setfacl -R -m user:hadoop:r-x /dir
hadoop fs -setfacl -m default:user:hadoop:r-x /dir

29、setrep

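-setrep [-R] [-w] <rep> <path> ...:  Set the replication level of a file. If <path> is a directory
        then the command recursively changes the replication factor of
        all files under the directory tree rooted at <path>.
        The -w flag requests that the command wait for the replication
        to complete. This can potentially take a very long time.
        The -R flag is accepted for backwards compatibility. It has no effect.

Changes the replication factor of a file. If the path is a directory, the replication factor of every file under it is changed recursively.
-w makes the command wait until the replication has completed, which can take a very long time.
-R is accepted for backward compatibility and has no effect.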
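A minimal sketch of setting a replication factor and waiting for it to take effect. The helper name set_replication and the values in the usage line are hypothetical.

```shell
# Hypothetical helper: set the replication factor of a path and wait for completion.
# Arguments: $1 = replication factor, $2 = HDFS file or directory.
# -w blocks until the target replication is reached, which may take a long time.
set_replication() {
    hadoop fs -setrep -w "$1" "$2"
}
```

For example, set_replication 2 hdfs://ClusterTest/myhdfsfiles/wc.txt.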

30、stat

-stat [format] <path> ...:  Print statistics about the file/directory at <path>
        in the specified format. Format accepts filesize in blocks (%b), group name of owner(%g),
        filename (%n), block size (%o), replication (%r), user name of owner(%u), modification date (%y, %Y)

Prints statistics about the file or directory in the given format. The format accepts file size in blocks (%b), type (%F), group name of the owner (%g), name (%n), block size (%o), replication (%r), user name of the owner (%u), and modification date (%y, %Y). %y shows the UTC date as "yyyy-MM-dd HH:mm:ss", and %Y shows the milliseconds since January 1, 1970 UTC. If no format is given, %y is used.
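A minimal sketch of a custom format string built from the specifiers listed above. The helper name stat_summary is hypothetical; the format is quoted so that the shell passes it as a single argument.

```shell
# Hypothetical helper: print name, replication, block size, and modification
# time (UTC) of an HDFS path on one line.
stat_summary() {
    hadoop fs -stat "%n %r %o %y" "$1"
}
```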

31、tail

-tail [-f] <file>:  Show the last 1KB of the file.
          -f  Shows appended data as the file grows.

Prints the last kilobyte of the file to stdout.

-f keeps printing appended data as the file grows, as in Unix.
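A minimal sketch of both modes. The helper name tail_hdfs and its "follow" keyword are hypothetical conveniences, not Hadoop options.

```shell
# Hypothetical helper: print the last kilobyte of an HDFS file. Pass "follow" as
# the second argument to keep printing data as the file grows (stop with Ctrl-C).
tail_hdfs() {
    if [ "$2" = "follow" ]; then
        hadoop fs -tail -f "$1"
    else
        hadoop fs -tail "$1"
    fi
}
```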

32、test

-test -[defsz] <path>:  Answer various questions about <path>, with result via exit status.
          -d  return 0 if <path> is a directory.
          -e  return 0 if <path> exists.
          -f  return 0 if <path> is a file.
          -s  return 0 if file <path> is greater than zero bytes in size.
          -z  return 0 if file <path> is zero bytes in size.
        else, return 1.

-d: returns 0 if the path is a directory.
-e: returns 0 if the path exists.
-f: returns 0 if the path is a file.
-s: returns 0 if the file is larger than zero bytes.
-z: returns 0 if the file is zero bytes long.
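Because -test reports its answer only through the exit status, it is typically used inside a shell conditional. The sketch below assumes a hypothetical helper named ensure_dir that creates a directory only when the path does not exist yet.

```shell
# Hypothetical helper: create an HDFS directory only if the path is not there yet.
# "hadoop fs -test -e" prints nothing; the answer is its exit status (0 = exists).
ensure_dir() {
    if hadoop fs -test -e "$1"; then
        echo "$1 already exists"
    else
        hadoop fs -mkdir -p "$1"
    fi
}
```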

33、text

-text [-ignoreCrc] <src> ...:   Takes a source file and outputs the file in text format.
        The allowed formats are zip and TextRecordInputStream and Avro.

Takes a source file and prints it in text format. The allowed formats are zip, TextRecordInputStream, and Avro.
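Since -text writes to stdout, its output can be piped into ordinary Unix tools. The sketch below assumes a hypothetical helper named peek_text that shows only the first lines of the decoded file.

```shell
# Hypothetical helper: decode an HDFS file to text and show its first lines.
# Arguments: $1 = HDFS file, $2 = number of lines (defaults to 10).
peek_text() {
    hadoop fs -text "$1" | head -n "${2:-10}"
}
```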

34、touchz

-touchz <path> ...: Creates a file of zero length
        at <path> with current time as the timestamp of that <path>.
        An error is returned if the file exists with non-zero length

Creates a zero-length file.
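One possible use of a zero-length file is as a marker that a directory is complete. This is a sketch only: the helper name mark_done and the marker file name _DONE are hypothetical.

```shell
# Hypothetical helper: write an empty marker file into an HDFS directory.
# The marker name "_DONE" is an assumption chosen for this example.
mark_done() {
    hadoop fs -touchz "$1/_DONE"
}
```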

35、usage

-usage [cmd ...]:   Displays the usage for given command or all commands if none
        is specified.

Prints the usage summary for the given commands.
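A minimal sketch that prints the usage line of several commands in turn. The helper name show_usage is hypothetical.

```shell
# Hypothetical helper: print the usage of each command name passed as an argument.
show_usage() {
    for cmd in "$@"; do
        hadoop fs -usage "$cmd"
    done
}
```

For example, show_usage ls rm prints the usage of -ls followed by that of -rm.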

