Hadoop 常见指令

  • 一 概述
  • 二 HDFS 管理命令 fs
  • 三 作业管理命令 job
  • 四 作业提交命令 jar
  • 五 如何停止正在运行的 Hadoop 程序
  • 六 附录

一. 概述

bin 目录下的 Hadoop 脚本是最基础的集群管理脚本,用户可以通过该脚本完成各种功能,如 HDFS 文件管理、MapReduce 作业管理等。该脚本的使用方式:

hadoop [--config confdir] COMMAND
  • –config 是用于设置 Hadoop 配置文件目录,默认目录为 ${HADOOP_HOME}/etc/hadoop/
  • COMMAND 是具体的某个命令,常用的如下几个命令
    • HDFS 管理命令 fs
    • 作业管理命令 job
    • 作业提交命令 jar

我们可以键入 hadoop ,以查看更多的命令:

Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.


二. HDFS 管理命令 fs

[hadoop5@master5 ~]$ hadoop fs -help
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-usage [cmd ...]]

具体想查某个指令的用法,可以键入以下命令查看

hadoop fs -usage ls

更多详细信息,请参考:《Hadoop Shell命令 》 及 附录


三. 作业管理命令 job

hadoop5@master5 ~]$ hadoop job -help
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.

Usage: CLI <command> <args>
        [-submit <job-file>]
        [-status <job-id>]
        [-counter <job-id> <group-name> <counter-name>]
        [-kill <job-id>]
        [-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
        [-events <job-id> <from-event-#> <#-of-events>]
        [-history <jobHistoryFile>]
        [-list [all]]
        [-list-active-trackers]
        [-list-blacklisted-trackers]
        [-list-attempt-ids <job-id> <task-type> <task-state>]. Valid values for <task-type> are MAP REDUCE. Valid values for <task-state> are running, completed
        [-kill-task <task-attempt-id>]
        [-fail-task <task-attempt-id>]
        [-logs <job-id> <task-attempt-id>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


四. 作业提交命令 jar

hadoop jar <jar> [mainClass] args..
  • <jar> 表示 jar 包名
  • mainClass 表示 main class 名称,可以不必输入而由 jar 命名自动搜索
  • args 是 main class 输入参数
bin/hadoop  jar hadoop-examples-1.0.0.jar wordcount /text/input /test/output


五. 如何停止正在运行的 Hadoop 程序

这需要根据 Hadoop 的版本

1. version 小于2.3.0

  • 查看正在运行的 Hadoop 任务
hadoop job -list
  • 关闭 Hadoop 任务进程
hadoop job -kill $jobId

组合以上两条命令就可以实现 kill 掉指定用户的 job

for i in `hadoop job -list | grep -w username| awk '{print $1}' | grep job_`; do hadoop job -kill $i; done
  • username 就是你希望关闭 Hadoop 任务的用户

2. version 大于等于2.3.0

  • 查看正在运行的 Hadoop 任务
yarn application -list
  • 关闭 Hadoop 任务进程
yarn application -kill $ApplicationId


六. 附录

Hadoop 常见指令_第1张图片


Hadoop 常见指令_第2张图片


Hadoop 常见指令_第3张图片

你可能感兴趣的:(Hadoop 常见指令)