Running
hadoop fs -ls /
lists the paths under the target directory, for example:
hadoop fs -ls /
Found 1 items
drwxr-xr-x - didi supergroup 0 2019-06-09 13:57 /user
This article traces how that command executes, from the shell scripts down to the Java entry point.
The entry point is the main function in FsShell.java:
public static void main(String argv[]) throws Exception {
  System.out.println("Entering FsShell console..."); // debug print added for this walkthrough
  FsShell shell = newShellInstance();
  ...
First, build the distribution:
mvn clean package -DskipTests -Pdist,native -Dtar -Dmaven.javadoc.skip=true
Then enter the packaged distribution under hadoop-dist:
cd hadoop-dist/target/hadoop-2.7.2-2323
and run the command with tracing enabled:
sh -x ./bin/hadoop fs -ls /
The -x flag makes the shell echo every command as it executes, giving a complete trace of the launcher's behavior to analyze.
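A short digression on the tooling: -x (or set -x inside a script) makes the shell print each command, after expansion, prefixed with +, before running it. A minimal standalone example:
# -x sends the trace to stderr before each command runs
sh -x -c 'name=world; echo "hello $name"'
# + name=world
# + echo 'hello world'      (trace quoting varies slightly between shells)
# hello world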
We start with the hadoop launcher script itself:
bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`
The trace for this fragment:
+ set -v
bin=`which $0`
which $0
++ which ./bin/hadoop
+ bin=./bin/hadoop
bin=`dirname ${bin}`
dirname ${bin}
++ dirname ./bin/hadoop
+ bin=./bin
bin=`cd "$bin"; pwd`
cd "$bin"; pwd
++ cd ./bin
++ pwd
+ bin=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/bin
This resolves the directory containing the hadoop script, as an absolute path, and cds into it, so the rest of the script works no matter where it was invoked from.
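This which/dirname/cd/pwd sequence is a common self-locating idiom. A standalone sketch of the same technique (the file name locate.sh is hypothetical):
#!/bin/sh
# locate.sh -- report the directory this script lives in,
# regardless of the caller's working directory
bin=`which $0`        # the path we were invoked through
bin=`dirname ${bin}`  # drop the file name
bin=`cd "$bin"; pwd`  # canonicalize into an absolute path
echo "script directory: $bin"
Note that which $0 only resolves when the script is invoked via a path containing a slash (e.g. ./locate.sh), which is how bin/hadoop is normally run.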
Next:
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
Running a script with sh (or ./) starts a child shell in a new process; the child sees only the variables its parent exported. source and . are equivalent: they execute the script inside the current shell, as if its text were pasted in, so everything it defines remains visible afterwards.
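A quick way to see the difference (the helper file name vars.sh is made up for the demo):
printf 'FOO=from_script\n' > vars.sh   # a script that just sets a variable
sh vars.sh                             # runs in a child process...
echo "after sh: FOO=$FOO"              # ...so FOO is empty here
. ./vars.sh                            # runs inside the current shell...
echo "after source: FOO=$FOO"          # ...so this prints FOO=from_script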
hadoop-config.sh is therefore sourced, not executed, so the variables it sets stay available to the rest of the hadoop script; it is loaded from the libexec directory next to bin, overridable via HADOOP_LIBEXEC_DIR.
Here is the beginning of hadoop-config.sh:
this="${BASH_SOURCE-$0}" # the path of this script (BASH_SOURCE, falling back to $0)
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"
[ -f "$common_bin/hadoop-layout.sh" ] && . "$common_bin/hadoop-layout.sh"
The trace:
++ script=hadoop-config.sh
this="$common_bin/$script"
++ this=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/libexec/hadoop-config.sh
[ -f "$common_bin/hadoop-layout.sh" ] && . "$common_bin/hadoop-layout.sh"
++ '[' -f /Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/libexec/hadoop-layout.sh ']'
hadoop-layout.sh does not exist here, so the test fails and nothing is sourced.
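The [ -f file ] && . file line is the usual "optional include" idiom: the right-hand side of && only runs when the test succeeds. A self-contained sketch, with a hypothetical file name:
rm -f overrides.sh
[ -f overrides.sh ] && . overrides.sh   # absent: nothing happens
echo "GREETING=$GREETING"               # prints an empty value
printf 'GREETING=hello\n' > overrides.sh
[ -f overrides.sh ] && . overrides.sh   # present: sourced into this shell
echo "GREETING=$GREETING"               # prints GREETING=hello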
Next come the default locations of the share directories:
HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"}
HADOOP_COMMON_LIB_JARS_DIR=${HADOOP_COMMON_LIB_JARS_DIR:-"share/hadoop/common/lib"}
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_COMMON_LIB_NATIVE_DIR:-"lib/native"}
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
HDFS_LIB_JARS_DIR=${HDFS_LIB_JARS_DIR:-"share/hadoop/hdfs/lib"}
YARN_DIR=${YARN_DIR:-"share/hadoop/yarn"}
YARN_LIB_JARS_DIR=${YARN_LIB_JARS_DIR:-"share/hadoop/yarn/lib"}
MAPRED_DIR=${MAPRED_DIR:-"share/hadoop/mapreduce"}
MAPRED_LIB_JARS_DIR=${MAPRED_LIB_JARS_DIR:-"share/hadoop/mapreduce/lib"}
# the root of the Hadoop installation
# See HADOOP-6255 for directory structure layout
HADOOP_DEFAULT_PREFIX=$(cd -P -- "$common_bin"/.. && pwd -P)
HADOOP_PREFIX=${HADOOP_PREFIX:-$HADOOP_DEFAULT_PREFIX}
export HADOOP_PREFIX
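All of these assignments rely on the ${VAR:-default} form of parameter expansion: it yields $VAR if the variable is set and non-empty, and the default otherwise, without modifying the variable. For example:
unset HADOOP_COMMON_DIR
echo "${HADOOP_COMMON_DIR:-share/hadoop/common}"   # prints the default
HADOOP_COMMON_DIR=/opt/custom/common
echo "${HADOOP_COMMON_DIR:-share/hadoop/common}"   # prints /opt/custom/common
This is what lets a user pre-set any of these variables in the environment and have the script respect the override.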
Next:
if [ $# -gt 1 ]
then
    if [ "--config" = "$1" ]
    then
        shift
        confdir=$1
        if [ ! -d "$confdir" ]; then
            echo "Error: Cannot find configuration directory: $confdir"
            exit 1
        fi
        shift
        HADOOP_CONF_DIR=$confdir
    fi
fi
$# expands to the number of positional arguments. The trace shows:
++ '[' 3 -gt 1 ']'
++ '[' --config = fs ']'
We passed three arguments (fs, -ls, /), and the first is fs rather than --config, so the inner test fails. Still, this branch shows that hadoop accepts a leading --config argument to point HADOOP_CONF_DIR at an alternative configuration directory.
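Here is a runnable sketch of that parsing with simulated arguments (the directory value is hypothetical, and the [ -d ] existence check is omitted so the demo runs anywhere):
set -- --config /etc/hadoop/conf fs -ls /   # fake the positional parameters
if [ $# -gt 1 ]; then
    if [ "--config" = "$1" ]; then
        shift          # drop --config; $1 is now the directory
        confdir=$1
        shift          # drop the directory; "fs -ls /" remains
    fi
fi
echo "confdir=$confdir, remaining: $*"
# prints: confdir=/etc/hadoop/conf, remaining: fs -ls /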
We skip the rest of hadoop-config.sh for now and return to the hadoop script.
function print_usage(){
  echo "Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]"
  echo "  CLASSNAME            run the class named CLASSNAME"
  echo " or"
  echo "  where COMMAND is one of:"
  echo "  fs                   run a generic filesystem user client"
  echo "  version              print the version"
  echo "  jar <jar>            run a jar file"
  echo "                       note: please use \"yarn jar\" to launch"
  echo "                             YARN applications, not this command."
  echo "  checknative [-a|-h]  check native hadoop and compression libraries availability"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
  echo "  classpath            prints the class path needed to get the"
  echo "  credential           interact with credential providers"
  echo "                       Hadoop jar and the required libraries"
  echo "  daemonlog            get/set the log level for each daemon"
  echo "  trace                view and modify Hadoop tracing settings"
  echo "  externaltrash        run a external trash tool"
  echo ""
  echo "Most commands print help when invoked w/o parameters."
}
This is just the help text.
if [ $# = 0 ]; then
  print_usage
  exit
fi
With no arguments at all, the script prints the usage message and exits.
Otherwise, a case statement dispatches on the first argument; it covers every subcommand hadoop accepts:
COMMAND=$1
case $COMMAND in
  # usage flags
  --help|-help|-h)
    print_usage
    exit
    ;;

  #hdfs commands
  namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
    echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
    echo "Instead use the hdfs command for it." 1>&2
    echo "" 1>&2
    #try to locate hdfs and if present, delegate to it.
    shift
    if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
      exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
      exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
    else
      echo "HADOOP_HDFS_HOME not found!"
      exit 1
    fi
    ;;

  #mapred commands for backwards compatibility
  pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker)
    echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
    echo "Instead use the mapred command for it." 1>&2
    echo "" 1>&2
    #try to locate mapred and if present, delegate to it.
    shift
    if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
      exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
      exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
    else
      echo "HADOOP_MAPRED_HOME not found!"
      exit 1
    fi
    ;;

  #core commands
  *)
    # the core commands
    if [ "$COMMAND" = "fs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
    elif [ "$COMMAND" = "version" ] ; then
      CLASS=org.apache.hadoop.util.VersionInfo
    elif [ "$COMMAND" = "jar" ] ; then
      CLASS=org.apache.hadoop.util.RunJar
      if [[ -n "${YARN_OPTS}" ]] || [[ -n "${YARN_CLIENT_OPTS}" ]]; then
        echo "WARNING: Use \"yarn jar\" to launch YARN applications." 1>&2
      fi
    elif [ "$COMMAND" = "key" ] ; then
      CLASS=org.apache.hadoop.crypto.key.KeyShell
    elif [ "$COMMAND" = "checknative" ] ; then
      CLASS=org.apache.hadoop.util.NativeLibraryChecker
    elif [ "$COMMAND" = "distcp" ] ; then
      CLASS=org.apache.hadoop.tools.DistCp
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "daemonlog" ] ; then
      CLASS=org.apache.hadoop.log.LogLevel
    elif [ "$COMMAND" = "archive" ] ; then
      CLASS=org.apache.hadoop.tools.HadoopArchives
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "externaltrash" ]; then
      CLASS=org.apache.hadoop.externaltrash.ExternalTrash
      CLASSPATH=${CLASSPATH}:${TOOL_PATH}
    elif [ "$COMMAND" = "credential" ] ; then
      CLASS=org.apache.hadoop.security.alias.CredentialShell
    elif [ "$COMMAND" = "trace" ] ; then
      CLASS=org.apache.hadoop.tracing.TraceAdmin
    elif [ "$COMMAND" = "classpath" ] ; then
      if [ "$#" -gt 1 ]; then
        CLASS=org.apache.hadoop.util.Classpath
      else
        # No need to bother starting up a JVM for this simple case.
        if $cygwin; then
          CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)
        fi
        echo $CLASSPATH
        exit
      fi
    elif [[ "$COMMAND" = -* ]] ; then
      # class and package names cannot begin with a -
      echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
      exit 1
    else
      CLASS=$COMMAND
    fi

    # cygwin path translation
    if $cygwin; then
      CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)
      HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_LOG_DIR" 2>/dev/null)
      HADOOP_PREFIX=$(cygpath -w "$HADOOP_PREFIX" 2>/dev/null)
      HADOOP_CONF_DIR=$(cygpath -w "$HADOOP_CONF_DIR" 2>/dev/null)
      HADOOP_COMMON_HOME=$(cygpath -w "$HADOOP_COMMON_HOME" 2>/dev/null)
      HADOOP_HDFS_HOME=$(cygpath -w "$HADOOP_HDFS_HOME" 2>/dev/null)
      HADOOP_YARN_HOME=$(cygpath -w "$HADOOP_YARN_HOME" 2>/dev/null)
      HADOOP_MAPRED_HOME=$(cygpath -w "$HADOOP_MAPRED_HOME" 2>/dev/null)
    fi

    shift

    # Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
    HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

    #make sure security appender is turned off
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

    export CLASSPATH=$CLASSPATH
    exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    ;;
esac
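One detail worth pausing on in the delegating branches: ${COMMAND/dfsgroups/groups} is bash pattern substitution, which replaces the first match of the pattern in the variable's value, so hadoop dfsgroups is forwarded as hdfs groups. A quick illustration:
COMMAND=dfsgroups
echo "${COMMAND/dfsgroups/groups}"   # prints: groups
COMMAND=fsck
echo "${COMMAND/dfsgroups/groups}"   # no match, prints: fsck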
For our invocation, the fs branch traces as follows:
+ case $COMMAND in
+ '[' fs = fs ']'
+ CLASS=org.apache.hadoop.fs.FsShell
+ false
+ shift
+ HADOOP_OPTS=' -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323 -Dhadoop.id.str=didi -Dhadoop.root.logger=INFO,console -Djava.library.path=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx2048m '
+ HADOOP_OPTS=' -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323 -Dhadoop.id.str=didi -Dhadoop.root.logger=INFO,console -Djava.library.path=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx2048m -Dhadoop.security.logger=INFO,NullAppender'
+ export 'CLASSPATH=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/etc/hadoop:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/yarn/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/yarn/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/mapreduce/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar'
+ CLASSPATH='/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/etc/hadoop:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/yarn/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/yarn/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/mapreduce/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar'
+ exec /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/bin/java -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323 -Dhadoop.id.str=didi -Dhadoop.root.logger=INFO,console -Djava.library.path=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx2048m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.fs.FsShell -ls /
Stripped of the expansions, the command that finally runs is:
export CLASSPATH=$CLASSPATH
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
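Note the exec: it replaces the shell process with the JVM rather than forking a child, so the script never regains control and the JVM inherits the shell's PID. A tiny demonstration:
# exec swaps the running shell for the new program (same pid);
# the trailing echo is never reached
sh -c 'echo "shell pid: $$"; exec ps -o pid= -p $$; echo unreachable'
# prints the same pid twice and nothing else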
We can copy the fully expanded export and java lines out of the trace and run them in a shell by hand:
$ export CLASSPATH='/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/etc/hadoop:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/hdfs/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/yarn/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/yarn/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/mapreduce/lib/*:/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar'
$ /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/bin/java -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323 -Dhadoop.id.str=didi -Dhadoop.root.logger=INFO,console -Djava.library.path=/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx2048m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.fs.FsShell -ls /
Entering FsShell console...
Found 1 items
drwxr-xr-x - didi supergroup 0 2019-06-09 13:57 /user
It runs just the same. Now let's unpack what is on the CLASSPATH.
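Before the breakdown, note that the entries ending in /* are JVM classpath wildcards, not shell globs (they stayed quoted throughout the script): since Java 6 the launcher expands dir/* to every .jar directly inside dir, non-recursively. You can preview what one such entry pulls in from the shell:
# list the jars one wildcard classpath entry will match
ls "/Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/share/hadoop/common/lib/"*.jar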
Entry by entry:
1. /Users/didi/CodeFile/xx_hadoop/hadoop-dist/target/hadoop-2.7.2-2323/etc/hadoop is the configuration directory.
2. .../share/hadoop/common/lib/* and .../share/hadoop/common/* are the common module's jars and their third-party dependencies.
3. The remaining entries repeat the same pattern for the hdfs, yarn, and mapreduce modules (each module directory plus its lib subdirectory), finishing with /contrib/capacity-scheduler/*.jar.
The exec at the end of the hadoop script then lands in FsShell.main, which we saw at the start:
public static void main(String argv[]) throws Exception {
  System.out.println("Entering FsShell console..."); // the debug line we added earlier
  FsShell shell = newShellInstance();       // create the FsShell (Tool) instance
  Configuration conf = new Configuration(); // load the Hadoop configuration
  conf.setQuietMode(false);                 // quiet mode is the default; when quiet,
                                            // error and informational details are suppressed
  shell.setConf(conf);
  int res;
  try {
    res = ToolRunner.run(shell, argv);      // ToolRunner runs any class implementing the
                                            // Tool interface, parsing generic options first
  } finally {
    shell.close();
  }
  System.exit(res);
}
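A practical consequence of going through ToolRunner: its GenericOptionsParser strips generic options such as -D, -conf and -fs out of argv before FsShell's run method sees the rest, so configuration can be overridden per invocation. For example (the namenode URI is a placeholder):
# the -D is consumed by GenericOptionsParser; FsShell only sees "-ls /"
hadoop fs -D fs.defaultFS=hdfs://namenode.example.com:8020 -ls /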
Reference: https://blog.csdn.net/strongyoung88/article/details/68952248