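As usual with Linux problems, environment first: I am on CentOS 7.6.
Today I ran into the following: after changing the order of the options passed to tar, it fails with "Exiting with failure status due to previous errors", yet it still creates an archive, just an uncompressed one:
# Assume the current directory contains a directory named test, which holds a file named test.test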
tar -cvfz test.tar.gz test/
# tar: test.tar.gz: Cannot stat: No such file or directory
# test/
# test/test.test
# tar: Exiting with failure status due to previous errors
ls
# ... z ...
# a new file named z has appeared
tar -tvf z
# test/
# test/test.test
# the contents can be listed, so an archive was created
tar -ztvf z
# gzip: stdin: not in gzip format
# but the archive is not in gzip format
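For as long as I can remember, whenever I compress something on Linux, say with gzip, the command has been tar -zcvf test.tar.gz test/, and I never looked into why it has to be written that way. Even man tar did not tell me whether the order of the options affects the result. In theory it should not, but here it clearly did.
Another test. This time with the old-style Unix syntax, and everything works: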
tar cvfz test.tar.gz test/
# test/
# test/test.test
tar -ztvf test.tar.gz
# test/
# test/test.test
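That is odd. From the behaviour above, a guess: when the options are prefixed with -, tar takes the first "word" after f in the option string as the name of the archive to create. As we all know, the tar command format is tar [-options] [archive name] [sources], so the command gets split into tar [-cvf] [z] [test.tar.gz test/], that is, create a plain, uncompressed archive named z from test.tar.gz and test/ (z has already been consumed as the file name, so gzip is never invoked). And because test.tar.gz does not exist, tar reports that the file cannot be found.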
Let's shuffle the options once more to check the guess:
tar -cfvz test.tar.gz test/
# tar: test.tar.gz: Cannot stat: No such file or directory
# test/
# test/test.test
# tar: Exiting with failure status due to previous errors
ls
# ... vz ...
# a new file named vz has appeared
tar -tvf vz
# test/
# test/test.test
tar -ztvf vz
# gzip: stdin: not in gzip format
############
tar -fcvz test.tar.gz test/
# tar: You must specify one of the '-Acdtrux' or '--test-label' options
So the guess looks right: the result of the tar command depends on where f sits. An answer on Stack Exchange says the same thing:
As @jcbermu said, for most programs and in most cases, the order of command line flags is not important. However, some flags expect a value. Specifically, tar’s -f flag is:
-f, --file ARCHIVE
use archive file or device ARCHIVE
So, tar expects -f to have a value and that value will be the name of the tarfile it creates. For example, to add all .jpg files to an archive called foo.tar, you would run
tar -f foo.tar *jpg
What you were running was
tar -cfv test.tar *.jpg
tar understands that as "create (-c) an archive called v (-fv), containing files test.tar and any ending in .jpg".
When you run tar -cvf test.tar *.jpg on the other hand, it takes test.tar as the name of the archive and *jpg as the list of files.
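The rule the quote describes (an option that expects a value swallows the rest of its word, or the next word if nothing is left) is the common getopt convention, and Python's standard getopt module follows it too. The sketch below is only an analogy, not tar's own parser: the option string "cvzf:" is my assumption about the four options used in this post, with the colon marking f as the one that takes a value.

```python
import getopt

# Assumed option spec for this sketch: c, v and z are plain flags,
# "f:" means that -f expects a value.
SHORT_OPTS = "cvzf:"


def parse(argv):
    """Return (option/value pairs, remaining operands) for a tar-like argv."""
    return getopt.getopt(argv, SHORT_OPTS)


if __name__ == "__main__":
    for argv in (
        ["-zcvf", "test.tar.gz", "test/"],
        ["-cvfz", "test.tar.gz", "test/"],
        ["-cfvz", "test.tar.gz", "test/"],
        ["-fcvz", "test.tar.gz", "test/"],
    ):
        opts, operands = parse(argv)
        print(argv[0], "->", opts, operands)
```

With -cvfz the parser reports ('-f', 'z') and leaves test.tar.gz and test/ as operands, which matches the stray z file seen earlier; with -fcvz the whole of cvz becomes the value of -f, so no c option is ever seen.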
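But why? A guess needs something to back it up. I searched online for a long time and found nothing, the documentation did not say either, so in the end I had no choice but to read tar's source code (and, while I am at it, let me grumble about how readable C is). To keep things short (and because I am not sure my understanding is correct), I will only show the main logic: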
// tar.c
/* Main routine for tar. */
int main (int argc, char **argv)
{
  /* Decode options. */
  decode_options (argc, argv);
  /* Main command execution. */
  switch (subcommand_option)
    {
    case UNKNOWN_SUBCOMMAND:
      USAGE_ERROR ((0, 0,
                    _("You must specify one of the `-Acdtrux' or `--test-label' options")));
    case CREATE_SUBCOMMAND:
      create_archive ();
      break;
    }
  /* Dispose of allocated memory, and return. */
  if (exit_status == TAREXIT_FAILURE)
    error (0, 0, _("Exiting with failure status due to previous errors"));
  return exit_status;
}
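As you can see, main calls decode_options to parse the arguments given on the command line. The output makes it clear that everything goes wrong because the input is not parsed the way we expect, so the whole problem must lie in decode_options.
This code also explains why the archive is still created even though an error is reported. By the time the exit status is checked, create_archive () has already run. As far as I can tell, the failure to stat test.tar.gz is only recorded in exit_status while tar carries on with the remaining members, and the "Exiting with failure status" message is printed at the very end of main, so it does not undo anything done earlier (such as creating the archive).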
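To make that control flow concrete, here is a minimal sketch of "record the error, keep going, complain at the end". It is a hypothetical model and not tar's code: the function name create_archive is borrowed from the excerpt, but its body, the constants, and the use of os.path.exists in place of a real stat call are my assumptions.

```python
import os
import tempfile

# Assumed exit codes for this sketch (GNU tar documents 2 for failure).
TAREXIT_SUCCESS = 0
TAREXIT_FAILURE = 2


def create_archive(members):
    """Hypothetical model: archive what exists, remember errors, never abort."""
    archived, messages = [], []
    exit_status = TAREXIT_SUCCESS
    for name in members:
        if not os.path.exists(name):
            # Non-fatal: note the problem and move on to the next member.
            messages.append(f"tar: {name}: Cannot stat: No such file or directory")
            exit_status = TAREXIT_FAILURE
            continue
        archived.append(name)
    # The summary message only appears after all members were processed.
    if exit_status == TAREXIT_FAILURE:
        messages.append("tar: Exiting with failure status due to previous errors")
    return archived, exit_status, messages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        present = os.path.join(tmp, "test")
        os.mkdir(present)
        missing = os.path.join(tmp, "test.tar.gz")
        archived, status, messages = create_archive([missing, present])
        print("archived:", archived)
        print("exit status:", status)
        for line in messages:
            print(line)
```

The missing member sets the failure status, but the existing directory is still archived, which is the same shape as the session at the top of this post.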
// tar.c
/* Convert old-style tar call by exploding option element and rearranging
   options accordingly. */
if (argc > 1 && argv[1][0] != '-')
  {
    int new_argc;           /* argc value for rearranged arguments */
    char **new_argv;        /* argv value for rearranged arguments */
    char *const *in;        /* cursor into original argv */
    char **out;             /* cursor into rearranged argv */
    const char *letter;     /* cursor into old option letters */
    char buffer[3];         /* constructed option buffer */
    /* Initialize a constructed option. */
    buffer[0] = '-';
    buffer[2] = '\0';
    /* Allocate a new argument array, and copy program name in it. */
    new_argc = argc - 1 + strlen (argv[1]);
    new_argv = xmalloc ((new_argc + 1) * sizeof (char *));
    in = argv;
    out = new_argv;
    *out++ = *in++;
    /* Copy each old letter option as a separate option, and have the
       corresponding argument moved next to it. */
    for (letter = *in++; *letter; letter++)
      {
        struct argp_option *opt;
        buffer[1] = *letter;
        *out++ = xstrdup (buffer);
        opt = find_argp_option (options, *letter);
        if (opt && opt->arg)
          {
            if (in < argv + argc)
              *out++ = *in++;
            else
              USAGE_ERROR ((0, 0, _("Old option `%c' requires an argument."),
                            *letter));
          }
      }
    /* Copy all remaining options. */
    while (in < argv + argc)
      *out++ = *in++;
    *out = 0;
    /* Replace the old option list by the new one. */
    argc = new_argc;
    argv = new_argv;
  }
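This code explains why the old-style Unix syntax (that is, without the leading -) is not affected. For the old style, tar splits the letters into separate options and moves the argument each one requires right next to it. For example, cvfz test.tar.gz test/ is rewritten as -c -v -f test.tar.gz -z test/, and argc and argv are then replaced with the rewritten versions.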
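The rewriting loop is easier to follow in a few lines of Python. This is a loose port of the C excerpt above, written as a sketch: the function name explode_old_style is mine, and the set OPTIONS_WITH_ARG stands in for the find_argp_option lookup (real tar knows more options that take an argument than the two listed here).

```python
# Assumption for this sketch: only these old-style letters take an argument.
OPTIONS_WITH_ARG = {"f", "b"}


def explode_old_style(argv):
    """Rewrite ['tar', 'cvfz', ...] into ['tar', '-c', '-v', '-f', <arg>, '-z', ...]."""
    # New-style calls (first argument starts with '-') are left untouched.
    if len(argv) <= 1 or argv[1].startswith("-"):
        return list(argv)

    out = [argv[0]]          # program name
    rest = list(argv[2:])    # words that follow the option letters
    for letter in argv[1]:
        out.append("-" + letter)
        if letter in OPTIONS_WITH_ARG:
            if not rest:
                raise ValueError(f"Old option '{letter}' requires an argument.")
            # Move the next unused word right behind the option that needs it.
            out.append(rest.pop(0))
    out.extend(rest)         # copy all remaining words
    return out


if __name__ == "__main__":
    print(explode_old_style(["tar", "cvfz", "test.tar.gz", "test/"]))
    print(explode_old_style(["tar", "-cvfz", "test.tar.gz", "test/"]))
```

The first call gives tar -c -v -f test.tar.gz -z test/, so f gets test.tar.gz no matter where it sits among the letters; the second call is returned unchanged because the dash form skips this rewriting step.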
// tar.c
/* Parse all options and non-options as they appear. */
prepend_default_options (getenv ("TAR_OPTIONS"), &argc, &argv);
if (argp_parse (&argp, argc, argv, ARGP_IN_ORDER, &idx, &args))
  exit (TAREXIT_FAILURE);
// prepargs.c
/* Prepend the whitespace-separated options in OPTIONS to the argument
vector of a main program with argument count *PARGC and argument
vector *PARGV. */
void prepend_default_options (char const *options, int *pargc, char ***pargv);
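This part is where the options are actually parsed. What prepend_default_options does is clear from its comment: it takes the options found in the TAR_OPTIONS environment variable and puts them in front of the arguments given on the command line. argp_parse then walks the resulting argv and hands every option and argument to the parse_opt function shown next.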
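As a rough illustration of the prepend step, the sketch below splits a TAR_OPTIONS-style string on whitespace and inserts the pieces right after the program name. It is a simplification under my own assumptions: it returns a new list instead of modifying argc and argv in place, and it ignores any quoting or escaping the real function may support.

```python
def prepend_default_options(options, argv):
    """Sketch: put whitespace-separated default options in front of argv[1:]."""
    if not options:
        # Unset or empty variable: nothing to prepend.
        return list(argv)
    return [argv[0]] + options.split() + list(argv[1:])


if __name__ == "__main__":
    argv = ["tar", "-cf", "a.tar", "dir"]
    print(prepend_default_options("-v --numeric-owner", argv))
    print(prepend_default_options(None, argv))
```

Note that this step only adds words in front of the command line; it does not reorder the letters inside a word such as -cvfz, so it has no influence on the problem discussed here.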
// tar.c
static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
  struct tar_args *args = state->input;
  switch (key)
    {
    case ARGP_KEY_ARG:
      /* File name or non-parsed option, because of ARGP_IN_ORDER */
      name_add_name (arg, MAKE_INCL_OPTIONS (args));
      args->input_files = true;
      break;
    case 'c':
      set_subcommand_option (CREATE_SUBCOMMAND);
      break;
    case 'f':
      if (archive_names == allocated_archive_names)
        archive_name_array = x2nrealloc (archive_name_array,
                                         &allocated_archive_names,
                                         sizeof (archive_name_array[0]));
      archive_name_array[archive_names++] = arg;
      break;
    }
}
// argp.h
/* This is not an option at all, but rather a command line argument. If a
parser receiving this key returns success, the fact is recorded, and the
ARGP_KEY_NO_ARGS case won't be used. HOWEVER, if while processing the
argument, a parser function decrements the NEXT field of the state it's
passed, the option won't be considered processed; this is to allow you to
actually modify the argument (perhaps into an option), and have it
processed again. */
#define ARGP_KEY_ARG 0
// names.c
/* Add to name_array the file NAME with fnmatch options MATCHING_FLAGS */
void name_add_name (const char *name, int matching_flags)
{
  static int prev_flags = 0; /* FIXME: Or EXCLUDE_ANCHORED? */
  struct name_elt *ep;
  check_name_alloc ();
  ep = &name_array[entries++];
  if (prev_flags != matching_flags)
    {
      ep->type = NELT_FMASK;
      ep->v.matching_flags = matching_flags;
      prev_flags = matching_flags;
      check_name_alloc ();
      ep = &name_array[entries++];
    }
  ep->type = NELT_NAME;
  ep->v.name = name;
  name_count++;
}
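And here is the answer: when parse_opt sees f, it appends arg straight to archive_name_array[] (archive_names is just the counter). In the dash form, arg follows the usual rule for an option that takes a value: if any characters are left in the same word after f, those characters are the value, and only when f is the last letter does the next word get used. In other words, whatever remains of the option string after f is treated as the file name, and the letters in it are never seen as options. When the archive is created, the first string in that array is used as the target file name; you can check this in create.c.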
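Putting the pieces together, the following sketch mimics the shape of parse_opt: a loop over parsed options that fills in a small state record. It is not a port of tar. The function name decode, the state dictionary and its keys are hypothetical, and Python's getopt module stands in for argp, on the assumption that both follow the same rule for an option that takes a value.

```python
import getopt


def decode(argv):
    """Sketch of a parse_opt-style dispatcher for the options c, v, z and f."""
    state = {
        "subcommand": None,     # set by -c
        "archive_names": [],    # filled by -f, like archive_name_array
        "members": [],          # plain arguments, like name_add_name
        "gzip": False,          # set by -z
        "verbose": False,       # set by -v
    }
    opts, operands = getopt.getopt(argv, "cvzf:")
    for key, arg in opts:
        if key == "-c":
            state["subcommand"] = "create"
        elif key == "-v":
            state["verbose"] = True
        elif key == "-z":
            state["gzip"] = True
        elif key == "-f":
            # Whatever the parser handed over as the value becomes an archive name.
            state["archive_names"].append(arg)
    state["members"].extend(operands)
    return state


if __name__ == "__main__":
    for argv in (["-zcvf", "test.tar.gz", "test/"],
                 ["-cvfz", "test.tar.gz", "test/"],
                 ["-fcvz", "test.tar.gz", "test/"]):
        print(argv[0], "->", decode(argv))
```

For -cvfz the state ends up with archive name z, gzip still False, and test.tar.gz listed as a member to archive, which accounts for all three symptoms at once. For -fcvz no subcommand is set at all, which is why tar asks for one of the -Acdtrux options.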
So, does this count as a bug in tar?