Linux: changing the order of tar options causes "Exiting with failure status due to previous errors"

As usual with a Linux problem, environment first: I'm on CentOS 7.6.

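Today I ran into this: after changing the order of the options in a tar command, tar reports "Exiting with failure status due to previous errors", yet it still creates an archive, just a plain uncompressed one: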

# Assume the current directory contains a directory named test, which holds a file named test.test
tar -cvfz test.tar.gz test/
# tar: test.tar.gz: Cannot stat: No such file or directory
# test/
# test/test.test
# tar: Exiting with failure status due to previous errors
ls
# ... z ...
# a new file named z has appeared
tar -tvf z
# test/
# test/test.test
# the contents are listed, so an archive was created successfully
tar -ztvf z
# gzip: stdin: not in gzip format
# but the archive is not in gzip format

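For as long as I can remember, whenever I compressed something on Linux, with gzip say, the command was tar -zcvf test.tar.gz test/, and I never looked into why it is written that way. Even man tar did not tell me, as far as I could see, whether the order of the options affects the result. In theory it shouldn't, yet here it clearly did.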

I tested again, this time with the old Unix style (no leading dash), and everything worked:

tar cvfz test.tar.gz test/
# test/
# test/test.test
tar -ztvf test.tar.gz
# test/
# test/test.test

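That is odd. From the behavior above I can make a guess: when the options start with -, tar treats whatever follows f inside the option cluster as the name of the archive to create. As we all know, the command has the form tar [-options] [archive name] [sources], so this command gets split into tar [-cvf] [z] [test.tar.gz test/]: build a plain archive named z from test.tar.gz and test/ (z has already been consumed as the file name, so gzip is never invoked). And because test.tar.gz does not exist, tar reports that it cannot find the file.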
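The split described above can be reproduced without tar. The following is a minimal sketch using the shell's built-in getopts; parse_cluster is a made-up helper name, and the option string "cvf:z" is my assumption of how such a parser would declare c, v and z as plain flags and f as an option that takes an argument. It is not tar's real parser, only the same style of short-option parsing.

```shell
# Hypothetical helper (not part of tar): walk a cluster of short options
# the way a getopt-style parser does and print what each letter received.
# In "cvf:z" the colon after f means "f takes an argument".
parse_cluster() {
    OPTIND=1
    while getopts "cvf:z" opt "$@"; do
        case "$opt" in
            f) printf 'f=%s\n' "$OPTARG" ;;      # argument captured by f
            c|v|z) printf '%s\n' "$opt" ;;       # plain flags
            *) return 1 ;;                       # unknown option
        esac
    done
    shift $((OPTIND - 1))
    # Whatever is left are ordinary operands (the files to archive).
    for operand in "$@"; do
        printf 'operand=%s\n' "$operand"
    done
}

parse_cluster -cvfz test.tar.gz test/
# c
# v
# f=z
# operand=test.tar.gz
# operand=test/
```

With the letters reordered as -czvf, the same function reports f=test.tar.gz and a single operand test/, which is the behavior one normally wants.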

Let's move the options around once more to check the guess:

tar -cfvz test.tar.gz test/
# tar: test.tar.gz: Cannot stat: No such file or directory
# test/
# test/test.test
# tar: Exiting with failure status due to previous errors
ls
# ... vz ...
# a new file named vz has appeared
tar -tvf vz
# test/
# test/test.test
tar -ztvf vz
# gzip: stdin: not in gzip format
############
tar -fcvz test.tar.gz test/
# tar: You must specify one of the '-Acdtrux' or '--test-label' options

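So the guess looks right: the outcome of a tar command depends on where f sits. In the last case f swallowed cvz as the archive name, so tar never saw c and complained that no subcommand was given. An answer on Stack Exchange says the same thing: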

As @jcbermu said, for most programs and in most cases, the order of command line flags is not important. However, some flags expect a value. Specifically, tar’s -f flag is:

-f, --file ARCHIVE
use archive file or device ARCHIVE

So, tar expects -f to have a value and that value will be the name of the tarfile it creates. For example, to add all .jpg files to an archive called foo.tar, you would run
tar -f foo.tar *jpg
What you were running was

tar -cfv test.tar *.jpg

tar understands that as "create (-c) an archive called v (-fv), containing files test.tar and any ending in .jpg.
When you run tar -cvf test.tar *.jpg on the other hand, it takes test.tar as the name of the archive and *jpg as the list of files.

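But why? A guess needs something to back it up. I searched for a long time and found nothing, and the documentation I read didn't say either, so I ended up reading the tar source code (and, in passing, grumbling about how readable C is). To keep things short (and because I'm not sure my reading is right), I'll only show the main logic: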

// tar.c
/* Main routine for tar.  */
int main (int argc, char **argv)
{
  /* Decode options.  */
  decode_options (argc, argv);
  /* Main command execution.  */
  switch (subcommand_option)
    {
    case UNKNOWN_SUBCOMMAND:
      USAGE_ERROR ((0, 0,
		    _("You must specify one of the `-Acdtrux' or `--test-label'  options")));

    case CREATE_SUBCOMMAND:
      create_archive ();
      break;
    }
  /* Dispose of allocated memory, and return.  */
  if (exit_status == TAREXIT_FAILURE)
    error (0, 0, _("Exiting with failure status due to previous errors"));
  return exit_status;
}

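As you can see, main calls decode_options to parse the command-line arguments. The output shows that every symptom comes from the input being parsed differently from what I intended, so the answer must be inside decode_options.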

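This excerpt also explains why the archive is still created despite the error. Subcommands run according to what was parsed; a name that cannot be read only sets exit_status, and the failure message is printed at the very end of main, so it does not stop the earlier work (creating the archive, for instance) from being carried out.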
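The "record the failure, keep going, report at the end" pattern can be sketched in a few lines of shell. Everything here is hypothetical: archive_names is a made-up function, the status value 1 is chosen arbitrarily for the sketch, and "added" merely stands in for writing a member to the archive. It only mirrors the control flow of the excerpt, not tar's actual code.

```shell
# Hypothetical sketch (not tar code): process every name, remember whether
# any of them failed, and only report the failure at the very end.
archive_names() {
    status=0
    for name in "$@"; do
        if [ -e "$name" ]; then
            printf 'added %s\n' "$name"     # stand-in for archiving the member
        else
            printf '%s: Cannot stat: No such file or directory\n' "$name" >&2
            status=1                        # remember the failure, keep going
        fi
    done
    if [ "$status" -ne 0 ]; then
        echo 'Exiting with failure status due to previous errors' >&2
    fi
    return "$status"
}

# Demo in a throwaway directory: one missing name, one existing directory.
workdir=$(mktemp -d)
mkdir "$workdir/test"
archive_names "$workdir/test.tar.gz" "$workdir/test" || echo "exit status: $?"
rm -rf "$workdir"
```

The existing directory is still processed even though the first name failed, which matches what happened with the z archive.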

// tar.c
/* Convert old-style tar call by exploding option element and rearranging
     options accordingly.  */
  if (argc > 1 && argv[1][0] != '-')
    {
      int new_argc;		/* argc value for rearranged arguments */
      char **new_argv;		/* argv value for rearranged arguments */
      char *const *in;		/* cursor into original argv */
      char **out;		/* cursor into rearranged argv */
      const char *letter;	/* cursor into old option letters */
      char buffer[3];		/* constructed option buffer */
      /* Initialize a constructed option.  */
      buffer[0] = '-';
      buffer[2] = '\0';
      /* Allocate a new argument array, and copy program name in it.  */
      new_argc = argc - 1 + strlen (argv[1]);
      new_argv = xmalloc ((new_argc + 1) * sizeof (char *));
      in = argv;
      out = new_argv;
      *out++ = *in++;
      /* Copy each old letter option as a separate option, and have the
	 corresponding argument moved next to it.  */
      for (letter = *in++; *letter; letter++)
	{
	  struct argp_option *opt;
          
	  buffer[1] = *letter;
	  *out++ = xstrdup (buffer);
	  opt = find_argp_option (options, *letter);
	  if (opt && opt->arg)
	    {
	      if (in < argv + argc)
		*out++ = *in++;
	      else
		USAGE_ERROR ((0, 0, _("Old option `%c' requires an argument."),
			      *letter));
	    }
	}
      /* Copy all remaining options.  */
      while (in < argv + argc)
	*out++ = *in++;
      *out = 0;
      /* Replace the old option list by the new one.  */
      argc = new_argc;
      argv = new_argv;
    }

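This block explains why the old Unix style (no leading -) is not affected. For an old-style call, tar explodes the first word into separate options, and each option that needs an argument gets the next unused word moved right behind it. For example, cvfz test.tar.gz test/ is rewritten as -c -v -f test.tar.gz -z test/, and argc and argv are replaced with the rearranged list.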
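To make the rearrangement concrete, here is a rough shell re-creation of that loop. explode_old_style is a made-up name, and the sketch assumes that f is the only letter that takes an argument (real tar looks each letter up in its option table, and several letters take arguments). It prints the rearranged argument list one word per line.

```shell
# Hypothetical helper (not tar code): print the rearranged argument list,
# one word per line, for an old-style call such as "cvfz test.tar.gz test/".
# Assumption for this sketch: only f takes an argument.
explode_old_style() {
    letters=$1
    shift
    while [ -n "$letters" ]; do
        rest=${letters#?}            # everything after the first letter
        letter=${letters%"$rest"}    # the first letter itself
        letters=$rest
        printf '%s\n' "-$letter"
        if [ "$letter" = f ]; then
            if [ "$#" -gt 0 ]; then
                printf '%s\n' "$1"   # move the next unused word right behind -f
                shift
            else
                echo "Old option 'f' requires an argument." >&2
                return 1
            fi
        fi
    done
    # Copy all remaining words unchanged.
    for word in "$@"; do
        printf '%s\n' "$word"
    done
}

explode_old_style cvfz test.tar.gz test/
# -c
# -v
# -f
# test.tar.gz
# -z
# test/
```

Because the argument is taken from the words after the cluster rather than from the letters inside it, the position of f within cvfz does not matter in the old style.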

// tar.c
/* Parse all options and non-options as they appear.  */
  prepend_default_options (getenv ("TAR_OPTIONS"), &argc, &argv);
  if (argp_parse (&argp, argc, argv, ARGP_IN_ORDER, &idx, &args))
    exit (TAREXIT_FAILURE);
// prepargs.c
/* Prepend the whitespace-separated options in OPTIONS to the argument
   vector of a main program with argument count *PARGC and argument
   vector *PARGV.  */
void prepend_default_options (char const *options, int *pargc, char ***pargv);

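This is where the options are actually parsed. prepend_default_options does what its comment says: it takes the whitespace-separated options from the TAR_OPTIONS environment variable and puts them in front of the command-line arguments. argp_parse then walks through the resulting list, and because of ARGP_IN_ORDER it handles options and non-options in the order they appear.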

// tar.c
static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
  struct tar_args *args = state->input;
  switch (key) {
    case ARGP_KEY_ARG:
      /* File name or non-parsed option, because of ARGP_IN_ORDER */
      name_add_name (arg, MAKE_INCL_OPTIONS (args));
      args->input_files = true;
      break;
    case 'c':
      set_subcommand_option (CREATE_SUBCOMMAND);
      break;
    case 'f':
      if (archive_names == allocated_archive_names)
	archive_name_array = x2nrealloc (archive_name_array,
					 &allocated_archive_names,
					 sizeof (archive_name_array[0]));
      archive_name_array[archive_names++] = arg;
      break;
  }
}
// argp.h
/* This is not an option at all, but rather a command line argument.  If a
   parser receiving this key returns success, the fact is recorded, and the
   ARGP_KEY_NO_ARGS case won't be used.  HOWEVER, if while processing the
   argument, a parser function decrements the NEXT field of the state it's
   passed, the option won't be considered processed; this is to allow you to
   actually modify the argument (perhaps into an option), and have it
   processed again.  */
#define ARGP_KEY_ARG            0
// names.c
/* Add to name_array the file NAME with fnmatch options MATCHING_FLAGS */
void name_add_name (const char *name, int matching_flags)
{
  static int prev_flags = 0; /* FIXME: Or EXCLUDE_ANCHORED? */
  struct name_elt *ep;

  check_name_alloc ();
  ep = &name_array[entries++];
  if (prev_flags != matching_flags)
    {
      ep->type = NELT_FMASK;
      ep->v.matching_flags = matching_flags;
      prev_flags = matching_flags;
      check_name_alloc ();
      ep = &name_array[entries++];
    }
  ep->type = NELT_NAME;
  ep->v.name = name;
  name_count++;
}

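Here is the answer: when the parser meets f it stores arg in archive_name_array. Since f takes an argument, any letters that follow it in the same cluster are handed over as arg; in other words, the rest of the cluster is treated as the file name. And when the archive is created, the first string in that array is used as the target file name; you can check this in create.c.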
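Given that explanation, the safe habit is to make sure the archive name is the very next thing after f. The sketch below shows two ways to write that; it assumes tar and gzip are installed, and the directory and archive names are invented for the demo.

```shell
# Assumes tar and gzip are available; all names are made up for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir test
echo hello > test/test.test

# f is the last letter of the cluster, so the next word is the archive name.
tar -czvf a.tar.gz test/

# -f given on its own, immediately followed by its argument.
tar -c -v -z -f b.tar.gz test/

# Both should list cleanly as gzip-compressed archives.
for archive in a.tar.gz b.tar.gz; do
    tar -tzf "$archive" > /dev/null && echo "$archive: ok"
done
```

Neither command leaves behind a stray file named z or vz, because f never has trailing letters to swallow.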

So, does this count as a bug in tar?
