Studying note of GCC-3.4.6 source (71)

4. Source code parsing

Returning from lang_dependent_init, the compiler is almost ready to parse our source program. We can see here that every time source files are compiled from the command line, the compiler goes through the complex initialization above. The input to one such compilation is called a translation unit. From the code above, the initialization of the C++ compiler is much more complicated than that of the C compiler, and later we will see that processing C++ is also much more difficult than processing C. For simple source code, the C++ compiler is no doubt less efficient. But for sophisticated source code, which needs C++'s runtime library, the tree-form runtime environment, and the rich language features, the C++ compiler may not be inferior.

 

do_compile (continued)

 

4660      compile_file ();

4661

4662      if (flag_unit_at_a_time)

4663      {

4664         rtl_dump_file = cgraph_dump_file;

4665         cgraph_dump_file = NULL;

4666         close_dump_file (DFI_cgraph, NULL, NULL_RTX);

4667      }

4668   }

 

Here compile_file compiles an entire translation unit, and writes a file of assembly output and various debugging dumps.

 

1810 static void

1811 compile_file (void)                                                                                   in toplev.c

1812 {

1813   /* Initialize yet another pass.  */

1814

1815   init_final (main_input_filename);

1816   coverage_init (aux_base_name);

1817

1818   timevar_push (TV_PARSE);

1819

1820  /* Call the parser, which parses the entire file (calling

1821     rest_of_compilation for each function).  */

1822   (*lang_hooks.parse_file) (set_yydebug);

 

Routine init_final initializes data used in the final pass. Here app_on is nonzero when APP processing of the assembler output is enabled. This variable is used together with macro ASM_APP_ON, a C string constant for the text to be output before each asm statement or group of consecutive ones. Normally this is "#APP", which is a comment that has no effect on most assemblers but tells the GNU assembler that it must check the lines that follow for all valid assembler constructs. Variable final_sequence contains the sequence rtx if we are outputting an insn sequence.

 

241  void

242  init_final (const char *filename ATTRIBUTE_UNUSED)                       in final.c

243  {

244    app_on = 0;

245    final_sequence = 0;

246 

247  #ifdef ASSEMBLER_DIALECT

248    dialect_number = ASSEMBLER_DIALECT;

249  #endif

250  }
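The role of app_on can be illustrated with a minimal sketch. It assumes the marker strings "#APP\n" and "#NO_APP\n" (targets define their own ASM_APP_ON / ASM_APP_OFF), and every sketch_ name is hypothetical and not taken from final.c. The flag exists so that the marker is emitted once per group of consecutive asm statements rather than once per statement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the target macros ASM_APP_ON / ASM_APP_OFF. */
#define SKETCH_APP_ON  "#APP\n"
#define SKETCH_APP_OFF "#NO_APP\n"

/* Mirrors app_on: nonzero while we are inside an #APP region. */
int sketch_app_on = 0;

/* Append text to the NUL-terminated buffer out without overflowing cap bytes. */
static void sketch_append(char *out, size_t cap, const char *text)
{
    assert(out != NULL && text != NULL && strlen(out) < cap);
    strncat(out, text, cap - strlen(out) - 1);
}

/* Emit the "on" marker only if we are not already in an #APP region. */
void sketch_app_enable(char *out, size_t cap)
{
    if (!sketch_app_on) {
        sketch_append(out, cap, SKETCH_APP_ON);
        sketch_app_on = 1;
    }
}

/* Emit the "off" marker only if an #APP region is currently open. */
void sketch_app_disable(char *out, size_t cap)
{
    if (sketch_app_on) {
        sketch_append(out, cap, SKETCH_APP_OFF);
        sketch_app_on = 0;
    }
}
```

Calling sketch_app_enable twice in a row emits the marker only once, which is why init_final has to start from app_on = 0.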

 

Next, coverage_init does initialization for gcov, GCC's code coverage tool. We just ignore it here. For C++, the hook for parsing source files is c_common_parse_file, shown below.

 

1219 void

1220 c_common_parse_file (int set_yydebug ATTRIBUTE_UNUSED)                   in c-opts.c

1221 {

1222   unsigned file_index;

1223  

1224 #if YYDEBUG != 0

1225   yydebug = set_yydebug;

1226 #else

1227   warning ("YYDEBUG not defined");

1228 #endif

1229

1230   file_index = 0;

1231  

1232   do

1233   {

1234     if (file_index > 0)

1235     {

1236       /* Reset the state of the parser.  */

1237       c_reset_state();

1238

1239       /* Reset cpplib's macros and start a new file.  */

1240       cpp_undef_all (parse_in);

1241       main_input_filename = this_input_filename

1242          = cpp_read_main_file (parse_in, in_fnames[file_index]);

1243       if (this_input_filename == NULL)

1244         break;

1245     }

1246     finish_options ();

1247     if (file_index == 0)

1248       pch_init ();

1249     c_parse_file ();

1250

1251     file_index++;

1252   } while (file_index < num_in_fnames);

1253  

1254   finish_file ();

1255 }

 

Note that in c_common_post_options, the first source file on the command line has already been read in, and its name is held by main_input_filename. Variable num_in_fnames records the number of source files given on the command line, and their names are saved in array in_fnames. We can see here that source files on the same command line have their content merged together, forming one big front-end tree.
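The control flow of c_common_parse_file can be summarized by the following minimal sketch. It is not GCC code: struct sketch_unit and sketch_parse_files are hypothetical, and a file is modeled only by the number of declarations it contributes. What it shows is that per-file state is reset before every file except the first, all files feed one shared tree, and finishing happens exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one compiler run. */
struct sketch_unit {
    size_t files_parsed; /* inputs handed to the parser */
    size_t resets;       /* times the per-file state was reset */
    size_t decls;        /* declarations accumulated in the one shared tree */
    int finished;        /* set once, after the last file */
};

/* decls_per_file[i] is the (made-up) number of declarations in input i. */
void sketch_parse_files(struct sketch_unit *u,
                        const size_t *decls_per_file, size_t n)
{
    size_t i = 0;

    assert(u != NULL && decls_per_file != NULL && n > 0);
    do {
        if (i > 0)
            u->resets++;               /* stands for c_reset_state + cpp_undef_all */
        u->decls += decls_per_file[i]; /* stands for c_parse_file */
        u->files_parsed++;
        i++;
    } while (i < n);

    u->finished = 1;                   /* stands for finish_file, run once */
}
```

With three inputs the sketch performs two resets, matching the file_index > 0 test in the listing.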

5.1. Setup macros for the system

Below, a nonzero cpp_opts->preprocessed means we are handling source files that have already been preprocessed. Otherwise, for every input source file, all command line options have been recorded by now, but some of them are still unhandled and are kept in deferred_opts. It is also time to set up the macros for the system, which affect the definitions in system headers.

 

1411 static void

1412 finish_options (void)                                                                                 in c-opts.c

1413 {

1414   if (!cpp_opts->preprocessed)

1415   {

1416     size_t i;

1417

1418     cpp_change_file (parse_in, LC_RENAME, _("<built-in>"));

1419     cpp_init_builtins (parse_in, flag_hosted);

1420     c_cpp_builtins (parse_in);

 

Note that cpp_change_file, called at line 1418, has the following definition. A reason argument of LC_RENAME means that the file name or line number changes for a reason other than entering or leaving a file (e.g. a #line directive). Here it means that the content below is treated as coming from a file named "<built-in>".

 

973  void

974  cpp_change_file (cpp_reader *pfile, enum lc_reason reason,                          in cppfiles.c

975         const char *new_name)

976  {

977    _cpp_do_file_change (pfile, reason, new_name, 1, 0);

978  }

 

For details about handling source file changes, refer to section Do file change.

5.1.1. C++ builtin macros

The C and C++ standards predefine some macros, such as __FILE__ and __LINE__, that we can use in programming. Together with a few GNU extensions, they are described by builtin_array.

 

289  static const struct builtin builtin_array[] =                                                    in cppinit.c

290  {

291    B("__TIME__",        BT_TIME),

292    B("__DATE__",        BT_DATE),

293    B("__FILE__",         BT_FILE),

294    B("__BASE_FILE__",     BT_BASE_FILE),

295    B("__LINE__",         BT_SPECLINE),

296    B("__INCLUDE_LEVEL__", BT_INCLUDE_LEVEL),

297    /* Keep builtins not used for -traditional-cpp at the end, and

298      update init_builtins() if any more are added.  */

299    B("_Pragma",           BT_PRAGMA),

300    B("__STDC__",        BT_STDC),

301  };

 

Above, the last two entries are not supported in traditional mode. Below, in the FOR loop at line 348, function cpp_lookup inserts the identifiers of these macros into hashtable ident_hash. As these identifiers stand for builtin objects, the hash nodes are flagged accordingly. Note that we do not set the content of these macros here, because it is not known yet.

 

339  void

340  cpp_init_builtins (cpp_reader *pfile, int hosted)                                           in cppinit.c

341  {

342    const struct builtin *b;

343    size_t n = ARRAY_SIZE (builtin_array);

344 

345    if (CPP_OPTION (pfile, traditional))

346      n -= 2;

347 

348    for(b = builtin_array; b < builtin_array + n; b++)

349    {

350      cpp_hashnode *hp = cpp_lookup (pfile, b->name, b->len);

351      hp->type = NT_MACRO;

352      hp->flags |= NODE_BUILTIN | NODE_WARN;

353      hp->value.builtin = b->value;

354    }

355 

356    if (CPP_OPTION (pfile, cplusplus))

357      _cpp_define_builtin (pfile, "__cplusplus 1");

358    else if (CPP_OPTION (pfile, lang) == CLK_ASM)

359      _cpp_define_builtin (pfile, "__ASSEMBLER__ 1");

360    else if (CPP_OPTION (pfile, lang) == CLK_STDC94)

361      _cpp_define_builtin (pfile, "__STDC_VERSION__ 199409L");

362    else if (CPP_OPTION (pfile, c99))

363      _cpp_define_builtin (pfile, "__STDC_VERSION__ 199901L");

364 

365    if (hosted)

366      _cpp_define_builtin (pfile, "__STDC_HOSTED__ 1");

367    else

368      _cpp_define_builtin (pfile, "__STDC_HOSTED__ 0");

369 

370    if (CPP_OPTION (pfile, objc))

371      _cpp_define_builtin (pfile, "__OBJC__ 1");

372  }
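The effect of n -= 2 at line 346 can be seen in a small stand-alone sketch. The table below copies the names from builtin_array, but the sketch_ types and functions are hypothetical, and a linear search replaces the real hashtable insertion through cpp_lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_builtin_type {
    SK_TIME, SK_DATE, SK_FILE, SK_BASE_FILE,
    SK_SPECLINE, SK_INCLUDE_LEVEL, SK_PRAGMA, SK_STDC
};

struct sketch_builtin {
    const char *name;
    enum sketch_builtin_type value;
};

/* Entries not used in traditional mode are kept at the end, as in cppinit.c. */
static const struct sketch_builtin sketch_builtins[] = {
    { "__TIME__",          SK_TIME },
    { "__DATE__",          SK_DATE },
    { "__FILE__",          SK_FILE },
    { "__BASE_FILE__",     SK_BASE_FILE },
    { "__LINE__",          SK_SPECLINE },
    { "__INCLUDE_LEVEL__", SK_INCLUDE_LEVEL },
    { "_Pragma",           SK_PRAGMA },
    { "__STDC__",          SK_STDC },
};

#define SKETCH_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Number of table entries registered in the given mode. */
size_t sketch_builtin_count(int traditional)
{
    size_t n = SKETCH_ARRAY_SIZE(sketch_builtins);

    if (traditional)
        n -= 2; /* drop the trailing _Pragma and __STDC__ */
    return n;
}

/* Returns 1 if name would be registered as a builtin in the given mode. */
int sketch_is_builtin(const char *name, int traditional)
{
    size_t i, n = sketch_builtin_count(traditional);

    assert(name != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(sketch_builtins[i].name, name) == 0)
            return 1;
    return 0;
}
```

Because the loop simply stops two entries early, the order of the table matters, which is what the comment at lines 297-298 warns about.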

 

The statement at line 357 is then the same as declaring: #define __cplusplus 1. For such a definition, the function below is in charge of building the corresponding cpp_macro node.

 

1832 void

1833 _cpp_define_builtin (cpp_reader *pfile, const char *str)                                in cpplib.c

1834 {

1835   size_t len = strlen (str);

1836   char *buf = alloca (len + 1);

1837   memcpy (buf, str, len);

1838   buf[len] = '\n';

1839   run_directive (pfile, T_DEFINE, buf, len);

1840 }
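The buffer handling in _cpp_define_builtin is easy to misread: the copy is len + 1 bytes long, its last byte is a newline rather than a NUL, and the length passed on is len, so the newline lies just past the counted text. A minimal sketch follows. The function name is hypothetical, and it uses malloc instead of alloca so that the buffer can be returned to the caller, which is an assumption of the sketch and not what cpplib does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy str into a fresh buffer of strlen(str) + 1 bytes whose final byte
   is '\n' (not '\0').  *len_out receives the length without the newline.
   The caller frees the result.  Returns NULL if allocation fails.  */
char *sketch_make_directive_text(const char *str, size_t *len_out)
{
    size_t len;
    char *buf;

    assert(str != NULL && len_out != NULL);
    len = strlen(str);
    buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;

    memcpy(buf, str, len);
    buf[len] = '\n'; /* the lexer sees the end of a directive line here */
    *len_out = len;
    return buf;
}
```

The result is not a C string, so it must only be handled together with its length.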

 

As the name suggests, run_directive runs the directive following the details held in dtable.

 

440  static void

441  run_directive (cpp_reader *pfile, int dir_no, const char *buf, size_t count)     in cpplib.c

442  {

443    cpp_push_buffer (pfile, (const uchar *) buf, count,

444                  /* from_stage3 */ true);

445    /* Disgusting hack.  */

446    if (dir_no == T_PRAGMA)

447      pfile->buffer->file = pfile->buffer->prev->file;

448    start_directive (pfile);

449 

450    /* This is a short-term fix to prevent a leading '#' being

451      interpreted as a directive.  */

452    _cpp_clean_line (pfile);

453 

454    pfile->directive = &dtable[dir_no];

455    if (CPP_OPTION (pfile, traditional))

456      prepare_directive_trad (pfile);

457    pfile->directive->handler (pfile);

458    end_directive (pfile, 1);

459    if (dir_no == T_PRAGMA)

460      pfile->buffer->file = NULL;

461   _cpp_pop_buffer (pfile);

462  }

 

At line 443, cpp_push_buffer links buf into the buffer slot of pfile. The file field of a buffer is of type _cpp_file, which represents the source file. Since the current buffer holds only the directive itself, its file field is 0. As some #pragma directives affect the following content in the source file, the update at line 447, which points the field at the file of the previous buffer, is important.
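The overall shape of run_directive is a table-driven dispatch wrapped in a state change. The following is a minimal sketch under strong simplifications: there is no buffer stack, only two directives, and all sketch_ names are hypothetical. It only illustrates that the handler is picked by index from a table and runs while the reader is marked as being inside a directive.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_dir { SK_DEFINE, SK_UNDEF, SK_DIR_COUNT };

struct sketch_reader;

/* One row of the directive table: a name and its handler. */
struct sketch_directive {
    const char *name;
    void (*handler)(struct sketch_reader *);
};

struct sketch_reader {
    int in_directive;       /* mirrors pfile->state.in_directive */
    int seen_in_directive;  /* value of in_directive the handler observed */
    int defines;            /* how often the define handler ran */
    int undefs;             /* how often the undef handler ran */
    const struct sketch_directive *directive;
};

static void sketch_do_define(struct sketch_reader *r)
{
    r->defines++;
    r->seen_in_directive = r->in_directive;
}

static void sketch_do_undef(struct sketch_reader *r)
{
    r->undefs++;
    r->seen_in_directive = r->in_directive;
}

/* Plays the role of dtable: indexed by directive number. */
static const struct sketch_directive sketch_dtable[SK_DIR_COUNT] = {
    { "define", sketch_do_define },
    { "undef",  sketch_do_undef },
};

void sketch_run_directive(struct sketch_reader *r, enum sketch_dir dir_no)
{
    assert(r != NULL && (int) dir_no >= 0 && dir_no < SK_DIR_COUNT);

    r->in_directive = 1;                   /* as start_directive does */
    r->directive = &sketch_dtable[dir_no];
    r->directive->handler(r);
    r->in_directive = 0;                   /* as end_directive does */
    r->directive = NULL;
}
```

After the call the reader is back in its normal state, so whatever is lexed next is not treated as part of the directive.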

 

223  static void

224  start_directive (cpp_reader *pfile)                                                               in cpplib.c

225  {

226    /* Setup in-directive state.  */

227    pfile->state.in_directive = 1;

228    pfile->state.save_comments = 0;

229 

230    /* Some handlers need the position of the # for diagnostics.  */

231    pfile->directive_line = pfile->line;

232  }

 

Then start_directive sets up the related slots of pfile, which affect the behavior of lex_macro_node below. And _cpp_clean_line above sets the cur and next_line slots of the buffer to the head and end of the definition string. At line 457, handler in fact goes to do_define.

 

502  static void

503  do_define (cpp_reader *pfile)                                                                     in cpplib.c

504  {

505    cpp_hashnode *node = lex_macro_node (pfile);

506 

507    if (node)

508    {

509      /* If we have been requested to expand comments into macros,

510        then re-enable saving of comments.  */

511       pfile->state.save_comments =

512          ! CPP_OPTION (pfile, discard_comments_in_macro_exp);

513 

514      if (_cpp_create_definition (pfile, node))

515        if (pfile->cb.define)

516          pfile->cb.define (pfile, pfile->directive_line, node);

517    }

518  }

 

Function _cpp_lex_token decomposes the content of the buffer into tokens (e.g. identifier, string, number) and packs each into a cpp_token to return. One token is returned per invocation.
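The "one token per call" interface can be sketched as below. This is a toy lexer, not cpplib's: the token kinds, struct sketch_token and sketch_lex_token are hypothetical, and only names, numbers and single-character tokens are recognized. As in a directive, reaching the newline that ends the line yields an end-of-input token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum sketch_ttype { SK_TOK_NAME, SK_TOK_NUMBER, SK_TOK_OTHER, SK_TOK_EOF };

struct sketch_token {
    enum sketch_ttype type;
    const char *start; /* first character of the token */
    size_t len;        /* number of characters in the token */
};

/* Lex exactly one token starting at *cursor and advance *cursor past it. */
struct sketch_token sketch_lex_token(const char **cursor)
{
    struct sketch_token tok;
    const char *p;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (*p == ' ' || *p == '\t') /* skip horizontal whitespace */
        p++;
    tok.start = p;

    if (*p == '\0' || *p == '\n') {
        tok.type = SK_TOK_EOF;      /* end of the directive line */
    } else if (isalpha((unsigned char) *p) || *p == '_') {
        tok.type = SK_TOK_NAME;
        while (isalnum((unsigned char) *p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char) *p)) {
        tok.type = SK_TOK_NUMBER;
        while (isalnum((unsigned char) *p) || *p == '.')
            p++;
    } else {
        tok.type = SK_TOK_OTHER;    /* any other single character */
        p++;
    }

    tok.len = (size_t) (p - tok.start);
    *cursor = p;
    return tok;
}
```

Applied to the text "__cplusplus 1" followed by a newline, three calls produce a name, a number, and then the end-of-input token. The first of these is the token that lex_macro_node inspects.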

 

466  static cpp_hashnode *

467  lex_macro_node (cpp_reader *pfile)                                                           in cpplib.c

468  {

469    const cpp_token *token = _cpp_lex_token (pfile);

470 

471    /* The token immediately after #define must be an identifier. That

472      identifier may not be "defined", per C99 6.10.8p4.

473      In C++, it may not be any of the "named operators" either,

474      per C++98 [lex.digraph], [lex.key].

475      Finally, the identifier may not have been poisoned. (In that case

476      the lexer has issued the error message for us.)  */

477 

478    if (token->type == CPP_NAME)

479    {

480      cpp_hashnode *node = token->val.node;

481 

482      if (node == pfile->spec_nodes.n_defined)

483        cpp_error (pfile, CPP_DL_ERROR,

484                 "\"defined\" cannot be used as a macro name");

485      else if (! (node->flags & NODE_POISONED))

486        return node;

487    }

488    else if (token->flags & NAMED_OP)

489      cpp_error (pfile, CPP_DL_ERROR,

490               "\"%s\" cannot be used as a macro name as it is an operator in C++",

491               NODE_NAME (token->val.node));

492    else if (token->type == CPP_EOF)

493      cpp_error (pfile, CPP_DL_ERROR, "no macro name given in #%s directive",

494               pfile->directive->name);

495    else

496      cpp_error (pfile, CPP_DL_ERROR, "macro names must be identifiers");

497 

498    return NULL;

499  }

 

Remember that we can poison an identifier to remove it completely from the program. A poisoned identifier has NODE_POISONED set in its flags. A token with NAMED_OP set in its flags is a named operator in C++. lex_macro_node ensures the identifier is valid as the name of a macro.

For details of implementing a macro definition, refer to section Create macro definition – ISO mode. The define callback at line 515 is for debugging purposes; we skip it here.

Then function end_directive restores the lexer's state and, if skip_line is not 0, skips the remaining tokens on the line of the directive.
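The checks made by lex_macro_node can be restated on plain strings. The sketch below is hypothetical throughout: it works on a name instead of a cpp_token, takes the poisoned state as a parameter instead of reading node flags, and hard-codes the C++ named operators (and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, xor_eq).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum sketch_name_check {
    SK_CHECK_OK,
    SK_CHECK_IS_DEFINED,     /* "defined" cannot be a macro name */
    SK_CHECK_IS_OPERATOR,    /* a C++ named operator */
    SK_CHECK_POISONED,       /* identifier was poisoned earlier */
    SK_CHECK_MISSING,        /* no name given at all */
    SK_CHECK_NOT_IDENTIFIER  /* macro names must be identifiers */
};

static const char *const sketch_named_ops[] = {
    "and", "and_eq", "bitand", "bitor", "compl", "not",
    "not_eq", "or", "or_eq", "xor", "xor_eq"
};

/* Returns 1 if s has the form of a C identifier. */
static int sketch_is_identifier(const char *s)
{
    size_t i;

    if (!(isalpha((unsigned char) s[0]) || s[0] == '_'))
        return 0;
    for (i = 1; s[i] != '\0'; i++)
        if (!(isalnum((unsigned char) s[i]) || s[i] == '_'))
            return 0;
    return 1;
}

enum sketch_name_check sketch_check_macro_name(const char *name,
                                               int cplusplus, int poisoned)
{
    size_t i;

    assert(name != NULL);
    if (name[0] == '\0')
        return SK_CHECK_MISSING;
    if (!sketch_is_identifier(name))
        return SK_CHECK_NOT_IDENTIFIER;
    if (strcmp(name, "defined") == 0)
        return SK_CHECK_IS_DEFINED;
    if (cplusplus)
        for (i = 0; i < sizeof sketch_named_ops / sizeof sketch_named_ops[0]; i++)
            if (strcmp(name, sketch_named_ops[i]) == 0)
                return SK_CHECK_IS_OPERATOR;
    if (poisoned)
        return SK_CHECK_POISONED;
    return SK_CHECK_OK;
}
```

Note that a name such as and is rejected only when compiling C++; in C it is an ordinary identifier and a legal macro name.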

 

235  static void

236  end_directive (cpp_reader *pfile, int skip_line)                                     in cpplib.c

237  {

238    if (CPP_OPTION (pfile, traditional))

239    {

240      /* Revert change of prepare_directive_trad.  */

241      pfile->state.prevent_expansion--;

242 

243      if (pfile->directive != &dtable[T_DEFINE])

244        _cpp_remove_overlay (pfile);

245    }

246    /* We don't skip for an assembler #.  */

247    else if (skip_line)

248   {

249      skip_rest_of_line (pfile);

250      if (!pfile->keep_tokens)

251      {

252        pfile->cur_run = &pfile->base_run;

253        pfile->cur_token = pfile->base_run.base;

254      }

255    }

256 

257    /* Restore state.  */

258    pfile->state.save_comments = ! CPP_OPTION (pfile, discard_comments);

259    pfile->state.in_directive = 0;

260    pfile->state.in_expression = 0;

261    pfile->state.angled_headers = 0;

262    pfile->directive = 0;

263  }

 
