On return from lang_dependent_init, the compiler is almost ready to parse our source program. Note that every time source files are compiled from the command line, the compiler goes through the complex initialization above. The input to one such compilation is called a translation unit. From the code above we can see that initializing the C++ compiler is much more complicated than initializing the C one, and later sections show that processing C++ is also much more difficult than processing C. For simple source code the C++ compiler is no doubt less efficient. But for sophisticated source code that needs C++’s runtime library, the tree-form runtime environment, and the rich language features, the C++ compiler may not be inferior.
do_compile (continue)
4660 compile_file ();
4661
4662 if (flag_unit_at_a_time)
4663 {
4664 rtl_dump_file = cgraph_dump_file;
4665 cgraph_dump_file = NULL;
4666 close_dump_file (DFI_cgraph, NULL, NULL_RTX);
4667 }
4668 }
Here compile_file compiles an entire translation unit and writes a file of assembly output and various debugging dumps.
1810 static void
1811 compile_file (void) in toplev.c
1812 {
1813 /* Initialize yet another pass. */
1814
1815 init_final (main_input_filename);
1816 coverage_init (aux_base_name);
1817
1818 timevar_push (TV_PARSE);
1819
1820 /* Call the parser, which parses the entire file (calling
1821 rest_of_compilation for each function). */
1822 (*lang_hooks.parse_file) (set_yydebug);
Routine init_final initializes the data used in the final pass. Here app_on is nonzero if APP processing of the assembler output has been enabled. This variable is used together with macro ASM_APP_ON, a C string constant for the text to be output before each asm statement or group of consecutive ones. Normally this is "#APP", which is a comment that has no effect on most assemblers but tells the GNU assembler that it must check the lines that follow for all valid assembler constructs. Variable final_sequence holds the sequence rtx if we are outputting an insn sequence.
241 void
242 init_final (const char *filename ATTRIBUTE_UNUSED) in final.c
243 {
244 app_on = 0;
245 final_sequence = 0;
246
247 #ifdef ASSEMBLER_DIALECT
248 dialect_number = ASSEMBLER_DIALECT;
249 #endif
250 }
Next, coverage_init performs the initialization for gcov, GCC's tool for measuring code coverage. We just ignore it here. For C++, the hook for source file parsing is c_common_parse_file, shown below.
1219 void
1220 c_common_parse_file (int set_yydebug ATTRIBUTE_UNUSED) in c-opts.c
1221 {
1222 unsigned file_index;
1223
1224 #if YYDEBUG != 0
1225 yydebug = set_yydebug;
1226 #else
1227 warning ("YYDEBUG not defined");
1228 #endif
1229
1230 file_index = 0;
1231
1232 do
1233 {
1234 if (file_index > 0)
1235 {
1236 /* Reset the state of the parser. */
1237 c_reset_state();
1238
1239 /* Reset cpplib's macros and start a new file. */
1240 cpp_undef_all (parse_in);
1241 main_input_filename = this_input_filename
1242 = cpp_read_main_file (parse_in, in_fnames[file_index]);
1243 if (this_input_filename == NULL)
1244 break;
1245 }
1246 finish_options ();
1247 if (file_index == 0)
1248 pch_init ();
1249 c_parse_file ();
1250
1251 file_index++;
1252 } while (file_index < num_in_fnames);
1253
1254 finish_file ();
1255 }
Note that in c_common_post_options, the first source file on the command line has already been read in, and its name is held in main_input_filename. Variable num_in_fnames records the number of source files given on the command line, and their names are saved in array in_fnames. We can see here that source files on the same command line have their contents merged together to form one big front-end tree.
Below, a nonzero cpp_opts->preprocessed means we are handling source files that have already been preprocessed. Otherwise, for every input source file, all command line options have been recorded by now, but some of them are still unhandled and are kept in deferred_opts. It is also time to set up the system macros, which affect the definitions in system headers.
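The control flow of the loop in c_common_parse_file can be sketched in isolation. The following is only a minimal model under stated assumptions: the names sketch_driver and sketch_parse_all are hypothetical, and the counters merely stand in for the real calls (c_reset_state and cpp_read_main_file, c_parse_file, finish_file).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for the real side effects. */
struct sketch_driver {
  unsigned resets;    /* parser/cpplib resets (files after the first) */
  unsigned parsed;    /* files handed to the parser */
  unsigned finished;  /* calls to the final "finish" step */
};

/* Mirrors the do/while shape: the first file is already open, so only
   later files trigger a reset; the finish step runs once at the end. */
static void sketch_parse_all(struct sketch_driver *d, size_t num_files)
{
  size_t i = 0;

  assert(d != NULL && num_files > 0);
  do {
    if (i > 0)
      d->resets++;      /* reset state and open the next file */
    d->parsed++;        /* parse the current file */
    i++;
  } while (i < num_files);

  d->finished++;        /* one finish step for the merged result */
}
```

With three input files this model performs two resets, three parses, and a single finish step, which matches the observation that all files on one command line end up in one front-end tree.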
1411 static void
1412 finish_options (void) in c-opts.c
1413 {
1414 if (!cpp_opts->preprocessed)
1415 {
1416 size_t i;
1417
1418 cpp_change_file (parse_in, LC_RENAME, _("<built-in>"));
1419 cpp_init_builtins (parse_in, flag_hosted);
1420 c_cpp_builtins (parse_in);
Note that cpp_change_file, called at line 1418, has the following definition. A reason argument of LC_RENAME means that the file name or line number changes for a reason other than entering or leaving a file (e.g. a #line directive). Here it means that the content below is attributed to a pseudo file named "<built-in>".
973 void
974 cpp_change_file (cpp_reader *pfile, enum lc_reason reason, in cppfiles.c
975 const char *new_name)
976 {
977 _cpp_do_file_change (pfile, reason, new_name, 1, 0);
978 }
For details of how source file changes are handled, refer to section Do file change.
The C++ standard defines some predefined macros that we can use in programs. These, together with a few GNU extensions such as __BASE_FILE__ and __INCLUDE_LEVEL__, are described by builtin_array.
289 static const struct builtin builtin_array[] = in cppinit.c
290 {
291 B("__TIME__", BT_TIME),
292 B("__DATE__", BT_DATE),
293 B("__FILE__", BT_FILE),
294 B("__BASE_FILE__", BT_BASE_FILE),
295 B("__LINE__", BT_SPECLINE),
296 B("__INCLUDE_LEVEL__", BT_INCLUDE_LEVEL),
297 /* Keep builtins not used for -traditional-cpp at the end, and
298 update init_builtins() if any more are added. */
299 B("_Pragma", BT_PRAGMA),
300 B("__STDC__", BT_STDC),
301 };
The last two entries above are not supported in traditional mode. Below, in the FOR loop at line 348, function cpp_lookup inserts the identifiers of these macros into hashtable ident_hash. As these identifiers stand for builtin objects, the hash nodes are flagged accordingly. Note that we do not set the content of these macros here, since it is not known yet.
339 void
340 cpp_init_builtins (cpp_reader *pfile, int hosted) in cppinit.c
341 {
342 const struct builtin *b;
343 size_t n = ARRAY_SIZE (builtin_array);
344
345 if (CPP_OPTION (pfile, traditional))
346 n -= 2;
347
348 for(b = builtin_array; b < builtin_array + n; b++)
349 {
350 cpp_hashnode *hp = cpp_lookup (pfile, b->name, b->len);
351 hp->type = NT_MACRO;
352 hp->flags |= NODE_BUILTIN | NODE_WARN;
353 hp->value.builtin = b->value;
354 }
355
356 if (CPP_OPTION (pfile, cplusplus))
357 _cpp_define_builtin (pfile, "__cplusplus 1");
358 else if (CPP_OPTION (pfile, lang) == CLK_ASM)
359 _cpp_define_builtin (pfile, "__ASSEMBLER__ 1");
360 else if (CPP_OPTION (pfile, lang) == CLK_STDC94)
361 _cpp_define_builtin (pfile, "__STDC_VERSION__ 199409L");
362 else if (CPP_OPTION (pfile, c99))
363 _cpp_define_builtin (pfile, "__STDC_VERSION__ 199901L");
364
365 if (hosted)
366 _cpp_define_builtin (pfile, "__STDC_HOSTED__ 1");
367 else
368 _cpp_define_builtin (pfile, "__STDC_HOSTED__ 0");
369
370 if (CPP_OPTION (pfile, objc))
371 _cpp_define_builtin (pfile, "__OBJC__ 1");
372 }
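The trick of keeping the entries that traditional mode cannot use at the end of the table, then shortening the count by two, can be illustrated with a small standalone model. This is a sketch under assumptions: the bi_ names are hypothetical, and a linear search replaces the real ident_hash lookup performed by cpp_lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the BT_* builtin kinds. */
enum bi_kind { BI_TIME, BI_DATE, BI_FILE, BI_BASE_FILE,
               BI_LINE, BI_INCLUDE_LEVEL, BI_PRAGMA, BI_STDC };

struct bi_entry { const char *name; enum bi_kind kind; };

/* Entries unusable in traditional mode are kept last on purpose. */
static const struct bi_entry bi_table[] = {
  { "__TIME__", BI_TIME },
  { "__DATE__", BI_DATE },
  { "__FILE__", BI_FILE },
  { "__BASE_FILE__", BI_BASE_FILE },
  { "__LINE__", BI_LINE },
  { "__INCLUDE_LEVEL__", BI_INCLUDE_LEVEL },
  { "_Pragma", BI_PRAGMA },
  { "__STDC__", BI_STDC },
};

#define BI_COUNT (sizeof bi_table / sizeof bi_table[0])

/* Number of entries registered for the given mode. */
static size_t bi_active_count(int traditional)
{
  return traditional ? BI_COUNT - 2 : BI_COUNT;
}

/* Returns 1 and stores the kind if NAME is registered in this mode. */
static int bi_find(const char *name, int traditional, enum bi_kind *kind)
{
  size_t n = bi_active_count(traditional);

  assert(name != NULL && kind != NULL);
  for (size_t i = 0; i < n; i++)
    if (strcmp(bi_table[i].name, name) == 0) {
      *kind = bi_table[i].kind;
      return 1;
    }
  return 0;
}
```

Only the kind is stored per entry. As in the original, the macro's content is not known at registration time and would be produced later, on expansion.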
The statement at line 357 is then just the same as declaring: #define __cplusplus 1. For such a definition, the function below is in charge of building the corresponding cpp_macro node.
1832 void
1833 _cpp_define_builtin (cpp_reader *pfile, const char *str) in cpplib.c
1834 {
1835 size_t len = strlen (str);
1836 char *buf = alloca (len + 1);
1837 memcpy (buf, str, len);
1838 buf[len] = '\n';
1839 run_directive (pfile, T_DEFINE, buf, len);
1840 }
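The buffer preparation done by _cpp_define_builtin is easy to try in isolation. The sketch below is not the cpplib code: sketch_make_directive_line is a hypothetical helper, and it uses malloc where the original uses alloca, so the caller must free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies STR into a fresh buffer of strlen(STR) + 1 bytes and puts a
   newline where the NUL terminator would be, so the text looks like one
   complete source line.  The result is NOT NUL-terminated.  *LEN_OUT
   receives the length without the newline.  Returns NULL on allocation
   failure. */
static char *sketch_make_directive_line(const char *str, size_t *len_out)
{
  size_t len;
  char *buf;

  assert(str != NULL && len_out != NULL);
  len = strlen(str);
  buf = malloc(len + 1);
  if (buf == NULL)
    return NULL;

  memcpy(buf, str, len);
  buf[len] = '\n';
  *len_out = len;
  return buf;
}
```

For the string "__cplusplus 1" the reported length is 13, and the byte at index 13 is the newline that ends the directive line.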
As the name suggests, run_directive runs a directive according to the details held in dtable.
440 static void
441 run_directive (cpp_reader *pfile, int dir_no, const char *buf, size_t count) in cpplib.c
442 {
443 cpp_push_buffer (pfile, (const uchar *) buf, count,
444 /* from_stage3 */ true);
445 /* Disgusting hack. */
446 if (dir_no == T_PRAGMA)
447 pfile->buffer->file = pfile->buffer->prev->file;
448 start_directive (pfile);
449
450 /* This is a short-term fix to prevent a leading '#' being
451 interpreted as a directive. */
452 _cpp_clean_line (pfile);
453
454 pfile->directive = &dtable[dir_no];
455 if (CPP_OPTION (pfile, traditional))
456 prepare_directive_trad (pfile);
457 pfile->directive->handler (pfile);
458 end_directive (pfile, 1);
459 if (dir_no == T_PRAGMA)
460 pfile->buffer->file = NULL;
461 _cpp_pop_buffer (pfile);
462 }
At line 443, cpp_push_buffer links buf into the buffer slot of pfile. Note that the file field of the buffer, set at line 447, is of type _cpp_file, which represents the source file. A buffer pushed this way holds the directive itself, and its file field is 0. As some #pragma directives affect the following content of the source file, the update at line 447 is important.
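The push, borrow, and pop sequence described above can be modelled with a tiny buffer stack. This is a sketch under assumptions, not cpplib's data structures: all sketch_ names are hypothetical, and a plain string stands in for the _cpp_file pointer.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_buffer {
  struct sketch_buffer *prev;  /* buffer below this one on the stack */
  const char *file;            /* stand-in for the _cpp_file pointer */
  const char *text;            /* directive text being processed */
};

struct sketch_reader { struct sketch_buffer *top; };

/* Pushes B on the stack; a directive buffer starts with no file. */
static void sketch_push(struct sketch_reader *r, struct sketch_buffer *b,
                        const char *text)
{
  b->prev = r->top;
  b->file = NULL;
  b->text = text;
  r->top = b;
}

static void sketch_pop(struct sketch_reader *r)
{
  assert(r->top != NULL);
  r->top = r->top->prev;
}

/* Runs one directive and returns the file its handler would observe. */
static const char *sketch_run(struct sketch_reader *r, const char *text,
                              int is_pragma)
{
  struct sketch_buffer tmp;
  const char *seen;

  sketch_push(r, &tmp, text);
  if (is_pragma && tmp.prev != NULL)
    tmp.file = tmp.prev->file;   /* borrow the enclosing file */

  seen = r->top->file;           /* the handler would run here */

  if (is_pragma)
    tmp.file = NULL;             /* undo the borrow before popping */
  sketch_pop(r);
  return seen;
}
```

In this model a pragma sees the enclosing buffer's file while it runs, whereas any other directive sees none, and the stack is back to its previous top afterwards.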
223 static void
224 start_directive (cpp_reader *pfile) in cpplib.c
225 {
226 /* Setup in-directive state. */
227 pfile->state.in_directive = 1;
228 pfile->state.save_comments = 0;
229
230 /* Some handlers need the position of the # for diagnostics. */
231 pfile->directive_line = pfile->line;
232 }
Then start_directive sets up the related slots of pfile, which affect the behavior of lex_macro_node below. _cpp_clean_line, above, sets the cur and next_line slots of the buffer to the head and end of the definition string. At line 457, handler in fact invokes do_define.
502 static void
503 do_define (cpp_reader *pfile) in cpplib.c
504 {
505 cpp_hashnode *node = lex_macro_node (pfile);
506
507 if (node)
508 {
509 /* If we have been requested to expand comments into macros,
510 then re-enable saving of comments. */
511 pfile->state.save_comments =
512 ! CPP_OPTION (pfile, discard_comments_in_macro_exp);
513
514 if (_cpp_create_definition (pfile, node))
515 if (pfile->cb.define)
516 pfile->cb.define (pfile, pfile->directive_line, node);
517 }
518 }
Function _cpp_lex_token decomposes the content of the buffer into tokens (identifiers, strings, numbers, etc.) and packs each one into a cpp_token to return. One token is returned per invocation.
466 static cpp_hashnode *
467 lex_macro_node (cpp_reader *pfile) in cpplib.c
468 {
469 const cpp_token *token = _cpp_lex_token (pfile);
470
471 /* The token immediately after #define must be an identifier. That
472 identifier may not be "defined", per C99 6.10.8p4.
473 In C++, it may not be any of the "named operators" either,
474 per C++98 [lex.digraph], [lex.key].
475 Finally, the identifier may not have been poisoned. (In that case
476 the lexer has issued the error message for us.) */
477
478 if (token->type == CPP_NAME)
479 {
480 cpp_hashnode *node = token->val.node;
481
482 if (node == pfile->spec_nodes.n_defined)
483 cpp_error (pfile, CPP_DL_ERROR,
484 "\"defined\" cannot be used as a macro name");
485 else if (! (node->flags & NODE_POISONED))
486 return node;
487 }
488 else if (token->flags & NAMED_OP)
489 cpp_error (pfile, CPP_DL_ERROR,
490 "\"%s\" cannot be used as a macro name as it is an operator in C++",
491 NODE_NAME (token->val.node));
492 else if (token->type == CPP_EOF)
493 cpp_error (pfile, CPP_DL_ERROR, "no macro name given in #%s directive",
494 pfile->directive->name);
495 else
496 cpp_error (pfile, CPP_DL_ERROR, "macro names must be identifiers");
497
498 return NULL;
499 }
Remember that we can poison an identifier to remove it completely from the program. A poisoned identifier has NODE_POISONED set in its flags, and a token with NAMED_OP set in its flags is a named operator in C++. lex_macro_node ensures that the identifier is valid as a macro name.
For details of how the macro definition is created, refer to section Create macro definition – ISO mode. The define callback at line 515 serves the debugger; we skip it here.
Then function end_directive restores the lexer’s state and, if skip_line is nonzero, skips the remaining tokens on the directive’s line.
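The checks made by lex_macro_node can be summarized as a small decision function. The sketch below is a simplification under stated assumptions: the tk_ and mn_ names are hypothetical, a named operator is modelled as its own token type rather than a flag on the token, and the verdict is returned instead of being reported through cpp_error.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token categories; TK_NAMED_OP models the NAMED_OP flag. */
enum tk_type { TK_NAME, TK_NAMED_OP, TK_EOF, TK_OTHER };

enum mn_verdict {
  MN_OK,             /* usable as a macro name */
  MN_ERR_DEFINED,    /* the name is "defined" */
  MN_ERR_POISONED,   /* identifier was poisoned earlier */
  MN_ERR_OPERATOR,   /* C++ named operator such as "and" */
  MN_ERR_MISSING,    /* directive ended before any name */
  MN_ERR_NOT_IDENT   /* some other token, e.g. a number */
};

struct tk_token {
  enum tk_type type;
  const char *spelling;  /* meaningful for TK_NAME and TK_NAMED_OP */
  int poisoned;          /* models NODE_POISONED on the hash node */
};

/* Decides whether the token following #define may name a macro. */
static enum mn_verdict mn_check(const struct tk_token *t)
{
  assert(t != NULL);
  switch (t->type) {
  case TK_NAME:
    if (strcmp(t->spelling, "defined") == 0)
      return MN_ERR_DEFINED;
    if (t->poisoned)
      return MN_ERR_POISONED;
    return MN_OK;
  case TK_NAMED_OP:
    return MN_ERR_OPERATOR;
  case TK_EOF:
    return MN_ERR_MISSING;
  default:
    return MN_ERR_NOT_IDENT;
  }
}
```

Unlike this model, the original issues no error of its own for a poisoned name, because the lexer has already reported it. It simply returns NULL.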
235 static void
236 end_directive (cpp_reader *pfile, int skip_line) in cpplib.c
237 {
238 if (CPP_OPTION (pfile, traditional))
239 {
240 /* Revert change of prepare_directive_trad. */
241 pfile->state.prevent_expansion--;
242
243 if (pfile->directive != &dtable[T_DEFINE])
244 _cpp_remove_overlay (pfile);
245 }
246 /* We don't skip for an assembler #. */
247 else if (skip_line)
248 {
249 skip_rest_of_line (pfile);
250 if (!pfile->keep_tokens)
251 {
252 pfile->cur_run = &pfile->base_run;
253 pfile->cur_token = pfile->base_run.base;
254 }
255 }
256
257 /* Restore state. */
258 pfile->state.save_comments = ! CPP_OPTION (pfile, discard_comments);
259 pfile->state.in_directive = 0;
260 pfile->state.in_expression = 0;
261 pfile->state.angled_headers = 0;
262 pfile->directive = 0;
263 }
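The pairing of start_directive and end_directive amounts to setting and then restoring a few lexer flags. A minimal model of that pairing follows. The struct and function names are hypothetical, the discard_comments parameter stands in for the CPP_OPTION lookup, and the traditional-mode and line-skipping branches are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the lexer state touched by a directive. */
struct sketch_lexer_state {
  int in_directive;
  int save_comments;
  int in_expression;
  int angled_headers;
};

/* Entering a directive: mark it and stop saving comments. */
static void sketch_start_directive(struct sketch_lexer_state *s)
{
  assert(s != NULL);
  s->in_directive = 1;
  s->save_comments = 0;
}

/* Leaving a directive: comment saving goes back to the global option,
   and every directive-only flag is cleared. */
static void sketch_end_directive(struct sketch_lexer_state *s,
                                 int discard_comments)
{
  assert(s != NULL);
  s->save_comments = !discard_comments;
  s->in_directive = 0;
  s->in_expression = 0;
  s->angled_headers = 0;
}
```

Whatever a handler changed in between (do_define, for instance, may re-enable save_comments), the end step puts the flags back into a known state.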