Study notes on the GCC-3.4.6 source (21)

3.3. Handling switches

3.3.1. Options related to optimization

Back in decode_options, at line 480, the initialize_diagnostics hook in lang_hooks points to cxx_initialize_diagnostics for C++. It sets up the diagnostics facility, which is responsible for producing accurate and informative error messages. We skip it because it is not closely related to compilation itself.

 

decode_options (continued)

 

489    /* Scan to see what optimization level has been specified. That will
490      determine the default value of many flags. */
491    for (i = 1; i < argc; i++)
492    {
493      if (!strcmp (argv[i], "-O"))
494      {
495        optimize = 1;
496        optimize_size = 0;
497      }
498      else if (argv[i][0] == '-' && argv[i][1] == 'O')
499      {
500        /* Handle -Os, -O2, -O3, -O69, ... */
501        const char *p = &argv[i][2];
502 
503        if ((p[0] == 's') && (p[1] == 0))
504        {
505          optimize_size = 1;
506 
507          /* Optimizing for size forces optimize to be 2. */
508          optimize = 2;
509        }
510        else
511        {
512          const int optimize_val = read_integral_parameter (p, p - 2, -1);
513          if (optimize_val != -1)
514          {
515            optimize = optimize_val;
516            optimize_size = 0;
517          }
518        }
519      }
520    }
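
The scan above can be tried outside the compiler with a minimal C sketch. The names opt_level, read_level, and scan_opt_level are hypothetical, and read_level is only a stand-in for GCC's internal read_integral_parameter: it accepts a non-empty run of digits and reports no errors.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two globals the loop updates. */
struct opt_level {
  int optimize;       /* 0, 1, 2, 3, ... */
  int optimize_size;  /* 1 only after -Os */
};

/* Stand-in for read_integral_parameter: returns -1 unless p is a
   non-empty string made only of decimal digits. */
static int read_level(const char *p)
{
  const char *q = p;
  while (isdigit((unsigned char) *q))
    q++;
  if (q == p || *q != '\0')
    return -1;
  return atoi(p);
}

/* Scan argv the way the loop above does: the last -O switch wins. */
static struct opt_level scan_opt_level(int argc, const char *const *argv)
{
  struct opt_level lvl = { 0, 0 };
  for (int i = 1; i < argc; i++) {
    const char *arg = argv[i];
    if (strcmp(arg, "-O") == 0) {
      lvl.optimize = 1;
      lvl.optimize_size = 0;
    } else if (arg[0] == '-' && arg[1] == 'O') {
      const char *p = arg + 2;
      if (p[0] == 's' && p[1] == '\0') {
        lvl.optimize_size = 1;
        lvl.optimize = 2;   /* optimizing for size forces level 2 */
      } else {
        int val = read_level(p);
        if (val != -1) {
          lvl.optimize = val;
          lvl.optimize_size = 0;
        }
      }
    }
  }
  return lvl;
}
```

Because every -O switch overwrites both fields, a command line such as "-Os -O3" ends with optimize 3 and optimize_size 0, while "-O2 -Os" ends with optimize 2 and optimize_size 1. A malformed switch such as "-Ofoo" leaves the previous values untouched.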

 

In GCC, the following switches select the optimization level [8]:

-O: The compiler attempts to reduce both code size and execution time without making changes that would complicate debugging. It turns off size optimization (optimize_size is cleared) and turns on -fdefer-pop, -fthread-jumps, -fguess-branch-probability, -fcprop-registers, and -fdelayed-branch. The -fomit-frame-pointer flag is set only if the debugger can work without a frame pointer on this platform.

-O0: The default. Disables all optimizations. Turns off all size optimization and sets -fno-merge-constants.

-O1: The same as -O.

-O2: This level turns on all optimizations that do not involve a trade-off between size and speed. In addition to the options turned on for -O, this level turns on -foptimize-sibling-calls, -fcse-follow-jumps, -fcse-skip-blocks, -fgcse, -fexpensive-optimizations, -fstrength-reduce, -frerun-loop-opt, -fschedule-insns, -fdelete-null-pointer-checks, -fschedule-insns2, -frerun-cse-after-loop, -fpeephole2, -fforce-mem, -fcaller-saves, -fstrict-aliasing, -fregmove, and -freorder-blocks. This level performs no loop unrolling, function inlining, or register renaming.

-O3: In addition to the options turned on for -O2, this level turns on -finline-functions and -frename-registers.

-Os: Optimizes for size. All of the -O2 option flags are set. The -falign-loops, -falign-jumps, -falign-labels, and -falign-functions values are all set to 1, which prevents any padding from being inserted for alignment.

 

decode_options (continued)

 

522    if (!optimize)
523    {
524      flag_merge_constants = 0;
525    }
526 
527    if (optimize >= 1)
528    {
529      flag_defer_pop = 1;
530      flag_thread_jumps = 1;
531  #ifdef DELAY_SLOTS
532      flag_delayed_branch = 1;
533  #endif
534  #ifdef CAN_DEBUG_WITHOUT_FP
535      flag_omit_frame_pointer = 1;
536  #endif
537      flag_guess_branch_prob = 1;
538      flag_cprop_registers = 1;
539      flag_loop_optimize = 1;
540      flag_if_conversion = 1;
541      flag_if_conversion2 = 1;
542    }
543 
544    if (optimize >= 2)
545    {
546      flag_crossjumping = 1;
547      flag_optimize_sibling_calls = 1;
548      flag_cse_follow_jumps = 1;
549      flag_cse_skip_blocks = 1;
550      flag_gcse = 1;
551      flag_expensive_optimizations = 1;
552      flag_strength_reduce = 1;
553      flag_rerun_cse_after_loop = 1;
554      flag_rerun_loop_opt = 1;
555      flag_caller_saves = 1;
556      flag_force_mem = 1;
557      flag_peephole2 = 1;
558  #ifdef INSN_SCHEDULING
559      flag_schedule_insns = 1;
560      flag_schedule_insns_after_reload = 1;
561  #endif
562      flag_regmove = 1;
563      flag_strict_aliasing = 1;
564      flag_delete_null_pointer_checks = 1;
565      flag_reorder_blocks = 1;
566      flag_reorder_functions = 1;
567      flag_unit_at_a_time = 1;
568    }
569 
570    if (optimize >= 3)
571    {
572      flag_inline_functions = 1;
573      flag_rename_registers = 1;
574      flag_unswitch_loops = 1;
575      flag_web = 1;
576    }
577 
578    if (optimize < 2 || optimize_size)
579    {
580      align_loops = 1;
581      align_jumps = 1;
582      align_labels = 1;
583      align_functions = 1;
584 
585      /* Don't reorder blocks when optimizing for size because extra
586        jump insns may be created; also barrier may create extra padding.
587 
588        More correctly we should have a block reordering mode that tried
589        to minimize the combined size of all the jumps. This would more
590        or less automatically remove extra jumps, but would also try to
591        use more short jumps instead of long jumps. */
592      flag_reorder_blocks = 0;
593    }
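
The cumulative nature of these tests can be seen in a small C sketch. It models only a handful of the flags; the struct opt_flags and the function defaults_for are hypothetical names, and the convention that an alignment value of 0 means "use the target default" is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical subset of the flags set in the listing above. */
struct opt_flags {
  bool defer_pop, thread_jumps;                /* level 1 and up */
  bool gcse, strict_aliasing, reorder_blocks;  /* level 2 and up */
  bool inline_functions, rename_registers;     /* level 3 and up */
  /* Assumption: 0 = target default, 1 = no alignment. */
  int align_loops, align_jumps, align_labels, align_functions;
};

/* Each level adds to the previous one; the final test then undoes
   alignment and block reordering below -O2 or under -Os. */
static struct opt_flags defaults_for(int optimize, int optimize_size)
{
  struct opt_flags f = { 0 };

  if (optimize >= 1) {
    f.defer_pop = true;
    f.thread_jumps = true;
  }
  if (optimize >= 2) {
    f.gcse = true;
    f.strict_aliasing = true;
    f.reorder_blocks = true;
  }
  if (optimize >= 3) {
    f.inline_functions = true;
    f.rename_registers = true;
  }
  if (optimize < 2 || optimize_size) {
    f.align_loops = f.align_jumps = 1;
    f.align_labels = f.align_functions = 1;
    f.reorder_blocks = false;
  }
  return f;
}
```

With -Os (optimize 2, optimize_size 1) the level 2 flags are on, but reorder_blocks is switched back off and all four alignment values are 1. Plain -O2 keeps reorder_blocks on and leaves the alignment values alone.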

 

At line 531 above, the macro DELAY_SLOTS is emitted by the tool genattr when the machine description file contains a define_delay pattern. INSN_SCHEDULING at line 558 is also defined by genattr.

Besides, there is a long list of variables whose usage we need to understand first.

flag_merge_constants (-fmerge-constants, -fmerge-all-constants), attempts to merge identical constants across constant sections. If 1, only string constants and constants from the constant pool are merged; if 2, constant variables are merged as well.

flag_defer_pop (-fdefer-pop), if nonzero, the arguments pushed onto the stack for a function call are not popped immediately after the function returns. Instead they are allowed to accumulate across several function calls, and the stack is later cleared of them all at once.

flag_thread_jumps (-fthread-jumps), if nonzero, when a conditional jump leads to a location where the known values guarantee that another jump will also be taken, the original jump is redirected to the final destination.
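
A hand-written before/after pair can illustrate the idea of jump threading at the source level. This is only a sketch: the function names are hypothetical, and the second function shows the effect written out by hand, not actual compiler output.

```c
/* Before: on the path where x > 10 holds, the second test (x > 5)
   is evaluated again although its outcome is already known. */
static int classify_before(int x)
{
  int r = 0;
  if (x > 10)
    r = 1;
  if (x > 5)
    r += 2;
  return r;
}

/* After: the x > 10 path goes straight to the code guarded by
   x > 5, so the second comparison is skipped on that path. */
static int classify_after(int x)
{
  if (x > 10)
    return 1 + 2;
  if (x > 5)
    return 2;
  return 0;
}
```

Both functions return the same value for every input; only the number of comparisons executed on the x > 10 path differs.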

flag_omit_frame_pointer (-fomit-frame-pointer), if nonzero, doesn’t keep the frame pointer in a register for functions that don’t need one, thus omitting the code to store and retrieve the address as well as making another register available for general use. This flag is set automatically for all levels of -O optimization, but only if the debugger can work without a frame pointer. On platforms where it cannot, the flag has to be given explicitly. Some platforms have no frame pointer, and there this flag has no effect.

flag_guess_branch_prob (-fguess-branch-probability), if nonzero, the compiler tries to guess branch probabilities.

flag_cprop_registers (-fcprop-registers), if nonzero, a copy-propagation pass runs after register allocation and post-register-allocation instruction splitting, to try to reduce scheduling dependencies and occasionally eliminate the copy.

flag_rename_registers (-frename-registers), if nonzero, registers are renamed.

flag_loop_optimize, if nonzero, runs the loop optimizer.

flag_if_conversion, if nonzero, performs if-conversion.

flag_if_conversion2, if nonzero, performs if-conversion after reload.

flag_crossjumping, if nonzero, performs cross-jumping.

flag_optimize_sibling_calls, if nonzero, allows GCC to optimize sibling and tail-recursive calls.

flag_cse_follow_jumps (-fcse-follow-jumps), if nonzero, when the target of a jump cannot be reached any other way except by the jump being taken, the common subexpression elimination scan follows the path of the jump. That is, any values that exist before the jump is taken will always exist at the point of the destination of the jump and can be used there. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-cse-follow-jumps.

flag_cse_skip_blocks (-fcse-skip-blocks), if nonzero, if the body of an if statement is simple enough that it does not contain code that would disrupt the previously calculated values, the common subexpression analysis flow skips over the if statement and is applied to the statements that follow it. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-cse-skip-blocks.

flag_gcse, if nonzero, means perform global common subexpression elimination (CSE).

flag_expensive_optimizations (-fexpensive-optimizations), if nonzero, enables a few optimizations that are effective but cost in terms of compile time. For example, common subexpression elimination is run again following global common subexpression elimination. Some of the other optimizations are carried out in more depth when this flag is set. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-expensive-optimizations.

flag_strength_reduce (-fstrength-reduce), if nonzero, performs loop strength reduction and elimination of variables used inside loops. This is the process of replacing time-consuming operations, such as multiply and divide, with simpler and faster operations, such as add and subtract. This option is always set by -funroll-loops and -funroll-all-loops. It is also set by -O2, -O3, and -Os, but can be overridden by -fno-strength-reduce. As a simple example, the following loop uses a temporary variable to contain a calculated index:

for(int i=0; i<10; i++) {
    index = i * 2;
    frammis(valarr[index]);
}

The internal variable index can be eliminated, and the multiplication can be changed to a simple shift, resulting in the following:

for(int i=0; i<10; i++) {
    frammis(valarr[i << 1]);
}

Shifting the loop counter one position to the left effectively doubles it, and the value is then used directly as the index on the array without being stored in a temporary variable.
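
The description above also mentions replacing a multiplication with an addition. The following C sketch shows that form by hand: the product i * 3 becomes a running index that is advanced by 3 on each iteration. The function names are hypothetical, and the second function is written manually to show the transformation, not taken from compiler output.

```c
/* The index i * 3 is recomputed with a multiplication on every
   iteration. */
static long sum_every_third_mul(const int *a, int n)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i * 3];
  return sum;
}

/* Strength-reduced form: idx always equals i * 3, but it is kept
   up to date with an addition instead of a multiplication. */
static long sum_every_third_add(const int *a, int n)
{
  long sum = 0;
  for (int i = 0, idx = 0; i < n; i++, idx += 3)
    sum += a[idx];
  return sum;
}
```

For an array holding 0 through 11 and n equal to 4, both functions visit elements 0, 3, 6, and 9.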

flag_rerun_cse_after_loop (-frerun-cse-after-loop), if nonzero, causes common subexpression elimination to be applied again after the loop optimizations. This is done because loop optimization may expose new common subexpressions. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-rerun-cse-after-loop. It increases compilation time by about 20% and picks up a few more common subexpressions.

flag_rerun_loop_opt (-frerun-loop-opt), if nonzero, runs the loop optimization twice. The second time does not unroll loops, but it does analyze the loops again with the instructions from the first optimization pass removed. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-rerun-loop-opt.

flag_caller_saves (-fcaller-saves), if nonzero, extra instructions are included to save registers before a function call and then restore them afterward. The registers can then be used in the function call and inside the function itself. Only registers that contain useful values are saved, and then only if it seems better to save and restore than it does to reload the value later, when it is needed again. This option is enabled by default on some machines and is always enabled by -O2, -O3, and -Os, but can be overridden by -fno-caller-saves.

flag_force_mem (-fforce-mem), if nonzero, values must be copied into registers before arithmetic is performed on them. This improves the generated code because values that are needed will often have been loaded into a register already and do not need to be loaded again. This flag is set by -O2, -O3, and -Os.

flag_peephole2 (-fpeephole2), if nonzero, enables RTL peephole optimization after registers have been allocated but before scheduling. The optimization is a machine specific translation of one specific set of instructions into another. This option is platform dependent and may have no effect. There is no effect unless optimization is also specified. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-peephole2.

flag_schedule_insns (-fschedule-insns), if nonzero, on machines that have relatively slow floating point or memory access operations when compared to other operations, and on machines that support the execution of more than one instruction at a time, an attempt is made to reorder the instructions to eliminate stalls. Other instructions are executed while the slower instruction is in progress. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-schedule-insns.

flag_schedule_insns_after_reload (-fschedule-insns2), if nonzero, this is the same as -fschedule-insns except that it is performed after allocation of both the global registers and the local registers for each function. This can be effective on machines with a small number of registers and relatively slow instructions to load registers. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-schedule-insns2.

flag_regmove (-foptimize-register-move, -fregmove), if nonzero, register allocation is optimized by changing the assignment of registers used in operations that move data from one memory location to another. This is especially effective on machines that have instructions that can move data directly from one memory location to another. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-optimize-register-move.

flag_strict_aliasing (-fstrict-aliasing), if nonzero, the strictest aliasing rules are applied, depending on the language being compiled. With strict aliasing in C, for example, an int cannot be the alias of a double or a pointer, but it can be the alias of an unsigned int. Even with strict aliasing there is no problem with union members as long as the references are through the union and not through a pointer to the address of a union member. The following code could cause a problem:

int *iptr;
union {
    int ivalue;
    double dvalue;
} migs;
. . .
migs.ivalue = 45;
iptr = &migs.ivalue;
frammis(*iptr);
migs.dvalue = 88.6;
frammis(*iptr);

In this example it is possible that, under strict aliasing, the compiler would not recognize that the value pointed to by iptr had changed between the two function calls. However, referring to the union members directly would not cause a problem.
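
As a hedged illustration of the "through the union" rule, the C sketch below reads the bits of a float in two ways that do not go through a pointer of an incompatible type: by accessing the union member directly, and by copying the bytes with memcpy. The names float_bits, bits_via_union, and bits_via_memcpy are hypothetical, and the sketch assumes that float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide on the target. */
union float_bits {
  float f;
  uint32_t u;
};

/* Access through the union itself, as the text recommends. */
static uint32_t bits_via_union(float f)
{
  union float_bits fb;
  fb.f = f;
  return fb.u;
}

/* Alternative that involves no second type of pointer at all:
   copy the object representation byte by byte. */
static uint32_t bits_via_memcpy(float f)
{
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}
```

Both functions return the same bit pattern for the same input. What that pattern looks like depends on the floating-point format of the target.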

flag_delete_null_pointer_checks (-fdelete-null-pointer-checks), if nonzero, the code that checks for an attempt to dereference a null pointer is removed if dataflow analysis indicates that the pointer cannot be null. In some environments it is possible to process the result of an attempt to dereference a null pointer, so this option should not be used in programs that rely on these checks. This flag is set by -O2, -O3, and -Os, but can be overridden by -fno-delete-null-pointer-checks.
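
A minimal C sketch of the kind of check this optimization targets follows. The function names are hypothetical, and the second function shows by hand what remains once the test is considered redundant; it is not compiler output.

```c
#include <stddef.h>

/* The pointer is dereferenced before it is tested, so data flow
   analysis may conclude that the test can never succeed. */
static int first_checked(const int *p)
{
  int v = *p;
  if (p == NULL)
    return -1;
  return v;
}

/* What is left after the redundant test has been deleted. */
static int first_unchecked(const int *p)
{
  return *p;
}
```

For any valid pointer the two functions behave identically. A program that expects first_checked to survive a null argument, for example by trapping the faulting access, is the case the warning above is about.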

flag_reorder_blocks, if nonzero, basic blocks are reordered.

flag_reorder_functions, if nonzero, functions are reordered.

flag_unit_at_a_time, if nonzero, the whole translation unit is compiled at a time.

flag_inline_functions (-finline-functions), if nonzero, the compiler is allowed to select certain simple functions to be expanded in line at the point of the function call. If the function is declared in such a way that all calls to it are known (for example, a static function in a C source file cannot be addressed from outside the file) the body of the function is omitted because it is never actually called. This option is automatically turned on by -O3 unless the -fno-inline-functions flag is specified.

flag_unswitch_loops, if nonzero, enables loop unswitching.
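
Loop unswitching moves a test that does not change inside the loop out of it, at the cost of duplicating the loop body. The C sketch below shows the idea by hand; the function names are hypothetical and the second function is a manual rewrite, not compiler output.

```c
/* Before: the loop-invariant test on negate runs n times. */
static void scale_before(int *a, int n, int negate)
{
  for (int i = 0; i < n; i++) {
    if (negate)
      a[i] = -a[i];
    else
      a[i] = 2 * a[i];
  }
}

/* After unswitching: the test runs once and selects one of two
   specialized loops. */
static void scale_after(int *a, int n, int negate)
{
  if (negate) {
    for (int i = 0; i < n; i++)
      a[i] = -a[i];
  } else {
    for (int i = 0; i < n; i++)
      a[i] = 2 * a[i];
  }
}
```

The duplicated loop explains why this option belongs to -O3: it trades code size for fewer branches executed per iteration.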

flag_web, if nonzero, performs the web construction pass.

align_loops (-falign-loops[=number]), aligns the top of loops to a boundary that is a power of 2 equal to or greater than number, but only if no more than number bytes need to be skipped to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. This option could make code larger because of the insertion of dummy instructions to bring about alignment, but, depending on the machine, the loop could execute faster because the branch from the bottom of each iteration goes to an aligned location. If number is not specified, the machine default is used, which is normally 1. Specifying number as 1 is equivalent to -fno-align-loops, and no alignment takes place.

align_jumps (-falign-jumps[=number]), aligns branch targets that cannot be reached any other way to a boundary that is a power of 2 equal to or greater than number, but only if no more than number bytes need to be skipped to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. Unlike the similar option -falign-labels, this option does not require the insertion of dummy instructions before the branch target. If number is not specified, the machine default is used, which is normally 1. Specifying number as 1 is equivalent to -fno-align-jumps, and no alignment takes place.

align_labels (-falign-labels[=number]), aligns the targets of all branches to a boundary that is a power of 2 equal to or greater than number, but only if no more than number bytes need to be skipped to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. This option can make code slower and larger because of the insertion of dummy instructions before the branch target. For a similar, but cheaper, version of this option see -falign-jumps. If -falign-loops or -falign-jumps is used with a greater value than number, the greater value is used here. If number is not specified, the machine default is used, which is normally 1. Specifying number as 1 is equivalent to -fno-align-labels, and no alignment takes place.

align_functions (-falign-functions[=number]), aligns the starting address of functions on a boundary that is a power of 2 equal to or greater than number, but only if no more than number bytes need to be skipped to do it. For example, if number is 20, the resulting alignment is on a 32-byte boundary as long as no more than 20 bytes must be skipped to place it there. Setting number to a power of 2 causes all functions to be aligned to that boundary. If number is not specified, the machine default is used. For some machines number is rounded up to a power of 2, thus aligning all functions. Specifying number as 1 is equivalent to -fno-align-functions, and no alignment takes place.
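
The arithmetic shared by the four alignment options can be sketched in C as follows. The helpers pow2_at_least and place_aligned are hypothetical names, and the skip limit is passed as a separate parameter because this sketch relies only on the description above, not on how the compiler computes the limit internally.

```c
#include <stddef.h>

/* Smallest power of two that is greater than or equal to n. */
static size_t pow2_at_least(size_t n)
{
  size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

/* Return the address at which the next label would be placed:
   addr is padded up to the boundary only when the padding needed
   does not exceed max_skip bytes. */
static size_t place_aligned(size_t addr, size_t number, size_t max_skip)
{
  size_t boundary = pow2_at_least(number);
  size_t padding = (boundary - addr % boundary) % boundary;
  return padding <= max_skip ? addr + padding : addr;
}
```

With number 20 the boundary is 32. A label at address 110 needs 18 bytes of padding and moves to 128, while a label at address 100 would need 28 bytes, which exceeds a limit of 20, so it stays where it is. With number 1 the boundary is 1 and nothing ever moves.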

 

decode_options (continued)

 

595    /* Initialize whether `char' is signed. */
596    flag_signed_char = DEFAULT_SIGNED_CHAR;
597  #ifdef DEFAULT_SHORT_ENUMS
598    /* Initialize how much space enums occupy, by default. */
599    flag_short_enums = DEFAULT_SHORT_ENUMS;
600  #endif
601 
602    /* Initialize target_flags before OPTIMIZATION_OPTIONS so the latter can
603      modify it. */
604    target_flags = 0;
605    set_target_switch ("");
606 
607    /* Unwind tables are always present in an ABI-conformant IA-64
608      object file, so the default should be ON. */
609  #ifdef IA64_UNWIND_INFO
610    flag_unwind_tables = IA64_UNWIND_INFO;
611  #endif
612 
613  #ifdef OPTIMIZATION_OPTIONS
614    /* Allow default optimizations to be specified on a per-machine basis. */
615    OPTIMIZATION_OPTIONS (optimize, optimize_size);
616  #endif

 

At line 596, DEFAULT_SIGNED_CHAR is defined as 1 if `char' should be signed by default, and as 0 otherwise. DEFAULT_SHORT_ENUMS is defined only for the DSP1600 chip.
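
The effect of this default is visible from user code. The small C sketch below checks which choice the compiler in use has made for plain char; the function names are hypothetical, and the sketch only observes the result without reading flag_signed_char itself.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type in this compilation. */
static int char_is_signed(void)
{
  return CHAR_MIN < 0;
}

/* Widening a char shows the difference: a char holding the
   converted value of -1 widens to a negative int only when char
   is signed. */
static int widen(char c)
{
  return c;
}
```

On a target where char is signed by default, char_is_signed returns 1 and widen((char) -1) is negative. On a target where char is unsigned, the same call yields a positive value.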

 
