Studying note of GCC-3.4.6 source (46)

4.2.6. Prepare for loop optimization pass

4.2.6.1.    Overview

Back in backend_init, the next call is init_loop. This function prepares some variables used by the loop optimization pass, which moves constant expressions out of loops and optionally performs strength reduction (where expensive operations are replaced with equivalent but less expensive operations) and loop unrolling. The pass finds invariant computations within loops and moves them to the beginning of the loop. Then it identifies basic and general induction variables.

Basic induction variables (BIVs) are pseudo registers that are set within a loop only by incrementing or decrementing their value. General induction variables (GIVs) are pseudo registers whose value is a linear function of a basic induction variable. BIVs are recognized by basic_induction_var; GIVs by general_induction_var.

Once induction variables are identified, strength reduction is applied to the general induction variables, and induction variable elimination is applied to the basic induction variables.
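To make the terms concrete, here is a minimal source-level sketch of what strength reduction and BIV elimination amount to. The real pass works on RTL, not on C source, and the function names sum_before and sum_after are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Before: i is a basic induction variable (only incremented by 1).
   The index i * stride is a general induction variable, a linear
   function of i that costs one multiply per iteration.  */
long sum_before(const long *a, size_t n, size_t stride)
{
    assert(stride > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* After: the multiply is replaced by an addition on off (strength
   reduction), and the counter i disappears because the loop test is
   rewritten in terms of off (induction variable elimination).  */
long sum_after(const long *a, size_t n, size_t stride)
{
    assert(stride > 0);
    long sum = 0;
    size_t limit = n * stride;      /* computed once, outside the loop */
    for (size_t off = 0; off < limit; off += stride)
        sum += a[off];
    return sum;
}
```

Both functions visit the same elements, so they return the same sum for any array that holds at least (n - 1) * stride + 1 elements.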

It also finds cases where a register is set within the loop by zero-extending a narrower value, and changes these to zero the entire register once before the loop and merely copy the low part within the loop. Most of the complexity lies in the heuristics that decide when these transformations are worthwhile.

4.2.6.2.    Initialization

As the overview suggests, the loop optimization pass needs to know the cost of addressing through a register and the cost of a copy between registers. Both are collected by init_loop.

 

401    void

402    init_loop (void)                                                                                       in loop.c

403    {

404      rtx reg = gen_rtx_REG (word_mode, LAST_VIRTUAL_REGISTER + 1);

405   

406      reg_address_cost = address_cost (reg, SImode);

407   

408      copy_cost = COSTS_N_INSNS (1);

409    }

 

The register numbers starting at FIRST_PSEUDO_REGISTER are taken first by the virtual registers, which are guaranteed to have no corresponding hard registers. Virtual registers are used during RTL generation to refer to locations in the stack frame when the actual location isn't known until RTL generation is complete. The routine instantiate_virtual_regs replaces these with the proper value, which is normally [frame|arg|stack]_pointer_rtx plus a constant. The virtual registers occupy the numbers FIRST_VIRTUAL_REGISTER through LAST_VIRTUAL_REGISTER (five registers in this version, the last one being the virtual CFA register). The numbers after LAST_VIRTUAL_REGISTER belong to ordinary pseudo registers.

An ordinary pseudo register is used here because the only goal is to evaluate the cost of using a register as an address. The rtx object is allocated in GC-managed memory and is simply abandoned when the function returns.
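The numbering can be sketched as below. This is only a model: the value 53 for FIRST_PSEUDO_REGISTER is an assumption for the i386 target, the VR_ names and the helper functions are invented for the sketch, and the layout of the virtual registers is written down as recalled from rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed i386 value; the real constant is defined by the target.  */
enum { VR_FIRST_PSEUDO_REGISTER = 53 };

/* Virtual registers start right at FIRST_PSEUDO_REGISTER.  */
enum {
    VR_FIRST_VIRTUAL_REGISTER = VR_FIRST_PSEUDO_REGISTER,
    VR_VIRTUAL_INCOMING_ARGS_REGNUM = VR_FIRST_VIRTUAL_REGISTER,
    VR_VIRTUAL_STACK_VARS_REGNUM = VR_FIRST_VIRTUAL_REGISTER + 1,
    VR_VIRTUAL_STACK_DYNAMIC_REGNUM = VR_FIRST_VIRTUAL_REGISTER + 2,
    VR_VIRTUAL_OUTGOING_ARGS_REGNUM = VR_FIRST_VIRTUAL_REGISTER + 3,
    VR_VIRTUAL_CFA_REGNUM = VR_FIRST_VIRTUAL_REGISTER + 4,
    VR_LAST_VIRTUAL_REGISTER = VR_FIRST_VIRTUAL_REGISTER + 4
};

/* True if regno lies in the virtual register range.  */
bool vr_is_virtual(unsigned regno)
{
    return regno >= VR_FIRST_VIRTUAL_REGISTER
           && regno <= VR_LAST_VIRTUAL_REGISTER;
}

/* Name of a virtual register as it appears in RTL dumps.  */
const char *vr_name(unsigned regno)
{
    static const char *const names[] = {
        "virtual-incoming-args", "virtual-stack-vars",
        "virtual-stack-dynamic", "virtual-outgoing-args", "virtual-cfa"
    };
    assert(vr_is_virtual(regno));
    return names[regno - VR_FIRST_VIRTUAL_REGISTER];
}

/* The register number init_loop passes to gen_rtx_REG.  */
unsigned vr_scratch_regno(void)
{
    return VR_LAST_VIRTUAL_REGISTER + 1;
}
```

Under these assumptions the scratch register of init_loop is the first register number that is neither a hard register nor a virtual one.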

 

889    int

890    address_cost (rtx x, enum machine_mode mode)                                                in cse.c

891    {

892      /* The address_cost target hook does not deal with ADDRESSOF nodes. But,

893        during CSE, such nodes are present. Using an ADDRESSOF node which

894        refers to the address of a REG is a good thing because we can then

895        turn (MEM (ADDRESSSOF (REG))) into just plain REG.  */

896   

897      if (GET_CODE (x) == ADDRESSOF && REG_P (XEXP ((x), 0)))

898        return -1;

899   

900      /* We may be asked for cost of various unusual addresses, such as operands

901        of push instruction. It is not worthwhile to complicate writing

902        of the target hook by such cases.  */

903   

904      if (!memory_address_p (mode, x))

905        return 1000;

906   

907      return (*targetm.address_cost) (x);

908    }

 

 

At line 897 in address_cost, REG_P checks whether an rtx object represents a register. The condition holds only when x is an ADDRESSOF whose operand is a register.

 

272    #define REG_P(X) (GET_CODE (X) == REG)                                                        in rtl.h

 

Here x is a plain REG rather than an ADDRESSOF, so the condition at line 897 is false and memory_address_p is called next. It returns 1 if addr is a valid memory address for mode mode; only then is the target asked for the actual cost.
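The order of the three checks in address_cost can be modelled with a small sketch. The types and names below (ac_rtx, ac_address_cost, the valid_address and target_cost fields) are hypothetical stand-ins: validity and target cost are stored as plain fields instead of being computed by memory_address_p and the target hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ac_code { AC_REG, AC_ADDRESSOF, AC_OTHER };

struct ac_rtx {
    enum ac_code code;
    const struct ac_rtx *op0;   /* operand 0, meaningful for ADDRESSOF */
    bool valid_address;         /* stand-in for memory_address_p */
    int target_cost;            /* stand-in for targetm.address_cost */
};

int ac_address_cost(const struct ac_rtx *x)
{
    assert(x != NULL);

    /* (ADDRESSOF (REG)) is rewarded with a negative cost.  */
    if (x->code == AC_ADDRESSOF && x->op0 != NULL && x->op0->code == AC_REG)
        return -1;

    /* Unusual or invalid addresses get a prohibitive cost.  */
    if (!x->valid_address)
        return 1000;

    /* Everything else is left to the target.  */
    return x->target_cost;
}
```

A plain register that is a valid address falls through both early returns and ends up with whatever cost the target reports.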

 

1303 int                                                                                                           in recog.c

1304 memory_address_p (enum machine_mode mode ATTRIBUTE_UNUSED, rtx addr)

1305 {

1306   if (GET_CODE (addr) == ADDRESSOF)

1307     return 1;

1308

1309   GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);

1310   return 0;

1311

1312  win:

1313   return 1;

1314 }

 

Whether an address is valid is, of course, target dependent. If addr is an ADDRESSOF, as tested at line 1306, it is accepted as valid without further checks. Otherwise, on x86, line 1309 expands to the following.

 

2019 #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)       \                     in i386.h

2020 do {                                                               \

2021   if (legitimate_address_p ((MODE), (X), 0))                        \

2022     goto ADDR;                                                   \

2023 } while (0)

 

6033 int

6034 legitimate_address_p (enum machine_mode mode, rtx addr, int strict)                in i386.c

6035 {

6036   struct ix86_address parts;

6037   rtx base, index, disp;

6038   HOST_WIDE_INT scale;

6039   const char *reason = NULL;

6040   rtx reason_rtx = NULL_RTX;

6041

6042   if (TARGET_DEBUG_ADDR)

6043   {

6044     fprintf (stderr,

6045        "\n======\nGO_IF_LEGITIMATE_ADDRESS, mode = %s, strict = %d\n",

6046        GET_MODE_NAME (mode), strict);

6047     debug_rtx (addr);

6048   }

6049

6050   if (ix86_decompose_address (addr, &parts) <= 0)

6051   {

6052     reason = "decomposition failed";

6053     goto report_error;

6054   }

 

struct ix86_address holds the components of an x86 memory address. As we know, the address is calculated by the expression base + scale*index + disp, and objects in thread-local storage (TLS) need register FS or GS as segment selector, which is recorded in seg at line 849 below.
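As a quick arithmetic check of the expression above, the following sketch computes a 32-bit effective address from its parts. The function name and the use of plain integers are assumptions of the sketch; the segment base is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address = base + scale * index + disp, with 32-bit
   wraparound.  x86 only allows scale factors 1, 2, 4 and 8.  */
uint32_t ea_effective_address(uint32_t base, uint32_t index,
                              unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t) disp;
}
```

For example, base 0x1000, index 3, scale 4 and displacement -8 give 0x1000 + 12 - 8 = 0x1004.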

 

845    struct ix86_address                                                                                          in i386.c

846    {

847      rtx base, index, disp;

848      HOST_WIDE_INT scale;

849      enum ix86_address_seg { SEG_DEFAULT, SEG_FS, SEG_GS } seg;

850    };

 

ix86_decompose_address decomposes the given address according to the expression above and fills in the ix86_address.

 

5566 static int

5567 ix86_decompose_address (rtx addr, struct ix86_address *out)                               in i386.c

5568 {

5569   rtx base = NULL_RTX;

5570   rtx index = NULL_RTX;

5571   rtx disp = NULL_RTX;

5572   HOST_WIDE_INT scale = 1;

5573   rtx scale_rtx = NULL_RTX;

5574   int retval = 1;

5575   enum ix86_address_seg seg = SEG_DEFAULT;

5576

5577   if (GET_CODE (addr) == REG || GET_CODE (addr) == SUBREG)

5578     base = addr;

5579   else if …

 

As mentioned above, the code of the rtx addr is REG, so the branches from line 5579 on are skipped. For a bare register, the base of the address is the register itself. At this point we have: scale_rtx = null, index = null, disp = null, scale = 1.

 

ix86_decompose_address (continued)

 

5706   out->base = base;

5707   out->index = index;

5708   out->disp = disp;

5709   out->scale = scale;

5710   out->seg = seg;

5711

5712   return retval;

5713 }

 

On return from ix86_decompose_address, parts contains the following data: parts.base = addr, parts.index = NULL_RTX, parts.disp = NULL_RTX, parts.scale = 1, parts.seg = SEG_DEFAULT.
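The path taken for a bare register can be reproduced with a reduced model. The dc_ types below are hypothetical simplifications of rtx and struct ix86_address, and every address form other than REG and SUBREG is simply rejected here, which the real function does not do.

```c
#include <assert.h>
#include <stddef.h>

enum dc_code { DC_REG, DC_SUBREG, DC_OTHER };
enum dc_seg { DC_SEG_DEFAULT, DC_SEG_FS, DC_SEG_GS };

struct dc_rtx {
    enum dc_code code;
    unsigned regno;
};

struct dc_address {
    const struct dc_rtx *base, *index, *disp;
    long scale;
    enum dc_seg seg;
};

/* Returns 1 on success and 0 for forms this sketch does not model.  */
int dc_decompose(const struct dc_rtx *addr, struct dc_address *out)
{
    assert(addr != NULL && out != NULL);

    if (addr->code != DC_REG && addr->code != DC_SUBREG)
        return 0;

    out->base = addr;           /* the register itself is the base */
    out->index = NULL;
    out->disp = NULL;
    out->scale = 1;
    out->seg = DC_SEG_DEFAULT;
    return 1;
}
```

Only base is filled in; index and disp stay null, scale keeps its default of 1 and the segment stays at the default.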

 

legitimate_address_p (continued)

 

6056   base = parts.base;

6057   index = parts.index;

6058   disp = parts.disp;

6059   scale = parts.scale;

6060

6061   /* Validate base register.

6062

6063     Don't allow SUBREG's here, it can lead to spill failures when the base

6064     is one word out of a two word structure, which is represented internally

6065     as a DImode int.  */

6066

6067   if (base)

6068   {

6069     reason_rtx = base;

6070

6071     if (GET_CODE (base) != REG)

6072     {

6073       reason = "base is not a register";

6074       goto report_error;

6075     }

6076

6077     if (GET_MODE (base) != Pmode)

6078     {

6079       reason = "base is not in Pmode";

6080       goto report_error;

6081     }

6082

6083     if ((strict && ! REG_OK_FOR_BASE_STRICT_P (base))

6084       || (! strict && ! REG_OK_FOR_BASE_NONSTRICT_P (base)))

6085     {

6086       reason = "base is not valid";

6087       goto report_error;

6088     }

6089   }

        ...

6232   /* Everything looks valid.  */

6233   if (TARGET_DEBUG_ADDR)

6234     fprintf (stderr, "Success.\n");

6235   return TRUE;

        ...

6244 }

 

The base above points to the REG rtx created in word_mode, which on x86 is the same mode as Pmode, so the check at line 6077 passes. legitimate_address_p is invoked here with strict set to 0, so line 6084 uses the non-strict macro to verify that the register is acceptable as a base address.

 

1970 #define REG_OK_FOR_BASE_NONSTRICT_P(X)                              \     in i386.h

1971   (REGNO (X) <= STACK_POINTER_REGNUM                              \

1972    || REGNO (X) == ARG_POINTER_REGNUM                               \

1973    || REGNO (X) == FRAME_POINTER_REGNUM                          \

1974    || (REGNO (X) >= FIRST_REX_INT_REG                             \

1975        && REGNO (X) <= LAST_REX_INT_REG)                          \

1976    || REGNO (X) >= FIRST_PSEUDO_REGISTER)

 

The non-strict version accepts the arg pointer, the frame pointer and any pseudo register as base; the strict version is more restrictive. On x86 the arg pointer and the frame pointer are not real hardware registers: they are placeholders that are later eliminated in favor of the stack pointer or the hard frame pointer.

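The macro can be tried out on plain register numbers. The constants below are assumptions about the i386 register numbering in this GCC version (7 for the stack pointer, 16 and 20 for the arg and frame pointers, 37 to 44 for the REX integer registers, 53 for the first pseudo); the BR_ names exist only in this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed i386 register numbers; check i386.h before relying on them.  */
enum {
    BR_STACK_POINTER_REGNUM = 7,
    BR_ARG_POINTER_REGNUM = 16,
    BR_FRAME_POINTER_REGNUM = 20,
    BR_FIRST_REX_INT_REG = 37,
    BR_LAST_REX_INT_REG = 44,
    BR_FIRST_PSEUDO_REGISTER = 53
};

/* Mirrors REG_OK_FOR_BASE_NONSTRICT_P on a register number.  */
bool br_ok_for_base_nonstrict(int regno)
{
    assert(regno >= 0);
    return regno <= BR_STACK_POINTER_REGNUM
           || regno == BR_ARG_POINTER_REGNUM
           || regno == BR_FRAME_POINTER_REGNUM
           || (regno >= BR_FIRST_REX_INT_REG
               && regno <= BR_LAST_REX_INT_REG)
           || regno >= BR_FIRST_PSEUDO_REGISTER;
}
```

With these numbers the general registers 0 to 7, the two placeholder pointers, the REX integer registers and every pseudo register pass, while floating point and vector registers are rejected.
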

Since legitimate_address_p returns true, memory_address_p returns 1 to address_cost, and the target-specific address_cost hook is invoked. For x86 this is ix86_address_cost. On x86 a complex address is preferred, because letting GCC combine several simple addresses into one is more efficient; lines 5730 ~ 5733 therefore lower the cost in that case. Conversely, address parts held in pseudo registers are penalized because they are less efficient. For the address given here the base is a pseudo register, so line 5741 raises the initial cost of 1 to 2.

 

5720 static int

5721 ix86_address_cost (rtx x)                                                                                  in i386.c

5722 {

5723   struct ix86_address parts;

5724   int cost = 1;

5725

5726   if (!ix86_decompose_address (x, &parts))

5727     abort ();

5728

5729   /* More complex memory references are better.  */

5730   if (parts.disp && parts.disp != const0_rtx)

5731     cost--;

5732   if (parts.seg != SEG_DEFAULT)

5733     cost--;

5734

5735   /* Attempt to minimize number of registers in the address.  */

5736   if ((parts.base

5737      && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER))

5738     || (parts.index

5739     && (!REG_P (parts.index)

5740         || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)))

5741       cost++;

5742  

5743     if (parts.base

5744        && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)

5745        && parts.index

5746        && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)

5747        && parts.base != parts.index)

5748       cost++;

5749  

5750     /* AMD-K6 don't like addresses with ModR/M set to 00_xxx_100b,

5751       since it's predecode logic can't detect the length of instructions

5752       and it degenerates to vector decoded. Increase cost of such

5753       addresses here. The penalty is minimally 2 cycles. It may be worthwhile

5754       to split such addresses or even refuse such addresses at all.

5755  

5756       Following addressing modes are affected:

5757        [base+scale*index]

5758        [scale*index+disp]

5759        [base+index]

5760  

5761       The first and last case  may be avoidable by explicitly coding the zero in

5762       memory address, but I don't have AMD-K6 machine handy to check this

5763       theory.  */

5764  

5765     if (TARGET_K6

5766         && ((!parts.disp && parts.base && parts.index && parts.scale != 1)

5767         || (parts.disp && !parts.base && parts.index && parts.scale != 1)

5768         || (!parts.disp && parts.base && parts.index && parts.scale == 1)))

5769       cost += 10;

5770  

5771     return cost;

5772   }
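Tracing the listing by hand is easy to get wrong, so here is a reduced model of the same cost rules. It is a sketch under assumptions: registers are plain numbers, every present base or index is assumed to be a register, two registers count as distinct when their numbers differ, and 53 is the assumed i386 value of FIRST_PSEUDO_REGISTER. The cm_ names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CM_FIRST_PSEUDO_REGISTER = 53 };   /* assumed i386 value */

struct cm_parts {
    bool has_base, has_index, has_disp;
    unsigned base_regno, index_regno;
    long disp;
    int scale;
    bool default_seg;
};

int cm_address_cost(const struct cm_parts *p, bool target_k6)
{
    assert(p != NULL);
    int cost = 1;

    /* More complex memory references are better.  */
    if (p->has_disp && p->disp != 0)
        cost--;
    if (!p->default_seg)
        cost--;

    /* Penalize pseudo registers in the address.  */
    bool base_pseudo = p->has_base
                       && p->base_regno >= CM_FIRST_PSEUDO_REGISTER;
    bool index_pseudo = p->has_index
                        && p->index_regno >= CM_FIRST_PSEUDO_REGISTER;
    if (base_pseudo || index_pseudo)
        cost++;
    if (base_pseudo && index_pseudo && p->base_regno != p->index_regno)
        cost++;

    /* K6 penalty for the three affected addressing forms.  */
    if (target_k6
        && ((!p->has_disp && p->has_base && p->has_index && p->scale != 1)
            || (p->has_disp && !p->has_base && p->has_index && p->scale != 1)
            || (!p->has_disp && p->has_base && p->has_index && p->scale == 1)))
        cost += 10;

    return cost;
}
```

For the bare pseudo register built in init_loop (base only, register number LAST_VIRTUAL_REGISTER + 1, default segment) the model returns 2: the initial 1 plus the pseudo-register penalty that corresponds to line 5741.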

 

Back in init_loop, reg_address_cost receives this value, which is 2 for the pseudo register traced above. Next comes copy_cost, which is set via COSTS_N_INSNS. For targets other than ARM, COSTS_N_INSNS is defined as below; it yields the cost of an operation that is N times as expensive as a fast register-to-register instruction, so copy_cost becomes 4.

 

1995 #define COSTS_N_INSNS(N) ((N) * 4)                                                            in rtl.h
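The arithmetic behind copy_cost is trivial but worth spelling out. The sketch below copies the macro under a different name (CI_COSTS_N_INSNS) and wraps it in a hypothetical helper; neither name exists in GCC.

```c
#include <assert.h>

/* Same scaling as the macro above: one fast instruction costs 4 units.  */
#define CI_COSTS_N_INSNS(N) ((N) * 4)

/* Cost of n fast register-to-register instructions.  */
int ci_cost_of_insns(int n)
{
    assert(n >= 0);
    return CI_COSTS_N_INSNS(n);
}
```

ci_cost_of_insns(1) is 4, which is the value init_loop stores in copy_cost. The factor 4 leaves room for costs that are a fraction of one instruction.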

 
