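GCC-3.4.6 Source Code Study Notes (46)

4.2.6. Preparing the loop optimization pass

4.2.6.1.    Overview

Back in backend_init, the next call is to init_loop. This function prepares the variables used by the loop optimization. The optimization moves constant expressions out of loop bodies and optionally performs strength reduction (replacing expensive operations with equivalent but cheaper ones) and loop unrolling. The loop optimizer first finds the computations inside a loop whose results do not change from one iteration to the next (loop invariants) and moves them out of the loop body. It then identifies the basic and general induction variables.

A basic induction variable (BIV) is a pseudo register that is set inside the loop only by incrementing or decrementing its value. A general induction variable (GIV) is also a pseudo register, and its value is a linear function of a basic induction variable. BIVs are identified by basic_induction_var, and GIVs by general_induction_var.

Once the induction variables have been identified, strength reduction is applied to the general induction variables, and induction variable elimination is applied to the basic induction variables.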
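To make these transformations concrete, here is a hand-written sketch of what they conceptually do to a simple loop. It is not output from GCC; the function names fill_before and fill_after are made up for illustration, and the "after" version was written by hand to mirror invariant hoisting plus strength reduction.

```c
#include <stddef.h>

/* Before: a * b is recomputed on every iteration, and i * 4 needs a multiply. */
long fill_before(long *out, size_t n, long a, long b)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {   /* i is a basic induction variable */
        long k = a * b;                /* loop invariant */
        long g = (long)i * 4 + k;      /* general induction variable: linear in i */
        out[i] = g;
        sum += g;
    }
    return sum;
}

/* After: the invariant is hoisted and the multiply becomes an add. */
long fill_after(long *out, size_t n, long a, long b)
{
    long sum = 0;
    long k = a * b;                    /* computed once, outside the loop */
    long g = k;                        /* g always equals i * 4 + k */
    for (size_t i = 0; i < n; i++) {
        out[i] = g;
        sum += g;
        g += 4;                        /* strength reduction of i * 4 */
    }
    return sum;
}
```

Both functions fill the array with the same values and return the same sum; only the amount of work per iteration differs.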

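Sometimes a register is set inside a loop to the zero-extension of a narrower value. This is transformed so that the whole register is set to zero before the loop and only the low, nonzero part is copied inside the loop. Most of the complexity lies in deciding heuristically when this is worthwhile.

4.2.6.2.    Initialization

As the overview hints, before the loop optimization pass runs, the costs of addressing through a register and of copying between registers must be collected. This is done by init_loop.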

 

401    void
402    init_loop (void)                                                    in loop.c
403    {
404      rtx reg = gen_rtx_REG (word_mode, LAST_VIRTUAL_REGISTER + 1);
405
406      reg_address_cost = address_cost (reg, SImode);
407
408      copy_cost = COSTS_N_INSNS (1);
409    }

 

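The register numbers starting at FIRST_PSEUDO_REGISTER are taken first by the virtual registers, which have no corresponding physical registers. Virtual registers are used during RTL generation to refer to locations in the stack frame whose actual positions are not known until RTL generation is complete. The routine instantiate_virtual_regs replaces them with appropriate values, usually one of [frame|arg|stack]_pointer_rtx plus a constant. There are only a few virtual registers, ending at LAST_VIRTUAL_REGISTER. The registers numbered after them are ordinary pseudo registers with no special meaning.

Such an ordinary register is used here because we only want to estimate the cost of addressing through a register. The rtx object is allocated from GC-managed memory and can be reclaimed after the function returns.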

 

889    int
890    address_cost (rtx x, enum machine_mode mode)                        in cse.c
891    {
892      /* The address_cost target hook does not deal with ADDRESSOF nodes. But,
893        during CSE, such nodes are present. Using an ADDRESSOF node which
894        refers to the address of a REG is a good thing because we can then
895        turn (MEM (ADDRESSSOF (REG))) into just plain REG.  */
896
897      if (GET_CODE (x) == ADDRESSOF && REG_P (XEXP ((x), 0)))
898        return -1;
899
900      /* We may be asked for cost of various unusual addresses, such as operands
901        of push instruction. It is not worthwhile to complicate writing
902        of the target hook by such cases.  */
903
904      if (!memory_address_p (mode, x))
905        return 1000;
906
907      return (*targetm.address_cost) (x);
908    }
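The three possible outcomes of address_cost can be summarized with a small stand-alone model. This is only a sketch: the enum toy_kind, the function toy_generic_address_cost and the callback toy_target_cost are hypothetical names that replace the rtx tests and the target hook, and the value returned by the callback is arbitrary.

```c
/* Hypothetical classification standing in for the rtx checks. */
enum toy_kind {
    TOY_ADDRESSOF_REG,   /* (ADDRESSOF (REG ...)) */
    TOY_INVALID,         /* rejected by the validity check */
    TOY_VALID            /* a legitimate address */
};

/* Mirrors the control flow: special case, validity check, then target hook. */
int toy_generic_address_cost(enum toy_kind kind, int (*target_cost)(void))
{
    if (kind == TOY_ADDRESSOF_REG)
        return -1;       /* better than free: the MEM can become a plain REG */
    if (kind == TOY_INVALID)
        return 1000;     /* prohibitively expensive */
    return target_cost();
}

/* Arbitrary stand-in for a target-specific hook. */
static int toy_target_cost(void)
{
    return 2;
}
```

The large constant for invalid addresses keeps the target hook simple, since it only ever sees addresses that passed the validity check.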

 

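At line 897 of address_cost, REG_P checks whether an rtx object represents a register. An x that satisfies the condition on that line represents the address of a register.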

 

272    #define REG_P(X) (GET_CODE (X) == REG)                              in rtl.h

 

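For an x that does not represent the address of a register, the condition at line 897 fails, so memory_address_p is called next to check the address. It returns 1 if addr is a valid memory address for the machine mode mode.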

 

1303 int                                                                 in recog.c
1304 memory_address_p (enum machine_mode mode ATTRIBUTE_UNUSED, rtx addr)
1305 {
1306   if (GET_CODE (addr) == ADDRESSOF)
1307     return 1;
1308
1309   GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
1310   return 0;
1311
1312  win:
1313   return 1;
1314 }

 

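Addressing through registers and memory clearly depends on the target platform. As before, at line 1306, if addr represents the address of a register it is accepted as a valid address without further checks. Otherwise, on x86, line 1309 expands to the following definition.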

 

2019 #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)       \             in i386.h
2020 do {                                                  \
2021   if (legitimate_address_p ((MODE), (X), 0))          \
2022     goto ADDR;                                        \
2023 } while (0)
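The way such a macro cooperates with a label in the calling function can be tried out in isolation. The sketch below imitates the shape of memory_address_p with a made-up predicate; toy_legitimate_p, TOY_GO_IF_LEGITIMATE and toy_memory_address_p are hypothetical names, and the "multiple of 4, non-negative" rule is an arbitrary stand-in for a real address check.

```c
#include <stdbool.h>

/* Arbitrary stand-in for a target's address predicate. */
static bool toy_legitimate_p(int addr)
{
    return addr >= 0 && addr % 4 == 0;
}

/* Jumps to LABEL in the enclosing function when X is acceptable.
   The do { ... } while (0) wrapper makes the macro behave like one statement. */
#define TOY_GO_IF_LEGITIMATE(X, LABEL) \
    do {                               \
        if (toy_legitimate_p(X))       \
            goto LABEL;                \
    } while (0)

int toy_memory_address_p(int addr)
{
    TOY_GO_IF_LEGITIMATE(addr, win);
    return 0;            /* fell through: not a valid address */

win:
    return 1;            /* the macro jumped here */
}
```

The caller supplies the label name, so the macro itself needs no knowledge of what the enclosing function returns.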

 

6033 int
6034 legitimate_address_p (enum machine_mode mode, rtx addr, int strict)      in i386.c
6035 {
6036   struct ix86_address parts;
6037   rtx base, index, disp;
6038   HOST_WIDE_INT scale;
6039   const char *reason = NULL;
6040   rtx reason_rtx = NULL_RTX;
6041
6042   if (TARGET_DEBUG_ADDR)
6043   {
6044     fprintf (stderr,
6045        "\n======\nGO_IF_LEGITIMATE_ADDRESS, mode = %s, strict = %d\n",
6046        GET_MODE_NAME (mode), strict);
6047     debug_rtx (addr);
6048   }
6049
6050   if (ix86_decompose_address (addr, &parts) <= 0)
6051   {
6052     reason = "decomposition failed";
6053     goto report_error;
6054   }

 

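The structure ix86_address holds the individual parts of any x86 addressing form that may be used with loop control variables. An address is computed by the expression base + scale*index + disp. For objects in thread-local storage (TLS), the registers FS and GS serve as segment selectors; they are described by seg at line 849 below.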

 

845    struct ix86_address                                                 in i386.c
846    {
847      rtx base, index, disp;
848      HOST_WIDE_INT scale;
849      enum ix86_address_seg { SEG_DEFAULT, SEG_FS, SEG_GS } seg;
850    };
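As a rough illustration of the expression base + scale*index + disp, the sketch below evaluates it over plain integers. The struct toy_address and the function toy_effective_address are hypothetical: the real structure stores rtx operands rather than register contents, and segment handling is left out.

```c
#include <stdint.h>

/* Toy counterpart of the address parts, holding values instead of rtx. */
struct toy_address {
    uint32_t base;    /* value held in the base register */
    uint32_t index;   /* value held in the index register */
    uint32_t scale;   /* 1, 2, 4 or 8 on x86 */
    int32_t  disp;    /* signed constant displacement */
};

/* Effective address, with 32-bit wraparound as on a 32-bit x86. */
uint32_t toy_effective_address(const struct toy_address *a)
{
    return a->base + a->scale * a->index + (uint32_t)a->disp;
}
```

For example, base 0x1000, index 3, scale 4 and displacement -8 give 0x1000 + 12 - 8 = 0x1004.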

 

ix86_decompose_address breaks the given address down according to the expression above and fills in the corresponding ix86_address object.

 

5566 static int
5567 ix86_decompose_address (rtx addr, struct ix86_address *out)             in i386.c
5568 {
5569   rtx base = NULL_RTX;
5570   rtx index = NULL_RTX;
5571   rtx disp = NULL_RTX;
5572   HOST_WIDE_INT scale = 1;
5573   rtx scale_rtx = NULL_RTX;
5574   int retval = 1;
5575   enum ix86_address_seg seg = SEG_DEFAULT;
5576
5577   if (GET_CODE (addr) == REG || GET_CODE (addr) == SUBREG)
5578     base = addr;
5579   else if …

 

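As shown above, in the present scenario the rtx code of addr is REG, so the code from line 5579 onward is skipped. When the address is a bare register, the base is simply that register. At this point we have: scale_rtx = null, index = null, disp = null, scale = 1.

 

ix86_decompose_address (continued)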

 

5706   out->base = base;
5707   out->index = index;
5708   out->disp = disp;
5709   out->scale = scale;
5710   out->seg = seg;
5711
5712   return retval;
5713 }

 

When ix86_decompose_address returns, parts holds the following data: parts.base = addr, parts.index = NULL_RTX, parts.disp = NULL_RTX, parts.scale = 1, parts.seg = SEG_DEFAULT.

 

legitimate_address_p (continued)

 

6056   base = parts.base;
6057   index = parts.index;
6058   disp = parts.disp;
6059   scale = parts.scale;
6060
6061   /* Validate base register.
6062
6063     Don't allow SUBREG's here, it can lead to spill failures when the base
6064     is one word out of a two word structure, which is represented internally
6065     as a DImode int.  */
6066
6067   if (base)
6068   {
6069     reason_rtx = base;
6070
6071     if (GET_CODE (base) != REG)
6072     {
6073       reason = "base is not a register";
6074       goto report_error;
6075     }
6076
6077     if (GET_MODE (base) != Pmode)
6078     {
6079       reason = "base is not in Pmode";
6080       goto report_error;
6081     }
6082
6083     if ((strict && ! REG_OK_FOR_BASE_STRICT_P (base))
6084       || (! strict && ! REG_OK_FOR_BASE_NONSTRICT_P (base)))
6085     {
6086       reason = "base is not valid";
6087       goto report_error;
6088     }
6089   }
        ...
6232   /* Everything looks valid.  */
6233   if (TARGET_DEBUG_ADDR)
6234     fprintf (stderr, "Success.\n");
6235   return TRUE;
        ...
6244 }

 

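Here base points to a REG rtx in word_mode, which on x86 is the same mode as Pmode, so the check at line 6077 passes. legitimate_address_p is called here with strict set to 0, so line 6084 verifies that the register in base is acceptable as a base address register.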

 

1970 #define REG_OK_FOR_BASE_NONSTRICT_P(X)                  \           in i386.h
1971   (REGNO (X) <= STACK_POINTER_REGNUM                    \
1972    || REGNO (X) == ARG_POINTER_REGNUM                   \
1973    || REGNO (X) == FRAME_POINTER_REGNUM                 \
1974    || (REGNO (X) >= FIRST_REX_INT_REG                   \
1975        && REGNO (X) <= LAST_REX_INT_REG)                \
1976    || REGNO (X) >= FIRST_PSEUDO_REGISTER)

 

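In this non-strict version, the arg pointer and frame pointer registers are allowed as a base (the strict version forbids them; on x86 these two registers do not physically exist, and the stack is used instead). Any pseudo register is accepted as well, which covers the register created in init_loop.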
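The non-strict test is nothing more than a set of ranges over register numbers, which the following sketch reproduces. The numeric values in the enum are illustrative placeholders chosen for this example and should be checked against i386.h before relying on them; toy_ok_for_base_nonstrict is a hypothetical name.

```c
#include <stdbool.h>

/* Placeholder register numbers for illustration only. */
enum {
    TOY_STACK_POINTER = 7,
    TOY_ARG_POINTER   = 16,
    TOY_FRAME_POINTER = 20,
    TOY_FIRST_REX_INT = 37,
    TOY_LAST_REX_INT  = 44,
    TOY_FIRST_PSEUDO  = 53
};

/* Same shape as the non-strict macro, over a bare register number. */
bool toy_ok_for_base_nonstrict(int regno)
{
    return regno <= TOY_STACK_POINTER
           || regno == TOY_ARG_POINTER
           || regno == TOY_FRAME_POINTER
           || (regno >= TOY_FIRST_REX_INT && regno <= TOY_LAST_REX_INT)
           || regno >= TOY_FIRST_PSEUDO;
}
```

Under these placeholder numbers, a general-purpose register such as 0 and any pseudo such as 58 are accepted, while a register number that falls between the listed ranges, such as 8 or 21, is rejected.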

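Since legitimate_address_p returns true, memory_address_p returns 1 to address_cost, which then calls the target-specific address_cost hook. On x86 this is ix86_address_cost. On x86, complex addresses are preferred, because they let GCC fold simpler address computations into one operand and so produce more efficient code; lines 5730 ~ 5733 therefore give such addresses a discount. Conversely, address parts held in pseudo registers are penalized, because they reduce efficiency. In our case the cost starts at 1, there is no displacement or segment to lower it, and the base is a pseudo register (its number is LAST_VIRTUAL_REGISTER + 1, above FIRST_PSEUDO_REGISTER), so the test at line 5736 raises it by one and the result is 2.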

 

5720 static int
5721 ix86_address_cost (rtx x)                                           in i386.c
5722 {
5723   struct ix86_address parts;
5724   int cost = 1;
5725
5726   if (!ix86_decompose_address (x, &parts))
5727     abort ();
5728
5729   /* More complex memory references are better.  */
5730   if (parts.disp && parts.disp != const0_rtx)
5731     cost--;
5732   if (parts.seg != SEG_DEFAULT)
5733     cost--;
5734
5735   /* Attempt to minimize number of registers in the address.  */
5736   if ((parts.base
5737        && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER))
5738       || (parts.index
5739           && (!REG_P (parts.index)
5740               || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)))
5741     cost++;
5742
5743   if (parts.base
5744       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
5745       && parts.index
5746       && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
5747       && parts.base != parts.index)
5748     cost++;
5749
5750   /* AMD-K6 don't like addresses with ModR/M set to 00_xxx_100b,
5751      since it's predecode logic can't detect the length of instructions
5752      and it degenerates to vector decoded. Increase cost of such
5753      addresses here. The penalty is minimally 2 cycles. It may be worthwhile
5754      to split such addresses or even refuse such addresses at all.
5755
5756      Following addressing modes are affected:
5757       [base+scale*index]
5758       [scale*index+disp]
5759       [base+index]
5760
5761      The first and last case  may be avoidable by explicitly coding the zero in
5762      memory address, but I don't have AMD-K6 machine handy to check this
5763      theory.  */
5764
5765   if (TARGET_K6
5766       && ((!parts.disp && parts.base && parts.index && parts.scale != 1)
5767           || (parts.disp && !parts.base && parts.index && parts.scale != 1)
5768           || (!parts.disp && parts.base && parts.index && parts.scale == 1)))
5769     cost += 10;
5770
5771   return cost;
5772 }
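The arithmetic in ix86_address_cost is easy to check with a simplified model. The sketch below keeps only the discount and register-count rules; it assumes every base or index is a register, ignores the AMD-K6 penalty, and uses a placeholder value for the first pseudo register number. All names (toy_reg, toy_parts, toy_address_cost, TOY_COST_FIRST_PSEUDO) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the first pseudo register number. */
#define TOY_COST_FIRST_PSEUDO 53

struct toy_reg { int regno; };

struct toy_parts {
    const struct toy_reg *base;    /* NULL when absent */
    const struct toy_reg *index;   /* NULL when absent */
    long disp;                     /* 0 means no displacement in this toy */
    bool has_segment;              /* true for an FS/GS override */
};

static bool toy_is_pseudo(const struct toy_reg *r)
{
    return r != NULL && r->regno >= TOY_COST_FIRST_PSEUDO;
}

int toy_address_cost(const struct toy_parts *p)
{
    int cost = 1;

    /* More complex memory references are better. */
    if (p->disp != 0)
        cost--;
    if (p->has_segment)
        cost--;

    /* One penalty if any part is a pseudo, another if both are distinct pseudos. */
    if (toy_is_pseudo(p->base) || toy_is_pseudo(p->index))
        cost++;
    if (toy_is_pseudo(p->base) && toy_is_pseudo(p->index) && p->base != p->index)
        cost++;

    return cost;
}
```

In this model an address made of a lone pseudo base with no displacement costs 2, a hard register plus a displacement costs 0, and two distinct pseudos cost 3.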

 

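Back in init_loop, reg_address_cost therefore gets the value 2, because the pseudo base register adds one to the initial cost of 1 at line 5741. Next we need the value of copy_cost, which is given by COSTS_N_INSNS. On non-ARM systems COSTS_N_INSNS is defined as follows; it expresses a cost equal to N times that of a fast register-to-register instruction, so copy_cost is 4.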

 

1995 #define COSTS_N_INSNS(N) ((N) * 4)                                                            in rtl.h
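A quick way to see what the macro yields is to copy its shape into a stand-alone file. TOY_COSTS_N_INSNS and toy_copy_cost are hypothetical names used only for this sketch.

```c
/* Same shape as the macro above: one "fast instruction" is 4 cost units. */
#define TOY_COSTS_N_INSNS(N) ((N) * 4)

/* Mirrors the assignment copy_cost = COSTS_N_INSNS (1). */
int toy_copy_cost(void)
{
    return TOY_COSTS_N_INSNS(1);
}
```

Because the argument is parenthesized inside the macro, an expression such as TOY_COSTS_N_INSNS(1 + 1) evaluates to 8 rather than 5. Using a unit of 4 per instruction leaves room for costs that are a fraction of one instruction.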

 
