Back in backend_init, the next call is to init_loop. This function prepares some variables used by the loop optimization pass. That pass moves constant expressions out of loops, and can optionally perform strength reduction (replacing expensive operations with equivalent but cheaper ones) and loop unrolling. The pass finds invariant computations within loops and moves them to the beginning of the loop. It then identifies basic and general induction variables.
Basic induction variables (BIVs) are pseudo registers that are set within a loop only by incrementing or decrementing their value. General induction variables (GIVs) are pseudo registers whose value is a linear function of a basic induction variable. BIVs are recognized by basic_induction_var, and GIVs by general_induction_var.
Once induction variables are identified, strength reduction is applied to the general induction variables, and induction variable elimination is applied to the basic induction variables.
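As a rough source-level picture of these two transformations, the C sketch below shows a loop before and after strength reduction and induction variable elimination. It is only an illustration: the real pass rewrites RTL rather than C, and the function names are hypothetical. In the first loop i is a BIV and the byte address a + i * sizeof (int) is a GIV. In the second, the GIV gets its own pointer that is simply incremented, and the loop test is rewritten in terms of that pointer, so i disappears.

```c
#include <assert.h>
#include <stddef.h>

/* Before: i is a basic induction variable; the address computed from it
   is a general induction variable (a linear function of i). */
long sum_indexed (const int *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* GIV: byte address a + i * sizeof (int), recomputed with a multiply. */
      const int *addr = (const int *) ((const char *) a + i * sizeof (int));
      s += *addr;
    }
  return s;
}

/* After (hypothetical hand-written equivalent): the multiply is replaced by
   an increment of p, and the BIV i is eliminated by testing p instead. */
long sum_reduced (const int *a, size_t n)
{
  long s = 0;
  const int *end = a + n;          /* loop bound computed once, outside the loop */
  for (const int *p = a; p != end; p++)
    s += *p;
  return s;
}
```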
It also finds cases where a register is set within the loop by zero-extending a narrower value, and changes these to zero the entire register once before the loop and merely copy the low part within the loop. Most of the complexity lies in the heuristics that decide when these transformations are worthwhile.
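The zero-extension rewrite can likewise be sketched in C, assuming a 32-bit register that receives an 8-bit value; the function names are hypothetical. In C the masked form is not actually faster. The sketch only shows why the transformation is correct: once the register has been zeroed before the loop and nothing in the loop writes its upper bits, storing just the low byte yields the same value as a full zero-extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: every iteration zero-extends the byte into the whole register. */
uint32_t sum_zext (const uint8_t *bytes, size_t n)
{
  uint32_t acc = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t r = (uint32_t) bytes[i];      /* zero_extend: all 32 bits written */
      acc += r;
    }
  return acc;
}

/* After: clear the register once, then replace only its low 8 bits.
   Correct because the loop never sets the upper 24 bits, so they stay zero. */
uint32_t sum_lowpart (const uint8_t *bytes, size_t n)
{
  uint32_t acc = 0;
  uint32_t r = 0;                            /* zeroing hoisted out of the loop */
  for (size_t i = 0; i < n; i++)
    {
      r = (r & ~(uint32_t) 0xFF) | bytes[i]; /* copy only the low part */
      acc += r;
    }
  return acc;
}
```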
As the overview explains, before the loop optimization pass runs it needs to know the cost of addressing memory through a register and the cost of copying between registers. init_loop computes both values.
401 void
402 init_loop (void) in loop.c
403 {
404 rtx reg = gen_rtx_REG (word_mode, LAST_VIRTUAL_REGISTER + 1);
405
406 reg_address_cost = address_cost (reg, SImode);
407
408 copy_cost = COSTS_N_INSNS (1);
409 }
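The register numbers starting at FIRST_PSEUDO_REGISTER are first taken by the virtual registers, which never correspond to hard registers. Virtual registers are used during RTL generation to refer to locations in the stack frame whose actual position isn't known until RTL generation is complete. The routine instantiate_virtual_regs replaces them with the proper value, which is normally [frame|arg|stack]_pointer_rtx plus a constant. The virtual registers run from FIRST_VIRTUAL_REGISTER (equal to FIRST_PSEUDO_REGISTER) through LAST_VIRTUAL_REGISTER; in this version of GCC there are five of them (incoming args, stack vars, stack dynamic, outgoing args and CFA). Register numbers above LAST_VIRTUAL_REGISTER are ordinary pseudo registers, so LAST_VIRTUAL_REGISTER + 1 is the first of these.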
An ordinary pseudo register is used here because init_loop only needs to evaluate the cost of using a register for addressing, and any pseudo register serves that purpose. The rtx object is allocated in garbage-collected memory and is simply dropped when the function returns.
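The numbering scheme can be sketched as a tiny classifier. The constants below are hypothetical stand-ins for FIRST_PSEUDO_REGISTER (which comes from the target headers) and for the virtual-register range defined in rtl.h; only the ordering hard registers, then virtual registers, then ordinary pseudos is taken from the text.

```c
#include <assert.h>

/* Hypothetical numbering, for illustration only. */
enum {
  TOY_FIRST_PSEUDO_REGISTER  = 53,
  TOY_FIRST_VIRTUAL_REGISTER = TOY_FIRST_PSEUDO_REGISTER,
  TOY_NUM_VIRTUAL_REGISTERS  = 5,
  TOY_LAST_VIRTUAL_REGISTER  = TOY_FIRST_VIRTUAL_REGISTER
                               + TOY_NUM_VIRTUAL_REGISTERS - 1
};

enum reg_kind { HARD_REG, VIRTUAL_REG, ORDINARY_PSEUDO };

/* Classify a register number according to the ranges above. */
enum reg_kind classify_regno (unsigned regno)
{
  if (regno < TOY_FIRST_PSEUDO_REGISTER)
    return HARD_REG;
  if (regno <= TOY_LAST_VIRTUAL_REGISTER)
    return VIRTUAL_REG;            /* replaced later by instantiate_virtual_regs */
  return ORDINARY_PSEUDO;          /* e.g. LAST_VIRTUAL_REGISTER + 1 in init_loop */
}
```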
889 int
890 address_cost (rtx x, enum machine_mode mode) in cse.c
891 {
892 /* The address_cost target hook does not deal with ADDRESSOF nodes. But,
893 during CSE, such nodes are present. Using an ADDRESSOF node which
894 refers to the address of a REG is a good thing because we can then
895 turn (MEM (ADDRESSSOF (REG))) into just plain REG. */
896
897 if (GET_CODE (x) == ADDRESSOF && REG_P (XEXP ((x), 0)))
898 return -1;
899
900 /* We may be asked for cost of various unusual addresses, such as operands
901 of push instruction. It is not worthwhile to complicate writing
902 of the target hook by such cases. */
903
904 if (!memory_address_p (mode, x))
905 return 1000;
906
907 return (*targetm.address_cost) (x);
908 }
At line 897 in address_cost, REG_P checks whether an rtx is a register. The condition therefore holds only when x is an ADDRESSOF whose operand is a register, that is, when x takes the address of a register.
272 #define REG_P(X) (GET_CODE (X) == REG) in rtl.h
The x built by init_loop is a plain REG rather than an ADDRESSOF, so the test at line 897 fails and memory_address_p is called next. memory_address_p does not compute a cost; it only checks validity, returning 1 if addr is a valid memory address for mode mode and 0 otherwise. In the latter case address_cost returns the prohibitive cost 1000.
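The decision sequence in address_cost can be modelled with a toy address type. Everything here is hypothetical: the valid_for_target flag stands in for GO_IF_LEGITIMATE_ADDRESS, and target_cost stands in for whatever targetm.address_cost would return. The sketch only reproduces the order of the three checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified address representation. */
enum toy_code { TOY_REG, TOY_ADDRESSOF, TOY_OTHER };

struct toy_addr {
  enum toy_code code;
  const struct toy_addr *operand;  /* used by TOY_ADDRESSOF only */
  bool valid_for_target;           /* stands in for GO_IF_LEGITIMATE_ADDRESS */
  int target_cost;                 /* stands in for targetm.address_cost */
};

/* Mirrors memory_address_p: ADDRESSOF is always valid, else ask the target. */
static bool toy_memory_address_p (const struct toy_addr *x)
{
  if (x->code == TOY_ADDRESSOF)
    return true;
  return x->valid_for_target;
}

/* Mirrors the order of checks in address_cost. */
int toy_address_cost (const struct toy_addr *x)
{
  if (x->code == TOY_ADDRESSOF && x->operand && x->operand->code == TOY_REG)
    return -1;                     /* address of a REG: cheapest possible */
  if (!toy_memory_address_p (x))
    return 1000;                   /* invalid address: prohibitive cost */
  return x->target_cost;           /* defer to the target hook */
}
```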
1303 int in recog.c
1304 memory_address_p (enum machine_mode mode ATTRIBUTE_UNUSED, rtx addr)
1305 {
1306 if (GET_CODE (addr) == ADDRESSOF)
1307 return 1;
1308
1309 GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
1310 return 0;
1311
1312 win:
1313 return 1;
1314 }
Which addresses are valid, and what they cost, is target-dependent. An ADDRESSOF (line 1306) is always accepted as a valid address. Otherwise, on x86, GO_IF_LEGITIMATE_ADDRESS at line 1309 expands as follows.
2019 #define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR) \ in i386.h
2020 do { \
2021 if (legitimate_address_p ((MODE), (X), 0)) \
2022 goto ADDR; \
2023 } while (0)
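The macro relies on a common C idiom: it jumps to a caller-supplied label on success and otherwise falls through, and the do { } while (0) wrapper makes the expansion behave as a single statement. The sketch below reproduces that shape with a hypothetical predicate, toy_legitimate_p, in place of legitimate_address_p.

```c
#include <assert.h>

/* Hypothetical predicate standing in for legitimate_address_p. */
static int toy_legitimate_p (int x) { return x >= 0; }

/* Same shape as GO_IF_LEGITIMATE_ADDRESS: goto LABEL if X is acceptable,
   otherwise fall through. do { } while (0) keeps it a single statement,
   so it is safe after an unbraced if. */
#define TOY_GO_IF_LEGITIMATE(X, LABEL)  \
  do {                                  \
    if (toy_legitimate_p (X))           \
      goto LABEL;                       \
  } while (0)

/* Same control flow as memory_address_p. */
int toy_memory_address_p (int addr)
{
  TOY_GO_IF_LEGITIMATE (addr, win);
  return 0;

 win:
  return 1;
}
```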
6033 int
6034 legitimate_address_p (enum machine_mode mode, rtx addr, int strict) in i386.c
6035 {
6036 struct ix86_address parts;
6037 rtx base, index, disp;
6038 HOST_WIDE_INT scale;
6039 const char *reason = NULL;
6040 rtx reason_rtx = NULL_RTX;
6041
6042 if (TARGET_DEBUG_ADDR)
6043 {
6044 fprintf (stderr,
6045 "\n======\nGO_IF_LEGITIMATE_ADDRESS, mode = %s, strict = %d\n",
6046 GET_MODE_NAME (mode), strict);
6047 debug_rtx (addr);
6048 }
6049
6050 if (ix86_decompose_address (addr, &parts) <= 0)
6051 {
6052 reason = "decomposition failed";
6053 goto report_error;
6054 }
struct ix86_address holds the components of an x86 memory address. An x86 address is computed as base + scale*index + disp. For objects in thread-local storage (TLS), the FS or GS segment register is also needed as a segment override; this is recorded in the seg field at line 849 below.
845 struct ix86_address in i386.c
846 {
847 rtx base, index, disp;
848 HOST_WIDE_INT scale;
849 enum ix86_address_seg { SEG_DEFAULT, SEG_FS, SEG_GS } seg;
850 };
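To make the formula concrete, the sketch below computes base + scale*index + disp from a hypothetical flattened version of the structure, where register contents are already known integers and absent parts are flagged. The segment override is not modelled, and the 32-bit wraparound is an assumption matching 32-bit address arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of struct ix86_address with concrete register values.
   has_base / has_index mirror a NULL_RTX base or index. */
struct toy_parts {
  int has_base, has_index;
  uint32_t base, index;
  uint32_t scale;     /* 1, 2, 4 or 8 on x86 */
  int32_t disp;
};

/* Effective address = base + scale * index + disp (segment ignored). */
uint32_t toy_effective_address (const struct toy_parts *p)
{
  uint32_t ea = (uint32_t) p->disp;   /* negative disp wraps modulo 2^32 */
  if (p->has_base)
    ea += p->base;
  if (p->has_index)
    ea += p->scale * p->index;
  return ea;
}
```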
ix86_decompose_address splits the given address into these components and fills in the ix86_address structure.
5566 static int
5567 ix86_decompose_address (rtx addr, struct ix86_address *out) in i386.c
5568 {
5569 rtx base = NULL_RTX;
5570 rtx index = NULL_RTX;
5571 rtx disp = NULL_RTX;
5572 HOST_WIDE_INT scale = 1;
5573 rtx scale_rtx = NULL_RTX;
5574 int retval = 1;
5575 enum ix86_address_seg seg = SEG_DEFAULT;
5576
5577 if (GET_CODE (addr) == REG || GET_CODE (addr) == SUBREG)
5578 base = addr;
5579 else if …
As noted above, addr is an rtx whose code is REG, so the test at line 5577 succeeds and the branches from line 5579 onward are skipped. A bare register simply serves as the base of the address. At this point scale_rtx, index and disp are all NULL_RTX, and scale is 1.
ix86_decompose_address (continue)
5706 out->base = base;
5707 out->index = index;
5708 out->disp = disp;
5709 out->scale = scale;
5710 out->seg = seg;
5711
5712 return retval;
5713 }
On return from ix86_decompose_address (which returns 1), parts holds: parts.base = addr, parts.index = NULL_RTX, parts.disp = NULL_RTX, parts.scale = 1, parts.seg = SEG_DEFAULT.
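The register case can be sketched with a hypothetical miniature RTL that knows only REG, CONST_INT and PLUS. The toy decomposer handles a bare register and (plus (reg) (const_int)); it is a simplification, not the real ix86_decompose_address, which accepts many more shapes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature RTL: only what the sketch needs. */
enum toy_code { T_REG, T_CONST_INT, T_PLUS };

struct toy_rtx {
  enum toy_code code;
  const struct toy_rtx *op0, *op1;   /* operands of T_PLUS */
};

struct toy_address {
  const struct toy_rtx *base, *index, *disp;
  long scale;
};

/* Returns 1 on success, 0 for shapes this toy does not model. */
int toy_decompose (const struct toy_rtx *addr, struct toy_address *out)
{
  out->base = out->index = out->disp = NULL;
  out->scale = 1;
  if (addr->code == T_REG)
    {
      out->base = addr;              /* a bare register is its own base */
      return 1;
    }
  if (addr->code == T_PLUS && addr->op0->code == T_REG
      && addr->op1->code == T_CONST_INT)
    {
      out->base = addr->op0;         /* (plus (reg) (const_int)) */
      out->disp = addr->op1;
      return 1;
    }
  return 0;
}
```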
legitimate_address_p (continue)
6056 base = parts.base;
6057 index = parts.index;
6058 disp = parts.disp;
6059 scale = parts.scale;
6060
6061 /* Validate base register.
6062
6063 Don't allow SUBREG's here, it can lead to spill failures when the base
6064 is one word out of a two word structure, which is represented internally
6065 as a DImode int. */
6066
6067 if (base)
6068 {
6069 reason_rtx = base;
6070
6071 if (GET_CODE (base) != REG)
6072 {
6073 reason = "base is not a register";
6074 goto report_error;
6075 }
6076
6077 if (GET_MODE (base) != Pmode)
6078 {
6079 reason = "base is not in Pmode";
6080 goto report_error;
6081 }
6082
6083 if ((strict && ! REG_OK_FOR_BASE_STRICT_P (base))
6084 || (! strict && ! REG_OK_FOR_BASE_NONSTRICT_P (base)))
6085 {
6086 reason = "base is not valid";
6087 goto report_error;
6088 }
6089 }
…
6232 /* Everything looks valid. */
6233 if (TARGET_DEBUG_ADDR)
6234 fprintf (stderr, "Success.\n");
6235 return TRUE;
...
6244 }
base is the REG rtx that init_loop created in word_mode. On x86, word_mode coincides with Pmode (SImode for 32-bit code, DImode for 64-bit code), so the checks at lines 6071 and 6077 pass. GO_IF_LEGITIMATE_ADDRESS calls legitimate_address_p with strict = 0, so line 6084 uses the non-strict macro to verify that the register is acceptable as a base register:
1970 #define REG_OK_FOR_BASE_NONSTRICT_P(X) \ in i386.h
1971 (REGNO (X) <= STACK_POINTER_REGNUM \
1972 || REGNO (X) == ARG_POINTER_REGNUM \
1973 || REGNO (X) == FRAME_POINTER_REGNUM \
1974 || (REGNO (X) >= FIRST_REX_INT_REG \
1975 && REGNO (X) <= LAST_REX_INT_REG) \
1976 || REGNO (X) >= FIRST_PSEUDO_REGISTER)
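The last clause accepts any pseudo register, so the register numbered LAST_VIRTUAL_REGISTER + 1 is a valid base in non-strict mode, and legitimate_address_p returns TRUE. The non-strict form is meant for use before register allocation, when pseudo registers may still appear in addresses; the strict form, used once hard registers have been assigned, accepts a pseudo only if it has been allocated a suitable hard register. Note that on x86 the arg pointer and frame pointer tested here are not real machine registers: they are fake registers that GCC later eliminates in favour of the stack pointer or the hard frame pointer (%ebp in 32-bit code).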
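The non-strict test can be exercised on plain register numbers. The constants below are hypothetical values loosely following the i386 layout of this era; they should be checked against the i386.h of the GCC version being read. The predicate has the same shape as REG_OK_FOR_BASE_NONSTRICT_P.

```c
#include <assert.h>

/* Hypothetical register numbers, loosely following the i386 layout. */
enum {
  T_STACK_POINTER_REGNUM  = 7,
  T_ARG_POINTER_REGNUM    = 16,
  T_FRAME_POINTER_REGNUM  = 20,
  T_FIRST_REX_INT_REG     = 37,
  T_LAST_REX_INT_REG      = 44,
  T_FIRST_PSEUDO_REGISTER = 53
};

/* Same shape as REG_OK_FOR_BASE_NONSTRICT_P, applied to a register number. */
int toy_ok_for_base_nonstrict (unsigned regno)
{
  return regno <= T_STACK_POINTER_REGNUM            /* general integer regs */
         || regno == T_ARG_POINTER_REGNUM           /* fake, eliminated later */
         || regno == T_FRAME_POINTER_REGNUM         /* fake, eliminated later */
         || (regno >= T_FIRST_REX_INT_REG
             && regno <= T_LAST_REX_INT_REG)        /* r8..r15 on x86-64 */
         || regno >= T_FIRST_PSEUDO_REGISTER;       /* any pseudo register */
}
```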
Because legitimate_address_p returns TRUE, memory_address_p returns 1 to address_cost, which then calls the target's address_cost hook; on x86 this is ix86_address_cost. It prefers more complex addresses, so that GCC is encouraged to fold simple operations into a single address: a nonzero displacement or a non-default segment each lowers the cost by one (lines 5730 ~ 5733). It also tries to minimize the number of registers in the address: a base or index that is not a hard register (a non-REG or a pseudo register) adds one to the cost (lines 5736 ~ 5741), and one more is added if both base and index are such operands and differ (lines 5743 ~ 5748). For our address, a bare pseudo register with no displacement, the cost starts at 1 and the pseudo base adds 1 at line 5741, giving 2.
5720 static int
5721 ix86_address_cost (rtx x) in i386.c
5722 {
5723 struct ix86_address parts;
5724 int cost = 1;
5725
5726 if (!ix86_decompose_address (x, &parts))
5727 abort ();
5728
5729 /* More complex memory references are better. */
5730 if (parts.disp && parts.disp != const0_rtx)
5731 cost--;
5732 if (parts.seg != SEG_DEFAULT)
5733 cost--;
5734
5735 /* Attempt to minimize number of registers in the address. */
5736 if ((parts.base
5737 && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER))
5738 || (parts.index
5739 && (!REG_P (parts.index)
5740 || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)))
5741 cost++;
5742
5743 if (parts.base
5744 && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
5745 && parts.index
5746 && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
5747 && parts.base != parts.index)
5748 cost++;
5749
5750 /* AMD-K6 don't like addresses with ModR/M set to 00_xxx_100b,
5751 since it's predecode logic can't detect the length of instructions
5752 and it degenerates to vector decoded. Increase cost of such
5753 addresses here. The penalty is minimally 2 cycles. It may be worthwhile
5754 to split such addresses or even refuse such addresses at all.
5755
5756 Following addressing modes are affected:
5757 [base+scale*index]
5758 [scale*index+disp]
5759 [base+index]
5760
5761 The first and last case may be avoidable by explicitly coding the zero in
5762 memory address, but I don't have AMD-K6 machine handy to check this
5763 theory. */
5764
5765 if (TARGET_K6
5766 && ((!parts.disp && parts.base && parts.index && parts.scale != 1)
5767 || (parts.disp && !parts.base && parts.index && parts.scale != 1)
5768 || (!parts.disp && parts.base && parts.index && parts.scale == 1)))
5769 cost += 10;
5770
5771 return cost;
5772 }
Back in init_loop, reg_address_cost therefore gets the value 2. Next, copy_cost is set via COSTS_N_INSNS. For targets other than ARM, COSTS_N_INSNS is defined as below; it gives the cost of an operation that is N times as expensive as a fast register-to-register instruction, so copy_cost becomes COSTS_N_INSNS (1) = 4.
1995 #define COSTS_N_INSNS(N) ((N) * 4) in rtl.h
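To trace ix86_address_cost by hand, its arithmetic can be restated as a standalone C function. This is a sketch under simplifying assumptions: the AMD-K6 penalty is omitted, T_FIRST_PSEUDO_REGISTER is a hypothetical value, and the same_operand flag stands in for the rtx pointer comparison parts.base != parts.index. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define T_FIRST_PSEUDO_REGISTER 53u   /* hypothetical value */

/* A base or index operand: whether present, whether a REG, and its number. */
struct toy_operand { bool present; bool is_reg; unsigned regno; };

/* Operand penalised at lines 5736-5746: present and not a hard register. */
static bool costly (struct toy_operand op)
{
  return op.present && (!op.is_reg || op.regno >= T_FIRST_PSEUDO_REGISTER);
}

/* Restates ix86_address_cost without the AMD-K6 penalty. */
int toy_ix86_address_cost (struct toy_operand base, struct toy_operand index,
                           bool nonzero_disp, bool non_default_seg,
                           bool same_operand)
{
  int cost = 1;
  if (nonzero_disp)                 /* lines 5730-5731 */
    cost--;
  if (non_default_seg)              /* lines 5732-5733 */
    cost--;
  if (costly (base) || costly (index))
    cost++;                         /* lines 5736-5741 */
  if (costly (base) && costly (index) && !same_operand)
    cost++;                         /* lines 5743-5748 */
  return cost;
}
```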