To make porting GCC to other machines (architectures) as efficient and convenient as possible, GCC requires a machine description file (MD file) for each target chip. A chip is described by a series of definitions called patterns. Generally, we need to describe the chip from two angles.
First is the instruction set, defined in rtl form. It covers what the instructions look like (define_insn pattern), which instruction sequences are more efficient than other equivalent ones (define_peephole and define_peephole2), how to split a complex instruction into simpler ones so that one of them can be placed into a delay slot or used to fill the pipeline (define_split and define_insn_and_split; for delay slot and pipeline filling considerations refer to Tool of genattrtab), and how to split a complex instruction into simpler ones so that define_insn patterns can be matched (define_expand).
The second is the architecture description. Different chips of the same series may still have different function units and different pipeline structures. To exploit a chip's capabilities as fully as possible, we need to tell the compiler about these details. This description is also written in the rtl language.
These two forms of description are largely human readable. To convert them into a form that can be included in the source code of GCC, the developers designed a series of tools. These tools are essential; without them the backend can do nothing. They not only do the major work of assembly generation, but also provide the basis for optimizations performed at a level close to the machine. (For release V4, a layer named SSA is introduced to enhance the compiler's ability to optimize at a level closer to the source code. With the machine description file, release V3 can provide very powerful optimization at the lower level; however, in this lower-level form, much of the information that helps dependence analysis, such as type and alias information, has been stripped. This keeps the compiler from taking some optimization opportunities.)
In the following, we will study most of the tools that handle the machine description file, to understand how the backend is constructed. When GCC is built, these tools are compiled first, and then they are run to parse the description file of the target machine and generate source code. After that, GCC’s own source (such as the front-end we have seen) is compiled together with the code emitted here to build the compiler.
Before studying the genrecog tool, we first need to study the genconditions tool, as it produces the file insn-conditions.c, which is used by genrecog.
182 int
183 main (int argc, char **argv) in genconditions.c
184 {
185 rtx desc;
186 int pattern_lineno; /* not used */
187 int code;
188
189 progname = "genconditions";
190
191 if (argc <= 1)
192 fatal ("No input file name.");
193
194 if (init_md_reader (argv[1]) != SUCCESS_EXIT_CODE)
195 return (FATAL_EXIT_CODE);
196
197 condition_table = htab_create (1000, hash_c_test, cmp_c_test, NULL);
Note that the condition_table at line 197 above and the condition_table at line 956 below are two different static variables, declared in genconditions.c and gensupport.c respectively. init_md_reader, which reads rtx objects from the machine description file, is a common function also invoked by the other tools.
935 int
936 init_md_reader (const char *filename) in gensupport.c
937 {
938 FILE *input_file;
939 int c;
940 size_t i;
941 char *lastsl;
942
943 lastsl = strrchr (filename, '/');
944 if (lastsl != NULL)
945 base_dir = save_string (filename, lastsl - filename + 1 );
946
947 read_rtx_filename = filename;
948 input_file = fopen (filename, "r");
949 if (input_file == 0)
950 {
951 perror (filename);
952 return FATAL_EXIT_CODE;
953 }
954
955 /* Initialize the table of insn conditions. */
956 condition_table = htab_create (n_insn_conditions ,
957 hash_c_test, cmp_c_test, NULL);
958
959 for (i = 0; i < n_insn_conditions ; i++)
960 *(htab_find_slot (condition_table , &insn_conditions [i], INSERT))
961 = (void *) &insn_conditions [i];
962 obstack_init (rtl_obstack );
963 errors = 0;
964 sequence_num = 0;
965
966 /* Read the entire file. */
967 while (1)
968 {
969 rtx desc;
970 int lineno;
971
972 c = read_skip_spaces (input_file);
973 if (c == EOF)
974 break ;
975
976 ungetc (c, input_file);
977 lineno = read_rtx_lineno ;
978 desc = read_rtx (input_file);
979 process_rtx (desc, lineno);
980 }
981 fclose (input_file);
982
983 /* Process define_cond_exec patterns. */
984 if (define_cond_exec_queue != NULL)
985 process_define_cond_exec ();
986
987 return errors ? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE;
988 }
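The first step of init_md_reader, lines 943 to 945, remembers the directory part of the file name. As a minimal sketch of that step in isolation, the helper below copies everything up to and including the last '/'. The name dir_prefix is hypothetical and is not part of GCC; the real code uses save_string and stores the result in base_dir.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return a malloc'ed copy of the
   directory prefix of PATH, including the trailing '/', or NULL when
   PATH contains no '/' (or when allocation fails).  */
char *
dir_prefix (const char *path)
{
  const char *lastsl;
  size_t len;
  char *dir;

  assert (path != NULL);

  lastsl = strrchr (path, '/');
  if (lastsl == NULL)
    return NULL;

  /* "+ 1" keeps the slash itself, as line 945 does.  */
  len = (size_t) (lastsl - path) + 1;
  dir = malloc (len + 1);
  if (dir == NULL)
    return NULL;

  memcpy (dir, path, len);
  dir[len] = '\0';
  return dir;
}
```

Under these assumptions, dir_prefix ("config/i386/i386.md") yields "config/i386/", while a bare file name such as "i386.md" yields NULL, which corresponds to base_dir being left untouched at line 944.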
read_skip_spaces skips white space and comments, and returns the first significant character.
102 int
103 read_skip_spaces (FILE *infile) in read-rtl.c
104 {
105 int c;
106
107 while (1)
108 {
109 c = getc (infile);
110 switch (c)
111 {
112 case '\n':
113 read_rtx_lineno ++;
114 break ;
115
116 case ' ': case '\t': case '\f': case '\r':
117 break ;
118
119 case ';':
120 do
121 c = getc (infile);
122 while (c != '\n' && c != EOF);
123 read_rtx_lineno ++;
124 break ;
125
126 case '/':
127 {
128 int prevc;
129 c = getc (infile);
130 if (c != '*')
131 fatal_expected_char (infile, '*', c);
132
133 prevc = 0;
134 while ((c = getc (infile)) && c != EOF)
135 {
136 if (c == '\n')
137 read_rtx_lineno ++;
138 else if (prevc == '*' && c == '/')
139 break ;
140 prevc = c;
141 }
142 }
143 break ;
144
145 default :
146 return c;
147 }
148 }
149 }
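The scanning loop above can be tried in isolation. The following is a simplified sketch, not GCC's code: it reads from an in-memory string instead of a FILE, and the function name skip_blank and its return conventions are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified sketch: advance *P past white space, ';' line comments and
   C-style block comments, counting newlines in *LINENO.  Returns the
   first significant character (with *P left just after it), '\0' at end
   of input, or -1 when a '/' is not followed by '*'.  */
int
skip_blank (const char **p, int *lineno)
{
  assert (p != NULL && *p != NULL && lineno != NULL);

  for (;;)
    {
      int c = (unsigned char) **p;

      if (c == '\0')
        return c;
      ++*p;

      switch (c)
        {
        case '\n':
          ++*lineno;
          break;

        case ' ': case '\t': case '\f': case '\r':
          break;

        case ';':
          /* Skip to the end of the line; the newline itself is counted
             by the next iteration of the outer loop.  */
          while (**p != '\0' && **p != '\n')
            ++*p;
          break;

        case '/':
          if (**p != '*')
            return -1;
          ++*p;
          /* Skip until the closing star-slash pair or end of input.  */
          while (**p != '\0' && !((*p)[0] == '*' && (*p)[1] == '/'))
            {
              if (**p == '\n')
                ++*lineno;
              ++*p;
            }
          if (**p != '\0')
            *p += 2;
          break;

        default:
          return c;
        }
    }
}
```

Given input that starts with a ';' comment line followed by a two-line block comment and then "(define_insn", the sketch returns '(' and leaves the line counter at 3. Unlike read_skip_spaces, it reports a stray '/' through a return value instead of calling fatal_expected_char.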
In an md file, “;” starts a comment that runs to the end of the line. A “/*” and “*/” pair delimits a comment too. Then read_rtx, with the help of read_skip_spaces, constructs an rtx object from an rtx definition in the machine description file.
509 rtx
510 read_rtx (FILE *infile) in read-rtl.c
511 {
512 int i, j;
513 RTX_CODE tmp_code;
514 const char *format_ptr;
515 /* tmp_char is a buffer used for reading decimal integers
516 and names of rtx types and machine modes.
517 Therefore, 256 must be enough. */
518 char tmp_char[256];
519 rtx return_rtx;
520 int c;
521 int tmp_int;
522 HOST_WIDE_INT tmp_wide;
523
524 /* Obstack used for allocating RTL objects. */
525 static struct obstack rtl_obstack;
526 static int initialized ;
527
528 /* Linked list structure for making RTXs: */
529 struct rtx_list
530 {
531 struct rtx_list *next;
532 rtx value; /* Value of this node. */
533 };
534
535 if (!initialized ) {
536 obstack_init (&rtl_obstack);
537 initialized = 1;
538 }
539
540 again:
541 c = read_skip_spaces (infile); /* Should be open paren. */
542 if (c != '(')
543 fatal_expected_char (infile, '(', c);
544
545 read_name (tmp_char, infile);
546
547 tmp_code = UNKNOWN;
548
549 if (! strcmp (tmp_char, "define_constants"))
550 {
551 read_constants (infile, tmp_char);
552 goto again;
553 }
554 for (i = 0; i < NUM_RTX_CODE; i++)
555 if (! strcmp (tmp_char, GET_RTX_NAME (i)))
556 {
557 tmp_code = (RTX_CODE) i; /* get value for name */
558 break ;
559 }
560
561 if (tmp_code == UNKNOWN)
562 fatal_with_file_and_line (infile, "unknown rtx code `%s'", tmp_char);
563
564 /* (NIL) stands for an expression that isn't there. */
565 if (tmp_code == NIL)
566 {
567 /* Discard the closeparen. */
568 while ((c = getc (infile)) && c != ')')
569 ;
570
571 return 0;
572 }
573
574 /* If we end up with an insn expression then we free this space below. */
575 return_rtx = rtx_alloc (tmp_code);
576 format_ptr = GET_RTX_FORMAT (GET_CODE (return_rtx));
In a machine description file, every instruction must be enclosed within a pair of parentheses; line 542 checks for the opening one. Then read_name is invoked to get the name of the instruction.
154 static void
155 read_name (char *str, FILE *infile) in read-rtl.c
156 {
157 char *p;
158 int c;
159
160 c = read_skip_spaces (infile);
161
162 p = str;
163 while (1)
164 {
165 if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r')
166 break ;
167 if (c == ':' || c == ')' || c == ']' || c == '"' || c == '/'
168 || c == '(' || c == '[')
169 {
170 ungetc (c, infile);
171 break ;
172 }
173 *p++ = c;
174 c = getc (infile);
175 }
176 if (p == str)
177 fatal_with_file_and_line (infile, "missing name or number");
178 if (c == '\n')
179 read_rtx_lineno ++;
180
181 *p = 0;
182
183 if (md_constants )
184 {
185 /* Do constant expansion. */
186 struct md_constant *def;
187
188 p = str;
189 do
190 {
191 struct md_constant tmp_def;
192
193 tmp_def.name = p;
194 def = htab_find (md_constants , &tmp_def);
195 if (def)
196 p = def->value;
197 } while (def);
198 if (p != str)
199 strcpy (str, p);
200 }
201 }
Above, at line 183, md_constants is a hash table (htab), while at line 186 md_constant is a simple struct with two char pointer members, name and value.
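The do/while loop at lines 189 to 197 keeps looking up the current name until no definition is found, so a constant whose value is itself the name of another constant is expanded all the way. A minimal sketch of that behaviour follows. It assumes a small fixed array in place of the hash table, and the entries FLAGS_REG and CC_REG as well as the function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct md_constant
{
  const char *name;
  const char *value;
};

/* Hypothetical definitions for illustration only; the real reader fills
   a hash table from define_constants.  */
static const struct md_constant demo_constants[] = {
  { "FLAGS_REG", "17" },
  { "CC_REG", "FLAGS_REG" }   /* defined in terms of another constant */
};

/* Linear search standing in for htab_find.  */
static const struct md_constant *
find_constant (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof demo_constants / sizeof demo_constants[0]; i++)
    if (strcmp (demo_constants[i].name, name) == 0)
      return &demo_constants[i];
  return NULL;
}

/* Follow definitions until NAME no longer names a constant, mirroring
   the loop in read_name.  A cyclic definition would never terminate.  */
const char *
expand_constant (const char *name)
{
  const struct md_constant *def;

  assert (name != NULL);
  while ((def = find_constant (name)) != NULL)
    name = def->value;
  return name;
}
```

With this table, both "CC_REG" and "FLAGS_REG" expand to "17", and a name that is not a constant, such as "reg", is returned unchanged.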
A machine description file sometimes defines constants that are used in the instruction definitions. These constant definitions are introduced by define_constants. At line 551 above in read_rtx, read_constants is invoked to handle them.
421 static void
422 read_constants (FILE *infile, char *tmp_char) in read-rtl.c
423 {
424 int c;
425 htab_t defs;
426
427 c = read_skip_spaces (infile);
428 if (c != '[')
429 fatal_expected_char (infile, '[', c);
430 defs = md_constants ;
431 if (! defs)
432 defs = htab_create (32, def_hash, def_name_eq_p, (htab_del) 0);
433 /* Disable constant expansion during definition processing. */
434 md_constants = 0;
435 while ( (c = read_skip_spaces (infile)) != ']')
436 {
437 struct md_constant *def;
438 void **entry_ptr;
439
440 if (c != '(')
441 fatal_expected_char (infile, '(', c);
442 def = xmalloc (sizeof (struct md_constant));
443 def->name = tmp_char;
444 read_name (tmp_char, infile);
445 entry_ptr = htab_find_slot (defs, def, TRUE);
446 if (! *entry_ptr)
447 def->name = xstrdup (tmp_char);
448 c = read_skip_spaces (infile);
449 ungetc (c, infile);
450 read_name (tmp_char, infile);
451 if (! *entry_ptr)
452 {
453 def->value = xstrdup (tmp_char);
454 *entry_ptr = def;
455 }
456 else
457 {
458 def = *entry_ptr;
459 if (strcmp (def->value, tmp_char))
460 fatal_with_file_and_line (infile,
461 "redefinition of %s, was %s, now %s",
462 def->name, def->value, tmp_char);
463 }
464 c = read_skip_spaces (infile);
465 if (c != ')')
466 fatal_expected_char (infile, ')', c);
467 }
468 md_constants = defs;
469 c = read_skip_spaces (infile);
470 if (c != ')')
471 fatal_expected_char (infile, ')', c);
472 }
A constant definition has the form (define_constants [(name value) … (name value)]). Each name and value pair is read and saved into a struct md_constant, which is in turn stored in the hash table md_constants.
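One detail of read_constants worth isolating is its redefinition rule at lines 451 to 463: defining a name again with the same value is accepted silently, while a different value is a fatal error. The sketch below shows that rule with a fixed-size array instead of a hash table. The function name define_constant, the table size and the return codes are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEFS 32

struct md_constant
{
  const char *name;
  const char *value;
};

static struct md_constant defs[MAX_DEFS];
static int n_defs;

/* Record NAME = VALUE.  Returns 1 if a new entry was added, 0 if the
   same definition was already present, and -1 on a conflicting
   redefinition or a full table.  The strings are not copied, so the
   caller must keep them alive; read_constants uses xstrdup instead
   because its tmp_char buffer is reused.  */
int
define_constant (const char *name, const char *value)
{
  int i;

  assert (name != NULL && value != NULL);

  for (i = 0; i < n_defs; i++)
    if (strcmp (defs[i].name, name) == 0)
      return strcmp (defs[i].value, value) == 0 ? 0 : -1;

  if (n_defs == MAX_DEFS)
    return -1;

  defs[n_defs].name = name;
  defs[n_defs].value = value;
  n_defs++;
  return 1;
}
```

Defining a name twice with the value "0" returns 1 and then 0, whereas a later attempt to give it the value "1" returns -1, the case in which read_constants reports "redefinition of %s, was %s, now %s".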
Other machine description patterns include define_insn, define_attr, define_peephole, define_split, define_expand, define_insn_and_split, etc. They also appear in the file rtl.def. They are special rtx objects, and for the i386 system we can find the following:
192 DEF_RTL_EXPR(DEFINE_INSN, "define_insn", "sEsTV", 'x') in rtl.def
In this definition, the first parameter is the rtx code, the second is the name, the third is the format, in which “sEsTV” means the rtx object has at most 5 children, and the last is the rtx class. Let’s look at the meaning of each character in the format and class of an rtx object.
For format we can get:
"0" field is unused (or used in a phase-dependent manner), prints nothing
"i" an integer, prints the integer
"n" like "i", but prints entries from `note_insn_name'
"w" an integer of width HOST_BITS_PER_WIDE_INT, prints the integer
"s" a pointer to a string, prints the string
"S" like "s", but optional: the containing rtx may end before this operand
"T" like "s", but treated specially by the RTL reader; only in machine description patterns.
"e" a pointer to an rtl expression, prints the expression
"E" a pointer to a vector that points to a number of rtl expressions, prints a list of the rtl expressions
"V" like "E", but optional: the containing rtx may end before this operand
"u" a pointer to another insn, prints the uid of the insn.
"b" is a pointer to a bitmap header.
"B" is a basic block pointer.
"t" is a tree pointer.
For class we can get:
"o" an rtx code that can be used to represent an object (e.g., REG, MEM)
"<" an rtx code for a comparison (e.g., EQ, NE, LT)
"1" an rtx code for a unary arithmetic expression (e.g., NEG, NOT)
"c" an rtx code for a commutative binary operation (e.g., PLUS, MULT)
"3" an rtx code for a non-bitfield three input operation (IF_THEN_ELSE)
"2" an rtx code for a non-commutative binary operation (e.g., MINUS, DIV)
"b" an rtx code for a bit-field operation (ZERO_EXTRACT, SIGN_EXTRACT)
"i" an rtx code for a machine insn (INSN, JUMP_INSN, CALL_INSN)
"m" an rtx code for something that matches in insns (e.g., MATCH_DUP)
"g" an rtx code for grouping insns together (e.g., GROUP_PARALLEL)
"a" an rtx code for autoincrement addressing modes (e.g., POST_DEC)
"x" everything else
The rtl definitions of other related patterns are:
200 DEF_RTL_EXPR(DEFINE_PEEPHOLE, "define_peephole", "EsTV", 'x')
211 DEF_RTL_EXPR(DEFINE_SPLIT, "define_split", "EsES", 'x')
239 DEF_RTL_EXPR(DEFINE_INSN_AND_SPLIT, "define_insn_and_split", "sEsTsESV", 'x')
243 DEF_RTL_EXPR(DEFINE_PEEPHOLE2, "define_peephole2", "EsES", 'x')
247 DEF_RTL_EXPR(DEFINE_COMBINE, "define_combine", "Ess", 'x')
260 DEF_RTL_EXPR(DEFINE_EXPAND, "define_expand", "sEss", 'x')
276 DEF_RTL_EXPR(DEFINE_DELAY, "define_delay", "eE", 'x')
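A format string such as "sEsTV" can be decoded letter by letter using the list of format characters given earlier. The sketch below does exactly that. The function names are hypothetical and the short descriptions are paraphrased from that list; this is an illustration, not how GCC itself consumes the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Map one format letter to a short description, following the list in
   the text.  Returns NULL for a letter that is not in that list.  */
const char *
format_letter_meaning (char c)
{
  switch (c)
    {
    case '0': return "unused field";
    case 'i': return "integer";
    case 'n': return "integer printed as a note insn name";
    case 'w': return "wide integer";
    case 's': return "string";
    case 'S': return "optional string";
    case 'T': return "string treated specially by the RTL reader";
    case 'e': return "rtl expression";
    case 'E': return "vector of rtl expressions";
    case 'V': return "optional vector of rtl expressions";
    case 'u': return "pointer to another insn";
    case 'b': return "bitmap header pointer";
    case 'B': return "basic block pointer";
    case 't': return "tree pointer";
    default:  return NULL;
    }
}

/* Count the operands that may be omitted ('S' and 'V').  */
size_t
count_optional_operands (const char *format)
{
  size_t n = 0;

  assert (format != NULL);
  for (; *format != '\0'; format++)
    if (*format == 'S' || *format == 'V')
      n++;
  return n;
}

/* Print one line per operand of FORMAT.  */
void
describe_format (const char *format)
{
  size_t i;

  assert (format != NULL);
  for (i = 0; format[i] != '\0'; i++)
    {
      const char *meaning = format_letter_meaning (format[i]);

      printf ("operand %lu: '%c' %s\n", (unsigned long) i, format[i],
              meaning ? meaning : "(not in the list)");
    }
}
```

Calling describe_format ("sEsTV") lists the five operands of define_insn: a string, a vector of expressions, a string, a specially read string, and an optional vector. For "sEsTsESV", the format of define_insn_and_split, count_optional_operands reports two optional operands.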
So at line 554 in function read_rtx above, the other patterns in the machine description file are recognized by name, and the corresponding rtx codes are fetched. At line 565, NIL is the code used by the rtl reader and printer to represent a null pointer. Each recognized pattern is allocated as an rtx object at line 575, and read_rtx then goes on to handle it.
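The name lookup at lines 554 to 559 is a plain linear search over the table of rtx names. The following sketch reproduces the idea with a tiny table. The enum demo_code, the array demo_names and the function lookup_code are hypothetical stand-ins for RTX_CODE, GET_RTX_NAME and the loop in read_rtx.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of the rtx code enumeration.  */
enum demo_code
{
  DEMO_UNKNOWN,
  DEMO_DEFINE_INSN,
  DEMO_DEFINE_SPLIT,
  DEMO_DEFINE_EXPAND,
  DEMO_NUM_CODES
};

/* Names indexed by code, playing the role of GET_RTX_NAME.  */
static const char *const demo_names[DEMO_NUM_CODES] = {
  "unknown", "define_insn", "define_split", "define_expand"
};

/* Return the code whose name equals NAME, or DEMO_UNKNOWN when no
   entry matches; read_rtx treats that case as a fatal error.  */
enum demo_code
lookup_code (const char *name)
{
  int i;

  assert (name != NULL);
  for (i = 0; i < DEMO_NUM_CODES; i++)
    if (strcmp (name, demo_names[i]) == 0)
      return (enum demo_code) i;
  return DEMO_UNKNOWN;
}
```

A linear search is adequate here because the generator tools run only while GCC is being built, so lookup speed matters little.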
Let’s take an example from i386.md.
467 (define_insn "cmpdi_ccno_1_rex64" in i386.md
468 [(set (reg 17)
469 (compare (match_operand:DI 0 "nonimmediate_operand" "r,?mr")
470 (match_operand:DI 1 "const0_operand" "n,n")))]
471 "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)"
472 "@
473 test{q}\t{%0, %0|%0, %0}
474 cmp{q}\t{%1, %0|%0, %1}"
475 [(set_attr "type" "test,icmp")
476 (set_attr "length_immediate" "0,1")
477 (set_attr "mode" "DI")])