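Using GCC 8.2.0 as an example, this article walks through how the GCC C parser handles declarations and function definitions, followed by a brief look at how structures and unions are parsed.
In GCC, the file gcc/c/c-parser.c is responsible for parsing the complete GNU C grammar. The entry point for parsing a single source file is the function void c_parse_file (void), whose code is as follows:
The function first initializes the c_parser struct and then checks the pragma_kind of the first token. The call to c_parser_translation_unit parses the current translation unit (TU). Finally, once a parser has finished, the parser pointer is set to NULL, and parsing of the next source file can begin.
Each c_parser corresponds to one .c file. The c_parser struct records the parsing state and context, together with the lexer information. Its code is as follows:
The translation unit is the start symbol of GCC's entire set of grammar productions. The comment that documents it reads: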
/* Parse a translation unit (C90 6.7, C99 6.9, C11 6.9).
translation-unit:
external-declarations
external-declarations:
external-declaration
external-declarations external-declaration
GNU extensions:
translation-unit:
empty
*/
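Inside this function, ggc_collect performs top-level mark-and-sweep garbage collection, and c_parser_external_declaration parses the external-declaration nonterminal. The code is as follows:
Depending on the type of the parser's first token (the type field of the token at &parser->tokens[0]), a different rule is applied; in other words, different grammar productions take different execution paths. Part of the code is as follows: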
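Separately from GCC's own source, the idea of choosing a production from the first token alone can be pictured with a small stand-alone model. This is only a sketch: toy_token_type, toy_token, and toy_external_declaration are names invented for illustration and are not GCC identifiers, and the real function distinguishes many more cases.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds; GCC's real token enum is much larger. */
enum toy_token_type { TOY_KEYWORD, TOY_SEMICOLON, TOY_PRAGMA, TOY_NAME, TOY_EOF };

struct toy_token {
  enum toy_token_type type;
  const char *text;
};

/* Pick a grammar rule by looking only at the first token, the way a
   predictive (recursive descent) parser chooses a production.  The
   returned strings are labels for this sketch, nothing more.  */
static const char *
toy_external_declaration (const struct toy_token *first)
{
  assert (first != NULL);
  switch (first->type)
    {
    case TOY_KEYWORD:
      /* A keyword may still select a special rule.  */
      if (strcmp (first->text, "asm") == 0)
        return "asm-definition";
      return "declaration-or-fndef";
    case TOY_SEMICOLON:
      return "stray-semicolon";
    case TOY_PRAGMA:
      return "pragma";
    case TOY_EOF:
      return "end-of-input";
    default:
      /* Everything else is treated as the start of a declaration
         or function definition.  */
      return "declaration-or-fndef";
    }
}
```

Passing a token such as { TOY_KEYWORD, "int" } selects the "declaration-or-fndef" label, which corresponds to the path the rest of this article follows.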
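This function (c_parser_declaration_or_fndef) first collects the information that makes up a declaration or function definition and then combines it into a single declaration. The main pieces are the declaration specifiers (c_declspecs), the declarator (c_declarator), the initializer, the parameter list, and so on. It starts by calling build_null_declspecs to create an empty declaration specifiers structure and then performs a series of parsing and processing steps. Overall, a complete declaration consists of two parts, start_decl and finish_decl. One very important structure here is c_declspecs, which holds the declaration information during parsing.
The code of build_null_declspecs is as follows:
The c_declspecs struct is defined as follows: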
/* A sequence of declaration specifiers in C. When a new declaration
specifier is added, please update the enum c_declspec_word above
accordingly. */
struct c_declspecs {
source_location locations[cdw_number_of_elements];
/* The type specified, if a single type specifier such as a struct,
union or enum specifier, typedef name or typeof specifies the
whole type, or NULL_TREE if none or a keyword such as "void" or
"char" is used. Does not include qualifiers. */
tree type;
/* Any expression to be evaluated before the type, from a typeof
specifier. */
tree expr;
/* The attributes from a typedef decl. */
tree decl_attr;
/* When parsing, the attributes. Outside the parser, this will be
NULL; attributes (possibly from multiple lists) will be passed
separately. */
tree attrs;
/* The pass to start compiling a __GIMPLE or __RTL function with. */
char *gimple_or_rtl_pass;
/* The base-2 log of the greatest alignment required by an _Alignas
specifier, in bytes, or -1 if no such specifiers with nonzero
alignment. */
int align_log;
/* For the __intN declspec, this stores the index into the int_n_* arrays. */
int int_n_idx;
/* For the _FloatN and _FloatNx declspec, this stores the index into
the floatn_nx_types array. */
int floatn_nx_idx;
/* The storage class specifier, or csc_none if none. */
enum c_storage_class storage_class;
/* Any type specifier keyword used such as "int", not reflecting
modifiers such as "short", or cts_none if none. */
ENUM_BITFIELD (c_typespec_keyword) typespec_word : 8;
/* The kind of type specifier if one has been seen, ctsk_none
otherwise. */
ENUM_BITFIELD (c_typespec_kind) typespec_kind : 3;
/* Whether any expressions in typeof specifiers may appear in
constant expressions. */
BOOL_BITFIELD expr_const_operands : 1;
/* Whether any declaration specifiers have been seen at all. */
BOOL_BITFIELD declspecs_seen_p : 1;
/* Whether something other than a storage class specifier or
attribute has been seen. This is used to warn for the
obsolescent usage of storage class specifiers other than at the
start of the list. (Doing this properly would require function
specifiers to be handled separately from storage class
specifiers.) */
BOOL_BITFIELD non_sc_seen_p : 1;
/* Whether the type is specified by a typedef or typeof name. */
BOOL_BITFIELD typedef_p : 1;
/* Whether the type is explicitly "signed" or specified by a typedef
whose type is explicitly "signed". */
BOOL_BITFIELD explicit_signed_p : 1;
/* Whether the specifiers include a deprecated typedef. */
BOOL_BITFIELD deprecated_p : 1;
/* Whether the type defaulted to "int" because there were no type
specifiers. */
BOOL_BITFIELD default_int_p : 1;
/* Whether "long" was specified. */
BOOL_BITFIELD long_p : 1;
/* Whether "long" was specified more than once. */
BOOL_BITFIELD long_long_p : 1;
/* Whether "short" was specified. */
BOOL_BITFIELD short_p : 1;
/* Whether "signed" was specified. */
BOOL_BITFIELD signed_p : 1;
/* Whether "unsigned" was specified. */
BOOL_BITFIELD unsigned_p : 1;
/* Whether "complex" was specified. */
BOOL_BITFIELD complex_p : 1;
/* Whether "inline" was specified. */
BOOL_BITFIELD inline_p : 1;
/* Whether "_Noreturn" was speciied. */
BOOL_BITFIELD noreturn_p : 1;
/* Whether "__thread" or "_Thread_local" was specified. */
BOOL_BITFIELD thread_p : 1;
/* Whether "__thread" rather than "_Thread_local" was specified. */
BOOL_BITFIELD thread_gnu_p : 1;
/* Whether "const" was specified. */
BOOL_BITFIELD const_p : 1;
/* Whether "volatile" was specified. */
BOOL_BITFIELD volatile_p : 1;
/* Whether "restrict" was specified. */
BOOL_BITFIELD restrict_p : 1;
/* Whether "_Atomic" was specified. */
BOOL_BITFIELD atomic_p : 1;
/* Whether "_Sat" was specified. */
BOOL_BITFIELD saturating_p : 1;
/* Whether any alignment specifier (even with zero alignment) was
specified. */
BOOL_BITFIELD alignas_p : 1;
/* Whether any __GIMPLE specifier was specified. */
BOOL_BITFIELD gimple_p : 1;
/* Whether any __RTL specifier was specified. */
BOOL_BITFIELD rtl_p : 1;
/* The address space that the declaration belongs to. */
addr_space_t address_space;
};
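c_parser_declaration_or_fndef also relies on two other important functions: c_parser_declspecs and finish_declspecs. Part of the code is as follows:
Types are parsed in c_parser_declspecs. Whether a type is built in or user defined, c_parser_declspecs is called recursively during parsing until the last semicolon-terminated line has been parsed.
This function loops over the specifiers according to the grammar productions, including the error and warning checks that precede the actual parsing. Once all checks pass, it reaches the point where a specifier is really parsed: it obtains the parser->tokens[0] information through c_parser_peek_token (parser) and uses the token's keyword to decide which switch case to take. After a specifier has been parsed, the information extracted from it is stored in the declaration specifiers structure c_declspecs.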
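The loop described above can be approximated by the following stand-alone sketch. It is an illustration under simplifying assumptions, not GCC code: toy_declspecs and toy_parse_declspecs are hypothetical names, the "tokens" are plain strings, and only a handful of specifiers are recognized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct c_declspecs.  */
struct toy_declspecs {
  const char *typespec_word;  /* "int", "char", or NULL if none seen */
  bool unsigned_p;
  bool long_p;
  bool const_p;
  bool declspecs_seen_p;
};

/* Consume specifier words from WORDS[0..N-1] until a word that is not
   a specifier is met, recording each one in SPECS.  Returns the number
   of words consumed.  */
static int
toy_parse_declspecs (const char *const *words, int n,
                     struct toy_declspecs *specs)
{
  int i;
  assert (words != NULL && specs != NULL);
  *specs = (struct toy_declspecs) { 0 };
  for (i = 0; i < n; i++)
    {
      const char *w = words[i];
      if (strcmp (w, "unsigned") == 0)
        specs->unsigned_p = true;
      else if (strcmp (w, "long") == 0)
        specs->long_p = true;
      else if (strcmp (w, "const") == 0)
        specs->const_p = true;
      else if (strcmp (w, "int") == 0 || strcmp (w, "char") == 0)
        specs->typespec_word = w;
      else
        break;  /* not a specifier: the declarator starts here */
      specs->declspecs_seen_p = true;
    }
  return i;
}
```

Given the words const, unsigned, long, int, x, the loop consumes the first four and stops at x, leaving the flags set for a later step to turn into a concrete type.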
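For example, the keyword check for structures and unions is:
The code above involves two important functions: c_parser_struct_or_union_specifier and declspecs_add_type. c_parser_struct_or_union_specifier parses a struct or union specifier, and declspecs_add_type adds the parsed type specifier to the declaration specifiers structure.
When a struct or union specifier is parsed, the following functions are involved: start_struct (begins the struct/union definition, opens the scope of the tag before the members are parsed, and prepares the related data structures), c_parser_struct_declaration (parses one semicolon-terminated unit and returns a list of declarations), chainon (links the list returned by c_parser_struct_declaration with the declarations collected so far), finish_struct (completes the struct definition, which includes laying out the struct, assigning space to each member, and computing the alignment, machine mode, and size), and parser_xref_tag (called when the tag is referenced without a body, for example when a variable of the struct type is declared).
Part of the code is shown below: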
struct c_typespec ret;
tree attrs;
tree ident = NULL_TREE;
location_t struct_loc;
location_t ident_loc = UNKNOWN_LOCATION;
enum tree_code code;
switch (c_parser_peek_token (parser)->keyword)
{
case RID_STRUCT:
code = RECORD_TYPE;
break;
case RID_UNION:
code = UNION_TYPE;
break;
default:
gcc_unreachable ();
}
//...
/* Parse a struct or union definition. Start the scope of the
tag before parsing components. */
struct c_struct_parse_info *struct_info;
tree type = start_struct (struct_loc, code, ident, &struct_info);
//...
/* Parse some comma-separated declarations, but not the
trailing semicolon if any. */
decls = c_parser_struct_declaration (parser);
contents = chainon (decls, contents);
//...
ret.spec = finish_struct (struct_loc, type, nreverse (contents),
chainon (attrs, postfix_attrs), struct_info);
ret.kind = ctsk_tagdef;
ret.expr = NULL_TREE;
ret.expr_const_operands = true;
timevar_pop (TV_PARSE_STRUCT);
return ret;
}
else if (!ident)
{
c_parser_error (parser, "expected %<{%>");
ret.spec = error_mark_node;
ret.kind = ctsk_tagref;
ret.expr = NULL_TREE;
ret.expr_const_operands = true;
return ret;
}
ret = parser_xref_tag (ident_loc, code, ident);
return ret;
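The excerpt builds the member list with contents = chainon (decls, contents) and only calls nreverse (contents) at the end. The effect of that pattern can be seen in a small model. The code below is a sketch with hypothetical names (toy_field, toy_chainon, toy_nreverse) that mimics the list behaviour on a plain singly linked list; it does not use GCC's tree type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a chained member declaration.  */
struct toy_field {
  const char *name;
  struct toy_field *chain;
};

/* Append OP2 to the end of OP1 and return the head of the result.  */
static struct toy_field *
toy_chainon (struct toy_field *op1, struct toy_field *op2)
{
  struct toy_field *t;
  if (op1 == NULL)
    return op2;
  if (op2 == NULL)
    return op1;
  assert (op1 != op2);  /* chaining a list onto itself would loop */
  for (t = op1; t->chain != NULL; t = t->chain)
    continue;
  t->chain = op2;
  return op1;
}

/* Reverse a chain in place and return its new head.  */
static struct toy_field *
toy_nreverse (struct toy_field *t)
{
  struct toy_field *prev = NULL;
  while (t != NULL)
    {
      struct toy_field *next = t->chain;
      t->chain = prev;
      prev = t;
      t = next;
    }
  return prev;
}
```

Because each new declaration is placed in front of the members collected so far, the list is held in reverse source order while parsing, and a single reversal at the end restores the declared order. Prepending is cheap, which is presumably why the list is built this way.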
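The function c_parser_struct_declaration handles struct member declarations one semicolon-terminated unit at a time. In other words, the declarators in one unit share a single set of declaration specifiers, while each declarator and bit-field has to be processed separately. Its comment reads:
The main functions it calls are c_parser_declspecs, finish_declspecs, and grokfield. Part of the code is as follows:
Once all the specifier information for a declaration has been gathered, finish_declspecs is called to produce the corresponding type node from the boolean flags in c_declspecs. For example, for unsigned int a; GCC collects the unsigned int information and then calls finish_declspecs, which sets specs->type to unsigned_type_node. In short, finish_declspecs determines the type of a declaration. Only after this function has supplied the type node can the subsequent grokfield call correctly combine everything into a real GCC declaration.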
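The mapping from flags to a type node can be illustrated with a reduced model. Everything in the sketch below is hypothetical: the toy_ enumerators merely borrow the spelling of GCC's type node names, and the function covers only char, int, and the long and unsigned modifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type specifier keywords and resolved type nodes.  */
enum toy_typespec_word { toy_cts_none, toy_cts_char, toy_cts_int };

enum toy_type_node {
  toy_no_type,
  toy_char_type_node,
  toy_unsigned_char_type_node,
  toy_integer_type_node,
  toy_unsigned_type_node,
  toy_long_integer_type_node,
  toy_long_unsigned_type_node
};

struct toy_declspecs {
  enum toy_typespec_word typespec_word;
  bool unsigned_p;
  bool long_p;
  enum toy_type_node type;  /* toy_no_type until finished */
};

/* Turn the collected flags into one concrete type, in place.  */
static struct toy_declspecs *
toy_finish_declspecs (struct toy_declspecs *specs)
{
  assert (specs != NULL);
  /* With no type keyword at all ("unsigned x;"), fall back to int.  */
  if (specs->typespec_word == toy_cts_none)
    specs->typespec_word = toy_cts_int;

  switch (specs->typespec_word)
    {
    case toy_cts_char:
      specs->type = specs->unsigned_p
                    ? toy_unsigned_char_type_node : toy_char_type_node;
      break;
    case toy_cts_int:
      if (specs->long_p)
        specs->type = specs->unsigned_p
                      ? toy_long_unsigned_type_node
                      : toy_long_integer_type_node;
      else
        specs->type = specs->unsigned_p
                      ? toy_unsigned_type_node : toy_integer_type_node;
      break;
    default:
      specs->type = toy_no_type;
      break;
    }
  return specs;
}
```

With typespec_word set to toy_cts_int and unsigned_p set, the model resolves to toy_unsigned_type_node, mirroring the unsigned int a; example in the text.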
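Part of the code is shown below:
grokfield takes the semicolon-terminated member declarations processed by c_parser_struct_declaration and combines the separately handled declarators and bit-fields into a real GCC declaration. At this point we have the declaration specifiers and the declarator (possibly with a bit width), so there is enough information to build a struct member. Its code is as follows:
The main steps of this function are: if the declarator is an ordinary identifier, enter the corresponding conditional block; combine the pieces into a FIELD_DECL through grokdeclarator; and finish the declaration of the member.
Consider the following example struct: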
typedef struct _PixelPacket{
char rt, gt, bt, ot;
}PixelPacket;
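The struct declaration is handled in c_parser_declaration_or_fndef, while the struct definition, being a type, is parsed in c_parser_declspecs; for a struct definition, control goes straight to c_parser_declaration_or_fndef. c_parser_declspecs parses the struct definition and stores the resulting type in the declaration specifiers structure c_declspecs. finish_declspecs then resolves the contents of that structure; for example, the basic type int is recorded in c_declspecs as cts_int, and finish_declspecs finds the corresponding tree type node for it.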
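The result of the layout step for this example can be observed from ordinary C with sizeof and offsetof. The helper functions below are hypothetical and exist only for illustration; the exact numbers depend on the target ABI, so the sketch checks relationships rather than fixed values.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _PixelPacket {
  char rt, gt, bt, ot;
} PixelPacket;

/* Number of bytes the compiler laid out for the whole struct.  */
static size_t
pixel_packet_size (void)
{
  return sizeof (PixelPacket);
}

/* Byte offset of the last member within the struct.  */
static size_t
pixel_packet_ot_offset (void)
{
  /* Every member must lie inside the struct.  */
  assert (offsetof (PixelPacket, ot) < sizeof (PixelPacket));
  return offsetof (PixelPacket, ot);
}
```

On common targets the four char members are packed back to back, so the size is typically 4 and the offset of ot is typically 3, but the language only guarantees that the first member sits at offset 0 and that later members have increasing offsets.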
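Inside c_parser_struct_or_union_specifier, once c_parser_struct_declaration has returned, the main type parsing is almost complete. Inspecting the resulting tree decls in gdb at this point shows that it is already a FIELD_DECL, as follows:
Next we look further into decls->decl_common.common.common.typed.type. That memory cannot be accessed yet at this point, so only after stepping forward in gdb several more times can some of its information be found, as follows:
From here the field type information of the structure can be obtained through decls. Likewise, the field declaration information can be obtained by reading decls->decl_minimal.name->identifier.id.str while walking the chained list of fields.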
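Walking such a chain to collect the member names looks roughly like the following. This is a model only: toy_field_decl and toy_collect_field_names are hypothetical, and the three struct members stand in for the name, type, and chain links that a real FIELD_DECL exposes through GCC's tree accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a FIELD_DECL.  */
struct toy_field_decl {
  const char *name;              /* the identifier string */
  const char *type;              /* printable name of the field type */
  struct toy_field_decl *chain;  /* next field, or NULL */
};

/* Store up to MAX field names from the chain DECLS into OUT and
   return the total number of fields in the chain.  */
static size_t
toy_collect_field_names (const struct toy_field_decl *decls,
                         const char **out, size_t max)
{
  size_t n = 0;
  const struct toy_field_decl *d;
  assert (out != NULL || max == 0);
  for (d = decls; d != NULL; d = d->chain)
    {
      if (n < max)
        out[n] = d->name;
      n++;
    }
  return n;
}
```

For the PixelPacket example, a chain built from rt, gt, bt, and ot would yield four names in declaration order.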