coco/r笔记-对unicode支持及EBNF表达式

1、支持unicode

2、支持utf-8

3、例子如下:

VarDeclaration<ref int adr> (. string name; TypeDesc type; .)
= Ident<out name> (. Obj x = symTab.Enter(name);
int n = 1; .)
{ ',' Ident<out name> (. Obj y = symTab.Enter(name);
x.next = y; x = y;
n++; .)
}
':' Type<out type> (. adr += n * typ.size;
for (int a = adr; x != null; x = x.next) {
a -= type.size;
x.adr = a;
} .)
';' .
The core of this specification is the EBNF production
VarDeclaration = Ident {',' Ident} ':' Type ';'.

void VarDeclaration(ref int adr) {
string name; TypeDesc type;
Ident(out name);
Obj x = symTab.Enter(name);
int n = 1;
while (la.kind == comma) {
Get();
Ident(out name);
Obj y = symTab.Enter(name);
x.next = y; x = y;
n++;
}
Expect(colon);
Type(out type);
adr += n * type.size;
for (int a = adr; x != null; x = x.next) {
a -= type.size;
x.adr = a;
}
Expect(semicolon);
}

4、表达式规则如下:

ident = letter {letter | digit}.
number = digit {digit}.
string = '"' {anyButQuote} '"'.
char = '\'' anyButApostrophe '\''.

\\ backslash \r carriage return \f form feed
\' apostrophe \n new line \a bell
\" quote \t horizontal tab \b backspace
\0 null character \v vertical tab \uxxxx hex char value
The following identifiers are reserved keywords (in the C# version of Cocol/R the
identifier using is also a keyword, in the Java version the identifier import):
ANY CONTEXT IGNORE PRAGMAS TOKENS
CHARACTERS END IGNORECASE PRODUCTIONS WEAK
COMMENTS FROM NESTED SYNC
COMPILER IF out TO
Comments are enclosed in /* and */ and may be nested. Alternatively they can start
with // and go to the end of the line.
EBNF
All syntax descriptions in Cocol/R are written in Extended Backus-Naur Form
(EBNF) [Wirth77]. By convention, identifiers starting with a lower case letter denote
terminal symbols, identifiers starting with an upper case letter denote nonterminal
symbols. Strings denote themselves. The following meta-characters are used:
symbol meaning example
= separates the sides of a production A = a b c .
. terminates a production A = a b c .
| separates alternatives a b | c | d e means a b or c or d e
( ) groups alternatives (a | b) c means a c or b c
[ ] option [a] b means a b or b
{ } iteration (0 or more times) {a} b means b or a b or a a b or ...
Attributes are written between < and >. Semantic actions are enclosed in (. and .).
The operators + and - are used to form character sets.
2.2 Overall Structure
A Cocol/R compiler description has the following structure:
Cocol =
[Imports]
"COMPILER" ident
[GlobalFieldsAndMethods]
ScannerSpecification
ParserSpecification
"END" ident '.'
.

你可能感兴趣的:(C++,c,C#,F#,Go)