Linux flex manual

Synopsis-------------------------------------------------------------------
flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
[--help --version] [filename...]
Overview-------------------------------------------------------------------



///////////////////////////////////////////////////////////////////////////////////////////////////////
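Description

flex is a tool for generating scanners: programs which recognize lexical patterns in text.
flex reads the given input files, or its standard input if no file names are given, for a
description of a scanner to generate. The description is in the form of pairs of regular
expressions and C code, called rules. flex generates as output a C source file, 'lex.yy.c',
which defines a routine 'yylex()'. This file is compiled and linked with the '-lfl' library
to produce an executable. When the executable is run, it analyzes its input for occurrences
of the regular expressions. Whenever it finds one, it executes the corresponding C code.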


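Patterns

`x'     match the character `x'

`.'     any character (byte) except newline (`\n')

`[xyz]'         a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'

`[abj-oZ]'      a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'

`[^A-Z]'        a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter.

`[^A-Z\n]'      any character EXCEPT an uppercase letter or a newline

`r*'    zero or more r's, where r is any regular expression

`r+'    one or more r's

`r?'    zero or one r's (that is, "an optional r")

`r{2,5}'        anywhere from two to five r's

`r{2,}'         two or more r's

`r{4}'          exactly 4 r's

`{name}'        the expansion of the "name" definition (see above)

`"[xyz]\"foo"'  the literal string: `[xyz]"foo'

`\x'    if x is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C interpretation of \x. Otherwise,
        a literal `x' (used to escape operators such as `*')

`\0'    a NUL character (ASCII code 0)

`\123'          the character with octal value 123

`\x2a'          the character with hexadecimal value 2a

`(r)'           match an r; parentheses are used to override precedence (see below)

`rs'            the regular expression r followed by the regular expression s; called "concatenation"

`r|s'           either an r or an s

`r/s'   an r but only if it is followed by an s. The text matched by s is included when determining whether
        this rule is the longest match, but is then returned to the input before the action is executed.
        So the action only sees the text matched by r. This type of pattern is called trailing context.
        (There are some combinations of `r/s' that flex cannot match correctly; see the notes in the
        Deficiencies/Bugs section below regarding "dangerous trailing context".)

`^r'    an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned).

`r$'    an r, but only at the end of a line (i.e., just before a newline). Equivalent to "r/\n". Note that
        flex's notion of "newline" is exactly whatever the C compiler used to compile flex interprets '\n' as;
        in particular, on some DOS systems you must either filter out \r's in the input yourself,
        or explicitly use r/\r\n for "r$".

`<s>r'          an r, but only in start condition s (see below for discussion of start conditions).
                `<s1,s2,s3>r' is the same, but in any of start conditions s1, s2, or s3.

`<*>r'          an r in any start condition, even an exclusive one.

`<<EOF>>'       an end-of-file

`<s1,s2><<EOF>>'        an end-of-file when in start condition s1 or s2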



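Note that inside of a character class, all regular expression operators lose their special meaning
except escape ('\') and the character class operators '-', ']', and, at the beginning of the class, '^'.

The regular expressions listed above are grouped according to precedence, from highest precedence at
the top to lowest at the bottom. Those grouped together have equal precedence. For example,

 foo|bar*

 is the same as

 (foo)|(ba(r*))

 since the '*' operator has higher precedence than concatenation, and concatenation higher than
 alternation ('|'). This pattern therefore matches either the string "foo" or the string "ba" followed
 by zero or more r's. To match "foo" or zero or more "bar"'s, use:

 foo|(bar)*

 In addition to characters and ranges of characters, character classes can also contain character class
 expressions. These are expressions enclosed inside `[:' and `:]' delimiters (which themselves must appear
 between the '[' and ']' of the character class). The valid expressions are: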

[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]

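These expressions all designate a set of characters equivalent to the corresponding standard C 'isXXX'
function. For example, '[:alnum:]' designates those characters for which 'isalnum()' returns true, i.e.,
any alphabetic or numeric character. Some systems don't provide 'isblank()', so flex defines '[:blank:]'
as a blank or a tab.

For example, the following character classes are all equivalent:

        [[:alnum:]]
        [[:alpha:][:digit:]]
        [[:alpha:]0-9]
        [a-zA-Z0-9]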
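
As a quick check of the equivalences listed above, here is a minimal C sketch. It assumes an ASCII character set and the default "C" locale. The helper names in_literal_alnum, flex_isblank and classes_agree_on_ascii are made up for illustration; only isalnum() comes from the standard C library.

```c
#include <ctype.h>

/* Hypothetical helper: membership test for the literal class [a-zA-Z0-9]. */
static int in_literal_alnum(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical helper: [:blank:] as the text describes it (blank or tab). */
static int flex_isblank(int c)
{
    return c == ' ' || c == '\t';
}

/* Returns 1 if isalnum() and [a-zA-Z0-9] agree on every 7-bit ASCII code. */
static int classes_agree_on_ascii(void)
{
    int c;

    for (c = 0; c < 128; ++c) {
        if ((isalnum(c) != 0) != in_literal_alnum(c))
            return 0;
    }
    return 1;
}
```

In other locales isalnum() may accept additional characters, so the last equivalence should be read as holding for plain ASCII input.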

If your scanner is case-insensitive (the '-i' flag), then '[:upper:]' and '[:lower:]' are equivalent to '[:alpha:]'.

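Some notes on patterns:

    1. A negated character class such as the example "[^A-Z]" above will match a newline unless "\n" (or an
    equivalent escape sequence) is one of the characters explicitly present in the negated character class
    (e.g., "[^A-Z\n]"). This is unlike how many other regular expression tools treat negated character
    classes, but unfortunately the inconsistency is historically entrenched. Matching newlines means that
    a pattern like [^"]* can match the entire input unless there is another quote in the input.
    2. A rule can have at most one instance of trailing context (the '/' operator or the '$' operator).
    The start condition, '^', and "<<EOF>>" patterns can only occur at the beginning of a pattern and,
    like '/' and '$', cannot be grouped inside parentheses. A '^' which does not occur at the beginning
    of a rule or a '$' which does not occur at the end of a rule loses its special properties and is
    treated as a normal character.
    The following are illegal:

        foo/bar$
        <sc1>foo<sc2>bar


    Note that the first of these can be written "foo/bar\n".
    The following will result in '$' or '^' being treated as a normal character:

        foo|(bar$)
        foo|^bar

    If what is wanted is a "foo" or a bar-followed-by-a-newline, the following could be used (the special '|' action is explained below):

        foo      |
        bar$     /* action goes here */

     A similar trick will work for matching a foo or a bar-at-the-beginning-of-a-line.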

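How the input is matched

    When the generated scanner is run, it analyzes its input looking for strings which match any of its
    patterns. If it finds more than one match, it takes the one matching the most text (for trailing
    context rules, this includes the length of the trailing part, even though it will then be returned
    to the input). If it finds two or more matches of the same length, the rule listed first in the flex
    input file is chosen.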
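
The selection policy just described (longest match wins, and the earliest rule wins a tie) can be sketched in C as follows. This is only an illustration under simplifying assumptions: the "rules" are literal strings rather than regular expressions, and pick_rule is a hypothetical helper, not something flex generates.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: choose among literal "rules" the way the text
 * describes. Returns the index of the winning rule, or -1 if no rule
 * matches at the start of input. *match_len receives the matched length.
 */
static int pick_rule(const char *input, const char *const *rules,
                     size_t nrules, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < nrules; ++i) {
        size_t len = strlen(rules[i]);

        if (len == 0 || strncmp(input, rules[i], len) != 0)
            continue;               /* this rule does not match here */
        /* Strictly longer wins; on equal length the earlier rule stays. */
        if (best < 0 || len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```

With the rules "ab", "abc", "abc", "b" and the input "abcd", the sketch picks index 1: both "abc" rules match three characters, and the one listed first is kept.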
   
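    Once the match is determined, the text corresponding to the match (called the token) is made available
    in the global character pointer yytext, and its length in the global integer yyleng. The action
    corresponding to the matched pattern is then executed (a more detailed description of actions follows),
    and then the remaining input is scanned for another match.

    If no match is found, then the default rule is executed: the next character in the input is considered
    matched and copied to the standard output. Thus, the simplest legal flex input is:

        %%

    which generates a scanner that simply copies its input (one character at a time) to its output.

    Note that yytext can be defined in two different ways: either as a character pointer or as a character
    array. You can control which definition flex uses by including one of the special directives '%pointer'
    or '%array' in the first (definitions) section of your flex input. The default is '%pointer', unless you
    use the '-l' lex compatibility option, in which case yytext will be an array. The advantage of using
    '%pointer' is substantially faster scanning and no buffer overflow when matching very large tokens
    (unless you run out of dynamic memory). The disadvantage is that you are restricted in how your actions
    can modify yytext (see the next section), and calls to the 'unput()' function destroy the present
    contents of yytext, which can be a considerable porting headache when moving between different lex versions.

    The advantage of '%array' is that you can then modify yytext to your heart's content, and calls to
    'unput()' do not destroy yytext (see below). Furthermore, existing lex programs sometimes access yytext
    externally using declarations of the form:

        extern char yytext[];

    This definition is erroneous when used with '%pointer', but correct for '%array'.

    '%array' defines yytext to be an array of YYLMAX characters, which defaults to a fairly large value.
    You can change the size by simply #define'ing YYLMAX to a different value in the first section of your
    flex input. As mentioned above, with '%pointer' yytext grows dynamically to accommodate large tokens.
    While this means your '%pointer' scanner can accommodate very large tokens (such as matching entire
    blocks of comments), bear in mind that each time the scanner must resize yytext it also must rescan
    the entire token from the beginning, so matching such tokens can prove slow. yytext presently does not
    dynamically grow if a call to 'unput()' results in too much text being pushed back; instead, a run-time
    error results.

    Also note that you cannot use '%array' with C++ scanner classes (the c++ option; see below).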

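 Actions

 Each pattern in a rule has a corresponding action, which can be any arbitrary C statement. The pattern
 ends at the first non-escaped whitespace character; the remainder of the line is its action. If the
 action is empty, then when the pattern is matched the input token is simply discarded. For example,
 here is the specification for a program which deletes all occurrences of "zap me" from its input:

        %%
        "zap me"

 (It will copy all other characters in the input to the output since they will be matched by the default rule.)

 Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away
 whitespace found at the end of a line:

        %%
        [ \t]+        putchar( ' ' );
        [ \t]+$       /* ignore this token */

 If the action contains a '{', then the action spans till the balancing '}' is found, and the action may
 cross multiple lines. flex knows about C strings and comments and won't be fooled by braces found within
 them, but also allows actions to begin with '%{' and will consider the action to be all the text up to
 the next '%}' (regardless of ordinary braces inside the action).

 An action consisting solely of a vertical bar ('|') means "same as the action for the next rule".
 See below for an illustration.

 Actions can include arbitrary C code, including return statements to return a value to whatever routine
 called 'yylex()'. Each time 'yylex()' is called it continues processing tokens from where it last left
 off until it either reaches the end of the file or executes a return.

 Actions are free to modify yytext except for lengthening it (adding characters to its end; these would
 overwrite later characters in the input stream). This does not apply when using '%array' (see above);
 in that case, yytext may be freely modified in any way.

 Actions are free to modify yyleng except they should not do so if the action also includes use of
 'yymore()' (see below).

 There are a number of special directives which can be included within an action:

        - 'ECHO' copies yytext to the scanner's output.
        - BEGIN followed by the name of a start condition places the scanner in the corresponding start
        condition (see below).
        - REJECT directs the scanner to proceed on to the "second best" rule which matched the input (or a
        prefix of the input). The rule is chosen as described above in "How the Input is Matched", and
        yytext and yyleng are set up appropriately. It may either be one which matched as much text as the
        originally chosen rule but came later in the flex input file, or one which matched less text.
        For example, the following will both count the words in the input and call the routine special()
        whenever "frob" is seen: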
       
                int word_count = 0;
        %%

        frob        special(); REJECT;
        [^ \t\n]+   ++word_count;
       
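        Without the REJECT, any "frob"'s in the input would not be counted as words, since the scanner
        normally executes only one action per token. Multiple REJECT's are allowed, each one finding the
        next best choice to the currently active rule. For example, when the following scanner scans the
        token "abcd", it will write "abcdabcaba" to the output:

        %%
        a        |
        ab       |
        abc      |
        abcd     ECHO; REJECT;
        .|\n     /* eat up any unmatched character */

        (The first three rules share the fourth's action since they use the special '|' action.) REJECT is
        a particularly expensive feature in terms of scanner performance; if it is used in any of the
        scanner's actions it will slow down all of the scanner's matching. Furthermore, REJECT cannot be
        used with the '-Cf' or '-CF' options (see below).
        Note also that unlike the other special actions, REJECT is a branch; code immediately following it
        in the action will not be executed.
        - 'yymore()' tells the scanner that the next time it matches a rule, the corresponding token should
        be appended onto the current value of yytext rather than replacing it.
        For example, given the input "mega-kludge" the following
        will write "mega-mega-kludge" to the output: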

        %%
        mega-    ECHO; yymore();
        kludge   ECHO;
       
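        First "mega-" is matched and echoed to the output. Then "kludge" is matched, but the previous
        "mega-" is still hanging around at the beginning of yytext, so the 'ECHO' for the "kludge" rule
        will actually write "mega-kludge".

        Two notes regarding use of 'yymore()'. First, 'yymore()' depends on the value of yyleng correctly
        reflecting the size of the current token, so you must not modify yyleng if you are using
        'yymore()'. Second, the presence of 'yymore()' in the scanner's action entails a minor performance
        penalty in the scanner's matching speed.

        - 'yyless(n)' returns all but the first n characters of the current token back to the input stream,
        where they will be rescanned when the scanner looks for the next match. yytext and yyleng are
        adjusted appropriately (e.g., yyleng will now be equal to n).
        For example, on the input "foobar" the following will write out "foobarbar":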
       
        %%
        foobar    ECHO; yyless(3);
        [a-z]+    ECHO;

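        An argument of 0 to yyless will cause the entire current input string to be scanned again. Unless
        you've changed how the scanner will subsequently process its input (using BEGIN, for example),
        this will result in an endless loop. Note that yyless is a macro and can only be used in the flex
        input file, not from other source files.
        - 'unput(c)' puts the character c back onto the input stream. It will be the next character scanned.
        The following action will take the current token and cause it to be rescanned enclosed in parentheses.

        {
        int i;
        /* Copy yytext because unput() trashes yytext */
        char *yycopy = strdup( yytext );
        unput( ')' );
        for ( i = yyleng - 1; i >= 0; --i )
                unput( yycopy[i] );
        unput( '(' );
        free( yycopy );
        }

        Note that since each 'unput()' puts the given character back at the beginning of the input stream,
        pushing back strings must be done back-to-front. An important potential problem when using
        'unput()' is that if you are using '%pointer' (the default), a call to 'unput()' destroys the
        contents of yytext, starting with its rightmost character and devouring one character to the left
        with each call. If you need the value of yytext preserved after a call to 'unput()' (as in the
        above example), you must either first copy it elsewhere, or build your scanner using '%array'
        instead (see How The Input Is Matched).
        Finally, note that you cannot put back EOF to attempt to mark the input stream with an end-of-file.
        - 'input()' reads the next character from the input stream. For example, the following is one way
        to eat up C comments: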

        %%
        "/*"        {
                    register int c;

                    for ( ; ; )
                        {
                        while ( (c = input()) != '*' &&
                                c != EOF )
                            ;    /* eat up text of comment */

                        if ( c == '*' )
                            {
                            while ( (c = input()) == '*' )
                                ;
                            if ( c == '/' )
                                break;    /* found the end */
                            }

                        if ( c == EOF )
                            {
                            error( "EOF in comment" );
                            break;
                            }
                        }
                    }
                 
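        (Note that if the scanner is compiled using 'C++', then 'input()' is instead referred to as
        'yyinput()', in order to avoid a name clash with the 'C++' stream by the name of input.)
        - YY_FLUSH_BUFFER flushes the scanner's internal buffer so that the next time the scanner attempts
        to match a token, it will first refill the buffer using YY_INPUT (see The Generated Scanner, below).
        This action is a special case of the more general 'yy_flush_buffer()' function, described below
        in the section Multiple Input Buffers.
        - 'yyterminate()' can be used in lieu of a return statement in an action. It terminates the scanner
        and returns a 0 to the scanner's caller, indicating "all done".
        By default, 'yyterminate()' is also called when an end-of-file is encountered.
        It is a macro and may be redefined.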
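
The back-to-front rule for 'unput()' noted in the list above can be illustrated outside a scanner. The following C sketch models the input stream as a small pushback stack; struct pushback, pb_unput, pb_input and pb_unput_parenthesized are hypothetical names for this illustration and are not part of flex.

```c
#include <stddef.h>
#include <string.h>

#define PUSHBACK_MAX 64

/* Hypothetical stand-in for the scanner's input: a small pushback stack. */
struct pushback {
    char data[PUSHBACK_MAX];
    size_t count;               /* number of characters pushed back */
};

/* Like unput(c) in spirit: c becomes the next character to be read. */
static int pb_unput(struct pushback *pb, char c)
{
    if (pb->count >= PUSHBACK_MAX)
        return -1;              /* too much text pushed back */
    pb->data[pb->count++] = c;
    return 0;
}

/* Reads the next character, or -1 when nothing is pending. */
static int pb_input(struct pushback *pb)
{
    if (pb->count == 0)
        return -1;
    return (unsigned char)pb->data[--pb->count];
}

/* Push back "(token)" so it reads front-to-back: work back-to-front. */
static int pb_unput_parenthesized(struct pushback *pb, const char *token)
{
    size_t i = strlen(token);

    if (pb_unput(pb, ')') != 0)
        return -1;
    while (i > 0) {
        if (pb_unput(pb, token[--i]) != 0)
            return -1;
    }
    return pb_unput(pb, '(');
}
```

Because the last character pushed is the first one read, the closing parenthesis goes in first and the opening one last, mirroring the order used in the 'unput()' example.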

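The generated scanner

The output of flex is the file 'lex.yy.c', which contains the scanning routine 'yylex()', a number of
tables used by it for matching tokens, and a number of auxiliary routines and macros.
By default, 'yylex()' is declared as follows: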

        int yylex()
        {
        ...various definitions and the actions in here ...
        }

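(If your environment supports function prototypes, then it will be "int yylex( void )".) This definition may be changed by defining the "YY_DECL" macro. For example, you could use: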

#define YY_DECL float lexscan( a, b ) float a, b;

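to give the scanning routine the name lexscan, returning a float, and taking two floats as arguments.
Note that if you give arguments to the scanning routine using a K&R-style/non-prototyped function
declaration, you must terminate the definition with a semi-colon (';'). Whenever 'yylex()' is called,
it scans tokens from the global input file yyin (which defaults to stdin). It continues until it either
reaches an end-of-file (at which point it returns the value 0) or one of its actions executes a return statement.

If the scanner reaches an end-of-file, subsequent calls are undefined unless either yyin is pointed at
a new input file (in which case scanning continues from that file), or 'yyrestart()' is called.
'yyrestart()' takes one argument, a 'FILE *' pointer (which can be nil, if you've set up YY_INPUT to scan
from a source other than yyin), and initializes yyin for scanning from that file. Essentially there is
no difference between just assigning yyin to a new input file or using 'yyrestart()' to do so; the latter
is available for compatibility with previous versions of flex, and because it can be used to switch
input files in the middle of scanning. It can also be used to throw away the current input buffer, by
calling it with an argument of yyin; but better is to use YY_FLUSH_BUFFER (see above).
Note that 'yyrestart()' does not reset the start condition to INITIAL (see Start Conditions, below).

If 'yylex()' stops scanning due to executing a return statement in one of the actions, the scanner may
then be called again and it will resume scanning where it left off.

By default (and for purposes of efficiency), the scanner uses block-reads rather than simple 'getc()'
calls to read characters from yyin. The nature of how it gets its input can be controlled by defining
the YY_INPUT macro. YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its action is to
place up to max_size characters in the character array buf and return in the integer variable result
either the number of characters read or the constant YY_NULL (0 on Unix systems) to indicate EOF.
The default YY_INPUT reads from the global file-pointer "yyin".

A sample definition of YY_INPUT (in the definitions section of the input file):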

%{
#define YY_INPUT(buf,result,max_size) \
    { \
    int c = getchar(); \
    result = (c == EOF) ? YY_NULL :(buf[0] = c, 1); \
    }
%}
This definition will change the input processing to occur one character at a time.
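
The same contract (fill buf with up to max_size characters and report how many were stored, or 0 at end of input) can also be met by a function that reads from memory. The C sketch below is an illustration only: struct mem_source and mem_read are hypothetical names, and the value 0 for end of input follows what the text says about YY_NULL on Unix systems.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source; not part of flex. */
struct mem_source {
    const char *text;   /* data to hand out */
    size_t len;         /* total number of bytes in text */
    size_t pos;         /* bytes already handed out */
};

/*
 * Follows the contract described for YY_INPUT: copy up to max_size
 * characters into buf and return how many were copied, or 0 once the
 * source is exhausted.
 */
static size_t mem_read(struct mem_source *src, char *buf, size_t max_size)
{
    size_t remaining = src->len - src->pos;
    size_t n = remaining < max_size ? remaining : max_size;

    if (n > 0) {
        memcpy(buf, src->text + src->pos, n);
        src->pos += n;
    }
    return n;
}
```

A YY_INPUT definition could then assign the return value of such a function to result, with a cast if the generated scanner declares result with a different integer type.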

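When the scanner receives an end-of-file indication from YY_INPUT, it then checks the 'yywrap()' function.
If 'yywrap()' returns false (zero), then it is assumed that the function has gone ahead and set up yyin
to point to another input file, and scanning continues.
If it returns true (non-zero), then the scanner terminates, returning 0 to its caller. Note that in
either case, the start condition remains unchanged; it does not revert to INITIAL.

If you do not supply your own version of 'yywrap()', then you must either use '%option noyywrap'
(in which case the scanner behaves as though 'yywrap()' returned 1), or you must link with '-lfl' to
obtain the default version of the routine, which always returns 1.

Three routines are available for scanning from in-memory buffers rather than files: 'yy_scan_string()',
'yy_scan_bytes()', and 'yy_scan_buffer()'. See the discussion of them below in the section
Multiple Input Buffers.

The scanner writes its 'ECHO' output to the yyout global (default, stdout), which may be redefined by
the user simply by assigning it to some other FILE pointer.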


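Start conditions

flex provides a mechanism for conditionally activating rules. Any rule whose pattern is prefixed with
"<sc>" will only be active when the scanner is in the start condition named "sc". For example,

<STRING>[^"]*        { /* eat up the string body ... */
            ...
            }

will be active only when the scanner is in the "STRING" start condition, and

<INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
            ...
            }

will be active only when the current start condition is either "INITIAL", "STRING", or "QUOTE".

Start conditions are declared in the definitions (first) section of the input using unindented lines
beginning with either '%s' or '%x' followed by a list of names. The former declares inclusive start
conditions, the latter exclusive start conditions. A start condition is activated using the BEGIN action.
Until the next BEGIN action is executed, rules with the given start condition will be active and rules
with other start conditions will be inactive. If the start condition is inclusive, then rules with no
start conditions at all will also be active. If it is exclusive, then only rules qualified with the
start condition will be active. A set of rules contingent on the same exclusive start condition describes
a scanner which is independent of any of the other rules in the flex input. Because of this, exclusive
start conditions make it easy to specify "mini-scanners" which scan portions of the input that are
syntactically different from the rest (e.g., comments).


If the distinction between inclusive and exclusive start conditions is still a little vague, here is a
simple example illustrating the connection between the two. The set of rules: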

%s example
%%

<example>foo  do_something();

bar            something_else();


 is equivalent to

%x example
%%

<example>foo  do_something();

<INITIAL,example>bar   something_else();

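Without the '<INITIAL,example>' qualifier, the 'bar' pattern in the second example wouldn't be active
(i.e., couldn't match) when in start condition 'example'. If we just used '<example>' to qualify 'bar',
though, then it would only be active in 'example' and not in INITIAL, while in the first example it is
active in both, because there the 'example' start condition is an inclusive ('%s') start condition.

Also note that the special start-condition specifier '<*>' matches every start condition. Thus, the above example could also have been written: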

%x example
%%

<example>foo  do_something();

<*>bar    something_else();


The default rule (to `ECHO' any unmatched character) remains active in start conditions. It is equivalent to:

<*>.|\n     ECHO;






`BEGIN(0)' returns to the original state where only the rules with no start conditions are active.
This state can also be referred to as the start-condition "INITIAL", so `BEGIN(INITIAL)' is equivalent to `BEGIN(0)'.
(The parentheses around the start condition name are not required but are considered good style.)

BEGIN actions can also be given as indented code at the beginning of the rules section. For example,
the following will cause the scanner to enter the "SPECIAL" start condition whenever `yylex()' is called
and the global variable enter_special is true:

        int enter_special;

%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);

<SPECIAL>blahblahblah
...more rules follow...


 To illustrate the uses of start conditions, here is a scanner which provides two different interpretations of a string like "123.456".
 By default it will treat it as three tokens, the integer "123", a dot ('.'), and the integer "456".
 But if the string is preceded earlier in the line by the string "expect-floats" it will treat it as a single token,
 the floating-point number 123.456:

%{
#include <math.h>
%}
%s expect

%%
expect-floats        BEGIN(expect);

<expect>[0-9]+"."[0-9]+     {
            printf( "found a float, = %f\n",
                    atof( yytext ) );
            }
<expect>\n           {
            /* that's the end of the line, so
             * we need another "expect-number"
             * before we'll recognize any more
             * numbers
             */
            BEGIN(INITIAL);
            }

[0-9]+      {
            printf( "found an integer, = %d\n",
                    atoi( yytext ) );
            }

"."         printf( "found a dot\n" );


 Here is a scanner which recognizes (and discards) C comments while maintaining a count of the current input line.

%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);


 This scanner goes to a bit of trouble to match as much text as possible with each rule.
 In general, when attempting to write a high-speed scanner try to match as much as possible in each rule,
 as it's a big win.


Note that start-condition names are really integer values and can be stored as such. Thus,
the above could be extended in the following fashion:

%x comment foo
%%
        int line_num = 1;
        int comment_caller;

"/*"         {
             comment_caller = INITIAL;
             BEGIN(comment);
             }

...

<foo>"/*"    {
             comment_caller = foo;
             BEGIN(comment);
             }

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(comment_caller);


 Furthermore, you can access the current start condition using the integer-valued YY_START macro.
 For example, the above assignments to comment_caller could instead be written

comment_caller = YY_START;


 Flex provides YYSTATE as an alias for YY_START (since that is what's used by AT&T lex).

Note that start conditions do not have their own name-space; %s's and %x's declare names in the same fashion as #define's.

Finally, here's an example of how to match C-style quoted strings using exclusive start conditions,
including expanded escape sequences (but not including checking for a string that's too long):

%x str

%%
        char string_buf[MAX_STR_CONST];
        char *string_buf_ptr;

\"      string_buf_ptr = string_buf; BEGIN(str);

<str>\"        { /* saw closing quote - all done */
        BEGIN(INITIAL);
        *string_buf_ptr = '\0';
        /* return string constant token type and
         * value to parser
         */
        }

<str>\n        {
        /* error - unterminated string constant */
        /* generate error message */
        }

<str>\\[0-7]{1,3} {
        /* octal escape sequence */
        unsigned int result;    /* %o expects an unsigned int */

        (void) sscanf( yytext + 1, "%o", &result );

        if ( result > 0xff )
                {
                /* error, constant is out-of-bounds */
                }

        *string_buf_ptr++ = result;
        }

<str>\\[0-9]+ {
        /* generate error - bad escape sequence; something
         * like '\48' or '\0777777'
         */
        }

<str>\\n  *string_buf_ptr++ = '\n';
<str>\\t  *string_buf_ptr++ = '\t';
<str>\\r  *string_buf_ptr++ = '\r';
<str>\\b  *string_buf_ptr++ = '\b';
<str>\\f  *string_buf_ptr++ = '\f';

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

<str>[^\\\n\"]+        {
        char *yptr = yytext;

        while ( *yptr )
                *string_buf_ptr++ = *yptr++;
        }
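
The octal-escape rule in the scanner above leaves its range check as a comment. A stand-alone C sketch of that step might look like the following; decode_octal_escape is a hypothetical helper written for this illustration, and rejecting out-of-range values outright is one possible policy rather than anything the scanner prescribes.

```c
#include <stdio.h>

/*
 * Hypothetical helper: decode the digits of an octal escape such as "101"
 * (the text after the backslash). Returns 0 and stores the byte in *out on
 * success, -1 if the digits are malformed or the value exceeds 0xff.
 */
static int decode_octal_escape(const char *digits, unsigned char *out)
{
    unsigned int value;

    /* %o expects a pointer to unsigned int; read at most three digits. */
    if (sscanf(digits, "%3o", &value) != 1)
        return -1;
    if (value > 0xff)
        return -1;      /* e.g. "777" is 511, which does not fit in a byte */
    *out = (unsigned char)value;
    return 0;
}
```

Inside the rule, the helper would be called with yytext + 1 so that the leading backslash is skipped.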


 Often, such as in some of the examples above, you wind up writing a whole bunch of rules all preceded by the same start condition(s).
 Flex makes this a little easier and cleaner by introducing a notion of start condition scope.
 A start condition scope is begun with:

<SCs>{


 where SCs is a list of one or more start conditions. Inside the start condition scope,
 every rule automatically has the prefix `<SCs>' applied to it,
 until a `}' which matches the initial `{'. So, for example,

<ESC>{
    "\\n"   return '\n';
    "\\r"   return '\r';
    "\\f"   return '\f';
    "\\0"   return '\0';
}


 is equivalent to:

<ESC>"\\n"  return '\n';
<ESC>"\\r"  return '\r';
<ESC>"\\f"  return '\f';
<ESC>"\\0"  return '\0';


 Start condition scopes may be nested.

Three routines are available for manipulating stacks of start conditions:

`void yy_push_state(int new_state)'
pushes the current start condition onto the top of the start condition stack and switches to new_state as though
you had used `BEGIN new_state' (recall that start condition names are also integers).

`void yy_pop_state()'
pops the top of the stack and switches to it via BEGIN.

`int yy_top_state()'
returns the top of the stack without altering the stack's contents.

The start condition stack grows dynamically and so has no built-in size limitation.
If memory is exhausted, program execution aborts.

To use start condition stacks, your scanner must include a `%option stack' directive (see Options below).
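
The behaviour described for the three routines can be modelled in plain C. The sketch below is not flex's implementation: struct sc_state and the sc_* functions are hypothetical names, and aborting when an empty stack is popped is an assumption of this illustration.

```c
#include <stdlib.h>

/* Hypothetical model of a scanner's start-condition state; not flex's own. */
struct sc_state {
    int current;        /* the active start condition */
    int *stack;         /* saved start conditions */
    size_t depth;       /* entries in use */
    size_t capacity;    /* entries allocated */
};

/* Save the current condition and switch, growing the stack as needed. */
static void sc_push_state(struct sc_state *s, int new_state)
{
    if (s->depth == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 4;
        int *grown = realloc(s->stack, cap * sizeof *grown);

        if (grown == NULL)
            abort();    /* mirrors "program execution aborts" in the text */
        s->stack = grown;
        s->capacity = cap;
    }
    s->stack[s->depth++] = s->current;
    s->current = new_state;
}

/* Switch back to the most recently saved condition. */
static void sc_pop_state(struct sc_state *s)
{
    if (s->depth == 0)
        abort();        /* assumption: popping an empty stack is fatal */
    s->current = s->stack[--s->depth];
}

/* Peek at the saved condition on top of the stack without changing it. */
static int sc_top_state(const struct sc_state *s)
{
    if (s->depth == 0)
        abort();
    return s->stack[s->depth - 1];
}
```

Starting from condition 0, pushing 1 and then 2 leaves 2 active with 1 on top of the stack; two pops restore 1 and then 0.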

Multiple input buffers


 Some scanners (such as those which support "include" files) require reading from several input streams.
 As flex scanners do a large amount of buffering, one cannot control where the next input will be read
 from by simply writing a YY_INPUT which is sensitive to the scanning context.
 YY_INPUT is only called when the scanner reaches the end of its buffer,
 which may be a long time after scanning a statement such as an "include" which requires switching the input source.

To negotiate these sorts of problems, flex provides a mechanism for creating and switching between multiple input buffers.
An input buffer is created by using:

YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
 which takes a FILE pointer and a size and creates a buffer associated with the given file and
 large enough to hold size characters (when in doubt, use YY_BUF_SIZE for the size).
 It returns a YY_BUFFER_STATE handle, which may then be passed to other routines (see below).
 The YY_BUFFER_STATE type is a pointer to an opaque struct yy_buffer_state structure,
 so you may safely initialize YY_BUFFER_STATE variables to `((YY_BUFFER_STATE) 0)' if you wish,
 and also refer to the opaque structure in order to correctly declare input buffers in source files other than that of your scanner.
 Note that the FILE pointer in the call to yy_create_buffer is only used as the value of yyin seen by YY_INPUT;
 if you redefine YY_INPUT so it no longer uses yyin, then you can safely pass a nil FILE pointer to yy_create_buffer.
 You select a particular buffer to scan from using:

void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
 switches the scanner's input buffer so subsequent tokens will come from new_buffer.
 Note that `yy_switch_to_buffer()' may be used by `yywrap()' to
 set things up for continued scanning, instead of opening a new file and pointing yyin at it.
 Note also that switching input sources via either `yy_switch_to_buffer()' or `yywrap()'
 does not change the start condition.
 
void yy_delete_buffer( YY_BUFFER_STATE buffer )
is used to reclaim the storage associated with a buffer.
You can also clear the current contents of a buffer using:
void yy_flush_buffer( YY_BUFFER_STATE buffer )


 This function discards the buffer's contents, so the next time the scanner attempts to match a token from the buffer,
 it will first fill the buffer anew using YY_INPUT.

`yy_new_buffer()' is an alias for `yy_create_buffer()', provided for compatibility with the C++ use of new and delete for creating and destroying dynamic objects.

Finally, the YY_CURRENT_BUFFER macro returns a YY_BUFFER_STATE handle to the current buffer.

Here is an example of using these features for writing a scanner which expands include files (the `<<EOF>>' feature is discussed below):

/* the "incl" state is used for picking up the name
 * of an include file
 */
%x incl

%{
#define MAX_INCLUDE_DEPTH 10
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
%}

%%
include             BEGIN(incl);

[a-z]+              ECHO;
[^a-z\n]*\n?        ECHO;

<incl>[ \t]*      /* eat the whitespace */
<incl>[^ \t\n]+   { /* got the include file name */
        if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
            {
            fprintf( stderr, "Includes nested too deeply" );
            exit( 1 );
            }

        include_stack[include_stack_ptr++] =
            YY_CURRENT_BUFFER;

        yyin = fopen( yytext, "r" );

        if ( ! yyin )
            error( ... );

        yy_switch_to_buffer(
            yy_create_buffer( yyin, YY_BUF_SIZE ) );

        BEGIN(INITIAL);
        }

<<EOF>> {
        if ( --include_stack_ptr < 0 )
            {
            yyterminate();
            }

        else
            {
            yy_delete_buffer( YY_CURRENT_BUFFER );
            yy_switch_to_buffer(
                include_stack[include_stack_ptr] );
            }
        }


 Three routines are available for setting up input buffers for scanning in-memory strings instead of files.
 All of them create a new input buffer for scanning the string, and return a corresponding YY_BUFFER_STATE handle (which you should delete with `yy_delete_buffer()' when done with it).
 They also switch to the new buffer using `yy_switch_to_buffer()', so the next call to `yylex()' will start scanning the string.

`yy_scan_string(const char *str)'
scans a NUL-terminated string.

`yy_scan_bytes(const char *bytes, int len)'
scans len bytes (including possibly NUL's) starting at location bytes.

Note that both of these functions create and scan a copy of the string or bytes.
(This may be desirable, since `yylex()' modifies the contents of the buffer it is scanning.)
You can avoid the copy by using:

`yy_scan_buffer(char *base, yy_size_t size)'
which scans in place the buffer starting at base, consisting of size bytes,
the last two bytes of which must be YY_END_OF_BUFFER_CHAR (ASCII NUL).
These last two bytes are not scanned; thus, scanning consists of `base[0]' through `base[size-2]',
inclusive. If you fail to set up base in this manner (i.e., forget the final two YY_END_OF_BUFFER_CHAR bytes),
then `yy_scan_buffer()' returns a nil pointer instead of creating a new input buffer.
The type yy_size_t is an integral type to which you can cast an integer expression
reflecting the size of the buffer.
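
Preparing a buffer with the two trailing NUL bytes is easy to get wrong, so here is a minimal C sketch of that step alone. make_scan_buffer is a hypothetical helper for this illustration; it only builds the buffer and does not call into a scanner.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy text into a freshly allocated buffer whose last
 * two bytes are NUL, the layout the text requires for yy_scan_buffer().
 * On success returns the buffer and stores its full size (text length + 2)
 * in *size_out; returns NULL if allocation fails. The caller frees it.
 */
static char *make_scan_buffer(const char *text, size_t *size_out)
{
    size_t len = strlen(text);
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = '\0';       /* first end-of-buffer byte */
    base[len + 1] = '\0';   /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

In a scanner, the returned pointer and size would be handed to `yy_scan_buffer()', and the buffer should stay allocated for as long as the scanner reads from it, since no copy is made.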

End-of-file rules


 The special rule "<<EOF>>" indicates actions which are to be taken when an end-of-file is encountered
 and yywrap() returns non-zero (i.e., indicates no further files to process).
 The action must finish by doing one of four things:

 - assigning yyin to a new input file (in previous versions of flex, after doing the assignment you had to call the special action YY_NEW_FILE; this is no longer necessary);
 - executing a return statement;
 - executing the special `yyterminate()' action;
 - or, switching to a new buffer using `yy_switch_to_buffer()' as shown in the example above.

<<EOF>> rules may not be used with other patterns; they may only be qualified with a list of start conditions. If an unqualified <<EOF>> rule is given,
it applies to all start conditions which do not already have <<EOF>> actions.
To specify an <<EOF>> rule for only the initial start condition, use

<INITIAL><<EOF>>


 These rules are useful for catching things like unclosed comments. An example:

%x quote
%%

...other rules for dealing with quotes...

<quote><<EOF>>  {
         error( "unterminated quote" );
         yyterminate();
         }
<<EOF>>  {
         if ( *++filelist )
             yyin = fopen( *filelist, "r" );
         else
            yyterminate();
         }


Miscellaneous macros

 The macro YY_USER_ACTION can be defined to provide an action which is always executed prior to the matched rule's action.
 For example, it could be #define'd to call a routine to convert yytext to lower-case.
 When YY_USER_ACTION is invoked, the variable yy_act gives the number of the matched rule (rules are numbered starting with 1).
 Suppose you want to profile how often each of your rules is matched. The following would do the trick:

#define YY_USER_ACTION ++ctr[yy_act]


 where ctr is an array to hold the counts for the different rules.
 Note that the macro YY_NUM_RULES gives the total number of rules (including the default rule, even if you use `-s'), so a correct declaration for ctr is:

int ctr[YY_NUM_RULES];


 The macro YY_USER_INIT may be defined toprovide an action which is always executed before the first scan (and beforethe scanner's internal initializations are done).
 For example, it could be used to call aroutine to read in a data table or open a logging file.
宏YY_USER_INIT可以被定义以提供一个*总是被执行的动作*在第一次扫描前(并且在扫描器内部初始化之前).
例如,可以定义宏来调用一个例程以读进数据表或打开一个日志文件.

The macro `yy_set_interactive(is_interactive)' can be used to control whetherthe current buffer is considered interactive. An interactive buffer isprocessed more slowly,
but must be used when the scanner's input source is indeed interactive to avoidproblems due to waiting to fill buffers (see the discussion of the `-I' flagbelow).
A non-zero value in the macro invocation marks the buffer as interactive, azero value as non-interactive.
Note that use of this macro overrides `%option always-interactive' or `%optionnever-interactive' (see Options below).
`yy_set_interactive()' must be invoked prior to beginning to scan the bufferthat is (or is not) to be considered interactive.
宏`yy_set_interactive(is_interactive)'用来控制当前的缓冲区是否被考虑是交互式.一个交互式的缓冲区处理的更慢,
但是当扫描器的输入源的确是交互式的时候避免由于等待填充缓冲区而出现的问题必须要考虑(看下面对'-I'标志的讨论)
宏内的一个非0值祈祷标记缓冲区为交互式,一个0值作为非交互式.
注意该宏的这个用途忽视'%option always-interactive'或'%option never-interactive'(看下面的选项).
'yy_set_interactive()'必须被提前调用以开始扫描被考虑为交互式的缓冲区

The macro `yy_set_bol(at_bol)' can be used to control whether the currentbuffer's scanning context
for the next token match is done as though at the beginning of a line.
A non-zero macro argument makes rules anchored with
宏'yy_set_bol(at_bol)'能被用来控制"为了完成下一个记号匹配"的当前缓冲区扫描的上下文好像在一行开头那样.宏的一个非0参数

The macro `YY_AT_BOL()' returns true if the next token scanned from the currentbuffer will have '^' rules active, false otherwise.
宏返回"YY_AT_BOL()'返回true假如从当前的缓冲区扫描得到的下一个记号将激活'^'规则,否则为false.

In the generated scanner, the actions are all gathered in one large switch statement and separated using YY_BREAK,
which may be redefined. By default, it is simply a "break", to separate each rule's action from the following rule's.
Redefining YY_BREAK allows, for example, C++ users to #define YY_BREAK to do nothing
(while being very careful that every rule ends with a "break" or a "return"!)
to avoid suffering from unreachable statement warnings where, because a rule's action ends with "return",
the YY_BREAK is inaccessible.
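
The following sketch shows one way an empty YY_BREAK might be used. The token code TOK_NUMBER and the `main()' are assumptions made for the example. Every action ends in an explicit "return" or "break", and the rules cover all input so that flex's built-in default rule is never reached:

```lex
%option noyywrap
%{
#define YY_BREAK            /* empty: each action must end in break or return */
enum { TOK_NUMBER = 258 };  /* hypothetical token code */
%}
%%
[0-9]+      { return TOK_NUMBER; }
[ \t\n]+    { break; }
.           { fprintf(stderr, "unexpected '%s'\n", yytext); break; }
%%
int main(void)
{
    while (yylex() == TOK_NUMBER)
        printf("number: %s\n", yytext);
    return 0;
}
```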

Values available to the user

 This section summarizes the various values available to the user in the rule actions.

`char *yytext' holds the text of the current token.
 It may be modified but not lengthened (you cannot append characters to the end).
 If the special directive `%array' appears in the first section of the scanner description,
 then yytext is instead declared `char yytext[YYLMAX]',
 where YYLMAX is a macro definition that you can redefine in the first section
 if you don't like the default value (generally 8KB). Using `%array' results in somewhat slower scanners,
 but the value of yytext becomes immune to calls to `input()' and `unput()',
 which potentially destroy its value when yytext is a character pointer.
 The opposite of `%array' is `%pointer', which is the default.
 You cannot use `%array' when generating C++ scanner classes (the `-+' flag).
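
To illustrate why `%array' can matter, here is a sketch (the rules are invented for the example) in which an action calls `input()' and `unput()' and then still reads yytext. With the default `%pointer' the same action could find yytext damaged:

```lex
%option main
%array
%%
[a-z]+  {
            int c = input();      /* peek one character ahead */
            if (c > 0)            /* neither end-of-file nor NUL: push it back */
                unput(c);
            printf("word: %s\n", yytext);   /* still intact because of %array */
        }
.|\n    ;
```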

`int yyleng' holds the length of the current token.

`FILE *yyin' is the file which by default flex reads from.
It may be redefined but doing so only makes sense before scanning begins or after an EOF has been encountered.
Changing it in the midst of scanning will have unexpected results since flex buffers its input;
use `yyrestart()' instead. Once scanning terminates because an end-of-file has been seen,
you can assign the new input file to yyin and then call the scanner again to continue scanning.

`void yyrestart( FILE *new_file )' may be called to point yyin at the new input file.
The switch-over to the new file is immediate (any previously buffered-up input is lost).
Note that calling `yyrestart()' with yyin as an argument thus throws away
the current input buffer and continues scanning the same input file.
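
A minimal sketch of scanning several files in turn with `yyrestart()'. The word-counting rules and the `main()' are assumptions for illustration only:

```lex
%option noyywrap
%{
static long words = 0;   /* hypothetical counter */
%}
%%
[^ \t\n]+   { ++words; }
[ \t\n]+    ;
%%
int main(int argc, char **argv)
{
    int i;
    for (i = 1; i < argc; ++i) {
        FILE *fp = fopen(argv[i], "r");
        if (!fp) { perror(argv[i]); continue; }
        yyrestart(fp);   /* switch input; anything still buffered is dropped */
        yylex();         /* returns 0 at end-of-file */
        fclose(fp);
    }
    printf("%ld\n", words);
    return 0;
}
```

Calling `yyrestart()' rather than assigning yyin directly keeps the scanner's buffer consistent with the file being read.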

`FILE *yyout' is the file to which `ECHO' actions are done. It can be reassigned by the user.

YY_CURRENT_BUFFER returns a YY_BUFFER_STATE handle to the current buffer.

YY_START returns an integer value corresponding to the current start condition.
You can subsequently use this value with BEGIN to return to that start condition.
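
As a sketch of saving and restoring a start condition with YY_START, the scanner below enters a comment state from either of two other states and later returns to whichever one it came from. The condition names and the variable saved_state are invented for the example:

```lex
%option main
%x comment code
%{
static int saved_state;   /* hypothetical: start condition to return to */
%}
%%
"{"                  { BEGIN(code); }
<code>"}"            { BEGIN(INITIAL); }
<INITIAL,code>"/*"   { saved_state = YY_START; BEGIN(comment); }
<comment>"*/"        { BEGIN(saved_state); }
<comment>.|\n        ;
<code>.|\n           ECHO;
```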

Interfacing with yacc


 One of the main uses of flex is as a companion to the yacc parser-generator.
 yacc parsers expect to call a routine named `yylex()' to find the next input token.
 The routine is supposed to return the type of the next token as well as putting any associated value in the global yylval.
 To use flex with yacc, one specifies the `-d' option to yacc to instruct it to generate the file `y.tab.h' containing definitions of all the `%tokens' appearing in the yacc input.
 This file is then included in the flex scanner. For example, if one of the tokens is "TOK_NUMBER", part of the scanner might look like:
%{
#include "y.tab.h"
%}

%%

[0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;


Options

flex has the following options:

`-b'
Generate backing-up information to `lex.backup'.
This is a list of scanner states which require backing up and the input characters on which they do so.
By adding rules one can remove backing-up states. If all backing-up states are eliminated and `-Cf' or `-CF' is used,
the generated scanner will run faster (see the `-p' flag).
Only users who wish to squeeze every last cycle out of their scanners need worry about this option. (See the section on Performance Considerations below.)
`-c'
is a do-nothing, deprecated option included for POSIX compliance.
`-d'
makes the generated scanner run in debug mode. Whenever a pattern is recognized and
the global yy_flex_debug is non-zero (which is the default), the scanner will write to stderr a line of the form:
--accepting rule at line 53 ("the matched text")
The line number refers to the location of the rule in the file defining the scanner (i.e., the file that was fed to flex).
Messages are also generated when the scanner backs up, accepts the default rule,
reaches the end of its input buffer (or encounters a NUL; at this point,
the two look the same as far as the scanner's concerned), or reaches an end-of-file.
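
Because the trace is controlled by the global yy_flex_debug, a debug scanner can be silenced at run time without regenerating it. The sketch below assumes a hypothetical `-q' command-line argument for that purpose and uses `%option debug', which the option table in this section lists as the equivalent of `-d':

```lex
%option noyywrap debug
%%
[0-9]+    { printf("number\n"); }
[ \t\n]+  ;
%%
int main(int argc, char **argv)
{
    /* yy_flex_debug is non-zero by default in a debug scanner */
    if (argc > 1 && argv[1][0] == '-' && argv[1][1] == 'q')
        yy_flex_debug = 0;   /* hypothetical -q flag: turn the trace off */
    return yylex();
}
```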
`-f'
specifies fast scanner. No table compression is done and stdio is bypassed. The result is large but fast.
This option is equivalent to `-Cfr' (see below).
`-h'
generates a "help" summary of flex's options to stdout and then exits. `-?' and `--help' are synonyms for `-h'.
`-i'
instructs flex to generate a case-insensitive scanner.
The case of letters given in the flex input patterns will be ignored,
and tokens in the input will be matched regardless of case.
The matched text given in yytext will have the preserved case (i.e., it will not be folded).
`-l'
turns on maximum compatibility with the original AT&T lex implementation.
Note that this does not mean full compatibility. Use of this option costs a considerable amount of performance,
and it cannot be used with the `-+', `-f', `-F', `-Cf', or `-CF' options. For details on the compatibilities it provides,
see the section "Incompatibilities With Lex And POSIX" below.
This option also results in the name YY_FLEX_LEX_COMPAT being #define'd in the generated scanner.
`-n'
is another do-nothing, deprecated option included only for POSIX compliance.
`-p'
generates a performance report to stderr. The report consists of comments regarding features of the flex input file
which will cause a serious loss of performance in the resulting scanner.
If you give the flag twice, you will also get comments regarding features that lead to minor performance losses.
Note that the use of REJECT, `%option yylineno' and variable trailing context
(see the Deficiencies / Bugs section below) entails a substantial performance penalty;
use of `yymore()', the `^' operator, and the `-I' flag entail minor performance penalties.
`-s'
causes the default rule (that unmatched scanner input is echoed to stdout) to be suppressed.
If the scanner encounters input that does not match any of its rules, it aborts with an error.
This option is useful for finding holes in a scanner's rule set.
`-t'
instructs flex to write the scanner it generates to standard output instead of `lex.yy.c'.
`-v'
specifies that flex should write to stderr a summary of statistics regarding the scanner it generates.
Most of the statistics are meaningless to the casual flex user, but the first line identifies the version of flex
(same as reported by `-V'), and the next line the flags used when generating the scanner,
including those that are on by default.
`-w'
suppresses warning messages.
`-B'
instructs flex to generate a batch scanner, the opposite of interactive scanners generated by `-I' (see below).
In general, you use `-B' when you are certain that your scanner will never be used interactively,
and you want to squeeze a little more performance out of it.
If your goal is instead to squeeze out a lot more performance,
you should be using the `-Cf' or `-CF' options (discussed below), which turn on `-B' automatically anyway.
`-F'
specifies that the fast scanner table representation should be used (and stdio bypassed).
This representation is about as fast as the full table representation `(-f)',
and for some sets of patterns will be considerably smaller (and for others, larger).
In general, if the pattern set contains both "keywords" and a catch-all, "identifier" rule, such as in the set:
"case"    return TOK_CASE;
"switch"  return TOK_SWITCH;
...
"default" return TOK_DEFAULT;
[a-z]+    return TOK_ID;

 then you're better off using the full table representation.
 If only the "identifier" rule is present and you then use a hash table or some such to detect the keywords,
 you're better off using `-F'. This option is equivalent to `-CFr' (see below). It cannot be used with `-+'.
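
The "identifier rule plus a lookup" arrangement that suits `-F' can be sketched as follows. Instead of a hash table, this example uses a sorted array searched with the standard `bsearch()'. The token codes, the table, and the helper kw_cmp are all assumptions made for the illustration, and the fragment is meant to be called from a parser rather than run on its own:

```lex
%option noyywrap
%{
#include <stdlib.h>
#include <string.h>
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };  /* hypothetical codes */

/* Kept sorted by name so that bsearch() can be used. */
static const struct kw { const char *name; int tok; } keywords[] = {
    { "case", TOK_CASE }, { "default", TOK_DEFAULT }, { "switch", TOK_SWITCH },
};

static int kw_cmp(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct kw *)elem)->name);
}
%}
%%
[a-z]+   {
             const struct kw *k = bsearch(yytext, keywords,
                     sizeof keywords / sizeof keywords[0],
                     sizeof keywords[0], kw_cmp);
             return k ? k->tok : TOK_ID;   /* keyword if found, else identifier */
         }
[ \t\n]+ ;
```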

`-I'
instructs flex to generate an interactive scanner.
An interactive scanner is one that only looks ahead to decide what token has been matched if it absolutely must.
It turns out that always looking one extra character ahead,
even if the scanner has already seen enough text to disambiguate the current token,
is a bit faster than only looking ahead when necessary.
But scanners that always look ahead give dreadful interactive performance; for example,
when a user types a newline, it is not recognized as a newline token until they enter another token,
which often means typing in another whole line.
Flex scanners default to interactive unless you use the `-Cf' or `-CF' table-compression options (see below).
That's because if you're looking for high-performance you should be using one of these options, so if you didn't,
flex assumes you'd rather trade off a bit of run-time performance for intuitive interactive behavior.
Note also that you cannot use `-I' in conjunction with `-Cf' or `-CF'. Thus,
this option is not really needed; it is on by default for all those cases in which it is allowed.
You can force a scanner to not be interactive by using `-B' (see above).
`-L'
instructs flex not to generate `#line' directives. Without this option,
flex peppers the generated scanner with #line directives so error messages in the actions will be correctly located with
respect to either the original flex input file (if the errors are due to code in the input file), or `lex.yy.c'
(if the errors are flex's fault -- you should report these sorts of errors to the email address given below).
`-T'
makes flex run in trace mode.
It will generate a lot of messages to stderr concerning the form of the input and
the resultant non-deterministic and deterministic finite automata.
This option is mostly for use in maintaining flex.
`-V'
prints the version number to stdout and exits. `--version' is a synonym for `-V'.
`-7'
instructs flex to generate a 7-bit scanner, i.e., one which can only recognize 7-bit characters in its input.
The advantage of using `-7' is that the scanner's tables can be up to half the size of those generated using the `-8' option (see below).
The disadvantage is that such scanners often hang or crash if their input contains an 8-bit character.
Note, however, that unless you generate your scanner using the `-Cf' or `-CF' table compression options,
use of `-7' will save only a small amount of table space, and make your scanner considerably less portable.
Flex's default behavior is to generate an 8-bit scanner unless you use the `-Cf' or `-CF',
in which case flex defaults to generating 7-bit scanners unless your site was always configured to generate 8-bit scanners (as will often be the case with non-USA sites).
You can tell whether flex generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the `-v' output
as described above. Note that if you use `-Cfe' or `-CFe' (those table compression options,
but also using equivalence classes as discussed below), flex still defaults to generating an 8-bit scanner,
since usually with these compression options full 8-bit tables are not much more expensive than 7-bit tables.
`-8'
instructs flex to generate an 8-bit scanner, i.e., one which can recognize 8-bit characters.
This flag is only needed for scanners generated using `-Cf' or `-CF',
as otherwise flex defaults to generating an 8-bit scanner anyway.
See the discussion of `-7' above for flex's default behavior and the tradeoffs between 7-bit and 8-bit scanners.
`-+'
specifies that you want flex to generate a C++ scanner class.
See the section on Generating C++ Scanners below for details.
`-C[aefFmr]'
controls the degree of table compression and, more generally, trade-offs between small scanners and fast scanners.
 `-Ca' ("align") instructs flex to trade off larger tables in the generated scanner for faster performance because
 the elements of the tables are better aligned for memory access and computation. On some RISC architectures,
 fetching and manipulating long-words is more efficient than with smaller-sized units such as shortwords.
 This option can double the size of the tables used by your scanner.
 `-Ce' directs flex to construct equivalence classes, i.e.,
 sets of characters which have identical lexical properties (for example,
 if the only appearance of digits in the flex input is in the character class
 "[0-9]" then the digits '0', '1', ..., '9' will all be put in the same equivalence class).
 Equivalence classes usually give dramatic reductions in the final table/object file sizes (typically a factor of 2-5)
 and are pretty cheap performance-wise (one array look-up per character scanned).
 `-Cf' specifies that the full scanner tables should be generated - flex should not
 compress the tables by taking advantage of similar transition functions for different states.
 `-CF' specifies that the alternate fast scanner representation (described above under the `-F' flag) should be used.
 This option cannot be used with `-+'.
 `-Cm' directs flex to construct meta-equivalence classes,
 which are sets of equivalence classes (or characters, if equivalence classes are not being used)
 that are commonly used together. Meta-equivalence classes are often a big win when using compressed tables,
 but they have a moderate performance impact (one or two "if" tests and one array look-up per character scanned).
 `-Cr' causes the generated scanner to bypass use of the standard I/O library (stdio) for input.
 Instead of calling `fread()' or `getc()', the scanner will use the `read()' system call,
 resulting in a performance gain which varies from system to system,
 but in general is probably negligible unless you are also using `-Cf' or `-CF'.
 Using `-Cr' can cause strange behavior if, for example,
 you read from yyin using stdio prior to calling the scanner
 (because the scanner will miss whatever text your previous reads left in the stdio input buffer).
 `-Cr' has no effect if you define YY_INPUT (see The Generated Scanner above).
 A lone `-C' specifies that the scanner tables should be compressed
 but neither equivalence classes nor meta-equivalence classes should be used.
 The options `-Cf' or `-CF' and `-Cm' do not make sense together - there is no opportunity for meta-equivalence classes
 if the table is not being compressed. Otherwise the options may be freely mixed, and are cumulative.
 The default setting is `-Cem', which specifies that flex should generate equivalence classes and meta-equivalence classes.
 This setting provides the highest degree of table compression.
 You can trade off faster-executing scanners at the cost of larger tables with the following generally being true:
slowest & smallest
      -Cem
      -Cm
      -Ce
      -C
      -C{f,F}e
      -C{f,F}
      -C{f,F}a
fastest & largest

 Note that scanners with the smallest tables are usually generated and compiled the quickest,
 so during development you will usually want to use the default, maximal compression.
 `-Cfe' is often a good compromise between speed and size for production scanners.
`-ooutput'
directs flex to write the scanner to the file `output' instead of `lex.yy.c'.
If you combine `-o' with the `-t' option, then the scanner is written to stdout but its `#line' directives (see the `-L' option above) refer to the file output.
`-Pprefix'
changes the default `yy' prefix used by flex for all globally-visible variable and function names to instead be prefix.
For example, `-Pfoo' changes the name of yytext to `footext'.
It also changes the name of the default output file from `lex.yy.c' to `lex.foo.c'. Here are all of the names affected:
yy_create_buffer
yy_delete_buffer
yy_flex_debug
yy_init_buffer
yy_flush_buffer
yy_load_buffer_state
yy_switch_to_buffer
yyin
yyleng
yylex
yylineno
yyout
yyrestart
yytext
yywrap

 (If you are using a C++ scanner, then only yywrap and yyFlexLexer are affected.)
 Within your scanner itself, you can still refer to the global variables and functions using either version of their name;
 but externally, they have the modified name.
 This option lets you easily link together multiple flex programs into the same executable.
 Note, though, that using this option also renames `yywrap()', so you now must either provide your own
 (appropriately-named) version of the routine for your scanner, or use `%option noyywrap',
 as linking with `-lfl' no longer provides one for you by default.
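
A sketch of one scanner built with a `foo' prefix, using the `%option prefix' form that corresponds to `-Pfoo'. The helper scan_with_foo is a hypothetical name. Code in other files would call `foolex()' and `foorestart()', while `%option noyywrap' avoids having to supply a `foowrap()':

```lex
%option prefix="foo" noyywrap
%%
[0-9]+   { printf("foo scanner saw %s\n", yytext); }
.|\n     ;
%%
/* Inside this file either spelling works; other files must use the
   prefixed names. */
int scan_with_foo(FILE *fp)      /* hypothetical helper */
{
    foorestart(fp);
    return foolex();
}
```

A second scanner generated with a different prefix can then be linked into the same executable without clashing symbol names.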
`-Sskeleton_file'
overrides the default skeleton file from which flex constructs its scanners.
You'll never need this option unless you are doing flex maintenance or development.

flex also provides a mechanism for controlling options within the scanner specification itself, rather than from the flex command-line.
This is done by including `%option' directives in the first section of the scanner specification.
You can specify multiple options with a single `%option' directive, and multiple directives in the first section of your flex input file.
Most options are given simply as names, optionally preceded by the word "no" (with no intervening whitespace) to negate their meaning.
A number are equivalent to flex flags or their negation:
7bit            -7 option
8bit            -8 option
align           -Ca option
backup          -b option
batch           -B option
c++             -+ option

caseful or
case-sensitive  opposite of -i (default)

case-insensitive or
caseless        -i option

debug           -d option
default         opposite of -s option
ecs             -Ce option
fast            -F option
full            -f option
interactive     -I option
lex-compat      -l option
meta-ecs        -Cm option
perf-report     -p option
read            -Cr option
stdout          -t option
verbose         -v option
warn            opposite of -w option
                (use "%option nowarn" for -w)

array           equivalent to "%array"
pointer         equivalent to "%pointer" (default)


 Some `%option's' provide features otherwise not available:
`always-interactive'
instructs flex to generate a scanner which always considers its input "interactive".
Normally, on each new input file the scanner calls `isatty()' in an attempt to determine whether the scanner's input source is interactive
and thus should be read a character at a time. When this option is used, however, no such call is made.
`main'
directs flex to provide a default `main()' program for the scanner, which simply calls `yylex()'. This option implies `noyywrap' (see below).
`never-interactive'
instructs flex to generate a scanner which never considers its input "interactive" (again, no call made to `isatty()').
This is the opposite of `always-interactive'.
`stack'
enables the use of start condition stacks (see Start Conditions above).
`stdinit'
if unset (i.e., `%option nostdinit') initializes yyin and yyout to nil FILE pointers, instead of stdin and stdout.
`yylineno'
directs flex to generate a scanner that maintains the number of the current line read from its input in the global variable yylineno.
This option is implied by `%option lex-compat'.
`yywrap'
if unset (i.e., `%option noyywrap'), makes the scanner not call `yywrap()' upon an end-of-file,
but simply assume that there are no more files to scan (until the user points yyin at a new file and calls `yylex()' again).

flex scans your rule actions to determine whether you use the REJECT or `yymore()' features.
The reject and yymore options are available to override its decision as to whether you use the options,
either by setting them (e.g., `%option reject') to indicate the feature is indeed used,
or unsetting them to indicate it actually is not used (e.g., `%option noyymore').
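
The sketch below combines several `%option' names in one specification. The rules and messages are invented for the example. `nodefault' is the negated form of `default' and so corresponds to `-s', and `case-insensitive' corresponds to `-i':

```lex
%option noyywrap nodefault yylineno
%option case-insensitive
%%
select|from|where   { printf("line %d: keyword %s\n", yylineno, yytext); }
[a-z_]+             { printf("line %d: name %s\n", yylineno, yytext); }
[ \t\n]+            ;
.                   { fprintf(stderr, "line %d: bad character\n", yylineno); }
%%
int main(void) { return yylex(); }
```

Because the default rule is suppressed, the final `.' rule is what keeps the scanner from aborting on unexpected characters.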

Three options take string-delimited values, offset with '=':
%option outfile="ABC"


 is equivalent to `-oABC', and

%option prefix="XYZ"


 is equivalent to `-PXYZ'.


Finally,

%option yyclass="foo"


 only applies when generating a C++ scanner (`-+' option).
 It informs flex that you have derived `foo' as a subclass of yyFlexLexer so flex will place your actions
 in the member function `foo::yylex()' instead of `yyFlexLexer::yylex()'.
 It also generates a `yyFlexLexer::yylex()' member function that emits a run-time error (by invoking `yyFlexLexer::LexerError()') if called.
 See Generating C++ Scanners, below, for additional information.

A number of options are available for lint purists who want to suppress the appearance of unneeded routines in the generated scanner.
Each of the following, if unset, results in the corresponding routine not appearing in the generated scanner:
input, unput
yy_push_state, yy_pop_state, yy_top_state
yy_scan_buffer, yy_scan_bytes, yy_scan_string

 (though `yy_push_state()' and friends won't appear anyway unless you use `%option stack').

Performance considerations


 The main design goal of flex is that it generate high-performance scanners. It has been optimized for dealing well with large sets of rules.
 Aside from the effects on scanner speed of the table compression `-C' options outlined above,
 there are a number of options/actions which degrade performance. These are, from most expensive to least:
REJECT
%option yylineno
arbitrary trailing context

pattern sets that require backing up
%array
%option interactive
%option always-interactive

'^' beginning-of-line operator
yymore()


 with the first three all being quite expensive and the last two being quite cheap.
 Note also that `unput()' is implemented as a routine call that potentially does quite a bit of work,
 while `yyless()' is a quite-cheap macro; so if just putting back some excess text you scanned, use `yyless()'.
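
As a sketch of putting back excess text with `yyless()', the rule below matches a name together with the '=' that follows it, then returns the '=' to the input so that a separate rule can handle it. The patterns are invented for the example:

```lex
%option main
%%
[a-z]+"="   {
                yyless(yyleng - 1);     /* hand the '=' back to the input */
                printf("lvalue: %s\n", yytext);
            }
"="         { printf("assign\n"); }
.|\n        ;
```

After the `yyless()' call, yytext and yyleng cover only the name, and the '=' is scanned again as the next token.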

REJECT should be avoided at all costs when performance is important. It is a particularly expensive option.

Getting rid of backing up is messy and often may be an enormous amount of work for a complicated scanner.
In principle, one begins by using the `-b' flag to generate a `lex.backup' file. For example, on the input
%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;


 the file looks like:

State #6 is non-accepting -
 associated rule line numbers:
       2       3
 out-transitions: [ o ]
 jam-transitions: EOF [ \001-n  p-\177 ]

State #8 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ a ]
 jam-transitions: EOF [ \001-`  b-\177 ]

State #9 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ r ]
 jam-transitions: EOF [ \001-q  s-\177 ]

Compressed tables always back up.


 The first few lines tell us that there's a scanner state in which it can make a
 transition on an 'o' but not on any other character, and that in that state the
 currently scanned text does not match any rule. The state occurs when trying to
 match the rules found at lines 2 and 3 in the input file. If the scanner is in
 that state and then reads something other than an 'o', it will have to back up
 to find a rule which is matched. With a bit of head-scratching one can see that
 this must be the state it's in when it has seen "fo". When this has happened,
 if anything other than another 'o' is seen, the scanner will have to back up to
 simply match the 'f' (by the default rule).

The comment regarding State #8 indicates there's a problem when "foob" has been scanned.
Indeed, on any character other than an 'a', the scanner will have to back up to accept "foo".
Similarly, the comment for State #9 concerns when "fooba" has been scanned and an 'r' does not follow.

The final comment reminds us that there's no point going to all the trouble of
removing backing up from the rules unless we're using `-Cf' or `-CF',
since there's no performance gain doing so with compressed scanners.

The way to remove the backing up is to add "error" rules:

%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

fooba       |
foob        |
fo          {
            /* false alarm, not really a keyword */
            return TOK_ID;
            }


 Eliminating backing up among a list of keywords can also be done using a "catch-all" rule:
%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

[a-z]+      return TOK_ID;


 This is usually the best solution when appropriate.

Backing up messages tend to cascade. With a complicated set of rules it's not uncommon to
get hundreds of messages. If one can decipher them, though, it often only takes
a dozen or so rules to eliminate the backing up (though it's easy to make a mistake
and have an error rule accidentally match a valid token.
A possible future flex feature will be to automatically add rules to eliminate backing up).

It's important to keep in mind that you gain the benefits of eliminating backing up
only if you eliminate every instance of backing up. Leaving just one means you gain nothing.

Variable trailing context (where both the leading and trailing parts do not have a fixed length)
entails almost the same performance loss as REJECT (i.e., substantial). So when possible a rule like:
%%
mouse|rat/(cat|dog)   run();


 is better written:

%%
mouse/cat|dog         run();
rat/cat|dog           run();


 or as

%%
mouse|rat/cat         run();
mouse|rat/dog         run();


 Note that here the special '|' action does not provide any savings, and can even
 make things worse (see Deficiencies / Bugs below).

Another area where the user can increase a scanner's performance (and one that's
easier to implement) arises from the fact that the longer the tokens matched, the
faster the scanner will run. This is because with long tokens the processing of most
input characters takes place in the (short) inner scanning loop, and does not often
have to go through the additional work of setting up the scanning environment (e.g., yytext) for the action.
Recall the scanner for C comments:
%x comment
%%
        int line_num = 1;

"/*"        BEGIN(comment);

<comment>[^*\n]*
<comment>"*"+[^*/\n]*
<comment>\n            ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);


 This could be sped up by writing it as:
%x comment
%%
        int line_num = 1;

"/*"        BEGIN(comment);

<comment>[^*\n]*
<comment>[^*\n]*\n     ++line_num;
<comment>"*"+[^*/\n]*
<comment>"*"+[^*/\n]*\n ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);


 Now instead of each newline requiring the processing of another action, recognizing
 the newlines is "distributed" over the other rules to keep the matched text as long as possible.
 Note that adding rules does not slow down the scanner!
 The speed of the scanner is independent of the number of rules or
 (modulo the considerations given at the beginning of this section)
 how complicated the rules are with regard to operators such as '*' and '|'.

A final example in speeding up a scanner: suppose you want to scan through a file containing
identifiers and keywords, one per line and with no other extraneous characters,
and recognize all the keywords. A natural first approach is:
%%
asm      |
auto     |
break    |
... etc ...
volatile |
while    /* it's a keyword */

.|\n     /* it's not a keyword */


 To eliminate the back-tracking, introduce a catch-all rule:
%%
asm      |
auto     |
break    |
... etc ...
volatile |
while    /* it's a keyword */

[a-z]+   |
.|\n     /* it's not a keyword */


 Now, if it's guaranteed that there's exactly one word per line, then we can reduce the total number
 of matches by a half by merging in the recognition of newlines with that of the other tokens:
%%
asm\n    |
auto\n   |
break\n  |
... etc ...
volatile\n |
while\n  /* it's a keyword */

[a-z]+\n |
.|\n     /* it's not a keyword */


 One has to be careful here, as we have now reintroduced backing up into the scanner.
 In particular, while we know that there will never be any characters in the input stream
 other than letters or newlines, flex can't figure this out, and it will plan for possibly
 needing to back up when it has scanned a token like "auto" and then the next character is something
 other than a newline or a letter. Previously it would then just match the "auto" rule and be done,
 but now it has no "auto" rule, only an "auto\n" rule. To eliminate the possibility of backing up,
 we could either duplicate all rules but without final newlines, or,
 since we never expect to encounter such an input and therefore don't care how it's classified,
 we can introduce one more catch-all rule, this one which doesn't include a newline:
%%
asm\n    |
auto\n   |
break\n  |
... etc ...
volatile\n |
while\n  /* it's a keyword */

[a-z]+\n |
[a-z]+   |
.|\n     /* it's not a keyword */


 Compiled with `-Cf', this is about as fast as one can get a flex scanner to go for this particular problem.

A final note: flex is slow when matching NUL's, particularly when a token contains multiple NUL's.
It's best to write rules which match short amounts of text if it's anticipated that the text will often
include NUL's.

Another final note regarding performance: as mentioned above in the section How the Input is Matched,
dynamically resizing yytext to accommodate huge tokens is a slow process because it presently requires
that the (huge) token be rescanned from the beginning. Thus if performance is vital,
you should attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters/token.


Generating C++ scanners


flex provides two different ways to generate scanners for use with C++.

The first way is to simply compile a scanner generated by flex using a C++ compiler instead of a C compiler.
You should not encounter any compilation errors
(please report any you find to the email address given in the Author section below).
You can then use C++ code in your rule actions instead of C code.
Note that the default input source for your scanner remains yyin,
and default echoing is still done to yyout. Both of these remain `FILE *' variables and not C++ streams.

You can also use flex to generate a C++ scanner class, using the `-+' option
(or, equivalently, `%option c++'), which is automatically specified if the name of
the flex executable ends in a `+', such as flex++. When using this option,
flex defaults to generating the scanner to the file `lex.yy.cc' instead of `lex.yy.c'.
The generated scanner includes the header file `FlexLexer.h',
which defines the interface to two C++ classes.

The first class, FlexLexer, provides an abstract base class defining the general scanner class interface.
It provides the following member functions:

`const char* YYText()'
returns the text of the most recently matched token, the equivalent of yytext.

`int YYLeng()'
returns the length of the most recently matched token, the equivalent of yyleng.

`int lineno() const'
returns the current input line number (see `%option yylineno'), or 1 if `%option yylineno' was not used.

`void set_debug( int flag )'
sets the debugging flag for the scanner, equivalent to assigning to yy_flex_debug
(see the Options section above).
Note that you must build the scanner using `%option debug' to include debugging information in it.

`int debug() const'
returns the current setting of the debugging flag.

Also provided are member functions equivalent to `yy_switch_to_buffer()', `yy_create_buffer()'
(though the first argument is an `istream*' object pointer and not a `FILE*'), `yy_flush_buffer()',
`yy_delete_buffer()', and `yyrestart()' (again, the first argument is an `istream*' object pointer).


The second class defined in `FlexLexer.h' is yyFlexLexer, which is derived from FlexLexer.
It defines the following additional member functions:

`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
constructs a yyFlexLexer object using the given streams for input and output.
If not specified, the streams default to cin and cout, respectively.

`virtual int yylex()'
performs the same role as `yylex()' does for ordinary flex scanners: it scans the input stream,
consuming tokens, until a rule's action returns a value. If you derive a subclass S from yyFlexLexer
and want to access the member functions and variables of S inside `yylex()',
then you need to use `%option yyclass="S"' to inform flex that you will be using that subclass
instead of yyFlexLexer. In this case, rather than generating `yyFlexLexer::yylex()',
flex generates `S::yylex()' (and also generates a dummy `yyFlexLexer::yylex()'
that calls `yyFlexLexer::LexerError()' if called).

`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
reassigns yyin to new_in (if non-nil) and yyout to new_out (ditto),
deleting the previous input buffer if yyin is reassigned.

`int yylex( istream* new_in = 0, ostream* new_out = 0 )'
first switches the input streams via `switch_streams( new_in, new_out )' and
then returns the value of `yylex()'.

In addition, yyFlexLexer defines the following protected virtual functions
which you can redefine in derived classes to tailor the scanner:

`virtual int LexerInput( char* buf, int max_size )'
reads up to `max_size' characters into buf and returns the number of characters read.
To indicate end-of-input, return 0 characters. Note that "interactive" scanners
(see the `-B' and `-I' flags) define the macro YY_INTERACTIVE.
If you redefine LexerInput() and need to take different actions depending on whether or not
the scanner might be scanning an interactive input source, you can test for the presence
of this name via `#ifdef'.

`virtual void LexerOutput( const char* buf, int size )'
writes out size characters from the buffer buf, which, while NUL-terminated,
may also contain "internal" NUL's if the scanner's rules can match text with NUL's in them.

`virtual void LexerError( const char* msg )'
reports a fatal error message. The default version of this function writes the message
to the stream cerr and exits.

Note that a yyFlexLexer object contains its entire scanning state. Thus you can use such objects
to create reentrant scanners. You can instantiate multiple instances of the same yyFlexLexer class,
and you can also combine multiple C++ scanner classes together in the same program using the `-P' option
discussed above. Finally, note that the `%array' feature is not available to C++ scanner classes;
you must use `%pointer' (the default).

Here is an example of a simple C++ scanner:

    // An example of using the flex C++ scanner class.

%{
#include <iostream>
using namespace std;   // the actions below use cout unqualified

int mylineno = 0;
%}

string  \"[^\n"]+\"

ws      [ \t]+

alpha   [A-Za-z]
dig     [0-9]
name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
num1   [-+]?{dig}+\.?([eE][-+]?{dig}+)?
num2   [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
number  {num1}|{num2}

%%

{ws}    /* skip blanks and tabs */

"/*"    {
        int c;

        while((c = yyinput()) != 0)
            {
            if(c == '\n')
                ++mylineno;

            else if(c == '*')
                {
                if((c = yyinput()) == '/')
                    break;
                else
                    unput(c);
                }
            }
        }

{number}  cout << "number " << YYText() << '\n';

\n        mylineno++;

{name}    cout << "name " << YYText() << '\n';

{string}  cout << "string " << YYText() << '\n';

%%

int main( int /* argc */, char** /* argv */ )
    {
    FlexLexer* lexer = new yyFlexLexer;
    while(lexer->yylex() != 0)
        ;
    return 0;
    }


 If you want to create multiple (different) lexer classes,
 you use the `-P' flag (or the `prefix=' option) to rename each yyFlexLexer to some other xxFlexLexer.
 You then can include `<FlexLexer.h>' in your other sources once per lexer class,
 first renaming yyFlexLexer as follows:
#undef yyFlexLexer
#define yyFlexLexer xxFlexLexer
#include <FlexLexer.h>

#undef yyFlexLexer
#define yyFlexLexer zzFlexLexer
#include <FlexLexer.h>


 if, for example, you used `%option prefix="xx"' for one of your scanners and `%option prefix="zz"' for the other.

IMPORTANT: the present form of the scanning class is experimental and may change considerably between major releases.

Incompatibilities with lex and POSIX

flex is a rewrite of the AT&T Unix lex tool (the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which are of concern to those who wish to write scanners
acceptable to either implementation. Flex is fully compliant with the POSIX lex specification,
except that when using `%pointer' (the default), a call to `unput()' destroys the contents of yytext,
which is counter to the POSIX specification.

In this section we discuss all of the known areas of incompatibility between flex, AT&T lex,
and the POSIX specification.

flex's `-l' option turns on maximum compatibility with the original AT&T lex implementation,
at the cost of a major loss in the generated scanner's performance.
We note below which incompatibilities can be overcome using the `-l' option.

flex is fully compatible with lex with the following exceptions:

The undocumented lex scanner internal variable yylineno is not supported unless `-l' or `%option yylineno' is used.
yylineno should be maintained on a per-buffer basis, rather than a per-scanner (single global variable) basis.
yylineno is not part of the POSIX specification.

The `input()' routine is not redefinable, though it may be called to read characters following whatever
has been matched by a rule. If `input()' encounters an end-of-file the normal `yywrap()' processing is done.
A "real" end-of-file is returned by `input()' as EOF. Input is instead controlled by defining the YY_INPUT macro.
The flex restriction that `input()' cannot be redefined is in accordance with the POSIX specification,
which simply does not specify any way of controlling the scanner's input other than by making an initial
assignment to yyin.
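
As a rough sketch of what controlling input through YY_INPUT can look like, the function below serves characters from an in-memory string using the contract YY_INPUT expects: copy at most max_size characters into the buffer and report how many were copied, with 0 meaning end of input. The names `string_source' and `string_input' are hypothetical and not part of flex.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source; not part of flex. */
struct string_source {
    const char *text;  /* NUL-terminated data to be scanned */
    size_t pos;        /* offset of the next unread character */
};

/* Copies up to max_size characters into buf and returns the number copied.
 * A return value of 0 signals end of input. */
size_t string_input(struct string_source *src, char *buf, size_t max_size)
{
    size_t remaining = strlen(src->text) - src->pos;
    size_t n = remaining < max_size ? remaining : max_size;

    memcpy(buf, src->text + src->pos, n);
    src->pos += n;
    return n;
}
```

In a scanner's definitions section this could then be hooked up with a line such as `#define YY_INPUT(buf,result,max_size) (result) = string_input(&my_source, (buf), (max_size));', where `my_source' is an assumed variable of type `struct string_source'.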

The `unput()' routine is not redefinable. This restriction is in accordance with POSIX.

flex scanners are not as reentrant as lex scanners. In particular, if you have an interactive scanner
and an interrupt handler which long-jumps out of the scanner, and the scanner is subsequently called again,
you may get the following message:

fatal flex scanner internal error--end of buffer missed

 To reenter the scanner, first use
yyrestart( yyin );

 Note that this call will throw away any buffered input; usually this isn't a problem with an interactive scanner.
 Also note that flex C++ scanner classes are reentrant, so if using C++ is an option for you,
 you should use them instead.  See "Generating C++ Scanners" above for details.

`output()' is not supported. Output from the `ECHO' macro is done to the file-pointer yyout (default stdout).
`output()' is not part of the POSIX specification.

lex does not support exclusive start conditions (%x), though they are in the POSIX specification.

When definitions are expanded, flex encloses them in parentheses. With lex, the following:
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf( "Found it\n" );
%%

 will not match the string "foo" because when the macro is expanded the rule is
 equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is associated with "[A-Z0-9]*".
 With flex, the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
 Note that if the definition begins with `^' or ends with `$' then it is not expanded with parentheses,
 to allow these operators to appear in definitions without losing their special meanings. But the `<s>', `/',
 and `<<EOF>>' operators cannot be used in a flex definition. Using `-l' results in the lex behavior of
 no parentheses around the definition. The POSIX specification is that the definition be enclosed in parentheses.

Some implementations of lex allow a rule's action to begin on a separate line, if the rule's pattern
has trailing whitespace:
%%
foo|bar<space here>
  { foobar_action(); }

flex does not support this feature.

The lex `%r' (generate a Ratfor scanner) option is not supported. It is not part of the POSIX specification.

After a call to `unput()', yytext is undefined until the next token is matched, unless the scanner was built
using `%array'. This is not the case with lex or the POSIX specification. The `-l' option does away with this incompatibility.

The precedence of the `{}' (numeric range) operator is different. lex interprets "abc{1,3}" as "match one,
two, or three occurrences of 'abc'", whereas flex interprets it as "match 'ab' followed by one, two,
or three occurrences of 'c'". The latter is in agreement with the POSIX specification.

The precedence of the `^' operator is different. lex interprets "^foo|bar" as "match either 'foo'
at the beginning of a line, or 'bar' anywhere", whereas flex interprets it as "match either 'foo' or 'bar'
if they come at the beginning of a line". The latter is in agreement with the POSIX specification.

The special table-size declarations such as `%a' supported by lex are not required by flex scanners;
flex ignores them.

The name FLEX_SCANNER is #define'd so scanners may be written for use with either flex or lex.
Scanners also include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION indicating which version of flex
generated the scanner (for example, for the 2.5 release, these defines would be 2 and 5 respectively).
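
Two of the precedence differences listed above, parenthesized definition expansion and the `{}' operator, can be tried out without building a scanner. The sketch below uses POSIX extended regular expressions from <regex.h>. That is an assumption of convenience: it is not flex's own pattern language, but on these two points its precedence coincides with the flex reading. The helper name `matches' is hypothetical; patterns are anchored with `^' and `$' by the caller so the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the POSIX extended regular
 * expression in pattern, 0 otherwise (also 0 if the pattern is invalid).
 *
 * Patterns of interest for the discussion above:
 *   "^foo([A-Z][A-Z0-9]*)?$"   foo{NAME}? as flex expands it
 *   "^foo[A-Z]([A-Z0-9]*)?$"   the lex reading, with '?' bound to
 *                              [A-Z0-9]* only (grouping made explicit)
 *   "^abc{1,3}$"               'ab' followed by one to three 'c'
 *   "^(abc){1,3}$"             one to three occurrences of 'abc'
 */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With this helper, `matches("^foo([A-Z][A-Z0-9]*)?$", "foo")' returns 1 while `matches("^foo[A-Z]([A-Z0-9]*)?$", "foo")' returns 0, and "^abc{1,3}$" accepts "abccc" but rejects "abcabc".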

The following flex features are not included in lex or the POSIX specification:

C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%{}'s around actions
multiple actions on a line


 plus almost all of the flex flags. The last feature in the list refers to the fact
 that with flex you can put multiple actions on the same line, separated with semicolons,
 while with lex, the following
foo    handle_foo();++num_foos_seen;


 is (rather surprisingly) truncated to
foo    handle_foo();


flex does not truncate the action. Actions that are not enclosed in braces are simply terminated at the end of the line.

Diagnostics

`warning, rule cannot be matched'
indicates that the given rule cannot be matched because it follows other rules that will always match the same text as it.
For example, in the following "foo" cannot be matched because it comes after an identifier "catch-all" rule:
[a-z]+    got_identifier();
foo       got_foo();

 Using REJECT in a scanner suppresses this warning.

`warning, -s option given but default rule can be matched'
means that it is possible (perhaps only in a particular start condition) that
the default rule (match any single character) is the only one that will match a particular input.
Since `-s' was given, presumably this is not intended.

`reject_used_but_not_detected undefined'
`yymore_used_but_not_detected undefined'
These errors can occur at compile time. They indicate that the scanner uses REJECT or `yymore()' but
that flex failed to notice the fact, meaning that flex scanned the first two sections looking for
occurrences of these actions and failed to find any, but somehow you snuck some in (via a #include file,
for example). Use `%option reject' or `%option yymore' to indicate to flex that you really do use these features.

`flex scanner jammed'
a scanner compiled with `-s' has encountered an input string which wasn't matched by any of its rules.
This error can also occur due to internal problems.

`token too large, exceeds YYLMAX'
your scanner uses `%array' and one of its rules matched a string longer than the YYLMAX constant
(8K bytes by default). You can increase the value by #define'ing YYLMAX in the definitions section of
your flex input.

`scanner requires -8 flag to use the character 'x''
Your scanner specification includes recognizing the 8-bit character x and you did not specify the -8 flag,
and your scanner defaulted to 7-bit because you used the `-Cf' or `-CF' table compression options.
See the discussion of the `-7' flag for details.

`flex scanner push-back overflow'
you used `unput()' to push back so much text that the scanner's buffer could not hold both the pushed-back
text and the current token in yytext. Ideally the scanner should dynamically resize the buffer in this case,
but at present it does not.

`input buffer overflow, can't enlarge buffer because scanner uses REJECT'
the scanner was working on matching an extremely large token and needed to expand the input buffer.
This doesn't work with scanners that use REJECT.

`fatal flex scanner internal error--end of buffer missed'
This can occur in a scanner which is reentered after a long-jump has jumped out (or over) the scanner's
activation frame. Before reentering the scanner, use:
yyrestart( yyin );

 or, as noted above, switch to using the C++ scanner class.

`too many start conditions in <> construct!'
you listed more start conditions in a <> construct than exist (so you must have listed at least
one of them twice).

Files

`-lfl'
library with which scanners must be linked.

`lex.yy.c'
generated scanner (called `lexyy.c' on some systems).

`lex.yy.cc'
generated C++ scanner class, when using `-+'.

`<FlexLexer.h>'
header file defining the C++ scanner base class, FlexLexer, and its derived class, yyFlexLexer.

`flex.skl'
skeleton scanner. This file is only used when building flex, not when flex executes.

`lex.backup'
backing-up information for `-b' flag (called `lex.bck' on some systems).

Deficiencies / Bugs

Some trailing context patterns cannot be properly matched and generate warning
messages ("dangerous trailing context"). These are patterns where the ending of the first
part of the rule matches the beginning of the second part, such as "zx*/xy*",
where the 'x*' matches the 'x' at the beginning of the trailing context.
(Note that the POSIX draft states that the text matched by such patterns is undefined.)

For some trailing context rules, parts which are actually fixed-length are not recognized as such,
leading to the abovementioned performance loss. In particular, parts using '|' or {n} (such as "foo{3}")
are always considered variable-length.

Combining trailing context with the special '|' action can result in fixed trailing context being
turned into the more expensive variable trailing context. For example, in the following:


%%
abc      |
xyz/def


Use of `unput()' invalidates yytext and yyleng, unless the `%array' directive
or the `-l' option has been used.

Pattern-matching of NUL's is substantially slower than matching other characters.

Dynamic resizing of the input buffer is slow, as it entails rescanning all the text matched so far
by the current (generally huge) token.

Due to both buffering of input and read-ahead, you cannot intermix calls to <stdio.h> routines,
such as, for example, `getchar()', with flex rules and expect it to work. Call `input()' instead.
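
The effect of read-ahead on later <stdio.h> calls can be reproduced outside flex. The following is only an illustration under stated assumptions: the function name `next_char_after_read_ahead' is hypothetical, a temporary file stands in for yyin, and a single `fread()' plays the part of the scanner filling its buffer. Real flex buffers are much larger than the few bytes used here.

```c
#include <stdio.h>

/* Hypothetical demonstration, not flex code: writes data to a temporary
 * file, reads read_ahead characters from it in one block (as a buffering
 * scanner would), and returns the character that a subsequent getc() on
 * the same stream delivers.  Returns EOF on any failure. */
int next_char_after_read_ahead(const char *data, size_t read_ahead)
{
    char block[64];
    FILE *fp;
    int next = EOF;

    if (read_ahead > sizeof block)
        return EOF;
    fp = tmpfile();
    if (fp == NULL)
        return EOF;
    if (fputs(data, fp) != EOF) {
        rewind(fp);
        /* The block is now held by the "scanner"; the stream itself has
         * already moved past it, whatever part of it was matched. */
        if (fread(block, 1, read_ahead, fp) == read_ahead)
            next = getc(fp);
    }
    fclose(fp);
    return next;
}
```

With the data "abcdefghij" and a read-ahead of 8, a scanner that has so far matched only "abc" would expect 'd' to come next, but `getc()' returns 'i': the characters in between sit in the scanner's buffer and are reachable only through `input()'.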

The total table entries listed by the `-v' flag excludes the number of table entries
needed to determine what rule has been matched. The number of entries is equal to the number of DFA states
if the scanner does not use REJECT, and somewhat greater than the number of states if it does.

REJECT cannot be used with the `-f' or `-F' options.

The flex internal algorithms need documentation.

See also


lex(1), yacc(1), sed(1), awk(1).


John Levine, Tony Mason, and Doug Brown: Lex & Yacc; O'Reilly and Associates. Be sure to get the 2nd edition.


M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.


Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers: Principles, Techniques and Tools; Addison-Wesley (1986). Describes the pattern-matching techniques used by flex (deterministic finite automata).


Author


Vern Paxson, with the help of many ideas and much inspiration from Van Jacobson. Original version by Jef Poskanzer. The fast table representation is a partial implementation of a design done by Van Jacobson. The implementation was done by Kevin Gong and Vern Paxson.


Thanks to the many flex beta-testers, feedbackers, and contributors, especially Francois Pinard, Casey Leedom, Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai, Nelson H.F. Beebe, `[email protected]', Karl Berry, Peter A. Bigot, Simon Blanchard, Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels, Chris G. Demetriou, Theo Deraadt, Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris Flatters, Jon Forrest, Joe Gayda, Kaveh R. Ghazi, Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel, Jan Hajic, Charles Hemphill, NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane, Amir Katz, `[email protected]', Kevin B. Kenny, Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle, Mike Long, Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum, G.T. Nicol, Landon Noll, James Nordby, Marc Nozell, Richard Ohnemus, Karsten Pahnke, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin, Rick Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex Siegel, Eckehard Stolz, Jan-Erik Strvmquist, Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn, and those whose names have slipped my marginal mail-archiving skills but whose contributions are appreciated all the same.


Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various distribution headaches.


Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to Eric Hughes for support of multiple buffers.


This work was primarily done when I was with the Real Time Systems Group at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there for the support I received.


Send comments to `[email protected]'.



This document was generated on 23 February 2001 using texi2html 1.56k.










 

