ganggexiongqi

Learning Yacc and Lex [2]

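I found an article that I think is helpful, and I have excerpted the parts I consider worth reading here. If you want to read the original page, follow this URL:
http://ds9a.nl/lex-yacc/cvs/lex-yacc-howto.html or go straight to Appendix A at the end of this post.
There are also many e-books on yacc and lex, which I will upload to my downloads area as soon as I can.
OK, on to the main topic. (My translation skills are limited, so please bear with me!)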
3. Lex
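   The Lex tool generates a lexer. A lexer takes a stream of characters as input and, whenever it reads something that matches a predefined key, performs the action defined for that key. Here is a simple example:
%{
#include <stdio.h>
%}
%%
stop    printf("Stop command received\n");
start   printf("Start command received\n");
%%
In this example, everything between %{ and %} is copied directly into the generated program. We need it because printf is declared in stdio.h.
The second section begins and ends with %%. Whenever the key 'stop' is encountered, the printf("Stop command received\n"); that follows it is executed. The same goes for 'start'.
To compile, follow these steps (assuming you saved the file as example1.l; note that the extension is the letter l, not the digit 1):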
$ lex example1.l
$ gcc lex.yy.c -o example1 -ll
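Note: if you use flex rather than lex, you may need '-lfl' instead of '-ll'. This is the case on RedHat 6.x and SuSE.
This produces an executable named 'example1'. If you run it (./example1), it waits for your input. Whenever you type a string that matches a key, the corresponding message is printed. End the program with Ctrl+D (^D).
You may wonder how this program runs at all, since we never defined a main() function. The function is in fact defined for you in libl (liblex), the library we linked in with -ll.
3.1 Regular expressions in matches
The example follows; compile and run it the same way as the previous one. For more on regular expressions, see Learning Yacc and Lex [1].
%{
#include <stdio.h>
%}
%%
[0123456789]+           printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");
%%
3.2 A more complicated example with a C-like syntax
How would we parse a file like the following?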
logging {
        category lame-servers { null; };
        category cname { null; };
};
zone "." {
        type hint;
        file "/etc/bind/db.root";
};
We can clearly see several kinds of tokens in this file:
    * WORDs, such as 'zone' and 'type'
    * FILENAMEs, such as '/etc/bind/db.root'
    * QUOTEs, the quotation marks around the filename
    * OBRACEs, {
    * EBRACEs, }
    * SEMICOLONs, ;
The corresponding Lex file (example3.l) is:
%{
#include <stdio.h>
%}
%%
[a-zA-Z][a-zA-Z0-9]*    printf("WORD ");
[a-zA-Z0-9\/.-]+        printf("FILENAME ");
\"                      printf("QUOTE ");
\{                      printf("OBRACE ");
\}                      printf("EBRACE ");
;                       printf("SEMICOLON ");
\n                      printf("\n");
[ \t]+                  /* ignore whitespace */;
%%
Compile it as in the earlier examples. To test it, paste the sample text above into a file such as test.txt (the extension does not matter), then run
$ cat test.txt | ./example3
and you will see this output:
WORD OBRACE
WORD FILENAME OBRACE WORD SEMICOLON EBRACE SEMICOLON
WORD WORD OBRACE WORD SEMICOLON EBRACE SEMICOLON
EBRACE SEMICOLON
WORD QUOTE FILENAME QUOTE OBRACE
WORD WORD SEMICOLON
WORD QUOTE FILENAME QUOTE SEMICOLON
EBRACE SEMICOLON
4. YACC
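    YACC parses an input stream made up of tokens that carry certain values. YACC has no idea what an 'input stream' is; it needs preprocessed tokens, and that neatly describes the relationship between YACC and Lex.
4.1 A simple thermostat controller
We want to control a thermostat with simple commands. A session with the thermostat might look like this: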
heat on
              Heater on!
heat off
              Heater off!
target temperature 22
                New temperature set!
The tokens we need to recognize are: heat, on/off (STATE), target, temperature, NUMBER.
The code for the tokenizer is:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+                  return NUMBER;
heat                    return TOKHEAT;
on|off                  return STATE;
target                  return TOKTARGET;
temperature             return TOKTEMPERATURE;
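\n                      /* ignore end of line */;
[ \t]+                  /* ignore whitespace */;
%%
We note two important changes. First, we include the file 'y.tab.h'. Second, we no longer print anything; we simply return token names. We make this change because the tokens are now fed to YACC, which is not interested in what we print to the screen. y.tab.h contains the definitions of these tokens.
But where does y.tab.h come from? YACC generates it from the grammar file we are about to create: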
commands: /* empty */
        | commands command
        ;
command:
       heat_switch
        |
        target_set
        ;
heat_switch:
        TOKHEAT STATE
        {
                printf("\tHeat turned on or off\n");
        }
        ;
target_set:
        TOKTARGET TOKTEMPERATURE NUMBER
        {
                printf("\tTemperature set\n");
        }
        ;
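I call the first part the 'root'. It tells us that we have 'commands', and that these consist of individual 'command' parts. You can see that the rule is recursive, because it contains 'commands' again. This means the program can parse a series of commands one at a time. For important details on recursion, read the chapter 'How do Lex and YACC work internally'.
The second rule defines what a command is. We support only two kinds of command: 'heat_switch' and 'target_set'. This is what the '|' symbol expresses: a command consists of either a heat_switch or a target_set.
A heat_switch consists of the HEAT token (the word 'heat') followed by a state (defined in the Lex file as 'on' or 'off').
Slightly more complicated is target_set, which consists of the TARGET token (the word 'target'), the TEMPERATURE token (the word 'temperature') and a number.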
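To make the grammar rules above more concrete, here is a rough hand-written C equivalent. It is only a sketch of what the rules accept, not what YACC actually generates. The function name count_commands and the numeric token values are my own assumptions; in the real build the token codes come from y.tab.h.

```c
#include <assert.h>
#include <stddef.h>

/* Token codes; in the real build these come from y.tab.h. The values
   here are assumptions chosen for this sketch. */
enum { NUMBER = 258, TOKHEAT, STATE, TOKTARGET, TOKTEMPERATURE };

/* Hand-written equivalent of the grammar:
     commands: empty | commands command
     command:  heat_switch | target_set
   Returns the number of commands recognized, or -1 on a syntax error. */
int count_commands(const int *tokens, size_t n)
{
    size_t i = 0;
    int commands = 0;

    assert(tokens != NULL || n == 0);
    while (i < n) {
        if (tokens[i] == TOKHEAT && i + 1 < n && tokens[i + 1] == STATE) {
            i += 2;                 /* heat_switch: TOKHEAT STATE */
        } else if (tokens[i] == TOKTARGET && i + 2 < n &&
                   tokens[i + 1] == TOKTEMPERATURE &&
                   tokens[i + 2] == NUMBER) {
            i += 3;                 /* target_set: TOKTARGET TOKTEMPERATURE NUMBER */
        } else {
            return -1;              /* no rule matches: syntax error */
        }
        commands++;
    }
    return commands;
}
```

Feeding it the token sequence for "heat on" followed by "target temperature 22" yields 2, while a sequence such as TOKTARGET STATE is rejected. The loop plays the role of the recursive 'commands' rule.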
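A complete YACC file
The previous section showed only the grammar part of the YACC file, but there is more. Here is the header we left out earlier:
%{
#include <stdio.h>
#include <string.h>
/* provided by the generated scanner and parser */
int yylex(void);
int yyparse(void);
void yyerror(const char *str)
{
        fprintf(stderr,"error: %s\n",str);
}
int yywrap(void)
{
        return 1;
}
int main(void)
{
        yyparse();
        return 0;
}
%}
%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE
YACC calls the yyerror() function whenever it finds an error. We simply print the message passed in, but there are smarter things to do; see the 'Further reading' section at the end.
The yywrap() function can be used to continue reading from another file. For more on this, read the chapter 'How do Lex and YACC work internally'.
The main() function does nothing but set everything in motion.
The last line simply defines the tokens we will use. They are written to y.tab.h if YACC is invoked with the '-d' option.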
lex example4.l
yacc -d example4.y
gcc lex.yy.c y.tab.c -o example4
Here are the complete files.
/* file example4.l */
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+                  return NUMBER;
heat                    return TOKHEAT;
on|off                  return STATE;
target                  return TOKTARGET;
temperature             return TOKTEMPERATURE;
\n                      /* ignore end of line */;
[ \t]+                  /* ignore whitespace */;
%%
/* file example4.y */
%{
#include <stdio.h>
#include <string.h>
/* provided by the generated scanner and parser */
int yylex(void);
int yyparse(void);
void yyerror(const char *str)
{
        fprintf(stderr,"error: %s\n",str);
}
int yywrap(void)
{
        return 1;
}
int main(void)
{
        yyparse();
        return 0;
}
%}
%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE
%%
commands: /* empty */
        | commands command
        ;
command:
        heat_switch
        |
        target_set
        ;
heat_switch:
        TOKHEAT STATE
        {
                printf("\tHeat turned on or off\n");
        }
        ;
target_set:
        TOKTARGET TOKTEMPERATURE NUMBER
        {
                printf("\tTemperature set\n");
        }
        ;
%%
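4.2 Extending the thermostat to handle parameters
   As we have seen, we now parse the thermostat commands correctly and even flag bad input properly. But as you may have guessed, the program has no idea what it should do, because none of the values you enter are passed along to it.
   Let's start by adding the ability to read the new target temperature. To do so, we need to teach the NUMBER match in the lexer to convert itself into an integer value that YACC can then read.
Whenever Lex matches a target, it puts the matched text in the character string 'yytext'. YACC in turn expects to find a value in 'yylval'. Example 5 shows the solution: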
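%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strcmp */
#include "y.tab.h"
%}
%%
[0-9]+                  yylval=atoi(yytext); return NUMBER;
heat                    return TOKHEAT;
on|off                  yylval=!strcmp(yytext,"on"); return STATE;
target                  return TOKTARGET;
temperature             return TOKTEMPERATURE;
\n                      /* ignore end of line */;
[ \t]+                  /* ignore whitespace */;
%%
As you can see, we run atoi() on yytext and store the result in yylval, where YACC can see it. We do much the same for the STATE match: we compare it with 'on' and set yylval to 1 if they are equal. Note that separate 'on' and 'off' matches in Lex would produce faster code, but I wanted to show a more complicated rule and action.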
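The two conversions can be tried out in plain C, outside of Lex. This is only an illustrative sketch: the helper names number_value and state_value are made up for this post, and in the real lexer the expressions sit directly in the rule actions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value for a NUMBER token: the matched digits converted to an int. */
int number_value(const char *yytext)
{
    assert(yytext != NULL);
    return atoi(yytext);
}

/* Value for a STATE token: strcmp() returns 0 on equality, so the
   negation yields 1 for "on" and 0 for anything else (here, "off"). */
int state_value(const char *yytext)
{
    assert(yytext != NULL);
    return !strcmp(yytext, "on");
}
```

So "22" becomes the integer 22, "on" becomes 1 and "off" becomes 0, which are the values the grammar actions later read back.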
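Now we need to see how YACC deals with all this. What Lex calls 'yylval' has a different name in YACC. Let's examine the rule that sets the new target temperature:
target_set:
        TOKTARGET TOKTEMPERATURE NUMBER
        {
                printf("\tTemperature set to %d\n",$3);
        }
        ;
To access the value of the third part of the rule (NUMBER), we use $3. Whenever yylex() returns, the value associated with the returned token can be read using the $-construct.
To take this further, let's look at the new 'heat_switch' rule:
heat_switch:
        TOKHEAT STATE
        {
                if($2)
                        printf("\tHeat turned on\n");
                else
                        printf("\tHeat turned off\n");
        }
        ;
You can now run example5. It will correctly print back what you entered.
The remaining sections are not translated.
Appendix A:
http://ds9a.nl/lex-yacc/cvs/lex-yacc-howto.html [original article]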
1. Introduction
Welcome, gentle reader.
If you have been programming for any length of time in a Unix environment, you will have encountered the mystical programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by Vern Paxson and Bison the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are upwardly compatible, so you can use Flex and Bison when trying our examples.
These programs are massively useful, but as with your C compiler, their manpage does not explain the language they understand, nor how to use them. YACC is really amazing when used in combination with Lex, however, the Bison manpage does not describe how to integrate Lex generated code with your Bison program.
1.1 What this document is NOT
There are several great books which deal with Lex & YACC. By all means read these books if you need to know more. They provide far more information than we ever will. See the 'Further Reading' section at the end. This document is aimed at bootstrapping your use of Lex & YACC, to allow you to create your first programs.
The documentation that comes with Flex and BISON is also excellent, but no tutorial. They do complement my HOWTO very well though. They too are referenced at the end.
I am by no means a YACC/Lex expert. When I started writing this document, I had exactly two days of experience. All I want to accomplish is to make those two days easier for you.
In no way expect the HOWTO to show proper YACC and Lex style. Examples have been kept very simple and there may be better ways to write them. If you know how to, please let me know.
1.2 Downloading stuff
Please note that you can download all the examples shown, which are in machine readable form. See the homepage for details.
1.3 License
Copyright (c) 2001 by bert hubert. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, vX.Y or later (the latest version is presently available at http://www.opencontent.org/openpub/).
2. What Lex & YACC can do for you
When properly used, these programs allow you to parse complex languages with ease. This is a great boon when you want to read a configuration file, or want to write a compiler for any language you (or anyone else) might have invented.
With a little help, which this document will hopefully provide, you will find that you will never write a parser again by hand - Lex & YACC are the tools to do this.
2.1 What each program does on its own
Although these programs shine when used together, they each serve a different purpose. The next chapter will explain what each part does.
3. Lex
The program Lex generates a so called `Lexer'. This is a function that takes a stream of characters as its input, and whenever it sees a group of characters that match a key, takes a certain action. A very simple example:
    %{
    #include <stdio.h>
    %}
    %%
    stop    printf("Stop command received\n");
    start   printf("Start command received\n");
    %%
The first section, in between the %{ and %} pair is included directly in the output program. We need this, because we use printf later on, which is defined in stdio.h.
Sections are separated using '%%', so the first line of the second section starts with the 'stop' key. Whenever the 'stop' key is encountered in the input, the rest of the line (a printf() call) is executed.
Besides 'stop', we've also defined 'start', which otherwise does mostly the same.
We terminate the code section with '%%' again.
To compile Example 1, do this:
    lex example1.l
    cc lex.yy.c -o example1 -ll
    NOTE: If you are using flex, instead of lex, you may have to change '-ll' to '-lfl' in the compilation scripts. RedHat 6.x and SuSE need this, even when you invoke 'flex' as 'lex'!
This will generate the file 'example1'. If you run it, it waits for you to type some input. Whenever you type something that is not matched by any of the defined keys (i.e., 'stop' and 'start'), it is output again. If you enter 'stop', it will output 'Stop command received'.
Terminate with an EOF (^D).
You may wonder how the program runs, as we didn't define a main() function. This function is defined for you in libl (liblex) which we compiled in with the -ll command.
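The matching and echoing behaviour of Example 1 can be imitated in plain C. The following is a rough model written for illustration, not the code Lex generates; the function name scan_example1 and the output buffer interface are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Scans `in` the way the Example 1 lexer would: at each position the
   keys are tried, and any character that matches no rule is copied
   through unchanged (Lex's default action). Output goes to `out`,
   which holds at most `cap` bytes including the terminator. */
void scan_example1(const char *in, char *out, size_t cap)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*in != '\0' && used + 1 < cap) {
        int n;
        if (strncmp(in, "stop", 4) == 0) {
            n = snprintf(out + used, cap - used, "Stop command received\n");
            in += 4;
        } else if (strncmp(in, "start", 5) == 0) {
            n = snprintf(out + used, cap - used, "Start command received\n");
            in += 5;
        } else {
            n = snprintf(out + used, cap - used, "%c", *in);
            in += 1;
        }
        if (n < 0)
            break;
        /* advance, clamping if the output was truncated */
        used += ((size_t)n < cap - used) ? (size_t)n : cap - used - 1;
    }
}
```

Scanning the line "stop" followed by a newline produces the message and then the echoed newline, which is why the real program prints an empty line after each recognized command.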
3.1 Regular expressions in matches
This example wasn't very useful in itself, and our next one won't be either. It will however show how to use regular expressions in Lex, which are massively useful later on.
Example 2:
    %{
    #include <stdio.h>
    %}
    %%
    [0123456789]+           printf("NUMBER\n");
    [a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");
    %%
This Lex file describes two kinds of matches (tokens): WORDs and NUMBERs. Regular expressions can be pretty daunting but with only a little work it is easy to understand them. Let's examine the NUMBER match:
[0123456789]+
This says: a sequence of one or more characters from the group 0123456789. We could also have written it shorter as:
[0-9]+
Now, the WORD match is somewhat more involved:
[a-zA-Z][a-zA-Z0-9]*
The first part matches 1 and only 1 character that is between 'a' and 'z', or between 'A' and 'Z'. In other words, a letter. This initial letter then needs to be followed by zero or more characters which are either a letter or a digit. Why use an asterisk here? The '+' signifies 1 or more matches, but a WORD might very well consist of only one character, which we've already matched. So the second part may have zero matches, so we write a '*'.
This way, we've mimicked the behaviour of many programming languages which demand that a variable name *must* start with a letter, but can contain digits afterwards. In other words, 'temperature1' is a valid name, but '1temperature' is not.
Try compiling Example 2, just like Example 1, and feed it some text. Here is a sample session:
    $ ./example2
    foo
    WORD
    bar
    WORD
    123
    NUMBER
    bar123
    WORD
    123bar
    NUMBER
    WORD
You may also be wondering where all this whitespace is coming from in the output. The reason is simple: it was in the input, and we don't match on it anywhere, so it gets output again.
The Flex manpage documents its regular expressions in detail. Many people feel that the perl regular expression manpage (perlre) is also very useful, although Flex does not implement everything perl does.
Make sure that you do not create zero length matches like '[0-9]*' - your lexer might get confused and start matching empty strings repeatedly.
3.2 A more complicated example for a C like syntax
Let's say we want to parse a file that looks like this:
    logging {
            category lame-servers { null; };
            category cname { null; };
    };
    zone "." {
            type hint;
            file "/etc/bind/db.root";
    };
We clearly see a number of categories (tokens) in this file:
    * WORDs, like 'zone' and 'type'
    * FILENAMEs, like '/etc/bind/db.root'
    * QUOTEs, like those surrounding the filename
    * OBRACEs, {
    * EBRACEs, }
    * SEMICOLONs, ;
The corresponding Lex file is Example 3:
    %{
    #include <stdio.h>
    %}
    %%
    [a-zA-Z][a-zA-Z0-9]*    printf("WORD ");
    [a-zA-Z0-9\/.-]+        printf("FILENAME ");
    \"                      printf("QUOTE ");
    \{                      printf("OBRACE ");
    \}                      printf("EBRACE ");
    ;                       printf("SEMICOLON ");
    \n                      printf("\n");
    [ \t]+                  /* ignore whitespace */;
    %%
When we feed our file to the program this Lex file generates (using example3.compile), we get:
    WORD OBRACE
    WORD FILENAME OBRACE WORD SEMICOLON EBRACE SEMICOLON
    WORD WORD OBRACE WORD SEMICOLON EBRACE SEMICOLON
    EBRACE SEMICOLON
    WORD QUOTE FILENAME QUOTE OBRACE
    WORD WORD SEMICOLON
    WORD QUOTE FILENAME QUOTE SEMICOLON
    EBRACE SEMICOLON
When compared with the configuration file mentioned above, it is clear that we have neatly 'Tokenized' it. Each part of the configuration file has been matched, and converted into a token.
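One detail of the output deserves a note: 'zone' matches both the WORD and the FILENAME pattern, yet it is reported as WORD, while 'lame-servers' is reported as FILENAME. Lex prefers the longest match and, on a tie, the rule listed first. The sketch below imitates that choice for these two patterns only; the function names are invented for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the prefix of s matching [a-zA-Z][a-zA-Z0-9]*, or 0. */
size_t match_word(const char *s)
{
    size_t n = 0;

    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of the prefix of s matching [a-zA-Z0-9\/.-]+, or 0. */
size_t match_filename(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0' &&
           (isalnum((unsigned char)s[n]) || strchr("/.-", s[n]) != NULL))
        n++;
    return n;
}

/* Picks the rule the way Lex does: the longest match wins, and on a
   tie the rule listed first (WORD) wins. */
const char *classify(const char *s)
{
    size_t w, f;

    assert(s != NULL);
    w = match_word(s);
    f = match_filename(s);
    if (w == 0 && f == 0)
        return "NONE";
    return (f > w) ? "FILENAME" : "WORD";
}
```

For 'lame-servers' the WORD pattern stops at the hyphen after four characters, whereas the FILENAME pattern covers all twelve, so FILENAME wins.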
And this is exactly what we need to put YACC to good use.
3.3 What we've seen
We've seen that Lex is able to read arbitrary input, and determine what each part of the input is. This is called 'Tokenizing'.
4. YACC
YACC can parse input streams consisting of tokens with certain values. This clearly describes the relation YACC has with Lex: YACC has no idea what 'input streams' are, it needs preprocessed tokens. While you can write your own Tokenizer, we will leave that entirely up to Lex.
A note on grammars and parsers. When YACC saw the light of day, the tool was used to parse input files for compilers: programs. Programs written in a programming language for computers are typically *not* ambiguous - they have just one meaning. As such, YACC does not cope with ambiguity and will complain about shift/reduce or reduce/reduce conflicts. More about ambiguity and YACC "problems" can be found in the 'Conflicts' chapter.
4.1 A simple thermostat controller
Let's say we have a thermostat that we want to control using a simple language. A session with the thermostat may look like this:
    heat on
            Heater on!
    heat off
            Heater off!
    target temperature 22
            New temperature set!
The tokens we need to recognize are: heat, on/off (STATE), target, temperature, NUMBER.
The Lex tokenizer (Example 4) is:
    %{
    #include <stdio.h>
    #include "y.tab.h"
    %}
    %%
    [0-9]+                  return NUMBER;
    heat                    return TOKHEAT;
    on|off                  return STATE;
    target                  return TOKTARGET;
    temperature             return TOKTEMPERATURE;
    \n                      /* ignore end of line */;
    [ \t]+                  /* ignore whitespace */;
    %%
We note two important changes. First, we include the file 'y.tab.h', and secondly, we no longer print stuff, we return names of tokens. This change is because we are now feeding it all to YACC, which isn't interested in what we output to the screen. y.tab.h has definitions for these tokens.
But where does y.tab.h come from? It is generated by YACC from the Grammar File we are about to create. As our language is very basic, so is the grammar:
    commands: /* empty */
            | commands command
            ;
    command:
            heat_switch
            |
            target_set
            ;
    heat_switch:
            TOKHEAT STATE
            {
                    printf("\tHeat turned on or off\n");
            }
            ;
    target_set:
            TOKTARGET TOKTEMPERATURE NUMBER
            {
                    printf("\tTemperature set\n");
            }
            ;
The first part is what I call the 'root'. It tells us that we have 'commands', and that these commands consist of individual 'command' parts. As you can see this rule is very recursive, because it again contains the word 'commands'. What this means is that the program is now capable of reducing a series of commands one by one. Read the chapter 'How do Lex and YACC work internally' for important details on recursion.
The second rule defines what a command is. We support only two kinds of commands, the 'heat_switch' and the 'target_set'. This is what the |-symbol signifies - 'a command consists of either a heat_switch or a target_set'.
A heat_switch consists of the HEAT token, which is simply the word 'heat', followed by a state (which we defined in the Lex file as 'on' or 'off').
Somewhat more complicated is the target_set, which consists of the TARGET token (the word 'target'), the TEMPERATURE token (the word 'temperature') and a number.
A complete YACC file
The previous section only showed the grammar part of the YACC file, but there is more. This is the header that we omitted:
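    %{
    #include <stdio.h>
    #include <string.h>

    /* provided by the generated scanner and parser */
    int yylex(void);
    int yyparse(void);

    void yyerror(const char *str)
    {
            fprintf(stderr,"error: %s\n",str);
    }

    int yywrap(void)
    {
            return 1;
    }

    int main(void)
    {
            yyparse();
            return 0;
    }
    %}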
    %token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE
The yyerror() function is called by YACC if it finds an error. We simply output the message passed, but there are smarter things to do. See the 'Further reading' section at the end.
The function yywrap() can be used to continue reading from another file. It is called at EOF and you can then open another file, and return 0. Or you can return 1, indicating that this is truly the end. For more about this, see the 'How do Lex and YACC work internally' chapter.
Then there is the main() function, that does nothing but set everything in motion.
The last line simply defines the tokens we will be using. These are output using y.tab.h if YACC is invoked with the '-d' option.
Compiling & running the thermostat controller
    lex example4.l
    yacc -d example4.y
    cc lex.yy.c y.tab.c -o example4
A few things have changed. We call Lex as usual, and we now also invoke YACC to compile our grammar, which creates y.tab.c and y.tab.h. When compiling, we drop the -ll flag: we now have our own main() function and don't need the one provided by libl.
    NOTE: if you get an error about your compiler not being able to find 'yylval', add this to example4.l, just beneath #include "y.tab.h":
    extern YYSTYPE yylval;
    This is explained in the 'How do Lex and YACC work internally' section.
A sample session:
    $ ./example4
    heat on
            Heat turned on or off
    heat off
            Heat turned on or off
    target temperature 10
            Temperature set
    target humidity 20
    error: parse error
    $
This is not quite what we set out to achieve, but in the interest of keeping the learning curve manageable, not all cool stuff can be presented at once.
4.2 Expanding the thermostat to handle parameters
As we've seen, we now parse the thermostat commands correctly, and even flag mistakes properly. But as you might have guessed by the weasely wording, the program has no idea of what it should do: it does not get passed any of the values you enter.
Let's start by adding the ability to read the new target temperature. In order to do so, we need to teach the NUMBER match in the Lexer to convert itself into an integer value, which can then be read in YACC.
Whenever Lex matches a target, it puts the text of the match in the character string 'yytext'. YACC in turn expects to find a value in the variable 'yylval'. In Example 5, we see the obvious solution:
    %{
    #include <stdio.h>
    #include <stdlib.h>   /* atoi */
    #include <string.h>   /* strcmp */
    #include "y.tab.h"
    %}
    %%
    [0-9]+                  yylval=atoi(yytext); return NUMBER;
    heat                    return TOKHEAT;
    on|off                  yylval=!strcmp(yytext,"on"); return STATE;
    target                  return TOKTARGET;
    temperature             return TOKTEMPERATURE;
    \n                      /* ignore end of line */;
    [ \t]+                  /* ignore whitespace */;
    %%
As you can see, we run atoi() on yytext, and put the result in yylval, where YACC can see it. We do much the same for the STATE match, where we compare it to 'on', and set yylval to 1 if it is equal. Please note that having a separate 'on' and 'off' match in Lex would produce faster code, but I wanted to show a more complicated rule and action for a change.
Now we need to teach YACC how to deal with this. What is called 'yylval' in Lex has a different name in YACC. Let's examine the rule setting the new temperature target:
    target_set:
            TOKTARGET TOKTEMPERATURE NUMBER
            {
                    printf("\tTemperature set to %d\n",$3);
            }
            ;
To access the value of the third part of the rule (ie, NUMBER), we need to use $3. Whenever yylex() returns, the contents of yylval are attached to the terminal, the value of which can be accessed with the $-construct.
To expound on this further, let's observe the new 'heat_switch' rule:
    heat_switch:
            TOKHEAT STATE
            {
                    if($2)
                            printf("\tHeat turned on\n");
                    else
                            printf("\tHeat turned off\n");
            }
            ;
If you now run example5, it properly outputs what you entered.
4.3 Parsing a configuration file
Let's repeat part of the configuration file we mentioned earlier:
    zone "." {
            type hint;
            file "/etc/bind/db.root";
    };
Remember that we already wrote a Lexer for this file. Now all we need to do is write the YACC grammar, and modify the Lexer so it returns values in a format YACC can understand.
In the lexer from Example 6 we see:
    %{
    #include <stdio.h>
    #include <string.h>   /* strdup */
    #include "y.tab.h"
    %}
    %%
    zone                    return ZONETOK;
    file                    return FILETOK;
    [a-zA-Z][a-zA-Z0-9]*    yylval=strdup(yytext); return WORD;
    [a-zA-Z0-9\/.-]+        yylval=strdup(yytext); return FILENAME;
    \"                      return QUOTE;
    \{                      return OBRACE;
    \}                      return EBRACE;
    ;                       return SEMICOLON;
    \n                      /* ignore EOL */;
    [ \t]+                  /* ignore whitespace */;
    %%
If you look carefully, you can see that yylval has changed! We no longer expect it to be an integer, but in fact assume that it is a char *. In the interest of keeping things simple, we invoke strdup and waste a lot of memory. Please note that this may not be a problem in many areas where you only need to parse a file once, and then exit.
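The reason a copy is needed at all is that the scanner reuses the storage behind yytext for every new match, so a pointer kept from an earlier token would soon refer to different text. The sketch below models that with an ordinary buffer. The names fake_match and copy_text are invented for this illustration, and copy_text is a portable stand-in for strdup, not the function the example itself calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's match buffer; a real lexer reuses its
   yytext storage in a similar way for every new match. */
static char match_buffer[64];

/* Pretend to match `text`: overwrite the shared buffer. */
const char *fake_match(const char *text)
{
    assert(text != NULL && strlen(text) < sizeof match_buffer);
    strcpy(match_buffer, text);
    return match_buffer;
}

/* Portable equivalent of strdup(): a heap copy the caller must free. */
char *copy_text(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}
```

If a caller keeps the pointer returned by fake_match("zone") and also a copy_text() of it, then after fake_match("file") only the copy still reads "zone". The copy is also what has to be freed eventually if the memory matters.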
We want to store character strings because we are now mostly dealing with names: file names and zone names. In a later chapter we will explain how to deal with multiple types of data.
In order to tell YACC about the new type of yylval, we add this line to the header of our YACC grammar:
#define YYSTYPE char *
The grammar itself is again more complicated. We chop it in parts to make it easier to digest.
    commands:
            |
            commands command SEMICOLON
            ;
    command:
            zone_set
            ;
    zone_set:
            ZONETOK quotedname zonecontent
            {
                    printf("Complete zone for '%s' found\n",$2);
            }
            ;
This is the intro, including the aforementioned recursive 'root'. Please note that we specify that commands are terminated (and separated) by ;'s. We define one kind of command, the 'zone_set'. It consists of the ZONE token (the word 'zone'), followed by a quoted name and the 'zonecontent'. This zonecontent starts out simple enough:
    zonecontent:
            OBRACE zonestatements EBRACE
It needs to start with an OBRACE, a {. Then follow the zonestatements, followed by an EBRACE, }.
    quotedname:
            QUOTE FILENAME QUOTE
            {
                    $$=$2;
            }
This section defines what a 'quotedname' is: a FILENAME between QUOTEs. Then it says something special: the value of a quotedname token is the value of the FILENAME. This means that the quotedname has as its value the filename without quotes.
This is what the magic '$$=$2;' command does. It says: my value is the value of my second part. When the quotedname is now referenced in other rules, and you access its value with the $-construct, you see the value that we set here with $$=$2.
    NOTE: this grammar chokes on filenames without either a '.' or a '/' in them.
    zonestatements:
            |
            zonestatements zonestatement SEMICOLON
            ;
    zonestatement:
            statements
            |
            FILETOK quotedname
            {
                    printf("A zonefile name '%s' was encountered\n", $2);
            }
            ;
This is a generic statement that catches all kinds of statements within the 'zone' block. We again see the recursiveness.
    block:
            OBRACE zonestatements EBRACE SEMICOLON
            ;
    statements:
            | statements statement
            ;
    statement: WORD | block | quotedname
This defines a block, and 'statements' which may be found within.
When executed, the output is like this:
    $ ./example6
    zone "." {
            type hint;
            file "/etc/bind/db.root";
            type hint;
    };
    A zonefile name '/etc/bind/db.root' was encountered
    Complete zone for '.' found
5. Making a Parser in C++
Although Lex and YACC predate C++, it is possible to generate a C++ parser. While Flex includes an option to generate a C++ lexer, we won't be using that, as YACC doesn't know how to deal with it directly.
My preferred way to make a C++ parser is to have Lex generate a plain C file, and to let YACC generate C++ code. When you then link your application, you may run into some problems because the C++ code by default won't be able to find C functions, unless you've told it that those functions are extern "C".
To do so, make a C header in YACC like this:
    extern "C"
    {
            int yyparse(void);
            int yylex(void);
            int yywrap()
            {
                    return 1;
            }
    }
If you want to declare or change yydebug, you must now do it like this:
    extern int yydebug;
    int main()
    {
            yydebug=1;
            yyparse();
    }
This is because of C++'s One Definition Rule, which disallows multiple definitions of yydebug.
You may also find that you need to repeat the #define of YYSTYPE in your Lex file, because of C++'s stricter type checking.
To compile, do something like this:
    lex bindconfig2.l
    yacc --verbose --debug -d bindconfig2.y -o bindconfig2.cc
    cc -c lex.yy.c -o lex.yy.o
    c++ lex.yy.o bindconfig2.cc -o bindconfig2
Because of the -o statement, y.tab.h is now called bindconfig2.cc.h, so take that into account.
To summarize: don't bother to compile your Lexer in C++, keep it in C. Make your Parser in C++ and tell your compiler that some functions are C functions with extern "C" statements.
6. How do Lex and YACC work internally
In the YACC file, you write your own main() function, which calls yyparse() at one point. The function yyparse() is created for you by YACC, and ends up in y.tab.c.
yyparse() reads a stream of token/value pairs from yylex(), which needs to be supplied. You can code this function yourself, or have Lex do it for you. In our examples, we've chosen to leave this task to Lex.
The yylex() as written by Lex reads characters from a FILE * file pointer called yyin. If you do not set yyin, it defaults to standard input. It outputs to yyout, which if unset defaults to stdout. You can also modify yyin in the yywrap() function which is called at the end of a file. It allows you to open another file, and continue parsing.
If this is the case, have it return 0. If you want to end parsing at this file, let it return 1.
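A minimal sketch of such a yywrap() might look as follows. It assumes a hypothetical list of further file names that the program sets up beforehand with set_pending_files(); that helper and the bookkeeping variables are not part of Lex. In a real scanner yyin is supplied by Lex; it is defined here only so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* In a real scanner yyin is provided by Lex; it is defined here only so
   that the sketch compiles on its own. */
FILE *yyin;

/* Hypothetical queue of further input files, set up by the caller. */
static const char **pending_files;
static size_t pending_count;
static size_t next_file;

void set_pending_files(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    pending_files = names;
    pending_count = count;
    next_file = 0;
}

/* Called at end of input: returns 0 after switching yyin to the next
   readable file, or 1 when nothing is left to read. */
int yywrap(void)
{
    while (next_file < pending_count) {
        FILE *next = fopen(pending_files[next_file++], "r");
        if (next != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);
            yyin = next;
            return 0;
        }
    }
    return 1;
}
```

Files that cannot be opened are silently skipped here, which keeps the sketch short; a real program would probably want to report them.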
Each call to yylex() returns an integer value which represents a token type. This tells YACC what kind of token it has read. The token may optionally have a value, which should be placed in the variable yylval.
By default yylval is of type int, but you can override that from the YACC file by re#defining YYSTYPE.
The Lexer needs to be able to access yylval. In order to do so, it must be declared in the scope of the lexer as an extern variable. The original YACC neglects to do this for you, so you should add the following to your lexer, just beneath #include "y.tab.h":
extern YYSTYPE yylval;
Bison, which most people are using these days, does this for you automatically.
6.1 Token values
As mentioned before, yylex() needs to return what kind of token it encountered, and put its value in yylval. When these tokens are defined with the %token command, they are assigned numerical ids above 255, outside the range of single-character codes.
Because of that fact, it is possible to use any ASCII character as a token. Let's say you are writing a calculator. Up till now we would have written the lexer like this:
    [0-9]+          yylval=atoi(yytext); return NUMBER;
    [ \n]+          /* eat whitespace */;
    -               return MINUS;
    \*              return MULT;
    \+              return PLUS;
    ...
Our YACC grammar would then contain:
            exp:    NUMBER
                    |
                    exp PLUS exp
                    |
                    exp MINUS exp
                    |
                    exp MULT exp
This is needlessly complicated. By using characters as shorthands for numerical token id's, we can rewrite our lexer like this:
[0-9]+          yylval=atoi(yytext); return NUMBER;
[ \n]+          /* eat whitespace */;
.               return (int) yytext[0];
This last dot matches all single otherwise unmatched characters.
Our YACC grammar would then be:
            exp:    NUMBER
                    |
                    exp '+' exp
                    |
                    exp '-' exp
                    |
                    exp '*' exp
This is a lot shorter and also more obvious. You do not need to declare these ASCII tokens with %token in the header; they work out of the box.
One other very good thing about this construct is that Lex will now match everything we throw at it - avoiding the default behaviour of echoing unmatched input to standard output. If a user of this calculator uses a ^, for example, it will now generate a parsing error, instead of being echoed to standard output.
6.2 Recursion: 'right is wrong'
Recursion is a vital aspect of YACC. Without it, you can't specify that a file consists of a sequence of independent commands or statements. Of its own accord, YACC is only interested in the first rule, or the one you designate as the starting rule with the '%start' declaration.
Recursion in YACC comes in two flavours: right and left. Left recursion, which is the one you should use most of the time, looks like this:
commands: /* empty */
        |
        commands command
This says: 'commands' is either empty, or it consists of more commands, followed by a command. The way YACC works means that it can now easily chop off individual command groups (from the front) and reduce them.
Compare this to right recursion, which confusingly enough looks better to many eyes:
commands: /* empty */
        |
        command commands
But this is expensive. If used as the %start rule, it requires YACC to keep all commands in your file on the stack, which may take a lot of memory. So by all means, use left recursion when parsing long statements, like entire files. Sometimes it is hard to avoid right recursion but if your statements are not too long, you do not need to go out of your way to use left recursion.
If you have something terminating (and therefore separating) your commands, right recursion looks very natural, but is still expensive:
commands: /* empty */
        |
        command SEMICOLON commands
The right way to code this is using left recursion (I didn't invent this either):
commands: /* empty */
        |
        commands command SEMICOLON
Earlier versions of this HOWTO mistakenly used right recursion. Markus Triska kindly informed us of this.
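The memory argument can be illustrated with a small C simulation. This is not how YACC is implemented; it merely counts how many symbols a shift/reduce parser would hold on its stack for a list of n commands under each of the two rules without a terminator. The function names are made up for this sketch.

```c
#include <stddef.h>

/*
 * Left recursion:  commands: empty | commands command
 * The parser reduces the empty rule first, then after each shifted
 * 'command' immediately reduces "commands command" back to "commands".
 */
size_t max_stack_left(size_t n)
{
        size_t depth = 1, max = 1;      /* reduce empty -> [commands] */
        size_t i;

        for (i = 0; i < n; i++) {
                depth++;                /* shift command */
                if (depth > max)
                        max = depth;
                depth--;                /* reduce: 2 symbols become 1 */
        }
        return max;
}

/*
 * Right recursion: commands: empty | command commands
 * Nothing can be reduced until the last command has been shifted,
 * so all n commands pile up before the reductions unwind them.
 */
size_t max_stack_right(size_t n)
{
        size_t depth = 0, max;
        size_t i;

        for (i = 0; i < n; i++)
                depth++;                /* shift every command */
        depth++;                        /* reduce empty -> trailing commands */
        max = depth;
        while (depth > 1)
                depth--;                /* reduce "command commands" pairs */
        return max;
}
```

For 1000 commands the first function reports a depth of 2 and the second a depth of 1001, which is the difference the text describes.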
6.3 Advanced yylval: %union
Currently, we need to define *the* type of yylval. This however is not always appropriate. There will be times when we need to be able to handle multiple data types. Returning to our hypothetical thermostat, perhaps we want to be able to choose a heater to control, like this:
    heater mainbuilding
            Selected 'mainbuilding' heater
    target temperature 23
            'mainbuilding' heater target temperature now 23
What this calls for is for yylval to be a union, which can hold both strings and integers - but not simultaneously.
Remember that we told YACC previously what type yylval was supposed to be by defining YYSTYPE. We could conceivably define YYSTYPE to be a union this way, but YACC has an easier method for doing this: the %union statement.
Based on Example 4, we now write the Example 7 YACC grammar. First the intro:
    %token TOKHEATER TOKHEAT TOKTARGET TOKTEMPERATURE
    %union
    {
            int number;
            char *string;
    }
    %token <number> STATE
    %token <number> NUMBER
    %token <string> WORD
We define our union, which contains only a number and a string. Then using an extended %token syntax, we explain to YACC which part of the union each token should access.
In this case, we let the STATE token use an integer, as before. Same goes for the NUMBER token, which we use for reading temperatures.
New however is the WORD token, which is declared to need a string.
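In C terms, the %union block corresponds roughly to the declaration below. This is a sketch under assumptions: the type name SemanticValue and the two helper functions are hypothetical, and YACC generates its own definition of YYSTYPE in y.tab.h.

```c
#include <stdlib.h>
#include <string.h>

/* Roughly what the %union above describes; YACC calls this type YYSTYPE. */
typedef union {
        int number;     /* used by STATE and NUMBER */
        char *string;   /* used by WORD */
} SemanticValue;

/* A lexer action for a NUMBER would fill the 'number' member. */
SemanticValue value_from_number(const char *text)
{
        SemanticValue v;

        v.number = atoi(text);
        return v;
}

/*
 * A lexer action for a WORD would fill the 'string' member with a
 * heap copy, because the lexer reuses its text buffer for the next
 * token. The caller owns the copy and must free() it.
 */
SemanticValue value_from_word(const char *text)
{
        SemanticValue v;
        size_t len = strlen(text) + 1;

        v.string = malloc(len);
        if (v.string != NULL)
                memcpy(v.string, text, len);
        return v;
}
```

Only one member is valid at a time, which is why each token has to be tied to exactly one member with the extended %token syntax.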
The Lexer file changes a bit too:
    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "y.tab.h"
    %}
    %%
    [0-9]+                  yylval.number=atoi(yytext); return NUMBER;
    heater                  return TOKHEATER;
    heat                    return TOKHEAT;
    on|off                  yylval.number=!strcmp(yytext,"on"); return STATE;
    target                  return TOKTARGET;
    temperature             return TOKTEMPERATURE;
    [a-z0-9]+               yylval.string=strdup(yytext);return WORD;
    \n                      /* ignore end of line */;
    [ \t]+                  /* ignore whitespace */;
    %%
As you can see, we don't access the yylval directly anymore, we add a suffix indicating which part we want to access. We don't need to do that in the YACC grammar however, as YACC performs the magic for us:
    heater_select:
            TOKHEATER WORD
            {
                    printf("\tSelected heater '%s'\n",$2);
                    heater=$2;
            }
            ;
Because of the %token declaration above, YACC automatically picks the 'string' member from our union. Note also that we store a copy of $2, which is later used to tell the user which heater he is sending commands to:
    target_set:
            TOKTARGET TOKTEMPERATURE NUMBER
            {
                    printf("\tHeater '%s' temperature set to %d\n",heater,$3);
            }
            ;
For more details, read example7.y.
7. Debugging
Especially when learning, it is important to have debugging facilities. Luckily, YACC can give a lot of feedback. This feedback comes at the cost of some overhead, so you need to supply some switches to enable it.
When compiling your grammar, add --debug and --verbose to the YACC commandline. In your grammar C heading, add the following:
int yydebug=1;
The --verbose flag generates the file 'y.output', which explains the state machine that was created.
When you now run the generated binary, it will output a *lot* of what is happening. This includes what state the state machine currently has, and what tokens are being read.
Peter Jinks wrote a page on debugging which contains some common errors and how to solve them.
7.1 The state machine
Internally, your YACC parser runs a so called 'state machine'. As the name implies, this is a machine that can be in several states. Then there are rules which govern transitions from one state to another. Everything starts with the so called 'root' rule I mentioned earlier.
To quote from the output from the Example 7 y.output:
    state 0
        ZONETOK     shift, and go to state 1
        $default    reduce using rule 1 (commands)
        commands    go to state 29
        command     go to state 2
        zone_set    go to state 3
By default, this state reduces using the 'commands' rule. This is the aforementioned recursive rule that defines 'commands' to be built up from individual command statements, followed by a semicolon, followed by possibly more commands.
This state reduces until it hits something it understands, in this case a ZONETOK, i.e., the word 'zone'. It then goes to state 1, which deals further with a zone command:
    state 1
        zone_set -> ZONETOK . quotedname zonecontent   (rule 4)
        QUOTE       shift, and go to state 4
        quotedname go to state 5
The first line has a '.' in it to indicate where we are: we've just seen a ZONETOK and are now looking for a 'quotedname'. Apparently, a quotedname starts with a QUOTE, which sends us to state 4.
To follow this further, compile Example 7 with the flags mentioned in the Debugging section.
7.2 Conflicts: 'shift/reduce', 'reduce/reduce'
Whenever YACC warns you about conflicts, you may be in for trouble. Solving these conflicts appears to be somewhat of an art form that may teach you a lot about your language. More than you possibly would have wanted to know.
The problems revolve around how to interpret a sequence of tokens. Let's suppose we define a language that needs to accept both these commands:
            delete heater all
            delete heater number1
To do this, we define this grammar:
            delete_heaters:
                    TOKDELETE TOKHEATER mode
                    {
                            deleteheaters($3);
                    }

            mode:   WORD
            delete_a_heater:
                    TOKDELETE TOKHEATER WORD
                    {
                            delete($3);
                    }
You may already be smelling trouble. The state machine starts by reading the word 'delete', and then needs to decide where to go based on the next token. This next token can either be a mode, specifying how to delete the heaters, or the name of a heater to delete.
The problem however is that for both commands, the next token is going to be a WORD. YACC has therefore no idea what to do. This leads to a 'reduce/reduce' warning, and a further warning that the 'delete_a_heater' node is never going to be reached.
In this case the conflict is resolved easily (i.e., by renaming the first command to 'delete heaters all', or by making 'all' a separate token), but sometimes it is harder. The y.output file generated when you pass yacc the --verbose flag can be of tremendous help.
8. Further reading
GNU YACC (Bison) comes with a very nice info-file (.info) which documents the YACC syntax very well. It mentions Lex only once, but otherwise it's very good. You can read .info files with Emacs or with the very nice tool 'pinfo'. It is also available on the GNU site as the Bison Manual.
Flex comes with a good manpage which is very useful if you already have a rough understanding of what Flex does. The Flex Manual is also available online.
After this introduction to Lex and YACC, you may find that you need more information. I haven't read any of these books yet, but they sound good:
Bison-The Yacc-Compatible Parser Generator
    By Charles Donnelly and Richard Stallman. An Amazon user found it useful.
Lex & Yacc
    By John R. Levine, Tony Mason and Doug Brown. Considered to be the standard work on this subject, although a bit dated. Reviews are available on Amazon.
Compilers : Principles, Techniques, and Tools
    By Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman. The 'Dragon Book'. From 1985 and they just keep printing it. Considered the standard work on constructing compilers.
Thomas Niemann wrote a document discussing how to write compilers and calculators with Lex & YACC.
The moderated usenet newsgroup comp.compilers can also be very useful but please keep in mind that the people there are not a dedicated parser helpdesk! Before posting, read their interesting page and especially the FAQ.
Lex - A Lexical Analyzer Generator by M. E. Lesk and E. Schmidt is one of the original reference papers.
Yacc: Yet Another Compiler-Compiler by Stephen C. Johnson is one of the original reference papers for YACC. It contains useful hints on style.
9. Acknowledgements & Thanks
    * Pete Jinks <pjj%cs.man.ac.uk>
    * Chris Lattner <sabre%nondot.org>
    * John W. Millaway <johnmillaway%yahoo.com>
    * Martin Neitzel <neitzel%gaertner.de>
    * Sumit Pandaya <sumit%elitecore.com>
    * Esmond Pitt <esmond.pitt%bigpond.com>
    * Eric S. Raymond
    * Bob Schmertz <schmertz%wam.umd.edu>
    * Adam Sulmicki <adam%cfar.umd.edu>
    * Markus Triska <triska%gmx.at>
    * Erik Verbruggen <erik%road-warrior.cs.kun.nl>
    * Gary V. Vaughan <gary%gnu.org> (read his awesome Autobook)
    * Ivo van der Wijk (Amaze Internet)
1. Introduction
Welcome, gentle reader.
If you have been programming for any length of time in a Unix environment, you will have encountered the mystical programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by Vern Paxson and Bison the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are upwardly compatible, so you can use Flex and Bison when trying our examples.
These programs are massively useful, but as with your C compiler, their manpage does not explain the language they understand, nor how to use them. YACC is really amazing when used in combination with Lex. However, the Bison manpage does not describe how to integrate Lex-generated code with your Bison program.
1.1 What this document is NOT
There are several great books which deal with Lex & YACC. By all means read these books if you need to know more. They provide far more information than we ever will. See the 'Further Reading' section at the end. This document is aimed at bootstrapping your use of Lex & YACC, to allow you to create your first programs.
The documentation that comes with Flex and Bison is also excellent, but it is not a tutorial. It does complement my HOWTO very well though, and it too is referenced at the end.
I am by no means a YACC/Lex expert. When I started writing this document, I had exactly two days of experience. All I want to accomplish is to make those two days easier for you.
In no way expect the HOWTO to show proper YACC and Lex style. Examples have been kept very simple and there may be better ways to write them. If you know how to, please let me know.
1.2 Downloading stuff
Please note that you can download all the examples shown, which are in machine readable form. See the homepage for details.
1.3 License
Copyright (c) 2001 by bert hubert. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, vX.Y or later (the latest version is presently available at http://www.opencontent.org/openpub/).
2. What Lex & YACC can do for you
When properly used, these programs allow you to parse complex languages with ease. This is a great boon when you want to read a configuration file, or want to write a compiler for any language you (or anyone else) might have invented.
With a little help, which this document will hopefully provide, you will find that you will never write a parser again by hand - Lex & YACC are the tools to do this.
2.1 What each program does on its own
Although these programs shine when used together, they each serve a different purpose. The next chapter will explain what each part does.
3. Lex
The program Lex generates a so called `Lexer'. This is a function that takes a stream of characters as its input, and whenever it sees a group of characters that match a key, takes a certain action. A very simple example:
    %{
    #include <stdio.h>
    %}
    %%
    stop    printf("Stop command received\n");
    start   printf("Start command received\n");
    %%
The first section, in between the %{ and %} pair is included directly in the output program. We need this, because we use printf later on, which is defined in stdio.h.
Sections are separated using '%%', so the first line of the second section starts with the 'stop' key. Whenever the 'stop' key is encountered in the input, the rest of the line (a printf() call) is executed.
Besides 'stop', we've also defined 'start', which otherwise does mostly the same.
We terminate the code section with '%%' again.
To compile Example 1, do this:
    lex example1.l
    cc lex.yy.c -o example1 -ll
    NOTE: If you are using flex, instead of lex, you may have to change '-ll' to '-lfl' in the compilation scripts. RedHat 6.x and SuSE need this, even when you invoke 'flex' as 'lex'!
This will generate the file 'example1'. If you run it, it waits for you to type some input. Whenever you type something that is not matched by any of the defined keys (i.e., 'stop' and 'start'), it is output again. If you enter 'stop', it will output 'Stop command received'.
Terminate with an EOF (^D).
You may wonder how the program runs, as we didn't define a main() function. This function is defined for you in libl (liblex), which we linked in with the -ll flag.
3.1 Regular expressions in matches
This example wasn't very useful in itself, and our next one won't be either. It will however show how to use regular expressions in Lex, which are massively useful later on.
Example 2:
    %{
    #include <stdio.h>
    %}
    %%
    [0123456789]+           printf("NUMBER\n");
    [a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");
    %%
This Lex file describes two kinds of matches (tokens): WORDs and NUMBERs. Regular expressions can be pretty daunting but with only a little work it is easy to understand them. Let's examine the NUMBER match:
[0123456789]+
This says: a sequence of one or more characters from the group 0123456789. We could also have written it shorter as:
[0-9]+
Now, the WORD match is somewhat more involved:
[a-zA-Z][a-zA-Z0-9]*
The first part matches 1 and only 1 character that is between 'a' and 'z', or between 'A' and 'Z'. In other words, a letter. This initial letter then needs to be followed by zero or more characters which are either a letter or a digit. Why use an asterisk here? The '+' signifies 1 or more matches, but a WORD might very well consist of only one character, which we've already matched. So the second part may have zero matches, so we write a '*'.
This way, we've mimicked the behaviour of many programming languages which demand that a variable name *must* start with a letter, but can contain digits afterwards. In other words, 'temperature1' is a valid name, but '1temperature' is not.
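For readers who find the pattern easier to follow as code, here is a hand-written C equivalent of the WORD expression. It is a sketch only: the function name match_word is made up, it assumes ASCII input in the default C locale, and it is not what Lex generates internally.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hand-written equivalent of [a-zA-Z][a-zA-Z0-9]* for ASCII input.
 * Returns the length of the match at the start of s, or 0 if s
 * does not begin with a letter.
 */
size_t match_word(const char *s)
{
        size_t n;

        if (!isalpha((unsigned char)s[0]))      /* exactly one letter */
                return 0;
        n = 1;
        while (isalnum((unsigned char)s[n]))    /* zero or more letters/digits */
                n++;
        return n;
}
```

Called on 'temperature1' this matches all 12 characters, while on '1temperature' it matches nothing, because the first character is not a letter.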
Try compiling Example 2, just like Example 1, and feed it some text. Here is a sample session:
    $ ./example2
    foo
    WORD
    bar
    WORD
    123
    NUMBER
    bar123
    WORD
    123bar
    NUMBER
    WORD
You may also be wondering where all this whitespace is coming from in the output. The reason is simple: it was in the input, and we don't match on it anywhere, so it gets output again.
The Flex manpage documents its regular expressions in detail. Many people feel that the perl regular expression manpage (perlre) is also very useful, although Flex does not implement everything perl does.
Make sure that you do not create zero length matches like '[0-9]*' - your lexer might get confused and start matching empty strings repeatedly.
3.2 A more complicated example for a C like syntax
Let's say we want to parse a file that looks like this:
    logging {
            category lame-servers { null; };
            category cname { null; };
    };
    zone "." {
            type hint;
            file "/etc/bind/db.root";
    };
We clearly see a number of categories (tokens) in this file:
    * WORDs, like 'zone' and 'type'
    * FILENAMEs, like '/etc/bind/db.root'
    * QUOTEs, like those surrounding the filename
    * OBRACEs, {
    * EBRACEs, }
    * SEMICOLONs, ;
The corresponding Lex file is Example 3:
    %{
    #include <stdio.h>
    %}
    %%
    [a-zA-Z][a-zA-Z0-9]*    printf("WORD ");
    [a-zA-Z0-9\/.-]+        printf("FILENAME ");
    \"                      printf("QUOTE ");
    \{                      printf("OBRACE ");
    \}                      printf("EBRACE ");
    ;                       printf("SEMICOLON ");
    \n                      printf("\n");
    [ \t]+                  /* ignore whitespace */;
    %%
When we feed our file to the program this Lex file generates (using example3.compile), we get:
    WORD OBRACE
    WORD FILENAME OBRACE WORD SEMICOLON EBRACE SEMICOLON
    WORD WORD OBRACE WORD SEMICOLON EBRACE SEMICOLON
    EBRACE SEMICOLON
    WORD QUOTE FILENAME QUOTE OBRACE
    WORD WORD SEMICOLON
    WORD QUOTE FILENAME QUOTE SEMICOLON
    EBRACE SEMICOLON
When compared with the configuration file mentioned above, it is clear that we have neatly 'Tokenized' it. Each part of the configuration file has been matched, and converted into a token.
And this is exactly what we need to put YACC to good use.
3.3 What we've seen
We've seen that Lex is able to read arbitrary input, and determine what each part of the input is. This is called 'Tokenizing'.
4. YACC
YACC can parse input streams consisting of tokens with certain values. This clearly describes the relation YACC has with Lex: YACC has no idea what 'input streams' are, it needs preprocessed tokens. While you can write your own Tokenizer, we will leave that entirely up to Lex.
A note on grammars and parsers. When YACC saw the light of day, the tool was used to parse input files for compilers: programs. Programs written in a programming language for computers are typically *not* ambiguous - they have just one meaning. As such, YACC does not cope with ambiguity and will complain about shift/reduce or reduce/reduce conflicts. More about ambiguity and YACC "problems" can be found in the 'Conflicts' chapter.
4.1 A simple thermostat controller
Let's say we have a thermostat that we want to control using a simple language. A session with the thermostat may look like this:
    heat on
            Heater on!
    heat off
            Heater off!
    target temperature 22
            New temperature set!
The tokens we need to recognize are: heat, on/off (STATE), target, temperature, NUMBER.
The Lex tokenizer (Example 4) is:
    %{
    #include <stdio.h>
    #include "y.tab.h"
    %}
    %%
    [0-9]+                  return NUMBER;
    heat                    return TOKHEAT;
    on|off                  return STATE;
    target                  return TOKTARGET;
    temperature             return TOKTEMPERATURE;
    \n                      /* ignore end of line */;
    [ \t]+                  /* ignore whitespace */;
    %%
We note two important changes. First, we include the file 'y.tab.h', and secondly, we no longer print stuff, we return names of tokens. This change is because we are now feeding it all to YACC, which isn't interested in what we output to the screen. The file y.tab.h has definitions for these tokens.
But where does y.tab.h come from? It is generated by YACC from the Grammar File we are about to create. As our language is very basic, so is the grammar:
    commands: /* empty */
            | commands command
            ;
    command:
            heat_switch
            |
            target_set
            ;
    heat_switch:
            TOKHEAT STATE
            {
                    printf("\tHeat turned on or off\n");
            }
            ;
    target_set:
            TOKTARGET TOKTEMPERATURE NUMBER
            {
                    printf("\tTemperature set\n");
            }
            ;
The first part is what I call the 'root'. It tells us that we have 'commands', and that these commands consist of individual 'command' parts. As you can see this rule is very recursive, because it again contains the word 'commands'. What this means is that the program is now capable of reducing a series of commands one by one. Read the chapter 'How do Lex and YACC work internally' for important details on recursion.
The second rule defines what a command is. We support only two kinds of commands, the 'heat_switch' and the 'target_set'. This is what the |-symbol signifies - 'a command consists of either a heat_switch or a target_set'.
A heat_switch consists of the TOKHEAT token, which is simply the word 'heat', followed by a state (which we defined in the Lex file as 'on' or 'off').
Somewhat more complicated is the target_set, which consists of the TOKTARGET token (the word 'target'), the TOKTEMPERATURE token (the word 'temperature') and a number.
A complete YACC file
The previous section only showed the grammar part of the YACC file, but there is more. This is the header that we omitted:
    %{
    #include <stdio.h>
    #include <string.h>

    int yylex(void);
    int yyparse(void);

    void yyerror(const char *str)
    {
            fprintf(stderr,"error: %s\n",str);
    }

    int yywrap(void)
    {
            return 1;
    }

    int main(void)
    {
            yyparse();
            return 0;
    }
    %}
    %token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE
The yyerror() function is called by YACC if it finds an error. We simply output the message passed, but there are smarter things to do. See the 'Further reading' section at the end.
The function yywrap() can be used to continue reading from another file. It is called at EOF and you can then open another file, and return 0. Or you can return 1, indicating that this is truly the end. For more about this, see the 'How do Lex and YACC work internally' chapter.
Then there is the main() function, that does nothing but set everything in motion.
The last line simply defines the tokens we will be using. These are written to y.tab.h if YACC is invoked with the '-d' option.
Compiling & running the thermostat controller
    lex example4.l
    yacc -d example4.y
    cc lex.yy.c y.tab.c -o example4
A few things have changed. We now also invoke YACC to compile our grammar, which creates y.tab.c and y.tab.h. We then call Lex as usual. When compiling, we remove the -ll flag: we now have our own main() function and don't need the one provided by libl.
    NOTE: if you get an error about your compiler not being able to find 'yylval', add this to example4.l, just beneath #include "y.tab.h":
    extern YYSTYPE yylval;
    This is explained in the 'How Lex and YACC work internally' section.
A sample session:
    $ ./example4
    heat on
            Heat turned on or off
    heat off
            Heat turned on or off
    target temperature 10
            Temperature set
    target humidity 20
    error: parse error
    $
This is not quite what we set out to achieve, but in the interest of keeping the learning curve manageable, not all cool stuff can be presented at once.
4.2 Expanding the thermostat to handle parameters
As we've seen, we now parse the thermostat commands correctly, and even flag mistakes properly. But as you might have guessed by the weasely wording, the program has no idea of what it should do, it does not get passed any of the values you enter.
Let's start by adding the ability to read the new target temperature. In order to do so, we need to teach the NUMBER match in the Lexer to convert itself into an integer value, which can then be read in YACC.
Whenever Lex matches a pattern, it puts the text of the match in the character string 'yytext'. YACC in turn expects to find a value in the variable 'yylval'. In Example 5, we see the obvious solution:
    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "y.tab.h"
    %}
    %%
    [0-9]+                  yylval=atoi(yytext); return NUMBER;
    heat                    return TOKHEAT;
    on|off                  yylval=!strcmp(yytext,"on"); return STATE;
    target                  return TOKTARGET;
    temperature             return TOKTEMPERATURE;
    \n                      /* ignore end of line */;
    [ \t]+                  /* ignore whitespace */;
    %%
As you can see, we run atoi() on yytext, and put the result in yylval, where YACC can see it. We do much the same for the STATE match, where we compare it to 'on', and set yylval to 1 if it is equal. Please note that having a separate 'on' and 'off' match in Lex would produce faster code, but I wanted to show a more complicated rule and action for a change.
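The two actions can be tried out in isolation as ordinary C functions. This is only a sketch; the function names number_value and state_value are made up, and in the real lexer the results are assigned to yylval rather than returned.

```c
#include <stdlib.h>
#include <string.h>

/* Value the NUMBER action stores: the matched digits as an int. */
int number_value(const char *text)
{
        return atoi(text);
}

/*
 * Value the STATE action stores: strcmp() returns 0 on equality,
 * so negating it gives 1 for "on" and 0 for anything else ("off").
 */
int state_value(const char *text)
{
        return !strcmp(text, "on");
}
```

The negation is easy to overlook: without the '!', 'on' would produce 0 and 'off' a non-zero value.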
Now we need to teach YACC how to deal with this. What is called 'yylval' in Lex has a different name in YACC. Let's examine the rule setting the new temperature target:
    target_set:
            TOKTARGET TOKTEMPERATURE NUMBER
            {
                    printf("\tTemperature set to %d\n",$3);
            }
            ;
To access the value of the third part of the rule (ie, NUMBER), we need to use $3. Whenever yylex() returns, the contents of yylval are attached to the terminal, the value of which can be accessed with the $-construct.
To expound on this further, let's observe the new 'heat_switch' rule:
    heat_switch:
            TOKHEAT STATE
            {
                    if($2)
                            printf("\tHeat turned on\n");
                    else
                            printf("\tHeat turned off\n");
            }
            ;
If you now run example5, it properly outputs what you entered.
4.3 Parsing a configuration file
Let's repeat part of the configuration file we mentioned earlier:
    zone "." {
            type hint;
            file "/etc/bind/db.root";
    };
Remember that we already wrote a Lexer for this file. Now all we need to do is write the YACC grammar, and modify the Lexer so it returns values in a format YACC can understand.
In the lexer from Example 6 we see:
    %{
    #include <stdio.h>
    #include <string.h>
    #include "y.tab.h"
    %}
    %%
    zone                    return ZONETOK;
    file                    return FILETOK;
    [a-zA-Z][a-zA-Z0-9]*    yylval=strdup(yytext); return WORD;
    [a-zA-Z0-9\/.-]+        yylval=strdup(yytext); return FILENAME;
    \"                      return QUOTE;
    \{                      return OBRACE;
    \}                      return EBRACE;
    ;                       return SEMICOLON;
    \n                      /* ignore EOL */;
    [ \t]+                  /* ignore whitespace */;
    %%
If you look carefully, you can see that yylval has changed! We no longer expect it to be an integer, but in fact assume that it is a char *. In the interest of keeping things simple, we invoke strdup and waste a lot of memory. Please note that this may not be a problem in many areas where you only need to parse a file once, and then exit.
We want to store character strings because we are now mostly dealing with names: file names and zone names. In a later chapter we will explain how to deal with multiple types of data.
In order to tell YACC about the new type of yylval, we add this line to the header of our YACC grammar:
#define YYSTYPE char *
The grammar itself is again more complicated. We chop it in parts to make it easier to digest.
    commands:
            |
            commands command SEMICOLON
            ;
    command:
            zone_set
            ;
    zone_set:
            ZONETOK quotedname zonecontent
            {
                    printf("Complete zone for '%s' found\n",$2);
            }
            ;
This is the intro, including the aforementioned recursive 'root'. Please note that we specify that commands are terminated (and separated) by ;'s. We define one kind of command, the 'zone_set'. It consists of the ZONETOK token (the word 'zone'), followed by a quoted name and the 'zonecontent'. This zonecontent starts out simple enough:
    zonecontent:
            OBRACE zonestatements EBRACE
It needs to start with an OBRACE, a {. Then follow the zonestatements, followed by an EBRACE, }.
    quotedname:
            QUOTE FILENAME QUOTE
            {
                    $$=$2;
            }
This section defines what a 'quotedname' is: a FILENAME between QUOTEs. Then it says something special: the value of a quotedname token is the value of the FILENAME. This means that the quotedname has as its value the filename without quotes.
This is what the magic '$$=$2;' command does. It says: my value is the value of my second part. When the quotedname is now referenced in other rules, and you access its value with the $-construct, you see the value that we set here with $$=$2.
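One way to picture '$$=$2;' is as an operation on a stack of values, one per grammar symbol. The C sketch below is a conceptual model only, not YACC's actual code; the struct and function names are invented for this illustration.

```c
#include <stddef.h>

#define STACK_MAX 16

/* Toy value stack holding one string per grammar symbol. */
struct value_stack {
        const char *items[STACK_MAX];
        size_t top;                     /* number of values on the stack */
};

/* Push a value; returns 0 on success, -1 if the stack is full. */
int push_value(struct value_stack *st, const char *v)
{
        if (st->top >= STACK_MAX)
                return -1;
        st->items[st->top++] = v;
        return 0;
}

/*
 * Reduce "QUOTE FILENAME QUOTE" to quotedname with the action $$=$2:
 * the three topmost values ($1, $2, $3) are replaced by a single
 * value, which is the one that belonged to $2. Returns -1 if fewer
 * than three values are present.
 */
int reduce_quotedname(struct value_stack *st)
{
        const char *second;

        if (st->top < 3)
                return -1;
        second = st->items[st->top - 2];        /* $2 */
        st->top -= 3;                           /* pop $1, $2, $3 */
        st->items[st->top++] = second;          /* push $$ */
        return 0;
}
```

After the reduction only the filename remains on the stack, which is what a rule such as zone_set later sees as the value of its quotedname.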
    NOTE: this grammar chokes on filenames without either a '.' or a '/' in them.
    zonestatements:
            |
            zonestatements zonestatement SEMICOLON
            ;
    zonestatement:
            statements
            |
            FILETOK quotedname
            {
                    printf("A zonefile name '%s' was encountered\n", $2);
            }
            ;
This is a generic statement that catches all kinds of statements within the 'zone' block. We again see the recursiveness.
    block:
            OBRACE zonestatements EBRACE SEMICOLON
            ;
    statements:
            | statements statement
            ;
    statement: WORD | block | quotedname
This defines a block, and 'statements' which may be found within.
When executed, the output is like this:
    $ ./example6
    zone "." {
            type hint;
            file "/etc/bind/db.root";
            type hint;
    };
    A zonefile name '/etc/bind/db.root' was encountered
    Complete zone for '.' found
5. Making a Parser in C++
Although Lex and YACC predate C++, it is possible to generate a C++ parser. While Flex includes an option to generate a C++ lexer, we won't be using that, as YACC doesn't know how to deal with it directly.
My preferred way to make a C++ parser is to have Lex generate a plain C file, and to let YACC generate C++ code. When you then link your application, you may run into some problems because the C++ code by default won't be able to find C functions, unless you've told it that those functions are extern "C".
To do so, make a C header in YACC like this:
    extern "C"
    {
            int yyparse(void);
            int yylex(void);
            int yywrap()
            {
                    return 1;
            }
    }
If you want to declare or change yydebug, you must now do it like this:
    extern int yydebug;
    int main()
    {
            yydebug=1;
            yyparse();
    }
This is because of C++'s One Definition Rule, which disallows multiple definitions of yydebug.
You may also find that you need to repeat the #define of YYSTYPE in your Lex file, because of C++'s stricter type checking.
To compile, do something like this:
    lex bindconfig2.l
    yacc --verbose --debug -d bindconfig2.y -o bindconfig2.cc
    cc -c lex.yy.c -o lex.yy.o
    c++ lex.yy.o bindconfig2.cc -o bindconfig2
Because of the -o option, y.tab.h is now called bindconfig2.cc.h (newer Bison versions name it bindconfig2.hh instead), so take that into account.
To summarize: don't bother to compile your Lexer in C++, keep it in C. Make your Parser in C++ and tell your compiler that some functions are C functions by declaring them extern "C".
6. How do Lex and YACC work internally
In the YACC file, you write your own main() function, which calls yyparse() at one point. The function yyparse() is created for you by YACC, and ends up in y.tab.c.
yyparse() reads a stream of token/value pairs from yylex(), which needs to be supplied. You can code this function yourself, or have Lex do it for you. In our examples, we've chosen to leave this task to Lex.
The yylex() as written by Lex reads characters from a FILE * file pointer called yyin. If you do not set yyin, it defaults to standard input. It outputs to yyout, which if unset defaults to stdout. You can also modify yyin in the yywrap() function which is called at the end of a file. It allows you to open another file, and continue parsing.
If this is the case, have it return 0. If you want to end parsing at this file, let it return 1.
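As a minimal sketch of that hand-off, a yywrap() that moves on to the next input file could look like the following. The list 'pending_files' is a hypothetical helper that is not part of the original examples, and yyin is defined here only so that the fragment compiles on its own; in a real build it comes from the Lex-generated lex.yy.c.

```c
#include <stdio.h>

/* Normally defined by the Lex-generated lex.yy.c; defined here only so
   that this sketch compiles on its own. */
FILE *yyin;

/* Hypothetical helper state: a NULL-terminated array of the file names
   that still have to be scanned. */
const char **pending_files;

/* Called by the lexer when yyin reaches end-of-file.
   Returns 0 if yyin now points at a new file, 1 if there is no more input. */
int yywrap(void)
{
    while (pending_files != NULL && *pending_files != NULL) {
        FILE *next = fopen(*pending_files++, "r");
        if (next == NULL)
            continue;            /* skip files that cannot be opened */
        if (yyin != NULL && yyin != stdin)
            fclose(yyin);        /* done with the previous file */
        yyin = next;
        return 0;                /* keep scanning */
    }
    return 1;                    /* nothing left: end parsing */
}
```

Your main() would point 'pending_files' at the remaining file names before calling yyparse(); the lexer then calls yywrap() by itself every time a file runs out.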
Each call to yylex() returns an integer value which represents a token type. This tells YACC what kind of token it has read. The token may optionally have a value, which should be placed in the variable yylval.
By default yylval is of type int, but you can override that from the YACC file by re#defining YYSTYPE.
The Lexer needs to be able to access yylval. In order to do so, it must be declared in the scope of the lexer as an extern variable. The original YACC neglects to do this for you, so you should add the following to your lexer, just beneath #include <y.tab.h>:
extern YYSTYPE yylval;
Bison, which most people are using these days, does this for you automatically.
6.1 Token values
As mentioned before, yylex() needs to return what kind of token it encountered, and put its value in yylval. When these tokens are defined with the %token command, they are assigned numerical id's, starting from 256.
Because of that, every ASCII character can be used as a token in its own right. Say you are writing a calculator. Up till now we would have written the lexer like this:
    [0-9]+          yylval=atoi(yytext); return NUMBER;
    [ \n]+          /* eat whitespace */;
    -               return MINUS;
    \*              return MULT;
    \+              return PLUS;
    ...
Our YACC grammar would then contain:
            exp:    NUMBER
                    |
                    exp PLUS exp
                    |
                    exp MINUS exp
                    |
                    exp MULT exp
This is needlessly complicated. By using characters as shorthands for numerical token id's, we can rewrite our lexer like this:
[0-9]+          yylval=atoi(yytext); return NUMBER;
[ \n]+          /* eat whitespace */;
.               return (int) yytext[0];
This last dot matches all single otherwise unmatched characters.
Our YACC grammar would then be:
            exp:    NUMBER
                    |
                    exp '+' exp
                    |
                    exp '-' exp
                    |
                    exp '*' exp
This is a lot shorter and also more obvious. You do not need to declare these ASCII tokens with %token in the header, they work out of the box.
One other very good thing about this construct is that Lex will now match everything we throw at it - avoiding the default behaviour of echoing unmatched input to standard output. If a user of this calculator uses a ^, for example, it will now generate a parsing error, instead of being echoed to standard output.
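To see what this convention means on the C side, here is a hand-written yylex() that behaves like the three-rule lexer above. It is only a sketch: it reads from a hypothetical string pointer 'scan_pos' instead of yyin, and the value 258 for NUMBER is an assumption, since the real number is assigned by YACC in y.tab.h.

```c
#include <ctype.h>
#include <stdlib.h>

/* Named tokens get numbers above the character range. The real value is
   assigned by YACC in y.tab.h; 258 is an assumption for this sketch. */
enum { NUMBER = 258 };

int yylval;                  /* value of the most recent token */
const char *scan_pos = "";   /* hypothetical input buffer, replaces yyin */

int yylex(void)
{
    /* eat whitespace, like the [ \n]+ rule */
    while (*scan_pos == ' ' || *scan_pos == '\n')
        scan_pos++;

    if (*scan_pos == '\0')
        return 0;            /* 0 tells yyparse() that the input has ended */

    if (isdigit((unsigned char)*scan_pos)) {
        char *end;
        yylval = (int)strtol(scan_pos, &end, 10);
        scan_pos = end;
        return NUMBER;       /* like the [0-9]+ rule */
    }

    /* like the '.' rule: any other single character is its own token */
    return (unsigned char)*scan_pos++;
}
```

With 'scan_pos' pointing at "12 + 3*4", successive calls return NUMBER (yylval 12), '+', NUMBER (yylval 3), '*', NUMBER (yylval 4) and finally 0. Because character codes stay below 256, they can never collide with a named token.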
6.2 Recursion: 'right is wrong'
Recursion is a vital aspect of YACC. Without it, you can't specify that a file consists of a sequence of independent commands or statements. Of its own accord, YACC is only interested in the first rule, or the one you designate as the starting rule, with the '%start' symbol.
Recursion in YACC comes in two flavours: right and left. Left recursion, which is the one you should use most of the time, looks like this:
commands: /* empty */
        |
        commands command
This says: 'commands' is either empty, or it consists of more commands, followed by a command. The way YACC works means that it can now easily chop off individual command groups (from the front) and reduce them.
Compare this to right recursion, which confusingly enough looks better to many eyes:
commands: /* empty */
        |
        command commands
But this is expensive. If used as the %start rule, it requires YACC to keep all commands in your file on the stack, which may take a lot of memory. So by all means, use left recursion when parsing long statements, like entire files. Sometimes it is hard to avoid right recursion but if your statements are not too long, you do not need to go out of your way to use left recursion.
If you have something terminating (and therefore separating) your commands, right recursion looks very natural, but is still expensive:
commands: /* empty */
        |
        command SEMICOLON commands
The right way to code this is using left recursion (I didn't invent this either):
commands: /* empty */
        |
        commands command SEMICOLON
Earlier versions of this HOWTO mistakenly used right recursion. Markus Triska kindly informed us of this.
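The difference in stack use can be made concrete with a small simulation. This is not how YACC is implemented: it is a sketch that only counts grammar symbols on the parse stack, treats every 'command' as a single symbol that has already been recognised, and uses two hypothetical function names.

```c
/* Deepest parse stack reached for n commands with
       commands: empty | commands command        (left recursion) */
int max_depth_left(int n)
{
    int depth = 0, max = 0;

    depth++;                        /* reduce empty -> commands */
    if (depth > max) max = depth;
    for (int i = 0; i < n; i++) {
        depth++;                    /* shift command */
        if (depth > max) max = depth;
        depth -= 2;                 /* pop "commands command" ... */
        depth++;                    /* ... and push the reduced commands */
    }
    return max;
}

/* Deepest parse stack reached for n commands with
       commands: empty | command commands        (right recursion) */
int max_depth_right(int n)
{
    int depth = 0, max = 0;

    for (int i = 0; i < n; i++) {
        depth++;                    /* shift command: nothing to reduce yet */
        if (depth > max) max = depth;
    }
    depth++;                        /* end of input: reduce empty -> commands */
    if (depth > max) max = depth;
    for (int i = 0; i < n; i++) {
        depth -= 2;                 /* pop "command commands" ... */
        depth++;                    /* ... and push the reduced commands */
    }
    return max;
}
```

For 1000 commands, max_depth_left() reports 2 while max_depth_right() reports 1001: with right recursion nothing can be reduced until the last command has been read.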
6.3 Advanced yylval: %union
Currently, we need to define *the* type of yylval. This however is not always appropriate. There will be times when we need to be able to handle multiple data types. Returning to our hypothetical thermostat, perhaps we want to be able to choose a heater to control, like this:
    heater mainbuilding
            Selected 'mainbuilding' heater
    target temperature 23
            'mainbuilding' heater target temperature now 23
What this calls for is for yylval to be a union, which can hold both strings and integers - but not simultaneously.
Remember that we told YACC previously what type yylval was supposed to be by defining YYSTYPE. We could conceivably define YYSTYPE to be a union this way, but YACC has an easier method for doing this: the %union statement.
Based on Example 4, we now write the Example 7 YACC grammar. First the intro:
    %token TOKHEATER TOKHEAT TOKTARGET TOKTEMPERATURE
    %union
    {
            int number;
            char *string;
    }
    %token <number> STATE
    %token <number> NUMBER
    %token <string> WORD
We define our union, which contains only a number and a string. Then using an extended %token syntax, we explain to YACC which part of the union each token should access.
In this case, we let the STATE token use an integer, as before. Same goes for the NUMBER token, which we use for reading temperatures.
New however is the WORD token, which is declared to need a string.
The Lexer file changes a bit too:
    %{
    #include <stdio.h>
    #include <string.h>
    #include "y.tab.h"
    %}
    %%
    [0-9]+                  yylval.number=atoi(yytext); return NUMBER;
    heater                  return TOKHEATER;
    heat                    return TOKHEAT;
    on|off                  yylval.number=!strcmp(yytext,"on"); return STATE;
    target                  return TOKTARGET;
    temperature             return TOKTEMPERATURE;
    [a-z0-9]+               yylval.string=strdup(yytext);return WORD;
    \n                      /* ignore end of line */;
    [ \t]+                  /* ignore whitespace */;
    %%
As you can see, we don't access the yylval directly anymore, we add a suffix indicating which part we want to access. We don't need to do that in the YACC grammar however, as YACC performs the magic for us:
    heater_select:
            TOKHEATER WORD
            {
                    printf("\tSelected heater '%s'\n",$2);
                    heater=$2;
            }
            ;
Because of the %token declaration above, YACC automatically picks the 'string' member from our union. Note also that we store a copy of $2, which is later used to tell the user which heater he is sending commands to:
    target_set:
            TOKTARGET TOKTEMPERATURE NUMBER
            {
                    printf("\tHeater '%s' temperature set to %d\n",heater,$3);
            }
            ;
For more details, read example7.y.
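In plain C terms, the union mechanics of this section come down to roughly the following. The typedef is approximately what YACC derives from the %union block; the helpers set_state() and set_word() are hypothetical names that merely mirror the STATE and WORD lexer actions, and malloc() plus strcpy() stand in for strdup() so that the sketch needs nothing beyond standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Approximately what YACC derives from the %union block. */
typedef union {
    int number;
    char *string;
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical helper mirroring the on|off lexer action. */
void set_state(const char *text)
{
    yylval.number = !strcmp(text, "on");    /* "on" -> 1, anything else -> 0 */
}

/* Hypothetical helper mirroring the WORD lexer action. */
void set_word(const char *text)
{
    /* Copy the text: the lexer reuses its own buffer for the next token. */
    char *copy = malloc(strlen(text) + 1);
    if (copy != NULL)
        strcpy(copy, text);
    yylval.string = copy;                   /* the caller owns this copy */
}
```

Only one member holds a valid value at any time, which is why the %token <number> and %token <string> declarations matter: they tell YACC which member to read for each token.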
7. Debugging
Especially when learning, it is important to have debugging facilities. Luckily, YACC can give a lot of feedback. This feedback comes at the cost of some overhead, so you need to supply some switches to enable it.
When compiling your grammar, add --debug and --verbose to the YACC commandline. In your grammar C heading, add the following:
int yydebug=1;
The --verbose flag generates the file 'y.output', which explains the state machine that was created.
When you now run the generated binary, it will output a *lot* of what is happening. This includes what state the state machine currently has, and what tokens are being read.
Peter Jinks wrote a page on debugging which contains some common errors and how to solve them.
7.1 The state machine
Internally, your YACC parser runs a so called 'state machine'. As the name implies, this is a machine that can be in several states. Then there are rules which govern transitions from one state to another. Everything starts with the so called 'root' rule I mentioned earlier.
To quote from the output from the Example 7 y.output:
    state 0
        ZONETOK     shift, and go to state 1
        $default    reduce using rule 1 (commands)
        commands    go to state 29
        command     go to state 2
        zone_set    go to state 3
By default, this state reduces using the 'commands' rule. This is the aforementioned recursive rule that defines 'commands' to be built up from individual command statements, followed by a semicolon, followed by possibly more commands.
This state reduces until it hits something it understands, in this case, a ZONETOK, ie, the word 'zone'. It then goes to state 1, which deals further with a zone command:
    state 1
        zone_set -> ZONETOK . quotedname zonecontent   (rule 4)
        QUOTE       shift, and go to state 4
        quotedname go to state 5
The first line has a '.' in it to indicate where we are: we've just seen a ZONETOK and are now looking for a 'quotedname'. Apparently, a quotedname starts with a QUOTE, which sends us to state 4.
To follow this further, compile Example 7 with the flags mentioned in the Debugging section.
7.2 Conflicts: 'shift/reduce', 'reduce/reduce'
Whenever YACC warns you about conflicts, you may be in for trouble. Solving these conflicts appears to be somewhat of an art form that may teach you a lot about your language. More than you possibly would have wanted to know.
The problems revolve around how to interpret a sequence of tokens. Let's suppose we define a language that needs to accept both these commands:
            delete heater all
            delete heater number1
To do this, we define this grammar:
            delete_heaters:
                    TOKDELETE TOKHEATER mode
                    {
                            deleteheaters($3);
                    }

            mode:   WORD
            delete_a_heater:
                    TOKDELETE TOKHEATER WORD
                    {
                            delete($3);
                    }
You may already be smelling trouble. The state machine starts by reading the word 'delete', and then needs to decide where to go based on the next token. This next token can either be a mode, specifying how to delete the heaters, or the name of a heater to delete.
The problem however is that for both commands, the next token is going to be a WORD. YACC has therefore no idea what to do. This leads to a 'reduce/reduce' warning, and a further warning that the 'delete_a_heater' node is never going to be reached.
In this case the conflict is resolved easily (ie, by renaming the first command to 'delete heaters all', or by making 'all' a separate token), but sometimes it is harder. The y.output file generated when you pass yacc the --verbose flag can be of tremendous help.
8. Further reading
GNU YACC (Bison) comes with a very nice info-file (.info) which documents the YACC syntax very well. It mentions Lex only once, but otherwise it's very good. You can read .info files with Emacs or with the very nice tool 'pinfo'. It is also available on the GNU site as the Bison Manual.
Flex comes with a good manpage which is very useful if you already have a rough understanding of what Flex does. The Flex Manual is also available online.
After this introduction to Lex and YACC, you may find that you need more information. I haven't read any of these books yet, but they sound good:
Bison-The Yacc-Compatible Parser Generator
    By Charles Donnelly and Richard Stallman. An Amazon user found it useful.
Lex & Yacc
    By John R. Levine, Tony Mason and Doug Brown. Considered to be the standard work on this subject, although a bit dated. Reviews are available on Amazon.
Compilers: Principles, Techniques, and Tools
    By Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman. The 'Dragon Book'. From 1985 and they just keep printing it. Considered the standard work on constructing compilers. Also listed on Amazon.
Thomas Niemann wrote a document discussing how to write compilers and calculators with Lex & YACC.
The moderated usenet newsgroup comp.compilers can also be very useful, but please keep in mind that the people there are not a dedicated parser helpdesk! Before posting, read their interesting page and especially the FAQ.
Lex - A Lexical Analyzer Generator by M. E. Lesk and E. Schmidt is one of the original reference papers.
Yacc: Yet Another Compiler-Compiler by Stephen C. Johnson is one of the original reference papers for YACC. It contains useful hints on style.
9. Acknowledgements & Thanks
    * Pete Jinks <pjj%cs.man.ac.uk>
    * Chris Lattner <sabre%nondot.org>
    * John W. Millaway <johnmillaway%yahoo.com>
    * Martin Neitzel <neitzel%gaertner.de>
    * Sumit Pandaya <sumit%elitecore.com>
    * Esmond Pitt <esmond.pitt%bigpond.com>
    * Eric S. Raymond
    * Bob Schmertz <schmertz%wam.umd.edu>
    * Adam Sulmicki <adam%cfar.umd.edu>
    * Markus Triska <triska%gmx.at>
    * Erik Verbruggen <erik%road-warrior.cs.kun.nl>
    * Gary V. Vaughan <gary%gnu.org> (read his awesome Autobook)
    * Ivo van der Wijk (Amaze Internet)
