How do you write a SQL parser? The first step is tokenization: splitting the input into words. Here is a simple example.
The select.l file:
%{
#include <stdio.h>

/* Parser states, in the order the tokens are expected to appear. */
enum {
    BEGIN_OF_INPUT = 0,
    SELECT,
    COLUMN,
    FROM,
    TABLE,
    SEP,
    END_OF_INPUT
};

int state = BEGIN_OF_INPUT;
%}

%%

\n      {
            /* End of line: report the result and reset for the next statement. */
            if (SEP == state) {
                printf("========= Cong!!! found a valid sql ==========\n");
            } else if (BEGIN_OF_INPUT < state) {
                printf("========= :-( not a valid sql ==========\n");
            }
            state = BEGIN_OF_INPUT;
            printf("please input a sql and press Enter\n");
        }

SELECT  {
            if (state == (SELECT - 1)) state++;
            printf("select state=%d\n", state);
        }

FROM    {
            if (state == (FROM - 1)) state++;
            printf("from state=%d\n", state);
        }

[a-zA-Z_]+  {
            /* Any other word is a column name or a table name. */
            if (state == (COLUMN - 1) || state == (TABLE - 1)) state++;
            printf("any word state=%d\n", state);
        }

;       {
            if (state == (SEP - 1)) { state++; }
            printf("separator. state=%d\n", state);
        }

.       ;   /* ignore others */

%%

int main(void)
{
    yylex();
    return 0;
}
lex select.l
gcc lex.yy.c -ll
./a.out
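The state logic may be easier to follow outside of lex. The sketch below is a hand-written C approximation of the same check, not the code that lex generates. The function name is_valid_select and the ST_ prefixed state names are made up for this illustration, and it assumes the default "C" locale so that isalpha covers exactly a-z and A-Z.

```c
#include <ctype.h>
#include <string.h>

/* Same states as in select.l, renamed so this sketch stands alone. */
enum { ST_BEGIN = 0, ST_SELECT, ST_COLUMN, ST_FROM, ST_TABLE, ST_SEP };

/* Returns 1 if `line` has the shape "SELECT col FROM tbl;", otherwise 0. */
int is_valid_select(const char *line)
{
    int state = ST_BEGIN;
    const char *p = line;

    while (*p != '\0' && *p != '\n') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* Take the longest run of [a-zA-Z_], as lex would. */
            const char *start = p;
            while (isalpha((unsigned char)*p) || *p == '_')
                p++;
            size_t len = (size_t)(p - start);

            if (len == 6 && strncmp(start, "SELECT", 6) == 0) {
                if (state == ST_SELECT - 1) state++;
            } else if (len == 4 && strncmp(start, "FROM", 4) == 0) {
                if (state == ST_FROM - 1) state++;
            } else if (state == ST_COLUMN - 1 || state == ST_TABLE - 1) {
                state++;   /* any other word: a column or table name */
            }
        } else {
            if (*p == ';' && state == ST_SEP - 1)
                state++;
            p++;           /* every other character is ignored */
        }
    }
    return state == ST_SEP;
}
```

Like the lex version, this is lenient: words or punctuation that do not advance the state are simply skipped, so it only checks that SELECT, a word, FROM, a word, and ; appear in that order.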
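Note:
When writing the rules in a lex file, put the stricter rule before the looser one. lex takes the longest match, and when two rules match text of the same length, the rule that appears first wins. Here is an example of getting it wrong: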
[a-zA-Z_]+  {
            if (state == (COLUMN - 1) || state == (TABLE - 1)) state++;
            printf("any word state=%d\n", state);
        }

FROM    {
            if (state == (FROM - 1)) state++;
            printf("from state=%d\n", state);
        }

The FROM rule here is never reached: [a-zA-Z_]+ matches the same four characters and appears first, so it intercepts every FROM.
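The selection logic behind this can be sketched in plain C. This is only an illustration of "longest match, then earliest rule", not how lex is implemented internally (lex compiles the rules into one automaton). The names struct rule, pick_rule, word_first, and from_first are hypothetical, and isalpha is assumed to run in the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A rule reports how many leading characters of `s` it matches (0 = no match). */
typedef size_t (*match_fn)(const char *s);

struct rule {
    const char *name;
    match_fn match;
};

/* Matches the literal keyword FROM. */
static size_t match_from(const char *s)
{
    return strncmp(s, "FROM", 4) == 0 ? 4 : 0;
}

/* Matches [a-zA-Z_]+ . */
static size_t match_word(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Longest match wins; on a tie the earlier rule in the table wins. */
const char *pick_rule(const struct rule *rules, size_t count, const char *input)
{
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = rules[i].match(input);
        if (len > best_len) {   /* strictly longer, so an earlier rule keeps a tie */
            best_len = len;
            best = rules[i].name;
        }
    }
    return best;
}

/* The wrong ordering and the right ordering from the text above. */
const struct rule word_first[] = { { "WORD", match_word }, { "FROM", match_from } };
const struct rule from_first[] = { { "FROM", match_from }, { "WORD", match_word } };
```

With word_first, pick_rule on the input "FROM" returns "WORD", which is the mistake described above. With from_first it returns "FROM", while a longer word such as "FROMAGE" still goes to the word rule because the longer match takes priority over rule order.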
================================================================
Articles about lex and yacc that I wrote 5 years ago:
Integrating YACC and LEX with VC
A simple lexer and parser for the C language [original]