Lexical analysis and parsing are prerequisites for implementing any language. Many tools are available in the industry to help with both.
To give you an idea, the following is a listing of some of the available lexer and parser generators.
Parser generators:-
In this post, I will briefly discuss the major functions, classes and concepts used during MySQL query parsing.
Lexer used in MySQL
Rather than relying on a lexer generator, MySQL uses a hand-written lexer for lexical analysis. The benefit is that most of the code can be optimized with SQL syntax in mind.
(Please refer to the files sql_lex.h and sql_lex.cc)
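To make the idea of a hand-written lexer concrete, here is a minimal sketch. It is not MySQL's lexer: the names TokKind, Token, is_keyword and tokenize are hypothetical, the keyword set is a tiny assumed subset, and quoted strings and comments are ignored. It only shows the general technique of scanning characters by hand and grouping them into tokens.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds; MySQL's real lexer returns token codes
// defined by the grammar, not this enum.
enum class TokKind { Keyword, Identifier, Number, Symbol };

struct Token {
    TokKind kind;
    std::string text;
};

// Hypothetical keyword check over a tiny, assumed keyword set.
inline bool is_keyword(const std::string& upper) {
    return upper == "SELECT" || upper == "FROM" || upper == "WHERE";
}

// Scan the input one character at a time and group characters into tokens.
inline std::vector<Token> tokenize(const std::string& sql) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < sql.size()) {
        unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isspace(c)) {            // skip whitespace between tokens
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            std::size_t start = i;        // word: keyword or identifier
            while (i < sql.size() &&
                   (std::isalnum(static_cast<unsigned char>(sql[i])) || sql[i] == '_'))
                ++i;
            std::string word = sql.substr(start, i - start);
            std::string upper = word;     // keywords are matched case-insensitively
            for (char& ch : upper)
                ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
            TokKind kind = is_keyword(upper) ? TokKind::Keyword : TokKind::Identifier;
            out.push_back(Token{kind, word});
        } else if (std::isdigit(c)) {
            std::size_t start = i;        // run of digits
            while (i < sql.size() && std::isdigit(static_cast<unsigned char>(sql[i])))
                ++i;
            out.push_back(Token{TokKind::Number, sql.substr(start, i - start)});
        } else {
            out.push_back(Token{TokKind::Symbol, std::string(1, sql[i])});
            ++i;                          // any other character is a one-char symbol
        }
    }
    return out;
}
```

Calling tokenize on a short query returns the tokens in order. Because the scanner is written by hand, each branch can be tuned to the SQL dialect, which is the kind of optimization referred to above.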
Some key observations:-
The following classes give an outline of the lex implementation in MySQL (along with short descriptions, pulled from the comments in the source code).
This is the base class for st_select_lex and st_select_lex_unit.
st_select_lex_unit is a container of either
– One SELECT
– UNION of selects
Stores information about a parsed SELECT statement (without union).
st_select_lex and st_select_lex_unit are both inherited from select_lex_node.
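The relationship between the base class, st_select_lex and st_select_lex_unit can be pictured with a much simplified sketch. Everything with the Sketch prefix is hypothetical. In particular, holding the selects in a std::vector is an assumption made for brevity and says nothing about how MySQL actually links these objects together.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the common base class (hypothetical).
struct SketchNode {
    virtual ~SketchNode() = default;
    virtual std::string describe() const = 0;
};

// Stand-in for st_select_lex: one parsed SELECT without UNION.
struct SketchSelect : SketchNode {
    std::string table;
    explicit SketchSelect(std::string t) : table(std::move(t)) {}
    std::string describe() const override { return "SELECT from " + table; }
};

// Stand-in for st_select_lex_unit: holds one SELECT or a UNION of several.
struct SketchUnit : SketchNode {
    std::vector<std::unique_ptr<SketchSelect>> selects;

    void add(std::string table) {
        selects.push_back(std::make_unique<SketchSelect>(std::move(table)));
    }

    // More than one SELECT in the unit means it represents a UNION.
    bool is_union() const { return selects.size() > 1; }

    std::string describe() const override {
        std::string out;
        for (std::size_t i = 0; i < selects.size(); ++i) {
            if (i > 0) out += " UNION ";
            out += selects[i]->describe();
        }
        return out;
    }
};
```

A unit with a single select describes a plain SELECT. Adding a second select turns the same unit into a UNION, which matches the "either one SELECT or a UNION of selects" description above.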
This class represents the character input stream consumed during lexical analysis.
In addition to consuming the input stream, this class performs some comment pre-processing, by filtering out out-of-bound special text from the query input stream.
Two buffers, with pointers inside each buffer, are maintained in parallel. The ‘raw’ buffer is the original query text, which may contain out-of-bound comments. The ‘cpp’ (for comments pre-processor) buffer is the pre-processed one, containing only the query text that should be seen once out-of-bound data is removed.
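The two-buffer idea can be illustrated with a small sketch. It assumes that the only out-of-bound text is a /* ... */ comment and does not handle quoted strings or any MySQL-specific comment forms. TwoBufferStream and its members are hypothetical names, not the real class.

```cpp
#include <string>
#include <utility>

// Hypothetical two-buffer holder: `raw` keeps the original query text,
// `cpp` receives the same text with comments filtered out.
struct TwoBufferStream {
    std::string raw;
    std::string cpp;
    std::size_t raw_pos = 0;  // read position in `raw`; cpp.size() is the write position

    explicit TwoBufferStream(std::string query) : raw(std::move(query)) {}

    // Copy `raw` into `cpp`, skipping /* ... */ comments.
    // Simplification: quoted strings are not recognised, and an
    // unterminated comment swallows the rest of the input.
    void preprocess() {
        while (raw_pos < raw.size()) {
            if (raw.compare(raw_pos, 2, "/*") == 0) {
                std::size_t end = raw.find("*/", raw_pos + 2);
                raw_pos = (end == std::string::npos) ? raw.size() : end + 2;
            } else {
                cpp.push_back(raw[raw_pos++]);
            }
        }
    }
};
```

After preprocess() runs, raw still holds the query exactly as it was received, while cpp holds what the rest of the lexer should see. Keeping both lets later stages refer either to the original text or to the filtered text.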
Class representing the list of all tables used by a statement, along with other information necessary for opening and locking its tables, such as the SQL command for the statement.
Also contains information about stored functions used by the statement, since during its execution we may have to add all tables used by its stored functions/triggers to this list in order to pre-open and lock them.
Also used by the LEX::reset_n_backup/restore_backup_query_tables_list() methods to save and restore this information.
The state of the lex parsing. This is saved in the THD struct.
The internal state of the syntax parser. This object is only available during parsing, and is private to the syntax parser implementation (sql_yacc.yy).
Internal state of the parser.
The complete state consists of:
– state data used during lexical parsing,
– state data used during syntactic parsing.
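As a rough picture of how the two kinds of state can be bundled together, here is a minimal sketch. All type and member names are hypothetical, and the contents of each part (a scan position, a stack of integers) are assumptions chosen only to make the composition visible.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical lexical state: the input text and the current scan position.
struct SketchLexState {
    std::string input;
    std::size_t pos = 0;
};

// Hypothetical syntactic state: a stack the parser grows while applying rules.
struct SketchYaccState {
    std::vector<int> state_stack;
};

// The complete parser state bundles both parts and is meant to live
// only for the duration of one parse.
struct SketchParserState {
    SketchLexState lex;
    SketchYaccState yacc;

    explicit SketchParserState(std::string query) { lex.input = std::move(query); }

    // Return both parts to their initial condition, keeping the input text.
    void reset() {
        lex.pos = 0;
        yacc.state_stack.clear();
    }
};
```

Grouping both parts in one object means a single value can be handed to the parse entry point and discarded afterwards, which fits the description of state that is only available during parsing.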
Parser used in MySQL
MySQL uses the “bison” tool as its parser generator.
The parser contains the standard set of functions generated by bison.
All the grammar rules are defined in the file sql_yacc.yy.
sql_yacc.cc and sql_yacc.h are the output files generated by bison.
Note: Bison allows the prefix of the functions and variables used in the parser to be overridden, and the MySQL code uses this to replace “yy” with “MYSQL”. In the Makefile, bison is called with the argument “-p” (“/usr/bin/bison -y -p MYSQL“). Please go through the man page of bison for more details. The idea behind providing such flexibility is to allow multiple parsers in the same application.
Note: So, the parser function is not yyparse() but MYSQLparse(). It is called from the function parse_sql(), in the file sql_parse.cc. Macros are used internally in the generated code to achieve this. A code snippet from sql_yacc.cc:
#define yyparse MYSQLparse
#define yylex MYSQLlex
#define yyerror MYSQLerror
#define yylval MYSQLlval
#define yychar MYSQLchar
#define yydebug MYSQLdebug
#define yynerrs MYSQLnerrs
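To see why renaming the prefix lets several parsers coexist, consider the following sketch. It is not generated by bison: the function bodies are stand-ins, and the OTHER prefix is a hypothetical second parser. It only demonstrates that macros of this kind give each parser's symbols a distinct name at link time.

```cpp
// First stand-in "generated parser": every yy-name is renamed to the MYSQL
// prefix, in the same way as the #define lines found in sql_yacc.cc.
#define yyparse MYSQLparse
#define yynerrs MYSQLnerrs
int yynerrs = 0;              // really defines MYSQLnerrs
int yyparse() { return 1; }   // really defines MYSQLparse (stand-in body)
#undef yyparse
#undef yynerrs

// Second, hypothetical parser using a different prefix. Without the
// renaming, these definitions would clash with the ones above.
#define yyparse OTHERparse
#define yynerrs OTHERnerrs
int yynerrs = 0;              // really defines OTHERnerrs
int yyparse() { return 2; }   // really defines OTHERparse (stand-in body)
#undef yyparse
#undef yynerrs
```

Both blocks are written in terms of yyparse and yynerrs, yet the program ends up with four distinct symbols: MYSQLparse, MYSQLnerrs, OTHERparse and OTHERnerrs. Callers use the prefixed names directly, just as parse_sql() calls MYSQLparse().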