Author's name: Carel-Jan Engel
Author's Email: [email protected]
Date written: Mar 24, 2005
Oracle version(s): N/A
In documentation about tuning SQL, I see references to parse trees. What is a parse tree?
A parse tree is an internal structure that a compiler or interpreter creates while parsing a language construct. Parsing is also known as 'syntax analysis'.
An example will illustrate a parse tree. It is a slightly adapted version of the example on page 6 of the famous 'Dragon Book' (Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, published by Addison Wesley; my copy is from 1986). Rather than dealing with the complexities of a SQL statement, let's take a rather simple language construct, the assignment of the result of an expression to a variable:
result := salary + bonus * 1.10
When the compiler analyzes this statement, the resulting parse tree will look like this:
                 assignment
        ________ statement ____
       /            |          \
      /             :=          \
 identifier              ___ expression _______
     |                  /        |             \
   result              /         +              \
                 expression            __ expression ___
                     |                /       |         \
                 identifier          /        *          \
                     |          expression          expression
                  salary            |                    |
                                identifier             number
                                    |                    |
                                  bonus                 1.10
The picture is an upside-down representation of a tree: the root is at the top and the leaves are at the bottom. The language elements in this small, simple assignment are: identifiers (result, salary, bonus), operators (:=, +, *), and a number (1.10). An 'identifier' is the language element that names a variable, function or procedure. An 'operator' is the language element that represents some action to be taken on the operands at either side of the operator. A number is a constant, 1.10 in this statement. The syntax rules (the 'grammar') specify which 'sentences' are valid.
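As a rough illustration of how such a tree could be built, here is a minimal sketch in Python. It is not taken from the Dragon Book or from any Oracle code. The grammar, the token patterns and every function name (tokenize, parse_statement and so on) are assumptions made for this example only. The sketch also produces a condensed tree: each operator node stands in for the 'expression' node drawn above it in the picture.

```python
import re

# Token patterns for the tiny assignment language (assumed for this sketch).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>\d+\.\d+|\d+)"
    r"|(?P<identifier>[A-Za-z_]\w*)"
    r"|(?P<operator>:=|[+*]))"
)

def tokenize(text):
    """Read the statement character by character into (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens

def parse_statement(tokens):
    """statement := identifier ':=' expression"""
    if (len(tokens) < 3 or tokens[0][0] != "identifier"
            or tokens[1] != ("operator", ":=")):
        raise SyntaxError("expected: identifier := expression")
    expression, pos = parse_expression(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError("unexpected token after expression")
    return ("assignment", ("identifier", tokens[0][1]), expression)

def parse_expression(tokens, pos):
    """expression := term ('+' term)*"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == ("operator", "+"):
        right, pos = parse_term(tokens, pos + 1)
        node = ("+", node, right)
    return node, pos

def parse_term(tokens, pos):
    """term := factor ('*' factor)*  ('*' binds tighter than '+')"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == ("operator", "*"):
        right, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, right)
    return node, pos

def parse_factor(tokens, pos):
    """factor := identifier | number"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of statement")
    kind, value = tokens[pos]
    if kind == "identifier":
        return ("identifier", value), pos + 1
    if kind == "number":
        return ("number", float(value)), pos + 1
    raise SyntaxError(f"unexpected token {value!r}")

tree = parse_statement(tokenize("result := salary + bonus * 1.10"))
print(tree)
```

The grammar rules decide the shape of the tree: because a term is parsed before the '+' loop continues, bonus * 1.10 ends up as the right-hand child of the '+' node, just as in the picture. A statement that breaks the rules, such as "result := + salary", is rejected with a SyntaxError.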
After successfully decomposing the statement into its internal representation, the compiler or interpreter can 'walk the tree'. A compiler does this to create the executable code for the construct. An interpreter does not generate code, but invokes its own built-in executing functions instead. Let's take the interpreter for the rest of the explanation, because executing the steps is easier to explain than the code generation of a compiler. For the example I assume the bonus to be 100 and the salary to be 1000. The tree-walk starts at the root of the tree, the assignment statement. The rule for the assignment tells the interpreter that the right-hand side has to be evaluated first. This evaluation is also known as 'reduction'. The right-hand side of the assignment needs to be reduced to a value, the result of the expression, before it can be assigned to the variable at the left-hand side of the statement.
The first node at the right-hand side of the statement contains an expression with a '+' operator. The right-hand side of the '+' operator needs to be added to the left-hand side, so the walk goes on to the next node at the right-hand side. There the interpreter detects the expression with the '*' operator. The left-hand side of this operator needs to be multiplied by the right-hand side. The interpreter goes on to the right-hand side and detects an expression that consists of a single number: 1.10. This side is fully reduced, so the result can be stored. The interpreter walks back up the tree to the '*' operator and starts evaluating its left-hand side. This is an expression that consists of one single identifier, representing a variable, 'bonus'. The memory location represented by this variable is read and its contents (100) are multiplied by the right-hand side result, 1.10. This expression has now been fully reduced to the result 110.
The interpreter walks up to the '+' operator and starts evaluating its left-hand side. There it again detects an identifier, 'salary'. Its location is read and the expression is reduced to a number, 1000. The left-hand and right-hand sides are added, resulting in 1,110. Now the expression at the right-hand side of the assignment is fully reduced, and the interpreter walks up the tree and finds the assignment operator ':='. This instructs the interpreter to copy the result of the expression to the left-hand side. The left-hand side contains an identifier, 'result'. The memory location represented by 'result' is filled with the result of the expression, 1,110.
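The walk just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is written out by hand as nested tuples (a condensed form of the picture, with operator nodes standing in for 'expression' nodes), the dictionary named memory plays the role of the memory locations, and the names reduce_expression and execute are invented for this example.

```python
# Hand-built, condensed tree for: result := salary + bonus * 1.10
TREE = ("assignment", ("identifier", "result"),
        ("+", ("identifier", "salary"),
         ("*", ("identifier", "bonus"), ("number", 1.10))))

def reduce_expression(node, memory):
    """Reduce an expression node to a single value by walking its subtree."""
    kind = node[0]
    if kind == "number":
        return node[1]              # a constant is already fully reduced
    if kind == "identifier":
        return memory[node[1]]      # read the variable's memory location
    # Operator node: reduce the right-hand side first, then the left-hand
    # side, in the same order as the walk described in the text.
    right = reduce_expression(node[2], memory)
    left = reduce_expression(node[1], memory)
    if kind == "+":
        return left + right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown node kind: {kind}")

def execute(tree, memory):
    """Execute an assignment: reduce the right-hand side, then store it."""
    kind, target, expression = tree
    if kind != "assignment":
        raise ValueError("only assignment statements are supported")
    memory[target[1]] = reduce_expression(expression, memory)

memory = {"salary": 1000, "bonus": 100}
execute(TREE, memory)
print(round(memory["result"], 2))
```

With the salary of 1000 and the bonus of 100 assumed above, the value stored under 'result' comes out at 1,110, apart from any floating-point rounding in the multiplication by 1.10.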
This is just a simplified explanation of how an interpreter or compiler uses a parse tree. A complete introduction to compiler building practices is outside the scope of this answer. However, it should be clear that creating a parse tree consumes some resources. Before the language elements can be recognized they must be read character by character, type checking and possible conversion need to be done, identifiers (tables, columns etc.) need to be identified and checked in the data dictionary, and so on. After this 'hard parse' the parse tree is complete, and executing a statement from it is far cheaper than doing all this analysis over and over again. Therefore, storing the parse tree in the SQL area for future use can save quite some time when processing SQL statements that have been encountered before.
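The saving from reusing a stored parse tree can be shown with a small Python sketch. It does not describe how Oracle implements its SQL area. The dictionary statement_cache, the counter parse_count and the functions hard_parse and get_parse_tree are hypothetical names, and hard_parse is only a stand-in for the expensive analysis described above.

```python
parse_count = 0          # how many times the expensive analysis has run
statement_cache = {}     # statement text -> stored parse tree

def hard_parse(text):
    """Stand-in for the expensive work: scanning, type checks, lookups."""
    global parse_count
    parse_count += 1
    return ("parsed", tuple(text.split()))

def get_parse_tree(text):
    """Return the stored tree if this exact text was seen before."""
    tree = statement_cache.get(text)
    if tree is None:
        tree = hard_parse(text)          # first encounter: do the full analysis
        statement_cache[text] = tree     # keep the result for future use
    return tree

# The same statement three times, then a statement with a different literal.
for _ in range(3):
    get_parse_tree("result := salary + bonus * 1.10")
get_parse_tree("result := salary + bonus * 1.20")

print(parse_count)
```

In this sketch the cache is keyed on the exact statement text, so the three identical statements cost a single hard parse, while the statement that differs only in its literal is analyzed again.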
Further reading: If you are interested in compiler building techniques, consider reading The Dragon Book. It can be found at: http://www.amazon.co.uk/exec/obidos/ASIN/0201101947/026-7499645-2696457