语法分析

syn-tax(语法)

the way in which words are put together to form 
phrases, clauses, or sentences.

上下文无关文法

 context-free grammars, a more powerful method
of describing languages. Such grammars can describe certain features that have
a recursive structure, which makes them useful in a variety of applications.
Context-free grammars were first used in the study of human languages. One
way of understanding the relationship of terms such as noun, verb, and preposition
and their respective phrases leads to a natural recursion because noun phrases
may appear inside verb phrases and vice versa. Context-free grammars help us
organize and understand these relationships.
An important application of context-free grammars occurs in the specification
and compilation of programming languages. A grammar for a programming language often appears as a reference for people trying to learn the language syntax.
Designers of compilers and interpreters for programming languages often start
by obtaining a grammar for the language. Most compilers and interpreters contain a component called a parser that extracts the meaning of a program prior to
generating the compiled code or performing the interpreted execution. A number of methodologies facilitate the construction of a parser once a context-free
grammar is available. Some tools even automatically generate the parser from
the grammar.
1

上下文无关语言

Any language that can
be generated by some context-free grammar is called a context-free language
(CFL)

产生式

A grammar consists of a collection of substitution rules, also called productions.

A0A1
A → B
B → #

FORMAL DEFINITION OF A CONTEXT-FREE GRAMMAR(形式化定义)

A context-free grammar is a 4-tuple (V, Σ, R, S), where
1. V is a finite set called the variables,
2. Σ is a finite set, disjoint from V , called the terminals,
3. R is a finite set of rules, with each rule being a variable and a
string of variables and terminals, and
4. S ∈ V is the start variable.

龙书关于上下文无关文法的形式化定义

a context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol, and productions.
1. Terminals are the basic symbols from which strings are formed. The term
"token name" is a synonym for "terminal" and frequently we will use the
word "token" for terminal when it is clear that we are talking about just
the token name. We assume that the terminals are the first components
of the tokens output by the lexical analyzer. In (4.4), the terminals are
the keywords if and else and the symbols " c" and ") ."
2. Nonterminals are syntactic variables that denote sets of strings. In (4.4),
stmt and expr are nonterminals. The sets of strings denoted by nonterminals help define the language generated by the grammar. Nonterminals
impose a hierarchical structure on the language that is key to syntax
analysis and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and
the set of strings it denotes is the language generated by the grammar.
Conventionally, the productions for the start symbol are listed first.
4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production
consists of:
(a) A nonterminal called the head or left side of the production; this
production defines some of the strings denoted by the head.
(b) The symbol --+. Sometimes : : = has been used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and nonterminals. The components of the body describe one way in which
strings of the nonterminal at the head can be constructed.

乔姆斯基范式 Chomsky normal form

When working with context-free grammars, it is often convenient to have them in simplified form. One of the simplest and most useful forms is called the Chomsky normal form.

文法定理

Any context-free language is generated by a context-free grammar in Chomsky normal form.

语法分析树

//语法树用图形的方式呈现
A parse tree pictorially shows how the start symbol of a grammar deriv es a
string in the language. If non terminal A has a pro duction A --> X Y Z , then a parse tree may have an interior node labeled A with three children labeled X ,Y , and Z , from left to righ t:
A
X Y Z
F ormally , giv en a con text-free grammar, a p arse tr e e according to the grammar is a tree with the follo wing prop erties

文法二义性

设计好文法避免二义性,常用的就是添加规则

we need to design unam biguous grammars for compiling applications, 
or to use am biguous grammars with additional rules to resolve the ambiguities

二义性文法举例

string ->string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

消除二义性的文法

expr --> expr + term | expr - term | term
term --> term * factor | term / factor | factor
factor --> digit | ( expr )

语法分析

语法分析是决定如何使用文法生成非终结符串的过程
语法树是用来辅助想象用的

Parsing is the process of determining how a string of terminals can be generated by a grammar. In discussing this problem, it is helpful to think of a parse tree being constructed, even though a compiler may not construct one, in practice.
However, a parser must be capable of constructing the tree in principle, or else the translation cannot be guaranteed correct

大多数语法分析归为 “自顶向下”、“自底向上”两种方法

Most parsing methods fall into one of two classes, called the top-down and bottom-up methods. These terms refer to the order in which nodes in the parse tree are constructed. 
In top-down parsers, construction starts at the root and proceeds towards the leaves, while in bottom-up parsers, construction starts at the leaves and proceeds towards the root. 
The popularity of top-down parsers is due to the fact that efficient parsers can be constructed more easily by hand using top-down methods. Bottom-up parsing,
 however, can handle a larger class of grammars and translation schemes, so software tools for generating parsers directly from grammars often use bottom-up methods.
向前看符号

The current terminal(终结符) being scanned in the input is frequently referred to as the lookahead symbol. Initially(刚开始地), the lookahead symbol is the first, i.e., leftmost(最左), terminal of the input string.

产生式试错

一般情况下,产生式不正确的时候就去回溯,为非终结符选择产生式是一个尝试并犯错的过程
In general, the selection of a production for a nonterminal may involve trialand-error; that is, we may have to try a production and backtrack to try another production if the first is found to be unsuitable.
A production is unsuitable if, after using the production, we cannot complete the tree to match the input string.

预测分析法

Predictive parsing relies on information about the first symbols that can be
generated by a production body. More precisely, let a be a string of grammar
symbols (terminals and/or nonterminals). We define FIRsT(a) tb be the set of
terminals that appear as the first symbols of one or more strings of terminals
generated from a. If a is E or can generate E, then E is also in, FIRsT(a) .

你可能感兴趣的:(人工智能,编译原理,自动机)