参考学习备忘:
转载来自ref
Suppose you want to write a program to recognize the word “main” in an input program. Logically, your program will look something like this:
cin >> char
while (char != “m”) cin >> char
if (cin >> char != “a”) go to step 1
if (cin >> char != “i”) go to step 1
if (cin >> char != “n”) go to step 1
done
We can explain each step in this program as follows:
Initialization
Looking for “m”
Recognized “m”, looking for “a”
Recognized “ma”, looking for “i”
Recognized “mai”, looking for “n”
Recognized “main”
Each step in the program corresponds to a different place in the recognition process. We can capture this behavior in a graph
each node in the graph represents a step in the process
arcs in the graph represent movement from one step to another
labels on the arcs correspond to the input required to make a transition
A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet) C. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.
A finite automaton consists of:
We execute our FA on an input sequence as follows:
Begin in the start state
If the next input char matches the label on a transition from the current state to a new state, go to that new state
Continue making transitions on each input char
If no move is possible, then stop
If in accepting state, then accept
It is fairly straightforward to translate an FA into a program. Consider a 4-state FA to recognize “main” in a program.
Let FA = {S,C,T,s0,F}
S = {s0, s1, s2, s3, s4}
C = {a,b,..z,A,B,..Z,0,1,..9,+,-,*,/}
F = {s4}
T = {(s0,m,s1), (s0,C-m,s0),
(s1,a,s2), (s1,m,s1), (s1,C-a-m,s0),
(s2,i,s3), (s2,m,s1), (s2,C-i-m,s0),
(s3,n,s4), (s3,m,s1), (s3,C-n-m,s0), (s4,C,s4)}
We can easily create a program from this description of the FA. We will use statement labels to represent states and goto’s to represent the meaning of an arc. (In general, goto’s are discouraged, but this is one case where their use is not only reasonable, it is quite common.) The variable “accept” is true if the FA accepts, and is false otherwise.
s: accept = false; cin >> char;
if char = “m” goto m;
if char = EOF goto end;
goto s;
m: accept = false; cin >> char;
if char = “m” goto m;
if char = “a” goto a;
if char = EOF goto end;
goto s;
a: accept = false; cin >> char;
if char = “m” goto m;
if char = “i” goto i;
if char = EOF goto end;
goto s;
i: accept = false; cin >> char;
if char = “m” goto m;
if char = “n” goto n;
if char = EOF goto end;
goto s;
n: accept = true; while (cin >> char);
end: cout << accept;
Nondeterministic Automata
If, for each pair of states and possible input chars, there is a unique next state (as specificed by the transitions), then the FA is deterministic (DFA). Otherwise, the FA is nondeterministic (NDFA).
What does it mean for an FA to have more than one transition from a given state on the same input symbol? How do we translate such an FA into a program? How can we “goto” more than one place at a time?
Conceptually, a nondeterministic FA can follow many paths simultaneously. If any series of valid transitions reaches an accepting state, they we say the FA accepts the input. It’s as if we allow the FA to “guess” which of several transitions to take from a given state, and the FA always guesses right.
We won’t attempt to translate an NDFA into a program, so we don’t have to answer the question “how can we goto more than one place at a time”. Instead, we can prove that every NDFA has a corresponding DFA, and there is a straightforward process for translating an NDFA into a DFA. So, when given an NDFA, we can translate it into a DFA, and then write a program based on the DFA.
Example of an NDFA
An NDFA to accept strings containing the word “main”:
-> s0 -m-> s1 -a- > s2 -i-> s3 -n-> (s4)
-> s0 -any character-> s0
This is an NDFA because, when in state s0 and seeing an “m”, we can choose to remain in s0 or go to s1. (In effect, we guess whether this “m” is the start of “main” or not.)
If we simulate this NDFA with input “mmainm” we see the NDFA can end up in s0 or s1 after seeing the first “m”. These two states correspond to two different guesses about the input: (1) the “m” represents the start of “main” or (2) the “m” doesn’t represent the start of “main”.
-> s0 -m-> s0
-m-> s1
On seeing the next input character (“m”), one of these guesses is proven wrong, as is there is no transition from s1 for an “m”. That path halts and rejects the input. The other path continues, making a transition from s0 to either s0 or s1, in effect guessing that the second “m” in the input either is or is not the start of the word “main”.
-> s0 -m-> s0 -m-> s0
-m-> s1
-m-> s1
Continuing the simulation, we discover that at the end of the input, the machine can be in state s0 (still looking for the start of “main”), s1 (having seen an “m” and looking for “ain”), or s4 (having seen “main” in the input). Since at least one of these states is an accepting state (s4), the machine accepts the input.
s0 -m-> s0 -m-> s0 -a-> s0 -i-> s0 -n-> s0 -m-> s0
-m-> s1
-m-> s1 -a-> s2 -i-> s3 -n-> s4
-m-> s1
Equivalence of Automata
Two automata A and B are said to be equivalent if both accept exactly the same set of input strings. Formally, if two automata A and B are equivalent then
if there is a path from the start state of A to a final state of A labeled a1a2..ak, there there is a path from the start state of B to a final state of B labeled a1a2..ak.
if there is a path from the start state of B to a final state of B labeled b1b2..bj, there there is a path from the start state of A to a final state of A labeled b1b2..bj.
Equivalence of Deterministic and Nondeterministic Automata
To show that there is a corresponding DFA for every NDFA, we will show how to remove nondeterminism from an NDFA, and thereby produce a DFA that accepts the same strings as the NDFA.
The basic technique is referred to as subset construction, because each state in the DFA corresponds to some subset of states of the NDFA.
The idea is this: as we trace the set of possible paths thru an NDFA, we must record all possible states that we could be in as a result of the input seen so far. We create a DFA which encodes the set of states of the NDFA that we could be in within a single state of the DFA.
Subset Construction for NDFA
To create a DFA that accepts the same strings as this NDFA, we create a state to represent all the combinations of states that the NDFA can enter.
From the previous example (of an NDFA to recognize input strings containing the word “main”) of a 5 state NDFA, we can create a corresponding DFA (with up to 2^5 states) whose states correspond to all possible combinations of states in the NDFA:
{},
{s0}, {s1}, {s2}, {s3}, {s4},
{s0, s1}, {s0, s2}, {s0, s3}, {s0, s4},
{s1, s2}, {s1, s3}, {s1, s4},
{s2, s3}, {s2, s4},
{s3, s4},
{s0, s1, s2}, {s0, s1, s3}, {s0, s1, s4},
{s0, s2, s3}, {s0, s2, s4},
{s0, s3, s4},
{s1, s2, s3}, {s1, s2, s4},
{s1, s3, s4},
{s2, s3, s4},
{s0, s1, s2, s3}, {s0, s1, s2, s4},
{s0, s1, s3, s4}, {s0, s2, s3, s4},
{s1, s2, s3, s4},
{s0, s1, s2, s3, s4}
Note that many of these states won’t be needed in our DFA because there is no way to enter that combination of states in the NDFA. However, in some cases, we might need all of these states in the DFA to capture all possible combinations of states in the NDFA.
Subset Construction for NDFA (cont)
A DFA accepting the same strings as our example NDFA has the following transitions:
{s0} -m-> {s0,s1}
{s0} -not m-> {s0}
{s0,s1} -m-> {s0,s1}
{s0,s1} -a-> {s0,s2}
{s0,s1} -not m,a-> {s0}
{s0,s2} -m-> {s0,s1}
{s0,s2} -i-> {s0,s3}
{s0,s2} -not m,i-> {s0}
{s0,s3} -m-> {s0,s1}
{s0,s3} -n-> {s0,s4}
{s0,s3} -not m,n-> {s0}
The start state is {s0} and the final state is {s0,s4}, the only one containing a final state of the NDFA.
Limitations of Finite Automata
The defining characteristic of FA is that they have only a finite number of states. Hence, a finite automata can only “count” (that is, maintain a counter, where different states correspond to different values of the counter) a finite number of input scenarios.
There is no finite automaton that recognizes these strings:
The set of binary strings consisting of an equal number of 1’s and 0’s
The set of strings over ‘(’ and ‘)’ that have “balanced” parentheses
The ‘pumping lemma’ can be used to prove that no such FA exists for these examples.