Princeton Alogorithm COS226 Week11 Regular Expression

substring search is to find a single string in text.
pattern matching is to find one of a specified set of strings in text

image.png

a regular expression is a notation to specify a set of strings. (possibly infinite)


image.png

image.png

image.png

writing a RE is like writing a program.
it needs to understand programming model, can be easier to write than read, can be difficult to debug.

for any DFA, there exists a RE that describes the same set of strings.
for any RE, there exists a DFA that recognizes the same set of strings


image.png

DFA have no backup in text input stream, and linear time guarantee.

but to build DFA from RE is infeasible because DFA may have exponential number of states.

so we could use NFA, it is Quadratic time guarantee (linear-time typical)
NFA is for nondeterministic finite state automata

DFA. Deterministic ⇒ easy because exactly one applicable transition.

NFA. Nondeterministic ⇒ can be several applicable transitions;
need to select the right one!

Q. How to simulate NFA?
A. Systematically consider all possible transition sequences.

image.png

NFA representation

image.png

how to efficiently simulate an NFA?
Maintain set of all possible states that NFA could be in after reading in the first i text characters


image.png

NFA construction

image.png

for parentheses. add ε-transition edge from parentheses to next state.


image.png

image.png

image.png

image.png

RE Application

image.png

image.png

QUIZ

Q1 Challenging REs.

image.png

check answer in my github https://github.com/yixuaz/algorithm4-princeton-cos226

Q2 Exponential-size DFA

image.png

check answer in my github https://github.com/yixuaz/algorithm4-princeton-cos226

Q3 Extensions to NFA

Add to NFA.java the ability to handle multiway or, wildcard, and the + closure operator.
check answer in my github https://github.com/yixuaz/algorithm4-princeton-cos226

你可能感兴趣的:(Princeton Alogorithm COS226 Week11 Regular Expression)