Regex to NFA Conversion
Thompson's construction is an algorithm that converts a regular expression into an equivalent NFA. It works by building up the NFA recursively, handling each operation (union, concatenation, Kleene star) with a fixed pattern. This is how regex engines actually compile your patterns into executable automata.
Thompson's Construction
The algorithm builds NFAs bottom-up. For each sub-expression, it creates a small NFA, then combines them.
Base case: symbol 'a'
โโโถ ( ) โโaโโโถ (( ))
start accept
Concatenation: r1 followed by r2
NFA(r1) โโโถ NFA(r2)
Connect accept state of r1 to start state of r2
Union: r1 | r2
โโโฮตโโโถ NFA(r1) โโฮตโโโ
( ) โโโค โโโโถ (( ))
โโโฮตโโโถ NFA(r2) โโฮตโโโ
Kleene star: r*
โโโฮตโโโถ NFA(r) โโฮตโโโ
( ) โโโค โโโโถ (( ))
โโโฮตโโโโโโโโโโโโโโโโโโ
(also: ฮต from inner accept to inner start for looping)
Example: Regex 01|10
Let's convert the regex (01|10) to an NFA using Thompson's construction:
Step 1: Build NFA for "0"
q0 --0--> q1
Step 2: Build NFA for "1"
q2 --1--> q3
Step 3: Concatenate for "01"
q0 --0--> q1 --ฮต--> q2 --1--> q3
Step 4: Build NFA for "1"
q4 --1--> q5
Step 5: Build NFA for "0"
q6 --0--> q7
Step 6: Concatenate for "10"
q4 --1--> q5 --ฮต--> q6 --0--> q7
Step 7: Union "01" | "10"
q_start --ฮต--> q0 (start of "01" path)
q_start --ฮต--> q4 (start of "10" path)
q3 --ฮต--> q_accept
q7 --ฮต--> q_accept
Result: NFA with 10 states accepting "01" or "10"
Properties
Thompson's construction produces an NFA with at most 2n states for a regex of length n (counting symbols and operators). It runs in O(n) time. The resulting NFA has at most one transition per symbol per state (no multiple transitions for the same symbol), and ฮต transitions connect the sub-NFAs.
This construction is used in practice by many regex engines, including the one in the Go programming language and in RE2 (Google's regex library). It guarantees linear-time pattern compilation.