Labs ICT
โญ Pro Login

Regex to NFA Conversion

Thompson's construction โ€” turning patterns into automata.

Regex to NFA Conversion

Thompson's construction is an algorithm that converts a regular expression into an equivalent NFA. It works by building up the NFA recursively, handling each operation (union, concatenation, Kleene star) with a fixed pattern. This is how regex engines actually compile your patterns into executable automata.

Thompson's Construction

The algorithm builds NFAs bottom-up. For each sub-expression, it creates a small NFA, then combines them.

Base case: symbol 'a'

  โ”€โ”€โ–ถ ( ) โ”€โ”€aโ”€โ”€โ–ถ (( ))
      start    accept

Concatenation: r1 followed by r2

  NFA(r1) โ”€โ”€โ–ถ NFA(r2)
  Connect accept state of r1 to start state of r2

Union: r1 | r2

        โ”Œโ”€โ”€ฮตโ”€โ”€โ–ถ NFA(r1) โ”€โ”€ฮตโ”€โ”€โ”
  ( ) โ”€โ”€โ”ค                     โ”œโ”€โ”€โ–ถ (( ))
        โ””โ”€โ”€ฮตโ”€โ”€โ–ถ NFA(r2) โ”€โ”€ฮตโ”€โ”€โ”˜

Kleene star: r*

        โ”Œโ”€โ”€ฮตโ”€โ”€โ–ถ NFA(r) โ”€โ”€ฮตโ”€โ”€โ”
  ( ) โ”€โ”€โ”ค                    โ”œโ”€โ”€โ–ถ (( ))
        โ””โ”€โ”€ฮตโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  (also: ฮต from inner accept to inner start for looping)

Example: Regex 01|10

Let's convert the regex (01|10) to an NFA using Thompson's construction:

Step 1: Build NFA for "0"
  q0 --0--> q1

Step 2: Build NFA for "1"
  q2 --1--> q3

Step 3: Concatenate for "01"
  q0 --0--> q1 --ฮต--> q2 --1--> q3

Step 4: Build NFA for "1"
  q4 --1--> q5

Step 5: Build NFA for "0"
  q6 --0--> q7

Step 6: Concatenate for "10"
  q4 --1--> q5 --ฮต--> q6 --0--> q7

Step 7: Union "01" | "10"
  q_start --ฮต--> q0 (start of "01" path)
  q_start --ฮต--> q4 (start of "10" path)
  q3 --ฮต--> q_accept
  q7 --ฮต--> q_accept

Result: NFA with 10 states accepting "01" or "10"

Properties

Thompson's construction produces an NFA with at most 2n states for a regex of length n (counting symbols and operators). It runs in O(n) time. The resulting NFA has at most one transition per symbol per state (no multiple transitions for the same symbol), and ฮต transitions connect the sub-NFAs.

This construction is used in practice by many regex engines, including the one in the Go programming language and in RE2 (Google's regex library). It guarantees linear-time pattern compilation.

๐Ÿงช Quick Quiz

In Thompson's construction, concatenation of r1 and r2 is done by: