Context-Free Grammars (CFG) Basics
A context-free grammar (CFG) is a formalism for describing the syntax of languages. Unlike regular expressions, CFGs can describe nested structures like balanced parentheses, arithmetic expressions, and programming language syntax. They are more powerful than finite automata.
Formal Definition
A CFG is a 4-tuple G = (V, ฮฃ, R, S), where:
V โ finite set of variables (non-terminals)
ฮฃ โ finite set of terminals (disjoint from V)
R โ finite set of productions (rules)
S โ V โ start variable
Production format:
A โ ฮฑ
where A โ V and ฮฑ โ (V โช ฮฃ)*
Example โ simple grammar for arithmetic expressions:
V = {E, T, F}
ฮฃ = {+, *, (, ), id}
S = E
R:
E โ E + T | T
T โ T * F | F
F โ ( E ) | id
Derivations
A derivation is a sequence of steps that replaces variables with strings of terminals and variables using the production rules. We start with the start variable S and apply productions until we have only terminals.
Grammar:
S โ aSb | ฮต
Derivation of "aabb":
S โ aSb (using S โ aSb)
โ aaSbb (using S โ aSb)
โ aabbb? No โ let's fix this.
Actually:
S โ aSb
โ aaSbb
โ aabb (using S โ ฮต)
Leftmost derivation (always replace leftmost variable):
S โ aSb โ aaSbb โ aabb
Rightmost derivation (always replace rightmost variable):
S โ aSb โ aabb (replace S โ ฮต first, then... )
Actually: S โ aSb โ aabb (replace S with ฮต)
Language of a Grammar
The language generated by a grammar G, denoted L(G), is the set of all strings of terminals that can be derived from S. A language is context-free if there exists a CFG that generates it.
CFGs are strictly more powerful than regular expressions. The language {0^n 1^n | n >= 0} can be generated by a CFG but not by any regular expression. This is because CFGs can count and match nested structures.
Grammar for {0^n 1^n | n >= 0}:
S โ 0S1 | ฮต
Derivation of "000111":
S โ 0S1 โ 00S11 โ 000S111 โ 000111
This language is context-free but NOT regular.
No finite automaton can count unbounded pairs.