Labs ICT
Pro Login

Alphabets and Languages

Formal definitions of symbols, strings, and languages.

Alphabets

An alphabet is a finite, non-empty set of symbols. We denote it with the Greek letter sigma (Σ). For example, Σ = {0, 1} is the binary alphabet. Σ = {a, b, c} is another. The key constraint: an alphabet must be finite and cannot be empty.

Symbols in an alphabet are just abstract characters. They don't have to be letters — they could be colors, actions, or any discrete tokens. What matters is that they are distinct and countable.

Strings

A string (or word) is a finite sequence of symbols from an alphabet. If Σ = {0, 1}, then 011, 10, 1, and the empty string ε are all valid strings. The length of a string is the number of symbols it contains. The empty string ε has length 0.

We denote the set of all possible strings over an alphabet Σ as Σ* (the Kleene star). This includes the empty string. Σ+ denotes all non-empty strings (Σ* without ε).

Alphabet: Σ = {a, b}

Strings over Σ:
  ε   (empty string, length 0)
  a   (length 1)
  b   (length 1)
  aa  (length 2)
  ab  (length 2)
  ba  (length 2)
  bb  (length 2)
  ... and so on

Σ* = {ε, a, b, aa, ab, ba, bb, aaa, ...}

Languages

A language is a set of strings over some alphabet. It can be finite or infinite. For example, the set of all strings with an even number of 1s is a language over {0, 1}. The set of all valid email addresses is a language over the ASCII alphabet.

Formally, a language L is a subset of Σ*. If L = {ε, 0, 00, 000}, that's a valid (finite) language. If L = all binary strings representing multiples of 3, that's an infinite language.

Σ = {0, 1}

L1 = {0, 1, 00, 11}        (finite language)
L2 = {w | w has even number of 1s}  (infinite language)
L3 = {w | w starts with 0 and ends with 1}  (infinite language)

L1 ⊂ Σ*
L2 ⊂ Σ*
L3 ⊂ Σ*

Operations on Strings

Two important operations are concatenation and reversal. Concatenation joins two strings: if w1 = "01" and w2 = "11", then w1w2 = "0111". Reversal flips a string: the reversal of "011" is "110". These operations are fundamental when defining languages formally.

Length follows simple rules: |ε| = 0, and |xy| = |x| + |y|. These properties make it easy to reason about strings mathematically.

🧪 Quick Quiz

An alphabet in automata theory is: