Huffman Coding

Data compression using greedy tree building.

Huffman Coding is a data compression algorithm that uses variable-length binary codes. Frequent characters get short codes (like 2 bits), and rare characters get longer codes (like 6 bits). The result is a file that takes less space to store and transmit.

Think of it like Morse code — common letters like E and T have short patterns, while rare ones like Q and Z have longer ones. Huffman coding automates this idea for any data.

Building the Huffman Tree

The algorithm builds a binary tree bottom-up. Start by treating each character as a leaf node with its frequency as weight. Repeatedly merge the two smallest nodes into a new parent node whose weight is their sum. Continue until one tree remains.

function huffmanTree(freq) {
  const nodes = Object.entries(freq).map(([char, f]) => ({
    char, freq: f, left: null, right: null
  }));

  while (nodes.length > 1) {
    nodes.sort((a, b) => a.freq - b.freq);
    const left = nodes.shift();
    const right = nodes.shift();
    nodes.push({
      char: null,
      freq: left.freq + right.freq,
      left, right
    });
  }
  return nodes[0];
}

const freq = { a: 45, b: 13, c: 12, d: 16, e: 9, f: 5 };
const tree = huffmanTree(freq);
console.log(tree);

Encoding and Decoding

Once the tree is built, traverse it to generate codes. Going left adds a 0, going right adds a 1. Leaf nodes give you the character and its code. For our example, 'a' might become "0" and 'f' might become "1100".

Decoding is just reversing the process. Read the binary string bit by bit, walk down the tree from the root, and when you hit a leaf, output that character and restart from the root. The tree acts as a self-contained dictionary — no special separators needed.

Why It Is Optimal

Huffman coding is optimal among all prefix-free codes. A prefix-free code means no character's code is a prefix of another — this is what allows unambiguous decoding. The proof involves showing that putting the least frequent characters deepest in the tree minimizes the weighted path length.

In practice, Huffman coding is used in ZIP files, JPEG images, and MP3 audio. It is a building block of many modern compression formats. The algorithm runs in O(n log n) time due to the priority queue operations, making it efficient for large datasets.

← Activity Selection

String Basics →