Convolutional Neural Networks

The architecture behind image recognition.

Neural Networks for Images

Convolutional Neural Networks (CNNs) are the gold standard for image-related tasks. They're specifically designed to process grid-like data (images) by using special layers that detect spatial patterns — edges, textures, shapes, and objects. CNNs power facial recognition, medical imaging, self-driving cars, and more.

CNN Architecture


  ┌──────────────────────────────────────────────────────────┐
  │                 CNN ARCHITECTURE                         │
  │                                                          │
  │  ┌──────┐  ┌──────────┐  ┌──────────┐  ┌──────┐  ┌───┐│
  │  │ Input│  │Conv +    │  │ Pooling  │  │Fully │  │Out││
  │  │Image │→ │ReLU      │→ │          │→ │Conn. │→ │put││
  │  │      │  │(Filters) │  │(Downsample)│ │(Dense)│  │   ││
  │  └──────┘  └──────────┘  └──────────┘  └──────┘  └───┘│
  │                                                          │
  │  Extract     Detect         Reduce      Classify         │
  │  features    patterns       size        decision         │
  └──────────────────────────────────────────────────────────┘

Convolutional Layers

The core building block. A small filter (kernel) slides across the image, computing dot products at each position. Each filter detects a specific feature — a vertical edge, a horizontal edge, a corner, a texture. The output is called a feature map.


  Input Image (5×5):        Filter (3×3):        Feature Map:
  ┌─────────────────┐      ┌──────────┐         ┌──────────┐
  │ 1 0 1 0 1      │      │ 1 0 1    │         │ 4 3 4    │
  │ 0 1 0 1 0      │  *   │ 0 1 0    │   =     │ 2 4 2    │
  │ 1 0 1 0 1      │      │ 1 0 1    │         │ 4 3 4    │
  │ 0 1 0 1 0      │      └──────────┘         └──────────┘
  │ 1 0 1 0 1      │
  └─────────────────┘

Pooling Layers

Pooling reduces the spatial size of feature maps, decreasing computation and helping the network become invariant to small translations. Max pooling (taking the maximum value in each window) is the most common type.

Famous CNN Architectures

LeNet-5 (1998) — Pioneered CNNs for digit recognition
AlexNet (2012) — Won ImageNet, kicked off the deep learning revolution
VGGNet (2014) — Showed depth matters with 16-19 layers
GoogLeNet/Inception (2014) — Used inception modules for efficiency
ResNet (2015) — Introduced skip connections, enabled 152+ layers
EfficientNet (2019) — Balanced scaling of depth, width, and resolution

Applications

CNNs are used in medical imaging (detecting tumors), autonomous driving (recognizing pedestrians and signs), facial recognition, content moderation, satellite imagery analysis, and artistic style transfer. If it involves images, there's probably a CNN behind it.

🧪 Quick Quiz

Why are CNNs particularly good at image tasks?

← Previous Neural Networks Basics

Next → RNN & LSTM