What Is Convolution?
Convolution is a mathematical operation widely used in signal processing and image analysis to extract features. In deep learning, convolution layers capture local patterns in the input data.
Core Concepts:
- Kernel / Filter: A small matrix that scans across the input.
- Receptive Field: The region covered by the kernel at each step.
- Parameter Sharing: The same kernel weights are applied across the whole input, greatly reducing the number of parameters (see the sketch below).
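To make the parameter-sharing claim concrete, here is a rough sketch (in PyTorch, consistent with the examples below; the layer sizes are illustrative) comparing the parameter count of a 3×3 convolution with a fully connected layer producing the same output:
import torch.nn as nn
# Conv2d: 16 output channels × (1 input channel × 3 × 3 weights) + 16 biases = 160
# parameters, independent of the input's height and width.
conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 160
# A fully connected layer producing the same 16×28×28 output from a 28×28 input
# needs a separate weight for every input-output pixel pair.
fc = nn.Linear(28 * 28, 16 * 28 * 28)
print(sum(p.numel() for p in fc.parameters()))    # 9847040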
Convolution Formula:
For a 2D input $I$ and kernel $K$, the output at position $(i, j)$ is
$$S(i, j) = \sum_{m} \sum_{n} I(i + m,\ j + n)\, K(m, n)$$
(PyTorch, like most deep learning libraries, implements this as cross-correlation, i.e., without flipping the kernel.)
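As a minimal sketch of what this sum computes (plain loops, no padding or stride; the function name is illustrative):
import torch

def conv2d_naive(image, kernel):
    # Slide the kernel over every valid position and take the elementwise
    # product-and-sum with the patch underneath it (cross-correlation).
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = torch.zeros(oh, ow)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

print(conv2d_naive(torch.randn(5, 5), torch.randn(3, 3)).shape)  # torch.Size([3, 3])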
Code Example:
import torch
import torch.nn as nn
# Define a simple 2D convolution
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1)
input_tensor = torch.randn(1, 1, 5, 5) # batch=1, channel=1, height=5, width=5
output = conv(input_tensor)
print("Convolution output shape:", output.shape)
- nn.Conv2d defines a 2D convolution layer.
  - in_channels=1: the input has 1 channel (e.g., a grayscale image).
  - out_channels=1: the output has 1 channel.
  - kernel_size=3: a 3×3 kernel.
  - stride=1: the kernel moves 1 pixel at a time.
  - padding=1: pads the input with one ring of zeros so the output size matches the input.
- input_tensor: a simulated input tensor of shape [batch_size, channels, height, width]; here, a single 5×5 one-channel feature map.
- Convolution Operation: conv(input_tensor) applies the convolution to produce an output feature map. The output size formula is
  $$O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1$$
  where $I$ is the input size, $K$ the kernel size, $P$ the padding, and $S$ the stride. Substituting: $O = (5 - 3 + 2 \cdot 1)/1 + 1 = 5$, so the output is a feature map of shape [1, 1, 5, 5].
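A quick sketch to check the formula against a few stride/padding settings (the values are chosen for illustration):
for stride, padding in [(1, 1), (1, 0), (2, 1)]:
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride, padding=padding)
    out = conv(torch.randn(1, 1, 5, 5))
    print(f"stride={stride}, padding={padding} -> {tuple(out.shape)}")
# stride=1, padding=1 -> (1, 1, 5, 5)   since (5 - 3 + 2)/1 + 1 = 5
# stride=1, padding=0 -> (1, 1, 3, 3)   since (5 - 3 + 0)/1 + 1 = 3
# stride=2, padding=1 -> (1, 1, 3, 3)   since floor((5 - 3 + 2)/2) + 1 = 3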
Convolutional Neural Networks
A Convolutional Neural Network (CNN) is a deep learning model designed to process grid-structured data such as images. CNNs extract hierarchical feature representations through stacked convolution layers, activation layers, and pooling layers.
Main Components:
- Convolutional Layer: Extracts spatial features.
- Activation Function: introduces non-linearity (e.g., ReLU).
- Fully Connected Layer: Maps extracted features to classes.
Code Example:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)  # 1 input channel to 16 feature maps
        self.relu = nn.ReLU()                                              # non-linear activation
        self.fc = nn.Linear(16 * 28 * 28, 10)                              # flattened features to 10 output classes

    def forward(self, x):
        x = self.conv1(x)            # convolution
        x = self.relu(x)             # non-linear activation
        x = x.view(x.size(0), -1)    # flatten to a 1D vector per sample
        x = self.fc(x)               # fully connected layer
        return x
- The input image is [28, 28] (single-channel), suitable for datasets like MNIST. Because padding=1 keeps the spatial size at 28×28, the flattened feature vector has 16 × 28 × 28 elements.
- The final output contains 10 class scores (see the usage sketch below).
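A quick usage sketch (the batch size of 4 is arbitrary):
model = SimpleCNN()
batch = torch.randn(4, 1, 28, 28)  # 4 single-channel 28×28 images
logits = model(batch)
print(logits.shape)                # torch.Size([4, 10]): one score per class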
Pooling, Downsampling, and Upsampling
Pooling Layer
Pooling reduces the spatial size of feature maps. Common types include Max Pooling and Average Pooling. Pooling reduces computation and makes features more robust to small spatial shifts.
Code Example:
pool = nn.MaxPool2d(kernel_size=2, stride=2)
input_tensor = torch.randn(1, 16, 28, 28)
output = pool(input_tensor)
print("After pooling:", output.shape) # torch.Size([1, 16, 14, 14])
- nn.MaxPool2d: defines a max pooling layer.
- Parameters:
  - kernel_size=2: window size 2×2.
  - stride=2: the window moves 2 pixels per step (non-overlapping), halving the height and width (a comparison with average pooling follows below).
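To contrast the two common pooling types, a small sketch on a 4×4 input (values chosen so the windows are easy to follow):
x = torch.arange(16.0).reshape(1, 1, 4, 4)
print(nn.MaxPool2d(2, 2)(x))  # largest value per 2×2 window:
                              # tensor([[[[ 5.,  7.],
                              #           [13., 15.]]]])
print(nn.AvgPool2d(2, 2)(x))  # mean of each 2×2 window:
                              # tensor([[[[ 2.5000,  4.5000],
                              #           [10.5000, 12.5000]]]])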
Downsampling and Upsampling
Downsampling uses pooling or resizing to reduce resolution.
Upsampling increases resolution and is often used in generative tasks or semantic segmentation.
Code Example:
import torch.nn.functional as F
# Downsampling
downsampled = F.interpolate(input_tensor, scale_factor=0.5, mode='bilinear', align_corners=False)
# Upsampling
upsampled = F.interpolate(input_tensor, scale_factor=2, mode='bilinear', align_corners=False)
print("After downsampling:", downsampled.shape)
print("After upsampling:", upsampled.shape)
- Downsampling:
  - scale_factor=0.5: shrinks the feature map by half (28×28 to 14×14 here).
  - mode='bilinear': smooth bilinear interpolation.
- Upsampling:
  - scale_factor=2: doubles the feature map size (28×28 to 56×56 here).
  - mode='bilinear': smooth enlargement.
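Note that downsampling discards detail: a down-then-up round trip does not reproduce the original. A small sketch, reusing input_tensor and downsampled from above:
roundtrip = F.interpolate(downsampled, scale_factor=2, mode='bilinear', align_corners=False)
print(roundtrip.shape)                          # back to torch.Size([1, 16, 28, 28])
print((roundtrip - input_tensor).abs().mean())  # non-zero: the discarded detail is gone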
BatchNorm
Batch Normalization stabilizes and accelerates training by normalizing the mean and variance of intermediate activations.
Major Benefits:
- Faster convergence.
- Reduced overfitting.
- Improved training stability.
Code Example:
x = torch.randn(1, 16, 7, 7)   # batch=1, 16 channels, 7×7 feature map
print("x.shape =", x.shape)
layer = nn.BatchNorm2d(num_features=16)
out = layer(x)
print("out.shape =", out.shape)
print("layer.weight.shape =", layer.weight.shape)  # per-channel scale (gamma)
print("layer.bias.shape =", layer.bias.shape)      # per-channel shift (beta)
print("vars(layer) =", vars(layer))                # inspect the layer's internal state
Output:
x.shape = torch.Size([1, 16, 7, 7])
out.shape = torch.Size([1, 16, 7, 7])
layer.weight.shape = torch.Size([16])
layer.bias.shape = torch.Size([16])
vars(layer) = {
    'training': True,
    '_parameters': OrderedDict([
        ('weight', Parameter containing:
            tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
                   requires_grad=True)),
        ('bias', Parameter containing:
            tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
                   requires_grad=True))]),
    '_buffers': OrderedDict([
        ('running_mean', tensor([ 0.0283,  0.0018, -0.0207, -0.0116, -0.0092, -0.0127,  0.0054, -0.0146,
                                 -0.0152, -0.0171, -0.0153,  0.0133,  0.0122, -0.0066,  0.0116,  0.0064])),
        ('running_var', tensor([0.9926, 0.9942, 0.9708, 0.9982, 1.0170, 0.9690, 1.0005, 1.0011, 0.9939,
                                0.9892, 0.9577, 0.9949, 1.0047, 0.9766, 0.9797, 1.0248])),
        ('num_batches_tracked', tensor(1))]),
    '_non_persistent_buffers_set': set(),
    '_backward_hooks': OrderedDict(),
    '_is_full_backward_hook': None,
    '_forward_hooks': OrderedDict(),
    '_forward_pre_hooks': OrderedDict(),
    '_state_dict_hooks': OrderedDict(),
    '_load_state_dict_pre_hooks': OrderedDict(),
    '_load_state_dict_post_hooks': OrderedDict(),
    '_modules': OrderedDict(),
    'num_features': 16,
    'eps': 1e-05,
    'momentum': 0.1,
    'affine': True,
    'track_running_stats': True
}
- nn.BatchNorm2d: defines batch normalization for 2D feature maps.
  - num_features=16: the number of channels; each channel has its own statistics and affine parameters.
- Input Tensor: an example feature map with batch size 1 and 16 channels.
- Batch Normalization Operation:
  - Normalizes each channel to zero mean and unit variance:
    $$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}$$
  - Then applies a learnable scale and shift: $y = \gamma \hat{x} + \beta$, where $\gamma$ (layer.weight) and $\beta$ (layer.bias) are learned per channel.
  - This speeds up convergence and stabilizes training (see the sketch below).
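To make the operation concrete, a sketch that reproduces the layer's training-mode output by hand, reusing x, layer, and out from the example above:
# Per-channel mean and (biased) variance over the batch and spatial dimensions
mu = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
x_hat = (x - mu) / torch.sqrt(var + layer.eps)   # normalize
manual = layer.weight.view(1, -1, 1, 1) * x_hat + layer.bias.view(1, -1, 1, 1)  # scale and shift
print(torch.allclose(manual, out, atol=1e-6))    # True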
