Source Code
Full implementation available at DL_from_scratch/cnn.py
Convolutional Neural Network
Core Concept
The key idea in a CNN is that what gets passed to the next layer is computed by a convolution: small learned kernels slide over the input and produce feature maps.
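As a minimal illustration of that operation (a sketch for this page, not the `_correlate2d` method from cnn.py), here is a "valid" 2D correlation of a single-channel input with one 3x3 kernel in plain NumPy:

import numpy as np

def correlate2d_valid(x, k):
    """Slide kernel k over input x (no padding, stride 1) and
    return the resulting feature map."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
k = np.ones((3, 3)) / 9.0                      # 3x3 averaging kernel
print(correlate2d_valid(x, k).shape)           # (3, 3): (5 + 0 - 3)/1 + 1 = 3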
Pooling vs Batch Normalization
Pooling:
- Downsamples spatial dimensions
- Reduces computational cost
- Loses some spatial information
Batch Normalization:
- Normalizes feature distribution using mean and variance
- Preserves spatial dimensions
- Stabilizes training
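To make the contrast concrete, here is a minimal NumPy sketch (illustrative only; batch normalization is not part of the scratch implementation below): max pooling halves the spatial dimensions, while batch normalization keeps the shape and only rescales the values per channel.

import numpy as np

x = np.random.randn(1, 32, 28, 28)              # (batch, channels, height, width)

# 2x2 max pooling with stride 2: spatial size 28x28 -> 14x14
pooled = x.reshape(1, 32, 14, 2, 14, 2).max(axis=(3, 5))
print(pooled.shape)                             # (1, 32, 14, 14)

# Batch normalization (per-channel mean and variance): shape is unchanged
mean = x.mean(axis=(0, 2, 3), keepdims=True)
var = x.var(axis=(0, 2, 3), keepdims=True)
normalized = (x - mean) / np.sqrt(var + 1e-5)
print(normalized.shape)                         # (1, 32, 28, 28)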
Output Size Calculation
For convolution and pooling operations:
\[\text{output\_size} = \left\lfloor \frac{\text{input} + 2 \times \text{padding} - \text{kernel}}{\text{stride}} \right\rfloor + 1\]
Example with Pooling:
- Input: (batch=1, channels=32, height=28, width=28)
- MaxPool2d(kernel=2, stride=2)
- Output: (1, 32, 14, 14) → Half the spatial size
\[h_{out} = \frac{28 + 0 - 2}{2} + 1 = 14\]
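A small helper (a sketch for illustration, not a function in cnn.py) makes these sizes easy to verify:

def conv_output_size(input_size, kernel, stride=1, padding=0):
    """floor((input + 2*padding - kernel) / stride) + 1"""
    return (input_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(28, kernel=2, stride=2, padding=0))   # 14 (the pooling example above)
print(conv_output_size(28, kernel=3, stride=1, padding=1))   # 28 ("same" convolution)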
Standard CNN Pattern
Conv -> BatchNorm -> ReLU -> MaxPool (repeat 2-3 times) -> Flatten -> Dense

(The scratch implementation below follows this pattern but omits BatchNorm.)
Implementation
2D Convolution Layer
class CNN2D:
    """2D Convolutional Layer with stride and padding support.

    Performs correlation (not true convolution) for the forward pass.
    """
    def __init__(self,
                 in_channels=1,    # Number of input channels
                 out_channels=32,  # Number of filters
                 kernel=3,         # Kernel size
                 stride=1,         # Stride for sliding
                 padding=0):       # Zero padding
        # He initialization, scaled for ReLU: sqrt(2 / fan_in)
        self.weights = np.random.randn(out_channels, in_channels, kernel, kernel) \
            * np.sqrt(2.0 / (in_channels * kernel * kernel))
        self.bias = np.zeros(out_channels)

    def _pad_input(self, X):
        """Apply zero padding to input."""
        ...

    def _correlate2d(self, input_slice, kernel):
        """2D correlation (sliding kernel over input)."""
        ...

    def forward(self, X):
        """Forward pass: apply convolution filters.

        For each output channel:
        - Sum correlation of input channels with corresponding kernels
        - Add bias
        """
        ...

    def backward(self, out_gradient, lr=0.001):
        """Backward pass with gradient computation.

        1. If stride > 1, upsample gradient
        2. Compute kernel gradients via correlation
        3. Compute input gradients via full convolution
        4. Update weights and bias
        """
        ...

Activation and Pooling Layers
class ReLU:
    """ReLU activation: max(0, x)"""
    def forward(self, X):
        self.input_data = X
        return np.maximum(0, X)

    def backward(self, out_gradient):
        return out_gradient * (self.input_data > 0)
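# Quick usage check (illustrative only, not part of cnn.py): the backward
# pass routes gradients only through positions whose forward input was positive.
relu_demo = ReLU()
print(relu_demo.forward(np.array([[-1.0, 2.0, -3.0, 4.0]])))   # [[0. 2. 0. 4.]]
print(relu_demo.backward(np.ones((1, 4))))                     # [[0. 1. 0. 1.]]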
class MaxPool2D:
    """Max Pooling layer for downsampling.

    Keeps track of max indices for backward pass.
    """
    def __init__(self, pool_size=2, stride=2):
        self.pool_size = pool_size
        self.stride = stride

    def forward(self, X):
        """Select maximum value in each pooling window.
        Store indices for backward pass."""
        ...

    def backward(self, out_gradient):
        """Route gradients only to max positions."""
        ...

Flatten and Dense Layers
class Flatten:
    """Flatten spatial dimensions for dense layers."""
    def forward(self, X):
        # Assumes one sample per forward pass (batch size 1)
        self.input_shape = X.shape
        return X.reshape(1, -1)

    def backward(self, out_gradient):
        return out_gradient.reshape(self.input_shape)

class Dense:
    """Fully connected layer."""
    def __init__(self, input_size, output_size):
        # He initialization
        self.weights = np.random.randn(input_size, output_size) \
            * np.sqrt(2.0 / input_size)
        self.bias = np.zeros(output_size)

    def forward(self, X):
        """Linear transformation: Y = XW + b"""
        ...

    def backward(self, out_gradient, lr=0.001):
        """Compute and apply gradients."""
        ...

Softmax and Loss
class Softmax:
    """Softmax activation for classification."""
    def forward(self, X):
        exp_X = np.exp(X - np.max(X, axis=1, keepdims=True))
        self.output = exp_X / np.sum(exp_X, axis=1, keepdims=True)
        return self.output

    def backward(self, y_true):
        # Combined softmax + cross-entropy gradient
        return self.output - y_true
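# Sanity check (illustrative only, not part of cnn.py): the analytic gradient
# softmax_output - y_true should match a finite-difference estimate of
# d(cross-entropy)/d(logits) for a single sample.
check_softmax = Softmax()
logits = np.array([[1.0, 2.0, 0.5]])
target = np.array([[0.0, 1.0, 0.0]])
analytic = check_softmax.forward(logits) - target

numeric = np.zeros_like(logits)
eps = 1e-5
for i in range(logits.shape[1]):
    plus, minus = logits.copy(), logits.copy()
    plus[0, i] += eps
    minus[0, i] -= eps
    loss_plus = -np.sum(target * np.log(Softmax().forward(plus)))
    loss_minus = -np.sum(target * np.log(Softmax().forward(minus)))
    numeric[0, i] = (loss_plus - loss_minus) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)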
def cross_entropy_loss(y_pred, y_true):
    """Cross entropy loss for one-hot encoded labels."""
    return -np.sum(y_true * np.log(y_pred + 1e-8)) / y_pred.shape[0]

Model Architecture
# Build CNN model
conv1 = CNN2D(in_channels=1, out_channels=8, kernel=3, stride=1, padding=1)
relu1 = ReLU()
pool1 = MaxPool2D(pool_size=2, stride=2)
conv2 = CNN2D(in_channels=8, out_channels=16, kernel=3, stride=1, padding=1)
relu2 = ReLU()
pool2 = MaxPool2D(pool_size=2, stride=2)
flatten = Flatten()
dense1 = Dense(16 * 7 * 7, 128) # After two 2x2 poolings: 28->14->7
relu3 = ReLU()
dense2 = Dense(128, 10)
softmax = Softmax()

Training Loop
The training loop processes one sample at a time (see the sketch after this list):
- Forward pass through all layers sequentially
- Compute loss using cross-entropy
- Backward pass propagating gradients through all layers in reverse
- Weight updates happen inside each layer's backward method
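Here is a minimal sketch of that loop, assuming the layer objects built above. The names train_data (an iterable of (image, one_hot_label) pairs, with image shaped (1, 1, 28, 28)), epochs, and lr are illustrative and not taken from cnn.py.

layers = [conv1, relu1, pool1, conv2, relu2, pool2,
          flatten, dense1, relu3, dense2]

epochs = 3    # illustrative value
lr = 0.001    # matches the default used in the backward methods

for epoch in range(epochs):
    total_loss = 0.0
    for image, y_true in train_data:          # hypothetical dataset iterable
        # Forward pass through all layers, then softmax
        out = image
        for layer in layers:
            out = layer.forward(out)
        y_pred = softmax.forward(out)

        total_loss += cross_entropy_loss(y_pred, y_true)

        # Backward pass: combined softmax + cross-entropy gradient,
        # then propagate through the layers in reverse order
        grad = softmax.backward(y_true)
        for layer in reversed(layers):
            if isinstance(layer, (CNN2D, Dense)):
                grad = layer.backward(grad, lr=lr)   # these layers also update their weights
            else:
                grad = layer.backward(grad)

    print(f"epoch {epoch}: loss {total_loss / len(train_data):.4f}")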
Citation
BibTeX citation:
@online{prasanna_koppolu,
  author = {Prasanna Koppolu, Bhanu},
  title = {Convolutional {Neural} {Network} from {Scratch}},
  url = {https://bhanuprasanna2001.github.io/learning/ai/DL/cnn},
  langid = {en}
}
For attribution, please cite this work as:
Prasanna Koppolu, Bhanu. n.d. “Convolutional Neural Network from
Scratch.” https://bhanuprasanna2001.github.io/learning/ai/DL/cnn.