<aside>
💡
This is Part 1 of the assignment. We will release Part 2 in the coming days, and appropriate time will be given for it.
</aside>
Introduction
This assignment involves solving a practical machine learning problem. We will gain experience with the iterative process of designing and validating a neural network architecture for a prediction problem using a popular deep learning library (PyTorch). A machine learning task involves making predictions from an input data source; in this assignment we will work with visual data in the form of images. Our focus will primarily be on thoroughly understanding basic deep learning practices and techniques, and a bit less on modality-specific (image) techniques, which can be explored in advanced courses. This assignment provides a few pointers on designing good models; however, additional reading and experimentation on the student’s side will be helpful. Please start the assignment early!
Problem Statement
We are given a data set consisting of images of different bird species, with a total of K = 10 bird species present. Each species has between 500 and 1,200 images in the data set, and each image contains a single bird only. Please design a neural network that takes a bird image as input and predicts the class label (one of the K possible labels) corresponding to that image. The next section provides some hints on designing and validating your model.
Model Design Guidelines
Data Preparation
- To prepare the dataset for training, begin by splitting the images into training and validation sets. A common practice is to use an 80-20 split, where 80% of the dataset is allocated for training and the remaining 20% is reserved for validation, simulating the model’s behaviour at test time. This is not a strict rule, and 70-30 or 60-40 splits are also used.
- Create a data loader to handle loading the images for training and validation. The data loader will read the images, create batches, and apply any necessary preprocessing steps, such as resizing and normalization, to meet the input requirements of the model. You can start by implementing the split using scikit-learn’s `train_test_split`, and then build the data loader using PyTorch's `DataLoader`, incorporating any preprocessing steps as needed.
Network Layers
- A basic CNN architecture typically consists of a series of convolutional layers that extract features from the image, each followed by a non-linearity (usually a ReLU activation) to introduce non-linear behaviour. Pooling layers introduce spatial invariance and should be explored. Such a sequence of layers helps the network learn important features from images that aid the classification task.
- After the feature extraction, the output is passed through a fully connected layer (also known as a Multi-Layer Perceptron, MLP), denoted as `nn.Linear` in PyTorch.
- Finally, the network needs to predict the class label, using an activation such as softmax. Though the final layer may conceptually involve applying softmax to generate class probabilities, PyTorch's built-in loss functions, such as `CrossEntropyLoss`, handle the softmax internally, so you don’t need to apply it explicitly in the model.
- Please keep track of the input and output shapes of these layers, as they help in debugging. Experimentation is required for designing the internal layers of your model.
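A minimal sketch of such an architecture, assuming 128×128 RGB inputs and K = 10 classes. The number of blocks, channel counts, and hidden size are illustrative starting points, not recommendations; you are expected to experiment:

```python
import torch
import torch.nn as nn


class BirdCNN(nn.Module):
    """Conv -> ReLU -> Pool blocks for feature extraction, then an MLP head."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x128x128 -> 16x128x128
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x64x64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x64x64
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x32x32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 128),
            nn.ReLU(),
            # Raw logits: CrossEntropyLoss applies softmax internally.
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Passing a dummy batch through the model, e.g. `BirdCNN()(torch.randn(1, 3, 128, 128))`, is a quick way to track the shapes flowing through these layers while debugging.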
Loss Function
- Implement a loss function for multi-class classification; see categorical cross-entropy. In PyTorch, you can use `nn.CrossEntropyLoss` for multi-class problems.
- Note that each class may have a varied number of examples in the data set, and some classes have fewer samples than others. This is called class imbalance and is commonly encountered in real-world applications. To address this, you can give more weight to the underrepresented classes during training, penalizing the model more when it makes errors on them. This can be done by assigning class weights inversely proportional to the number of samples per class, helping the model focus on minority classes and improving overall performance. You can refer to this: link
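One way to sketch the inverse-frequency weighting described above; the per-class counts here are made up for illustration and should be replaced with the counts from your own training split:

```python
import torch
import torch.nn as nn

# Hypothetical per-class image counts for the K = 10 species.
class_counts = torch.tensor([1200., 500., 800., 950., 600.,
                             1100., 700., 550., 1000., 900.])

# Weight each class inversely to its frequency, scaled so that a perfectly
# balanced dataset would give every class a weight of 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Pass the weights to the loss; rarer classes now contribute more per error.
criterion = nn.CrossEntropyLoss(weight=weights)
```

With this scaling, the class with 500 images gets a larger weight than the class with 1,200, so mistakes on the minority class are penalized more heavily.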
Optimisation
- The learning rate has a significant impact on model training. Please try scheduling the learning rate or performing a grid search to find good values.
- In class we covered Stochastic Gradient Descent (SGD) for optimisation. Other techniques have been developed that improve upon basic SGD; for example, the popular Adam optimizer adapts the learning rate per parameter (and additionally uses momentum). We suggest starting with Adam as the optimizer. Other optimizers may also be tried; however, Adam usually provides good results.
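The two suggestions above can be combined in a training loop sketch like the one below. The `train` helper, epoch count, initial learning rate, and step-decay constants are illustrative assumptions to be tuned (e.g. via grid search), not recommended settings:

```python
import torch
import torch.nn as nn


def train(model, train_loader, num_epochs=10, lr=1e-3):
    """Sketch of a training loop: Adam plus a step decay of the learning rate."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # Halve the learning rate every 5 epochs; step_size and gamma are tunable.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
    for epoch in range(num_epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # advance the schedule once per epoch
    return model
```

`StepLR` is only one option; PyTorch's `torch.optim.lr_scheduler` module offers several alternatives (e.g. `CosineAnnealingLR`, `ReduceLROnPlateau`) worth experimenting with.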