Warning: if your device is low-performing by today's standards, and especially if you have a weak GPU, DO NOT run huge deep learning models like ResNet50 at a batch size of 128 or above!
Imagine starting a new painting project. Instead of beginning with a blank canvas, you decide to use a pre-existing piece of artwork as a base. You can then modify the original artwork to suit your vision. By leveraging knowledge from a pre-trained model, you can accelerate the training process and improve the performance of your own model.
Transfer learning is a machine learning technique in which a model pre-trained on one task is used as the starting point for training a new model on a related task. Here are several reasons why it is widely used:
Leverage Pre-trained Knowledge: CNNs typically learn hierarchical features, starting with simple elements like lines and colors. These low-level features are often common across various image domains. By using a pre-trained model, you can benefit from this learned knowledge.
Efficient Training: transfer learning significantly reduces training time, especially for smaller datasets. Instead of starting from random weights, you can initialize your model with weights that have already been trained on a large dataset (see the sketch after this list).
Improved Performance: transfer learning often leads to better results, especially when the target dataset resembles the one the pre-trained model was trained on.
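As a minimal sketch of that idea in TensorFlow/Keras (the framework used below; the freezing step is illustrative rather than part of the original exercise):

```python
from tensorflow.keras.applications import ResNet50

# Load a convolutional base initialized with ImageNet weights
# instead of random values (no classification head attached).
base = ResNet50(weights="imagenet", include_top=False)

# Freeze the base: its learned features are reused as-is, so only
# the new layers added on top for the target task get trained.
base.trainable = False
```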
This penultimate exercise for the Cupoy series explores the implementation of transfer learning in a Python Jupyter Notebook. The model we will transfer knowledge from is ResNet50, a deep neural network architecture with 50 weight layers. It was trained on more than a million images, comprises 177 layers in total once non-weight layers such as activations and pooling are counted, and can classify images into 1,000 object categories (e.g. keyboard, mouse, pencil, and many animals).
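As a quick, hedged sanity check (layer counts differ between frameworks and Keras versions, so the exact number printed may not be 177):

```python
from tensorflow.keras.applications import ResNet50

# Full ResNet50, including its ImageNet classification head.
model = ResNet50(weights="imagenet")

print(len(model.layers))   # total layer count (framework-dependent)
print(model.output_shape)  # (None, 1000): one probability per ImageNet category
```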
In the example below, three arguments to TensorFlow's ResNet50 constructor are set to adapt the pre-trained network to our task (the full construction is sketched after the list):
weights = 'imagenet': loads the pre-trained ResNet50 weights learned on the ImageNet dataset. (The released weights come from the ILSVRC subset of roughly 1.2 million images across 1,000 categories; the full ImageNet database contains over 14 million images from 21,841 categories.) This is the knowledge we leverage.
include_top = False: specifies whether to include the classification head (final fully connected layer) of the ResNet50 model. If set to False, the model will output feature maps from the convolutional and pooling layers instead of a probability distribution over classes.
input_shape = (32, 32, 3): specifies the shape of the input data that the model expects. With include_top = True, ResNet50 only accepts its default 224 × 224 × 3 input; setting include_top to False lets us choose other shapes, such as the 32 × 32 × 3 images used here.
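Putting the three arguments together, the transfer model might be assembled as in the sketch below. The GlobalAveragePooling2D + Dense head and NUM_CLASSES are illustrative assumptions; the exercise's actual head may differ:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 10  # hypothetical; set this to your dataset's class count

# Pre-trained convolutional base configured with the three arguments above.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(32, 32, 3))

# Stack a new classification head on top of the base's feature maps.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Whether to freeze the base (as sketched earlier) or fine-tune all the weight layers end-to-end is a design choice; fine-tuning everything is the slower option, which is relevant to the warning below.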
Even at a batch size of 128, a transfer-learning network built on ResNet50 takes an arduous amount of time to get through even a measly 10 epochs. If your coding platform has no access to GPU acceleration, or if your local device has low capabilities, the computational lag will skyrocket. Consider decreasing the batch size when you expect this to occur.
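As an illustration, continuing from the model sketch above (the random arrays are placeholders standing in for a real preprocessed dataset), the batch size is just an argument to model.fit and is easy to dial down:

```python
import numpy as np

# Placeholder data standing in for real preprocessed images and labels.
x_train = np.random.rand(256, 32, 32, 3).astype("float32")
y_train = np.eye(10)[np.random.randint(0, 10, size=256)].astype("float32")

history = model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=10,
    batch_size=128,  # halve to 64 or 32 on weak hardware or without a GPU
)
```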
This time, unlike some of the earlier examples, the demanding transfer-learning network delivers outputs superior to past models. Besides stably converging training curves, the validation loss finally converged at a relatively low value of around 0.98, while validation accuracy converged at around 75%.