Forgetting the past incurs an unknown probability of recurrence – a kind of déjà vu, of sorts.
In the realm of deep learning, transfer learning leverages the knowledge gained from pre-trained models to accelerate training and improve performance on new tasks. By freezing certain layers of a pre-trained model and fine-tuning others, we can effectively transfer valuable features and patterns to our target task, even with limited data.
As we already covered on 15th September, transfer learning is a technique in deep learning where a model pre-trained on one task is used as the starting point for training a new model on a related task. Using this technique can:
Feature Extraction: reuse the features the pre-trained model learned from its original dataset for the new task.
Reduced Training Time: significantly reduce the training time required for new tasks, since the model starts from pre-trained weights rather than a random initialization.
Improved Performance: lead to better performance on new tasks, especially when the datasets are similar.
Shallow convolutional layers typically learn more general and reusable features, while deeper layers learn more task-specific features. When transferring a pre-trained model to a new task, it can be beneficial to freeze the earlier layers of the network, limiting the number of parameters that can be updated. This prevents the network from overfitting to the new data and helps to preserve the learned features from the original task.
On the other hand, if the new task is similar to the original task, fine-tuning the top layers of the network can improve performance by allowing the model to learn task-specific features while leveraging the pre-trained features from earlier layers. Additionally, giving the fine-tuned layers a higher learning rate than the effectively frozen earlier layers lets the model adapt to the new task more quickly.
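To make that split concrete, here is a small sketch, assuming TensorFlow/Keras; the VGG16 backbone and the cut-off index are placeholder choices for illustration, not the model used later in this exercise.

```python
# Freeze the general early layers of a pre-trained backbone and leave the
# later, more task-specific layers trainable.
import tensorflow as tf

# VGG16 is an illustrative backbone; any Keras application model works the same way.
backbone = tf.keras.applications.VGG16(include_top=False, weights="imagenet")

cutoff = 10  # hypothetical cut-off: layers before this index stay frozen
for i, layer in enumerate(backbone.layers):
    layer.trainable = i >= cutoff

trainable = sum(layer.trainable for layer in backbone.layers)
print(f"{trainable} of {len(backbone.layers)} layers will be fine-tuned")
```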
This exercise explores the impact freezing has on CNN transfer learning. The model we will transfer today is ResNet50, which will be trained and validated on the CIFAR-10 dataset.
ResNet50 has 50 weight layers, as its name implies. After changing its input shape to match CIFAR-10's 32×32 images and replacing its 2048-dimensional feature output with a 10-class classification head, the model depth, as Keras counts every layer rather than only the weight layers, has almost tripled to 177 layers.
The first 100 layers of our model will be frozen, meaning they will not be trainable, and their weights will remain the same.
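A minimal sketch of this setup, assuming TensorFlow/Keras; the pooling layer, softmax head, and optimizer below are illustrative choices rather than the exact configuration used in the exercise.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load ResNet50 without its ImageNet head, with CIFAR-10-sized 32x32x3 inputs.
base = tf.keras.applications.ResNet50(
    include_top=False,
    weights="imagenet",
    input_shape=(32, 32, 3),
)

# Replace the 2048-dimensional feature output with a 10-class softmax head.
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(10, activation="softmax")(x)
model = models.Model(inputs=base.input, outputs=outputs)

# Freeze the first 100 layers so their weights stay fixed during training.
for layer in model.layers[:100]:
    layer.trainable = False

# Total layer count after adding the new head; this should be close to the
# 177 mentioned above, though the exact number can vary across Keras versions.
print(len(model.layers))

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```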
Compared to our past method of one-hot encoding with keras.utils.to_categorical(), one-hot encoding with scikit-learn's OneHotEncoder().fit_transform().toarray() grants greater flexibility and scalability, although it requires installing scikit-learn and is more verbose.
Pick your poison.
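For comparison, here is a short sketch of both encodings side by side, assuming integer class labels of shape (n_samples, 1) as Keras loads them for CIFAR-10; the toy label values are placeholders.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import OneHotEncoder

y = np.array([[0], [3], [9]])  # toy labels standing in for CIFAR-10 classes

# Keras route: concise, no extra dependency.
y_keras = to_categorical(y, num_classes=10)

# scikit-learn route: more verbose, but the fitted encoder can be reused,
# inverted, or configured for other preprocessing pipelines.
encoder = OneHotEncoder(categories=[np.arange(10)])
y_sklearn = encoder.fit_transform(y).toarray()

print(np.allclose(y_keras, y_sklearn))  # True: both produce the same matrix
```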
High accuracy is not the goal here; the aim is to explore how freezing affects the training outputs. Yet, despite the benefits of freezing listed above, the output below makes it clear that the technique affects each deep learning model differently.
Over 10 epochs, the validation loss spikes sharply at the 2nd and 6th epochs only, then quickly drops back to values close to the training loss. As for the accuracy curves, training accuracy converges steadily without significant oscillation, while validation accuracy fluctuates erratically, mirroring the validation loss curve.