Even in the theoretical data-scape, abrupt change can appear in the blink of an eye. A deep learning model's comfortably low loss can spike back up at any iteration.
In essence, an overfitted model fails to generalize to new data, relying too heavily on the patterns learned from the training set. This can hinder the model's ability to make accurate predictions or decisions in real-world scenarios. Today, we will explore how the phenomenon of overfitting appears in a data-scape, as well as go through some considerations related to model training. The contents of this section will provide extra context for future sections on which components of a neural network affect its accuracy and loss values.
If you reread the section from 14th July, you may recall the concept of overfitting. To recap, overfitting occurs when a machine learning model has memorized its training data too well, to the point that it performs poorly on new, unseen data. It is like memorizing a script instead of understanding the underlying concepts.
To check whether your model is overfitted, a common approach is to split a validation set out of half of your test set. The general rule of thumb is to split the three sets in an 80:10:10 ratio of training, validation, and test data.
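As a concrete illustration, here is a minimal sketch of that 80:10:10 split using scikit-learn's train_test_split. The X and y arrays below are dummy placeholders standing in for your own features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 32)              # dummy features for illustration
y = np.random.randint(0, 10, size=1000)   # dummy labels for illustration

# First carve off 20% of the data, then split that portion half-and-half
# into validation and test sets, giving roughly 80:10:10 overall.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```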
To show how overfitting manifests in neural networks, we will set one up with Keras' CIFAR-10 dataset and train it for 500 epochs. Since CIFAR-10 contains image data rather than flat numerical features, remember to flatten each image into a single vector (and scale the pixel values) and to one-hot encode the labels so your model can read them properly.
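Below is a sketch of that preprocessing, assuming the standard keras.datasets.cifar10 loader; the exact steps used in this section's experiment may differ slightly, but the flatten/scale/one-hot pattern is the same.

```python
from tensorflow import keras

# Load CIFAR-10: 32x32 RGB images with integer labels 0-9.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Flatten each 32x32x3 image into a single 3072-dimensional vector
# and scale pixel values from [0, 255] down to [0, 1].
x_train = x_train.reshape(len(x_train), -1).astype("float32") / 255.0
x_test = x_test.reshape(len(x_test), -1).astype("float32") / 255.0

# One-hot encode the integer labels so the model's softmax output
# can be compared against them with categorical cross-entropy.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

print(x_train.shape, y_train.shape)  # (50000, 3072) (50000, 10)
```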
We will use stochastic gradient descent (SGD) to train this exercise's multi-layer perceptron (MLP).
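For reference, a model along these lines could be set up as follows. The layer widths, learning rate, and batch size here are illustrative assumptions rather than the exact configuration behind the results reported next; the key pieces are the three dense layers, the SGD optimizer, and the 500-epoch run with a held-out validation split.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A simple MLP with 3 dense layers; the widths are illustrative only.
model = keras.Sequential([
    layers.Input(shape=(3072,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Plain SGD with categorical cross-entropy, tracking accuracy.
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Hold out 10% of the training data as a validation set and train for 500 epochs.
history = model.fit(
    x_train, y_train,
    validation_split=0.1,
    epochs=500,
    batch_size=128,
)
```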
The results of our relatively simple MLP with 3 dense layers are below average. Starting at around 0.13 training accuracy and 0.20 validation accuracy, the model steadily improves and ends at 0.59 training accuracy and 0.52 validation accuracy after 500 epochs. The training loss, meanwhile, drops from 2.31 to 1.18.
At the beginning, as the plot below shows, the training and validation losses stay close together. It is not until around 100 epochs that the gap between the two gradually widens, even as both values keep decreasing.
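A plot like the one referenced above can be reproduced from the history object returned by model.fit; here is a minimal matplotlib sketch.

```python
import matplotlib.pyplot as plt

# Plot training vs. validation loss per epoch from the history returned by model.fit.
plt.figure(figsize=(8, 4))
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()
```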
The same observation applies to the accuracy scores. In deep learning, overfitting shows up as a training set accuracy that is glaringly higher than the validation/test set accuracy. Returning to the earlier analogy, this is like a student memorizing the wording of their study materials instead of the general knowledge contained within, which can easily result in a lower grade once the test questions are worded differently.
Training a machine learning model is costly, not just in terms of your GPU quota (the maximum number of GPUs your project can use) but also your precious time and more. Consider these factors before you invest your resources in seriously training one:
Device Usage: determine the computational resources available for training and inference. Check if the GPU is being used efficiently or if there are conflicts with other processes (see the sketch after this list).
Input Preprocessing: ensure that the necessary input data (X) is present and accessible. Verify if the data has been standardized or normalized for optimal model performance.
Output Preprocessing: check if the target labels (Y) have been processed appropriately (e.g., one-hot encoded for classification).
Model Architecture: compare the actual model architecture to the desired or expected structure.
Hyperparameter Settings: evaluate if the relevant hyperparameters of the training model are set appropriately.
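For the device-usage check in particular, here is a quick TensorFlow sketch for confirming that a GPU is visible and that operations actually land on it; the log output and device names will vary by setup.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means training will fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

# Optionally log which device each operation runs on, to confirm the GPU is
# actually being used. Call this before building the model for it to take effect.
tf.debugging.set_log_device_placement(True)
```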
For those without a GPU on their local device(s), Google Colab is a powerful cloud-based platform that provides a free Jupyter Notebook environment for running Python code. It is particularly useful for machine learning tasks, offering free access to GPUs that accelerate training, easy sharing and collaboration on projects, and other resources without requiring local installation.
Be wary, though, that the platform automatically disconnects if you go idle for an extended period. And on the topic of free GPUs, availability can be limited, especially during peak usage times.