By far my most successful attempt at tackling CIFAR-10.
As we have been learning for a while now, convolutional neural networks (CNNs) are a powerful class of deep learning models designed specifically for processing and analyzing image data. Keras, a high-level API built on top of backends such as TensorFlow and Theano, provides a user-friendly interface for building and training CNNs.
Today's section is an exercise-heavy exploration of CNN architecture in the Keras library, focusing on the implementation of standard 2D convolution and its variant, separable 2D convolution.
Each 2D convolution layer below shrinks the output shape slightly. To understand this, imagine you have a large photo album filled with pictures of various sizes. To make the album more compact, you decide to shrink each photo to a smaller size. This process is similar to what happens in a CNN when the feature maps shrink after each convolutional layer.
Just like shrinking photos in an album, convolutional layers without padding reduce the size of the feature maps: the kernel only visits positions where it fits entirely inside the input, which trims the edges. This helps to reduce computational complexity and extract the most important features from the image.
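A minimal sketch of this effect in Keras (the 3×3 kernel size and 32 filters here are arbitrary illustration values, not the actual model's settings):

```python
from tensorflow import keras
from tensorflow.keras import layers

# With the default padding="valid", a 3x3 kernel cannot be centred on the
# outermost pixels, so each convolution trims one pixel from every edge:
# 32x32 -> 30x30 -> 28x28.
x = keras.Input(shape=(32, 32, 3))
y = layers.Conv2D(32, (3, 3))(x)  # output spatial size: 30x30
z = layers.Conv2D(32, (3, 3))(y)  # output spatial size: 28x28
print(y.shape, z.shape)
```

Passing `padding="same"` instead would pad the borders with zeros and keep the spatial size unchanged.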
The separable 2D convolution variant (Keras's SeparableConv2D) decomposes a standard convolution into a depthwise convolution, which filters each input channel independently, followed by a pointwise 1×1 convolution that mixes the channels, significantly reducing the number of parameters and the computational cost.
This method typically requires far fewer parameters than a standard 2D convolution and can be more computationally efficient, but it is an approximation: not every standard convolution can be expressed in this factored form, so there may be a slight loss in accuracy.
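The savings are easy to verify by counting parameters. The input size and filter count below (32 channels in, 64 filters out) are arbitrary illustration values, not taken from the models in this post:

```python
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(16, 16, 32))

# Standard convolution: 3*3*32*64 weights + 64 biases = 18,496 parameters.
conv = layers.Conv2D(64, (3, 3))
# Depthwise separable: 3*3*32 depthwise weights + 32*64 pointwise weights
# + 64 biases = 2,400 parameters.
sep = layers.SeparableConv2D(64, (3, 3))

m_conv = keras.Model(inp, conv(inp))
m_sep = keras.Model(inp, sep(inp))
print(m_conv.count_params(), m_sep.count_params())  # 18496 2400
```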
This exercise aims to compare the performances of the deep neural network (DNN) with the CNN on CIFAR-10, an image database with 10 classes of different items.
Both models will be compiled with the categorical cross-entropy loss function and the RMSprop optimizer.
This DNN, with 3 dense layers (512 units in each hidden layer) and dropout layers, produced a total of 1,841,162 parameters, 1,573,376 of which were already created by the end of the first dense layer (3,072 flattened inputs × 512 units, plus 512 biases)…
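A sketch of a DNN matching those parameter counts. The layer sizes follow from the summary figures; the ReLU activations and dropout rates are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Dense layers dominate the parameter count because every input connects
# to every unit; the dropout rates below are assumed, not the real values.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Flatten(),                        # 32*32*3 = 3,072 inputs
    layers.Dense(512, activation="relu"),    # 3072*512 + 512 = 1,573,376
    layers.Dropout(0.2),
    layers.Dense(512, activation="relu"),    # 512*512 + 512 = 262,656
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),  # 512*10 + 10 = 5,130
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])
print(model.count_params())  # 1841162
```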
Despite its massive number of parameters, the DNN only reached a final validation accuracy of around 0.44 and a loss of around 1.55.
Reading the output plots below, both training curves (loss and accuracy) exhibited steady convergence. The validation curves showed obvious, abrupt oscillations, and after the 8th epoch the validation loss increased slightly. By the 10th epoch, a small gap had formed between the train and validation curves, suggesting potential overfitting.
The CNN is built with 3.6 times as many layers as the DNN above (18 versus 5), including additions such as pooling layers and a flatten layer. Without the pooling layers, each epoch would take far longer to run. Without a flatten layer, the model would not run at all, because the 2D feature maps must be flattened into a 1D vector before they can feed the dense layers.
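A hypothetical CNN along these lines; the filter counts, dropout rates, and exact depth below are illustrative assumptions, not the actual model:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),   # halves the spatial size -> faster epochs
    layers.Dropout(0.25),
    layers.SeparableConv2D(64, (3, 3), activation="relu"),
    layers.SeparableConv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),              # 2D feature maps -> 1D vector for Dense
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])
```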
The summary below shows that the CNN generates fewer parameters overall. This would translate into a lower runtime, if not for the fact that convolutions perform far more computation per parameter than dense layers do, thanks to weight sharing.
Even though the CNN has a notably longer runtime per epoch, its validation accuracy is much higher than that of the DNNs so far.
The subplots below display stably converging train curves and periodically erratic validation curves, especially for validation loss. While the validation curves had largely converged between the 3rd and 7th epochs, their performance regressed slightly towards the end.
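Curves like these can be drawn from the History object that `model.fit()` returns. The `plot_history` helper below is a hypothetical sketch, assuming `fit()` was called with validation data and the default `"accuracy"` metric name:

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot train vs. validation loss and accuracy from a Keras History."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["loss"], label="train")
    ax1.plot(history.history["val_loss"], label="valid")
    ax1.set_title("loss")
    ax1.legend()
    ax2.plot(history.history["accuracy"], label="train")
    ax2.plot(history.history["val_accuracy"], label="valid")
    ax2.set_title("accuracy")
    ax2.legend()
    plt.show()
```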