Rant time: the first purpose of writing these blogs is to record my own learnings for constant rereading. If you are asking for the second purpose, they are self-reminders of my progress in learning AI – no matter how amnesiac I feel when needing to revisit old topics from time to time.
Beyond the linear stacking of layers, neural networks often demand intricate architectures to capture complex relationships within data. To address this, advanced frameworks offer a flexible canvas for constructing models with diverse topologies, enabling the creation of sophisticated systems capable of handling multifaceted problems.
Keras' Functional API offers a flexible framework for constructing complex neural network architectures. By allowing users to define models as directed acyclic graphs of layers, it provides greater control over network topology compared to the Sequential model, and also enables the creation of models with multiple inputs, outputs, shared layers, and complex branching structures.
Essentially, the Functional API treats layers as functions that can be combined and composed in various ways to build sophisticated models.
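To make that concrete, here is a minimal sketch in the functional style; the 64-feature input and the layer widths are arbitrary choices for illustration, not anything prescribed by Keras.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer is called like a function on the tensor produced by the previous one.
inputs = keras.Input(shape=(64,))                        # placeholder for 64-feature vectors
x = layers.Dense(32, activation="relu")(inputs)          # hidden layer
outputs = layers.Dense(1, activation="sigmoid")(x)       # binary prediction

# keras.Model ties the input and output tensors into one trainable object.
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
```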
Comparing the Sequential model, which we covered on 14th August, with the Functional API:
Functional API
A flexible framework for building complex models.
Allows for arbitrary graph structures, including shared layers and multiple inputs/outputs.
Requires more code to define the model compared to Sequential.
Offers greater control over the model architecture.
Sequential Model
A simpler API for building a linear stack of layers.
Limited to sequential data flow.
Easier to use for beginners.
Less flexible than Functional API.
While Sequential models are a subset of Functional models, they are often sufficient for many practical use cases. The choice between the two depends on the complexity of the desired model architecture.
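As a quick illustration of the "subset" point, the same two-layer classifier sketched earlier collapses to a plain stack under the Sequential API (sizes again arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Identical topology to the functional sketch above, but the layer-to-layer
# connections are implicit: each layer feeds the next, with no branching.
seq_model = keras.Sequential([
    keras.Input(shape=(64,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
```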
A multi-input, multi-output model constructed using the Keras Functional API is a neural network architecture that can process multiple data sources simultaneously and produce multiple outputs. This flexibility allows for complex problem-solving scenarios where different types of information need to be combined to make predictions or decisions.
The model incorporates the following components (a code sketch follows the list):
Multiple Inputs: two distinct input layers – main_input and aux_input – representing different data modalities.
Feature Extraction: an embedding layer processes the main_input, while an LSTM (Long Short-Term Memory) layer handles the aux_input. An LSTM layer is a type of RNN (read 8th August 2024) specifically designed to address the vanishing gradient problem, an issue that hinders traditional RNNs from capturing long-term dependencies in sequential data.
Data Fusion: a merge layer (concatenation) combines the processed outputs of the embedding and LSTM layers.
Multi-Task Learning: the model produces outputs main_output and aux_output, suggesting a multi-task learning setup.
Shared Layers: the dense_1, dense_2, and dense_3 layers are shared between the two output branches, promoting efficient parameter sharing.
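Here is one way the architecture described above might be wired together. The input and output names follow the description; every shape, vocabulary size, and layer width below is an assumed placeholder, and GlobalAveragePooling1D is used only to collapse the embedded sequence into a single vector before merging.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two inputs; all shapes and sizes below are illustrative assumptions.
main_input = keras.Input(shape=(100,), dtype="int32", name="main_input")  # 100 word indices
aux_input = keras.Input(shape=(20, 8), name="aux_input")                  # 20 timesteps x 8 features

# Feature extraction: Embedding for main_input, LSTM for aux_input.
embedded = layers.Embedding(input_dim=10000, output_dim=64)(main_input)
embedded = layers.GlobalAveragePooling1D()(embedded)   # (batch, 100, 64) -> (batch, 64)
lstm_out = layers.LSTM(32)(aux_input)                  # (batch, 20, 8)  -> (batch, 32)

# Data fusion: concatenate the two feature vectors.
merged = layers.concatenate([embedded, lstm_out])

# Shared dense stack feeding both output branches.
dense_1 = layers.Dense(64, activation="relu")(merged)
dense_2 = layers.Dense(64, activation="relu")(dense_1)
dense_3 = layers.Dense(64, activation="relu")(dense_2)

main_output = layers.Dense(1, activation="sigmoid", name="main_output")(dense_3)
aux_output = layers.Dense(1, activation="sigmoid", name="aux_output")(dense_3)

model = keras.Model(inputs=[main_input, aux_input],
                    outputs=[main_output, aux_output])

# Multi-task objective: one loss per output, with an assumed weighting.
model.compile(optimizer="rmsprop",
              loss={"main_output": "binary_crossentropy",
                    "aux_output": "binary_crossentropy"},
              loss_weights={"main_output": 1.0, "aux_output": 0.2})
```

Compiling with per-output losses and loss_weights is what turns the two heads into a multi-task objective; the 0.2 weight on aux_output is again just a placeholder.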
Quick definitions for the modules not seen before this exercise:
Input: defines the shape and type of data that will be fed into the model, serving as a placeholder for input data. Requires a shape argument specifying the input tensor dimensions.
Embedding: converts integer-encoded input data into dense vectors of fixed size, mapping discrete input values to continuous vector representations. Key parameters: input_dim (size of the vocabulary), output_dim (dimension of the embedding space), and input_length (length of the input sequences).
In a nutshell: the first one defines the structure of input data, while the second transforms categorical data into a numerical representation suitable for neural network processing.
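A tiny sketch showing the two in isolation; the vocabulary size, embedding width, and sequence length are assumed values.

```python
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

seq_in = keras.Input(shape=(10,), dtype="int32")                    # 10 integer tokens per sample
vectors = layers.Embedding(input_dim=5000, output_dim=16)(seq_in)   # each token -> 16-dim vector

toy = keras.Model(seq_in, vectors)
print(toy(np.array([[3, 14, 159, 26, 5, 0, 0, 0, 0, 0]])).shape)    # (1, 10, 16)
```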
Increasing the number of dense layers typically leads to a more complex model capable of learning intricate patterns, but the larger model is prone to overfitting, requiring stronger regularization techniques (dropout, L1/L2 regularization, or early stopping). It will demand more computational resources for training and inference.
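One way those regularizers could be attached to a deeper dense stack in Keras; the dropout rate, L2 factor, and layer widths are arbitrary placeholders.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

inputs = keras.Input(shape=(64,))
x = layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(inputs)  # L2 penalty on weights
x = layers.Dropout(0.3)(x)                                          # randomly drop 30% of units
x = layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

# Early stopping: halt training once validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)
```

The early_stop callback would then be passed to model.fit(..., callbacks=[early_stop]).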
RMSprop (Root Mean Square Propagation) is an adaptive learning rate optimization algorithm designed to address challenges encountered with the stochastic gradient descent (SGD) method. In contrast to batch gradient descent, SGD calculates the gradient of the loss function using only one randomly selected training example in each iteration instead of the entire dataset, which makes the updates cheap but noisy. RMSprop counters that noise by keeping an exponentially decaying average of squared gradients for each parameter and dividing its step size by the square root of that average, so parameters with consistently large gradients take smaller steps and vice versa.
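A sketch of how RMSprop plugs into a Keras model; the hyperparameters shown are its documented defaults, and the tiny one-layer model is just a stand-in.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Per parameter, RMSprop keeps a decaying average of squared gradients and
# scales each update by its inverse square root:
#   avg   = rho * avg + (1 - rho) * grad**2
#   param = param - learning_rate * grad / sqrt(avg + epsilon)
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-7)

inputs = keras.Input(shape=(64,))
outputs = layers.Dense(1, activation="sigmoid")(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
```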