While Keras provides a rich set of built-in callbacks and loss functions, there are often scenarios where custom implementations are necessary to achieve specific goals or address unique challenges. These challenges may involve implementing nonstandard metrics, exploring new training approaches, integrating with external tools that Keras does not support out of the box, and so on.
If Keras does not ship a callback for some hyperspecific task, you can build your own from Keras' open-source components. This involves creating a new Python class that inherits from keras.callbacks.Callback and overriding the desired methods. By doing so, you can tailor the callback to your specific needs and gain more control over the training process.
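As a minimal sketch of that structure (assuming TensorFlow's bundled Keras; the class name and printed message are illustrative, not part of the exercise):

from tensorflow import keras

class MyCallback(keras.callbacks.Callback):
    """Illustrative callback that reports the loss after every epoch."""

    def on_epoch_end(self, epoch, logs=None):
        # `logs` holds the metrics Keras tracked for this epoch.
        print(f"Epoch {epoch + 1} finished, loss = {logs.get('loss'):.4f}")

The instance is then passed to training as usual, for example model.fit(x, y, callbacks=[MyCallback()]).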
Speaking of customization, callbacks look similar to functions on the surface, but they are ultimately distinct constructs:
Purpose: callbacks are tailored for training tasks, while functions have a broader range of applications.
Triggering: callbacks are event-driven, while functions are typically called manually.
Functionality: callbacks often provide specialized features for training, while functions offer more general-purpose capabilities.
This exercise uses a custom callback to build a confusion matrix class. To briefly recap, a confusion matrix for a prediction model is a 2×2 table whose cells contain the counts of true positives (correct positive guesses), true negatives (correct negative guesses), false positives (wrong positive guesses), and false negatives (wrong negative guesses).
Instead of def, as when writing a function, a custom callback is defined with class. The creation process starts out much like writing a function, but the syntax and definitions diverge from there. The methods of a callback can be used to perform various tasks at specific steps of model training, compiling, or fitting (see the sketch after this list):
__init__: automatically called when an object of the class is created. In the example below, it initializes the instance variables that will be used by the callback.
on_train_begin: called once at the start of the entire training process.
on_train_end: called once at the end of the entire training process.
on_batch_begin: called at the beginning of each batch of data.
on_batch_end: called at the end of each batch of data.
on_epoch_begin: called at the beginning of each epoch (one complete pass through the entire dataset).
on_epoch_end: called at the end of each epoch.
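Below is a minimal sketch of such a confusion matrix callback, assuming a binary classifier and a hypothetical held-out validation set (x_val, y_val) supplied by the user; the class and variable names are illustrative rather than the exercise's exact code:

import numpy as np
from tensorflow import keras

class ConfusionMatrixCallback(keras.callbacks.Callback):
    def __init__(self, x_val, y_val, threshold=0.5):
        super().__init__()
        # Store the validation data and the decision threshold for later use.
        self.x_val = x_val
        self.y_val = y_val
        self.threshold = threshold
        self.history = []

    def on_train_begin(self, logs=None):
        # Reset the per-epoch history at the start of every training run.
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        # Predict on the held-out data and threshold the probabilities into 0/1 labels.
        probs = self.model.predict(self.x_val, verbose=0).ravel()
        preds = (probs >= self.threshold).astype(int)
        y = np.asarray(self.y_val).ravel().astype(int)
        tp = int(np.sum((preds == 1) & (y == 1)))   # correct positive guesses
        tn = int(np.sum((preds == 0) & (y == 0)))   # correct negative guesses
        fp = int(np.sum((preds == 1) & (y == 0)))   # wrong positive guesses
        fn = int(np.sum((preds == 0) & (y == 1)))   # wrong negative guesses
        self.history.append([[tn, fp], [fn, tp]])
        print(f"Epoch {epoch + 1}: TP={tp} TN={tn} FP={fp} FN={fn}")

The callback is then passed to training as usual, e.g. model.fit(x_train, y_train, callbacks=[ConfusionMatrixCallback(x_val, y_val)]).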
The loss and accuracy curves are separated by a relatively small gap, suggesting a reasonably balanced fit with a lower risk of overfitting. As for the confusion matrix plot, the true counts rise and the false counts fall steadily across epochs, indicating that the model is classifying instances more cleanly over time.
Focal loss builds on cross-entropy loss by adding a modulating term that focuses learning on hard, misclassified examples. It is particularly effective when dealing with datasets where one or more classes have significantly fewer samples than others. It works by:
Modulation Term: introducing a modulating factor to the cross-entropy loss, which down-weights the contribution of easy samples.
Focus on Hard Examples: allowing the model to focus on learning from the more difficult, misclassified examples, improving overall performance.
Hyperparameter: including a hyperparameter (gamma) that controls the degree of focusing. A higher gamma value places more emphasis on hard examples.
Besides the gamma hyperparameter, focal loss also takes alpha, a balancing factor that controls the relative weight assigned to positive and negative samples. It helps to address class imbalance, where one or more classes have significantly fewer samples than others. Depending on its value, alpha behaves as follows (both parameters appear in the sketch after this list):
Alpha = 0.5: indicates a balanced dataset, where positive and negative samples are equally weighted.
Alpha < 0.5: places more emphasis on positive samples, which can be useful if the dataset is heavily skewed towards negative samples.
Alpha > 0.5: places more emphasis on negative samples, which can be useful if the dataset is heavily skewed towards positive samples.
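As a rough sketch of how gamma and alpha fit together in a binary focal loss (not the chapter's exact implementation; following the convention in the list above, alpha here weights the negative class and 1 - alpha the positive class):

import tensorflow as tf

def binary_focal_loss(gamma=2.0, alpha=0.5, epsilon=1e-7):
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Clip predictions so the logarithm never sees an exact 0 or 1.
        y_pred = tf.clip_by_value(y_pred, epsilon, 1.0 - epsilon)
        # p_t is the predicted probability of the true class.
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        # alpha_t balances positive vs. negative samples (alpha = 0.5 treats them equally).
        alpha_t = y_true * (1.0 - alpha) + (1.0 - y_true) * alpha
        # (1 - p_t)^gamma down-weights easy, well-classified samples.
        modulating = tf.pow(1.0 - p_t, gamma)
        return -alpha_t * modulating * tf.math.log(p_t)
    return loss

A model would then be compiled with, for example, model.compile(optimizer="adam", loss=binary_focal_loss(gamma=2.0, alpha=0.5)).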
In mathematics, a tensor is a multidimensional array that can represent various mathematical objects, such as vectors, matrices, and higher-order quantities. In deep learning, tensors are used to represent data, weights, and gradients.
TensorFlow operations manipulate and process tensors to perform calculations and update model parameters, and TensorFlow typically uses the float32 format for numerical computations, which provides a balance between precision and efficiency.
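A quick illustration (the values are arbitrary):

import tensorflow as tf

# A rank-2 tensor (matrix); TensorFlow defaults to float32 for float literals.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.constant([[0.5], [0.25]])
y = tf.matmul(x, w)          # matrix multiplication on tensors
print(x.dtype)               # float32
print(y.numpy())             # [[1.0], [2.5]]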
Here is a breakdown of the steps involved in the mixed focal and cross-entropy loss function (a sketch follows the list):
Adds a small epsilon value to the predicted values to prevent numerical issues (such as taking the logarithm of zero) in the subsequent calculations.
Computes the standard cross-entropy loss between the true labels (y_true) and the predicted values (model_out).
Calculates the weight for each sample based on the predicted probability and the gamma parameter. This down-weights well-classified samples.
Computes the focal loss using the calculated weights and cross-entropy loss.
Reduces the focal loss along the specified axis (axis = 1) to get the maximum value for each sample.
Computes the final mixed loss by combining the reduced focal loss and the categorical cross-entropy loss using the specified weights (fl_weights and ce_weights).
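A minimal sketch following these steps (assuming softmax outputs and one-hot labels; the names gamma, epsilon, fl_weights, and ce_weights mirror the description above, and the exact implementation may differ):

import tensorflow as tf

def mixed_focal_ce_loss(gamma=2.0, fl_weights=0.5, ce_weights=0.5, epsilon=1e-7):
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # 1. Add a small epsilon so the logarithm never sees an exact zero.
        model_out = y_pred + epsilon
        # 2. Standard cross-entropy term, per class.
        ce = -y_true * tf.math.log(model_out)
        # 3. Per-sample weight based on the predicted probability and gamma;
        #    this down-weights well-classified samples.
        weight = y_true * tf.pow(1.0 - model_out, gamma)
        # 4. Focal loss from the weights and the cross-entropy term.
        fl = weight * ce
        # 5. Reduce along the class axis (axis=1), keeping the maximum value per sample.
        reduced_fl = tf.reduce_max(fl, axis=1)
        # 6. Combine with categorical cross-entropy using the mixing weights.
        cce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        return fl_weights * reduced_fl + ce_weights * cce
    return loss

The resulting function is passed at compile time, e.g. model.compile(optimizer="adam", loss=mixed_focal_ce_loss()).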
The loss curves from models trained with different cross-entropy weight values (CEs) converge at higher values as the CE weight increases. This is most prominent in the solid validation curves.
As for accuracy, all validation curves converge at roughly the same value and rate, which creates a wide gap between the training and validation curves and indicates notable overfitting. On the flip side, lower CEs result in comparatively lower accuracy.