I believe every human creation carries some imprint of its creators. Many products reflect our biases, know-how, and views.
AI has made significant strides through the development of complex computational models capable of learning intricate patterns from data. These models, inspired by the human brain's structure, excel at tasks once considered exclusive to human cognition. By constructing layered architectures and adjusting parameters iteratively, these systems can approximate and model extraordinarily complex functions.
TensorFlow is an open-source framework developed by Google researchers to run machine learning, deep learning and other statistical and predictive analytics workloads.
The Google product TensorFlow Playground is an interactive web-based tool designed to provide a visual and intuitive understanding of neural networks. It allows users to experiment with different neural network architectures, hyperparameters, and data patterns without requiring any coding knowledge. Exploring the tool reveals the inner workings of neural networks and offers a hands-on experience without diving into complex code.
TensorFlow Playground offers 4 classification and 2 regression problem plots; by default, the website loads the concentric circles classification plot.
The tool exposes several configurable components:
Epoch: the number of training iterations completed.
Learning rate: controls the step size during gradient descent.
Activation: activation function used in neurons. In the example below, Tanh (hyperbolic tangent) is an activation function commonly used in neural networks; it is mathematically represented as tanh(x) = (e^x - e^-x) / (e^x + e^-x). Tanh maps real numbers to the range of -1 to 1.
Regularization: type of regularization applied (L1, L2, or Dropout).
Regularization rate: strength of the regularization.
Problem type: classification or regression.
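To make the Tanh formula above concrete, here is a minimal sketch that implements it directly from its exponential definition (this is for illustration only; in practice you would call `math.tanh`):

```python
import math

def tanh(x: float) -> float:
    """Hyperbolic tangent from its definition:
    tanh(x) = (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Tanh squashes any real input into the open interval (-1, 1).
for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"tanh({x:+.1f}) = {tanh(x):+.4f}")
```

Note how large positive inputs saturate near +1 and large negative inputs near -1, which is why Tanh-activated neurons produce the smooth, bounded response surfaces seen in the Playground's neuron thumbnails.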
Subsequent discussions and observations will primarily focus on the model's convergence behavior, as indicated by the decreasing training and test loss values over epochs. This will include analysis of factors influencing convergence rate, potential overfitting or underfitting, and the impact of hyperparameter tuning on model performance.
Additionally, we will examine the visualization of the decision boundary in the output plot to understand the model's decision-making process and its ability to classify data points accurately.
Deepening a neural network refers to increasing the number of layers in the network. This is a common technique to improve model performance, especially for complex tasks. By adding more layers, the network can learn hierarchical representations of the data, capturing intricate patterns and features.
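The "composition of layers" idea can be sketched as a plain forward pass with numpy. The shapes below are hypothetical (2 input features, a 4-neuron and a 2-neuron hidden layer, 1 output), loosely mirroring a Playground setup; the weights are random, untrained:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """One fully connected layer with a tanh activation."""
    return np.tanh(x @ w + b)

x = rng.normal(size=(8, 2))                      # batch of 8 points (x1, x2)
w1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # hidden layer 1: 4 neurons
w2, b2 = rng.normal(size=(4, 2)), np.zeros(2)    # hidden layer 2: 2 neurons
w3, b3 = rng.normal(size=(2, 1)), np.zeros(1)    # output layer: 1 neuron

# Deepening the network = composing more of these layer functions.
h1 = dense(x, w1, b1)
h2 = dense(h1, w2, b2)
out = dense(h2, w3, b3)
print(out.shape)  # (8, 1)
```

Each added layer transforms the previous layer's output rather than the raw inputs, which is what lets deeper networks build hierarchical features.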
With the concentric circles pattern plot, adding 1 more hidden layer with 2 neurons makes the test and training losses fall faster. Another observation worth noting: the middle-row neurons' weights are readjusted into the circular range seen at the end of the GIF below.
Switching to a double cluster plot: because each cluster is so clearly separated, even the 1-layer classification network establishes clean decision regions for each data point category.
Adding 2 hidden layers to the same classification model curves the region border away from the edges of each cluster, though not by much; the scatter plot data is so simple in structure that the loss converges to 0 even without hidden layers. For real-world data, which is inherently noisy, perfect convergence to 0 is considered impossible. A more realistic goal is to minimize the loss function as much as possible while avoiding overfitting.
In a 1-layer neural network, the regression model on the scatter plot starts with test and training losses of roughly 0.2 (rounded down). Within moments, both loss values drop to 0. A rapid initial decrease in loss suggests that the model is effectively learning from the data and making significant progress.
With 2 hidden layers in play, the regression model starts with lower loss values of around 0.1 (also rounded down). By around the 50-epoch mark, the test and training losses become 0.
Beware, though, since a rapid initial decrease in loss also increases the risk of overshooting. Overshooting means that the learning rate is too high, and the algorithm jumps over the minimum of the loss function, resulting in oscillations or divergence, where the algorithm fails to converge to an optimal solution.
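The overshooting effect is easy to demonstrate on a toy problem. The sketch below (an illustrative assumption, not the Playground's optimizer) runs gradient descent on f(x) = x², whose gradient is 2x. Each step multiplies x by (1 - 2·lr), so a small learning rate shrinks x toward the minimum at 0, while a rate above 1.0 makes x oscillate in sign and grow, i.e. diverge:

```python
def gradient_descent(lr, steps=20, x=1.0):
    """Minimize f(x) = x^2 (gradient 2x) starting from x = 1."""
    for _ in range(steps):
        x -= lr * 2 * x  # update factor per step: (1 - 2*lr)
    return x

small = gradient_descent(lr=0.1)  # factor 0.8 -> x shrinks toward 0
large = gradient_descent(lr=1.1)  # factor -1.2 -> x oscillates and blows up
print(small, large)
```

With lr = 0.1, x ends near 0.01 after 20 steps; with lr = 1.1, |x| has grown past 38 while flipping sign every step, which is exactly the oscillation/divergence described above.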
Like managing overfitting, balancing the learning rate is crucial for effectively training machine learning models, as it directly affects their performance and ability to generalize to new data.
Broadening a neural network involves increasing the number of neurons in one or more layers. This can enhance the model's capacity to represent complex functions and improve its ability to fit the training data.
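One way to see what broadening buys (or costs) is to count trainable parameters. The helper below is a generic sketch for fully connected layers (weights plus biases), not anything TensorFlow Playground exposes; the layer sizes are hypothetical:

```python
def dense_param_count(n_in, n_out):
    """Weights plus biases for one fully connected layer."""
    return n_in * n_out + n_out

def network_params(layer_sizes):
    """Total parameters for sizes like [2, 4, 1] (input, hidden, output)."""
    return sum(dense_param_count(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))

narrow = network_params([2, 2, 1])  # 2 hidden neurons
wide   = network_params([2, 8, 1])  # 8 hidden neurons
print(narrow, wide)                 # 9 vs 33 parameters
```

Widening the single hidden layer from 2 to 8 neurons nearly quadruples the parameter count here, which is the extra capacity (and the extra overfitting risk) that broadening brings.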
But for this exercise, we are doing the opposite of broadening!
Including only 2 neurons in 1 hidden layer drastically alters the model's range from Gaussian blob-like to cone-like. Total loss starts notably higher than with the 4-neuron model and decreases more slowly, yet the model settles at its final value much sooner (roughly 10 times faster?).
There are no 'traditionally wrong' results in regression plots – excluding glaring outliers and inherent biases – as they predict continuous values rather than assigning data points to predefined labels. In the GIF below, running a 2-neuron regression model on a scatter plot results in a higher starting total loss.
Implementing the 3rd and 4th features – vertical pillar and horizontal pillar – with the 2-neuron neural network on the concentric circles plot produces a lower starting total loss and a more accurate output range, compared to the rapid convergence and drastically altered range shape of the 2-neuron model trained on the 1st and 2nd features.
Certain features might be incompatible with specific neural network architectures, resulting in suboptimal model performance or even complete failure to converge. These incompatible features can hinder the model's ability to learn meaningful representations from the data, leading to inaccurate predictions and high error rates.
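The concentric circles case above shows why feature choice matters so much: the raw coordinates (x1, x2) are not linearly separable, but squared features collapse the problem to thresholding a single radius value. The sketch below uses synthetic circle data of my own making (radii and threshold are illustrative assumptions):

```python
import math
import random

random.seed(0)

def sample(radius, label, n=50):
    """Points scattered around a circle of the given radius."""
    pts = []
    for _ in range(n):
        theta = random.uniform(0, 2 * math.pi)
        r = radius + random.uniform(-0.2, 0.2)
        pts.append((r * math.cos(theta), r * math.sin(theta), label))
    return pts

data = sample(1.0, 0) + sample(3.0, 1)  # inner class 0, outer class 1

# With squared features, x1^2 + x2^2 is just r^2, so a single
# threshold (here 4.0, i.e. radius 2) separates the classes.
correct = sum((x1**2 + x2**2 > 4.0) == bool(label)
              for x1, x2, label in data)
accuracy = correct / len(data)
print(accuracy)  # 1.0 on this clean synthetic data
```

This mirrors what the Playground visualizes: a tiny network given x1² and x2² solves the circles problem easily, while the same network on raw x1 and x2 may fail to converge at all.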