This marathon will be over soon. What next after this? Prepare for my upcoming Masters, of course.
Facial recognition, a powerful application of computer vision, employs neural networks to identify and verify individuals based on their unique facial features. By analyzing patterns and characteristics within facial images, neural networks can learn to distinguish between different faces with remarkable accuracy.
One way for machines to see our faces in finer detail is facial keypoint detection, a computer vision technique used to identify and locate specific points on a human face, such as the eyes, nose, mouth, and eyebrows. These keypoints can then be used for various applications, including facial recognition, expression analysis, and augmented reality.
Traditional object detection would be like trying to find a person by looking for general traits (e.g., height, clothing color, or hairstyle). This might not be enough to pinpoint the exact individual. Facial keypoint detection would be like focusing on specific facial features (e.g., the shape of the eyes, nose, and mouth). This method can more accurately identify the person, even in different clothing or a different setting.
This next exercise applies a facial keypoint detection model to a CSV file, which stores data, including images, as numbers in a table-structured format.
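As a minimal sketch of what that looks like in practice (assuming a Kaggle-style facial keypoints CSV, where each row holds keypoint coordinates plus an 'Image' column of space-separated 96×96 grayscale pixel values; the file name is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical file name; each row holds keypoint coordinates plus an
# 'Image' column containing 96x96 grayscale pixels as a space-separated string.
df = pd.read_csv("training.csv").dropna()

# Decode the pixel strings into a (num_samples, 96, 96) array scaled to [0, 1].
images = np.stack(
    [np.array(s.split(), dtype=np.float32) for s in df["Image"]]
).reshape(-1, 96, 96) / 255.0

# Every remaining column is an (x, y) keypoint coordinate.
keypoints = df.drop(columns="Image").to_numpy(dtype=np.float32)
```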
Neural networks generally process these numeric tables faster because they are much smaller than full-resolution images. Of course, as stated many times before, the model's complexity and detection accuracy are traded off for that greater speed. In this learning context, we are observing how prediction on CSV data is done rather than aiming to produce a high-accuracy model.
We see below that both loss values of this facial keypoint detector model are very low. To further assess its performance, we can compare the curves' trends and the gap between them. As in most neural network training runs, the training loss logically sits below the validation loss.
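As a quick sketch, plotting the two loss histories together makes this comparison easy (the per-epoch values here are hypothetical placeholders for whatever your training loop logs):

```python
import matplotlib.pyplot as plt

# Hypothetical per-epoch losses; substitute the values your training loop logs.
train_losses = [0.012, 0.006, 0.004, 0.003]
val_losses = [0.014, 0.009, 0.007, 0.006]

plt.plot(train_losses, label="training loss")
plt.plot(val_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```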
Matplotlib can also render the processed dataset as pixelated images with the keypoints highlighted. These load and process much faster than true image datasets but, again, may not capture the full complexity and nuance of human faces.
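Continuing the loading sketch above, one sample and its keypoints can be overlaid like so (assuming the keypoint columns alternate x and y pixel coordinates, as in the Kaggle format):

```python
import matplotlib.pyplot as plt

# `images` and `keypoints` come from the CSV loading sketch above.
plt.imshow(images[0], cmap="gray")
# Keypoint columns alternate x, y pixel coordinates.
plt.scatter(keypoints[0][0::2], keypoints[0][1::2], c="red", marker="x")
plt.show()
```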
To test the model's viability, we had it predict keypoints for a randomized selection of images. On the nuance point: excluding a few outliers, most notably among the tilted faces, the model predicted the CSV test data very accurately. The small, standardized numeric images that CSV files hold likely contribute heavily to this high accuracy.
The trade-off between accuracy and efficiency is a critical consideration when deploying deep learning models in real-world applications with limited computational resources, such as mobile phones. Lightweight models aim to strike a balance between accuracy and efficiency without the need for heavyweight hardware or denser networks.
Imagine you are boarding a lightweight plane trying to depart with limited luggage space. Your plane (lightweight model) would pack only the essentials to lift off, leaving behind unnecessary items (redundant parts) to reduce the weight (workload) of your plane.
One of the techniques for lightweight modeling is model pruning. Pruning algorithms identify and remove connections that have little or no impact on the model's output. This can reduce the model's size and complexity, making it more efficient.
Imagine you are cleaning out your closet. You start by removing clothes (unimpactful connections) that have sat there untouched for the past year. By removing these unused garments, you open up space (a leaner, more efficient model) for the things you actually use.
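As a small illustration with PyTorch's built-in pruning utilities, the sketch below zeroes out the 30% of weights with the smallest L1 magnitude in each layer of a toy stand-in model (the architecture is hypothetical):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a trained model.
model = nn.Sequential(nn.Linear(96 * 96, 128), nn.ReLU(), nn.Linear(128, 30))

for module in model:
    if isinstance(module, nn.Linear):
        # Zero the 30% of weights with the smallest L1 magnitude, i.e. the
        # connections with the least impact on the output.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning into the weights
```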
The next technique, quantization, is used to reduce the precision of weights and activations in a neural network. By converting floating-point numbers (FP32 or FP64) to lower-precision formats like FP16 or INT8, quantization can significantly reduce the model's size and computational requirements.
Imagine trying to describe the color of a shirt to someone with a limited color palette. Instead of using precise terms like 'cerulean' or 'vermillion', you might simplify it (reduce precision) by saying 'blue' or 'red' (fitting it to a limited set of values).
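A minimal sketch using PyTorch's dynamic quantization, which stores the weights of supported layers as INT8 and dequantizes them on the fly (the toy model is again a hypothetical stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(96 * 96, 128), nn.ReLU(), nn.Linear(128, 30))

# Convert Linear layers to store INT8 weights, shrinking the model roughly
# 4x compared to FP32 while keeping the same interface.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```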
The final technique in this short list is architecture design, the process of selecting and arranging the components of a neural network to achieve desired performance. It involves making decisions about the number of layers, type of layers (e.g., convolutional, fully connected), number of neurons in each layer, and connections between layers.
MobileNet is a lightweight neural network architecture based on depthwise separable convolutions, a technique that decomposes a regular convolution into two separate operations (a minimal sketch follows the list):
Depthwise Convolution: applies a single filter to each input channel independently. Performs a spatial (e.g., 3×3) convolution on each channel of the input tensor, extracting features from each channel separately.
Pointwise Convolution: applies a 1×1 convolution to combine the features from the depthwise convolution across channels.
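Here is a minimal PyTorch sketch of that decomposition (the channel sizes are arbitrary):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution followed by a 1x1 pointwise convolution."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # groups=in_channels gives each input channel its own 3x3 filter.
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, padding=1, groups=in_channels
        )
        # The 1x1 pointwise convolution mixes features across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Same output shape as nn.Conv2d(32, 64, 3, padding=1), far fewer parameters.
block = DepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 56, 56))  # -> (1, 64, 56, 56)
```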
Compared to standard CNN architectures, the MobileNet architecture has:
Depthwise separable convolutions, which require far fewer parameters and computations than typical 3×3 convolutional layers.
Hyperparameters, like the width multiplier (alpha) and resolution multiplier, for customization based on resource constraints and desired accuracy (see the sketch below).
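As a rough illustration of the width multiplier, torchvision's MobileNetV2 (standing in here for the original MobileNet) exposes a width_mult argument that scales every layer's channel count:

```python
import torchvision.models as models

# Smaller width_mult values trade accuracy for a lighter model.
for width in (1.0, 0.5):
    net = models.MobileNetV2(width_mult=width)
    n_params = sum(p.numel() for p in net.parameters())
    print(f"width_mult={width}: {n_params / 1e6:.2f}M parameters")
```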
Separable convolutions can provide the following benefits to your model:
Factorization: factorizes a regular 2D convolution into two smaller operations, which reduces the number of parameters and computations required.
Preserved Output Shape: the output of a separable convolution has the same shape as that of a regular 2D convolution with the same kernel size, so it can serve as a drop-in replacement, with only a modest loss of expressive power in practice.
Lower Computational Cost: the 1/D + 1/9 cost ratio of a separable convolution (with D output channels and a 3×3 kernel, measured relative to a standard convolution) places its total computational cost above a depthwise convolution alone but well below a general convolution; see the quick calculation below.
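To make that ratio concrete, a quick back-of-the-envelope calculation (the function name is just for illustration):

```python
# Multiply-add ratio of a depthwise separable convolution versus a standard
# convolution: 1/D + 1/(k*k), with D output channels and a k x k kernel.
def separable_cost_ratio(d_out: int, k: int = 3) -> float:
    return 1 / d_out + 1 / k**2

# With 256 output channels and 3x3 kernels, the separable version needs only
# about 11.5% of the multiply-adds of a standard convolution (~8.7x fewer).
print(separable_cost_ratio(256))  # ~0.115
```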