Even big waves can start with small ripples.
In machine learning, particularly computer vision, the ability to manipulate and transform images is essential for improving model performance and addressing real-world challenges. Geometric transformations offer powerful tools for augmenting a dataset and improving model generalization.
Interpolation in data augmentation refers to the process of generating new data points between existing data points. It is a technique used to artificially increase the size and diversity of a dataset, which can improve the performance of machine learning models.
When an image is enlarged, new pixels are introduced. Take the grid below as an example. Interpolation algorithms determine the appropriate values for these new pixels based on the surrounding ones. Conversely, when an image is reduced, pixels are discarded.
Bilinear interpolation is a common method used in image processing to estimate pixel values in resized images. It calculates the value of a new pixel as a weighted average of the four nearest pixels in the original image.
Here is how bilinear interpolation works based on the image below:
The four nearest data points to P are Q11, Q12, Q21, and Q22.
Determine the horizontal and vertical distances between P and each of the four surrounding points.
Assign weights to each of the four points based on their distance. Points closer to P will have higher weights.
Use the weights and the values of the four points to calculate the interpolated value at P. This is done by taking a weighted average of the values: P_interpolated = w1 * Q11 + w2 * Q12 + w3 * Q21 + w4 * Q22, where the four weights sum to 1.
Imagine a surface passing through the four points Q11, Q12, Q21, and Q22. The interpolated value at P is the height of this surface at the point P, and the weights determine how much each of the four points contributes to it.
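To make the weighting concrete, here is a minimal Python sketch. It assumes a unit grid cell with Q11 at the top-left corner and P located at fractional offsets (dx, dy) from that corner; the variable names are mine, not from the exercise:

```python
def bilinear_interpolate(q11, q21, q12, q22, dx, dy):
    """Interpolate at a point inside a unit cell.

    q11, q21, q12, q22 are the corner values (top-left, top-right,
    bottom-left, bottom-right). dx and dy are P's fractional
    distances (0..1) from the left and top edges. Each weight is
    the product of the *opposite* distances, so corners nearer
    to P contribute more.
    """
    w11 = (1 - dx) * (1 - dy)
    w21 = dx * (1 - dy)
    w12 = (1 - dx) * dy
    w22 = dx * dy
    return w11 * q11 + w21 * q21 + w12 * q12 + w22 * q22

# Example: P lies a quarter of the way across and halfway down the cell.
print(bilinear_interpolate(10, 20, 30, 40, dx=0.25, dy=0.5))  # 22.5
```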
Homogeneous coordinates are a mathematical tool used to represent points and transformations in a projective space. This allows us to model perspective projections and other geometric transformations in a unified framework. Two common types of geometric transformation are:
Affine Transformation: preserves parallelism and ratios of distances. Examples include translations, rotations, scaling, and shearing.
Perspective Transformation: simulates the effect of viewing an image from a different viewpoint, creating a 3D-like effect. Used in applications like image warping, panorama stitching, and camera calibration.
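As a rough sketch of the idea (the matrices below are illustrative assumptions, not taken from the exercise): a 2D point becomes (x, y, 1), and a single 3×3 matrix can express either kind of transformation.

```python
import numpy as np

# A 2D point in homogeneous coordinates: (x, y) -> (x, y, 1).
p = np.array([2.0, 3.0, 1.0])

# Affine: the bottom row stays (0, 0, 1), so parallel lines stay parallel.
affine = np.array([
    [1.0, 0.0, 5.0],   # translate x by 5
    [0.0, 2.0, 0.0],   # scale y by 2
    [0.0, 0.0, 1.0],
])

# Perspective: a non-trivial bottom row introduces a divide,
# which is what creates the 3D-like foreshortening effect.
perspective = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.0, 1.0],
])

for M in (affine, perspective):
    q = M @ p
    q = q / q[2]       # divide by the third component to return to 2D
    print(q[:2])       # [7. 6.] for affine, [1.667 2.5] for perspective
```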
This exercise has us scale images using five distinct interpolation techniques: bilinear interpolation, described above, plus four methods not yet mentioned:
Nearest Neighbor Interpolation: assigns the value of the nearest pixel in the original image to the new pixel. Simple but can introduce undesirable features (artifacts), especially for large scaling factors.
Resampling Using Pixel Area Relation: calculates the area of the new pixel in the original image. Assigns the weighted average of the pixels within this area to the new pixel. Provides better results than nearest neighbor for downsampling.
Bicubic Interpolation: uses a cubic polynomial to interpolate pixel values. Produces smoother results than bilinear interpolation, especially for large scaling factors.
Lanczos Resampling with Window Size of 4: uses a Lanczos window function to calculate weights for the surrounding pixels. Provides high-quality results, especially for large scaling factors.
The photo is rescaled by a factor of 4 in each direction: enlarged to 4 times its original pixel dimensions, and shrunk to a quarter of them.
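A sketch of how this might look with OpenCV's resize(), which exposes each of the five methods as an interpolation flag (the filename is a placeholder):

```python
import cv2

img = cv2.imread("photo.jpg")  # placeholder filename

methods = {
    "nearest":  cv2.INTER_NEAREST,   # nearest neighbor
    "bilinear": cv2.INTER_LINEAR,    # bilinear
    "area":     cv2.INTER_AREA,      # pixel area relation
    "bicubic":  cv2.INTER_CUBIC,     # bicubic
    "lanczos4": cv2.INTER_LANCZOS4,  # Lanczos, window size 4
}

for name, flag in methods.items():
    # Enlarge to 4x the original dimensions...
    big = cv2.resize(img, None, fx=4, fy=4, interpolation=flag)
    # ...and shrink to a quarter of them.
    small = cv2.resize(img, None, fx=0.25, fy=0.25, interpolation=flag)
    cv2.imwrite(f"big_{name}.jpg", big)
    cv2.imwrite(f"small_{name}.jpg", small)
```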
The time.time() function in Python returns the current time as a floating-point number of seconds elapsed since the epoch. Calling it once before an operation and again afterwards, then subtracting the two values, gives the operation's run time.
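A minimal sketch of that timing pattern, here wrapped around one of the resize calls from above:

```python
import time
import cv2

img = cv2.imread("photo.jpg")  # placeholder filename

start = time.time()                      # first call starts the clock
big = cv2.resize(img, None, fx=4, fy=4,
                 interpolation=cv2.INTER_CUBIC)
elapsed = time.time() - start            # second call stops it

print(f"Bicubic 4x enlargement took {elapsed:.4f} s")
```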
Even after being enlarged, the photos below look similar to one another to the human eye, likely because the original photo was already high quality. Comparing each method's total run time, bicubic interpolation recorded the lowest of them all.
The resolution of time.time() is typically around 1 to 10 milliseconds, depending on the system. Any operation that completes faster than this resolution may therefore be recorded as taking zero time.
As for the shrunken images, each looks appropriately pixelated. Quality is subjective, but to the naked eye, bilinear interpolation and resampling using pixel area relation appear to retain the most of the original photo's detail.
This supplementary exercise covers the logic of flipping images and performing affine transformations on images with OpenCV.
Just as the channels of a color space are addressed by index numbers, the flip function has its own trio of codes that represent different directions: vertical is represented by 0, horizontal by 1, and horizontal then vertical by -1.
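For instance, assuming an image already loaded with cv2.imread (the filename is a placeholder):

```python
import cv2

img = cv2.imread("photo.jpg")        # placeholder filename

flipped_v    = cv2.flip(img, 0)      # flip around the x-axis (vertical)
flipped_h    = cv2.flip(img, 1)      # flip around the y-axis (horizontal)
flipped_both = cv2.flip(img, -1)     # both: horizontal then vertical
```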
As we learnt before, OpenCV loads images in BGR order while Matplotlib expects RGB, so the colors appear swapped by default. To change the dominant color, index the last (channel) axis of your img array with the channel order you want (e.g. img[:, :, [2, 1, 0]] swaps the blue and red channels for a redder image).
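A short sketch of that reordering (again with a placeholder filename):

```python
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("photo.jpg")        # OpenCV stores channels as B, G, R

# Indexing the last axis reorders the channels:
# [2, 1, 0] picks red, green, blue, converting BGR to RGB.
rgb = img[:, :, [2, 1, 0]]

plt.imshow(rgb)                      # now displays with correct colors
plt.axis("off")
plt.show()
```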
The warpAffine function is guided by a 2×3 NumPy array whose values control the following affine transformation factors. To briefly elaborate on each position:
Upper Left: scales the image across the x-axis.
Upper Middle: shears the image by the inserted value, moving the bottom edge to the right and turning the rectangle into a parallelogram.
Upper Right: translates the image to the right along the x-axis.
Lower Left: shears the image by the inserted value, moving the right edge downwards and turning the rectangle into a parallelogram.
Lower Middle: scales the image across the y-axis.
Lower Right: translates the image downwards along the y-axis.
The warpAffine() function in OpenCV does not alter the inserted image directly. Instead, it creates a new image based on the specified transformation and returns it. This allows you to preserve the original image while applying various transformations.
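Putting it together, a minimal sketch (placeholder filename; the shear and shift values are arbitrary examples):

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg")        # placeholder filename
rows, cols = img.shape[:2]

# The 2x3 matrix, position by position:
# [[x-scale,  x-shear,  x-shift],
#  [y-shear,  y-scale,  y-shift]]
M = np.float32([
    [1.0, 0.2, 50],    # keep x-scale, shear rows rightwards, shift right 50px
    [0.0, 1.0, 30],    # no y-shear, keep y-scale, shift down 30px
])

# warpAffine returns a new image; img itself is left untouched.
warped = cv2.warpAffine(img, M, (cols, rows))
cv2.imwrite("warped.jpg", warped)
```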