The one thing that seems to never change is that change will happen.
Among OpenCV's data augmentation features, perspective transformations and kernel filtering stand out as essential tools for a variety of applications. From correcting camera lens distortions to extracting meaningful features, these techniques provide a foundation for advanced image processing and computer vision tasks.
Perspective transformation is a technique used to simulate the effect of viewing an image from a different perspective. Briefly mentioned on 21st September, it is a type of geometric transformation that can be applied to images to introduce variation and improve model robustness.
Using the 3×3 homogeneous-coordinates transformation matrix below, OpenCV can perform perspective transformation; the entries in the lowest row control the perspective effect:

[[a, b, c],
 [d, e, f],
 [g, h, i]]
g and h: control the amount of perspective distortion along the x- and y-axes. Nonzero values tilt the image plane, producing a keystone effect in which parallel lines converge toward one side; at 0, no perspective is introduced in that direction.
i: acts as the overall scale factor in homogeneous coordinates and is normally left at 1.
Both transformations preserve collinearity (points on a line remain on a line), but affine transformations also preserve parallelism, whereas perspective transformations do not: parallel lines in the source can converge toward a vanishing point in the output.
Viewing the transformation matrices for both transformations, the last row in affine transformations is always [0, 0, 1], which introduces no perspective effect. In perspective transformations, g, h, and i can take other values, and it is nonzero g and h that produce the converging-line effect.
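To see what g and h do in isolation, here is a minimal sketch that builds the homography by hand and feeds it to warpPerspective(). The input path and the 0.0005 values are arbitrary placeholder choices, not values from the exercise:

```python
import cv2
import numpy as np

# Load any test image; "input.jpg" is a placeholder path.
img = cv2.imread("input.jpg")
h, w = img.shape[:2]

# Hand-built homography: identity in the top two rows, with small
# nonzero g and h in the bottom row to tilt the image plane.
M = np.float32([
    [1, 0, 0],
    [0, 1, 0],
    [0.0005, 0.0005, 1],  # g, h, i
])

warped = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("warped_gh.jpg", warped)
```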
This exercise explores using the getPerspectiveTransform() function to simulate different viewpoints of training set images.
The function takes eight coordinate pairs in two sets: the first four are the corners of the original image, and the last four are the corresponding corners of the target warped image.
Having collected these coordinates, pass the two quartets to getPerspectiveTransform() to build the perspective transformation matrix, then slot that matrix into your warpPerspective() function (see the sketch below).
Its parameters are much more direct than the ones used for affine shearing.
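A minimal sketch of the whole workflow, assuming a placeholder input path and arbitrarily chosen destination corners:

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")  # placeholder path
h, w = img.shape[:2]

# First quartet: the four corners of the source image.
src = np.float32([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
# Second quartet: where each corner should land in the warped output.
dst = np.float32([[w * 0.1, h * 0.1], [w * 0.9, 0], [0, h - 1], [w - 1, h * 0.9]])

M = cv2.getPerspectiveTransform(src, dst)   # the 3x3 matrix from above
warped = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("warped_view.jpg", warped)
```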
Returning to the topic of kernels from the previous sections on CNNs, recall that they are small matrices applied to images to extract specific features in machine learning. In OpenCV, they are primarily used for general image processing tasks like blurring, sharpening, edge detection, and noise reduction.
Besides convolution, kernels can also perform cross-correlation on images. Imagine you are searching for a specific word in a book by sliding a piece of paper with said word written on it along the pages, comparing the letters one by one. If the word (kernel) matches a region in the text (image), you get a high output value.
Unlike convolution, cross-correlation applies the kernel directly to the image without flipping it first. In the word-search analogy, this produces a high output wherever the kernel's pattern lines up with matching (white) pixels in the image.
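One way to see this in practice: OpenCV's filter2D() slides the kernel without flipping it, so it actually computes cross-correlation. A small sketch with a toy binary image (the values here are made up purely for illustration):

```python
import cv2
import numpy as np

# Toy binary "page" with a short white stroke, and the pattern we seek.
image = np.zeros((5, 5), dtype=np.float32)
image[2, 1:4] = 1.0
kernel = np.array([[1, 1, 1]], dtype=np.float32)

# filter2D slides the kernel without flipping it (cross-correlation);
# the response peaks where the pattern lines up with the stroke.
response = cv2.filter2D(image, -1, kernel)
print(response)
```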
OpenCV provides a rich set of functions built on convolution, the technique of sliding a kernel across an image and computing a weighted sum of the pixels beneath it, allowing us to perform image processing tasks like the ones mentioned earlier. Several filter-oriented operations you can run on OpenCV images include:
Gaussian Blur: an image smoothing technique that applies a Gaussian kernel to an image. Effectively blurs the image, reducing noise and smoothing edges.
Gaussian Kernel: a 2D function that has a bell-shaped curve, with the center pixel having the highest weight and the surrounding pixels having decreasing weights.
Sobel Filter: an edge detection technique that convolves the image with two Sobel kernels. The gradient magnitude and orientation are calculated to identify edges. Highlights edges in the image, making them more prominent.
Sobel Kernel: 3×3 matrices used in edge detection to approximate the first derivatives (change in intensity per movement) of an image in the x and y directions. Designed to be sensitive to edges that are oriented in specific directions.
Non-Local Means: an image denoising technique that leverages the self-similarity of images to remove noise while preserving image details. Searches a large window around each pixel for similar patches and uses them to denoise the current pixel.
Canny Edge Detection: an algorithm used to detect edges in images using both Gaussian and Sobel filters. Only retains pixels whose gradient magnitude is a local maximum along the gradient direction, thinning the edges. Classifies edges as strong or weak via high and low threshold values, then removes weak edges not connected to strong ones.
Filters can be classified as either low-pass or high-pass. They selectively pass signals based on their frequency content. In images, each type corresponds to different features:
Low-Pass Filters: often used to smooth images, reducing noise and blurring edges. Includes the Gaussian blur.
High-Pass Filters: often used to enhance edges and highlight fine details. Includes the Sobel filter.
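A side-by-side sketch of the two families (the file name is a placeholder and the kernel sizes are arbitrary choices):

```python
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Low-pass: suppress high-frequency content (noise, fine texture).
low_pass = cv2.GaussianBlur(gray, (5, 5), 0)

# High-pass: emphasize high-frequency content (edges along x here).
high_pass = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)

cv2.imwrite("low_pass.jpg", low_pass)
cv2.imwrite("high_pass.jpg", cv2.convertScaleAbs(high_pass))
```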
This exercise runs various filters and compares their outputs, whether blurred versions of noisy images or edge maps.
Compared to the recently introduced advanced filters, the average filter is a simple image processing technique that replaces each pixel in an image with the average value of its neighboring pixels. It is the cheapest option but introduces more artifacts than, say, a Gaussian-filtered image.
Speaking of the Gaussian filter, be mindful that each kernel you make has coefficients with an approximate standard deviation of 1. Applying the formula standard_deviation = sqrt(sum((array[i, j] - mean)**2) / (rows * cols)) to the example below gives a standard deviation of 1.33.
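A sketch comparing the two filters and applying the standard-deviation formula above. The file name is a placeholder, and the kernel is a hypothetical unnormalized 3×3 Gaussian-like matrix, not the exercise's actual example:

```python
import cv2
import numpy as np

img = cv2.imread("noisy.jpg")  # placeholder path

# Average (box) filter: each pixel becomes the mean of its 5x5 neighborhood.
averaged = cv2.blur(img, (5, 5))

# Gaussian filter: neighbors are weighted by a bell curve instead.
gaussian = cv2.GaussianBlur(img, (5, 5), 1)

# The standard-deviation formula above, applied to a hypothetical
# unnormalized 3x3 Gaussian-like kernel (not the exercise's example).
kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float64)
std = np.sqrt(np.sum((kernel - kernel.mean()) ** 2) / kernel.size)
print(std)
```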
The non-local means algorithm can be accessed with the fastNlMeansDenoisingColored() function, which takes the following parameters:
src: the input color image.
dst: the output denoised image.
h: the parameter controlling the degree of filtering. A higher value results in stronger denoising but may also introduce blurring.
hColor: the parameter controlling the degree of filtering for the color channels.
templateWindowSize: the size (in pixels, odd) of the template patch used to compare regions and compute weights.
searchWindowSize: the size (in pixels, odd) of the window searched for similar patches around each pixel.
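Putting those parameters together (a sketch; the file names and strength values are placeholder choices, with the window sizes at their documented defaults):

```python
import cv2

noisy = cv2.imread("noisy.jpg")  # placeholder path

# h / hColor control filter strength; 10 is a common starting point.
# templateWindowSize=7 and searchWindowSize=21 are the documented defaults.
denoised = cv2.fastNlMeansDenoisingColored(
    noisy, None, h=10, hColor=10,
    templateWindowSize=7, searchWindowSize=21,
)
cv2.imwrite("denoised.jpg", denoised)
```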
The median filter is an image processing technique that replaces each pixel in an image with the median value of its neighboring pixels, similar to how the average filter replaces each pixel with the average of its neighbors.
Out of the four filters below, it appears that the median filter output contains the least noise (white dots) whilst retaining most of the existing background elements.
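The median filter itself is a single call; a sketch with a placeholder file name and an arbitrary 5×5 kernel size:

```python
import cv2

noisy = cv2.imread("noisy.jpg")  # placeholder path

# Each pixel becomes the median of its 5x5 neighborhood; very effective
# against salt-and-pepper noise (the white dots mentioned above).
median = cv2.medianBlur(noisy, 5)
cv2.imwrite("median.jpg", median)
```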
Onto the generation of edge maps, starting with the Sobel filter. As stated above, this type of filter calculates the gradient of the image in the x and y directions using a horizontal (x-direction) and a vertical (y-direction) Sobel kernel, then combines these gradients to determine the overall edge strength and orientation (angle).
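A sketch of that calculation (placeholder input path; CV_64F keeps the negative gradient values intact before the magnitude step):

```python
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Gradients from the horizontal and vertical Sobel kernels.
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

# Edge strength and orientation at every pixel.
magnitude = np.sqrt(gx**2 + gy**2)
orientation = np.arctan2(gy, gx)  # angle in radians

cv2.imwrite("sobel_edges.jpg", cv2.convertScaleAbs(magnitude))
```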
To more completely describe how the Gaussian and Sobel filters are used in the Canny edge detector, as well as how the function creates an output:
The image is first smoothed using a Gaussian filter to reduce noise.
The gradients in the x and y directions are then calculated using the Sobel operator or a similar edge detector.
Next, only the pixels with the maximum gradient magnitude along the gradient direction are retained, thinning the edges.
Two thresholds – a high and low threshold – are applied. Pixels with a gradient magnitude above the high threshold are considered strong edges, while pixels with a gradient magnitude between the high and low thresholds are considered weak edges.
Finally, weak edges are retained only if they are connected to strong edges.
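In OpenCV, all five steps sit behind one call; a sketch with placeholder thresholds and file names:

```python
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path

# threshold1/threshold2 are the low and high thresholds from steps 4-5.
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("canny_edges.jpg", edges)
```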
At the end of the day, there is no exact right or wrong when it comes to the specific formulas for filtering images. The significant part of data augmentation is preparing the food you want to feed your machine learning model, the food being detail-highlighted images and edge maps.