"Reality is often stranger than fiction," they say… I say, neutrally, that we can never solidly define what reality actually is.
Prior to the widespread adoption of deep learning, computers relied on traditional computer vision methods that combined manual feature engineering with classical machine learning algorithms. These methods, while effective for certain tasks, were often limited by their reliance on human expertise and their inability to handle complex visual patterns.
Since the 1960s, traditional computer vision has been used to discriminate objects and classes of objects. The approach uses feature engineering to manually extract specific features (edges, corners, textures, etc.) from data, then quantifies those features into numerical representations that machine learning algorithms can use.
One kind of computer vision tool is the color histogram, which is used for analyzing and comparing images based on their color composition. For digital images, a color histogram represents the number of pixels whose colors fall into each of a fixed list of color ranges spanning the image's color space (the set of all possible colors).
The image shows a 'flattened' color histogram, which combines the histograms of the individual RGB channels into a single plot. This can be useful for understanding the overall color distribution of an image.
This exercise explores functions in OpenCV and Matplotlib that visualize the color channels within images into histograms.
To calculate the histogram of an image or a list of images, use cv2.calcHist() and set the following parameters:
images: an image or a list of images to be analyzed.
channels: a list specifying which channels of the images to consider. For example, [0] calculates the histogram for the first channel (the only channel of a grayscale image), and [0, 1, 2] calculates histograms for all three channels of a color image.
mask: an optional mask image that specifies which pixels to include in the histogram calculation.
histSize: a list specifying the number of bins for each channel. A common choice is [256], meaning one bin for each possible 8-bit intensity.
ranges: a list specifying the range of pixel values for each channel. A common choice is [0, 256], covering intensities 0 to 255 (the upper bound is exclusive).
The shape of the histogram can provide clues about the content of an image, help assess its quality, and support various image processing tasks (thresholding, segmentation, equalization). Speaking of the histogram shape, here are the key observations from the script above:
Peak: the histogram has a prominent peak around the intensity value of 100, suggesting that the image is dominated by mid-gray tones.
Distribution: the overall shape of the histogram indicates that the image has a relatively high contrast, with a significant portion of pixels having either very dark or very bright values.
Range: the histogram extends from 0 to 255, which is the typical range of grayscale pixel values in 8-bit images.
The shape (256, 1) indicates that the histogram is a column array with 256 bins, one for each of the 256 grayscale levels in an 8-bit image.
The first two values extracted in the image below indicate that in this grayscale image, only one pixel has a value of 0, and no pixels have a value of 1. This suggests that the image may have a relatively high contrast, with many pixels close to the extremes of the grayscale range.
Grayscaling an image reduces its dimensionality and complexity, lightening the workload for machine learning models, but the process deliberately discards color information, a feature that could be important for certain tasks.
An RGB image has three color channels (red, green, and blue), three times as many as a grayscale image. The image below provides the following observations:
Peaks: the plot shows three distinct peaks, with the red channel having the highest peak and the green and blue channels lower ones. This indicates that more pixels have high red values than green or blue values in the image.
Distribution: the overall shape of the histogram shows that the image has a relatively balanced color distribution, with no dominant color. If a histogram channel has a narrow range, it might indicate a low-contrast image.
For this exercise, we need to extract data from CIFAR-10 and use it to build a histogram-based SVM classifier. Remember to convert the training and test sets into integers, otherwise they will be incompatible with machine learning models.
Next, extract histogram features from the image data into a list. They are a suitable input for an SVM classifier, which can use them to learn a decision boundary that separates the classes. Then convert the features to a NumPy array so the model can consume them.
The histogram above is the traditional kind. Another variant, which we are building now, is the histogram of oriented gradients (HOG). Whereas a traditional histogram is a 1D summary of image data (its x-axis represents pixel values and its y-axis the frequency of each value), a HOG describes gradients: its x-axis represents gradient orientation and its y-axis the magnitude of the gradient.
HOGs are preferred when you need to capture the spatial distribution of gradient orientations in an image, such as for object detection or image classification.
SVMs in OpenCV can be configured with the following methods:
setKernel(kernelType): sets the kernel type for the SVM model, such as linear, polynomial, radial basis function (RBF), or sigmoid.
setGamma(kernelParameter): sets how far the influence of a single training example reaches, with low values meaning far and high values meaning close.
setType(SVMType): sets the type of SVM algorithm to use. C_SVC is the most common type of SVM algorithm, used for multi-class classification problems.
setC(regularizationParameter): sets the regularization parameter C for the SVM model, which controls the trade-off between margin and misclassification error. A small C means a large margin but high misclassification error, while a large C means a small margin but low misclassification error.
Compared to the traditional histogram-based SVM, the HOG-based SVM produced a higher accuracy for both sets. However, the accuracy is still abysmally low.