Contributions to a more vigilant neighborhood. You like eyes in every waking corner?
In the realm of object detection, Non-Maximum Suppression (NMS) is a crucial post-processing technique for refining a model's output. Detectors often report several overlapping bounding boxes for the same object, because many nearby candidate regions (anchors, proposals, or grid cells) can each respond to it. Left unchecked, these duplicates degrade the precision and clarity of the output, and NMS exists to remove them.
Here is a breakdown of the NMS process:
Prediction Output: the object detection model generates a set of bounding boxes, each associated with a confidence score indicating how likely that box is to contain an object.
Confidence Thresholding: a confidence threshold is set, and bounding boxes scoring below it are discarded as unlikely to represent actual objects.
Highest Confidence Selection: the bounding box with the highest confidence score is selected.
IoU Calculation: the Intersection over Union (IoU), i.e. the area where two boxes overlap divided by the area they cover together, is calculated between the selected box and every other remaining box.
Overlap Threshold: an IoU threshold is defined. An IoU above this threshold between the selected box and another box indicates significant overlap.
Suppression: bounding boxes whose IoU with the selected box exceeds the threshold are suppressed, since they are likely redundant detections of the same object.
Iteration: the process repeats with the next highest-confidence box among those remaining, until every box has been either kept or suppressed.
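IoU is the quantity the calculation, threshold, and suppression steps all rely on: IoU(A, B) = area(A ∩ B) / area(A ∪ B), where area(A ∪ B) = area(A) + area(B) − area(A ∩ B). Below is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates in continuous space; the function name iou is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes.

    Treats coordinates as continuous, so width is x2 - x1 with no +1.
    """
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes have no intersection
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union


# Two 10x10 boxes shifted by half a width: 50 / (100 + 100 - 50)
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ≈ 0.333
```

Identical boxes give an IoU of 1.0 and disjoint boxes give 0.0, so the IoU threshold is always a number between 0 and 1.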
If you are wondering why the selection step is repeated, the repetition is essential because:
Multiple Objects, Multiple Highest Confidence Boxes: a typical image contains several objects, and each object has its own cluster of detections with its own highest-confidence box. A single selection would keep the top box of only one object.
Distinct Objects, Low IoU: boxes belonging to different objects usually have low IoU with each other, because they cover distinct regions of the image. They therefore survive suppression and later get their own turn as the highest-confidence box.
Preventing Redundancy: the purpose of NMS is to eliminate redundant detections. By repeatedly selecting the highest-confidence box and suppressing the boxes that overlap it, NMS ideally retains exactly one bounding box per object.
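The following pure-Python sketch strings the steps together, including confidence thresholding, and shows why the repetition matters: with two separate faces in the input, both survive. The names greedy_nms, score_threshold and iou_threshold, the 0.5 defaults, and the box coordinates are all assumptions made up for illustration.

```python
def iou(a, b):
    # Intersection over Union for (x1, y1, x2, y2) boxes, continuous coordinates
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first (hypothetical helper)."""
    # Confidence thresholding: drop low-scoring boxes up front
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Highest confidence first
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)  # highest-confidence box still remaining
        keep.append(best)
        # Suppression: a box is dropped if its IoU with `best` exceeds the threshold
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Made-up detections: two clusters (face A, face B) plus one weak box
boxes = [
    (10, 10, 50, 50),      # face A, 0.90
    (12, 12, 52, 52),      # duplicate of A, 0.80
    (100, 20, 140, 60),    # face B, 0.75
    (102, 22, 141, 61),    # duplicate of B, 0.60
    (200, 200, 220, 220),  # below the confidence threshold, 0.30
]
scores = [0.9, 0.8, 0.75, 0.6, 0.3]
print(greedy_nms(boxes, scores))  # [0, 2]
```

Face B's box has zero IoU with face A's box, so it survives the first round and becomes the selected box in the second. This version rescans the candidate list on every round, which is quadratic in the number of boxes; vectorizing the IoU calculation with NumPy is the usual way to speed that up.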
This exercise explores the formulaic structure of a standard NMS implementation, using a made-up scenario in which a CNN has generated three bounding boxes that detect the presence of a human face with varying confidence. NMS is designed to get rid of the overlapping boxes that detect the same thing.
The first part of the NMS function does the following after receiving a list of bounding boxes (bounding_boxes), a list of their confidence scores (confidence_score), and an IoU threshold (threshold):
Empty Bounding Box Check: if len(bounding_boxes) == 0 checks whether the input bounding_boxes list is empty. If it is, the function immediately returns empty lists for picked_boxes and picked_score, since there is nothing to suppress.
Data Preparation: boxes = np.array(bounding_boxes) converts the bounding_boxes list into a NumPy array for efficient, vectorized calculations. Next, start_x,… = boxes[:, 0],… extracts the x1, y1, x2, and y2 coordinates of every bounding box as separate arrays. Finally, score = np.array(confidence_score) converts the confidence_score list into a NumPy array.
Initialization: picked_boxes = [] and picked_score = [] create two empty lists to store the selected bounding boxes and their corresponding confidence scores, respectively.
Area Calculation: areas = (end_x - start_x + 1) * (end_y - start_y + 1) calculates the area of each bounding box from its width and height; the + 1 treats the coordinates as inclusive pixel indices.
Sorting by Confidence: order = np.argsort(score) returns the box indices sorted by confidence in ascending order, so the highest-scoring box sits at the end of the array.
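Applied to the three-face scenario, the first-part preparations look like this. The coordinates and scores are invented for illustration, not taken from a real detector. The printout confirms that np.argsort sorts in ascending order, which is why the highest-confidence index is read from the end of order.

```python
import numpy as np

# Made-up inputs for the three-face scenario; each box is (x1, y1, x2, y2)
bounding_boxes = [(100, 100, 200, 220), (110, 105, 205, 225), (95, 90, 190, 210)]
confidence_score = [0.7, 0.9, 0.8]

boxes = np.array(bounding_boxes, dtype=float)
# One coordinate array per column
start_x, start_y, end_x, end_y = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
score = np.array(confidence_score)

# +1 treats the coordinates as inclusive pixel indices
areas = (end_x - start_x + 1) * (end_y - start_y + 1)

# Ascending sort: lowest score first, highest score last
order = np.argsort(score)
print(order)      # [0 2 1]
print(order[-1])  # 1, the box scored 0.9
```

If a descending order is preferred, np.argsort(score)[::-1] reverses it, in which case the highest-confidence index would be order[0] instead.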
The second part is where the bounding box comparison happens. It uses the variables above as follows:
Iteration Loop: while order.size > 0 keeps the loop running as long as there are remaining bounding boxes to consider.
Selecting the Highest Confidence Box: index = order[-1] gets the index of the bounding box with the highest confidence score, which is the last element of the ascending order array.
Appending to Selected Boxes: picked_boxes.append(bounding_boxes[index]) and picked_score.append(confidence_score[index]) add the highest-confidence box to the picked_boxes list and its confidence score to the picked_score list.
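Calculating IoU and IoU Ratio: the overlap between the selected box and every other remaining box is measured as an IoU ratio, i.e. the intersection area divided by the union area; see the notes below for further information.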
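As a hedged sketch of what this step typically involves, the IoU ratios can be computed for all remaining boxes at once with NumPy. The variable names rest, x1, y1, x2, y2, w, h and intersection are hypothetical and may differ from the original notes; the + 1 follows the inclusive-pixel convention of the area calculation, and the box values are made up.

```python
import numpy as np

# Made-up data: (x1, y1, x2, y2) boxes and their scores
boxes = np.array([(100, 100, 200, 220), (110, 105, 205, 225), (95, 90, 190, 210)], dtype=float)
score = np.array([0.7, 0.9, 0.8])
start_x, start_y, end_x, end_y = boxes.T
areas = (end_x - start_x + 1) * (end_y - start_y + 1)
order = np.argsort(score)

index = order[-1]  # highest-confidence box
rest = order[:-1]  # every other remaining box

# Intersection rectangle between the selected box and each remaining box
x1 = np.maximum(start_x[index], start_x[rest])
y1 = np.maximum(start_y[index], start_y[rest])
x2 = np.minimum(end_x[index], end_x[rest])
y2 = np.minimum(end_y[index], end_y[rest])

# Clamp at 0 so non-overlapping boxes contribute no intersection
w = np.maximum(0.0, x2 - x1 + 1)
h = np.maximum(0.0, y2 - y1 + 1)
intersection = w * h

# IoU ratio = intersection / union, one value per remaining box
ratio = intersection / (areas[index] + areas[rest] - intersection)
print(ratio)
```

Because ratio is aligned with order[:-1], its positions line up with the first positions of order, which is what lets the filtering step index order directly with the result of np.where.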
Filtering Overlapping Boxes: left = np.where(ratio < threshold) gets the indices of the remaining boxes whose IoU ratio is below the threshold, and order = order[left] then updates the order array to include only boxes that do not overlap significantly with the selected box.
Returning Results: once the loop ends, return picked_boxes, picked_score returns the list of selected bounding boxes and their corresponding confidence scores.
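Assembling both parts, one possible runnable version of the function follows. The IoU lines are an assumption rather than code taken from the source, as are the helper names rest, x1, y1, x2, y2, w, h and intersection; the face boxes in the usage example are made-up values, not real detector output.

```python
import numpy as np


def nms(bounding_boxes, confidence_score, threshold):
    """Greedy NMS over (x1, y1, x2, y2) boxes with inclusive pixel coordinates."""
    # Empty bounding box check
    if len(bounding_boxes) == 0:
        return [], []

    # Data preparation
    boxes = np.array(bounding_boxes, dtype=float)
    start_x, start_y = boxes[:, 0], boxes[:, 1]
    end_x, end_y = boxes[:, 2], boxes[:, 3]
    score = np.array(confidence_score)

    picked_boxes, picked_score = [], []

    # Area of every box (+1: inclusive pixel coordinates)
    areas = (end_x - start_x + 1) * (end_y - start_y + 1)

    # Ascending sort: the highest score sits at order[-1]
    order = np.argsort(score)

    while order.size > 0:
        index = order[-1]
        picked_boxes.append(bounding_boxes[index])
        picked_score.append(confidence_score[index])

        # IoU between the picked box and all other remaining boxes
        rest = order[:-1]
        x1 = np.maximum(start_x[index], start_x[rest])
        y1 = np.maximum(start_y[index], start_y[rest])
        x2 = np.minimum(end_x[index], end_x[rest])
        y2 = np.minimum(end_y[index], end_y[rest])
        w = np.maximum(0.0, x2 - x1 + 1)
        h = np.maximum(0.0, y2 - y1 + 1)
        intersection = w * h
        ratio = intersection / (areas[index] + areas[rest] - intersection)

        # Keep only boxes whose overlap with the picked box is below threshold
        left = np.where(ratio < threshold)
        order = order[left]

    return picked_boxes, picked_score


# Three made-up detections of the same face
bounding_boxes = [(100, 100, 200, 220), (110, 105, 205, 225), (95, 90, 190, 210)]
confidence_score = [0.7, 0.9, 0.8]
picked_boxes, picked_score = nms(bounding_boxes, confidence_score, threshold=0.5)
print(picked_boxes, picked_score)  # [(110, 105, 205, 225)] [0.9]
```

Both weaker boxes overlap the 0.9 box with an IoU above 0.5, so a single face box remains. Like the function described above, this sketch does no confidence thresholding; low-scoring boxes would need to be filtered out before calling it.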
A common result of running an object detection model without NMS is an image covered in bounding boxes, most of them duplicates that represent redundant and often inaccurate detections.
Additionally, these overlapping boxes may not accurately capture the true extent or location of the object. The clutter also makes the model's performance harder to evaluate: in standard metrics such as mean Average Precision (mAP), each ground-truth object can be matched by only one detection, so the duplicates are counted as false positives.
At the end of the day, object detectors are often meant to capture, in real time and frame by frame, visual details that human eyes might occasionally miss.