Object detection is one of the most widely used computer vision tasks, covering both the classification and localization of target objects. While researching object detection, you will likely encounter terms such as AP (Average Precision), mAP (mean Average Precision), and IoU (Intersection over Union), all of which are object detection metrics that help build better-performing models. IoU is the common choice among state-of-the-art object detection algorithms, and it is the one we will focus on today.

In this article, we will walk through the following:

  • Understanding IoU
  • Where to get ground-truth data
  • IoU in practice
  • Alternatives to IoU
  • Closing remarks

Understanding IoU

Suppose you have a ground-truth bounding box, like the one below:

ground-truth bounding box

And your algorithm predicts the bounding box in red:

ground-truth and prediction boxes

Does this look like a good outcome to you? What Intersection over Union (IoU) does is take the area where the two bounding boxes overlap and divide it by the area of their union to produce an accuracy metric.

It may appear at first that IoU is an indicator of how tight the bounding box is, which, as much as we hate to break it to you, is not quite the truth. What IoU actually shows is how closely the predicted bounding box matches the baseline, i.e., the ground truth. To compute the IoU, we will need the following values:

1) Intersection area I

2) Union area U

IoU = I / U = Area(A ∩ B) / (Area(A) + Area(B) - Area(A ∩ B))

In the equation above, A and B are the ground-truth and predicted bounding boxes, as shown below:

Intersection over Union (IoU)

Bottom line: pattern recognition is not that easy. The probability that your predicted bounding box coordinates will match the ground truth exactly is close to nonexistent, but we'll get back to that in a minute.
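To make the formula concrete, here is a small worked example with purely illustrative numbers: if the ground-truth box covers 100 pixels, the predicted box covers 100 pixels, and the two share 40 pixels, then I = 40, U = 100 + 100 - 40 = 160, and IoU = 40 / 160 = 0.25.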

Where to get ground-truth data

Ideally, if you are at the model evaluation stage, you've got data collection covered. To train an object detection model, you first need a pre-labeled dataset, which, in turn, has to be divided into the following subsets:

  • Training set: the first batch of data fed into the model.
  • Testing set: used for evaluating the model accuracy.
  • Validation set (optional): used to tune hyperparameters.

The sets above are constituents of the actual data, annotated with bounding boxes, i.e., the (x, y) coordinates of the objects in each image.
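As a purely illustrative sketch (the field names are hypothetical and depend on your annotation tool), a single labeled example might be stored like this:

annotation = {
    'image': 'frame_0001.jpg',   # hypothetical file name
    'label': 'car',              # object class
    'bbox': {'x1': 50, 'y1': 50, 'x2': 150, 'y2': 150},  # box corners in pixels
}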

Note: An IoU of 0.5 is typically considered a “good” score, while 1 is a perfect match in theory.
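As a minimal sketch (the function name is hypothetical, and 0.5 is just the commonly used cutoff), a prediction can then be counted as a correct detection once its IoU clears that threshold:

IOU_THRESHOLD = 0.5  # commonly used cutoff; stricter evaluations use higher values

def is_correct_detection(iou, threshold=IOU_THRESHOLD):
    # Treat a prediction as a true positive if its IoU with the
    # ground-truth box meets the threshold.
    return iou >= threshold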

You can get ground-truth data in several ways:

1) Collect manually

2) Open-source datasets

3) Generate your own synthetic dataset

No matter where you get the ground-truth data or how carefully you label it, it is extremely unlikely that the predicted output will match the ground-truth bounding box coordinates exactly. That is because parameters such as the image pyramid scale, sliding window size, etc., exclude the possibility of a pixel-perfect overlap, which is why a score of 1 remains theoretical.

IoU score

IoU in practice

To see how IoU works when evaluating a custom object detector that spots the presence of a given object in images, we'll use Python. Below are a few sample implementations.

Sample #1:

def get_iou(bb1, bb2):
    # Each box is a dict with keys 'x1', 'y1', 'x2', 'y2', where
    # (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.
    assert bb1['x1'] < bb1['x2']
    assert bb1['y1'] < bb1['y2']
    assert bb2['x1'] < bb2['x2']
    assert bb2['y1'] < bb2['y2']

    # Coordinates of the intersection rectangle
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])

    # The boxes do not overlap at all
    if x_right < x_left or y_bottom < y_top:
        return 0.0

    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # Areas of both bounding boxes
    bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
    bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])

    # Intersection over union
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert 0.0 <= iou <= 1.0
    return iou
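For example, with two hypothetical boxes (coordinates chosen purely for illustration):

ground_truth = {'x1': 50, 'y1': 50, 'x2': 150, 'y2': 150}
prediction = {'x1': 75, 'y1': 60, 'x2': 170, 'y2': 160}
print(get_iou(ground_truth, prediction))  # roughly 0.53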

Sample #2:

import numpy as np

def calculate_iou(a, b):
    # a and b are binary masks of the same shape
    area_of_intersection = np.count_nonzero(np.logical_and(a, b))
    area_of_union = np.count_nonzero(np.logical_or(a, b))

    return area_of_intersection / area_of_union
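This version expects two binary masks of the same shape rather than corner coordinates. A quick check with illustrative masks:

a = np.zeros((10, 10), dtype=bool)
b = np.zeros((10, 10), dtype=bool)
a[2:6, 2:6] = True   # 16 pixels
b[4:8, 4:8] = True   # 16 pixels
print(calculate_iou(a, b))  # intersection 4, union 28 -> roughly 0.14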

Sample #3:

import numpy as np

def calculate_iou_for_indices(ground_truth_indices, prediction_indices,
                              ground_truth_size, prediction_size):
    # prediction_indices is a boolean mask over all pixels;
    # ground_truth_indices holds the pixel indices covered by the ground truth.
    intersection = np.count_nonzero(prediction_indices[ground_truth_indices])
    union = prediction_size + ground_truth_size - intersection

    if union == 0:
        iou = float('nan')
    else:
        iou = intersection / float(union)

    return {
        'intersection': intersection,
        'iou': iou,
        'false_positives': prediction_size - intersection,
        'false_negatives': ground_truth_size - intersection,
        'prediction_size': prediction_size,
        'ground_truth_size': ground_truth_size,
    }
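A quick illustrative call, assuming masks over a flattened 10x10 image (all values below are made up for demonstration):

prediction_mask = np.zeros(100, dtype=bool)
prediction_mask[20:40] = True               # 20 predicted pixels
ground_truth_indices = np.arange(30, 55)    # 25 ground-truth pixels

result = calculate_iou_for_indices(
    ground_truth_indices,
    prediction_mask,
    ground_truth_size=ground_truth_indices.size,
    prediction_size=np.count_nonzero(prediction_mask),
)
print(result['iou'])  # intersection 10, union 35 -> roughly 0.29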

Alternatives to IoU

IoU is not the only metric for measuring the accuracy of object detectors. Average Precision (AP) and mean Average Precision (mAP) are common alternatives, both of which are used to evaluate models such as Faster R-CNN, Mask R-CNN, and YOLO. AP is calculated separately for every class, so there are as many AP values as there are classes.

The mAP is then the average of the AP values across all classes:

mAP = (AP_1 + AP_2 + ... + AP_n) / n, where n is the number of classes.
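In code, this is just an average over per-class AP scores. A minimal sketch with made-up AP values:

import numpy as np

ap_per_class = {'car': 0.72, 'person': 0.65, 'dog': 0.58}  # hypothetical per-class APs
map_score = np.mean(list(ap_per_class.values()))
print(map_score)  # 0.65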

To better comprehend AP and mAP and analyze them in practice, we would first need to define and plot the precision-recall curve, which deserves an entirely separate article. For the time being, keep in mind that these two are related object detection metrics you'll encounter often.

Closing remarks

Any algorithm that outputs a bounding box can be evaluated using IoU. We hope this article gave you useful insights into IoU as an object detection metric by introducing how it is calculated, how it is implemented, and which alternative metrics exist. If you found this article helpful and would like to know more about various topics in computer vision and machine learning, leave your email below.
