Object detection is one of the most widely used techniques in computer vision: it both classifies and localizes target objects. While researching object detection, you will likely encounter terms such as AP (Average Precision), mAP (Mean Average Precision), and IoU (Intersection over Union), all of which are object detection metrics that help build better-performing models. IoU, a common choice among state-of-the-art object detection algorithms, is the metric we will focus on today.
In this article, we will cover the following:
- Understanding IoU
- Where to get ground-truth data
- IoU in practice
- Alternatives to IoU
- Closing remarks
Understanding IoU
Suppose you have a ground-truth bounding box, like the one below:

And your algorithm predicts the bounding box in red:

Is this a good outcome? Intersection over Union (IoU) answers exactly that question: it calculates the overlap of the two bounding boxes divided by the area of their union, yielding an accuracy metric.
It may appear at first that IoU indicates how tight a bounding box is, which, as much as we hate to break it to you, is not quite the truth. What IoU actually shows is how closely the predicted bounding box matches the baseline, i.e., the ground truth. To compute the IoU, we need the following values:
1) Intersection area I
2) Union area U
IoU = I / U = |A ∩ B| / |A ∪ B|
In the equation above, A and B are the ground-truth and predicted bounding boxes, as shown below:

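To make the formula concrete, here is a quick, made-up calculation: if each box covers an area of 10,000 px² and the two share an overlap of 8,100 px², the union is 10,000 + 10,000 − 8,100 = 11,900 px², so IoU = 8,100 / 11,900 ≈ 0.68.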
Bottom line: pattern recognition is not that easy. The probability that your predicted bounding box coordinates will match the ground truth exactly is practically nonexistent, but we'll get back to that in a minute.
Where to get ground-truth data
Ideally, if you are at the model evaluation stage, you've already got data collection covered. To train an object detection model, you first need a pre-labeled dataset, which in turn has to be divided into the following subsets:
- Training set: the first batch of data fed into the model.
- Testing set: used to evaluate the model's accuracy.
- Validation set (optional): used to tune hyperparameters.
The sets above are constituents of the actual data, annotated with bounding boxes, i.e., the (x, y) coordinates of each object in an image.
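If your dataset is not split yet, a common way to do it is with scikit-learn. Below is a minimal sketch; the file names, box values, and 70/15/15 ratios are placeholder assumptions, not part of any particular pipeline:

```python
from sklearn.model_selection import train_test_split

# Placeholder data: image paths paired with their bounding-box labels.
images = [f"img_{i}.jpg" for i in range(100)]
boxes = [(10, 10, 50, 50)] * 100  # one (x1, y1, x2, y2) box per image

# Hold out 30% of the data, then split that half-and-half
# into validation and test sets (70/15/15 overall).
train_imgs, rest_imgs, train_boxes, rest_boxes = train_test_split(
    images, boxes, test_size=0.3, random_state=42)
val_imgs, test_imgs, val_boxes, test_boxes = train_test_split(
    rest_imgs, rest_boxes, test_size=0.5, random_state=42)
```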
Note: an IoU of 0.5 is typically considered a “good” score, while 1 is perfect in theory.
You can get ground-truth data in several ways:
1) Collect and label data manually
2) Use open-source datasets
3) Generate your own synthetic dataset
No matter where you get the ground-truth data or how carefully you label it, it is extremely unlikely that the predicted output will match the ground-truth bounding box coordinates exactly. That is because parameters such as the image pyramid scale and sliding window size make a pixel-perfect overlap all but impossible, which is why a score of 1 remains theoretical.

IoU in practice
To train a custom object detector to spot the presence of a given object in images, we’ll use Python.
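Before training anything, we need a way to score predictions. Here is a minimal sketch of an IoU function; it assumes boxes are given as (x1, y1, x2, y2) corner coordinates, and the function name iou is our own choice:

```python
def iou(box_a, box_b):
    """Compute the IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Intersection area; zero when the boxes do not overlap.
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    # Union = area of A + area of B - intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```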
Sample #1:
Sample #2:
Sample #3:
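As a rough illustration of such samples, the snippet below runs iou() on three made-up ground-truth/prediction pairs, ranging from heavy overlap to none; all box coordinates and the resulting scores are hypothetical:

```python
# Hypothetical (ground_truth, prediction) pairs, (x1, y1, x2, y2) each.
samples = [
    ((50, 50, 150, 150), (60, 60, 160, 160)),    # heavy overlap
    ((50, 50, 150, 150), (100, 100, 200, 200)),  # partial overlap
    ((50, 50, 150, 150), (160, 160, 260, 260)),  # no overlap
]
for i, (gt, pred) in enumerate(samples, start=1):
    print(f"Sample #{i}: IoU = {iou(gt, pred):.2f}")
# Sample #1: IoU = 0.68
# Sample #2: IoU = 0.14
# Sample #3: IoU = 0.00
```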
Alternatives to IoU
IoU is not the only metric for measuring the accuracy of object detectors. Average Precision (AP) and mean Average Precision (mAP) are common alternatives, both of which are used to evaluate models such as Faster R-CNN, Mask R-CNN, and YOLO. AP is calculated separately for every class, so there are exactly as many AP values as there are classes.
The mAP is then the average of the AP values across all classes:
mAP = (AP_1 + AP_2 + … + AP_N) / N, where N is the number of classes
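Averaging the per-class APs is a one-liner; the class names and AP values below are made up for illustration:

```python
# Hypothetical per-class AP values from an evaluation run.
ap_per_class = {"car": 0.72, "person": 0.65, "dog": 0.58}

# mAP is the arithmetic mean of the per-class APs.
map_score = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {map_score:.3f}")  # mAP = 0.650
```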
To better comprehend AP and mAP and analyze them in practice, we would first need to define and plot the precision-recall curve, which deserves an entirely separate article. For the time being, keep in mind that these two are related object detection metrics that you'll encounter even more often.
Closing remarks
Any algorithm that outputs bounding boxes can be evaluated using IoU. We hope this article gave you insight into IoU as an object detection metric by introducing how it is calculated, how to implement it, and what the alternatives are. If you found this article helpful and would like to know more about various topics in computer vision and machine learning, leave your email below.
