The cutout transformation is somewhat counterintuitive. In its original form, it consists of adding a black rectangle to the image, hiding part of it. Yet it turns out to be a very useful transformation. In this article, we analyze the impact of different types of cutout transformations on a whiteboard-markers dataset, raising the mAP@50 from 0.694 to 0.784 just by adding the right type of cutouts. We also compare the four cutout types available in the KIADAM tool: black, color, noise, and blur (Gaussian) cutouts, of which the best one for this task is the blur cutout.
Our experiment
We will be trying to detect whiteboard markers in an office. The testing dataset will therefore consist of several whiteboard markers, which can be on tables, on the floor, or in other office-related places. Take a glance at some of the images in the testing dataset:
To test the efficiency of the different cutouts, we will generate six datasets, train a Yolov8 model on each, and then test its performance. The base images of each dataset will be composed of segmented and cropped whiteboard markers as objects, pasted over Unsplash photos as backgrounds.
Learn how to use Meta's Segment Anything to remove the background from your objects here
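To make the generation step concrete, here is a minimal sketch of how a segmented RGBA object crop can be alpha-composited onto a background and turned into a bounding box. The function and variable names are our own illustration, not the KIADAM implementation:

```python
import numpy as np

def paste_object(background, obj_rgba, x, y):
    """Alpha-composite a segmented RGBA object crop onto `background`
    at (x, y). Returns the new image and the object's bounding box as
    (x_min, y_min, x_max, y_max), ready to convert into a YOLO label.
    Illustrative sketch; the KIADAM tool's internals may differ."""
    out = background.copy()
    h, w = obj_rgba.shape[:2]
    alpha = obj_rgba[..., 3:4].astype(np.float64) / 255.0  # 0 = transparent
    region = out[y:y + h, x:x + w].astype(np.float64)
    rgb = obj_rgba[..., :3].astype(np.float64)
    out[y:y + h, x:x + w] = (alpha * rgb + (1 - alpha) * region).astype(out.dtype)
    return out, (x, y, x + w, y + h)
```

Repeating this for a few objects per background, with random positions, yields the kind of synthetic training images used below.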
The six generated datasets will have, respectively, no cutouts (control), each cutout type separately (black, color, noise, blur), and all cutout types at once. Let's begin by obtaining the control metrics.
When evaluating the model, we will use two different datasets. The first is the validation dataset, which consists of images with the same transformations as the training dataset; it shows how well the model has adapted to data similar to the training data. When we talk about validation mAP, we are referring to this dataset, i.e., to how well the model learned the synthetic data. Its importance lies in the fact that the worse the model performs here, the worse its baseline metrics on real-world data will be. The second is the testing dataset, shown earlier, which consists of real photos and tests the efficiency of the trained model in a real setting. Its importance is clear: it reflects how well the model will predict on real-world images.
Follow along with our Colab Notebook
No cutout
We first generate a dataset with markers of all colors, with some base transformations but no cutouts at all. After training the model for 100 epochs and evaluating on the testing dataset, we obtain these metrics:
| Model | Yolov8 |
| --- | --- |
| Precision | 0.733 |
| Recall | 0.659 |
| mAP@50 | 0.694 |
We can already see pretty good metrics, although the dataset's validation mAP@50 is only 0.895 (it could be higher with more training epochs). We will train the next models for the same number of epochs so the cutout types are compared under equal conditions. Let's look at some predictions, to compare with the later cutouts.
The biggest problem with this model is predicting non-marker items as markers. Most of the time the bounding box is well placed around the objects, but sometimes the label is incorrect.
Black cutouts
For the next generated dataset, we add black cutouts to hide parts of the objects. Take a look at an image from that dataset:
We can clearly see black rectangles hiding parts of the objects in the image. These black rectangles are the cutouts. Let's see how the model does when evaluated on the testing dataset.
| Model | Yolov8 |
| --- | --- |
| Precision | 0.644 |
| Recall | 0.753 |
| mAP@50 | 0.715 |
We can see higher recall and lower precision than the control dataset, but a higher mAP@50 overall. The validation mAP@50 is very similar to the previous experiment, at 0.897. Let's analyze some predictions on the testing dataset:
A curious effect of the black cutouts is that the predictions are now more accurate at identifying the colors of the markers, even if the model sometimes wrongly places a bounding box over things that are not markers. Next, let's analyze the effect of colored cutouts on the training dataset.
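Before moving on, the black cutout used in this section can be sketched in a few lines of NumPy. The function name, size bounds, and defaults are illustrative, not taken from the KIADAM tool:

```python
import numpy as np

def black_cutout(image, rng=None, max_frac=0.3):
    """Paint one random black rectangle over a copy of `image` (H, W, 3).

    `max_frac` caps the rectangle size as a fraction of each image side;
    the name and default value are assumptions for this sketch.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    h, w = out.shape[:2]
    ch = int(rng.integers(1, max(2, int(h * max_frac))))  # cutout height
    cw = int(rng.integers(1, max(2, int(w * max_frac))))  # cutout width
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    out[y:y + ch, x:x + cw] = 0  # black patch hides part of the image
    return out
```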
Color cutouts
As introduced in the last section, we then trained a model on a generated dataset where the black cutouts are replaced by colored cutouts. After evaluating on the testing dataset, these are the results:
| Model | Yolov8 |
| --- | --- |
| Precision | 0.66 |
| Recall | 0.589 |
| mAP@50 | 0.641 |
All metrics are lower than the control metrics. Again, the validation mAP@50 is not far from those of the previous experiments, evaluating at 0.874. Let's now analyze the predictions to see if we find any patterns:
Using the color cutouts, we can see that the boxes are wrapped very tightly around the markers, but a significant fraction of the time the color assigned to the marker is wrong. Based on this experiment, we can say that color cutouts improve the precision of the boxes but lower the precision of the labels, thus lowering the metrics that Yolo calculates.
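The color cutout differs from the black one only in its fill: a single random solid color instead of black. A sketch under the same illustrative assumptions as before:

```python
import numpy as np

def color_cutout(image, rng=None, max_frac=0.3):
    """Paint one rectangle of a random solid color over a copy of `image`.

    Illustrative sketch; the KIADAM tool's actual parameters may differ.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    h, w = out.shape[:2]
    ch = int(rng.integers(1, max(2, int(h * max_frac))))
    cw = int(rng.integers(1, max(2, int(w * max_frac))))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    color = rng.integers(0, 256, size=3, dtype=np.uint8)  # random RGB fill
    out[y:y + ch, x:x + cw] = color  # broadcasts over the whole patch
    return out
```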
Noise cutouts
The next type of cutout we analyze is the noise cutout. Instead of a black rectangle, this type of cutout adds a rectangle filled with random-colored pixels. Let's see an image from the generated dataset:
Note the boxes filled with random pixels that appear in the image. Let's see the results after evaluating the model on the testing dataset:
| Model | Yolov8 |
| --- | --- |
| Precision | 0.747 |
| Recall | 0.648 |
| mAP@50 | 0.72 |
These metrics are very similar to those of the control dataset. Let's analyze some of the predictions to look for an effect of this type of cutout on the detections:
This type of cutout, like the previous one, also produces very tight bounding boxes around the different markers, but the boxes more often predict the correct color for the markers, which improves the detection metrics. Another difference from the colored-cutouts model is that there are fewer predictions where there is no marker; the most common confusion is detecting other pens or pen-shaped parts of the photo. Let's move on to the model trained with the last type of cutout, the blur cutout.
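The noise cutout is again a one-line change from the previous sketches: every pixel in the patch gets its own random color. Names and defaults remain illustrative assumptions:

```python
import numpy as np

def noise_cutout(image, rng=None, max_frac=0.3):
    """Fill one random rectangle with independent random-colored pixels.

    Illustrative sketch; the KIADAM tool's actual parameters may differ.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    h, w = out.shape[:2]
    ch = int(rng.integers(1, max(2, int(h * max_frac))))
    cw = int(rng.integers(1, max(2, int(w * max_frac))))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    # every pixel in the patch gets its own random RGB value
    out[y:y + ch, x:x + cw] = rng.integers(0, 256, size=(ch, cw, 3), dtype=np.uint8)
    return out
```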
Gaussian cutouts
We next generated a dataset in the same setting as the previous ones, but changing the cutouts to be simply a blurred patch of the image. Let's look at a representative of the generated images:
We can see that some patches of the image are simply blurred. The metrics after evaluating the model on the testing dataset are as follows:
| Model | Yolov8 |
| --- | --- |
| Precision | 0.737 |
| Recall | 0.765 |
| mAP@50 | 0.784 |
This type of cutout has the best metrics of all the cutout types; its mAP@50 is even the best in the whole article. This contrasts with its validation mAP@50, which at 0.819 is lower than all the previous models, a fact suggesting that more training epochs could improve the model's recognition even further. Let's analyze some of the images predicted using this cutout type:
Using this cutout type, we get more boxes in parts of the image that contain no markers, but the labels are almost always correct for the markers that are properly enclosed in a box. Finally, let's analyze the effect of using all the cutout types at the same time on the generated datasets.
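A blur cutout keeps the patch's content but smooths it. The sketch below uses a separable box blur as a crude stand-in for a Gaussian blur; the kernel, names, and defaults are assumptions, not the KIADAM implementation:

```python
import numpy as np

def _box1d(a, k, axis):
    """Sliding-window mean of odd width `k` along `axis`, edge-padded."""
    pad = k // 2
    widths = [(0, 0)] * a.ndim
    widths[axis] = (pad, pad)
    p = np.pad(a, widths, mode="edge")
    c = np.cumsum(p, axis=axis, dtype=np.float64)
    zshape = list(c.shape)
    zshape[axis] = 1
    c = np.concatenate([np.zeros(zshape), c], axis=axis)  # prepend zero row
    n = a.shape[axis]
    return (np.take(c, np.arange(k, n + k), axis=axis)
            - np.take(c, np.arange(0, n), axis=axis)) / k

def blur_cutout(image, rng=None, max_frac=0.3, k=9):
    """Blur one random rectangle of a copy of `image` with two 1-D
    box-blur passes (a rough approximation of a Gaussian blur)."""
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    h, w = out.shape[:2]
    ch = int(rng.integers(k, max(k + 1, int(h * max_frac))))
    cw = int(rng.integers(k, max(k + 1, int(w * max_frac))))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    patch = out[y:y + ch, x:x + cw].astype(np.float64)
    for axis in (0, 1):  # separable: blur rows, then columns
        patch = _box1d(patch, k, axis)
    out[y:y + ch, x:x + cw] = patch.astype(out.dtype)
    return out
```

In practice one would use a library Gaussian blur (e.g. from an image-processing package) rather than this hand-rolled version; the sketch only shows the mechanics.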
All types of cutout
As the last experiment of the article, we use all the cutout types when generating the images. Each cutout type has the same probability of appearing in an image. Let's see the results of this experiment:
| Model | Yolov8 |
| --- | --- |
| Precision | 0.548 |
| Recall | 0.707 |
| mAP@50 | 0.631 |
This model has lower precision but higher recall than the control. The mAP@50 is also a little lower than the control's. Let's analyze the predictions:
We can see that these predictions correctly identify some of the objects, but they also frequently predict an object where there is none, and some objects are predicted with the wrong label. There does not appear to be a common pattern in these predictions.
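Picking one cutout type uniformly at random per image can be sketched in a single self-contained function. The blur branch is approximated here by a patch-mean fill to keep the sketch short, and all names and defaults are illustrative:

```python
import numpy as np

def mixed_cutout(image, rng=None, max_frac=0.3):
    """Apply one cutout type chosen with equal probability among
    black, color, noise, and blur. Illustrative sketch only; the
    'blur' branch uses a patch-mean fill, not a true Gaussian blur."""
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    h, w = out.shape[:2]
    ch = int(rng.integers(1, max(2, int(h * max_frac))))
    cw = int(rng.integers(1, max(2, int(w * max_frac))))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    kind = rng.choice(["black", "color", "noise", "blur"])  # equal probability
    patch = out[y:y + ch, x:x + cw]  # view: writes modify `out`
    if kind == "black":
        patch[:] = 0
    elif kind == "color":
        patch[:] = rng.integers(0, 256, size=3, dtype=np.uint8)
    elif kind == "noise":
        patch[:] = rng.integers(0, 256, size=patch.shape, dtype=np.uint8)
    else:  # "blur", crudely approximated by the patch's mean color
        patch[:] = patch.mean(axis=(0, 1)).astype(np.uint8)
    return out
```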
Conclusion
We trained and tested Yolov8 models on generated images containing different cutouts. As the best type of cutout (at least for this task), we choose the Gaussian cutout, also known as the blur cutout. This type of cutout not only has the best metrics, but also produces the fewest predictions where there is no object and most often labels the boxes correctly.
As a recommendation for using cutouts, use:
- Black cutouts when having a correct label matters more than having tight bounding boxes. Use these if you need to improve the labels assigned to the bounding boxes.
- Color cutouts when having a tight bounding box matters more than having the correct labels. Use these if you need to improve the placement of the bounding boxes.
- Noise cutouts when you need tighter bounding boxes while keeping the labels accurate, without much change to the overall prediction metrics.
- Gaussian cutouts when you need to detect all the objects in the image more reliably and assign them the correct labels.