A few weeks ago, we released an article about using different backgrounds to improve the object detection metrics on a dataset of cans photographed from different perspectives, under different lightings, and at different photo qualities, among other variations. Although adding more and more diverse backgrounds certainly increased the prediction performance, the results were still not production-ready, with the mAP@50 capped at 0.549 on the testing set. You can read the article here.
The main goal of this article is to use the different settings available in the KIADAM tool to further increase the prediction metrics and make the model production-ready. We will start at a slightly lower baseline than the one reached in the article mentioned above, and gradually increase the performance. We managed to reach a mAP@50 of 0.775 on the testing dataset, using a Yolov8 model trained on the generated datasets.
Our Experiment
The experiment we will be working on today consists of using a synthetic dataset generated with KIADAM to recognize cans of various brands in an apartment. The testing dataset we are working with is not an easy one: it includes many lighting changes, some images are very blurred, and the cans are sometimes upside down. Here are some images from the testing dataset we are measuring the performance on.
Learn how to generate your own synthetically generated datasets here
The training will be performed with synthesized can datasets, and we expect to be able to recognize the cans and their brands with fairly good accuracy.
Follow along with our Colab Notebook
Measuring the first baseline
The first thing we will do is generate a first synthesized dataset and get some metrics to use as a baseline. To do so, we will use some cropped cans (segmented and cropped with the help of Meta's Segment Anything) and random backgrounds collected from Unsplash to generate the dataset. As transformations, we will choose the following:
Each transformation will have a 10% occurrence, and the ranges will be the defaults offered by the tool. After generating, these are the images we got.
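To make the generation step concrete, here is a minimal, hypothetical sketch of what compositing one cropped can onto a random background could look like. This illustrates the idea only, not KIADAM's actual code, and the rotation and blur ranges in it are placeholders for the tool's defaults:

```python
import random
from PIL import Image, ImageFilter

def compose_sample(can: Image.Image, background: Image.Image,
                   size_range=(0.30, 0.70), p_transform=0.10):
    """Paste one segmented can crop onto a background image."""
    bg = background.copy()
    can = can.convert("RGBA")  # keep the segmentation mask as the alpha channel
    # Scale the can to a random fraction of the background height.
    scale = random.uniform(*size_range)
    h = max(1, int(scale * bg.height))
    w = max(1, int(h * can.width / can.height))
    can = can.resize((w, h))
    # Each transformation fires independently with probability p_transform.
    if random.random() < p_transform:
        # Placeholder rotation range; the tool's default is not documented here.
        can = can.rotate(random.uniform(-30, 30), expand=True)
    if random.random() < p_transform:
        can = can.filter(ImageFilter.GaussianBlur(radius=random.uniform(1, 2)))
    # Paste at a random position, using the alpha channel as the mask.
    x = random.randint(0, max(0, bg.width - can.width))
    y = random.randint(0, max(0, bg.height - can.height))
    bg.paste(can, (x, y), can)
    # The YOLO box label is derived from the paste box: x, y, width, height.
    return bg, (x, y, can.width, can.height)
```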
So, we will start by training our first Yolov8 model with this dataset.
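Training is a few lines with the ultralytics package. Here is a minimal sketch; the dataset YAML name and the hyperparameters are assumptions, not necessarily the exact ones we used:

```python
from ultralytics import YOLO

# Start from pretrained COCO weights and fine-tune on the generated dataset.
model = YOLO("yolov8n.pt")
model.train(data="cans.yaml", epochs=100, imgsz=640)
```

After training, these are the results we have on the testing dataset: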
| Model | Yolov8 |
| --- | --- |
| Precision | 0.652 |
| Recall | 0.419 |
| mAP@50 | 0.491 |
| mAP@50-95 | 0.329 |
We finally have a first baseline! Sadly, the results are pretty bad: the recall is lower than 0.5 and the mAP is very poor. Let's analyze the predictions to check where we went wrong:
The main problem is that the model cannot reliably recognize cans that are too close to or too far from the camera. This is expected, as the synthesized dataset did not contain any big or small cans; all the cans were medium-sized. This is the first thing that we will correct.
Fixing the image size percentages
Learn about how image sizes affect the performance here
The first thing that we will fix, as said earlier, is the object size range, so that the generated images include bigger and smaller cans. We changed the range from 30-70% of the image to 10-90%. This should solve the recognition errors on the bigger and smaller cans in the testing set.
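In terms of the hypothetical sketch from earlier, this is a one-parameter change:

```python
# Allow cans from 10% up to 90% of the background height (was 30-70%).
sample, box = compose_sample(can, background, size_range=(0.10, 0.90))
```

But let's test it and see the results: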
| Model | Yolov8 |
| --- | --- |
| Precision | 0.814 |
| Recall | 0.455 |
| mAP@50 | 0.577 |
| mAP@50-95 | 0.406 |
We did indeed increase the metrics, especially the precision of the recognitions. And we have surpassed the results of the backgrounds article! We also increased the recall a little, as well as the mAP. But did we solve the problem of recognizing cans of different sizes?
We successfully solved our first prediction problem: detecting bigger and smaller cans. Next, we will address another problem that is latent in the predictions: blurred images are not detected well, and neither are rotated ones. We will tackle this by increasing the percentage of images affected by the transformations from 10% to 50%, and see how it affects the prediction metrics.
Varying the percent of images affected by the transformations
The next step will be to have more images affected by the transformations we used. We will increase, as said earlier, the percentage of affected images from 10% to 50%, and check if rotated and blurred cans are recognized better.
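Again in terms of the earlier sketch, only the per-transformation probability changes:

```python
# Each transformation now fires on roughly half of the generated images.
sample, box = compose_sample(can, background,
                             size_range=(0.10, 0.90), p_transform=0.50)
```

Let's check the results we got: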
| Model | Yolov8 |
| --- | --- |
| Precision | 0.798 |
| Recall | 0.57 |
| mAP@50 | 0.646 |
| mAP@50-95 | 0.456 |
We finally surpassed the 50% barrier on the recall! The metrics look much better than before. We are slowly but surely perfecting our dataset to recognize our selected cans well. Let's look at the predictions to see where the errors are now.
We can see that we still have trouble recognizing cans that are upside down and cans that are too blurred. We can try to fix this by widening the ranges of the blur and rotation transformations, which we will address next.
Widening the transformation ranges
The next changes we will make to the dataset are to widen the rotation range to -145 to 145 degrees and the blur kernel sizes from 3-7 to 3-11 pixels. We will also add a blur transformation on the objects themselves, so that the blurred cans we still have trouble with are detected better, and add a slight color distortion and noise to the backgrounds to force some variety over repeated backgrounds.
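Expressed with a general augmentation library such as albumentations, these settings would look roughly like the sketch below; this is an equivalent illustration, not KIADAM's actual configuration:

```python
import albumentations as A

# Wider rotation and blur ranges on the objects; p=0.5 keeps the 50%
# occurrence from the previous step.
object_aug = A.Compose([
    A.Rotate(limit=145, p=0.5),                 # -145 to 145 degrees
    A.GaussianBlur(blur_limit=(3, 11), p=0.5),  # kernel sizes 3-11
])

# Slight color distortion and noise to vary repeated backgrounds.
background_aug = A.Compose([
    A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1, p=0.5),
    A.GaussNoise(p=0.5),
])
```

The final dataset, regenerated with these settings, looks as follows: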
Let's see how we do in the testing dataset.
| Model | Yolov8 |
| --- | --- |
| Precision | 0.803 |
| Recall | 0.651 |
| mAP@50 | 0.722 |
| mAP@50-95 | 0.495 |
These values clearly improve on those of the previous experiment, and we have a very high mAP@50. Next, we will make our final adjustments to improve the precision and recall on the testing set.
Changing the backgrounds
All this time, we have been working with background images taken from Unsplash. These background images are very diverse, but they don't seem to help the predictions on the testing set that much. So we changed the background images to ones more similar to the backgrounds of the testing images, to see if that improves the predictions. These are some of the backgrounds we chose.
Learn how backgrounds influence the performance of the trained model here
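As an aside, all the metrics reported in this article come from evaluating the trained model on the held-out testing set. With the ultralytics package that looks roughly like this; the weights path is the library's default output location, and the YAML name is an assumption:

```python
from ultralytics import YOLO

# Load the best checkpoint from training and evaluate on the test split.
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="cans.yaml", split="test")
print(metrics.box.map50)    # mAP@50
print(metrics.box.map)      # mAP@50-95
```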
Let's see if our performance improves in the testing set.
| Model | Yolov8 |
| --- | --- |
| Precision | 0.805 |
| Recall | 0.704 |
| mAP@50 | 0.775 |
| mAP@50-95 | 0.542 |
The performance improved greatly again! And all we did was swap the background images for ones closer to those the model will see at detection time. This makes it evident that the closer the training backgrounds are to the testing backgrounds, the better the predictions will be. Finally, here are some of the predictions made by this model.
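Producing such qualitative predictions with the ultralytics package is straightforward; the paths below are assumptions:

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # assumed weights path
# Run inference on the testing images and save the annotated outputs.
results = model.predict(source="test_images/", conf=0.25, save=True)
```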
Conclusion
Improving the performance of a model using synthesized datasets is an iterative process, in which we tweak the transformations step by step to increase the capabilities of the model and diminish the error. By changing the transformations, their value ranges, and the backgrounds, we managed to increase the mAP@50 from 0.549 to 0.775 on the testing dataset, a great leap considering the difficulty of the testing images.
From a wider perspective, the transformations chosen to augment the objects, the backgrounds, or the whole image must produce a distribution similar to that of the images the trained model will be used on. And when in doubt, we should choose a wider range for the transformations, along with a percentage of affected images close to 50%.