In the world of object detection, one of the most important factors in achieving good detection metrics is the dataset. Ideally, a dataset should resemble reality as closely as possible so the model generalizes well, while remaining diverse enough to avoid overfitting.
In particular, it is crucial to have some variety in the sizes of the objects across the images of your dataset. Object size directly affects performance on the testing set, and in this article we will see how to achieve the best results by selecting the correct sizes for the objects in your KIADAM synthesized dataset.
Our Experiment
The datasets we will be working with are synthesized scissors datasets. They are simple enough to isolate the object-size variable, yet complex enough to give a solid understanding of how object sizes affect the performance of a trained YOLOv8 model. Below are some of the images generated to train the model.
See how to use Meta's Segment Anything model to efficiently crop your objects here
The models trained on these images will be tested on a real-world dataset made from photos of scissors on a table. The testing images look as follows.
To test whether the size of objects in the dataset influences the quality of the predictions, we will measure the Precision, Recall, and mAP@50 of models trained on synthesized datasets generated with different ranges of object sizes.
Follow along with our Colab Notebook.
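If you want to compute these metrics yourself, here is a minimal sketch using the ultralytics package; the weights file and dataset YAML names are placeholders for your own files.

```python
from ultralytics import YOLO

# Load the trained detector; "scissors_model.pt" is a placeholder for your weights.
model = YOLO("scissors_model.pt")

# Validate against the real-world testing set described by a standard YOLO
# data.yaml; the file name (and the use of the test split) are assumptions.
metrics = model.val(data="scissors_test.yaml", split="test")

print(f"Precision: {metrics.box.mp:.3f}")    # mean precision over classes
print(f"Recall:    {metrics.box.mr:.3f}")    # mean recall over classes
print(f"mAP@50:    {metrics.box.map50:.3f}")
```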
10-30% dataset
First of all, we generated a dataset with the KIADAM data synthesis tool. We used some basic transformations to augment the dataset and generated 300 synthetic images, with object sizes ranging from 10% to 30% of the image. The generated dataset is available in our Google Drive.
See how to generate a dataset with the KIADAM tool here
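To make the synthesis idea concrete, here is a minimal sketch of the underlying copy-paste technique, not KIADAM's actual implementation: an object crop with transparency is scaled so it spans a random fraction (10% to 30% here) of the background and pasted at a random position, yielding the image and its YOLO-format label. All file paths are placeholders.

```python
import random
from PIL import Image  # Pillow

def synthesize(background_path, object_path, min_frac=0.10, max_frac=0.30):
    """Paste a transparent object crop onto a background at a random size,
    returning the composed image and a YOLO-format label line."""
    bg = Image.open(background_path).convert("RGB")
    obj = Image.open(object_path).convert("RGBA")  # crop with an alpha channel

    # Scale the object so its longest side spans a random fraction of the
    # image's longest side (one reasonable reading of "object size").
    frac = random.uniform(min_frac, max_frac)
    scale = frac * max(bg.size) / max(obj.size)
    obj = obj.resize((max(1, round(obj.width * scale)),
                      max(1, round(obj.height * scale))))

    # Paste at a random position that keeps the object inside the frame.
    x = random.randint(0, max(0, bg.width - obj.width))
    y = random.randint(0, max(0, bg.height - obj.height))
    bg.paste(obj, (x, y), mask=obj)

    # YOLO label: class id, then normalized center coordinates and box size.
    label = (f"0 {(x + obj.width / 2) / bg.width:.6f} "
             f"{(y + obj.height / 2) / bg.height:.6f} "
             f"{obj.width / bg.width:.6f} {obj.height / bg.height:.6f}")
    return bg, label

# Hypothetical usage: image, label = synthesize("table.jpg", "scissors.png")
```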
The following table shows the metrics obtained when a YOLOv8 model is trained on that dataset.
| Model | YOLOv8 |
| --- | --- |
| Precision | 0.864 |
| Recall | 0.731 |
| mAP@50 | 0.826 |
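For reference, a training run like this one comes down to a couple of ultralytics calls; the dataset YAML name, epoch count, and image size below are illustrative assumptions, not the exact settings we used.

```python
from ultralytics import YOLO

# Start from pretrained COCO weights and fine-tune on the synthesized dataset.
model = YOLO("yolov8n.pt")

# "scissors_10_30.yaml" is a placeholder for the generated dataset's config;
# the epoch count and image size are illustrative, not our exact settings.
model.train(data="scissors_10_30.yaml", epochs=100, imgsz=640)
```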
We can see that a model trained on this synthetic dataset performs relatively well on the provided testing set. But let's analyze its predictions on some images of the testing set.
A clear trend emerges: the model cannot recognize objects that are too big, and we can attribute that behavior to the lack of big objects in the training images. If we analyze some of the images that our model predicts correctly, we can see that they contain mostly smaller objects.
Knowing this, we can deduce that a good way to improve detection performance is to generate images with object sizes close to those the model will be asked to recognize in its production phase. The sketch below shows one way to estimate those target sizes, and we go further into this in the next experiment.
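Assuming your production-like images are labeled in the standard YOLO format (normalized `class x_center y_center width height`), a minimal sketch would scan the label files and report the observed size range; the directory name and the choice of max(width, height) as the size measure are assumptions.

```python
from pathlib import Path

def object_size_range(label_dir):
    """Scan YOLO-format label files and report the min and max object size,
    measured as the larger of the normalized box width and height."""
    sizes = []
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            _cls, _x, _y, w, h = map(float, line.split())
            sizes.append(max(w, h))
    return min(sizes), max(sizes)

# "test_labels/" is a placeholder for labels of production-like images.
low, high = object_size_range("test_labels/")
print(f"Objects span {low:.0%} to {high:.0%} of the image")
```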
10-80% dataset
The next dataset we generated with the KIADAM dataset synthesis tool has objects with sizes varying from 10% to 80% of the size of the image. The generated dataset is available in our Google Drive. We chose these percentages because they are the closest to the sizes of the objects (scissors) in the testing set, so we expect better performance with them. The results obtained from this newly generated dataset are the following.
| Model | YOLOv8 |
| --- | --- |
| Precision | 0.96 |
| Recall | 0.92 |
| mAP@50 | 0.968 |
We can clearly see that the results improved greatly: not only are precision and recall above 90%, but these constitute state-of-the-art results, obtained with little to no effort on dataset preparation and labeling. We can also verify on the testing set that the model is now able to recognize bigger objects.
There are a few prediction mistakes, but the model recognizes almost all of the scissors, regardless of their size relative to the testing image.
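These qualitative checks are easy to reproduce: a single ultralytics call runs inference on a folder of test images and saves annotated copies to disk. The source directory and confidence threshold below are assumptions.

```python
from ultralytics import YOLO

model = YOLO("scissors_model.pt")  # placeholder for the trained weights

# Run inference on a folder of test images and save annotated copies;
# the directory and confidence threshold are illustrative assumptions.
model.predict(source="test_images/", conf=0.25, save=True)
```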
See our curated best practices to achieve better performance on object detection with KIADAM generated datasets here
Conclusion
We started by training a model that performed relatively well but failed to detect the bigger objects in the pictures. By changing the size of the objects to imitate those of our testing set, we improved the performance of the model, validating the hypothesis: using object sizes similar to those that the model will be asked to detect in its production phase improves its detection performance.
If in your production environment your object will always be the main focus of the image and close to the camera, select a size range from 70% to 90%. If your objects will be one element amongst others in the image and somewhat far from the camera, select a size range from 10% to 30%. If you might encounter all of the above or are unsure, select a wide size range, such as 10% to 80%, to ensure the best results on your dataset.