Our KIADAM tool helps you generate a dataset for your object recognition or computer vision project by pasting images of the object you want to detect onto a chosen background. But what happens when the object is not rectangular? Its rectangular crop will contain many pixels that belong neither to the object nor to the new background, which can mislead the model.
Meta released their open-source model Segment Anything, which can "cut out" any object from its background with impressive accuracy. We'll show you how to take advantage of it to solve this problem, as well as when you actually need it.
Curious about Segment Anything? Check out their official page or this blog post
Here are the different sections of this post:
- How to use Segment Anything (SAM)
- Why use Segment Anything along with KIADAM
- First Experiment
- When Segment Anything is the most useful
How to use Segment Anything (SAM)
Link to the Colab Notebook
Good news: we put together our own Colab Notebook so you can make your own cutouts as easily as possible. Click on the link and follow along!
First, head over to Roboflow, label your images, and download the zip with YOLOv5-style annotations to your computer. Make sure to have only one object per image.
Next, open our Google Colab Notebook.
Click Run All and upload your zip when prompted in the first cell. The cutout images will then be automatically downloaded to your computer!
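For reference, here is a minimal sketch of what the notebook does under the hood, using the segment_anything package. The file names and the YOLOv5-label parsing are illustrative assumptions; the notebook handles these details for you.

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (downloadable from Meta's repository).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("bottle.jpg").convert("RGB"))  # hypothetical file
h, w = image.shape[:2]

# A YOLOv5 label line is "class x_center y_center width height", normalized.
cls, xc, yc, bw, bh = map(float, open("bottle.txt").read().split())
box = np.array([(xc - bw / 2) * w, (yc - bh / 2) * h,
                (xc + bw / 2) * w, (yc + bh / 2) * h])

# Prompt SAM with the bounding box and keep its best mask.
predictor.set_image(image)
masks, scores, _ = predictor.predict(box=box, multimask_output=False)

# Use the mask as an alpha channel: background pixels become transparent.
cutout = np.dstack([image, masks[0].astype(np.uint8) * 255])
Image.fromarray(cutout).save("bottle_cutout.png")
```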
Why use Segment Anything along with KIADAM
Check out another example of this technique in our previous blog post
Our data synthesis works by pasting images of the object you want to detect onto relevant backgrounds, combined with various data augmentation techniques to create new training images.
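In code, the pasting step boils down to alpha-compositing a cutout onto a background at a random position and scale, then writing the new bounding box as a YOLOv5 label. Here is a simplified sketch; the file names are placeholders, and KIADAM's actual pipeline adds more augmentations:

```python
import random
from PIL import Image

background = Image.open("background.jpg").convert("RGB")  # placeholder files
cutout = Image.open("bottle_cutout.png").convert("RGBA")

# Pick a random scale and position for the pasted object.
scale = random.uniform(0.3, 0.8)
cw, ch = int(cutout.width * scale), int(cutout.height * scale)
cutout = cutout.resize((cw, ch))
x = random.randint(0, background.width - cw)
y = random.randint(0, background.height - ch)

# The alpha channel ensures only the object's pixels overwrite the background.
background.paste(cutout, (x, y), mask=cutout)
background.save("synthetic.jpg")

# Corresponding YOLOv5 label: class, then normalized center and size.
label = (0, (x + cw / 2) / background.width, (y + ch / 2) / background.height,
         cw / background.width, ch / background.height)
print(" ".join(f"{v:.6f}" if i else str(v) for i, v in enumerate(label)))
```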
However, with a classic rectangular crop, part of the original background (here, for example, the gray tablecloth behind the beer bottle) remains in the to-be-pasted object.
This additional information will only serve to mislead your neural network during training.
Using SAM allows you to cut away that unnecessary information and produce synthetic images that are closer to reality. Notice, however, that in this case the difference between the rectangular crop and the image extracted by SAM is slim. Let's see whether this difference really matters in our first experiment.
First Experiment
Open the Colab Notebook here
For the sake of the experiment, we created a testing set with as few variations as possible (one background, the same lighting, the same object size) so that the only variable is the presence or absence of a cutout made with SAM.
In this Notebook, we compare the results that YOLOv5 obtains when trained on different datasets. The metric used is the mean Average Precision (mAP): the average, over all classes, of the area under each class's precision-recall curve.
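For readers unfamiliar with the metric, here is a rough sketch of how AP is computed for one class; YOLOv5's own implementation interpolates the curve slightly differently, but the idea is the same:

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]  # make precision non-increasing
    idx = np.where(r[1:] != r[:-1])[0]        # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# mAP is simply the mean AP over all classes (here, a single class).
per_class_ap = [average_precision(np.array([0.5, 1.0]), np.array([1.0, 0.9]))]
print(sum(per_class_ap) / len(per_class_ap))  # -> 0.95
```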
Here is an excerpt of the testing set.
On the one hand, we train with a dataset of rectangularly cropped images pasted onto a background.
And on the other hand, with the exact same set of images extracted with SAM and pasted onto the same backgrounds.
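Concretely, we launch the same YOLOv5 training twice, once per dataset. Something along these lines, assuming a local clone of the ultralytics/yolov5 repository; the dataset YAML names and hyperparameters here are illustrative, not the exact ones from the notebook:

```python
import subprocess

# Train the same model twice, once on each synthetic dataset.
for name in ("without_sam", "with_sam"):  # hypothetical dataset configs
    subprocess.run(
        ["python", "train.py",
         "--img", "640",
         "--epochs", "100",
         "--data", f"{name}.yaml",
         "--weights", "yolov5s.pt",
         "--name", name],
        cwd="yolov5", check=True,
    )
```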
Here are our results:

| Training Set | mAP |
| --- | --- |
| Without SAM | 0.969 |
| With SAM | 0.966 |
We can see that the difference between using and not using SAM is quite low, negligible even. Why is that?
Here is an image generated by our tool, taking advantage of SAM's clean cutouts.
And here is an image generated with a rectangular crop. As you can see, they look quite similar. There are two main reasons for that:
1. How rectangular the object is
2. How dissimilar the leftover part of the background is
For a beer bottle, only the bottle's neck prevents it from being a rectangle, and the leftover background is gray, quite similar to the actual background of the testing set.
In this particular case, with an object as rectangular as a beer bottle and a testing background so close to the original background, using SAM is not necessary. Let us, however, explore an object for which SAM is a game changer.
When Segment Anything is the most useful
Follow along in this Colab Notebook
We're looking for an object that is as far away from a rectangle as possible. Looking around your house, you probably have at least one such object: a pair of scissors.
In this cropped image, the scissors occupy less than half of the area; the rest is background. For the sake of the experiment, the white background here is quite different from the yellow background used in the testing set.
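One way to quantify "how rectangular" an object is, is the fraction of its bounding box that the SAM mask actually covers. A quick sketch, reusing the boolean mask from the earlier snippet:

```python
import numpy as np

def box_fill_ratio(mask: np.ndarray) -> float:
    """Fraction of the object's bounding box covered by its mask."""
    ys, xs = np.nonzero(mask)
    box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return float(mask.sum() / box_area)

# A beer bottle scores close to 1; a pair of open scissors scores well
# below 0.5, which is when SAM cutouts pay off the most.
```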
We generated two training sets with our tool. Here is what the one with SAM looks like:
It does look quite close to the testing set.
Here is the one with only rectangular crops, without SAM:
We see that without SAM, a large part of the white background from the original image remains, and it is quite distracting, much more so than in our first experiment with the beer bottles. But what are the actual results?
Training YOLOv5 on those two datasets and testing on the same testing set, we obtain the following:

| Training Set | mAP |
| --- | --- |
| Without SAM | 0.825 |
| With SAM | 0.921 |
In this case, the use of Segment Anything gives us a massive performance increase!