Do you require an affordable online automatic segmentation service? Salad.com offers a budget-friendly solution with access to cost-effective GPUs. In this post, we'll guide you through deploying a Docker container using FastAPI, which includes setting up an endpoint to interact with Meta's Segment Anything model. We will conclude by pushing the image to Azure Container Registry and deploying the container on Salad.com.
To effectively follow this guide, you must first install the Docker engine. For Windows users, we advise using Windows Subsystem for Linux (WSL) to prevent compatibility issues. Please refer to the official Docker installation guide and select your platform for the installation process. After installing Docker, you will need to set up an IPv6 network; consult this Docker guide for that purpose. Only complete up to the 4th step of 'Creating an IPv6 network', and perform step 4 using docker create instead of docker compose.
All the code is available on GitHub if you want to skip the code creation part.
Preparing the Docker container
To set up the container, we will need to list the requirements for our FastAPI application, create a Dockerfile that sets the instructions to build the container, and create a folder named app/ to store the code in, with a file called __init__.py inside. The folder structure will have to match this one:
SAMFastAPI/
- Dockerfile
- requirements.txt
- app/
- __init__.py
- ...
Let's display the contents of requirements.txt and give a short explanation of each dependency's purpose:
fastapi[all]>=0.105.0
pydantic>=1.8.0
uvicorn[standard]>=0.15.0
segment-anything @ git+https://github.com/facebookresearch/segment-anything.git
torch>=2.1.1
torchvision>=0.16.1
opencv-python>=4.8.0
FastAPI and Pydantic are necessary for the web framework to work. Uvicorn gives us a production-grade web server. segment-anything, torch, torchvision and opencv-python are needed to process the input image and to run the model, in this case Segment Anything. Now, let's do the same for the Dockerfile.
FROM python:3.9.18
WORKDIR /code
# Download SAM Model
RUN mkdir /code/sam_images
RUN curl -o /code/sam_images/sam_vit_l_0b3195.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
# Install the requirements and libraries
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
RUN apt update -y
RUN apt install libgl1-mesa-glx -y
# Copy the app code
COPY ./app /code/app
# Startup command
CMD ["uvicorn", "app.main:app", "--host", "::", "--port", "80"]
We start from the official Python 3.9 image. We first create a folder to store the SAM weights and download them there. Then, we install the previously stated requirements along with libgl1-mesa-glx, the system OpenGL library that OpenCV needs at runtime. Next, we copy the code into the container, and finally we start the Uvicorn server. This concludes the setup of the Docker container. We will now go through the code.
FastAPI startup code
Start by creating a file in the app/ folder called main.py with the following code:
from fastapi import FastAPI
app = FastAPI()
@app.get("/healthcheck")
def check_working():
return {"online": True}
This is the startup code for the FastAPI application. You can check that it works by building and running the Docker container. The first build may take a while since it downloads the requirements and the SAM model weights, but you only have to download them once (Docker caches the already completed build steps). To test the container, first build the Docker image (this step may take a while):
docker build -t sam_service .
And then, run a container from the image:
docker run --rm --network ip6net -p 80:80 sam_service
Remember to configure the IPv6 network as mentioned earlier. With that, you will have your Docker container running with the FastAPI healthcheck! You can check that it's working by sending a GET request to http://[::1]:80/healthcheck (the IPv6 loopback address) with any HTTP tool such as Postman. You should get the following response if everything is working correctly:
{
"online": true
}
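You can also script the check with a minimal Python snippet (a sketch assuming the container is reachable on the IPv6 loopback address; adjust the host if your network setup differs):

import requests

# Query the healthcheck endpoint of the locally running container
response = requests.get("http://[::1]:80/healthcheck")
print(response.json())  # Expected output: {'online': True}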
Adding the model querying endpoint
The next step is to load the model into the FastAPI application state. For that purpose, we will add a lifespan to the application, so that the model is loaded when the application starts and unloaded when it shuts down. Change the code in main.py to include the following before the creation of the app:
from contextlib import asynccontextmanager
from segment_anything import sam_model_registry
import torch
@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.ml_models = {}
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    # Load the SAM model
    sam = sam_model_registry["vit_l"](checkpoint="./sam_images/sam_vit_l_0b3195.pth")
    sam.to(device=device)
    app.state.ml_models["sam"] = sam
    yield
    # Clean up the ML models and release the resources
    app.state.ml_models.clear()
And change the app creation code to include the lifespan:
app = FastAPI(lifespan=lifespan)
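Any endpoint can now reach the loaded model through the request object. As a quick illustration, a hypothetical route (not part of the final application) could report which device the model ended up on:

from fastapi import Request

# Hypothetical route showing how to access the model stored in the app state
@app.get("/model-info")
def model_info(request: Request):
    sam = request.app.state.ml_models["sam"]
    # Report the device of the first model parameter ("cpu" or "cuda:0")
    return {"device": str(next(sam.parameters()).device)}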
Now, let's create the Pydantic schema for the endpoint. To do so, create a file in the app/ folder called schemas.py and enter the following contents:
from pydantic import BaseModel
from typing import List, Tuple, Optional
class SegmentBody(BaseModel):
    image: str
    box: Optional[Tuple[int, int, int, int]] = None
    input_points: Optional[List[Tuple[int, int]]] = None
    input_labels: Optional[List[int]] = None
    multimask_output: Optional[bool] = None
The idea behind this class is to type the incoming JSON and deserialize it into the class while validating the data types. We ask for the image encoded in base64, an optional bounding box, and optional input points with an integer label for each point.
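To see how the schema validates incoming data, here is a small sketch of parsing a sample payload into SegmentBody (the image value is just a shortened placeholder; labels follow SAM's convention of 1 for foreground and 0 for background):

from app.schemas import SegmentBody

# Example payload matching the schema; "image" would normally hold the
# full base64-encoded picture
payload = {
    "image": "aGVsbG8=",
    "box": [100, 150, 400, 500],    # bounding box prompt: x1, y1, x2, y2
    "input_points": [[250, 300]],   # one prompt point
    "input_labels": [1],            # 1 = foreground, 0 = background
    "multimask_output": False,
}
body = SegmentBody(**payload)
print(body.box, body.input_points, body.input_labels)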
Then, let's create the inference endpoint. Create a new folder inside the app folder called routers, and create a file called __init__.py and another called inference.py there, with the following contents:
from fastapi import APIRouter, Request
from base64 import b64decode, b64encode
from segment_anything import SamPredictor
import numpy as np
import cv2
from ..schemas import SegmentBody

router = APIRouter()

@router.post("/segment")
def segment_image(request: Request, body: SegmentBody):
    # Decode the base64 image and load it as an RGB array
    image = b64decode(body.image.encode())
    file_bytes = np.frombuffer(image, np.uint8)
    image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    try:
        predictor = SamPredictor(request.app.state.ml_models["sam"])
        predictor.set_image(image)
        # Convert the optional prompts to numpy arrays
        input_box = np.array(body.box) if body.box else None
        input_points = np.array(body.input_points) if body.input_points else None
        input_labels = np.array(body.input_labels) if body.input_labels else None
        multimask_output = bool(body.multimask_output)
        masks, _, _ = predictor.predict(
            box=input_box,
            point_coords=input_points,
            point_labels=input_labels,
            multimask_output=multimask_output
        )
    except Exception as ex:
        return {
            "detail": str(ex)
        }
    # Turn the boolean mask into a 3-channel black-and-white image
    single_mask = np.repeat(masks[0][:, :, np.newaxis], 3, axis=2)
    tobyte = lambda t: 255 if t else 0
    vfunc = np.vectorize(tobyte)
    mask_image_vect = vfunc(single_mask).astype(np.uint8)
    is_success, buffer = cv2.imencode(".png", mask_image_vect)
    return {
        "encoded_binary_mask": b64encode(buffer).decode()
    }
The previous code loads the base64-encoded image into an array, feeds it to the model together with the optional prompts, and returns a base64-encoded binary mask as a result. The endpoint is complete! The last step is to add the router in main.py. Add the import of the router at the beginning of that file:
from app.routers import inference
And after the definition of the app, include the router:
app = FastAPI(lifespan=lifespan)
app.include_router(inference.router)
This is the final main.py file:
from fastapi import FastAPI
from contextlib import asynccontextmanager
from segment_anything import sam_model_registry
import torch
from app.routers import inference

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.ml_models = {}
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    # Load the SAM model
    sam = sam_model_registry["vit_l"](checkpoint="./sam_images/sam_vit_l_0b3195.pth")
    sam.to(device=device)
    app.state.ml_models["sam"] = sam
    yield
    # Clean up the ML models and release the resources
    app.state.ml_models.clear()

app = FastAPI(lifespan=lifespan)
app.include_router(inference.router)

@app.get("/healthcheck")
def check_working():
    return {"online": True}
And the project structure must be as follows:
SAMFastAPI/
- Dockerfile
- requirements.txt
- app/
- __init__.py
- main.py
- schemas.py
- routers/
- __init__.py
- inference.py
This finalizes the first part of the tutorial: creating the Docker image. To test whether the image works properly, repeat the build and run steps from before. Now you can hit the healthcheck endpoint at http://[::1]:80/healthcheck, as well as the inference endpoint at http://[::1]:80/segment. To test the inference endpoint, we will try to segment this image:
Create and run the following program in a new file called apitester.py (installing any missing requirements, such as requests and gdown):
from base64 import b64decode, b64encode
import requests
import gdown

# Download the test image
gdown.download("https://drive.google.com/uc?id=1VIiMg7_AEBIW8gJmG5kOLz8eOdEGLfhP")

with open("can.jpg", 'rb') as file:
    image = file.read()

# Build the request body: the base64-encoded image and a bounding box prompt
body = {
    "image": b64encode(image).decode(),
    "box": [883, 749, 2000, 3100]
}
URL = "http://[::1]:80"
r = requests.post(f"{URL}/segment", json=body, headers={"Content-Type": "application/json"})
res = r.json()
print(res)

# Decode the returned mask and save it as a PNG
f = b64decode(res["encoded_binary_mask"])
with open("response.png", 'wb') as file:
    file.write(f)
If you are not using a GPU on your computer, this step may take a while (a minute, perhaps), but you should end up with a new image containing the pixel-by-pixel segmentation of the input. The segmentation should look like this:
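If you want to inspect the result visually, a short optional snippet (assuming can.jpg and response.png from the previous script are in the working directory) can overlay the returned mask on the original image:

import cv2

# Load the original image and the mask returned by the API
image = cv2.imread("can.jpg")
mask = cv2.imread("response.png", cv2.IMREAD_GRAYSCALE)

# Highlight the segmented pixels in red and blend with the original image
overlay = image.copy()
overlay[mask > 0] = (0, 0, 255)  # BGR red
blended = cv2.addWeighted(image, 0.6, overlay, 0.4, 0)
cv2.imwrite("overlay.png", blended)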
Deploying the Docker image to Salad
The second part of the tutorial focuses on deploying the image to Salad step by step. We will first upload the image to a container registry, and then deploy it to Salad from that registry.
Firstly, upload the image to your preferred container registry. Although this tutorial uses Azure Container Registry, numerous other options are available. To upload the image to the container registry, you need to build the image, tag it, and then push it as follows:
docker build -t sam_service .
docker tag sam_service <your-container-registry-url>/api/segment_anything
docker push <your-container-registry-url>/api/segment_anything
Now, navigate to salad.com and click on Deploy on Salad.
If you have not created an account yet, create one. Afterwards, create a container group in Salad and name it segment-anything.
On Image Source, click Edit and configure the endpoint to point to the registry you are using.
In Replica Count, select 1, and in vCPUs, select 1 as well.
In Memory, select 4 GB.
In GPU, select GTX 1060 (6 GB).
Finally, in Optional Settings, find Networking and press Edit. Enable Networking, enter port 80, and select Yes on Use Authentication.
The container is configured! Now click Deploy. The image will take a while to deploy, and then you will be presented with the following screen.
And you're ready to use Segment Anything in the cloud! Just use the available URL (displayed in the Access Domain Name field) to call the endpoints we configured earlier. Remember to use the Salad-Api-Key header to authenticate the request, as mentioned in the Salad docs.
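For example, the earlier test script only needs two changes to target the deployed service: point it at your Access Domain Name and add the authentication header (both values below are placeholders):

import requests
from base64 import b64encode

# Placeholders: replace with your Access Domain Name and your Salad API key
URL = "https://your-access-domain-name"
API_KEY = "your-salad-api-key"

with open("can.jpg", 'rb') as file:
    image = file.read()

body = {
    "image": b64encode(image).decode(),
    "box": [883, 749, 2000, 3100]
}
headers = {"Content-Type": "application/json", "Salad-Api-Key": API_KEY}
r = requests.post(f"{URL}/segment", json=body, headers=headers)
print(r.status_code, list(r.json().keys()))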
Conclusion
Today, we developed a Docker image from scratch to interact with the Segment Anything model and deployed it on Salad.com, a cost-effective platform for hosting GPU-intensive models. If you have any questions about the process, don't hesitate to refer to our GitHub repository, which contains the Docker image code.
This solution is used inside Data Augmentation Studio for image segmentation tasks on kiadam.com. You can see a snippet of the tool segmenting a can inside the app here. If you want to learn more about using kiadam to segment images, check here.
Thanks for reading!