Mask R-CNN: Instance Segmentation

Duration: 7 min

This module delves into Mask R-CNN, a state-of-the-art model for instance segmentation. It extends Faster R-CNN by adding a branch for predicting segmentation masks on each ROI (Region of Interest). Understanding Mask R-CNN is crucial for applications requiring precise object delineation in images, such as autonomous driving, medical imaging, and augmented reality.

Understanding Mask R-CNN Architecture

Mask R-CNN builds upon Faster R-CNN by adding a parallel branch for predicting segmentation masks. It consists of a backbone network (e.g., ResNet) for feature extraction, a Region Proposal Network (RPN) for generating ROIs, and two parallel branches: one for classification and bounding box regression, and another for mask prediction. Each ROI is processed to predict a class-specific binary mask.

import torch
import torchvision

# Load a pre-trained Mask R-CNN model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Define a sample input image
input_image = torch.rand(1, 3, 800, 1000)  # 1 image, 3 channels, size 800x1000

# Perform inference
with torch.no_grad():
    predictions = model(input_image)

# Print the predictions
print(predictions)

Try it in Google Colab:

{'boxes': tensor([[257.6953, 102.0664, 760.8594, 746.5547]]), 'labels': tensor([3]),'scores': tensor([0.9966]),'masks': tensor([[[[False, False,..., False],
         [False, False,..., False],
        ...,
         [False, False,..., False],
         [False, False,..., False]]]])}

Implementing Mask R-CNN for Custom Datasets

To use Mask R-CNN on custom datasets, you need to prepare your data in COCO format, which includes images and corresponding annotations for bounding boxes and masks. You then fine-tune the pre-trained Mask R-CNN model on your dataset. This involves setting up data loaders, defining a custom dataset class, and training the model with appropriate loss functions for classification, bounding box regression, and mask prediction.

import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision.models.detection.mask_rcnn import MaskRCNN_ResNet50_FPN_Weights

# Load a pre-trained Mask R-CNN model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=MaskRCNN_ResNet50_FPN_Weights.DEFAULT)
model.train()

# Define a custom dataset and data loader
# Assume CustomDataset is a class that loads your custom dataset
dataset = CustomDataset()
data_loader = DataLoader(dataset, batch_size=2, shuffle=True)

# Define optimizer and loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

# Training loop
for images, targets in data_loader:
    optimizer.zero_grad()
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    losses.backward()
    optimizer.step()
    print(f'Loss: {losses.item()}')

💡 Tip: Ensure your custom dataset annotations are accurate and in the correct COCO format to avoid training issues and improve model performance.

❓ What is the primary addition of Mask R-CNN over Faster R-CNN?

Improved bounding box regression Additional branch for segmentation masks Faster inference time Higher classification accuracy

❓ What format should your custom dataset annotations be in for Mask R-CNN training?

Pascal VOC YOLO COCO ImageNet

Mask R-CNN: Instance Segmentation

Understanding Mask R-CNN Architecture

Implementing Mask R-CNN for Custom Datasets

Related Courses