Module 4 of 21 · Computer Vision · Intermediate

Object Detection Basics

Duration: 7 min

This module introduces the fundamentals of object detection, the computer vision task of identifying and locating objects in images or video streams. Object detection underpins applications such as autonomous driving, surveillance, and image analysis. We will cover the key concepts, the main families of algorithms, and practical implementations in Python.

Introduction to Object Detection

Object detection identifies the objects in an image and localizes each one with a bounding box. Unlike image classification, which assigns a single label to the entire image, a detector can find multiple objects in one image and reports a location and a class label for each. The example that follows uses a Haar cascade, a classical (non-deep-learning) detector that ships with OpenCV, to find faces.

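import cv2

# Load OpenCV's bundled pre-trained Haar cascade for frontal faces
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Read an image; cv2.imread returns None (it does not raise) if the file cannot be read
image = cv2.imread('example.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'example.jpg'")

# Haar cascades operate on grayscale images
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces. scaleFactor is the step between image-pyramid scales;
# minNeighbors is how many overlapping candidates are needed to keep a detection
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

# Draw a blue rectangle (OpenCV uses BGR order) around each face
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

# Display the output (cv2.imshow needs a local GUI session)
cv2.imshow('Faces', image)
cv2.waitKey(0)
cv2.destroyAllWindows()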


Output: a window showing the image with a blue rectangle drawn around each detected face.
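
The face detector above returns each box as (x, y, w, h): a top-left corner plus a width and height. Other libraries, including torchvision's detection models, report boxes as corner coordinates (x1, y1, x2, y2), so converting between the two is a common step when combining tools. The following is a minimal sketch; the helper names xywh_to_xyxy and xyxy_to_xywh and the sample box are hypothetical and are not part of OpenCV or torchvision.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """Convert corner format (x1, y1, x2, y2) back to (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


# A made-up face box: top-left corner at (40, 30), 100 px wide, 120 px tall
face = (40, 30, 100, 120)
corners = xywh_to_xyxy(face)
print(corners)                # (40, 30, 140, 150)
print(xyxy_to_xywh(corners))  # (40, 30, 100, 120)
```

For batches of boxes stored as tensors, torchvision.ops.box_convert offers the same kind of conversion.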

Deep Learning-based Object Detection

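Deep learning has transformed object detection through Convolutional Neural Networks (CNNs). Modern detectors such as YOLO (You Only Look Once) and Faster R-CNN use CNNs to reach much higher accuracy than classical methods. YOLO is a single-stage detector designed for real-time speed, while Faster R-CNN is a two-stage detector (region proposals first, then classification and box refinement) that generally trades some speed for accuracy. The example that follows runs torchvision's pre-trained Faster R-CNN on a single image.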

import torch
import torchvision
from torchvision.transforms.functional import pil_to_tensor, to_tensor
from PIL import Image
import matplotlib.pyplot as plt

# Load a pre-trained Faster R-CNN model
# (weights="DEFAULT" replaces the deprecated pretrained=True)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Load an image
image = Image.open('example.jpg').convert('RGB')

# Convert to a float tensor in [0, 1] with shape (C, H, W)
img_t = to_tensor(image)

# Perform object detection; the model takes a list of image tensors
with torch.no_grad():
    predictions = model([img_t])

# draw_bounding_boxes works on a uint8 tensor and returns a new tensor;
# it does not modify the PIL image in place
annotated = torchvision.utils.draw_bounding_boxes(
    pil_to_tensor(image), predictions[0]['boxes'], colors='red', width=3
)

# Plot the results (matplotlib expects H x W x C)
plt.imshow(annotated.permute(1, 2, 0))
plt.axis('off')
plt.show()
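
The model returns one dictionary per image with the keys boxes, labels, and scores, and drawing every box usually includes many low-confidence detections. A common remedy is to keep only detections above a score threshold. The sketch below illustrates the idea with plain Python lists standing in for tensors; the helper filter_detections, the 0.5 threshold, and the sample values are all assumptions made for illustration.

```python
def filter_detections(prediction, threshold=0.5):
    """Keep only the detections whose score is at least `threshold`."""
    kept = {"boxes": [], "labels": [], "scores": []}
    for box, label, score in zip(
        prediction["boxes"], prediction["labels"], prediction["scores"]
    ):
        if score >= threshold:
            kept["boxes"].append(box)
            kept["labels"].append(label)
            kept["scores"].append(score)
    return kept


# Made-up prediction laid out like the model's output for one image
prediction = {
    "boxes": [
        [12.0, 20.0, 110.0, 200.0],
        [50.0, 60.0, 90.0, 100.0],
        [5.0, 5.0, 30.0, 40.0],
    ],
    "labels": [1, 18, 1],
    "scores": [0.98, 0.42, 0.07],
}

confident = filter_detections(prediction)
print(confident["labels"])  # [1]
print(confident["scores"])  # [0.98]
```

With real tensors the same filtering is usually written as a boolean mask, for example keep = predictions[0]['scores'] > 0.5 followed by predictions[0]['boxes'][keep].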

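💡 Tip: When using pre-trained models for object detection, make sure the input image is pre-processed to match the model's expected input format. For the torchvision Faster R-CNN used above, that means an RGB float tensor with values in [0, 1] and shape (C, H, W).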
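
To make that pre-processing concrete, here is a dependency-free illustration of what torchvision's ToTensor transform does to an 8-bit RGB image: it scales pixel values from 0-255 to [0, 1] and reorders the layout from height x width x channels to channels x height x width. Nested lists stand in for real tensors, and to_chw_float is a hypothetical helper written only for this sketch.

```python
def to_chw_float(pixels_hwc):
    """Turn an H x W x C grid of 0-255 ints into a C x H x W grid of floats in [0, 1]."""
    height = len(pixels_hwc)
    width = len(pixels_hwc[0])
    channels = len(pixels_hwc[0][0])
    return [
        [[pixels_hwc[r][c][ch] / 255.0 for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


# A made-up 1 x 2 RGB image: one pure red pixel and one dark gray pixel
tiny = [[[255, 0, 0], [51, 51, 51]]]
chw = to_chw_float(tiny)

print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 1 2  (channels, height, width)
print(chw[0])  # [[1.0, 0.2]]  (the red channel)
```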

❓ What is the primary difference between object detection and image classification?

❓ Which algorithm is known for its real-time object detection capabilities?
