Module 15 of 21 · Computer Vision · Intermediate

Project: Building a Custom Object Detector

Duration: 10 min

This module walks through building a custom object detector with Convolutional Neural Networks (CNNs). You will learn about object detection algorithms such as YOLO and Faster R-CNN, and segmentation techniques such as U-Net and Mask R-CNN. These techniques underpin applications in autonomous driving, surveillance, and robotics.

Understanding Convolutional Neural Networks (CNNs)

CNNs are a class of deep neural networks most commonly applied to visual imagery. A typical CNN consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. Each convolutional layer slides a set of small learned filters over its input and passes the resulting feature maps to the next layer. Each pooling layer downsamples those feature maps, reducing their spatial resolution. The fully connected layers then classify the image from the features that the convolutional and pooling layers have extracted.
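The two core operations described above can be sketched in plain NumPy. This is a minimal illustration, assuming a single-channel input, stride 1, no padding, and non-overlapping pooling windows; the function names `conv2d_valid` and `max_pool2d` are hypothetical helpers, not library APIs, and real frameworks use far faster implementations.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            # Multiply the kernel with the patch under it and sum the result
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]  # drop rows/cols that do not fill a window
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                    # simple averaging filter
features = conv2d_valid(image, kernel)            # 6x6 -> 4x4
pooled = max_pool2d(features)                     # 4x4 -> 2x2
print(features.shape, pooled.shape)
```

In a trained CNN the kernel values are learned rather than fixed, and each layer applies many kernels across many input channels.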

import tensorflow as tf
from tensorflow.keras import layers

# Define a simple CNN model
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Print the model summary
model.summary()


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                36928     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________
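
The output shapes and parameter counts in a summary like this follow from simple arithmetic. The sketch below recomputes them, assuming stride-1 convolutions without padding and 2x2 pooling (the Keras defaults used by the model). The helper names are illustrative and not part of Keras.

```python
def conv_out(size, kernel=3):
    # 'valid' convolution, stride 1: each spatial side shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, window=2):
    # 2x2 max pooling with stride 2 floors odd sizes (for example 11 -> 5)
    return size // window

def conv_params(in_channels, out_channels, kernel=3):
    # one kernel x kernel x in_channels filter per output channel, plus one bias each
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    # one weight per input/output pair, plus one bias per output unit
    return in_units * out_units + out_units

side = 28
side = conv_out(side)    # 26
side = pool_out(side)    # 13
side = conv_out(side)    # 11
side = pool_out(side)    # 5
side = conv_out(side)    # 3
flat = side * side * 64  # 3 * 3 * 64 = 576 values after Flatten

params = [
    conv_params(1, 32),      # conv2d
    conv_params(32, 64),     # conv2d_1
    conv_params(64, 64),     # conv2d_2
    dense_params(flat, 64),  # dense
    dense_params(64, 10),    # dense_1
]
total_params = sum(params)
print(flat, params, total_params)
```

Comparing the printed per-layer values with the Param # column is a quick way to check that a model has the architecture you intended.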

Implementing YOLO for Object Detection

YOLO (You Only Look Once) is a real-time object detection system. It frames object detection as a single regression problem: one forward pass of the network predicts spatially separated bounding boxes and their associated class probabilities for the whole image. This single-pass design gives YOLO its combination of speed and accuracy and makes it suitable for real-time applications. The example below uses YOLOv3 through OpenCV's DNN module.

import cv2
import numpy as np

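# Load the YOLOv3 weights and network configuration
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# getUnconnectedOutLayersNames() returns the output layer names directly,
# which avoids the index-format differences between OpenCV versions
output_layers = net.getUnconnectedOutLayersNames()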

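# Load image (cv2.imread returns None if the file cannot be read)
img = cv2.imread("image.jpg")
if img is None:
    raise FileNotFoundError("image.jpg could not be read")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape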

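# Detecting objects: scale pixels by 1/255 (0.00392), resize to 416x416,
# and swap BGR to RGB (the True argument is swapRB)
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)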
net.setInput(blob)
outs = net.forward(output_layers)

# Collect the detections whose best class score exceeds the confidence threshold
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

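# Non-maximum suppression: score threshold 0.5, overlap (NMS) threshold 0.4
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
font = cv2.FONT_HERSHEY_PLAIN
colors = np.random.uniform(0, 255, size=(len(boxes), 3))
# flatten() handles both the (N, 1) and (N,) index layouts of different OpenCV versions
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    label = str(class_ids[i])  # numeric class ID
    # Convert the color to a plain tuple of ints for the drawing functions
    color = tuple(int(c) for c in colors[i])
    cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
    cv2.putText(img, label, (x, y + 30), font, 3, color, 3)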

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

💡 Tip: Ensure that your model weights and configuration files are correctly downloaded and placed in the working directory to avoid runtime errors.
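
The detection code relies on non-maximum suppression (NMS) to discard overlapping boxes that describe the same object. The sketch below shows the general greedy technique in plain Python, using the same [x, y, w, h] box format and the same 0.5 and 0.4 thresholds as the example. It illustrates the idea only and is not OpenCV's actual implementation; `iou` and `greedy_nms` are hypothetical helper names.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    # Visit boxes from highest to lowest score, skipping low-confidence ones
    order = [i for i in np.argsort(scores)[::-1] if scores[i] > score_threshold]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(int(i))
    return keep

# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [[100, 100, 50, 50], [105, 102, 50, 50], [300, 200, 40, 60]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

Lowering the overlap threshold suppresses more aggressively, which helps with duplicate boxes but can remove valid detections of objects that sit close together.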

❓ What is the primary advantage of using CNNs in image processing?

❓ Which of the following is a key feature of the YOLO object detection algorithm?
