Module 5 of 21 · Computer Vision · Intermediate

YOLO: You Only Look Once

Duration: 7 min

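This module covers YOLO (You Only Look Once), a family of real-time object detectors. YOLO predicts bounding boxes and class probabilities for the whole image in a single forward pass of one neural network. Because it skips the separate region-proposal stage used by two-stage detectors such as Faster R-CNN, it is fast enough for real-time use while keeping competitive accuracy. That makes it a common choice when detection must be both quick and reliable, and a useful foundation for building more advanced computer vision applications.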

Understanding YOLO Architecture

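YOLO runs the entire input image through a single neural network. Conceptually, the image is divided into an S × S grid, and the cell that contains an object's center is responsible for detecting that object. Each cell predicts a fixed number of bounding boxes, each with a confidence (objectness) score, together with class probabilities. Intersection over Union (IoU), the area where two boxes overlap divided by the area of their union, plays two roles: during training it measures how well a predicted box matches the ground-truth box, and at inference non-maximum suppression (NMS) uses it to discard duplicate boxes that overlap a higher-scoring detection.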
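To make the grid and IoU ideas concrete, here is a minimal Python sketch, not code from any YOLO release. The helper names `iou` and `responsible_cell` are hypothetical, boxes use the same (x, y, w, h) top-left format as the detection code below, and the default grid size `s=13` assumes the coarsest YOLOv3 output grid for a 416 × 416 input.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes with top-left (x, y)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (clamped to 0 if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    # Union = area A + area B - overlap (so the overlap is not counted twice)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def responsible_cell(center_x, center_y, img_w, img_h, s=13):
    """Return (row, col) of the S x S grid cell that contains a box center.

    Hypothetical helper; s=13 assumes YOLOv3's coarsest grid at 416 x 416.
    """
    col = min(int(center_x / img_w * s), s - 1)  # clamp centers on the right edge
    row = min(int(center_y / img_h * s), s - 1)  # clamp centers on the bottom edge
    return row, col


# Two 10 x 10 boxes offset by 5 px overlap in a 5 x 5 square: IoU = 25 / 175
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))
# With 32-px cells (416 / 13), the point (208, 100) falls in row 3, column 6
print(responsible_cell(208, 100, 416, 416))
```

Real YOLOv3 additionally matches each object to one of several anchor boxes in the responsible cell and predicts at three scales (13 × 13, 26 × 26 and 52 × 52 for a 416 input); the sketch leaves those details out.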

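import cv2
import numpy as np

# Load YOLOv3 from its Darknet weights and config files
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
# Names of the YOLO output layers. This avoids indexing getUnconnectedOutLayers(),
# whose return shape differs between OpenCV versions and breaks the i[0] - 1 idiom.
output_layers = net.getUnconnectedOutLayersNames()

# Load the image and shrink it to 40% of its original size
img = cv2.imread('image.jpg')
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detect objects: scale pixels to [0, 1] (1/255), resize to 416x416, swap BGR to RGB
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Collect detections. Each row is [center_x, center_y, w, h, objectness, class scores...]
# with coordinates normalized to [0, 1] relative to the image size.
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # Convert the normalized center and size to a pixel box (top-left x, y, w, h)
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression: score threshold 0.5, IoU threshold 0.4
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
font = cv2.FONT_HERSHEY_PLAIN
# One random color per class (number of classes = row length minus the 5 box/objectness values)
num_classes = outs[0].shape[1] - 5
colors = np.random.uniform(0, 255, size=(num_classes, 3))
# NMSBoxes may return an N x 1 array, a flat array, or an empty tuple; flatten handles all three
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    label = str(class_ids[i])  # numeric class ID
    color = tuple(int(c) for c in colors[class_ids[i]])
    cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
    cv2.putText(img, label, (x, y + 30), font, 3, color, 3)

# Show the result in a desktop window (cv2.imshow is not supported in Google Colab)
cv2.imshow('Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()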

Running the script opens a window showing the downscaled input image with a bounding box around each detection that survives non-maximum suppression, labeled with its numeric class ID.
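The `cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)` call first drops boxes below the score threshold and then suppresses boxes that overlap a higher-scoring one. The sketch below shows that greedy idea in NumPy so the two thresholds are easier to reason about. It is an illustrative re-implementation assuming (x, y, w, h) pixel boxes, not OpenCV's actual code, and the function name `nms` is hypothetical.

```python
import numpy as np


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Greedy non-maximum suppression over (x, y, w, h) boxes.

    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Step 1: discard low-confidence boxes
    candidates = np.where(scores > score_threshold)[0]
    if candidates.size == 0:
        return []
    # Step 2: visit the remaining boxes from highest to lowest score
    order = candidates[np.argsort(-scores[candidates])]
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]

    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # IoU between the best box and every remaining candidate
        inter_w = np.maximum(0.0, np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]))
        inter_h = np.maximum(0.0, np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]))
        inter = inter_w * inter_h
        union = areas[best] + areas[rest] - inter
        overlap = inter / np.maximum(union, 1e-12)
        # Step 3: keep only candidates that do not overlap the chosen box too much
        order = rest[overlap <= iou_threshold]
    return keep


# Boxes 0 and 1 are near-duplicates (IoU about 0.85); box 2 is elsewhere
example_boxes = [[10, 10, 50, 50], [12, 12, 50, 50], [100, 100, 40, 40]]
example_scores = [0.9, 0.8, 0.7]
print(nms(example_boxes, example_scores))  # box 1 is suppressed by box 0
```

Raising `iou_threshold` keeps more overlapping boxes, which can help in crowded scenes; lowering it suppresses duplicates more aggressively.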

Enhancing YOLO Performance

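To improve YOLO's results on your own task, common techniques include data augmentation (for example flips, scaling, and color jitter, with the bounding boxes transformed to match the pixels), fine-tuning on a dataset of the objects you care about, and tuning hyperparameters such as input resolution and the confidence and NMS thresholds. Fine-tuning is usually done through transfer learning: starting from weights pre-trained on a large dataset such as COCO typically improves accuracy when your own dataset is small and shortens training compared with starting from scratch.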
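Augmentation for detection must move the labels along with the pixels. As an example, here is a minimal sketch of a horizontal flip applied to an image and its boxes, assuming the (x, y, w, h) pixel format used in the detection code above. `hflip_with_boxes` is a hypothetical helper, not part of OpenCV or any YOLO training framework.

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Horizontally flip an H x W x C image and its (x, y, w, h) pixel boxes.

    A box's left edge x moves to W - (x + w); y, w and h are unchanged.
    """
    flipped = image[:, ::-1].copy()  # reverse the column order
    width = image.shape[1]
    new_boxes = [[width - (x + w), y, w, h] for x, y, w, h in boxes]
    return flipped, new_boxes


# A 4 x 10 image with one 3-px-wide box starting at x = 1
img = np.zeros((4, 10, 3), dtype=np.uint8)
flipped_img, flipped_boxes = hflip_with_boxes(img, [[1, 0, 3, 2]])
print(flipped_boxes)  # the box now starts at x = 10 - (1 + 3) = 6
```

`cv2.flip(img, 1)` produces the same flipped image. For YOLO-format label files, which store normalized center coordinates, the equivalent box transform is center_x → 1 - center_x.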

💡 Tip: Ensure your YOLO model is trained on a dataset that closely matches the objects you intend to detect in real-world applications to achieve optimal performance.

❓ What is the primary advantage of YOLO over other object detection algorithms?

❓ Which technique can be used to improve YOLO's detection accuracy?
