Module 3 of 21 · Computer Vision · Intermediate

Advanced CNN Architectures

Duration: 7 min

This module covers two Convolutional Neural Network (CNN) architectures that are widely used in computer vision: YOLO for object detection and U-Net for image segmentation. For each one, it explains the core idea and walks through a short code example.

Understanding YOLO for Object Detection

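You Only Look Once (YOLO) is a real-time object detection system. It applies a single network to the whole image in one forward pass: the image is divided into regions, and the network predicts bounding boxes and class probabilities for each region. The boxes are weighted by the predicted probabilities, and low-confidence or heavily overlapping boxes are then filtered out.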

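import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# Output layer names; this call works across OpenCV versions,
# unlike indexing getUnconnectedOutLayers() by hand
output_layers = net.getUnconnectedOutLayersNames()

# Class labels: assumes a "coco.names" file with one label per line, matching the model
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f]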

# Load image
img = cv2.imread("image.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detecting objects: scale pixels by 1/255 (0.00392), resize to 416x416, swap BGR to RGB
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

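# Collect detections above the confidence threshold
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        # Each row: [center_x, center_y, w, h, objectness, class scores...],
        # with box values relative to the image size
        scores = detection[5:]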
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

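# Non-maximum suppression drops overlapping duplicates
# (score threshold 0.5, overlap threshold 0.4), then the surviving boxes are drawn
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)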
font = cv2.FONT_HERSHEY_PLAIN
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(img, label, (x, y + 30), font, 3, (255, 0, 0), 3)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output: the image is shown in a window, with each detected object outlined by a bounding box and labeled with its class name.
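
To make the overlap filtering step less of a black box, here is a minimal sketch of greedy non-maximum suppression in plain NumPy. It illustrates the general technique and is not OpenCV's exact implementation; the function names `iou` and `greedy_nms` and the sample boxes are hypothetical, and boxes are assumed to be `[x, y, w, h]` with positive width and height, as in the listing above.

```python
import numpy as np

def iou(box, others):
    """Intersection over union between one [x, y, w, h] box and an array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    # Negative extents mean the boxes do not overlap, so clip them to zero
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union

def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Candidates sorted by descending score, filtered by the score threshold
    candidates = [int(i) for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while candidates:
        best = candidates.pop(0)
        keep.append(best)
        if not candidates:
            break
        overlaps = iou(boxes[best], boxes[candidates])
        candidates = [c for c, o in zip(candidates, overlaps) if o <= iou_threshold]
    return keep

# Hypothetical sample: two near-duplicate boxes and one separate box
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # → [0, 2]
```

The second box overlaps the first almost completely and has a lower score, so it is suppressed; the third box does not overlap either and is kept.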

Exploring U-Net for Image Segmentation

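U-Net is a popular architecture for biomedical image segmentation. It consists of a contracting path (encoder) that captures context by repeatedly convolving and downsampling, and a symmetric expanding path (decoder) that upsamples back to the input resolution to enable precise localization. Skip connections concatenate encoder feature maps with decoder feature maps of the same resolution, so fine spatial detail lost to pooling is still available when the mask is reconstructed. The output is a per-pixel prediction, which makes U-Net well suited to separating an image into regions such as foreground and background.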

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate

def unet(input_size=(256, 256, 1)):
    inputs = Input(input_size)

    # Contracting path
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
    p1 = MaxPooling2D((2, 2))(c1)

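    # Expanding path: upsample the pooled features back to the input resolution
    u1 = UpSampling2D((2, 2))(p1)
    u1 = Conv2D(64, (2, 2), activation='relu', padding='same')(u1)
    # Skip connection: merge encoder features with the upsampled features
    u1 = concatenate([c1, u1])
    u1 = Conv2D(64, (3, 3), activation='relu', padding='same')(u1)

    # One sigmoid output per pixel (binary mask)
    outputs = Conv2D(1, (1, 1), activation='sigmoid')(u1)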

    model = Model(inputs=[inputs], outputs=[outputs])
    return model

unet_model = unet()
unet_model.compile(optimizer='adam', loss='binary_crossentropy')
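
The resolution bookkeeping behind U-Net's two paths can be checked without a deep learning framework. The sketch below is an illustration in plain NumPy, not part of the Keras model: the helper names `max_pool_2x2` and `upsample_2x2` are hypothetical, and random values stand in for real feature maps. It shows 2x2 pooling halving the height and width, nearest-neighbour upsampling restoring them, and a skip connection doubling the channel count by concatenation.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and take the max inside each block
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x2(x):
    """Nearest-neighbour 2x upsampling on an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Stand-in for encoder feature maps at the input resolution
encoder_features = np.random.rand(256, 256, 64).astype(np.float32)

pooled = max_pool_2x2(encoder_features)    # contracting path: (128, 128, 64)
upsampled = upsample_2x2(pooled)           # expanding path:   (256, 256, 64)
# Skip connection: stack encoder and decoder features along the channel axis
merged = np.concatenate([encoder_features, upsampled], axis=-1)

print(pooled.shape, upsampled.shape, merged.shape)
# → (128, 128, 64) (256, 256, 64) (256, 256, 128)
```

Because the upsampled maps match the encoder maps in height and width, they can be concatenated directly, which is why the expanding path has to mirror the contracting path level by level.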

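💡 Tip: When training U-Net, normalize your images and augment the dataset (for example with flips and rotations) to reduce overfitting. Apply the same geometric transform to each image and its mask so the two stay aligned.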
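
For segmentation, augmentation has to treat each image and its mask as a pair. The following is a minimal NumPy sketch of that idea, assuming square `(H, W, C)` arrays; the function name `augment_pair` and the toy image and mask are hypothetical, and a real pipeline would likely add further transforms.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply the same random flip and 90-degree rotation to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = int(rng.integers(0, 4))  # number of 90-degree turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # rot90 and fliplr return views, so copy to get independent arrays
    return image.copy(), mask.copy()

rng = np.random.default_rng(0)

# Toy 4x4 single-channel image normalized to [0, 1], with a mask derived from it
image = np.arange(16, dtype=np.float32).reshape(4, 4, 1) / 15.0
mask = (image > 0.5).astype(np.uint8)

aug_image, aug_mask = augment_pair(image, mask, rng)

# The mask still marks exactly the bright pixels of the augmented image
print(np.array_equal(aug_mask, (aug_image > 0.5).astype(np.uint8)))  # → True
```

Drawing the random choices once and applying them to both arrays is what keeps the mask aligned; augmenting the image and mask with separate random draws would silently corrupt the labels.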

❓ What is the primary advantage of using YOLO for object detection?

❓ What is the key feature of U-Net that makes it suitable for image segmentation?
