Advanced CNN Architectures
Duration: 7 min
This module covers advanced Convolutional Neural Network (CNN) architectures used in computer vision tasks such as object detection and image segmentation. Understanding these architectures is crucial for developing sophisticated computer vision applications.
Understanding YOLO for Object Detection
You Only Look Once (YOLO) is a real-time object detection system that processes the whole image in a single forward pass of the network. YOLO divides the image into a grid of regions and predicts bounding boxes and class probabilities for each region. Each bounding box is weighted by its predicted probability, so low-confidence boxes can be discarded with a threshold.
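To make the per-region prediction concrete, here is a minimal NumPy sketch of decoding a single prediction vector. It assumes the layout `[center_x, center_y, width, height, objectness, class scores...]` with box values normalized to the range 0 to 1, and it scores the box as objectness times class probability, as in the original YOLO formulation. The function name `decode_detection` and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def decode_detection(detection, img_width, img_height):
    """Convert one normalized prediction vector into a pixel-space box and a score."""
    cx, cy, w, h, objectness = detection[:5]
    class_probs = detection[5:]
    class_id = int(np.argmax(class_probs))
    # Weight the best class probability by the box's objectness
    score = float(objectness * class_probs[class_id])
    # Convert the normalized center/size into a top-left corner and pixel size
    x = int((cx - w / 2) * img_width)
    y = int((cy - h / 2) * img_height)
    return [x, y, int(w * img_width), int(h * img_height)], class_id, score

# Hypothetical prediction: a centered box with three class scores
det = np.array([0.5, 0.5, 0.25, 0.5, 0.5, 0.125, 0.75, 0.125])
box, class_id, score = decode_detection(det, img_width=640, img_height=480)
print(box, class_id, score)
```

A box whose score falls below the chosen threshold (0.5 in this lesson's detection code) would simply be skipped.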
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# Using the layer names directly avoids index-format differences between OpenCV versions
output_layers = net.getUnconnectedOutLayersNames()

# Load class labels (assumption: a "coco.names" file with one label per line,
# matching the dataset the weights were trained on)
with open("coco.names") as f:
    classes = [line.strip() for line in f]

# Load image
img = cv2.imread("image.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detecting objects
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Collect detections above the confidence threshold
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected: scale the normalized box to pixel coordinates
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates (top-left corner)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Non-maximum suppression removes overlapping boxes for the same object
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Showing information on the screen
font = cv2.FONT_HERSHEY_PLAIN
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    label = str(classes[class_ids[i]])
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.putText(img, label, (x, y + 30), font, 3, (255, 0, 0), 3)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Output: displays the image with detected objects highlighted by bounding boxes and labeled.

Exploring U-Net for Image Segmentation
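U-Net is a popular architecture for biomedical image segmentation. It consists of a contracting path (encoder) that captures context and a symmetric expanding path (decoder) that enables precise localization. Skip connections concatenate encoder feature maps with the decoder feature maps of the same resolution, so fine spatial detail lost during downsampling is available again when the network predicts a class for each pixel.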
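The shape bookkeeping behind the two paths can be sketched without a deep learning framework. The following NumPy example is an illustration only, not a trainable model: the helper names `max_pool_2x2` and `upsample_2x` are hypothetical, and nearest-neighbor repetition stands in for whatever upsampling a real U-Net would use.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbor upsampling that doubles height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

encoder_features = np.random.rand(8, 8, 4)   # feature map from the contracting path
pooled = max_pool_2x2(encoder_features)      # (4, 4, 4): half the resolution
upsampled = upsample_2x(pooled)              # (8, 8, 4): resolution restored
# Skip connection: stack the encoder features onto the upsampled ones
merged = np.concatenate([upsampled, encoder_features], axis=-1)
print(pooled.shape, upsampled.shape, merged.shape)
```

The concatenation only works because the upsampled map has the same height and width as the encoder map it is paired with, which is why each decoder stage mirrors an encoder stage.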
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate

def unet(input_size=(256, 256, 1)):
    # Simplified single-level U-Net; a full U-Net repeats these stages several times
    inputs = Input(input_size)
    # Contracting path
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    c1 = Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
    p1 = MaxPooling2D((2, 2))(c1)
    # Expanding path: upsample the pooled features back to the input resolution
    u1 = UpSampling2D((2, 2))(p1)
    u1 = Conv2D(64, (2, 2), activation='relu', padding='same')(u1)
    # Skip connection: merge with the encoder features of the same resolution
    u1 = concatenate([u1, c1])
    u1 = Conv2D(64, (3, 3), activation='relu', padding='same')(u1)
    outputs = Conv2D(1, (1, 1), activation='sigmoid')(u1)
    model = Model(inputs=[inputs], outputs=[outputs])
    return model

unet_model = unet()
unet_model.compile(optimizer='adam', loss='binary_crossentropy')

💡 Tip: When training U-Net, ensure your dataset is properly preprocessed and augmented to avoid overfitting.
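For segmentation, any geometric augmentation has to be applied identically to the image and its mask, or the labels no longer line up with the pixels. The sketch below shows one way to do that with random flips in NumPy. The function name `random_flip_pair` and the toy data are hypothetical, and flips are only one of many possible augmentations.

```python
import numpy as np

def random_flip_pair(image, mask, rng):
    """Apply the same random horizontal and vertical flips to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]   # vertical flip
    return image, mask

rng = np.random.default_rng(0)
# Toy 4x4 single-channel image already scaled to [0, 1], with a thresholded mask
image = np.arange(16, dtype=np.float32).reshape(4, 4, 1) / 15.0
mask = (image > 0.5).astype(np.float32)
aug_image, aug_mask = random_flip_pair(image, mask, rng)
print(aug_image.shape, aug_mask.shape)
```

Because both arrays go through the same flips, the augmented mask still marks exactly the pixels it marked before.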
❓ What is the primary advantage of using YOLO for object detection?
❓ What is the key feature of U-Net that makes it suitable for image segmentation?