Deep Learning Frameworks for Computer Vision
Duration: 7 min
This module covers the use of deep learning frameworks for computer vision tasks. We will explore Convolutional Neural Networks (CNNs), object detection algorithms such as YOLO and Faster R-CNN, and segmentation architectures such as U-Net and Mask R-CNN. Understanding these frameworks is crucial for developing advanced computer vision applications.
Convolutional Neural Networks (CNNs)
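CNNs are a class of deep neural networks most commonly applied to visual imagery. They are composed of convolutional layers that apply learned filters to the input, pooling layers that downsample the resulting feature maps, and fully connected layers that perform classification. Because each filter is shared across every position in the image, a CNN can learn local patterns such as edges and textures with far fewer parameters than a fully connected network, which makes CNNs highly effective for image recognition and classification tasks.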
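To make the layer arithmetic concrete, the following sketch traces feature-map sizes and parameter counts through a small stack of convolution, pooling, and dense layers like the one defined in this section. It assumes convolutions without padding and with stride 1, and non-overlapping pooling, which are the Keras defaults. The helper names (conv_out, pool_out, conv_params, dense_params) are illustrative and not part of any library.

```python
def conv_out(size, kernel, stride=1):
    """Spatial size after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Spatial size after non-overlapping pooling (stride equals pool size)."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """kernel * kernel * in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return inputs * units + units


# Trace a 28x28x1 input through conv(32) -> pool -> conv(64) -> pool -> conv(64)
size = 28
size = conv_out(size, 3)   # 26
size = pool_out(size, 2)   # 13
size = conv_out(size, 3)   # 11
size = pool_out(size, 2)   # 5
size = conv_out(size, 3)   # 3
flat = size * size * 64    # 576 values after flattening

total = (
    conv_params(3, 1, 32)      # 320
    + conv_params(3, 32, 64)   # 18,496
    + conv_params(3, 64, 64)   # 36,928
    + dense_params(flat, 64)   # 36,928
    + dense_params(64, 10)     # 650
)
print(size, flat, total)  # 3 576 93322
```

Each 3x3 convolution trims one pixel from every border, and each 2x2 pooling step halves the spatial size (rounding down), so most of the parameters end up in the later convolution and the first dense layer.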
import tensorflow as tf
from tensorflow.keras import layers, models
# Define a simple CNN
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Print the layer-by-layer summary
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928
_________________________________________________________________
flatten (Flatten)            (None, 576)               0
_________________________________________________________________
dense (Dense)                (None, 64)                36928
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________

Object Detection with YOLO and Faster R-CNN
Object detection is a computer vision technique for locating and classifying instances of objects in images or videos. YOLO (You Only Look Once) and Faster R-CNN (Faster Region-based Convolutional Neural Network) are two popular algorithms for object detection. YOLO is known for its speed because it predicts bounding boxes and class scores in a single pass over the image, while Faster R-CNN is a two-stage detector that offers high accuracy by using a region proposal network to identify potential object locations before classifying them.
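import cv2
import numpy as np

# Load YOLOv3 (weights and config files must be in the working directory)
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
# Ask OpenCV for the output layer names directly; indexing the result of
# getUnconnectedOutLayers() with i[0] fails on recent OpenCV releases,
# where that call returns a flat array
output_layers = net.getUnconnectedOutLayersNames()

# Load image
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError('Could not read image.jpg')
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detecting objects
# Scale pixels by 1/255 (0.00392), resize to 416x416, and swap BGR to RGB
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)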

# Collect the detections that pass the confidence threshold
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        # Each row holds [center_x, center_y, w, h, objectness, class scores...]
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Coordinates are normalized, so scale them to the image size
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Convert from box center to top-left corner
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(int(class_id))

# Non-maximum suppression (score threshold 0.5, NMS threshold 0.4)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Draw the surviving boxes and show the result on the screen
font = cv2.FONT_HERSHEY_PLAIN
colors = np.random.uniform(0, 255, size=(len(class_ids), 3))
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(class_ids[i])
        color = colors[i].tolist()  # convert to plain Python floats for OpenCV
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label, (x, y + 30), font, 3, color, 3)
cv2.imshow('Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

💡 Tip: Ensure that the YOLO weights and configuration files are correctly downloaded and placed in the working directory to avoid errors during model loading.
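The call to cv2.dnn.NMSBoxes in the example above removes duplicate boxes that cover the same object. As a rough illustration of the idea, here is a minimal pure-Python sketch of greedy non-maximum suppression based on intersection over union (IoU). The function names iou and nms and the sample boxes are hypothetical, and OpenCV's own implementation may differ in details such as tie-breaking.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Greedy NMS: return the indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [[10, 10, 100, 100], [15, 15, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, which is above the 0.4 threshold, so only the higher-scoring of the two survives. Lowering the IoU threshold suppresses more aggressively; raising it keeps more overlapping boxes.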
❓ What is the primary advantage of using CNNs in computer vision tasks?
❓ Which object detection algorithm is known for its speed?